Node Throughput Benchmarking for Apache Cassandra® on AWS, Azure and GCP

At NetApp, we are often asked for comparative benchmarking of our Instaclustr Managed Platform across various cloud providers and the different node sizes we offer.  

For Apache Cassandra® on the Instaclustr Managed Platform, we have recently completed an extensive benchmarking exercise that will help our customers evaluate which node types to use and how performance differs between cloud providers.

Each cloud service provider is continually introducing new and improved nodes, which we carefully select and curate to provide our customers with a range of options to suit their specific workloads. The results of these benchmarks can assist you in evaluating and selecting the node sizes that deliver optimal performance at the lowest cost.

How Should Instaclustr Customers Use the Node Benchmark Data? 

Instaclustr node benchmarks measure the throughput of each node type under the same test conditions, providing a consistent performance metric for comparison.

As with any generic benchmark, performance may vary from these results for different data models or application workloads. Instaclustr node benchmarks do not account for the specific configuration of a customer environment and should only be used to compare the relative performance of node sizes.

Instaclustr recommends that customers complete testing and data modelling in their own environment across a selection of candidate node sizes.

Test Objective

Cassandra node benchmarking produces a performance metric for each General Availability (GA) node type available for Apache Cassandra across AWS, GCP, and Azure AZ. Each node type is benchmarked before being released to General Availability. These tests ensure that all nodes deliver sufficient performance for their configuration.

All Cassandra production nodes that are Generally Available on the Instaclustr platform are stress-tested to find their operational capacity. This is accomplished by measuring throughput to find the highest operation rate the test cluster can achieve without performance degradation.

Methodology

We used the following configuration to conduct this benchmarking (an example invocation is shown after the list):

  • Cassandra-stress tool
  • QUORUM consistency level for data operations
  • 3-node cluster using Cassandra 4.0.10 with each node in a separate rack within a single datacenter
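
For illustration, a cassandra-stress run against a cluster like this specifies the consistency level and the contact nodes on the command line. The command below is a minimal sketch rather than the exact invocation used in our benchmarks; the node IPs and operation count are placeholders:

    # Illustrative only: write at QUORUM against a 3-node cluster (placeholder IPs and count)
    cassandra-stress write n=1000000 cl=QUORUM \
      -rate threads=100 \
      -node 10.0.0.1,10.0.0.2,10.0.0.3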

On each node size, we run the following testing procedure: 

  1. Fill Cassandra with enough records to approximate 5x the system memory of the node, using “small”-size writes. This is done to ensure that the entire data model does not sit in memory, and to give a better representation of a production cluster. 
  2. Allow the cluster to finish all compactions. This is done to ensure that all clusters are given the same starting point for each test run and to make test runs comparable and verifiable. 
  3. Allow compactions to finish between each step. 
  4. Perform multiple stress tests on the cluster, increasing the number of operations per second with each test, until we overload it. The cluster is considered overloaded if any one of three conditions is met: 
    1. The latency of the operation is above the provided threshold. This tells us that the cluster cannot keep up with the current load and is unable to respond quickly. 
    2. A node has more than 20 pending compactions one minute after the test completes. This tells us that the cluster is not keeping up with compactions, and this load is not sustainable in the long term. 
    3. The CPU load is above the provided threshold. This is why we pass the number of cores to the cassandra-stress tool. 
  5. Using this definition of “overloaded”, the following tests are run to measure maximum throughput: 
    1. Perform a combination of read and write operations in 30-minute test runs, with writes, simple reads, and range reads in a 10:10:1 ratio, increasing threads until we reach a read median latency of 20 ms; the highest operation rate sustained below the overload thresholds is recorded as the mixed-workload throughput (a sketch of such a run follows this list). 
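
To make the procedure concrete, the sketch below shows what one fill-and-test cycle might look like using cassandra-stress in user mode. It is illustrative only: stress-profile.yaml is a hypothetical profile defining the table plus “insert”, “simple”, and “range” operations in the 10:10:1 ratio described above, the record count and thread steps are placeholders, and the real benchmark automation also enforces the CPU and latency thresholds.

    # 1. Fill phase: load enough data to reach roughly 5x system memory on disk (placeholder count)
    cassandra-stress user profile=stress-profile.yaml "ops(insert=1)" n=500000000 cl=QUORUM \
      -rate threads=200 -node 10.0.0.1,10.0.0.2,10.0.0.3

    # 2. Wait for compactions to settle before each run ("pending tasks" should reach 0)
    nodetool compactionstats

    # 3. Mixed runs: writes, simple reads, and range reads at 10:10:1, stepping up load.
    #    Stop when read median latency exceeds 20 ms, CPU passes its threshold,
    #    or a node still has more than 20 pending compactions a minute after the run.
    for threads in 50 100 200 400; do
      cassandra-stress user profile=stress-profile.yaml "ops(insert=10,simple=10,range=1)" \
        duration=30m cl=QUORUM -rate threads=$threads -node 10.0.0.1,10.0.0.2,10.0.0.3
      nodetool compactionstats
    done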

Results

Following this approach, the tables below show the results we measured on the different providers and different node sizes we offer.

AWS
| Node Type (Node Size Used) | RAM / Cores | Result (Mixed OPS) |
| --- | --- | --- |
| t4g.small (CAS-DEV-t4g.small-30) | 2 GiB / 2 | 728* |
| t4g.medium (CAS-DEV-t4g.medium-80) | 4 GiB / 2 | 3,582 |
| r6g.medium (CAS-PRD-r6g.medium-120) | 8 GiB / 1 | 1,849 |
| m6g.large (CAS-PRD-m6g.large-120) | 8 GiB / 2 | 2,144 |
| r6g.large (CAS-PRD-r6g.large-250) | 16 GiB / 2 | 3,117 |
| r6g.xlarge (CAS-PRD-r6g.xlarge-400) | 32 GiB / 4 | 16,653 |
| r6g.2xlarge (CAS-PRD-r6g.2xlarge-800) | 64 GiB / 8 | 36,494 |
| r6g.4xlarge (CAS-PRD-r6g.4xlarge-1600) | 128 GiB / 16 | 62,195 |
| r7g.medium (CAS-PRD-r7g.medium-120) | 8 GiB / 1 | 2,527 |
| m7g.large (CAS-PRD-m7g.large-120) | 8 GiB / 2 | 2,271 |
| r7g.large (CAS-PRD-r7g.large-120) | 16 GiB / 2 | 3,039 |
| r7g.xlarge (CAS-PRD-r7g.xlarge-400) | 32 GiB / 4 | 20,897 |
| r7g.2xlarge (CAS-PRD-r7g.2xlarge-800) | 64 GiB / 8 | 39,801 |
| r7g.4xlarge (CAS-PRD-r7g.4xlarge-800) | 128 GiB / 16 | 70,880 |
| c5d.2xlarge (c5d.2xlarge-v2) | 16 GiB / 8 | 26,494 |
| c6gd.2xlarge (CAS-PRD-c6gd.2xlarge-441) | 16 GiB / 8 | 27,066 |
| is4gen.xlarge (CAS-PRD-is4gen.xlarge-3492) | 24 GiB / 4 | 19,066 |
| is4gen.2xlarge (CAS-PRD-is4gen.2xlarge-6984) | 48 GiB / 8 | 34,437 |
| im4gn.2xlarge (CAS-PRD-im4gn.2xlarge-3492) | 32 GiB / 8 | 31,090 |
| im4gn.4xlarge (CAS-PRD-im4gn.4xlarge-6984) | 64 GiB / 16 | 59,410 |
| i3en.xlarge (i3en.xlarge) | 32 GiB / 4 | 19,895 |
| i3en.2xlarge (CAS-PRD-i3en.2xlarge-4656) | 64 GiB / 8 | 40,796 |
| i3.2xlarge (i3.2xlarge-v2) | 61 GiB / 8 | 24,184 |
| i3.4xlarge (CAS-PRD-i3.4xlarge-3538) | 122 GiB / 16 | 42,234 |

* Nodes were overloaded by the stress test and began dropping out

Azure
| Node Type (Node Size Used) | RAM / Cores | Result (Mixed OPS) |
| --- | --- | --- |
| Standard_DS12_v2 (Standard_DS12_v2-512-an) | 28 GiB / 4 | 23,878 |
| Standard_DS2_v2 (Standard_DS2_v2-256-an) | 7 GiB / 2 | 1,535 |
| L8s_v2 (L8s_v2-an) | 64 GiB / 8 | 24,188 |
| Standard_L8s_v3 (CAS-PRD-Standard_L8s_v3-1788) | 64 GiB / 8 | 33,990 |
| Standard_DS13_v2 (Standard_DS13_v2-2046-an) | 56 GiB / 8 | 37,908 |
| D15_v2 (D15_v2-an) | 140 GiB / 20 | 68,226 |
| Standard_L16s_v2 (CAS-PRD-Standard_L16s_v2-3576-an) | 128 GiB / 16 | 33,969 |

GCP
| Node Type (Node Size Used) | RAM / Cores | Result (Mixed OPS) |
| --- | --- | --- |
| n1-standard-1 (CAS-DEV-n1-standard-1-5) | 3.75 GiB / 1 | No data* |
| n1-standard-2 (CAS-DEV-n1-standard-2-80) | 7.5 GiB / 2 | 3,159 |
| t2d-standard-2 (CAS-PRD-t2d-standard-2-80) | 8 GiB / 2 | 6,045 |
| n2-standard-2 (CAS-PRD-n2-standard-2-120) | 8 GiB / 2 | 4,199 |
| n2-highmem-2 (CAS-PRD-n2-highmem-2-250) | 16 GiB / 2 | 5,520 |
| n2-highmem-4 (CAS-PRD-n2-highmem-4-400) | 32 GiB / 4 | 19,893 |
| n2-standard-8 (cassandra-production-n2-standard-8-375) | 32 GiB / 8 | 37,437 |
| n2-highmem-8 (CAS-PRD-n2-highmem-8-800) | 64 GiB / 8 | 38,139 |
| n2-highmem-16 (CAS-PRD-n2-highmem-16-800) | 128 GiB / 16 | 73,918 |

* Could not be tested because the required fill data exceeds the available disk space

Conclusion: What We Discovered

In general, we see that clusters with more processing power (CPUs and RAM) produce higher throughput as expected. Some of the key takeaways include: 

  • When it comes to the price-to-performance ratio of Cassandra, there is a sweet spot around the XL/2XL node sizes (e.g., r6g.xlarge or r6g.2xlarge).
    • Moving from L to XL nodes more than doubles throughput (for example, r6g.large at 3,117 Mixed OPS vs. r6g.xlarge at 16,653), and moving from XL to 2XL roughly doubles it again (r6g.2xlarge at 36,494).
    • However, moving from 2XL to 4XL yields less than double the performance (r6g.4xlarge at 62,195). This is expected as you move from a memory-constrained state to one with more resources than can be utilized. These findings are specific to Cassandra. 
  • Different node families are tailored for different workloads, which significantly impacts their performance. Hence, nodes should be selected based on workload requirements and use cases. 

Overall, these benchmarks provide an indication of potential throughput for different environments and instance sizes. Performance can vary significantly depending on individual use cases, and we always recommend benchmarking with your own specific use case prior to production deployment. 

The easiest way to see all the new node offerings for your provider is to log into our Console. We support a range of node types from each cloud provider on our platform, ensuring you have access to the latest instance types and can get high performance at reduced cost.

Did you know that AWS R8g instances deliver up to 30% better performance than Graviton3-based R7g instances? Keep an eye out, as we will make these new instances available on our platform as soon as they are released for general availability. 

If you are interested in migrating your existing Cassandra clusters to the Instaclustr Managed Platform, our highly experienced Technical Operations team can provide all the assistance you need. We have built several tried-and-tested node replacement strategies to provide zero-downtime, non-disruptive migrations. Read our Advanced Node Replacement blog for more details on one such strategy. 

If you want to know more about this benchmarking or need clarification on when to use which instance type for Cassandra, reach out to our Support team (if you are an existing customer) or contact our Sales team. Our support team can assist you in resizing your existing clusters. Alternatively, you can use our in-place data-center resizing feature to do it on your own.