- White Paper
Overview
Many people consider Apache Cassandra® and DynamoDB as potential datastore technologies when looking to build high-scale, high-reliability services in the cloud. Both technologies are popular and well-proven to deliver at scale. However, choosing the technology most appropriate for your use case can have a significant impact on the cost of building, maintaining, and running your application.
This white paper considers a real-world use case, analyzes the cost of running it on Instaclustr Managed Apache Cassandra versus DynamoDB, and discusses how the features and cost models of the two technologies could affect the architecture of your solution. The use case sits at the heart of Instaclustr’s monitoring system, Instametrics.
The key attributes of the Instametrics cluster
- 36 i3.2xlarge nodes co-hosting Apache Cassandra and Apache Spark. The cluster runs continuously, with no scaling up or down for peaks.
- Each metric event written is, on average, ~100 bytes of data.
- Baseline write load (raw metrics received) is 3,060 batch writes per second. Each batch contains ~150 rows, for a baseline total of ~460k writes/second.
- Writing roll-up results adds a further 16,200 batch writes/second. Each batch contains ~100 rows, for a total of 1.6M writes/second from this load and a total peak of just over 2M writes per second. This peak load occurs for about 1 minute out of every 5 (20% of the time).
- The baseline read load on the cluster is about 18,000 reads per second. Each read retrieves ~15 rows for a total baseline read load on the cluster of 270k rows/sec.
- Reading data for the roll-ups adds about 144,000 reads per second. These reads use Cassandra functions to aggregate data before returning, with each read drawing on ~15 rows, for 2.1M rows/sec read in total. The cluster is also at peak read load for about 20% of the time.
- The cluster currently stores around 54 TB of data with a replication factor of 2.
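The throughput figures in the list above follow from simple multiplication, and the short Python sketch below reproduces them as a sanity check. The function and key names are illustrative only. It also assumes that the roll-up load is added on top of the baseline during peaks, and that the time-averaged rate can be estimated from the "1 minute out of every 5" duty cycle; the list states neither explicitly for every figure.

```python
# Workload figures quoted in the list above. Row counts per batch/read are
# approximate ("~") in the source, so the results are estimates.
BASE_WRITE_BATCHES = 3_060       # batch writes/sec (raw metrics)
BASE_ROWS_PER_BATCH = 150
ROLLUP_WRITE_BATCHES = 16_200    # additional batch writes/sec during roll-ups
ROLLUP_ROWS_PER_BATCH = 100
BASE_READS = 18_000              # reads/sec
ROLLUP_READS = 144_000           # additional reads/sec during roll-ups
ROWS_PER_READ = 15
PEAK_MINUTES, CYCLE_MINUTES = 1, 5   # peak lasts ~1 minute in every 5


def workload_summary():
    """Return row-level throughput derived from the batch-level figures."""
    base_writes = BASE_WRITE_BATCHES * BASE_ROWS_PER_BATCH
    rollup_writes = ROLLUP_WRITE_BATCHES * ROLLUP_ROWS_PER_BATCH
    base_read_rows = BASE_READS * ROWS_PER_READ
    rollup_read_rows = ROLLUP_READS * ROWS_PER_READ
    return {
        "base_writes_per_sec": base_writes,
        "rollup_writes_per_sec": rollup_writes,
        # Assumption: roll-up writes are added on top of the baseline.
        "peak_writes_per_sec": base_writes + rollup_writes,
        "base_read_rows_per_sec": base_read_rows,
        "rollup_read_rows_per_sec": rollup_read_rows,
        # Assumption: time-averaged rate weights the roll-up load by its duty cycle.
        "avg_writes_per_sec": base_writes + rollup_writes * PEAK_MINUTES / CYCLE_MINUTES,
    }


if __name__ == "__main__":
    for name, value in workload_summary().items():
        print(f"{name}: {value:,.0f}")
```

The baseline works out to 459,000 writes per second and the peak to 2,079,000, matching the ~460k and "just over 2M" figures. The roll-up read rate computes to 2,160,000 rows per second, which the list quotes as 2.1M; the gap is within the approximation of ~15 rows per read.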
- Infographics
The modern approach to streamlining data flow management
Despite the complexity of open source, enterprises that design data infrastructures with a strategic partner can build any number of data flows based on business requirements. Learn more in this infographic.
- Videos
Rethink a managed service for open source
Discover all the reasons why you should consider adopting a managed service to get the most from your open source technologies including Apache Cassandra, Apache Kafka, OpenSearch, PostgreSQL, ClickHouse, Valkey, Cadence, and more.
- Datasheets
ClickHouse® datasheet
ClickHouse is a popular open source database designed to efficiently handle high volumes of data in real-time, making it ideal for big data and analytics. With Instaclustr for ClickHouse, you get a fully managed service including around-the-clock expert technical support, allowing you to deploy production-ready ClickHouse clusters in minutes.