Overview – Why Apache Cassandra Over DynamoDB
DynamoDB and Apache Cassandra are both very popular distributed data store technologies. Both are used successfully in many applications and production-proven at phenomenal scale.
At Instaclustr, we live and breathe Apache Cassandra (and Apache Kafka). We have many customers at all levels of size and maturity who have built successful businesses around Cassandra-based applications. Many of those customers have undertaken significant evaluation exercises before choosing Cassandra over DynamoDB and several have migrated running applications from DynamoDB to Cassandra.
This blog distills the top reasons that our customers have chosen Apache Cassandra over DynamoDB.
Reason 1: Significant Cost of Writes to DynamoDB
For many use cases, Apache Cassandra can offer a significant cost saving over DynamoDB. This is particularly the case of requirements that are write-heavy. The cost of write to DynamoDB is five times that cost of the read (reflected directly in your AWS bill). For Apache Cassandra, write are several times cheaper than reads (reflected in system resource usage).
Reason 2: Portability
DynamoDB is available in AWS and nowhere else. For multi-tenant SaaS offerings where only a single instance of the application will ever exist, then being all-in on AWS is not a major issue. However, many applications, for a lot of good reasons, still need to be installed and managed on a per-customer basis and many customers (often the largest ones!) will not want to run on AWS. Choosing Cassandra allows your application to run anywhere you can run a linux box.
Reason 3: Design Without Having to Worry About Pricing Models
DynamoDB’s pricing is complex with two different pricing models and multiple pricing dimensions. Applying the wrong pricing models or designing your architecture without considering pricing can result in order of magnitude differences in costs. This also means that a seemingly innocuous change to your application can dramatically impact cost. With Apache Cassandra, you have your infrastructure and you know your management fees, once you have completed performance testing and you know that your infrastructure can meet your requirements, you know your costs.
Reason 4: Multi-Region Functionality
Apache Cassandra was the first NoSQL technology to offer active-active multi-region support. While DynamoDB has added Global Tables, these have a couple of key limitations when compared to Apache Cassandra. The most significant in many cases is that you cannot add replicas to an existing global table. So, if you set up in two regions and then decide to add a third you need to completely rebuild from an empty table. With Cassandra, adding a region to a cluster is a normal, and fully online, operation. Another major limitation is that DynamoDB only offers eventual consistency across Global Tables, whereas Apache Cassandra’s tunable consistency levels can enforce strong consistency across multiple regions.
Reason 5: Avoiding Vendor Lock-In
Apache Cassandra is true open source software, owned and governed by the Apache Software Foundation to be developed and maintained for the benefit of the community and able to be run in any cloud or on-premise environment. DynamoDB is an AWS proprietary solution that not only locks you in to DynamoDB but also locks your application to the wider AWS ecosystem.
While these are the headline reasons that people make the choice of Apache Cassandra over DynamoDB, there are also many advantages at the detailed functional level such as:
- DynamoDB’s capacity is limited by partition with a maximum of 1,000 write capacity units and 3,000 read capacity units per partition. Cassandra’s capacity is distributed per node which typically provide a per-partition limit orders of magnitude higher than this.
- Cassandra’s CQL query language provides a simple learning curve for developers familiar with SQL.
- DynamoDB only allows single value partition and sort (called clustering in Cassandra) keys while Cassandra support multi-part keys. A minor difference but another way Cassandra reduces application complexity.
- Cassandra supports aggregate functions which in some use cases can provide significant efficiencies.