Instaclustr managed Cassandra services include the underlying cloud provider charges in our standard monthly fee. We’re often asked by customers and potential customers for information about what these charges look like. This information is also useful to people planning their own cloud-based Cassandra implementation. So, whichever category you’re in, read on for more information.
For simplicity, in this post I’ll focus on our AWS deployment. However, similar cost concepts apply across the other cloud providers we support (Azure, IBM SoftLayer) and other providers we’ve taken a close look at.
Background on our deployment
To start off, a quick bit of background about how we deploy Cassandra (which is basically best practice for deploying Cassandra in the cloud):
- we have some offerings that use local SSD storage and some that use GP2 EBS (General Purpose SSD Elastic Block Store – a class of AWS network-based storage);
- we deploy each rack in the cluster into a separate Availability Zone (AZ) (most clusters have three racks);
- we deploy each cluster into a separate Virtual Private Cloud (VPC) environment and allow access either via a peered VPC and private IP or via public Elastic IPs assigned to each instance;
- we back up each cluster daily to S3.
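To make the rack-per-AZ layout concrete, here is a minimal sketch of how nodes could be spread evenly across three zones. It assumes simple round-robin placement. The `assign_racks` helper, the node names and the zone names are hypothetical illustrations, not our provisioning code.

```python
from collections import Counter
from itertools import cycle


def assign_racks(node_count, zones):
    """Place nodes round-robin across zones, treating each zone as one rack."""
    zone_cycle = cycle(zones)
    return [(f"node-{i}", next(zone_cycle)) for i in range(node_count)]


# Hypothetical six-node cluster across three example Availability Zones.
placement = assign_racks(6, ["us-east-1a", "us-east-1b", "us-east-1c"])
per_zone = Counter(zone for _, zone in placement)

for node, zone in placement:
    print(node, zone)
```

With a node count that is a multiple of the number of racks, each zone ends up with the same number of nodes, which keeps replicas balanced across zones.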
AWS Costs
With this style of architecture, we generate the following categories of AWS costs:
Cost | Description | Driver |
---|---|---|
Instances | Cost of the base compute instances (e.g. m4.xlarge) | Number and size of nodes in the cluster |
EBS Volume | Cost of attached EBS volumes (where applicable) | Size of the EBS volume (e.g. 400 GB) |
Network – Public IP In/Out | Loading/retrieving data via public IP | Only applicable if accessing via public IP: dependent on the number of Cassandra reads/writes in a month and transaction size |
Network – Interzone In/Out | Cross-availability zone communication within the cluster | Transaction volume and size, consistency level used for reads |
Network – VPC In/Out | Loading/retrieving data via a peered VPC | Only applicable if accessing via peered VPC: dependent on the number of Cassandra reads/writes in a month and transaction size |
S3 Storage | S3 space for storing backups | Volume of data, length of backup retention, deduplication of backup files/data |
S3 Operations | S3 calls for storing backups | Number of sstables (volume of data + compaction strategy), backup strategy |
S3 Data Transfer Out | S3 retrieval data transfer cost | Only applicable if you need to copy data from S3 to a region other than US East to restore a backup. |
In most cases, the instance (compute capacity) cost will be the largest cost component. However, for large EBS-based nodes, EBS cost can come close to compute capacity cost. And, for some instance types and usage scenarios, we have seen network costs equal and even exceed the base compute costs. S3 costs are typically not a major component of the overall picture.
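The relationship between drivers and costs can be sketched as a simple model. This is a rough illustration under stated assumptions, not our pricing method. The class and field names are hypothetical, it covers only a subset of the categories in the table, and every unit price below is a placeholder rather than an actual AWS rate. Substitute current figures from the AWS price list for your region.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # average hours in a month


@dataclass
class UnitPrices:
    """Placeholder unit prices in USD (assumptions, not real AWS rates)."""
    instance_hour: float
    ebs_gb_month: float
    interzone_gb: float
    public_gb_out: float
    s3_gb_month: float
    s3_per_1000_puts: float


@dataclass
class ClusterUsage:
    """Monthly cost drivers for a hypothetical cluster."""
    nodes: int
    ebs_gb_per_node: float
    interzone_gb: float      # total billable cross-AZ traffic
    public_gb_out: float     # data returned to clients over public IPs
    backup_gb: float         # backup data retained in S3
    s3_put_requests: int     # upload requests made by backups


def monthly_cost(usage, prices):
    """Return a per-category cost breakdown plus a total."""
    costs = {
        "instances": usage.nodes * HOURS_PER_MONTH * prices.instance_hour,
        "ebs": usage.nodes * usage.ebs_gb_per_node * prices.ebs_gb_month,
        "network_interzone": usage.interzone_gb * prices.interzone_gb,
        "network_public": usage.public_gb_out * prices.public_gb_out,
        "s3_storage": usage.backup_gb * prices.s3_gb_month,
        "s3_operations": usage.s3_put_requests / 1000 * prices.s3_per_1000_puts,
    }
    costs["total"] = sum(costs.values())
    return costs


# Illustrative inputs only.
prices = UnitPrices(0.20, 0.10, 0.01, 0.09, 0.02, 0.005)
usage = ClusterUsage(nodes=6, ebs_gb_per_node=400, interzone_gb=2000,
                     public_gb_out=500, backup_gb=1500, s3_put_requests=200_000)

breakdown = monthly_cost(usage, prices)
for category, cost in breakdown.items():
    print(f"{category:18s} {cost:10.2f}")
```

Varying `ebs_gb_per_node` or `interzone_gb` in a model like this shows how EBS or network charges can approach the instance cost for some configurations. The remaining table rows (peered VPC traffic, S3 data transfer out) would follow the same quantity-times-unit-price pattern.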
Historical data
At Instaclustr, we use the 18+ months of historical data we have from running Cassandra in the cloud to estimate expected charges in the monthly fee for our managed service. If you are not interested in a managed service, our consultants can also access this data to help you plan your own deployment. Contact us for more information.