The Debezium® Change Data Capture (CDC) source connector for Apache Cassandra® version 4.0.11 is now generally available on the Instaclustr Managed Platform.
The Debezium CDC source connector streams your Cassandra data (via Apache Kafka®) to a centralized enterprise data warehouse or any other downstream data consumer. Business information in your Cassandra database can now be streamed much more simply than before!
Instaclustr has been supporting the Debezium CDC Cassandra source connector through a private offering since 2020. More recently, our Development and Open Source teams have been developing the CDC feature for use by all Instaclustr Cassandra customers.
Instaclustr’s support for the Debezium Cassandra source connector was released to Public Preview earlier this year for our customers to trial in a test environment. Now with the general availability release, our customers can get a fully supported Cassandra CDC feature already integrated and tested on the Instaclustr platform, rather than performing the tricky integration themselves.
“The Debezium CDC source connector is a really exciting feature to add to our product set, enabling our customers to transform Cassandra database records into streaming events and easily pipe the events to downstream consumers.
Our team has been actively developing support for Debezium CDC Cassandra source connector both on our managed platform and also through open source contributions by NetApp to the Debezium Cassandra connector project. We’re looking forward to seeing more of our Cassandra customers using CDC to enhance their data infrastructure operations.”—Jaime Borucinski, Apache Cassandra Product Manager
Instaclustr’s industry-leading SLAs and support for the CDC feature give our customers the confidence to rely on this solution in production. This means that Instaclustr’s managed Debezium source connector for Cassandra is a good fit for your most demanding production data workloads, increasing the value for our customers with integrated solutions across our product set. Our support document, Creating a Cassandra Cluster With Debezium Connector, provides step-by-step instructions to create an Instaclustr for Cassandra CDC solution for your business.
How Does Change Data Capture Operate with Cassandra?
Change Data Capture is a native Cassandra setting that can be enabled on a table at creation, or with an ALTER TABLE command on an existing table. Once enabled, it creates logs that capture and track cluster data that has changed because of inserts, updates, or deletes. Once initial installation and setup have been completed, the CDC process takes these row-level changes and converts your database records into streaming events for downstream data integration consumers. When running, the Debezium Cassandra source connector will:
- Read the Cassandra commit log files in the cdc_raw directory for the inserts, updates, and deletes recorded in the node's commit log.
- Create a change event for every row-level insert, update, and delete.
- For each table, publish change events in a separate Kafka topic. In practice, this means that each CDC-enabled table in your Cassandra cluster will have its own Kafka topic.
- Delete the commit log from the cdc_raw directory.
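To make the setup side of this concrete, here is a minimal Python sketch that builds the CQL for enabling CDC on an existing table and derives a per-table topic name. The `cdc` table option is standard CQL; the helper names are hypothetical, and the `<prefix>.<keyspace>.<table>` topic pattern is an assumption for illustration, so check the connector documentation for the naming used by your version.

```python
import re

# Hypothetical helpers for illustration only; not part of Cassandra or Debezium.
_SIMPLE_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def _checked(name):
    """Reject anything that is not a plain, unquoted CQL identifier."""
    if not _SIMPLE_IDENTIFIER.match(name):
        raise ValueError(f"not a simple CQL identifier: {name!r}")
    return name


def enable_cdc_statement(keyspace, table):
    """Return the CQL that turns CDC on for an existing table."""
    return f"ALTER TABLE {_checked(keyspace)}.{_checked(table)} WITH cdc = true;"


def topic_for_table(prefix, keyspace, table):
    """One topic per CDC-enabled table (naming pattern is an assumption)."""
    return f"{prefix}.{_checked(keyspace)}.{_checked(table)}"


# Example with made-up keyspace, table, and prefix names.
print(enable_cdc_statement("shop", "orders"))
print(topic_for_table("cluster1", "shop", "orders"))
```

The generated statement can be run through cqlsh or any Cassandra driver. Restricting the helpers to simple identifiers keeps the sketch from assembling CQL out of unchecked input.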
Apache Cassandra and Debezium Open Source Developments
Support for CDC was introduced in Cassandra version 3.11 and has continued to mature through version 4.0. There have been notable open source contributions that improve CDC functionality, including support for Cassandra version 3.11 and Cassandra 4.0 as separate modules, with shared common logic known as a ‘core module’ authored by NetApp employee Stefan Miklosovic and committed by Gunnar Morling. The need for this separation was identified during development of Cassandra version 4.0 CDC support.
As support for version 4.0 was added, certain components of Cassandra version 3.11 CDC would break. The introduction of the core module now allows Debezium features and fixes to be pinned to a specific Cassandra version when released, without breaking Debezium support for other Cassandra versions. This will be particularly valuable for future development of Debezium support for Cassandra version 4.1 and each newer Cassandra version.
Stefan has continued to enhance Cassandra 4.0 support for Debezium with additional contributions, such as allowing the CQL schema of a node to change without interrupting streaming events to Kafka. Previously, propagating changes in the CQL schema back to the Debezium source connector required a restart of the Debezium connector. Stefan created a CQL Java Driver schema listener and hooked it to a running node. As soon as somebody adds a column, a table, or similar to their Cassandra cluster, these changes can now be detected in Debezium streamed events with no Debezium restarts!
Another notable improvement for Cassandra CDC in Cassandra version 4.0 is the way the Debezium source connector reads the commit log. Cassandra version 3.11 CDC would buffer the CDC change events and only publish them in batch cycles, creating a processing delay. Cassandra version 4.0 CDC now continuously processes the commit logs as new data becomes available, achieving near real-time publishing.
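The difference between batch cycles and continuous processing can be illustrated with a toy model. The Python sketch below is not the connector's implementation: the function names, the abstract "tick" time unit, and the cycle length are all assumptions chosen only to show why batching adds delay.

```python
def batch_publish_ticks(arrival_ticks, cycle):
    """Toy model: an event is published at the end of the batch cycle it arrived in.

    An event arriving exactly on a cycle boundary waits for the next cycle.
    """
    return [((t // cycle) + 1) * cycle for t in arrival_ticks]


def continuous_publish_ticks(arrival_ticks):
    """Toy model: an event is published as soon as it arrives."""
    return list(arrival_ticks)


def average_delay(arrival_ticks, publish_ticks):
    """Mean number of ticks between an event arriving and being published."""
    delays = [p - a for a, p in zip(arrival_ticks, publish_ticks)]
    return sum(delays) / len(delays)


# Made-up arrival times, in arbitrary ticks.
arrivals = [1, 2, 7, 11, 14]
batched = batch_publish_ticks(arrivals, cycle=10)
streamed = continuous_publish_ticks(arrivals)

print(average_delay(arrivals, batched))
print(average_delay(arrivals, streamed))
```

In this model the batched events wait several ticks on average before being published, while the continuously processed events wait none, which is the kind of processing delay the 4.0 approach removes.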
If you already have Cassandra clusters that you want to stream into an enterprise data store, get started today using the Debezium source connector for Cassandra on Instaclustr’s Managed Platform.
Contact our Support team to learn more about how Instaclustr’s managed Debezium connector for Cassandra can unlock the value of your new or existing Cassandra data stores.