Apache Cassandra® version 5.0 Beta 1 is now available in public preview on the Instaclustr Managed Platform!
Here at NetApp we are often asked about the latest releases in the open source space. This year, the biggest news is going to be the release of Apache Cassandra 5.0, and it has already garnered a lot of attention since its beta release. Apache Cassandra has been a go-to choice for distributed, highly scalable databases since its inception, and it has evolved over time, catering to the changing needs of its customers.
As we draw closer to the general availability release of Cassandra 5.0, we have been receiving many questions from our customers regarding some of the features, use cases and benefits of opting for Cassandra 5.0 and how it can help support their growing needs for scalability, performance, and advanced data analysis.
Let’s dive into the details of some of the frequently asked questions.
#1: Why should I upgrade my clusters to Cassandra 5.0?
Upgrading your clusters to Cassandra 5.0 offers several key advantages:
- New features: Cassandra 5.0 comes with a host of new features, including Storage-Attached Indexes (SAI), the Unified Compaction Strategy (UCS), and vector search. These features can significantly improve performance, help optimize storage costs, and pave the way for AI/ML applications.
- Stability and bug fixes: Cassandra 5.0 improves stability through bug fixes and adds support for more guardrails. Additional information on the added guardrails can be found here.
- Apache Cassandra versions 3.0 and 3.11 will reach end of life (EOL) with the General Availability release of Cassandra 5.0: in line with standard practice, the project will no longer maintain these older versions (full details of the announcement can be found on the Apache Cassandra website). We understand the challenges this presents to customers, so NetApp will provide extended support for these versions for 12 months beyond the Apache Software Foundation project dates. This extended support is intended to let customers plan their migrations with confidence. Read more about our lifecycle policies on the website.
#2: Is it possible to upgrade my workload from Cassandra 3.x to Cassandra 5.0?
You can upgrade Cassandra to any release within the next major version. For example, you can upgrade a cluster running any 3.x release to any 4.x release. However, upgrading directly across non-adjacent major versions (for example, from 3.x straight to 5.0) is not supported.
If you want to upgrade by more than one major version, you need to upgrade to an intermediate major version first. For example, to upgrade from a 3.x release to a 5.0 release, you must upgrade the entire cluster twice: first to 4.x (preferably the latest 4.1 release, 4.1.4 at the time of writing), then again to 5.0.
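The one-major-version-at-a-time rule can be expressed as a small planning helper. This is a minimal sketch: `upgrade_path`, `major` and the `RELEASE_LINES` list are hypothetical names for illustration, and the list only covers the release lines mentioned in this post. It models the version-ordering rule only, not the upgrade procedure itself.

```python
# Hypothetical helper: plan a Cassandra upgrade path under the rule that each
# step may move at most one major version. RELEASE_LINES is an assumption
# covering only the release lines mentioned in this post.
RELEASE_LINES = ["3.0", "3.11", "4.0", "4.1", "5.0"]


def major(version: str) -> int:
    """Return the major version number, e.g. 3 for '3.11'."""
    return int(version.split(".")[0])


def version_key(version: str) -> tuple:
    """Numeric sort key so that '3.11' sorts after '3.0'."""
    return tuple(int(part) for part in version.split("."))


def upgrade_path(current: str, target: str) -> list:
    """Return the releases to upgrade through, in order, ending at target."""
    cur, tgt = major(current), major(target)
    if tgt < cur:
        raise ValueError("downgrades are not supported")
    path = []
    for m in range(cur + 1, tgt):
        # Each skipped major version becomes an intermediate hop; pick the
        # newest release line listed for it (use its latest patch release).
        candidates = [r for r in RELEASE_LINES if major(r) == m]
        if not candidates:
            raise ValueError(f"no known release line for major version {m}")
        path.append(max(candidates, key=version_key))
    if target != current:
        path.append(target)
    return path


print(upgrade_path("3.11", "5.0"))  # ['4.1', '5.0']
```

Going from 3.11 to 5.0 yields one intermediate hop to the 4.1 line, matching the two-step upgrade described above.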
#3: What are the key changes in Apache Cassandra 5.0 compared to previous versions?
For several years, Apache Cassandra 3.x has been the main version for many customers, valued for its stability and faster imports and exports. Apache Cassandra 4.0 focused on significantly enhancing performance with a range of enterprise-grade feature additions. Cassandra 5.0, in contrast, introduces forward-looking features that open up numerous new use cases.
Let’s take a look at what’s new:
- Advanced data analysis and a pathway to AI through features such as vector search and storage-attached indexing (SAI). To learn more about SAI, visit our dedicated blog on the topic.
- Enhanced performance and efficiency:
  - The Unified Compaction Strategy (UCS) and SAI directly address the need for more efficient data management and retrieval, optimizing resource utilization and improving overall system performance and efficiency.
  - Trie-based memtables and SSTables optimize read/write operations and storage efficiency.
- Security and flexibility:
  - Dynamic Data Masking (DDM) enhances data privacy and security by masking sensitive data from users who are not authorized to see it.
  - While Cassandra 4.0 supported JDK 11, Cassandra 5.0 adds experimental support for JDK 17, allowing users to leverage newer Java features along with potential performance and security improvements. Using JDK 17 in production environments is not currently recommended.
  - New mathematical functions (abs, exp, log, log10 and round) and new scalar functions that aggregate the elements of a collection (count, max/min, sum/avg) expand Cassandra’s capabilities for complex data operations. These functions help developers handle numerical operations efficiently and can contribute to application performance. Read more about the enhanced Mathematical CQL capability on the Apache website.
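To make the new CQL functions and Dynamic Data Masking more concrete, the sketch below assembles a few example CQL statements as plain Python strings that you could review or pass to a driver, for example `session.execute()` in the DataStax Python driver. The keyspace `shop`, the table `orders`, its columns and the role `support_role` are hypothetical. The function names (`abs`, `round`, `collection_count`, `collection_max`, `collection_avg`, `mask_inner`) and the `UNMASK` permission reflect our reading of the Cassandra 5.0 CQL documentation; please verify them against the documentation for your exact release.

```python
# Sketch only: builds example CQL statements as plain strings. The keyspace
# "shop", table "orders", all column names and "support_role" are hypothetical.
KEYSPACE = "shop"
TABLE = f"{KEYSPACE}.orders"

# Dynamic Data Masking: mask_inner(2, 2) leaves the first two and last two
# characters of the e-mail address in clear for roles without UNMASK.
CREATE_TABLE = f"""
CREATE TABLE IF NOT EXISTS {TABLE} (
    order_id uuid PRIMARY KEY,
    customer_email text MASKED WITH mask_inner(2, 2),
    balance_delta decimal,
    item_prices list<int>
)"""

# Math functions (abs, round) and collection functions (collection_count,
# collection_max, collection_avg) are evaluated server-side for each row.
SELECT_STATS = f"""
SELECT order_id,
       abs(balance_delta) AS abs_delta,
       round(balance_delta) AS rounded_delta,
       collection_count(item_prices) AS n_items,
       collection_max(item_prices) AS max_price,
       collection_avg(item_prices) AS avg_price
FROM {TABLE}
WHERE order_id = ?"""

# Roles that need to read the unmasked values are granted UNMASK explicitly.
GRANT_UNMASK = f"GRANT UNMASK ON TABLE {TABLE} TO support_role"

STATEMENTS = [CREATE_TABLE, SELECT_STATS, GRANT_UNMASK]

if __name__ == "__main__":
    for stmt in STATEMENTS:
        print(stmt.strip() + ";\n")
```

Computing counts and averages over a collection in the database avoids transferring the whole collection to the client just to aggregate it.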
Apache Cassandra 5.0 introduces hundreds of changes, ranging from minor improvements to brand-new features. Visit the website for a complete list of changes.
#4: What features in Cassandra 5.0 will help me save money?
Beyond the features that help you modernize and prepare for the future, Cassandra 5.0 also includes improvements that can help you manage your infrastructure more efficiently and, over time, lower your operational costs. Two of them are:
- Storage-attached indexes (SAI) can reduce the resource utilization of read operations by providing a more efficient way to index and query data, which helps keep growing operational costs in check. For some use cases, smaller clusters or node types may be able to achieve the same performance levels.
- The Unified Compaction Strategy optimizes the way Cassandra handles the compaction process. By automating and improving compactions, UCS can help reduce storage requirements and lower I/O overhead, resulting in lower operational costs and performance improvements. Reach out to Instaclustr Support to understand how you can adopt the Unified Compaction Strategy for your Cassandra clusters.
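Adopting SAI and UCS is mostly a matter of schema changes. As a minimal sketch, assuming a hypothetical table `shop.orders` with a non-primary-key `status` column, the statements might look like the following. `compaction_clause` is a hypothetical helper, and we recommend trying compaction changes on a non-production cluster first.

```python
# Sketch only: the table "shop.orders", its "status" column, the index name
# and the compaction_clause helper are hypothetical.
TABLE = "shop.orders"


def compaction_clause(strategy: str, **options: str) -> str:
    """Render a CQL compaction map, e.g. {'class': 'UnifiedCompactionStrategy'}."""
    entries = {"class": strategy, **options}
    body = ", ".join(f"'{key}': '{value}'" for key, value in entries.items())
    return "{" + body + "}"


# An SAI index on a non-primary-key column lets queries filter on it directly.
CREATE_SAI = f"CREATE INDEX IF NOT EXISTS orders_status_idx ON {TABLE} (status) USING 'sai'"
QUERY_BY_STATUS = f"SELECT order_id FROM {TABLE} WHERE status = 'SHIPPED'"

# Switch the table's compaction strategy to UCS (default UCS options).
ENABLE_UCS = f"ALTER TABLE {TABLE} WITH compaction = {compaction_clause('UnifiedCompactionStrategy')}"

for stmt in (CREATE_SAI, QUERY_BY_STATUS, ENABLE_UCS):
    print(stmt + ";")
```

Keeping the compaction map in one helper makes it easy to add strategy options later, once you have decided on them with your own testing.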
#5: How can Apache Cassandra help us in our AI/ML journey? With Cassandra 5.0, what are the applications and advantages of the new vector data type and similarity functions?
Some of our customers are well ahead in their AI/ML journey, understand different use cases and know where they’d like to go, while others are still catching up.
Apache Cassandra can help you get started on your AI journey with the introduction of vector search capabilities in Cassandra 5.0. The new vector data type and similarity functions, combined with Storage-Attached Indexes (SAI), are designed to handle complex and high-dimensional data.
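Here is a minimal sketch of how the vector data type, an SAI index and the `similarity_cosine` function fit together in CQL, again generated as Python strings. The table `shop.products`, the 3-dimensional `embedding` column and the helper names `vector_literal` and `ann_query` are hypothetical; the `vector<float, n>` type, `USING 'sai'` and `ORDER BY ... ANN OF ... LIMIT` syntax follow the Cassandra 5.0 documentation as we understand it.

```python
# Sketch only: table, column and helper names are hypothetical.
TABLE = "shop.products"
DIMENSIONS = 3  # toy size; real embeddings usually have hundreds of dimensions

CREATE_TABLE = (
    f"CREATE TABLE IF NOT EXISTS {TABLE} "
    f"(id int PRIMARY KEY, name text, embedding vector<float, {DIMENSIONS}>)"
)

# In Cassandra 5.0, ANN ordering on a vector column relies on an SAI index.
CREATE_INDEX = (
    f"CREATE INDEX IF NOT EXISTS products_embedding_idx "
    f"ON {TABLE} (embedding) USING 'sai'"
)


def vector_literal(values) -> str:
    """Render a sequence of numbers as a CQL vector literal like [1.0, 0.5]."""
    return "[" + ", ".join(repr(float(v)) for v in values) + "]"


def ann_query(query_vector, limit: int = 2) -> str:
    """Build a query returning the `limit` rows nearest to query_vector."""
    if len(query_vector) != DIMENSIONS:
        raise ValueError(f"expected {DIMENSIONS} dimensions, got {len(query_vector)}")
    lit = vector_literal(query_vector)
    return (
        f"SELECT name, similarity_cosine(embedding, {lit}) AS score "
        f"FROM {TABLE} ORDER BY embedding ANN OF {lit} LIMIT {limit}"
    )


print(ann_query([1.0, 0.15, 0.0]))
```

In a real application the query vector would come from the same embedding model that produced the stored vectors.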
Vector search is a powerful technique for finding relevant content within large datasets that are either structured (floats, integers, or full strings) or unstructured (such as audio, video, pictures), by comparing vector embeddings (numeric representations) of that content.
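At its core, similarity search ranks stored vectors by how close they are to a query vector. The exhaustive cosine-similarity ranking below is a minimal sketch of that idea using hypothetical toy 3-dimensional embeddings. Cassandra's SAI vector index instead performs approximate nearest-neighbour (ANN) search, and its `similarity_cosine` function may scale scores differently from the raw cosine value computed here.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def top_k(query, items, k=2):
    """Exhaustively score every item and return the keys of the k most similar."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in items.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


# Hypothetical toy embeddings; real ones come from an embedding model.
products = {
    "running shoes": [0.9, 0.1, 0.0],
    "trail shoes": [0.8, 0.2, 0.1],
    "coffee mug": [0.0, 0.1, 0.9],
}

print(top_k([1.0, 0.15, 0.0], products, k=2))  # ['running shoes', 'trail shoes']
```

An exhaustive scan like this costs time proportional to the number of stored vectors, which is why databases use approximate indexes for large datasets.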
This isn’t something totally new; it has been around for many years. It has evolved from a theoretical mathematical model into a foundational technology underlying much of today’s AI/ML work and data storage.
Vector search plays a pivotal role in AI applications across industries by enabling efficient similarity-based querying in areas such as:
- Generative AI, Large Language Models (LLMs)
- Retrieval-Augmented Generation (RAG)
- Natural Language Processing (NLP)
- Geographical Information Systems (GIS) applications
These applications can benefit from advanced data analysis, creative content generation, semantic search and spatial analysis capabilities provided by vector search.
Vector search bridges complex data processing with practical applications, transforming the user experience and operational efficiencies across sectors. Cassandra will be an exceptional choice for vector search and AI workloads due to its distributed, highly available and scalable architecture, which is crucial for handling large datasets.
According to a report published by MarketsandMarkets:
“The demand for vector search is going to increase exponentially.
The global vector database market size is expected to grow from a little over USD 1.5 billion in 2023 to USD 4.3 billion by 2028 at a CAGR of 23.3% during the forecast period.”
As this feature is new to Cassandra 5.0, it is important to test it thoroughly before gradually integrating it into the production environment. This will allow you to explore the full capabilities of vector search while ensuring that it meets your specific needs.
Stay tuned to learn more about Vector Search in Cassandra 5.0, its prerequisites, limitations, and suitable workloads.
Instaclustr takes care of cluster upgrades for our customers to help them take advantage of the latest features without compromising stability, performance, or security. If you are not a customer yet and would like help with major version upgrades, contact our sales team.
Getting Started
- If you are an existing customer and would like to try these new features in Cassandra 5.0, you can spin up a cluster today. If you don’t have an account yet, sign up for a free trial and experience the next generation of Apache Cassandra on the Instaclustr Managed Platform.
- Read all our technical documentation here.
- Discover the 10 rules you need to know when managing Apache Cassandra.
- If you are using a relational database and are interested in vector search, check out this blog on support for pgvector, which is available as an add-on for Instaclustr for PostgreSQL services.