In February, the Apache Cassandra project issued releases for all currently supported branches of Apache Cassandra. As far as releases go, the change list was modest, which suggests we’re seeing fewer bugs and is a good sign overall. However, some major issues were addressed that deserve a specific mention.
TTL Issue
Firstly, the TTL issue. This is a problem for anyone using long TTLs. Cassandra currently supports a maximum TTL of 20 years (don’t ask why), and as of January 19th, 2018, at 3:14AM (UTC), a write with the maximum TTL expires after the latest time the storage engine can represent, so its expiration time overflows due to the 2038 problem.
From the NEWS.txt:
The maximum expiration timestamp that can be represented by the storage engine is 2038-01-19T03:14:06+00:00, which means that inserts with TTL that expire after this date are not currently supported. By default, INSERTS with TTL exceeding the maximum supported date are rejected, but it’s possible to choose a different expiration overflow policy. See CASSANDRA-14092.txt for more details.
Prior to 3.0.16 (3.0.X) and 3.11.2 (3.11.x) there was no protection against INSERTS with TTL expiring after the maximum supported date, causing the expiration time field to overflow and the records to expire immediately. Clusters in the 2.X and lower series are not subject to this when assertions are enabled. Backed up SSTables can be potentially recovered and recovery instructions can be found on the CASSANDRA-14092.txt file.
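To make the "overflow and expire immediately" behaviour concrete, here is a minimal sketch of signed 32-bit wraparound. It is an illustration only, not Cassandra’s code: the helper `wrap_int32` and the sample write time are hypothetical, and the 20-year TTL is assumed to be counted as 20 x 365 days.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def wrap_int32(value: int) -> int:
    """Wrap an integer the way a signed 32-bit field would (hypothetical helper)."""
    return (value - INT32_MIN) % 2**32 + INT32_MIN


# Hypothetical write made on 2018-02-07 (UTC) with a 20-year TTL.
now = 1_518_000_000                # seconds since the Unix epoch
ttl = 20 * 365 * 24 * 60 * 60      # assumption: 20 years as 20 x 365 days

expiration = now + ttl             # does not fit in a signed 32-bit integer
stored = wrap_int32(expiration)    # value an unprotected 32-bit field would hold

print(expiration > INT32_MAX)      # True
print(stored)                      # -2146247296
print(stored < now)                # True: the stored expiry is already in the past
```

Because the wrapped value is negative, it compares as earlier than any current time, which is why such records look expired the moment they are written.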
The CASSANDRA-14092.txt file has a lot of useful information on how to deal with this issue. Note that as we approach 2038, the maximum possible TTL will become shorter and you will be more exposed to this problem. So if you’re using TTLs, you should act on this now rather than risk forgetting about it. Come 2038, Apache Cassandra <4.0 is very likely to stop working completely, so you’ll have to upgrade to a version that fixes this problem before then. Now I know everyone’s saying 2038 is a long way away, but I haven’t met a database that didn’t want to survive well past the point of everyone wanting to kill it, and I’m absolutely positive there will be some unlucky souls running 3.11 on January 18th, 2038, faced with a potentially disastrous major upgrade. Don’t let me be right, it’s not worth it. Read CASSANDRA-14092.txt today and come up with a long-term plan. The last thing we need is a Y2.038K.
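The shrinking TTL window can be estimated with a few lines of Python. This is a rough sketch under stated assumptions: `max_safe_ttl` is a hypothetical helper, the maximum expiration is the timestamp quoted from NEWS.txt, and the 20-year cap is assumed to be 20 x 365 days.

```python
from datetime import datetime, timedelta, timezone

# Maximum expiration timestamp quoted in NEWS.txt.
MAX_EXPIRATION = datetime(2038, 1, 19, 3, 14, 6, tzinfo=timezone.utc)
# Assumption: the 20-year TTL cap is counted as 20 x 365 days.
MAX_TTL_SECONDS = 20 * 365 * 24 * 60 * 60


def max_safe_ttl(now: datetime) -> int:
    """Largest TTL (in seconds) written at `now` that still expires by MAX_EXPIRATION."""
    remaining = int((MAX_EXPIRATION - now).total_seconds())
    return max(0, min(MAX_TTL_SECONDS, remaining))


if __name__ == "__main__":
    # How much headroom is left for a write made right now.
    print(max_safe_ttl(datetime.now(timezone.utc)))
    # One day before the limit, only a one-day TTL is still safe.
    print(max_safe_ttl(MAX_EXPIRATION - timedelta(days=1)))  # 86400
```

Running a check like this against the longest TTL your application uses gives a date by which you need a plan in place.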
A quick summary is that you have a choice of three policies for dealing with TTLs that go beyond 2038:
- REJECT: this is the default policy and will reject any requests with expiration after 2038-01-19T03:14:06+00:00.
- CAP: any insert with TTL expiring after 2038-01-19T03:14:06+00:00 will expire on 2038-01-19T03:14:06+00:00 and the client will receive a warning.
- CAP_NOWARN: same as previous, except that the client warning will not be emitted.
These policies may be specified via the -Dcassandra.expiration_date_overflow_policy=POLICY startup option in the jvm.options configuration file.
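The behaviour of the three policies can be modelled roughly as follows. This is a sketch of the semantics as described above, not Cassandra’s implementation: `resolve_expiration` is a hypothetical function, and a Python warning stands in for the client warning.

```python
import warnings
from datetime import datetime, timezone

# Maximum expiration quoted in NEWS.txt, as seconds since the Unix epoch.
MAX_EXPIRATION_EPOCH = int(
    datetime(2038, 1, 19, 3, 14, 6, tzinfo=timezone.utc).timestamp()
)


def resolve_expiration(now_epoch: int, ttl_seconds: int, policy: str = "REJECT") -> int:
    """Return the expiration time a write would get under the given policy."""
    expiration = now_epoch + ttl_seconds
    if expiration <= MAX_EXPIRATION_EPOCH:
        return expiration  # fits; every policy behaves the same
    if policy == "REJECT":
        raise ValueError("TTL expires after the maximum supported date")
    if policy == "CAP":
        warnings.warn("expiration capped to the maximum supported date")
        return MAX_EXPIRATION_EPOCH
    if policy == "CAP_NOWARN":
        return MAX_EXPIRATION_EPOCH  # capped silently
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    one_day = 24 * 60 * 60
    # A write one hour before the limit with a one-day TTL gets capped.
    capped = resolve_expiration(MAX_EXPIRATION_EPOCH - 3600, one_day, "CAP_NOWARN")
    print(capped == MAX_EXPIRATION_EPOCH)  # True
```

With CAP or CAP_NOWARN the data lives for less time than the client asked for, so REJECT is the safer default unless the application can tolerate early expiry.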
Note that data written before upgrading to one of 3.11.2, 3.0.16, 2.2.12 or 2.1.20 had none of these policies to protect it. Writes whose TTL overflowed were accepted, but the data expired immediately and is not queryable. If this has happened to you, there are some extra steps you can take to try to recover the deleted data, but it’s not a sure thing, as a compaction will remove it for good. You’re probably better off bidding goodbye to that data.
JRE Support
Another major issue is JRE support. Version 3.11.1 broke with JRE8u162 because Cassandra depended on JVM internals that can change between minor versions. This issue was resolved in CASSANDRA-14173, and as of 3.11.2, 8u162 and onwards will work. If you’re required to update your JVM for compliance, be sure to upgrade to 3.11.2 before upgrading your JVM. Note that this does not affect the 2.x or 3.0 lines.
Cassandra 3.11.1
And finally, relevant because it’s recent news and affects 3.11.1 onwards: a noteworthy regression was introduced into the repair code in 3.11.1, in which an unbounded number of validation compactions can be kicked off simultaneously. This gets worse if you’re using vnodes and have many tables, as validations will be kicked off for each vnode and each column family, one after another with no waiting. This will cause a massive hit on CPU and affect your read latencies (as well as everything else). More info can be found in CASSANDRA-13797 and CASSANDRA-14332.
Summary of Important changes in each Cassandra version
3.11.2
- Use new token allocation for non-bootstrap case as well. This allows you to specify allocate_tokens_for_keyspace and have a node started with auto_bootstrap: false use the algorithm. (CASSANDRA-14212)
- Remove dependencies on JVM internal classes from JMXServerUtils. This fixes the issue that caused Cassandra not to start when using JDK8u161 (CASSANDRA-14173)
- Print correct snitch information from nodetool describecluster. (CASSANDRA-13528)
- Close socket on error during connect on OutboundTcpConnection. This should fix issues with nodes that crashed being restarted while their socket is still in CLOSE_WAIT, and thus not being able to successfully join the ring until the socket is closed by the kernel. This resulted in a node seeing all other nodes as up but those nodes seeing it as down. (CASSANDRA-9630)
- Prevent continuous schema exchange between 3.0 and 3.11 nodes. This was possible during an upgrade from 3.0.x to 3.11.x and would result in mass schema changes. There was no major negative effect, but it caused unnecessary changes and could potentially corrupt the schema. (CASSANDRA-14109)
- Fix imbalanced disks when replacing node with same address with JBOD. (CASSANDRA-14084)
- Reload compaction strategies when disk boundaries are invalidated. Fixes a deadlock that could occur when disk boundaries are changed (JBOD). (CASSANDRA-13948)
- Remove OpenJDK log warning. Finally removed that warning on startup that’s been invalid for years now. Everything runs with OpenJDK! (CASSANDRA-13916)
- Prevent compaction strategies from looping indefinitely. Additional preventive fix after CASSANDRA-13948 (CASSANDRA-14079)
- Cache disk boundaries. This improves startup times that were affected by partitioning SSTables by token range (JBOD) (CASSANDRA-13215)
- Correctly count range tombstones in traces and tombstone thresholds. (CASSANDRA-8527)
3.0 onwards
- Bugfixes for nodetool verify (CASSANDRA-14217, CASSANDRA-13933, CASSANDRA-13922)
- Fixes for commit log that will avoid data loss due to crashing when using periodic commit log. (CASSANDRA-13987, CASSANDRA-14108)
- Fix for range tombstones creating unexpected rows after upgrading from 2.x to >=3.0. (CASSANDRA-14008)
- Allow role names to have forward slashes. Previously forward-slashes would cause an error when trying to query resources. (CASSANDRA-14088)
- Fix cleanup after removing DC from replication. Previously cleanup wouldn’t remove old data that is no longer replicated if the DC was removed from RF. (CASSANDRA-13526)
- Fix TTL regression when using Materialized Views (CASSANDRA-14071)
- Flag for dropping oversized read repair mutations. If you have a read that’s failing due to a read repair failing you can now set -Dcassandra.drop_oversized_readrepair_mutations=true to skip the read repair which will allow the read to succeed but data to remain inconsistent until a repair has been run. (CASSANDRA-13975)
- Materialized Views marked as experimental (CASSANDRA-13959)
- Disable ALTER/DROP on system_distributed. This means you can no longer change TTLs or compaction strategies on these tables. See tickets for workaround. (CASSANDRA-13813, CASSANDRA-13954)
2.2 onwards
- Let JVM handle OutOfMemoryErrors. Heap dumps will now be generated properly by the JVM on OOME (CASSANDRA-13006)
- Fix reference bug during scrub/index redistribution/cleanups. Fixes “Spinning trying to capture readers” issue. (CASSANDRA-13873)
2.1 onwards
- Protect against overflow of local expiration time. Important change relating to the maximum TTL supported by Cassandra, see the TTL Issue section above for details. (CASSANDRA-14092)
Performance Improvements (misc)
[3.0.16]
- Reduce garbage created by DynamicSnitch (CASSANDRA-14091)
- Optimize CRC check chance probability calculations during compression. (CASSANDRA-14094)
[3.11.2]
- Avoid invalidating disk boundaries unnecessarily (CASSANDRA-14083)
- Avoid locks when checking LCS fanout and if we should defrag (CASSANDRA-13930)
Minor Bug Fixes
- CompactionManager may incorrectly determine a background compaction is running. (CASSANDRA-13801)
- Acquire read lock before accessing CompactionStrategyManager fields (CASSANDRA-14139)
- Round buffer size to powers of 2 for the chunk cache (CASSANDRA-13897)