Like so many others in the Apache Cassandra community, I’m extremely excited to see that the 4.0 release is finally here. There are many, many improvements in Cassandra 4.0. One improvement that’s more important than it might look is the addition of support for Java versions 9 and up. This was not trivial, because Java 9 made changes to some internal APIs that the most performance-oriented Java projects like Cassandra relied on (you can read more about this here).
This is a big deal because with Cassandra 4.0, you not only get the direct performance improvements added by the Apache Cassandra committers, you also unlock the ability to take advantage of seven years of improvements in the JVM (Java Virtual Machine) itself.
Here, I’d like to focus on the improvements in Java garbage collection that Cassandra 4.0 coupled with Java 16 offers over Cassandra 3.11 on Java 8.
The garbage collection challenge
In 2012, I gave a talk titled “Dealing with JVM Limitations in Apache Cassandra.” Here is the first slide from that presentation:
On the one hand, garbage collection is a major reason that Java is so much more productive than traditional systems languages like C++. As JVM architect Cliff Click once wrote, “Many concurrent algorithms are very easy to write with a GC and totally hard to downright impossible using explicit free.” Cassandra takes full advantage of this power.
But performing garbage collection means having to briefly pause the JVM to determine which objects are no longer in use and can safely be disposed of. These GC pauses can delay responses to client requests, i.e., increase latencies.
Not all requests are affected by this: only the handful of requests that are in flight while Cassandra’s request-handling threads are paused for the GC. The performance impact is thus only visible in tail latencies, that is, the 99th percentile or 99.9th percentile measurements, corresponding to the slowest 1% or 0.1% of requests.
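To make the tail-latency point concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption rather than a measurement from the benchmark: the 2 to 3 ms base latency, the 200 ms pause, the 2% of requests caught in a pause, and the helper names `percentile` and `simulate`. It only shows that rare pauses leave the median untouched while dominating the upper percentiles.

```python
import math
import random


def percentile(sorted_samples, pct):
    """Nearest-rank percentile of an already sorted list of samples."""
    n = len(sorted_samples)
    rank = max(1, math.ceil(pct * n / 100))  # 1-based rank
    return sorted_samples[rank - 1]


def simulate(num_requests=100_000, base_ms=2.0, pause_ms=200.0,
             pause_fraction=0.02, seed=42):
    """Return sorted latencies where a small fraction of requests hit a GC pause.

    All parameter values are made up for illustration only.
    """
    rng = random.Random(seed)
    latencies = []
    for _ in range(num_requests):
        latency = base_ms + rng.uniform(0.0, 1.0)  # normal service time
        if rng.random() < pause_fraction:          # request was in flight during a pause
            latency += pause_ms
        latencies.append(latency)
    latencies.sort()
    return latencies


if __name__ == "__main__":
    samples = simulate()
    for pct in (50, 99, 99.9):
        print(f"p{pct}: {percentile(samples, pct):.1f} ms")
```

In this toy model the median stays near the base service time, while p99 and p99.9 both land above 200 ms, which is why GC behavior has to be judged on tail percentiles rather than on averages.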
As with so many things, optimizing GC involves tradeoffs, and the original Java GC designs focused more on improving throughput than on reducing pause times. Fast forward to 2021 and we have common server-class CPUs with 64 cores/128 threads; we have plenty of throughput on tap. It’s time to spend some of those cycles on lower pause times.
The Z Garbage Collector (ZGC) was created to address this situation, and specifically to guarantee pause times under 10ms. ZGC was added to Java 11 as an experimental feature, promoted to production in Java 15, and further improved in Java 16.
To show how well ZGC improves Cassandra performance, we compared both throughput and latency in three environments: Cassandra 3.11 running on JDK 8 with its default CMS GC settings, Cassandra 4.0 running on JDK 8 with the same settings, and Cassandra 4.0 running on JDK 16 with ZGC. I’m pleased to report that ZGC convincingly achieves its design goals, allowing Cassandra to deliver nearly constant latencies through the 99th percentile, with only a modest uptick at the 99.9th percentile!
ZGC performance results
My colleague Jonathan Shook benchmarked the performance characteristics of Cassandra 3.11 and 4.0 in detail across three workloads: simple key/value, a time series workload with many rows per partition, and a tabular workload with one row per partition but many columns per row.
Here we are looking at Cassandra running at 70% of maximum throughput. This leaves 30% operational headroom to absorb compaction, repair, or load spikes, for the purposes of realistic measurements.
Cassandra 4.0 running with the same configuration as Cassandra 3.11 is 30% faster in the key/value workload, 2% slower in the time series workload, and 10% faster in the tabular workload. Turning on ZGC unlocks an additional 30% more throughput for the key/value and time series workloads, but has no effect on the tabular workload.
I’ve split the latency results into one chart per workload so it’s easier to see the trends across the different percentiles:
For these results, we limited each test scenario to the slowest system’s throughput, i.e., we used 30,000, 44,000, and 54,000 requests per second for the key/value, time series, and tabular workloads, respectively.
Cassandra 4.0’s latencies are virtually identical to 3.11’s with the same GC settings, but ZGC is consistently better, by up to a solid factor of five to ten at p99 and p999.
The NoSQLBench performance testing suite
Most benchmarks of non-relational databases are performed with either product-specific tooling (like cassandra-stress), or with YCSB, which gives you a lowest-common-denominator key-value workload across dozens of systems.
Jonathan Shook created NoSQLBench to be a cross-platform performance testing tool that’s easier to use than cassandra-stress and (much) more powerful than YCSB; in fact, its scripting layer is powerful enough to support things that no other testing tool can enable, with particular emphasis on modeling complex workloads with fidelity, as well as simulating realistic scenarios such as load spikes. As its name suggests, NoSQLBench is not Cassandra-specific and encourages participation from all who want to contribute; today there are clients for Cassandra, CockroachDB, JDBC, and MongoDB, as well as the non-database products Kafka and Pulsar. If you’re serious about performance testing in 2021, you should check out NoSQLBench. You can get started at GitHub. Other useful links: releases, Discord, docs.
The NoSQLBench workload descriptions for the tests in this post can be found here.
Without switching to ZGC, Cassandra 4.0 offers modest but real throughput improvements for key/value and tabular workloads.
Combining Cassandra 4.0 with ZGC in Java 16 results in further throughput improvements for key/value and time series workloads, and convincingly demonstrates that ZGC meets its design goal of making GC pause time a non-issue across all tested workloads for Cassandra 4.0.
ZGC is production-ready starting with Java 15; for enterprises that want to stick with LTS releases, ZGC will be one of the headlining reasons to upgrade to the Java 17 LTS release later this year. ZGC is one of the most significant performance “free lunches” available, and it Just Works: the results shown here are out-of-the-box for ZGC with no extra tuning.
Appendix: Test environment
All tests were run on the same physical cluster of AWS i3.4xl nodes: 16 vCPUs, 122GB RAM, 10Gb network, 5 nodes in the cluster. Storage was configured as XFS on direct NVMe, single volume. All data was stored at RF3. Assigned tokens were used to ensure consistent data distribution across the tested versions. Consistency level for all operations was set to LOCAL_QUORUM. Concurrency on the client side was set at 960 (20x client cores) for the key/value test, and 480 (10x client cores) for the time series and tabular tests. All measurements were taken from the client, and include the time between submitting a request and fully reading any data in the results. All measurements were taken with 3 significant digits of precision, then rounded to the nearest ms. ZGC was configured with basic recommended settings: 16GB min heap, 64GB max heap, large pages enabled. The other numbers use Cassandra’s out-of-the-box configuration with CMS.
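As a rough sketch, the ZGC settings described above map onto standard HotSpot options like the following. The exact flags and file used in these tests are not given in this post, so treat this as an assumption: in particular, `-XX:+UseLargePages` is only one of the ways to enable large pages, and the existing CMS options in Cassandra’s JVM options file would need to be commented out first.

```
# Hypothetical JVM options fragment; not the exact file used for these benchmarks.
# Select ZGC (production-ready from Java 15 onward)
-XX:+UseZGC
# 16GB minimum heap, 64GB maximum heap
-Xms16G
-Xmx64G
# Enable large pages (assumed flag; also requires OS-level large page setup)
-XX:+UseLargePages
```

On Java 11 through 14, where ZGC is still experimental, `-XX:+UnlockExperimentalVMOptions` would also be needed before `-XX:+UseZGC`.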