Introduction
The Garbage Collector (GC) is an important part of the Java Virtual Machine (JVM). It manages an application’s memory allocation, identifies memory that is no longer used and collects it for re-use.
There are currently five GC implementations available in Java 15 (tested against an Azul Zulu build):
- Serial GC
- Parallel GC
- G1 GC
- Z GC
- Shenandoah GC
Let’s review these through a benchmark to get some practical insight into their high-level performance characteristics.
Collectors
Serial
The serial collector uses a single thread to handle GC. It suits an application with a small resource footprint, i.e. few threads and little memory (up to roughly 100 MB).
Parallel
The parallel collector (also known as the throughput collector) is like the serial collector except that it has multiple threads to handle GC. Consider this collector if throughput is the priority, above responsiveness.
G1
The G1 (also known as Garbage First) collector is a mostly concurrent collector, i.e. most of the GC steps are concurrent but not all. Its performance characteristics are a compromise between latency and throughput.
Z
The Z Garbage Collector (ZGC) is a low-pause garbage collector: it does almost all of its work concurrently, so application threads are only stopped for very short pauses. Its goals are the same as Shenandoah's, but it achieves them using coloured references and remapping.
Shenandoah
The Shenandoah Garbage Collector is a low-pause-time garbage collector; it achieves this by doing more GC steps concurrently. Like ZGC, it copies and compacts concurrently, which gives even shorter pauses, but its implementation is slightly different: it uses Brooks pointers.
A good presentation on Shenandoah GC.
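Each of the collectors above is selected with a JVM flag at startup. As a minimal sketch (the class name `ActiveCollectors` is my own, not part of the benchmark code), the following prints which collector beans the running JVM exposes, which is a quick way to confirm that the flag took effect. The bean names differ per collector and JDK build, so none are listed here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Start the JVM with one of these flags to pick a collector:
//   -XX:+UseSerialGC
//   -XX:+UseParallelGC
//   -XX:+UseG1GC
//   -XX:+UseZGC
//   -XX:+UseShenandoahGC
// ZGC and Shenandoah are only accepted if the JDK build includes them.
final class ActiveCollectors {

    // Returns the names of the GC beans registered in the running JVM.
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Active GC beans: " + names());
    }
}
```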
Benchmarks
Allocate
Allocate a byte array of some size repeatedly.
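To make the shape of this benchmark concrete, here is a plain-Java sketch. It is not the harness used for the results in this post; the class name, the iteration count and the exact byte counts are assumptions for illustration, and a proper harness would handle warm-up and dead-code elimination far more rigorously.

```java
final class AllocateSketch {

    // Written on every iteration so the JIT cannot discard the allocation as dead code.
    static volatile byte[] sink;

    // Allocates `iterations` byte arrays of `sizeBytes` each and returns the
    // average time per allocation in microseconds.
    static double averageMicros(int sizeBytes, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = new byte[sizeBytes]; // the previous array becomes garbage immediately
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000.0 / iterations;
    }

    public static void main(String[] args) {
        // 4 kB, 40 kB, 400 kB and 4 MB; the exact byte counts are assumed.
        int[] sizes = {4_000, 40_000, 400_000, 4_000_000};
        for (int size : sizes) {
            averageMicros(size, 1_000); // crude warm-up pass
            System.out.printf("%,d bytes: %.3f us/op%n", size, averageMicros(size, 1_000));
        }
    }
}
```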
Allocate while 50% occupied
Allocate byte arrays that will occupy 50% of the heap, and then allocate a byte array of some size repeatedly. Originally I configured this test as “while 70% occupied”, however the Shenandoah benchmark would run out of Java heap space. This could be a result of the overhead of its use of Brooks pointers, something to re-visit later perhaps.
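The "occupied" part of this benchmark can be sketched as below. This is an illustration under assumptions rather than the original setup code: the class name, the 64 KiB ballast chunk size and the use of `Runtime.maxMemory()` as the heap size are all my own choices here.

```java
import java.util.ArrayList;
import java.util.List;

final class OccupiedHeapSketch {

    // Allocates `chunkBytes`-sized arrays until at least `targetBytes` are retained.
    // The returned list must stay reachable for the memory to remain occupied.
    static List<byte[]> occupy(long targetBytes, int chunkBytes) {
        List<byte[]> retained = new ArrayList<>();
        for (long held = 0; held < targetBytes; held += chunkBytes) {
            retained.add(new byte[chunkBytes]);
        }
        return retained;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory(); // roughly the -Xmx value
        List<byte[]> ballast = occupy(maxHeap / 2, 64 * 1024); // assumed chunk size

        // Allocate short-lived arrays while the ballast is still live.
        byte[] scratch = null;
        for (int i = 0; i < 10_000; i++) {
            scratch = new byte[4_000];
        }

        // Using `ballast` after the loop keeps it reachable for the whole run.
        System.out.println(ballast.size() + " ballast arrays, last scratch length " + scratch.length);
    }
}
```

Raising the occupied fraction leaves less headroom for the collector's own bookkeeping, which is where the 70% configuration ran into trouble with Shenandoah.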
Allocate 60%
Allocate a number of byte arrays of some size so that they occupy 60% of the heap, do this repeatedly.
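One operation of this benchmark could look like the sketch below: fill 60% of the heap with arrays of a given size, then drop them all at once. The class and method names are hypothetical and the timing in `main` is deliberately crude; it only illustrates the allocate-then-release pattern described above.

```java
final class AllocateSixtyPercentSketch {

    // One benchmark operation: fill `targetBytes` with `chunkBytes`-sized arrays,
    // then let every reference go so the whole batch becomes garbage together.
    // Returns the number of arrays that were allocated.
    static int allocateAndRelease(long targetBytes, int chunkBytes) {
        int count = Math.toIntExact(targetBytes / chunkBytes);
        byte[][] batch = new byte[count][];
        for (int i = 0; i < count; i++) {
            batch[i] = new byte[chunkBytes];
        }
        return batch.length; // `batch` is unreachable once this method returns
    }

    public static void main(String[] args) {
        long target = (long) (Runtime.getRuntime().maxMemory() * 0.6);
        for (int round = 0; round < 5; round++) {
            long start = System.nanoTime();
            int arrays = allocateAndRelease(target, 4_000); // 4 kB chunks, assumed byte count
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("round " + round + ": " + arrays + " arrays in " + millis + " ms");
        }
    }
}
```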
Results
Throughput: Average time per operation.
Allocate / Allocate while 50% occupied:
- In general, G1GC performs the best in the ‘Allocate’ and ‘Allocate while 50% occupied’ benchmarks where objects are not humongous in size, i.e. below the 4 MB size used in these tests.
- When allocating 4 MB objects, ZGC and Shenandoah perform better.
- Note that, in general, ZGC and Shenandoah performance is comparable with G1GC.
Allocate 60%:
- This test generates a lot of garbage, and here ZGC and Shenandoah perform far better.
- It’s interesting to see that in humongous object territory, G1GC’s performance is comparable to ZGC and Shenandoah.
- Note the change in scale: these operations take milliseconds, versus microseconds for the above two benchmarks.
Latency: 99.9th percentile operation time.
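For readers unfamiliar with the metric, a 99.9th percentile is the value that 99.9% of the recorded operation times fall at or below. The sketch below uses the nearest-rank method on raw samples; this is one common definition and not necessarily the one the benchmark harness applies, and the class name is my own.

```java
import java.util.Arrays;

final class PercentileSketch {

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        int index = Math.min(Math.max(rank, 1), sorted.length) - 1;
        return sorted[index];
    }

    public static void main(String[] args) {
        long[] nanos = new long[100_000];
        byte[] scratch = null;
        for (int i = 0; i < nanos.length; i++) {
            long start = System.nanoTime();
            scratch = new byte[4_000];
            nanos[i] = System.nanoTime() - start;
        }
        System.out.println("p99.9 (ns): " + percentile(nanos, 99.9)
                + ", last array length " + scratch.length);
    }
}
```

Because the tail is driven by the few operations that coincide with GC activity, this figure can differ sharply between collectors even when the averages are close.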
Allocate:
- G1GC appears to have the best 99.9th percentile latency in the ‘Allocate’ benchmark but, as expected, struggles with 4 MB objects.
- Shenandoah and ZGC are generally better than the serial and parallel collectors except at the 4 MB object range.
Allocate while 50% occupied:
- In general, ZGC and Shenandoah perform the best at 4 kB, 40 kB and 400 kB, although Parallel GC does particularly well at 400 kB, where it has the lowest latency of all five collectors.
- Interestingly, at 4 MB ZGC really struggles with these large objects, something for a later post perhaps.
Allocate 60%:
- ZGC does the best in all the tests in this benchmark.
- This benchmark, which focuses on generating a lot of garbage before releasing it, shows how much better the newer collectors perform.
Closing
From these benchmarks:
- For throughput, G1 and Shenandoah GC are the best collectors.
- For latency-sensitive applications, ZGC is the best collector.
Obviously each application will have its own unique usage profile and should be benchmarked to identify the most suitable garbage collector to use.
Appendix
Throughput
Benchmark (microseconds) | Serial | Parallel | G1GC | ZGC | Shenandoah |
---|---|---|---|---|---|
Allocate 4kB | 0.339 | 0.334 | 0.236 | 0.243 | 0.24 |
Allocate 40kB | 2.506 | 2.506 | 1.816 | 1.905 | 1.865 |
Allocate 400kB | 26.194 | 24.941 | 17.834 | 18.184 | 17.963 |
Allocate 4MB | 240.664 | 241.458 | 187.925 | 180.764 | 182.533 |
Benchmark (microseconds) | Serial | Parallel | G1GC | ZGC | Shenandoah |
---|---|---|---|---|---|
50% Occupied / Allocate 4kB | 0.341 | 0.355 | 0.234 | 0.244 | 0.244 |
50% Occupied / Allocate 40kB | 2.606 | 2.644 | 2.014 | 2.109 | 1.948 |
50% Occupied / Allocate 400kB | 25.269 | 24.681 | 18.706 | 20.408 | 18.541 |
50% Occupied / Allocate 4MB | 244.635 | 245.725 | 311.757 | 193.762 | 185.755 |
Benchmark (milliseconds) | Serial | Parallel | G1GC | ZGC | Shenandoah |
---|---|---|---|---|---|
Allocate 60% in 4kB chunks | 744.214 | 818.615 | 501.143 | 161.28 | 155.462 |
Allocate 60% in 40kB chunks | 534.277 | 706.001 | 436.163 | 117.387 | 112.27 |
Allocate 60% in 400kB chunks | 514.895 | 536.347 | 438.745 | 111.01 | 108.313 |
Allocate 60% in 4MB chunks | 500.248 | 549.07 | 132.809 | 111.872 | 114.179 |
Latency
Benchmark (microseconds) | Serial | Parallel | G1GC | ZGC | Shenandoah |
---|---|---|---|---|---|
Allocate 4kB | 10.496 | 10.288 | 2.3 | 3.4 | 4.065 |
Allocate 40kB | 22.199 | 20.672 | 12.992 | 14.896 | 14.496 |
Allocate 400kB | 213.012 | 154.108 | 51.072 | 125.696 | 59.004 |
Allocate 4MB | 574.464 | 591.872 | 1564.672 | 583.57 | 542.593 |
Benchmark (microseconds) | Serial | Parallel | G1GC | ZGC | Shenandoah |
---|---|---|---|---|---|
50% Occupied / Allocate 4kB | 10.4 | 10.288 | 6.6 | 2.3 | 4.896 |
50% Occupied / Allocate 40kB | 21.792 | 21.472 | 16.672 | 14.992 | 13.6 |
50% Occupied / Allocate 400kB | 151.718 | 89.856 | 148.992 | 137.216 | 123.15 |
50% Occupied / Allocate 4MB | 1046.528 | 2179.949 | 1499.136 | 4426.924 | 689.152 |
Benchmark (milliseconds) | Serial | Parallel | G1GC | ZGC | Shenandoah |
---|---|---|---|---|---|
Allocate 60% in 4kB chunks | 985.661 | 1009.779 | 563.085 | 189.006 | 409.993 |
Allocate 60% in 40kB chunks | 726.663 | 819.986 | 488.636 | 145.752 | 150.209 |
Allocate 60% in 400kB chunks | 716.177 | 896.532 | 467.141 | 127.14 | 173.277 |
Allocate 60% in 4MB chunks | 711.983 | 891.29 | 152.83 | 126.878 | 314.049 |