Introduction
The automatic memory management provided by the Java Garbage Collector (GC) spares developers from managing memory by hand. The cost is a GC pause whenever a collection kicks in. For most applications this is fine, but for latency-sensitive applications these pauses can be an issue. One answer is to use off-heap memory, which lies outside the scope of the garbage collector.
In this post, let us benchmark the off-heap memory options:
- ByteBuffer API
- Unsafe API
- Foreign Memory Access API
APIs
ByteBuffer API
There are two flavours of ByteBuffer: direct and non-direct. Given a direct buffer, the JVM makes a best effort to perform native I/O operations directly on it, avoiding a copy through an intermediate buffer. It is worth noting that a single buffer cannot be larger than 2 GB because its capacity and indices are ints.
https://docs.oracle.com/en/java/javase/15/docs/api/java.base/java/nio/ByteBuffer.html
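To make the two flavours concrete, here is a minimal sketch that allocates one buffer of each kind and round-trips a long through it. The class name `ByteBufferSketch`, the helper `roundTrip`, and the 1024-byte capacity are illustrative choices, not part of the benchmark code.

```java
import java.nio.ByteBuffer;

class ByteBufferSketch {

    // Writes a long at byte offset 0 and reads it back using the absolute accessors.
    static long roundTrip(ByteBuffer buffer, long value) {
        buffer.putLong(0, value);
        return buffer.getLong(0);
    }

    public static void main(String[] args) {
        // Capacity is an int, which is where the 2 GB ceiling comes from.
        int capacity = 1024; // illustrative size

        ByteBuffer heap = ByteBuffer.allocate(capacity);         // backed by a byte[] on the heap
        ByteBuffer direct = ByteBuffer.allocateDirect(capacity); // backed by native memory

        System.out.println("heap.isDirect()   = " + heap.isDirect());
        System.out.println("direct.isDirect() = " + direct.isDirect());
        System.out.println("heap round trip   = " + roundTrip(heap, 42L));
        System.out.println("direct round trip = " + roundTrip(direct, 42L));
    }
}
```

Both buffers expose the same API, so code written against `ByteBuffer` can switch between heap and off-heap storage by changing only the allocation call.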
Unsafe API
The Unsafe API is a non-standard Java API that provides access to direct memory among other features. As suggested by the name, the API is unsafe, and it is possible to crash the application with incorrect memory access – something that happened when building the benchmarks.
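For readers who have not used it, the following is a minimal sketch of off-heap access through `sun.misc.Unsafe`. It assumes the common reflective workaround of reading the private `theUnsafe` field; the class name `UnsafeSketch` and its helpers are hypothetical and are not taken from the benchmark code. Recent JDKs may print deprecation warnings for these methods.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

class UnsafeSketch {

    // Unsafe.getUnsafe() throws SecurityException for ordinary application code,
    // so this sketch reads the private static "theUnsafe" field reflectively.
    static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Unsafe is not available", e);
        }
    }

    // Allocates 8 bytes off-heap, writes a long, reads it back, and frees the block.
    static long roundTrip(long value) {
        Unsafe unsafe = loadUnsafe();
        long address = unsafe.allocateMemory(Long.BYTES);
        try {
            unsafe.putLong(address, value);
            return unsafe.getLong(address);
        } finally {
            // Nothing frees this memory automatically; skipping this call leaks it.
            unsafe.freeMemory(address);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42L));
    }
}
```

There is no bounds checking here: reading or writing outside the allocated block, or touching the address after `freeMemory`, is the kind of mistake that can take the whole JVM down.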
Foreign Memory Access API
The new Foreign Memory Access (FMA) API, currently incubating in Java 15 in jdk.incubator.foreign, provides safe and supported access to direct memory.
Benchmarks
The benchmarks were run using OpenJDK 15 (2020-09-15 / VM 15+36-1562). They are:
- A simple byte write to an empty buffer.
- A simple byte read from a populated buffer.
- A simple long write to an empty buffer.
- A simple long read from a populated buffer.
Although the non-direct ByteBuffer lives on the heap rather than in off-heap memory, it is included for comparison against direct ByteBuffer performance.
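The four operations listed above can be sketched as follows for the direct ByteBuffer case. This is not the harness behind the published figures, which the post does not show; it is a naive `System.nanoTime()` loop with hypothetical names (`NaiveBufferTiming`, a 1 MiB buffer, 200 warm-up passes), and a proper harness such as JMH would be needed for trustworthy numbers.

```java
import java.nio.ByteBuffer;

class NaiveBufferTiming {

    // Writes one byte at every index of the buffer.
    static void writeBytes(ByteBuffer buffer) {
        for (int i = 0; i < buffer.capacity(); i++) {
            buffer.put(i, (byte) i);
        }
    }

    // Reads every byte back; returning the sum keeps the JIT from discarding the loop.
    static long readBytes(ByteBuffer buffer) {
        long sum = 0;
        for (int i = 0; i < buffer.capacity(); i++) {
            sum += buffer.get(i);
        }
        return sum;
    }

    // Writes one long into every complete 8-byte slot; the value is the slot's byte offset.
    static void writeLongs(ByteBuffer buffer) {
        for (int offset = 0; offset + Long.BYTES <= buffer.capacity(); offset += Long.BYTES) {
            buffer.putLong(offset, offset);
        }
    }

    // Reads every long back and returns the sum.
    static long readLongs(ByteBuffer buffer) {
        long sum = 0;
        for (int offset = 0; offset + Long.BYTES <= buffer.capacity(); offset += Long.BYTES) {
            sum += buffer.getLong(offset);
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20); // hypothetical 1 MiB working set
        long sink = 0;

        // Warm-up passes so the timed pass below is likely to run JIT-compiled code.
        for (int i = 0; i < 200; i++) {
            writeBytes(buffer);
            sink += readBytes(buffer);
            writeLongs(buffer);
            sink += readLongs(buffer);
        }

        long t0 = System.nanoTime();
        writeBytes(buffer);
        long t1 = System.nanoTime();
        sink += readBytes(buffer);
        long t2 = System.nanoTime();
        writeLongs(buffer);
        long t3 = System.nanoTime();
        sink += readLongs(buffer);
        long t4 = System.nanoTime();

        System.out.println("byte write ns: " + (t1 - t0));
        System.out.println("byte read ns:  " + (t2 - t1));
        System.out.println("long write ns: " + (t3 - t2));
        System.out.println("long read ns:  " + (t4 - t3));
        System.out.println("sink: " + sink); // printed so the reads stay observable
    }
}
```

Swapping `allocateDirect` for `allocate` gives the non-direct variant of the same loops.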
The first benchmark shows byte read and byte write performance across the four options.
- Write performance is comparable across all four approaches.
- For reads, the Unsafe API is the obvious winner, with the direct ByteBuffer performing slightly better than the non-direct ByteBuffer.
- The Foreign Memory Access approach reads very poorly, at roughly an eighth of the Unsafe figure.
Brian Goetz covers this issue in his Beyond ByteBuffers talk: https://youtu.be/iwSCtxMbBLI?t=2342. The cause is an issue with the JIT compiler; if the same code is compiled with the Graal compiler, the FMA approach performs far better.
The second benchmark shows long read and long write performance across the four options.
- Again, the results are similar to the byte benchmarks.
- Write performance is comparable across all four approaches.
- For reads, the Unsafe API is the obvious winner, with the direct ByteBuffer performing slightly better than the non-direct ByteBuffer.
- Again, the FMA approach performs poorly on reads.
While benchmarking the ByteBuffer long read and long write performance, I wondered if using a long view buffer would affect performance. There is a very slight improvement, but it is only visible when looking at the raw figures.
The raw figures are in the appendix at the end of the post.
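For context, a long view buffer is obtained with `asLongBuffer()` and indexes the same memory in units of longs rather than bytes. The sketch below is illustrative only: the class name `LongViewSketch` and its helper are hypothetical, and native byte order is an assumption, since the post does not say which order the benchmarks used.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;

class LongViewSketch {

    // Writes through the LongBuffer view, then reads the same slot back through the
    // underlying ByteBuffer, showing that both share one block of memory.
    // Assumes the ByteBuffer's position is 0 when the view is created.
    static long writeViaViewReadViaBytes(ByteBuffer bytes, int longIndex, long value) {
        LongBuffer longs = bytes.asLongBuffer();       // view inherits the buffer's byte order
        longs.put(longIndex, value);                   // index counts longs
        return bytes.getLong(longIndex * Long.BYTES);  // index counts bytes
    }

    public static void main(String[] args) {
        ByteBuffer bytes = ByteBuffer.allocateDirect(64).order(ByteOrder.nativeOrder());
        System.out.println("longs in view: " + bytes.asLongBuffer().capacity());
        System.out.println("round trip:    " + writeViaViewReadViaBytes(bytes, 3, 123L));
    }
}
```

The view saves the caller from multiplying every index by eight, which is consistent with a difference that is real but small.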
Closing
In summary, the Unsafe API is still the quickest way to read off-heap memory in these benchmarks, while write performance is comparable across all four options.
Appendix
Byte read and write results (higher is better):

| Benchmark | Unsafe | ByteBuffer (Direct) | ByteBuffer (Non-direct) | Foreign Memory Access |
|---|---|---|---|---|
| getByte | 293,067,292.21 | 223,752,873.76 | 205,326,409.45 | 36,244,400.66 |
| putByte | 42,141,794.02 | 43,000,164.75 | 42,646,352.70 | 40,297,479.99 |

Long read and write results (higher is better):

| Benchmark | Unsafe | ByteBuffer (Direct) | ByteBuffer (Non-direct) | Foreign Memory Access |
|---|---|---|---|---|
| getLong | 309,569,798.53 | 201,305,093.00 | 179,290,012.74 | 35,481,260.50 |
| putLong | 44,084,627.65 | 41,780,129.52 | 41,706,746.90 | 39,933,953.10 |

Long read and write results with a long view buffer used for the two ByteBuffer columns (higher is better):

| Benchmark | Unsafe | ByteBuffer (Direct) | ByteBuffer (Non-direct) | Foreign Memory Access |
|---|---|---|---|---|
| getLong | 309,569,798.53 | 207,778,379.93 | 180,631,902.46 | 35,481,260.50 |
| putLong | 44,084,627.65 | 42,464,387.13 | 42,561,624.50 | 39,933,953.10 |