Introduction

Java and GraalVM

This blog is about making your Java code go faster, without any code changes, just by using a specific version of the Java runtime.

I am writing a series of blogs on using different languages to access Oracle databases [e.g. Python, Node.js, Rust and Julia].  This week, I thought that I would cover Java JDBC.

Java is not my first choice of language, and I have been using JDK 8 for many years, so I thought that I would try out some of the newer Java JDKs.  I got sidetracked from my intended JDBC topic because I was surprised by the performance of one of the newer JDKs, so this blog became a comparison of recent JDKs using two micro benchmarks.

The JDKs covered were:

  • Oracle JDK 1.8.0_351-b10
  • Oracle JDK 11.0.17
  • OpenJDK 19.0.1
  • GraalVM CE JDK 19.0.1
  • GraalVM EE JDK 19.0.1
  • GraalVM CE Native Image 19.0.1
  • GraalVM EE Native Image 19.0.1

 

This blog covers the following topics:

  • The two micro benchmarks that I created
  • The results
  • My Java source code
  • How I did the builds and tests
  • How I calculated the results
  • Summary

 

This blog is not a tutorial on the Java language, nor the many features of the GraalVM.  This is also not a blog on how to download and configure various JDKs.

I also did the same benchmarks for many other popular runtime languages.  I will eventually get back to using Java JDBC with Oracle Databases.


My micro benchmarks

After I got the mandatory ‘hello world’ working with the latest JDK, I wondered whether the marketing claims that newer JDKs are faster were true.  I also wanted to try out GraalVM, which I had heard of but never used.

Newer JDKs are faster

I needed some trivial Java workloads to compare the various JDKs that I was interested in.  I am not interested in training neural networks or simulating the Big Bang.  Instead, I care deeply about fast access to SQL databases.  I also did not want to get distracted by things like Java reflection, exception handling or generics.  Before I got too deep into JDBC benchmarking, I wanted something simple that I could write in 10 minutes.  After 10 minutes, I ended up with some trivial code to do some string processing and simple maths:

  • Calculate the Fibonacci sequence with an input of 1475, repeated one million times
  • Some trivial string processing with String and StringBuffer, i.e. creating, concatenating and taking substrings of strings under 2000 characters, for a huge number of iterations

 

 

How valid are these results?

Micro benchmarks are, by definition, only relevant to the specific workloads that they cover.  These workloads do not try to cover everything; they only cover what I care about.  The only workload that matters to you is your own workload, so try your own workloads with the newer JDKs.  I have found that string processing and simple maths are important for fast SQL database drivers, so that is what I tested.

Your mileage will vary.

 

 

Results

[Chart: Java execution time by runtime]

  • JDK 8 took the longest to run my combined micro benchmarks
  • JDK 11 was about twice as fast as JDK 8 to run the same workload
  • OpenJDK 19 was slightly faster than JDK 11 to run the same workload
  • GraalVM Community Edition 19.0.1 was actually slower than OpenJDK 19 for the same workload
  • GraalVM Enterprise Edition 19.0.1 was significantly faster than all of the other runtimes
    • It was between 131x and 293x faster!
    • This Java code running in JIT mode was almost as fast as my optimized C code for this workload

Why was GraalVM Enterprise Edition 19.0.1 so much faster than the other runtimes?  I assume that its developers were able to apply more aggressive optimizations, and that the product is licensed accordingly.

 

The results look fairly definitive, but the reality is more complicated than that.

[Chart: execution time by runtime]

 

I had naively assumed that using AOT [ahead of time] compilation would give better results than JIT mode.  This turned out not to be true for these micro benchmarks.  Both the Community and Enterprise Edition native images were significantly slower than GraalVM Community Edition 19.0.1 running in JIT mode.  Are these results true for every Java workload, or specific to my micro benchmarks?  I do not know.  Try GraalVM Enterprise Edition 19.0.1 in both JIT and AOT modes yourself, using your own workloads.

Maybe in the future, you will have the option to minimize startup time, or to maximize runtime performance with GraalVM native image?

Your mileage will vary.

The Java native-image [ahead of time] compiler was designed with three goals:

  • To create a standalone executable which can run independently of the Java runtime
  • To minimize the startup time of this executable
  • To lower runtime memory overhead compared to the Java runtime

 

For my micro benchmarks, all three goals were achieved:

  • A Linux x86-64 ELF image was created
  • The time to start the process, execute the fibonacci method and exit was about three milliseconds
  • The VIRT memory [measured by top] of running the executable was about 0.13 GB vs about 32 GB using the Java runtimes

 

[Image: native-image startup time]

Of the three Java methods, long_strings took 99% of the execution time.  Was I weighting the workload to give good Java results?  Nope.  I originally ran this workload to compare Oracle PL/SQL against C code, and for that test I balanced the number of iterations of the three functions to make my C code easy to measure, i.e. I needed enough iterations for the work done in the C code to dominate the overhead of starting and stopping the process in which it ran.  I will cover the PL/SQL vs C results in a follow-up blog.


My trivial Java source code

 

The Main method

[Image: the main method source code]

  • The fibonacci method has an input of 1475 and is called one million times
    • Why 1475?  To avoid integer overflow in some of the other languages against which I tested this workload
  • Both the strings and long_strings methods are called with an input of 1475
  • Why am I not using a static final constant for 1475?  Just because (see the sketch after this list)
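
The blog shows the actual source only as a screenshot, so here is a minimal sketch of what the class and main method might look like, reconstructed from the bullet points above.  The class name fibStr and the three method names come from the post; the loop structure, the placeholder bodies and the single call to each string method are my assumptions.

    // Sketch only - reconstructed from the description, not copied from the post's screenshot.
    public class fibStr {

        public static void main(String[] args) {
            // The fibonacci method is called one million times with an input of 1475.
            for (int i = 0; i < 1_000_000; i++) {
                fibonacci(1475);
            }
            // Both string methods are called with the same input of 1475.
            strings(1475);
            long_strings(1475);
        }

        // Placeholder bodies; the three methods are sketched in the sections below.
        static double fibonacci(int n) { return n; }
        static void strings(int n) { }
        static void long_strings(int n) { }
    }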

 

 

The Fibonacci Method

[Image: the fibonacci method source code]

Why am I using a double for the variables?

  • The values of the Fibonacci sequence rapidly get larger
  • I also implemented these micro benchmarks in many other languages
  • Some of these languages had issues with integer overflow for large values in the Fibonacci sequence
  • So I used the type double to be fair and consistent across all of the languages (see the sketch after this list)
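
The real method is again shown only as an image in the post.  Based on the points above, a plausible sketch is an iterative loop that accumulates the sequence in double variables; the variable names and the handling of n < 2 are my assumptions.

    // Sketch of the fibonacci method: iterative, using double to avoid integer overflow.
    static double fibonacci(int n) {
        if (n < 2) {
            return n;
        }
        double previous = 0;   // fib(0)
        double current = 1;    // fib(1)
        for (int i = 2; i <= n; i++) {
            double next = previous + current;
            previous = current;
            current = next;
        }
        // fib(1475) is far too large for a 64-bit integer, but it still fits in a double.
        return current;
    }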

 

 

strings method

[Image: the strings method source code]

  • This method does some trivial operations on Strings and StringBuffers (see the sketch after this list)
    • The operations include constructors, append, length, substring and copy
  • There are three nested loops, so the operations in the innermost loop are executed about 26 million times
    • 1475 * 12 * 1475 = 26,107,500
    • n = 1475
    • The string length is 12 characters
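
The source for this method is also a screenshot in the original post, so the sketch below is reconstructed from the bullet points: three nested loops whose counts multiply out to 1475 * 12 * 1475.  The 12-character literal, the exact String/StringBuffer calls and the println guard are my assumptions.

    // Sketch of the strings method: three nested loops over a 12-character string.
    static void strings(int n) {
        for (int i = 0; i < n; i++) {                          // outer loop: n = 1475 iterations
            StringBuffer sb = new StringBuffer();              // constructor + append
            sb.append("Hello World!");                         // a 12-character string (placeholder literal)
            String s = sb.toString();
            for (int j = 0; j < s.length(); j++) {             // 12 iterations
                for (int k = 0; k < n; k++) {                  // innermost loop: 1475 iterations
                    String copy = new String(s.substring(j));  // substring + copy
                    if (copy.length() == 0) {
                        System.out.println("unexpected");      // cheap use of the result
                    }
                }
            }
        }
    }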

 

 

The long_strings method part 1

[Image: the long_strings method source code, part 1]

The logic for method long_strings was the same as for method strings, but there were significantly more StringBuffer concatenation operations (a sketch of the full method follows after part 2 below).

  • The fully appended StringBuffer is 1965 characters long
  • Why did I not create the StringBuffer with an initial capacity of more than 1965 characters?
    • I wanted to make the comparisons with other languages fair and consistent
    • I am not trying to optimize the code for this workload; I am trying to see how common / ‘bad’ code performs
  • The number of iterations of the String and StringBuffer operations is significantly larger
  • This also creates opportunities for garbage collection

 

 

The long_strings method part 2

[Image: the long_strings method source code, part 2]

  • The ‘j’ for loop iterates once per character of the string, i.e. 1965 times
  • The ‘k’ for loop iterates n times, i.e. 1475 times
  • The outer ‘i’ for loop also iterates n times, i.e. 1475 times
  • 1475 * 1965 * 1475 = 4,275,103,125 iterations
  • So there are roughly 4.3 billion iterations of the ‘k’ loop, each of which creates a String from a substring
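
Putting part 1 and part 2 together, a sketch of the whole long_strings method might look like the code below.  The target length of roughly 1965 characters and the ‘i’/‘j’/‘k’ loop counts come from the post; the placeholder literal and the guard on the result are my assumptions.

    // Sketch of the long_strings method (parts 1 and 2 combined).
    static void long_strings(int n) {
        for (int i = 0; i < n; i++) {                          // outer 'i' loop: 1475 iterations
            // Part 1: grow a StringBuffer to roughly 1965 characters by repeated appends,
            // deliberately without pre-sizing it, so it has to re-allocate as it grows.
            StringBuffer sb = new StringBuffer();              // default capacity of 16 characters
            while (sb.length() < 1965) {
                sb.append("Hello World!");                     // placeholder 12-character chunk
            }
            String s = sb.toString();

            // Part 2: the 'j' loop runs once per character and the 'k' loop runs n times,
            // so the substring-and-copy work runs roughly 4.3 billion times in total.
            for (int j = 0; j < s.length(); j++) {
                for (int k = 0; k < n; k++) {
                    String copy = new String(s.substring(j));
                    if (copy.length() == 0) {
                        System.out.println("unexpected");      // cheap use of the result
                    }
                }
            }
        }
    }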

 

 

My environment

I repeated these tests on two different machines:

  • Oracle Linux 8.6 on Oracle Cloud. 4 OCPU with 128 GB RAM
  • Ubuntu 22.04 on Oracle Cloud. 4 OCPU with 128 GB RAM
  • As these were VMs, to avoid the risk of a noisy neighbor I repeated the tests many times over three days
  • My micro benchmarks did not do any disk or network IO; they were CPU bound and single threaded
  • As measured by ‘top’, the VIRT and RSS memory was stable for the duration of the tests and there was an abundance of RAM
    • VIRT was about 32 GB and the VMs had 128 GB of RAM
    • I was not running any other workload on the VMs apart from ‘top’
  • I followed the instructions to install and configure GraalVM Native Image
  • GCC 8.5.0 was used by GraalVM Native Image on my Oracle Linux VM


How I built and ran each test

 

For JDK 1.8.0_351

  • java -version
  • javac fibStr.java
  • time java fibStr

 

 

For JDK 11.0.17

  • java -version
  • javac fibStr.java
  • time java fibStr

 

 

For OpenJDK 19.0.1

  • java -version
  • javac fibStr.java
  • time java fibStr

 

 

For JDK 19.0.1 from graalvm-ce-java19-22.3.0

  • java -version
  • javac fibStr.java
  • time java fibStr

 

 

For JDK 19.0.1 from graalvm-ee-java19-22.3.0

  • java -version
  • javac fibStr.java
  • time java fibStr

 

 

For JDK 19.0.1 Native Image from graalvm-ce-java19-22.3.0 Community Edition

  • java -version
  • javac fibStr.java
  • native-image fibStr
  • time ./fibstr

[Image: Community Edition native-image build output]

 

file fibstr
fibstr: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=743060ec73742fd44031580bbfeed20d32e038d2, with debug_info, not stripped


For JDK 19.0.1 Native Image from graalvm-ee-java19-22.3.0 Enterprise Edition

  • java -version
  • javac fibStr.java
  • native-image fibStr
  • time ./fibstr

[Image: Enterprise Edition native-image build output]

 

file fibstr
fibstr: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=cc1003d58160ce6f66d884d4f41768c291eb7117, not stripped


How I calculated the results

On three different days, I did the following:

  • I ran the tests for each runtime 10 times using the Linux time command, until I got stable results
  • I eliminated the highest and lowest results
  • I took the average of the remaining eight results (see the sketch after this list)
    • The Linux time command gives a resolution of 1 millisecond
    • The fastest results for the three functions took about one second
    • Most of the results took over a minute
    • So measurement error did not seem to be a factor
  • There was always some variation between the runs; however, the relative performance was always the same
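
As an illustration of that calculation (this code is not part of the benchmark itself), the sketch below reads a list of timings from the command line, drops the fastest and slowest, and averages the rest.  The class name TrimmedAverage is my own.

    import java.util.Arrays;

    // Illustration only: compute the trimmed average described above.
    // Usage: java TrimmedAverage <time1> <time2> ... <time10>
    public class TrimmedAverage {
        public static void main(String[] args) {
            double[] runs = Arrays.stream(args)
                                  .mapToDouble(Double::parseDouble)
                                  .toArray();
            Arrays.sort(runs);
            double sum = 0;
            for (int i = 1; i < runs.length - 1; i++) {    // skip the lowest and highest results
                sum += runs[i];
            }
            System.out.printf("trimmed average: %.3f seconds%n", sum / (runs.length - 2));
        }
    }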


Summary

  • Based on my micro benchmarks, newer Java runtimes tend to give better performance without any code changes
  • The GraalVM Enterprise Edition JDK 19.0.1 run in JIT mode gave the best performance for these micro benchmarks by a significant margin
  • The GraalVM native-image ahead-of-time compilation was significantly slower than running in JIT mode for these micro benchmarks

 

 

Disclaimer: These are my personal thoughts and do not represent Oracle’s official viewpoint in any way, shape, or form.