By Jim Connors on Sep 17, 2013
It's been about 18 months since we last compared Linux/Arm JVMs, and with the formal release of the much-anticipated Java SE Embedded for Arm hard float binary, now is a good time to revisit JVM performance. The information and results that follow highlight these comparisons:
- Java SE-E Arm VFP (armel) vs. Arm Hard Float (armhf)
- Java SE-E armhf Client Compiler (c1) vs. armhf Server Compiler (c2)
- And last but certainly not least ... Java SE-E 7u40 armhf vs. OpenJDK armhf
For the sake of simplicity and consistency, we'll use a subset of the DaCapo benchmark suite. It's an open source group of real world applications that put a good strain on a system from both a processor and a memory workload perspective. We are aware of customers who use DaCapo to gauge performance, and its availability and ease of use enable anyone interested to run their own set of tests in fairly short order.
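For readers who want to script their own runs, here is a minimal sketch of how a single DaCapo invocation could be assembled and timed from Java. It is not the harness used for the results in this post. The jar file name `dacapo-9.12-bach.jar`, the class name `DacapoCommand` and the `build` helper are assumptions for illustration. The sketch also assumes the JRE selects its compiler through the usual `-client` and `-server` launcher options, and that DaCapo's `-s` option picks the input size.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the command line for one DaCapo run.
final class DacapoCommand {

    // javaBin, vmFlag and jarPath are illustrative; adjust them to your setup.
    static List<String> build(String javaBin, String vmFlag, String jarPath,
                              String size, String benchmark) {
        List<String> cmd = new ArrayList<String>();
        cmd.add(javaBin);
        if (vmFlag != null && !vmFlag.isEmpty()) {
            cmd.add(vmFlag);      // e.g. "-client" (c1) or "-server" (c2)
        }
        cmd.add("-jar");
        cmd.add(jarPath);
        cmd.add("-s");            // DaCapo input size option
        cmd.add(size);            // "small", "default" or "large"
        cmd.add(benchmark);       // e.g. "eclipse"
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        String jar = "dacapo-9.12-bach.jar";   // assumed file name
        List<String> cmd = build("java", "-client", jar, "small", "eclipse");
        System.out.println("Command: " + cmd);

        // Only launch the benchmark if the jar is actually present.
        if (new File(jar).isFile()) {
            long start = System.nanoTime();
            int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
            long ms = (System.nanoTime() - start) / 1000000L;
            System.out.println("exit=" + exit + ", wall clock=" + ms + " ms");
        }
    }
}
```

The code sticks to Java 7 language features so that it can be compiled for the JREs discussed here.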
It would have been grand to run all these benchmarks on one platform, most notably the beloved Raspberry Pi, but unfortunately it has its limitations:
- There is no Java SE-E server compiler (c2) for the Raspberry Pi. Why? Because the Pi is based on an ARMv6 instruction set whereas the Java SE-E c2 compiler requires a minimum ARMv7 instruction set.
- Demonstrating how rapidly advances are being made in the hardware arena, the Raspberry Pi, within the context of these tests, is quite a humble platform. With 512MB RAM, it runs out of memory when running some of the large DaCapo component applications.
Instead, the tests were run on the following two platforms:
- Boundary Devices BD-SL-i.MX6, quad core 1GHz Cortex-A9 (Freescale i.MX6), 1GB RAM, Debian Wheezy distribution, 3.0.35 kernel (for both armel and armhf configurations)
- GlobalScale D2Plug, single core 800MHz ARMv6/v7 processor (Marvell PXA510), 1GB RAM, Debian Wheezy distribution, 3.5.3-cubox-di+ kernel for armhf
Java SE-E armel vs. armhf
The chart that follows compares the relative performance of the armel Java SE-E 7u40 JRE with the armhf Java SE-E 7u40 JRE for 8 of the DaCapo component applications. These tests were conducted on the Boundary Devices BD-SL-i.MX6. Both armel and armhf environments were based on the Debian Wheezy distribution running a 3.0.35 kernel. For all charts, the smaller the result, the faster the run.
In all 8 tests, the armhf binary is faster, some only slightly, and in one case (eclipse) a few percentage points faster. The big performance gain associated with the armhf standard comes from floating point operations, and in particular, the passing of arguments directly into floating point registers. The performance gains realized by the newer armhf standard will be seen more in the native application realm than for Java SE Embedded, primarily because the Java SE-E armel VM already uses FP registers for Java floating point methods. There are still, however, certain floating point workloads that may show a modest performance increase (in the single digit percent range) with Java SE-E armhf over Java SE-E armel.
Java SE-E Client Compiler (c1) vs. Server Compiler (c2)
In this section, we'll show test results for two different platforms, the first a single core system, followed by the same tests on a quad core system. To further demonstrate how workload changes performance, we'll take advantage of the ability to run the DaCapo component applications in three different modes: small, default (medium) and large. The first chart displays the aggregate time required to run the tests for the three modes, utilizing both the 7u40 client (c1) compiler and the server (c2) compiler. As expected, c1 outperforms c2 by a wide margin for the tests that run only briefly. As the total time to run the tests increases from small to large, the c2 compiler gets a chance to "warm up" and narrow the gap in performance, but it never does catch up.
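The warm-up effect can be observed in miniature with a sketch like the one below, which times repeated passes over the same workload inside a single JVM. The `WarmupDemo` class and its `work` method are made up for illustration and are an arbitrary stand-in loop, not a DaCapo application. Actual timings depend entirely on the hardware and VM.

```java
// Hypothetical micro-demo of JIT warm-up; not part of DaCapo.
final class WarmupDemo {

    // Keeps the computed result live so the loop is not optimized away.
    static volatile double sink;

    // Arbitrary stand-in workload.
    static double work(int n) {
        double sum = 0.0;
        for (int i = 1; i <= n; i++) {
            sum += Math.sqrt(i) / (i + 1.0);
        }
        return sum;
    }

    // Runs the workload `iterations` times and returns the elapsed
    // nanoseconds of each pass, in order.
    static long[] timeIterations(int iterations, int n) {
        long[] elapsed = new long[iterations];
        for (int k = 0; k < iterations; k++) {
            long start = System.nanoTime();
            sink = work(n);
            elapsed[k] = System.nanoTime() - start;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long[] t = timeIterations(10, 2000000);
        for (int k = 0; k < t.length; k++) {
            System.out.println("pass " + k + ": " + (t[k] / 1000) + " us");
        }
    }
}
```

On a typical HotSpot-based VM the first passes tend to be the slowest, since the hot loop has not yet been compiled. Running the same class once with `-client` and once with `-server` (assuming the JRE offers both) gives a rough feel for how long each compiler takes to reach its steady state.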
Contrast the first chart with the one that follows, where the small, medium and large versions of the tests were run on a quad core system. The c2 compiler is better able to utilize the additional compute resources supplied by this platform, with the result that the initial gap in performance between c1 and c2 for the small version of the test is only 19%. By the time we reach the large version, c2 outperforms c1 by 7%. The moral of the story is that, given enough resources, the server compiler might be the better of the two VMs for your workload if it is a long-lived process.
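If you collect your own timings, a gap like the ones quoted above can be computed with a one-line helper. This is only a sketch: the baseline behind the 19% and 7% figures isn't stated here, so the example assumes the faster run is the baseline, and the timings in `main` are made-up placeholders, not measured results.

```java
// Hypothetical helper for comparing two aggregate run times.
final class RelativeGap {

    // Percentage by which `time` exceeds `baseline`.
    // Positive means slower than the baseline, negative means faster.
    static double percentSlower(double time, double baseline) {
        return (time - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Placeholder timings in seconds, for illustration only.
        double c1Seconds = 100.0;
        double c2Seconds = 119.0;
        System.out.printf("c2 vs. c1: %.1f%%%n",
                percentSlower(c2Seconds, c1Seconds));
    }
}
```

Since smaller results mean faster runs in all the charts, a positive value here means the first argument is the slower of the two.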
Java SE-E 7u40 armhf vs. OpenJDK armhf
For this final section, we'll break out performance on an application-by-application basis for the following JRE/VMs:
- Java SE Embedded 7u40 armhf Client Compiler (c1)
- Java SE Embedded 7u40 armhf Server Compiler (c2)
- OpenJDK 7 IcedTea 2.3.10 7u25-2.3.10-1~deb7u1 OpenJDK Zero VM (build 22.0-b10, mixed mode)
- OpenJDK 7 IcedTea 2.3.10 7u25-2.3.10-1~deb7u1 JamVM (build 1.6.0-devel, inline-threaded interpreter with stack-caching)
- OpenJDK 6 IcedTea6 1.12.6 6b27-1.12.6-1~deb7u1 OpenJDK Zero VM (build 20.0-b12, mixed mode)
- OpenJDK 6 IcedTea6 1.12.6 6b27-1.12.6-1~deb7u1 JamVM (build 1.6.0-devel, inline-threaded interpreter with stack-caching)
- OpenJDK 6 IcedTea6 1.12.6 6b27-1.12.6-1~deb7u1 CACAO (build 1.6.0+r68fe50ac34ec, compiled mode)
The OpenJDK packages were pulled from the Debian Wheezy distribution.
It appears the bulk of the performance work on OpenJDK/Arm still revolves around the OpenJDK 6 platform, even though Java 7 was released over two years ago (and Java 8 is coming soon). Regardless, Java SE still outperforms OpenJDK in most tests by a wide margin, and perhaps more importantly appears to be the much more reliable platform, considering the number of tests that failed with the OpenJDK variants. As demonstrated in previous benchmark results, the older armel OpenJDK VMs appear to be more stable than the armhf versions tested here. Considering that the stated direction of the major Linux distributions is to migrate towards the armhf binary standard, this is a bit eye-opening.
As always, comments are welcome.