Comparing JVMs on ARM/Linux

For quite some time, Java Standard Edition releases have included both client and server bytecode compilers (referred to as c1 and c2 respectively), whereas Java SE-Embedded binaries contained only the c1 client compiler.  c2 was excluded for two reasons: (1) eliminating optional components saves space, which is at a premium in the embedded world, and (2) embedded platforms were not given serious consideration for handling server-like workloads.  But all that is about to change.  In anticipation of the ARM processor's legitimate entrance into the server market (see Calxeda), Oracle has, with the latest update of Java SE-Embedded (7u2), made the c2 compiler available for ARMv7/Linux platforms, further enhancing performance for a large class of traditional server applications.

These two compilers go about their business in different ways.  Of the two, c1 is the lighter optimizing compiler, but it starts up faster.  It delivers excellent performance and, as the default bytecode compiler, works extremely well in almost all situations.  Compared to c1, c2 is the more aggressive optimizer and is suited to long-lived Java processes.  Although slower at startup, it can be shown to achieve better performance over time.  As a case in point, take a look at the graph that follows.

One of the most popular Java-based applications, Apache Tomcat, was installed on an ARMv7/Linux device.  The chart shows the relative performance, as defined by mean HTTP request time, of the Tomcat server run with the c1 client compiler (red line) and the c2 server compiler (blue line).  The HTTP request load was generated by an external system on a dedicated network using the ab (Apache Bench) program.  The closer the response time is to zero, the better.  You can see that for the initial run of 25,000 HTTP requests, the c1 compiler produces faster average response times than c2.  It takes time for the c2 compiler to "warm up", but once the threshold of 50,000 or so requests is met, c2 performance is superior to c1.  At 250,000 HTTP requests, mean response time for the c2-based Tomcat server instance is 14% faster than its c1 counterpart.
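The warm-up effect described above can be observed on a small scale without Tomcat. The following is only a sketch under stated assumptions: `handleRequest` is a hypothetical CPU-bound stand-in for serving one request, the batch sizes are arbitrary, and `percentFaster` uses one plausible definition of "N% faster" (relative reduction in mean time), which may not be the exact formula behind the 14% figure.

```java
// Times a fixed workload in consecutive batches so JIT warm-up becomes visible.
class WarmupDemo {
    // Hypothetical stand-in for "serve one request": a little CPU-bound work.
    static long handleRequest(int seed) {
        long h = seed;
        for (int i = 0; i < 1000; i++) {
            h = h * 31 + (h >>> 7) + i;
        }
        return h;
    }

    // Runs `batches` batches of `perBatch` calls and returns the
    // mean nanoseconds per call for each batch.
    static double[] meanNanosPerBatch(int batches, int perBatch) {
        double[] means = new double[batches];
        long sink = 0; // accumulate results so the work cannot be optimized away
        for (int b = 0; b < batches; b++) {
            long start = System.nanoTime();
            for (int i = 0; i < perBatch; i++) {
                sink += handleRequest(i);
            }
            means[b] = (System.nanoTime() - start) / (double) perBatch;
        }
        if (sink == 42) {
            System.out.println(); // use `sink` so it stays live
        }
        return means;
    }

    // Assumed definition: percentage by which mean time b is lower than mean time a.
    static double percentFaster(double a, double b) {
        return (a - b) * 100.0 / a;
    }

    public static void main(String[] args) {
        double[] means = meanNanosPerBatch(10, 25000);
        for (int b = 0; b < means.length; b++) {
            System.out.printf("batch %d: %.1f ns/call%n", b + 1, means[b]);
        }
    }
}
```

On a JIT-enabled VM the first batches typically report higher per-call times than later ones, though the absolute numbers depend entirely on the hardware and VM in use. Running the same class under each compiler and comparing the late-batch means with `percentFaster` gives a rough analogue of the comparison in the chart.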

It is important to realize that c2 assumes, and indeed requires, more resources (i.e. memory).  Our sample device, with 1GB RAM, was more than adequate for these rounds of tests.  Of course your mileage may vary, but if you have the right hardware and the right workload, give c2 a further look.
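When switching between c1 and c2, it helps to confirm which VM and compiler a process is actually running. The sketch below uses only standard `java.lang.management` calls and system properties; the class name `VmInfo` is made up for illustration, and `java.vm.info` is a property that HotSpot sets (other VMs may leave it unset, in which case `null` is printed).

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Prints which VM and JIT compiler the current process is running with.
class VmInfo {
    // Builds a short, human-readable description of the running VM.
    static String describe() {
        StringBuilder sb = new StringBuilder();
        sb.append("VM name: ").append(System.getProperty("java.vm.name")).append('\n');
        sb.append("VM version: ").append(System.getProperty("java.vm.version")).append('\n');
        // Assumption: java.vm.info is set by HotSpot (e.g. "mixed mode"); may be null elsewhere.
        sb.append("VM info: ").append(System.getProperty("java.vm.info")).append('\n');
        // getCompilationMXBean() returns null when the VM has no compilation system.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        sb.append("JIT compiler: ")
          .append(jit == null ? "none (interpreter only)" : jit.getName())
          .append('\n');
        // Upper bound on heap the VM will attempt to use, in megabytes.
        sb.append("Max heap (MB): ")
          .append(Runtime.getRuntime().maxMemory() / (1024 * 1024))
          .append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

On HotSpot launchers that ship both compilers, running this with `java -client VmInfo` and then `java -server VmInfo` should show the difference in the reported VM name; whether a given embedded binary accepts both options depends on how it was packaged.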

While discussing these results with a few of my compadres, it was suggested that OpenJDK and some of its variants be included in this comparison.  The following chart shows mean HTTP request times for 6 different configurations:

  1. Java SE Embedded 7u2 c1 Client Compiler
  2. Java SE Embedded 7u2 c2 Server Compiler
  3. OpenJDK Zero VM (build 20.0-b12, mixed mode) OpenJDK 1.6.0_24-b24 (IcedTea6 1.12pre)
  4. JamVM (build 1.6.0-devel, inline-threaded interpreter with stack-caching) OpenJDK 1.6.0_24-b24 (IcedTea6 1.12pre)
  5. CACAO (build 1.1.0pre2, compiled mode) OpenJDK 1.6.0_24-b24 (IcedTea6 1.12pre)
  6. Interpreter only: OpenJDK Zero VM (build 20.0-b12, interpreted mode) OpenJDK 1.6.0_24-b24 (IcedTea6 1.12pre)

Results remain pretty much unchanged in later runs, so only the first 4 runs (25K-100K requests) are shown.  As can be seen, the Java SE-E VMs are on the order of 3-5x faster than their OpenJDK counterparts irrespective of the bytecode compiler chosen.  One additional promising VM called shark was not included in these tests because, although it built from source successfully, it failed to run Apache Tomcat.  In defense of shark, the ARM version may still be in development (i.e. non-stable) mode.

Creating a really fast virtual machine is hard work and takes a lot of time to perfect.  Considering the resources expended by Oracle (and formerly Sun), it is no surprise that the commercial Java SE VMs are excellent performers.  But the extent to which they outperform their OpenJDK counterparts is surprising.  It would be no shock if someone in the know could demonstrate better OpenJDK results.  But herein lies one considerable problem:  it is an exercise in patience and perseverance just to locate and build a proper OpenJDK platform suitable for a particular CPU/Linux configuration.  No offense would be taken if corrections were presented, and a straightforward mechanism to support these OpenJDK builds were provided.

Comments:

Thank you for an introduction to the ARM OpenJDK JVM porting work.

The OpenJDK Zero *mixed-mode* JVM used in this comparison includes the now re-maintained ARM Thumb2 JIT and assembler interpreter port that was re-introduced in the IcedTea6-1.11 release.
Many of the OpenJDK JVMs, like CACAO and JamVM, are by design tuned for embedded and client use and thus show strength in both low memory overhead and fast startup time.

When testing JVM performance on ARM, it's important to remember that the default optimization settings used by the compilers to build the JVM do matter.

The Debian 6.0.4 squeeze "armel" distribution uses ARMv4t optimization by default. This low optimization level enables the Debian-built packages to run on as many different kinds of ARM boards and CPUs as possible. The trade-off is that you basically disable all VFP (floating point) optimizations and make synchronization code slower by forcing the JVM to call the Linux kernel helper instead of using the faster ARMv7 atomic instructions directly.

To give the OpenJDK JVMs a better match, I would suggest re-running the benchmark using OpenJDK built on top of Debian wheezy "armhf", which by default optimizes for the ARMv7 Thumb2 instruction set and makes use of the VFP unit inside the CPU. The "armhf" ABI also allows better argument passing between library functions, using the VFP registers of the CPU. Two OpenJDK JVMs that are already updated to support this new "armhf" ABI are JamVM and Zero.

You could also run a benchmark using OpenJDK JVMs built using the Ubuntu Precise "armel" toolchains, which still use the legacy soft-float ABI while adding ARMv7 Thumb2 and VFP optimizations. All OpenJDK JVMs tested in this comparison would run better simply by using a higher optimization level during the build.

Posted by guest on February 15, 2012 at 03:59 PM EST #

Hi Jim, I would have liked to comment directly on your blog, but your spam system kept me at bay, so I posted my reply to you here instead:
http://labb.zafena.se/?p=501

Posted by Xerxes Rånby on February 15, 2012 at 04:35 PM EST #

You're comparing 6 (CACAO, JamVM, Zero) against 7 for the proprietary ARM port. That doesn't seem like a fair comparison to me.

Posted by Andrew John Hughes on February 22, 2012 at 03:05 PM EST #

Fair enough. I would have liked to use OpenJDK 7 but didn't have a whole lot of luck building a stable platform. So stay tuned, I'll include Java SE-Embedded 6 as a comparison in a future blog.

Posted by Jim Connors on February 27, 2012 at 09:39 AM EST #

I'd like to see how recent Dalvik versions compare. Dalvik is still tied to Android, but it's obviously a "JVM-like" environment that runs on ARM.

Posted by Charles Oliver Nutter on March 04, 2012 at 01:44 PM EST #

Once Android introduced a JIT compiler into their release, we started looking at performance. The last comparison was done by Bob Vandette against Android 2.2: https://blogs.oracle.com/javaseembedded/entry/how_does_android_22s_performance_stack_up_against_java_se_embedded. Bottom line: Java SE-E is more than 2x better. With the continued Oracle/Google Java lawsuit, my guess is we won't be publishing any further comparisons any time soon.

Posted by Jim Connors on March 04, 2012 at 08:01 PM EST #

So... OpenJDK is not as fast as it could be... It explains Oracle's complaints with Apache and the TCKs. Nice! Good to know my company did the right thing switching to Python a year ago, where development is transparent, without bureaucracy or business interfering.

Posted by guest on April 14, 2012 at 05:57 PM EDT #

I can't understand why people at Oracle are so stupid as to kill their own baby. People who research languages, if they stumble upon this page with all that smart-aleck prose, would not only not 'buy' the Oracle JVM for ARM, but also stop considering Java altogether. And there's a comment from a Python troll too. Great! Oracle is systematically killing Java.

Posted by guest on May 18, 2012 at 08:58 AM EDT #

Is there a small-footprint MCU with UART, Ethernet and USB, other than the Raspberry Pi, that Linux and Java SE (headless) have been ported to? Can't find one.

Posted by guest on December 30, 2012 at 12:29 PM EST #

Have you tried variants of the Plug Computer? Both Ionics and GlobalScale have relatively small footprint devices that could meet your needs.

Posted by guest on January 01, 2013 at 11:53 AM EST #

guest, re the small linux board with eth, usb that is not an rpi.
Cubieboard2 is an A20 dual core cortex-A7 board with proper ethernet and more ram (1GB) than a raspberry pi.
It's nearly twice the cost though.

It has proper floating point support and runs Debian 7 'armhf', so has full VFP support.

Posted by guest on March 26, 2014 at 07:32 PM EDT #

About

Jim Connors
