Tuesday Sep 17, 2013

Comparing Linux/Arm JVMs Revisited

It's been about 18 months since we last compared Linux/Arm JVMs, and with the formal release of the much-anticipated Java SE Embedded for Arm hard float binary, now is a good time to revisit JVM performance.  The information and results that follow highlight the following comparisons:

  1. Java SE-E Arm VFP (armel) vs. Arm Hard Float (armhf)
  2. Java SE-E armhf Client Compiler (c1) vs. armhf Server Compiler (c2)
  3. And last but certainly not least ... Java SE-E 7u40 armhf vs. OpenJDK armhf

The Benchmark

For the sake of simplicity and consistency, we'll use a subset of the DaCapo benchmark suite.  It's an open source group of real world applications that put a good strain on a system from both a processor and a memory workload perspective.  We are aware of customers who use DaCapo to gauge performance, and its availability and ease of use enable anyone interested to run their own set of tests in fairly short order.

The Hardware

It would have been grand to run all these benchmarks on one platform, most notably the beloved Raspberry Pi, but unfortunately it has its limitations:

  • There is no Java SE-E server compiler (c2) for the Raspberry Pi.  Why?  Because the Pi is based on an ARMv6 instruction set whereas the Java SE-E c2 compiler requires a minimum ARMv7 instruction set.
  • Hardware is advancing so rapidly that, within the context of these tests, the Raspberry Pi is quite a humble platform.  With 512MB RAM, it runs out of memory when running some of the large DaCapo component applications.

For these tests we'll primarily use a quad-core Cortex-A9 based system, and for one test we'll utilize a single-core Marvell Armada system to see what effect the number of cores has on server compiler performance.  The devices in question are:
  1. Boundary Devices BD-SL-i.MX6, quad core 1GHz Cortex-A9 (Freescale i.MX6), 1GB RAM, Debian Wheezy distribution, 3.0.35 kernel (for both armel and armhf configurations)
  2. GlobalScale D2Plug, single core 800MHz ARMv6/v7 processor (Marvell PXA510), 1GB RAM, Debian Wheezy distribution, 3.5.3-cubox-di+ kernel for armhf

Java SE-E armel vs. armhf

The chart that follows compares the relative performance of the armel Java SE-E 7u40 JRE with the armhf Java SE-E 7u40 JRE for 8 of the DaCapo component applications.  These tests were conducted on the Boundary Devices BD-SL-i.MX6.  Both armel and armhf environments were based on the Debian Wheezy distribution running a 3.0.35 kernel.  For all charts, the smaller the result, the faster the run.

In all 8 tests the armhf binary is faster: in most cases only slightly, and in one case (eclipse) by a few percentage points.  The big performance gain associated with the armhf standard concerns floating point operations, in particular the passing of arguments directly in floating point registers.  The gains realized by the newer armhf standard will be seen more in the native application realm than in Java SE-Embedded, primarily because the Java SE-E armel VM already uses FP registers for Java floating point methods.  There are, however, still certain floating point workloads that may show a modest performance increase (in the single-digit percent range) with Java SE-E armhf over Java SE-E armel.

Java SE-E Client Compiler (c1) vs. Server Compiler (c2)

In this section, we'll show test results for two different platforms: first a single-core system, followed by the same tests on a quad-core system.  To further demonstrate how workload affects performance, we'll take advantage of the ability to run the DaCapo component applications in three different modes: small, default (medium) and large.  The first chart displays the aggregate time required to run the tests for the three modes, utilizing both the 7u40 client (c1) compiler and the server (c2) compiler.  As expected, c1 outperforms c2 by a wide margin for the tests that run only briefly.  As the total time to run the tests increases from small to large, the c2 compiler gets a chance to "warm up" and narrow the performance gap.  But on the single-core system, it never does catch up.
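If you want to watch warm-up happen on your own device, timing the same piece of work over and over is enough to make it visible.  What follows is a minimal sketch, not the DaCapo harness: the class name `WarmupHarness` and the square-root loop are hypothetical stand-ins for a real workload.  Running it once with `-client` and once with `-server` (on a JRE that ships both VMs) should show later iterations getting faster as the JIT compiler kicks in, with c2 typically taking longer to get there.

```java
// Hypothetical warm-up harness: times repeated runs of one workload so that
// JIT warm-up (later iterations running faster) becomes visible.
class WarmupHarness {

    // Runs the workload 'iterations' times; returns each run's elapsed time in nanoseconds.
    static long[] timeIterations(Runnable workload, int iterations) {
        long[] elapsed = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            workload.run();
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }

    // Result sink so the JIT cannot discard the loop below as dead code.
    static volatile double sink;

    // Stand-in CPU-bound workload (an assumption, not a DaCapo component).
    static void sampleWorkload() {
        double acc = 0;
        for (int i = 1; i < 200_000; i++) {
            acc += Math.sqrt(i) / i;
        }
        sink = acc;
    }

    public static void main(String[] args) {
        long[] times = timeIterations(WarmupHarness::sampleWorkload, 10);
        for (int i = 0; i < times.length; i++) {
            System.out.printf("iteration %d: %.2f ms%n", i + 1, times[i] / 1e6);
        }
    }
}
```

Reporting per-iteration times, rather than a single total, is the point of the design: a total hides exactly the warm-up curve the charts are about.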

Contrast the first chart with the one that follows, where small, medium and large versions of the tests were run on a quad-core system.  The c2 compiler is better able to utilize the additional compute resources supplied by this platform, with the result that the initial gap in performance between c1 and c2 for the small version of the test is only 19%.  By the time we reach the large version, c2 outperforms c1 by 7%.  The moral of the story: given enough resources, the server compiler might be the better choice for your workload if it is a long-lived process.
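Percentages like "19%" and "7%" are easiest to compare when you know how the gap is being expressed.  The post doesn't state the exact formula, so the sketch below simply assumes the common convention: the difference between two run times as a percentage of the faster one.  `RelativePerf` and the sample timings are hypothetical, illustrative numbers rather than measured results.

```java
// Hypothetical helper: expresses the gap between two run times as a percentage
// of the faster one. Smaller times are better, as in the charts.
class RelativePerf {

    // How much slower 'slowerTime' is than 'fasterTime', in percent.
    static double percentSlower(double slowerTime, double fasterTime) {
        if (fasterTime <= 0) {
            throw new IllegalArgumentException("run times must be positive");
        }
        return (slowerTime - fasterTime) / fasterTime * 100.0;
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not measured results.
        double c1Seconds = 100.0;
        double c2Seconds = 119.0;
        System.out.printf("c2 is %.1f%% slower than c1%n",
                percentSlower(c2Seconds, c1Seconds));
    }
}
```

Running `main` prints `c2 is 19.0% slower than c1`.  Note that measuring against the slower time instead would give a smaller figure for the same gap, which is why it helps to pin the convention down before comparing numbers across articles.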

Java SE-E 7u40 armhf vs. OpenJDK armhf

For this final section, we'll break out performance on an application-by-application basis for the following JRE/VMs:

  • Java SE Embedded 7u40 armhf Client Compiler (c1)
  • Java SE Embedded 7u40 armhf Server Compiler (c2)
  • OpenJDK 7 IcedTea 2.3.10 7u25-2.3.10-1~deb7u1 OpenJDK Zero VM (build 22.0-b10, mixed mode)
  • OpenJDK 7 IcedTea 2.3.10 7u25-2.3.10-1~deb7u1 JamVM (build 1.6.0-devel, inline-threaded interpreter with stack-caching)
  • OpenJDK 6 IcedTea6 1.12.6 6b27-1.12.6-1~deb7u1 OpenJDK Zero VM (build 20.0-b12, mixed mode)
  • OpenJDK 6 IcedTea6 1.12.6 6b27-1.12.6-1~deb7u1 JamVM (build 1.6.0-devel, inline-threaded interpreter with stack-caching)
  • OpenJDK IcedTea6 1.12.6 6b27-1.12.6-1~deb7u1 CACAO (build 1.6.0+r68fe50ac34ec, compiled mode)

The OpenJDK packages were pulled from the Debian Wheezy distribution.

It appears the bulk of OpenJDK/Arm performance work still revolves around the OpenJDK 6 platform, even though Java 7 was released over two years ago (and Java 8 is coming soon).  Regardless, Java SE-E still outperforms OpenJDK by a wide margin in most tests and, perhaps more importantly, appears to be the much more reliable platform, considering the number of tests that failed with the OpenJDK variants.  As demonstrated in previous benchmark results, the older armel OpenJDK VMs appear to be more stable than the armhf versions tested here.  Considering that the major Linux distributions have stated their intent to migrate toward the armhf binary standard, this is a bit eye-opening.

As always, comments are welcome.



Saturday Mar 16, 2013

Is it armhf or armel?

Arm processors come in all makes and sizes, a certain percentage of which address a market where cost, footprint and power requirements are at a premium.  In this space, the inclusion of even a floating point unit would be considered an unnecessary luxury.  To perform floating point operations with these processors, software emulation is required.

Higher-end Arm processors come bundled with additional capability that enables hardware execution of floating point operations.  The difference between these two architectures gave rise to two separate Embedded Application Binary Interfaces, or EABIs, for ARM: soft float and VFP (Vector Floating Point).  Although there is forward compatibility between soft and hard float, there is no backward compatibility.  And in fact, when it comes to providing binaries for Java SE Embedded for Arm, Oracle provides two separate options: a soft float binary and a VFP binary.  In the Linux community, releases built upon either of these EABIs are referred to as armel based distributions.

Enter armhf.  Although a big step up in performance, the VFP EABI utilizes less-than-optimal argument passing when floating point operations take place: floating point arguments must first be passed through integer registers before they are handed to the floating point unit.  A new EABI, referred to as armhf, optimizes the calling convention for floating point operations by passing arguments directly in floating point registers.  It also includes a more efficient system call convention.  The end result is that applications compiled to the armhf standard should demonstrate modest performance improvement in some cases, and significant improvement for floating point intensive applications.

Alas, armhf represents yet another binary-incompatible standard, but one that has already gained considerable traction in the community.  Although it is still relatively early, the transition from armel to armhf is underway.  In fact, Ubuntu has already announced that future releases will only be built to the armhf standard, effectively obsoleting armel.  As mentioned in Henrik Stahl's blog, an armhf version of Java SE Embedded is in the works, and we have already made available an armhf-based Developer Preview of JDK 8 with JavaFX.

In the interim, we will have to deal with the incompatibilities between armel and armhf.  Most recently we've seen a rash of failed attempts to run the ARMv7 VFP Java SE Embedded binary on top of an armhf-based Linux distro.  During diagnosis, the question becomes: how can I determine whether my Linux distribution is based on armel or armhf?  It turns out this is not as straightforward as one might think.  Aside from experience and anecdotal evidence, one possible way to ascertain whether you're running on armel or armhf is to run the following obscure command:

$ readelf -A /proc/self/exe | grep Tag_ABI_VFP_args

If the Tag_ABI_VFP_args tag is found, then you're running on an armhf system.  If nothing is returned, then it's armel.  To show you an example, here's what happens on a Raspberry Pi running the Raspbian distribution:

pi@raspberrypi:~$ readelf -A /proc/self/exe | grep Tag_ABI_VFP_args
  Tag_ABI_VFP_args: VFP registers

This indicates an armhf distro, which in fact is what Raspbian is.  On the original, soft-float Debian Wheezy distribution, here's what happens:

pi@raspberrypi:~$ readelf -A /proc/self/exe | grep Tag_ABI_VFP_args

Nothing returned indicates that this is indeed armel.
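If you'd rather perform this check from Java (say, in an installer that has to choose between the armel and armhf binaries), a minimal sketch is to shell out to the same command and look for the same tag.  This assumes `readelf` (from binutils) is installed and on the PATH; the class and method names are hypothetical.  Because `readelf` itself opens `/proc/self/exe`, it inspects its own binary, exactly as the shell command above does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Sketch: runs "readelf -A /proc/self/exe" and looks for the Tag_ABI_VFP_args tag.
// Assumes readelf (binutils) is installed and on the PATH.
class AbiCheck {

    // True if the readelf attribute dump contains the hard-float argument tag.
    static boolean indicatesArmhf(String readelfOutput) {
        return readelfOutput.contains("Tag_ABI_VFP_args");
    }

    // Captures readelf's attribute dump (stdout and stderr) for the given file.
    static String readelfAttributes(String path) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("readelf", "-A", path)
                .redirectErrorStream(true)
                .start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String dump = readelfAttributes("/proc/self/exe");
        System.out.println(indicatesArmhf(dump) ? "armhf" : "armel (or not ARM)");
    }
}
```

On a non-ARM host readelf prints no ARM attributes at all, so the sketch reports "armel (or not ARM)"; treat a negative result as meaningful only on an ARM system.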

Many thanks to the folks participating in this Raspberry Pi forum topic for providing this suggestion.



Monday Aug 13, 2012

Java One 2012 Java SE Embedded Hands On Lab Returns!

After successful runs at Java One 2011 San Francisco and Tokyo, the Java SE Embedded Hands On Lab returns for Java One 2012.  If you're attending the Java One event in San Francisco (Sept 30 - Oct 4), please consider signing up for this session.  As an added incentive, we will be raffling off a couple of the Plug Computer devices that you'll gain experience with during this lab.  Seating is limited to 100 students, so register early.

Here's an overview:

This hands-on lab aims to show that developers already familiar with the Java develop/debug/deploy lifecycle can apply those same skills to develop Java applications, using Java SE Embedded, on embedded devices. The participants in the lab will:

    • Have their own individual embedded device so they can gain valuable hands-on experience
    • Turn their embedded device into a web container, using off-the-shelf software
    • Learn how to deploy embedded Java applications, developed with an IDE, onto their device
    • Learn how embedded Java applications can be remotely debugged from their desktop IDE
    • Learn how to remotely monitor and manage embedded Java applications from their desktop

The course description can be found here:
HOL 7889: Java SE Embedded Development Made Easy

In addition, 2012 marks the first year that we will have a venue specifically tailored for the Java embedded community.  Entitled Java Embedded @ Java One, this event takes place during the JavaOne/OpenWorld week in San Francisco on October 3-4.  To quote from the Java Embedded @ Java One URL:

The conference will feature dedicated business-focused content from Oracle discussing how Java Embedded delivers a secure, optimized environment ideal for multiple network-based devices, as well as meaningful industry-focused sessions from peers who are already successfully utilizing Java Embedded.

So if you want to participate in what many consider to be the next big trend in computing -- the internet of things -- come join us 10/3-4 in San Francisco.

Monday Mar 19, 2012

Take Two: Comparing JVMs on ARM/Linux

Although the intent of the previous article, entitled Comparing JVMs on ARM/Linux, was to introduce and highlight the availability of the HotSpot server compiler (referred to as c2) for Java SE-Embedded ARM v7,  it seems, based on feedback, that everyone was more interested in the OpenJDK comparisons to Java SE-E.  But there were two main concerns:

  • The fact that the previous article compared Java SE-E 7 against OpenJDK 6 might be construed as an unlevel playing field because version 7 is newer and therefore potentially more optimized.
  • That the generic compiler settings chosen to build the OpenJDK implementations did not put those versions in a particularly favorable light.

With those considerations in mind, we'll institute the following changes to this version of the benchmarking:

  1. In order to help alleviate an additional concern that there is some sort of benchmark bias, we'll use a different suite, called DaCapo.  Funded and supported by many prestigious organizations, DaCapo's aim is to benchmark real world applications.  Further information about DaCapo can be found at http://dacapobench.org.
  2. At the suggestion of Xerxes Ranby, who has been a great help through this entire exercise, a newer Linux distribution will be used to ensure that the OpenJDK implementations are built with more optimal compiler settings.  The Linux distribution in this instance is Ubuntu 11.10 Oneiric Ocelot.
  3. Because we had difficulties getting Ubuntu 11.10 to run on the original D2Plug ARMv7 platform, we'll switch, for these benchmarks, to an embedded system that has a supported Ubuntu 11.10 release.  That platform is the Freescale i.MX53 Quick Start Board.  It has an ARMv7 Cortex-A8 processor running at 1GHz with 1GB RAM.
  4. We'll limit comparisons to 4 JVM implementations:
    • Java SE-E 7 Update 2 c1 compiler (default)
    • Java SE-E 6 Update 30 (c1 compiler is the only option)
    • OpenJDK 6 IcedTea6 1.11pre 6b23~pre11-0ubuntu1.11.10.2 CACAO build 1.1.0pre2
    • OpenJDK 6 IcedTea6 1.11pre 6b23~pre11-0ubuntu1.11.10.2 JamVM build-1.6.0-devel

Certain OpenJDK implementations were eliminated from this round of testing for the simple reason that their performance was not competitive.  The Java SE 7u2 c2 compiler was also removed because although quite respectable, it did not perform as well as the c1 compilers.  Recall that c2 works optimally in long-lived situations.  Many of these benchmarks completed in a relatively short period of time.  To get a feel for where c2 shines, take a look at the first chart in this blog.

The first chart that follows includes performance of all benchmark runs on all platforms.  Later on we'll look more at individual tests.  In all runs, smaller means faster.  The DaCapo aficionado may notice that only 10 of the 14 DaCapo tests for this version were executed.  The reason is that these 10 tests are the only ones successfully completed by all 4 JVMs.  Only Java SE-E 6u30 could successfully run all of the tests.  Both OpenJDK instances not only failed to complete certain tests, but also experienced VM aborts.

One of the first observations that can be made between Java SE-E 6 and 7 is that, for all intents and purposes, they are on par with regards to performance.  While it is a fact that successive Java SE releases add additional optimizations, it is also true that Java SE 7 introduces additional complexity to the Java platform thus balancing out any potential performance gains at this point.  We are still early into Java SE 7.  We would expect further performance enhancements for Java SE-E 7 in future updates.

In comparing Java SE-E to OpenJDK performance, of the two OpenJDK VMs, Cacao's results are respectable in 4 of the 10 tests.  The charts that follow show the individual results of those four tests.  Both Java SE-E versions win every test, outperforming Cacao by 9% to 55%.

For the remaining 6 tests, Java SE-E significantly outperforms Cacao, by 114% to 311%.

So it looks like OpenJDK results are mixed for this round of benchmarks.  In some cases, performance looks to have improved.  But in a majority of instances, OpenJDK still lags behind Java SE-Embedded considerably.

Time to put on my asbestos suit.  Let the flames begin...

Tuesday Feb 14, 2012

Comparing JVMs on ARM/Linux

For quite some time, Java Standard Edition releases have included both client and server bytecode compilers (referred to as c1 and c2 respectively), whereas Java SE-Embedded binaries only contained the client c1 compiler.  The rationale for excluding c2 stems from the fact that (1) eliminating optional components saves space, where in the embedded world, space is at a premium, and (2) embedded platforms were not given serious consideration for handling server-like workloads.  But all that is about to change.  In anticipation of the ARM processor's legitimate entrance into the server market (see Calxeda), Oracle has, with the latest update of Java SE-Embedded (7u2), made the c2 compiler available for ARMv7/Linux platforms, further enhancing performance for a large class of traditional server applications. 

These two compilers go about their business in different ways.  Of the two, c1 is a lighter optimizing compiler, but has faster start up.  It delivers excellent performance and, as the default bytecode compiler, works extremely well in almost all situations.  Compared to c1, c2 is the more aggressive optimizer and is suited for long-lived Java processes.  Although slower at start up, it can be shown to achieve better performance over time.  As a case in point, take a look at the graph that follows.

One of the most popular Java-based applications, Apache Tomcat, was installed on an ARMv7/Linux device.  The chart shows the relative performance, as defined by mean HTTP request time, of the Tomcat server run with the c1 client compiler (red line) and the c2 server compiler (blue line).  The HTTP request load was generated by an external system on a dedicated network utilizing the ab (Apache Bench) program.  The closer the response time is to zero, the better.  You can see that for the initial run of 25,000 HTTP requests, the c1 compiler produces faster average response times than c2.  It takes time for the c2 compiler to "warm up", but once the threshold of 50,000 or so requests is met, c2 performance is superior to c1.  At 250,000 HTTP requests, mean response time for the c2-based Tomcat server instance is 14% faster than its c1 counterpart.
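The numbers above came from ab, and ab remains the better tool for the job.  Still, for a quick sanity check from a machine that only has a JRE, a rough stand-in is to time a series of GET requests and average them.  The sketch below is an assumption-laden simplification: it is single-threaded (ab was not necessarily run that way), it uses only `java.net.HttpURLConnection`, and the class name and default URL are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: a single-threaded stand-in for an ab-style measurement, reporting the
// mean time per HTTP GET in milliseconds. Not equivalent to ab (no concurrency).
class MeanRequestTime {

    static double meanRequestMillis(URL url, int requests) throws IOException {
        if (requests <= 0) {
            throw new IllegalArgumentException("requests must be positive");
        }
        long totalNanos = 0;
        byte[] buf = new byte[8192];
        for (int i = 0; i < requests; i++) {
            long start = System.nanoTime();
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            try (InputStream in = conn.getInputStream()) {
                while (in.read(buf) != -1) {
                    // drain the body so each request is fully completed
                }
            } finally {
                conn.disconnect();
            }
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1e6 / requests;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default: a Tomcat instance on the device's port 8080.
        URL target = new URL(args.length > 0 ? args[0] : "http://localhost:8080/");
        System.out.printf("mean request time: %.3f ms%n", meanRequestMillis(target, 1000));
    }
}
```

Run it from a separate machine, as was done with ab, so the load generator does not compete with Tomcat for the device's CPU.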

It is important to realize that c2 assumes, and indeed requires, more resources (i.e. memory).  Our sample device, with 1GB RAM, was more than adequate for these rounds of tests.  Of course your mileage may vary, but if you have the right hardware and the right workload, give c2 a further look.
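When experimenting with c1 and c2, it's easy to lose track of which VM a given launch actually picked up.  One lightweight check is the standard `java.vm.name` system property, which HotSpot fills with strings such as "Java HotSpot(TM) Client VM" or "Java HotSpot(TM) Server VM".  The sketch below assumes the embedded builds follow the same naming (for example, an "Embedded Client VM" variant); the class name is hypothetical.

```java
// Sketch: reports which HotSpot VM flavor the current JRE is running, based on
// the standard "java.vm.name" system property. Assumes HotSpot-style naming.
class VmFlavor {

    // Classifies a java.vm.name string such as "Java HotSpot(TM) Client VM".
    static String classify(String vmName) {
        if (vmName == null) return "unknown";
        if (vmName.contains("Server VM")) return "server (c2)";
        if (vmName.contains("Client VM")) return "client (c1)";
        return "other: " + vmName;
    }

    public static void main(String[] args) {
        String name = System.getProperty("java.vm.name");
        System.out.println(name + " -> " + classify(name));
    }
}
```

Launching it with `java -server VmFlavor` and `java -client VmFlavor` on a JRE that ships both VMs is a quick way to confirm the switch took effect; VMs outside the HotSpot family, such as Zero or JamVM, fall through to the "other" branch.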

While discussing these results with a few of my compadres, it was suggested that OpenJDK and some of its variants be included in this comparison.  The following chart shows mean HTTP request times for 6 different configurations:

  1. Java SE Embedded 7u2 c1 Client Compiler
  2. Java SE Embedded 7u2 c2 Server Compiler
  3. OpenJDK Zero VM (build 20.0-b12, mixed mode) OpenJDK 1.6.0_24-b24 (IcedTea6 1.12pre)
  4. JamVM (build 1.6.0-devel, inline-threaded interpreter with stack-caching) OpenJDK 1.6.0_24-b24 (IcedTea6 1.12pre)
  5. CACAO (build 1.1.0pre2, compiled mode) OpenJDK 1.6.0_24-b24 (IcedTea6 1.12pre)
  6. Interpreter only: OpenJDK Zero VM (build 20.0-b12, interpreted mode) OpenJDK 1.6.0_24-b24 (IcedTea6 1.12pre)

Results remain pretty much unchanged beyond that point, so only the first 4 runs (25K-100K requests) are shown.  As can be seen, the Java SE-E VMs are on the order of 3-5x faster than their OpenJDK counterparts irrespective of the bytecode compiler chosen.  One additional promising VM called shark was not included in these tests because, although it built from source successfully, it failed to run Apache Tomcat.  In defense of shark, the ARM version may still be in development (i.e. non-stable) mode.

Creating a really fast virtual machine is hard work and takes a lot of time to perfect.  Considering the resources expended by Oracle (and formerly Sun), it is no surprise that the commercial Java SE VMs are excellent performers.  But the extent to which they outperform their OpenJDK counterparts is surprising.  It would be no shock if someone in the know could demonstrate better OpenJDK results.  But herein lies one considerable problem: it is an exercise in patience and perseverance just to locate and build a proper OpenJDK platform suitable for a particular CPU/Linux configuration.  No offense would be taken if corrections were presented, and if a straightforward mechanism to support these OpenJDK builds were provided.

Friday Aug 19, 2011

Serial Port Communication for Java SE Embedded

The need to communicate with devices connected to serial ports is a common application requirement.  Falling outside the purview of the Java SE platform, serial and parallel port communication has been addressed with a project called RXTX.  (In the past, you may have known this as javacomm).  With RXTX,  Java developers access serial ports through the RXTXcomm.jar file.  Alongside this jar file, an underlying native layer must be provided to interface with the operating system's UART ports.  For the usual suspects (Windows, Linux/x86, MacOS, Solaris/Sparc), pre-compiled binaries are readily available.  To host this on an alternative platform, some (hopefully minimal) amount of work is required.

Here's hoping the following notes/observations might aid in helping you to build RXTX for an embedded device utilizing one of our Java SE Embedded binaries.  The device used for this particular implementation is my current favorite: the Plug Computer.

Notes on Getting RXTX 2.1-7-r2 Working on a Plug Computer

1. At this early juncture with Java 7, be wary of mixing Java 7 with code from older versions of Java.  Class files generated by the JDK 7 javac compiler carry an updated major version number (51), which causes older (Java 6 and earlier) JVMs to refuse to load these classes.

2. The RXTX download location http://rxtx.qbang.org/wiki/index.php/Download has binaries for many platforms including Arm variants, but none that worked for the Plug Computer, so one had to be built from source.

3. Using the native GCC for the Plug Computer and the RXTX source, binaries (native shared objects) were compiled for the armv5tel-unknown-linux-gnu platform.

4. The RXTX "stable" source code found at the aforementioned site is based on version rxtx 2.1-7r2.  This code appears to be pretty long in the tooth, in that it has no knowledge of Java 6.  Some changes need to be made to accommodate a JDK 6 environment; without these modifications, RXTX will not build with JDK 6.

SUGGESTED FIX, most elegant, not recommended:
Edit the configure.in file in the source directory and look for the following:

    case $JAVA_VERSION in
    1.2*|1.3*|1.4*|1.5*)

and change the second line to:

    1.2*|1.3*|1.4*|1.5*|1.6*)

Upon modification, the autogen.sh script found in the rxtx source directory must be re-run to recreate the ./configure script.  Unfortunately, this requires installing the autoconf, automake and libtool packages (plus dependencies), and in our case it resulted in libtool incompatibilities when running the resulting ./configure script.

RECOMMENDED FIX:
Instead, edit ./configure and search for the occurrences (there is more than one) of

    case $JAVA_VERSION in
    1.2*|1.3*|1.4*|1.5*)

and change the second line to:

    1.2*|1.3*|1.4*|1.5*|1.6*)

Run './configure', then 'make' to generate the RXTXcomm.jar and platform specific .so shared object libraries.

5. You may also notice in the output of make that there are compilation errors for source files that fail to find the definition of "UTS_RELEASE".  As a result, some of the shared object files are not created.  These pertain to the non-serial aspects of RXTX.  As we were only interested in librxtxSerial.so, this was no problem for us.

6. Once built, copy the following files into the following directories:

    # cd rxtx-2.1-7-r2/
    # cp RXTXcomm.jar $JAVA_HOME/lib/ext
    # cd armv5tel-unknown-linux-gnu/.libs/
    # cp librxtxSerial-2.1-7.so $JAVA_HOME/lib/arm
    # cd $JAVA_HOME/lib/arm
    # ln -s librxtxSerial-2.1-7.so librxtxSerial.so

Now Java applications which utilize RXTX should run without any java command-line additions.
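Returning to the warning in note 1: if RXTX (or your own code) fails to load on a Java 6 JVM, a quick diagnostic is to look at the class file header itself.  Every class file starts with the magic number 0xCAFEBABE, followed by a two-byte minor version and a two-byte major version, where 50 corresponds to Java 6 and 51 to Java 7.  The sketch below reads just those fields; `ClassVersion` is a hypothetical name, and the "too new" message assumes you are targeting a Java 6 runtime.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: reads the class-file header (magic 0xCAFEBABE, minor_version,
// major_version) to see which Java release a .class file targets.
class ClassVersion {

    static int majorVersion(InputStream classBytes) throws IOException {
        DataInputStream in = new DataInputStream(classBytes);
        int magic = in.readInt();
        if (magic != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        in.readUnsignedShort();        // minor_version (ignored here)
        return in.readUnsignedShort(); // major_version: 50 = Java 6, 51 = Java 7
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            try (InputStream in = new FileInputStream(path)) {
                int major = majorVersion(in);
                System.out.println(path + ": major " + major
                        + (major > 50 ? " (too new for a Java 6 JVM)" : ""));
            }
        }
    }
}
```

If a class reports 51 and must run on Java 6, recompiling with javac's `-source 1.6 -target 1.6` options is the usual remedy.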

The RXTXcomm.jar file can be downloaded here.  To spare you the effort, a few pre-built versions of  librxtxSerial-2.1-7.so are provided at this location:

If you've gone through this exercise on any additional architectures, send them my way and I'll post them here.

Tuesday Mar 15, 2011

The Unofficial Java SE Embedded SDK

Developing applications for embedded platforms gets simpler all the time, thanks in part to the tremendous advances in microprocessor design and software tools.  And in particular, with the availability of Java SE compatible Virtual Machines for the popular embedded platforms, development has never been more straightforward.

The real beauty behind Java SE Embedded development lies in the fact that you can use your favorite IDE (Eclipse, NetBeans, JDeveloper ...) to create, test and debug code in the identical fashion in which you'd develop a standard desktop or server application.  When the time comes to try it out on a Java SE Embedded capable device, it's just a matter of shipping the bytecodes over to the device and letting it run.  There is no need for complicated emulators, toolchains and cross-compilers.  The exact same bytecodes that ran on your PC, run unmodified on the embedded device.

In fact, because all versions of Java SE (embedded or not) share a considerable amount of common code, we have plenty of anecdotal evidence which supports the notion that behavior -- correct or incorrect -- manifests itself identically across platforms.  We refer specifically here to bugs.  Now no one wants bugs, but believe it or not, our customers like the fact that behavior is consistent across platforms whether it's right or not. "Bug for bug compatibility" has actually become a strong selling point!

Having espoused the virtues of transparently developing off-device, many still wish to test and debug on-device regularly as part of their development cycle.  If you're the touchy-feely type, there are ample examples of affordable and supported off-the-shelf devices that could fit the bill for an Unofficial Java SE Embedded SDK.  One such platform is the Plug Computer.

The reference platform for the Plug Computer is supplied by Marvell Technology Group. Manufacturers then license the technology from Marvell to create their own specific implementations.  Two such vendors are GlobalScale and Ionics.  These are incredibly capable devices that include Arm processors in the 1.2GHz to 2.0GHz range, and sport 512MB of RAM and flash.  There are a host of external port and interface options including USB, µUSB, SATA, GBE, SD, WiFi, ZigBee, Z-Wave and soon HDMI.  Additionally, several Linux distros are available for these systems too.  The typical cost for a base model is $99, and perhaps the most disruptive aspect of these systems, they consume on average about 5 watts of power.

Alongside developing in the traditional manner, the ability to step through and examine state on these devices via remote debugging comes as a standard feature with the Java SE-E VM.  Furthermore, you can use the JConsole application from your desktop to remotely monitor performance and resource consumption on the device.
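Remote debugging and JConsole monitoring are both switched on through JVM launch options on the device.  The standard HotSpot options are, for example, `-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000` for the debugger and `-Dcom.sun.management.jmxremote.port=9999 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false` for JMX; the port numbers here are arbitrary, and disabling authentication and SSL is only sensible on a trusted development network.  The sketch below, with a hypothetical class name, lets an application confirm at run time which of these it was started with.  It assumes the `java.lang.management` API is present in your Java SE-E runtime.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Sketch: inspects the JVM's own launch arguments to confirm that remote debugging
// (JDWP) and remote JMX monitoring for JConsole were enabled on the device.
class RemoteToolsCheck {

    // True if a JDWP debug agent was requested (modern or legacy syntax).
    static boolean debugAgentEnabled(List<String> jvmArgs) {
        for (String a : jvmArgs) {
            if (a.startsWith("-agentlib:jdwp") || a.startsWith("-Xrunjdwp")) return true;
        }
        return false;
    }

    // True if a remote JMX port was configured.
    static boolean remoteJmxEnabled(List<String> jvmArgs) {
        for (String a : jvmArgs) {
            if (a.startsWith("-Dcom.sun.management.jmxremote.port=")) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JDWP debug agent: " + debugAgentEnabled(jvmArgs));
        System.out.println("Remote JMX (JConsole): " + remoteJmxEnabled(jvmArgs));
    }
}
```

With the options above in place, your desktop IDE attaches to port 8000 on the device, and JConsole connects to the device's address on port 9999.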

So what would a bill of materials look like for The Unofficial Java SE Embedded SDK?  Pretty simple actually:

That's about it.  Of course, for higher level functionality, you can add additional packages.  For example, Apache runs beautifully here.  Could anyone imagine a large number of these devices acting as a parallel web server?

About

Jim Connors
