Monday Mar 17, 2014

An Embedded Java 8 Lambda Expression Microbenchmark

It's been a long road, but Java 8 has finally arrived.  Much has been written and said about the new features contained in this release; perhaps the most important of these is the introduction of Lambda Expressions.  Lambdas are now intimately integrated into the Java platform and they have the potential to aid developers in the traditionally tricky realm of parallel programming.

Following closely behind, Compact Profiles promise to open up the tremendous benefits of Java Standard Edition compatibility to embedded platforms previously thought to be too small.  Can you see where this is heading?  It might be interesting to use these two technologies simultaneously and see how well they work together.  What follows is the description of a small program and its performance measurements -- a microbenchmark if you will -- that aims to highlight how programming with the new Lambda Expression paradigm can be beneficial not only for typical desktops and servers, but also for a growing number of embedded platforms.

The Hardware/OS Platform(s)

Of primary interest for this article is the Boundary Devices BD-SL-i.MX6 single board computer.  It is a quad-core ARM® Cortex™-A9 based system with 1GB RAM running an armhf  Debian Linux distribution.  At the time of this article's publication, its list price is US $199.

[Image: i.MX6Q SABRE Lite board, top view]

What makes it more interesting is that we'll not only run Java 8 Lambda Expressions on device, we'll do it within the confines of the new Java 8 Compact1 profile.  The static footprint of this Java runtime environment is 10½ MB.

A second system, altogether different in capability and capacity from our embedded device, will be used to compare and contrast execution behavior across disparate hardware and OS environments.  The system in question is a Toshiba Tecra R840 laptop running Windows 7/64-bit.  It has a dual-core Intel® Core™ i5-2520M processor with 8GB RAM and will use the standard Java 8 Runtime Environment (JRE) for Windows 64-bit.

The Application

Our rudimentary application needs a sample dataset, and this link provides an ideal (and fictional) database of employee records.  Among the available formats, a comma-delimited CSV file is supplied with approximately 300,000 entries.  Our sample application will read this file and store the employee records into a LinkedList<EmployeeRec>.  The EmployeeRec class has the following fields:

public class EmployeeRec {
    private String id;
    private String birthDate;
    private String lastName;
    private String firstName;
    private String gender;
    private String hireDate;
    ...
}

With this data structure initialized, our application is asked to perform one simple task:  calculate the average age of all male employees.

Old School

First off let's perform this calculation in a way that predates the availability of Lambda Expressions.  We'll call this version OldSchool.  The code performing the "average age of all male employees" calculation looks like this:

double sumAge = 0;
long numMales = 0;
for (EmployeeRec emp : employeeList) {
    if (emp.getGender().equals("M")) {
        sumAge += emp.getAge();
        numMales += 1;
    }
}
double avgAge = sumAge / numMales;

Lambda Expression Version 1

Our second variation will use a Lambda expression to perform the identical calculation.  We'll call this version Lambda stream().  The key statement in Java 8 looks like this:

double avgAge = employeeList.stream()
                .filter(s -> s.getGender().equals("M"))
                .mapToDouble(s -> s.getAge())
                .average()
                .getAsDouble();

Lambda Expression Version 2

Our final variation uses the preceding Lambda Expression with one slight modification: it replaces the stream() method call with the parallelStream() method, offering the potential to split the task into smaller units running on separate threads.  We'll call this version Lambda parallelStream(). The Java 8 statement looks as follows:

double avgAge = employeeList.parallelStream()
                .filter(s -> s.getGender().equals("M"))
                .mapToDouble(s -> s.getAge())
                .average()
                .getAsDouble();
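
To try the three variations side by side without the CSV file, a minimal self-contained sketch might look like the following.  The Emp class, the sampleData() generator and the AverageAgeSketch name are assumptions made purely for illustration; they stand in for the article's EmployeeRec and its 300,000-record dataset.  The single-shot System.nanoTime() timing is only a rough indication and not the measurement method used for the charts.

```java
import java.util.LinkedList;
import java.util.List;

public class AverageAgeSketch {

    // Hypothetical, pared-down stand-in for the article's EmployeeRec.
    static final class Emp {
        private final String gender;
        private final int age;

        Emp(String gender, int age) {
            this.gender = gender;
            this.age = age;
        }

        String getGender() { return gender; }
        int getAge() { return age; }
    }

    // Pre-Lambda loop, as in the Old School version.
    static double oldSchool(List<Emp> emps) {
        double sumAge = 0;
        long numMales = 0;
        for (Emp emp : emps) {
            if (emp.getGender().equals("M")) {
                sumAge += emp.getAge();
                numMales += 1;
            }
        }
        return sumAge / numMales;
    }

    // Serial stream, as in Lambda Expression Version 1.
    static double serialStream(List<Emp> emps) {
        return emps.stream()
                .filter(s -> s.getGender().equals("M"))
                .mapToDouble(s -> s.getAge())
                .average()
                .getAsDouble();
    }

    // Parallel stream, as in Lambda Expression Version 2.
    static double parallelStream(List<Emp> emps) {
        return emps.parallelStream()
                .filter(s -> s.getGender().equals("M"))
                .mapToDouble(s -> s.getAge())
                .average()
                .getAsDouble();
    }

    // Synthetic data (an assumption): alternating genders, ages 20 to 59.
    static List<Emp> sampleData(int n) {
        List<Emp> list = new LinkedList<>();
        for (int i = 0; i < n; i++) {
            list.add(new Emp(i % 2 == 0 ? "M" : "F", 20 + (i % 40)));
        }
        return list;
    }

    public static void main(String[] args) {
        List<Emp> emps = sampleData(300_000);

        long t0 = System.nanoTime();
        double a = oldSchool(emps);
        long t1 = System.nanoTime();
        double b = serialStream(emps);
        long t2 = System.nanoTime();
        double c = parallelStream(emps);
        long t3 = System.nanoTime();

        System.out.printf("OldSchool:               %.2f (%d ms)%n", a, (t1 - t0) / 1_000_000);
        System.out.printf("Lambda stream():         %.2f (%d ms)%n", b, (t2 - t1) / 1_000_000);
        System.out.printf("Lambda parallelStream(): %.2f (%d ms)%n", c, (t3 - t2) / 1_000_000);
    }
}
```

All three methods should report the same average; only the elapsed times are expected to differ from platform to platform.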

Initial Test Results

The charts that follow display execution times of the sample problem solved via our three aforementioned variations.  The left chart represents times recorded on the ARM Cortex-A9 processor while the right chart shows recorded times for the Intel Core-i5.  The smaller the result, the faster the run.  Both examples indicate that there is some overhead to utilizing a serial Lambda stream() over and above the old school pre-Lambda solution.  As far as parallelStream() goes, it's a mixed bag.  For the Cortex-A9, the parallelStream() operation is negligibly faster than the old school solution, whereas for the Core-i5, the overhead incurred by parallelStream() actually makes the solution slower.

Without any further investigation, one might conclude that parallel streams may not be worth the effort.  But what if performing a trivial calculation on a list of 300,000 employees simply isn't enough work to show the benefits of parallelization?  For this next series of tests, we'll increase the computational load to see how performance might be affected.

Adding More Work to the Test

For this version of the test, we'll solve the same problem, that is to say, calculate the average age of all males, but add a varying amount of intermediate computation.  We can variably increase the number of required compute cycles by introducing the following identity method to our programs:

/*
 * Rube Goldberg way of calculating identity of 'val',
 * assuming number is positive
 */
private static double identity(double val) {
    double result = 0;
    for (int i = 0; i < loopCount; i++) {
        result += Math.sqrt(Math.abs(Math.pow(val, 2)));
    }
    return result / loopCount;
}

As this method takes the square root of the square of a number, it is in essence an expensive identity function.  By changing the value of loopCount (this is done via a command-line option), we can change the number of times this loop executes per identity() invocation.  This method is inserted into our code, for example with the Lambda parallelStream() version, as follows:

double avgAge = employeeList.parallelStream()
                .filter(s -> s.getGender().equals("M"))
                .mapToDouble(s -> identity(s.getAge()))
                .average()
                .getAsDouble();

A modification identical to the identity() call shown above is also applied to both the Old School and Lambda stream() variations.  The charts that follow display execution times for three separate runs of our microbenchmark, each with a different value assigned to the internal loopCount variable in our Rube Goldberg identity() function.

For the Cortex-A9, you can clearly see the performance advantage of parallelStream() when the loop count is set to 100, and it becomes even more striking when the loop count is increased to 500.  For the Core-i5, it takes a lot more work to realize the benefits of parallelStream().  Not until the loop count is set to 50,000 do the performance advantages become apparent.  The Core-i5 is so much faster and only has two cores; consequently the amount of effort needed to overcome the initial overhead of parallelStream() is much more significant.
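
To explore this crossover point on your own hardware, a sketch along the following lines could be used.  It is not the article's benchmark: the LoopCountSketch class, the sampleAges() generator and the chosen loop counts are assumptions, and loopCount is passed as a parameter here rather than read from the command line.  It also prints the core count and the parallelism of the common fork/join pool, which parallel streams use by default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;

public class LoopCountSketch {

    // Same "expensive identity" idea as above, with loopCount as a parameter.
    static double identity(double val, int loopCount) {
        double result = 0;
        for (int i = 0; i < loopCount; i++) {
            result += Math.sqrt(Math.abs(Math.pow(val, 2)));
        }
        return result / loopCount;
    }

    static double serial(List<Integer> ages, int loopCount) {
        return ages.stream()
                .mapToDouble(a -> identity(a, loopCount))
                .average()
                .getAsDouble();
    }

    static double parallel(List<Integer> ages, int loopCount) {
        return ages.parallelStream()
                .mapToDouble(a -> identity(a, loopCount))
                .average()
                .getAsDouble();
    }

    // Synthetic ages between 20 and 59 (an assumption, not the CSV data).
    static List<Integer> sampleAges(int n) {
        List<Integer> ages = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            ages.add(20 + (i % 40));
        }
        return ages;
    }

    public static void main(String[] args) {
        System.out.println("Available processors:    "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("Common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());

        List<Integer> ages = sampleAges(100_000);
        for (int loopCount : new int[] {1, 100, 500}) {
            long t0 = System.nanoTime();
            double s = serial(ages, loopCount);
            long t1 = System.nanoTime();
            double p = parallel(ages, loopCount);
            long t2 = System.nanoTime();
            System.out.printf("loopCount=%d  stream()=%d ms  parallelStream()=%d ms  (avg %.2f / %.2f)%n",
                    loopCount, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, s, p);
        }
    }
}
```

Single-shot timings like these are noisy, so treat the output as a rough guide to where parallelStream() starts to pay off rather than as a precise measurement.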

Downloads

The sample code used in this article is available as a NetBeans project.  As the project includes a CSV file with over 300,000 entries, it is larger than one might expect.  The blogs.oracle.com  site prohibits storing files larger than 2MB in size so this project source has been compressed and split into three parts.  Here are the links:

Just concatenate the three downloaded files together to recreate the original LambdaMicrobench.zip file.  In Linux, the command would look something like this:

$ cat LambdaMicrobench.zip.part? > LambdaMicrobench.zip

Conclusion

A great deal of effort has been put into making Java 8 a much more universal platform.  Our simple example here demonstrates that even an embedded Java runtime environment as small as 10½ MB can take advantage of the latest advances to the platform.  This is just the beginning.  There is much more work to be done to further enhance the performance characteristics of parallel stream Lambda Expressions.  We look forward to future enhancements.

Monday Mar 10, 2014

Introducing the EJDK

In lock step with the introduction of Compact Profiles, Java 8 includes a new distribution mechanism for Java SE Embedded called the EJDK.  As the potential exists to confuse the EJDK with the standard JDK (Java Development Kit), it makes sense to dedicate a few words towards highlighting how these two packages differ in form and function.

The JDK

The venerable Java Development Kit is the mainstay of Java developers.  It incorporates not only a standard Java Runtime Environment (JRE), but also includes critical tools required by those same developers.  For example, among many others, the JDK comes with a Java compiler (javac), a Java console application (jconsole), the Java debugger (jdb) and the Java archive utility (jar).  It also serves as the underpinnings for very popular Java Integrated Development Environments (IDEs) such as NetBeans, Eclipse, JDeveloper and IntelliJ to name a few.

Like Java, the Java Development Kit is constantly evolving, and Java 8 brings about its fair share of enhancements to the JDK.  For Java 8, javac can now be instructed (via the -profile command-line option) to ensure that your source code is compatible with a specific compact profile.  Furthermore, the Java 8 JDK comes with a useful new tool called jdeps, providing a means to analyze your compiled class and jar files for dependencies.

The EJDK

The EJDK is new to Java 8, and although similar in name to the JDK, it serves quite a different purpose.  Prior to Java 8, supported Java SE-Embedded runtime platforms were provided as binaries by Oracle.  With the advent of Compact Profiles, the number of possible binary options per supported platform would simply be too unwieldy.  Rather than furnishing binaries for each of the possible combinations, an EJDK will be supplied for each supported Java SE-Embedded platform.  It contains the tools needed to create the profile you wish to use.

The EJDK is designed to be run with either Windows or Linux/Unix platforms alongside a Java runtime environment.  It contains a wrapper called jrecreate (jrecreate.sh for Unix/Linux and jrecreate.bat for Windows) whose function it is to create deployable compact profile instances. In the examples that follow, we'll show two sample invocations.

First off, let's briefly take a look at the contents of a typical EJDK.  For our first example, we've installed the EJDK on a Linux/x86 system.  Listing the contents of the ejdk1.8.0/ directory, we see a subdirectory named linux_arm_vfp_hflt/.  This tells us what platform this instance of the EJDK supports.  For all our examples we'll use an EJDK that creates compact profiles suitable for the Linux/Arm Hard Float platform, often referred to as armhf.

$ ls ejdk1.8.0
bin  doc  lib  linux_arm_vfp_hflt

Looking one level deeper into the bin/ directory, we see the jrecreate.bat and jrecreate.sh files:

$ ls ejdk1.8.0/bin
jrecreate.bat  jrecreate.config.properties  jrecreate.sh

As we're on a Linux system, let's use the jrecreate.sh script to create a compact profile:

$ ./ejdk1.8.0/bin/jrecreate.sh --profile compact1 --dest compact1-minimal --vm minimal

Briefly reviewing this invocation, the --profile compact1 option instructs jrecreate to use the Compact1 profile.  The --profile option accepts [compact1 | compact2 | compact3]  as an argument. The --dest compact1-minimal option specifies the name of the destination directory containing the newly generated profile.  Note that the directory argument to --dest must not exist prior to invocation.  Finally, the --vm minimal option tells jrecreate to use the minimal (i.e. the smallest) virtual machine for this instance.  The --vm option accepts  [minimal | client | server | all] as an argument.  Running the complete jrecreate.sh command, we get the following output:

$ ./ejdk1.8.0/bin/jrecreate.sh --profile compact1 --dest compact1-minimal --vm minimal
Building JRE using Options {
   ejdk-home: /home/java8/ejdk1.8.0
    dest: /home/java8/compact1-minimal
    target: linux_arm_vfp_hflt
    vm: minimal
    runtime: compact1 profile
    debug: false
    keep-debug-info: false
    no-compression: false
    dry-run: false
    verbose: false
    extension: []
}

Target JRE Size is 10,595 KB (on disk usage may be greater).
Embedded JRE created successfully

This creates a Compact1 profile distribution of about 10 ½ MB in the compact1-minimal/ directory.  For our second example, we'll create a profile based on Compact2 and the client VM, this time from a Windows 7/64-bit system:

c:\demo>ejdk1.8.0\bin\jrecreate.bat --profile compact2 --dest compact2-client --vm client
Building JRE using Options {
    ejdk-home: c:\demo\ejdk1.8.0\bin\..
    dest: c:\demo\compact2-client
    target: linux_arm_vfp_hflt
    vm: client
    runtime: compact2 profile
    debug: false
    keep-debug-info: false
    no-compression: false
    dry-run: false
    verbose: false
    extension: []
}

Target JRE Size is 17,552 KB (on disk usage may be greater).
Embedded JRE created successfully

This Compact2 instance is created in the compact2-client/ directory and has an approximate footprint of 17 ½ MB.  Additional options to jrecreate are available for further customization.

Finally, let's migrate the generated profiles over to a real device.  As a host platform we'll use none other than the ubiquitous Raspberry Pi.  Here's a listing of the two profiles and their size (in 1K blocks) on the filesystem:

pi@pi0 ~/java8 $ ls
compact1-minimal  compact2-client

pi@pi0 ~/java8 $ du -sk compact*
10616   compact1-minimal
17660   compact2-client

And here's what each version outputs when java -version is run:

pi@pi0 ~/java8 $ ./compact1-minimal/bin/java -version
java version "1.8.0"
Java(TM) SE Embedded Runtime Environment (build 1.8.0-b127, profile compact1, headless)
Java HotSpot(TM) Embedded Minimal VM (build 25.0-b69, mixed mode)

pi@pi0 ~/java8 $ ./compact2-client/bin/java -version
java version "1.8.0"
Java(TM) SE Embedded Runtime Environment (build 1.8.0-b127, profile compact2, headless)
Java HotSpot(TM) Embedded Client VM (build 25.0-b69, mixed mode)

In conclusion, you are encouraged to experiment with the EJDK.  It will very quickly give you a feel for the compact profile configuration options available for your device.

Friday Dec 06, 2013

Java SE Embedded Pricing Explained

You're probably asking yourself, "Pricing?  Really?  In a techie blog?", and I would normally agree wholeheartedly with your assessment.  But in this one instance the topic might be worthy of a few words.  There is, as the expression goes, no such thing as a free lunch.  Whether you pay for software outright, or roll your own with open source projects, a cost must be paid.

Like clockwork, we regularly receive inquiries for Java embedded information that go something like this:

Dear Oracle,  We've downloaded and evaluated Java SE-Embedded and have found it to be a very appealing platform to run our embedded application.  We understand this is commercial software; before we decide to deploy our solution with your runtime, can you give us a feel for the royalties associated with shipping x number of units?

Seems pretty straightforward, right?  Well, yes, except that in the past Oracle required the potential customer to sign a non-disclosure agreement prior to receiving any embedded pricing information.  It didn't matter if the customer was interested in deploying ten units or ten thousand, they all had to go through this process.  Now certain aspects of pricing may still require confidential agreements, but why not make quantity 1 list prices available?   With the release of this document, that pricing information is now public.

The evidence is out there, both anecdotal and real, demonstrating that Oracle's Java SE-Embedded platform is unquestionably superior in quality and performance to the OpenJDK variants.  For the latest example, take a look at this blog entry.  So the question becomes: is it actually more affordable to pay for a commercial platform that is fully supported, faster and more reliable, or to opt for a "free" platform and support it yourself?

So What Does Java SE-Embedded Cost?

The universal answer to such a question is: it depends.  That is to say it depends upon the capability of the embedded processor.  Before we lose you, let's show the list price for Java embedded licensing associated with three platforms and then explain how we arrived at the numbers.  As of the posting of this entry, 06 December, 2013, here they are:

  1. Per-unit cost for a Raspberry Pi: US $0.71
  2. Per-unit cost for system based on Intel Atom Z510P: US $2.68
  3. Per-unit cost for a Compulab Trim-Slice: US $5.36

How Does It Work?

These bullet points describe the process.  After that, we'll show how we arrived at our three sample platform prices.

  • Pricing is done on a per-core basis.
  • Processors are classified based on their capability and assigned a core factor.  The more capable the processor, the higher the core factor.
  • Per-core pricing is determined by multiplying the standard per-core Java embedded price by the core factor.
  • A 19% Software Update License & Support Fee is automatically added onto each system.

The core factor table that follows, found in the Oracle Java Embedded Global Price List, dated September 20, 2013, groups processors of similar capabilities into buckets called chip classes.  Each chip class is assigned a core factor.


Example 1

To compute the per-unit cost, use this formula:

Oracle Java Embedded per-core license fee  *  core factor  *  number of cores  *  support uplift

The standard per-core license fee is always $300.  The Raspberry Pi is a Class I device and therefore has a core factor of .002.  There is only one core in the Raspberry Pi, and the Software Update License & Support fee is always 19%.  So plugging in the numbers, we get:

$300  *  .002  *  1  *  1.19  =  $0.714

Example 2

The processor in this example, the Intel Atom Z510P, is a Class II device and has a core factor of .0075.  Using the same formula from Example 1, here's what we get:

$300  *  .0075  *  1  *  1.19  =  $2.6775

Example 3

The processor for the Trim-Slice is based on the ARM Cortex-A9, a Class II device.  Furthermore it is a dual-core system.  Using the same formula as the previous examples, we arrive at the following per-unit pricing:

$300  *  .0075  *  2  *  1.19  = $5.355
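
The arithmetic in the three examples can be reproduced with a short sketch such as the one below.  The EmbeddedPricingSketch class and its perUnitCost() helper are hypothetical names introduced for illustration; the $300 per-core fee, the 19% uplift and the core factors are the figures quoted above.  BigDecimal is used so the results are not blurred by binary floating-point rounding.

```java
import java.math.BigDecimal;

public class EmbeddedPricingSketch {

    // Figures quoted in the article (list prices as of December 2013).
    static final BigDecimal PER_CORE_FEE = new BigDecimal("300");
    static final BigDecimal SUPPORT_UPLIFT = new BigDecimal("1.19");

    // per-core license fee * core factor * number of cores * support uplift
    static BigDecimal perUnitCost(String coreFactor, int cores) {
        return PER_CORE_FEE
                .multiply(new BigDecimal(coreFactor))
                .multiply(BigDecimal.valueOf(cores))
                .multiply(SUPPORT_UPLIFT);
    }

    public static void main(String[] args) {
        // Class I device, one core
        System.out.println("Raspberry Pi:       $"
                + perUnitCost("0.002", 1).stripTrailingZeros().toPlainString());
        // Class II device, one core
        System.out.println("Intel Atom Z510P:   $"
                + perUnitCost("0.0075", 1).stripTrailingZeros().toPlainString());
        // Class II device, two cores
        System.out.println("Compulab Trim-Slice: $"
                + perUnitCost("0.0075", 2).stripTrailingZeros().toPlainString());
    }
}
```

To estimate another device, substitute the core factor for its chip class from the price list and its core count.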

Conclusion

With your hardware specs handy, you should now have enough information to make a reasonable estimate of Oracle Java embedded licensing costs.  At minimum, it could be a help in your "buy vs. roll your own" decision making process.  And of course, if you have any questions, don't be afraid to ask.


Tuesday Sep 17, 2013

Comparing Linux/Arm JVMs Revisited

It's been about 18 months since we last compared Linux/Arm JVMs, and with the formal release of the much anticipated Java SE Embedded for Arm hard float binary, it marks a good time to revisit JVM performance.  The information and results that follow will highlight the following comparisons:

  1. Java SE-E Arm VFP (armel) vs. Arm Hard Float (armhf)
  2. Java SE-E armhf Client Compiler (c1) vs. armhf Server Compiler (c2)
  3. And last but certainly not least ... Java SE-E 7u40 armhf vs. Open JDK armhf

The Benchmark

For the sake of simplicity and consistency, we'll use a subset of the DaCapo benchmark suite.  It's an open source group of real world applications that put a good strain on a system both from a processor and memory workload perspective.  We are aware of customers who use DaCapo to gauge performance, and its availability and ease of use enable anyone interested to run their own set of tests in fairly short order.

The Hardware

It would have been grand to run all these benchmarks on one platform, most notably the beloved Raspberry Pi, but unfortunately it has its limitations:

  • There is no Java SE-E server compiler (c2) for the Raspberry Pi.  Why?  Because the Pi is based on an ARMv6 instruction set whereas the Java SE-E c2 compiler requires a minimum ARMv7 instruction set.
  • Demonstrating how rapidly advances are being made in the hardware arena, the Raspberry Pi, within the context of these tests, is quite a humble platform.  With 512MB RAM, it runs out of memory when running some of the large DaCapo component applications.

For these tests we'll primarily use a quad-core Cortex-A9 based system, and for one test we'll utilize a single core Marvell Armada system just to compare what effect the number of cores has on server compiler performance.  The devices in question are:

  1. Boundary Devices BD-SL-i.MX6, quad core 1GHz Cortex-A9 (Freescale i.MX6), 1GB RAM, Debian Wheezy distribution, 3.0.35 kernel (for both armel and armhf configurations)
  2. GlobalScale D2Plug, single core 800MHz ARMv6/v7 processor (Marvell PXA510), 1GB RAM, Debian Wheezy distribution, 3.5.3-cubox-di+ kernel for armhf

Java SE-E armel vs. armhf

The chart that follows compares the relative performance of the armel JavaSE-E 7u40 JRE with the armhf JavaSE-E 7u40 JRE for 8 of the DaCapo component applications.  These tests were conducted on the Boundary Devices BD-SL-i.MX6.  Both armel and armhf environments were based on the Debian Wheezy distribution running a 3.0.35 kernel.  For all charts, the smaller the result, the faster the run.

In all 8 tests, the armhf binary is faster, some only slightly, and in one case (eclipse) a few percentage points faster.  The big performance gain associated with the armhf standard deals with floating point operations, and in particular, the passing of arguments directly into floating point registers.  The performance gains realized by the newer armhf standard will be seen more in the native application realm than for Java SE-Embedded primarily because  the Java SE-E armel VM already uses FP registers for Java floating point methods.  There are still however certain floating point workloads that may show a modest performance increase (in the single digit percent range) with JavaSE-E armhf over Java SE-E armel.

Java SE-E Client Compiler (c1) vs. Server Compiler (c2)

In this section, we'll show test results for two different platforms, the first a single core system, followed by the same tests on a quad-core system.  To further demonstrate how workload changes performance, we'll take advantage of the ability to run the DaCapo component applications in three different modes: small, default (medium) and large.  The first chart displays the aggregate time required to run the tests for the three modes, utilizing both the 7u40 client (c1) compiler and the server (c2) compiler.  As expected, c1 outperforms c2 by a wide margin for the tests that run only briefly.  As the total time to run the tests increases from small to large, the c2 compiler gets a chance to "warm up" and close the gap in performance.  But it never does catch up.

Contrast the first chart with the one that follows, where small, medium and large versions of the tests were run on a quad core system.  The c2 compiler is better able to utilize the additional compute resources supplied by this platform, the result being that the initial gap in performance between c1 and c2 for the small version of the test is only 19%.  By the time we reach the large version, c2 outperforms c1 by 7%.  The moral of the story here is, given enough resources, the server compiler might be the better of the VMs for your workload if it is a long-lived process.

Java SE-E 7u40 armhf vs. Open JDK armhf

For this final section, we'll break out performance on an application-by-application basis for the following JRE/VMs:

  • Java SE Embedded 7u40 armhf Client Compiler (c1)
  • Java SE Embedded 7u40 armhf Server Compiler (c2)
  • OpenJDK 7 IcedTea 2.3.10 7u25-2.3.10-1~deb7u1 OpenJDK Zero VM (build 22.0-b10, mixed mode)
  • OpenJDK 7 IcedTea 2.3.10 7u25-2.3.10-1~deb7u1 JamVM (build 1.6.0-devel, inline-threaded interpreter with stack-caching)
  • OpenJDK 6 IcedTea6 1.12.6 6b27-1.12.6-1~deb7u1 OpenJDK Zero VM (build 20.0-b12, mixed mode)
  • OpenJDK 6 IcedTea6 1.12.6 6b27-1.12.6-1~deb7u1 JamVM (build 1.6.0-devel, inline-threaded interpreter with stack-caching)
  • OpenJDK IcedTea6 1.12.6 6b27-1.12.6-1~deb7u1 CACAO (build 1.6.0+r68fe50ac34ec, compiled mode)

The OpenJDK packages were pulled from the Debian Wheezy distribution.

It appears the bulk of performance work on OpenJDK/Arm still revolves around the OpenJDK 6 platform even though Java 7 was released over two years ago (and Java 8 is coming soon).  Regardless, Java SE still outperforms most OpenJDK tests by a wide margin, and perhaps more importantly appears to be the much more reliable platform considering the number of tests that failed with the OpenJDK variants.  As demonstrated in previous benchmark results, the older armel OpenJDK VMs appear to be more stable than the armhf versions tested here.  Considering the stated direction by the major Linux distributions is to migrate towards the armhf binary standard, this is a bit eye-opening.

As always, comments are welcome.



Monday Aug 12, 2013

Compact Profiles Demonstrated

Following up on an article introducing compact profiles, the video that follows demonstrates how this new feature in the upcoming Java 8 release can be utilized.  The video:

  • Describes the compact profile feature and the rationale for its creation.
  • Shows how to use the new jrecreate utility to generate compact profiles that can be readily deployed.
  • Demonstrates that even the smallest of profiles (less than 11MB) is robust enough to support very popular and important software frameworks like OSGi.

The software demonstrated is in early access.  For those interested in trying it out before the formal release of Java 8, there are two options:

  1. Members of the Oracle Partner Network (OPN) with a gold membership or higher can download the early access Java 8 binaries of Java SE-Embedded shown here.  For those not at this level, it may still be possible to get early access software, but it will require a qualification process beforehand.
  2. It's not as intimidating as it sounds: you can pull down the source code for OpenJDK 8 and build it yourself.  By default, compact profiles are not built, but this forum post shows you how.  The reference platform for this software is linux/x86.  Functionally, the generated compact profiles will contain the pared down modules for each compact profile, but you'll find the footprint for each to be much larger than the ones demonstrated in this video, as none of the Java SE-Embedded space optimizations are performed by default.

Without premium privileges on YouTube, we are limited to a maximum video length of 15 minutes.  There's actually lots more to talk about with compact profiles, including enhancements to java tools and utilities (javac, jar, jdeps, and the java command itself) that have incorporated intelligence for dealing with profiles.

Hmm.  Maybe there's an opportunity for a Compact Profiles Demonstrated Part II?


Wednesday Jul 31, 2013

An Introduction to Java 8 Compact Profiles

Java SE is a very impressive platform indeed, but with all that functionality comes a large and ever increasing footprint.  It stands to reason then that one of the more frequent requests from the community has been the desire to deploy only those components required for a particular application instead of the entire Java SE runtime environment.  Referred to as subsetting, the benefits of such a concept would seem to be many:

  • A smaller Java environment would require less compute resources, thus opening up a new domain of devices previously thought to be too humble for Java.
  • A smaller runtime environment could be better optimized for performance and start up time.
  • Elimination of unused code is always a good idea from a security perspective.
  • If the environment could be pared down significantly, there may be tremendous benefit to bundling runtimes with each individual Java application.
  • These bundled applications could be downloaded more quickly.

Despite these perceived advantages, the platform stewards (Sun, then Oracle) have been steadfast in their resistance to subsetting.  The rationale for such a stance is quite simple: there was sincere concern that the Java SE platform would fragment.  Agree or disagree, the Java SE standard has remained remarkably intact over time.  If you need any further evidence of this assertion, compare the state of Java SE to that of Java ME, particularly in the mobile telephony arena.  Better still, look how quickly Android has spawned countless variants in its brief lifespan.

Nonetheless, a formal effort has been underway having the stated goal of providing a much more modular Java platform.  Called Project Jigsaw, when complete, Java SE will be composed of a set of finer-grained modules and will include tools to enable developers to identify and isolate only those modules needed for their application.  However, implementing this massive internal change and yet maintaining compatibility has proven to be a considerable challenge.  Consequently full implementation of the modular Java platform has been delayed until Java 9.

Understanding that Java 9 is quite a ways off, an interim solution will be available for Java 8, called Compact Profiles.  Rather than specifying a complete module system, Java 8 will define subset profiles of the Java SE platform specification that developers can target for deployment.  At the current time three compact profiles have been defined, and have been assigned the creative names compact1, compact2, and compact3.  The table that follows lists the packages that comprise each of the profiles.  Each successive profile is a superset of its predecessor.  That is to say, the compact2 profile contains all of the packages in compact1 plus those listed under the compact2 column below.  Likewise, compact3 contains all of compact2 packages plus the ones listed in the compact3 column.

compact1                     compact2                    compact3
--------------------------   -----------------------     --------------------------
java.io                      java.rmi                    java.lang.instrument
java.lang                    java.rmi.activation         java.lang.management
java.lang.annotation         java.rmi.registry           java.security.acl
java.lang.invoke             java.rmi.server             java.util.prefs
java.lang.ref                java.sql                    javax.annotation.processing
java.lang.reflect            javax.rmi.ssl               javax.lang.model
java.math                    javax.sql                   javax.lang.model.element
java.net                     javax.transaction           javax.lang.model.type
java.nio                     javax.transaction.xa        javax.lang.model.util
java.nio.channels            javax.xml                   javax.management
java.nio.channels.spi        javax.xml.datatype          javax.management.loading
java.nio.charset             javax.xml.namespace         javax.management.modelbean
java.nio.charset.spi         javax.xml.parsers           javax.management.monitor
java.nio.file                javax.xml.stream            javax.management.openmbean
java.nio.file.attribute      javax.xml.stream.events     javax.management.relation
java.nio.file.spi            javax.xml.stream.util       javax.management.remote
java.security                javax.xml.transform         javax.management.remote.rmi
java.security.cert           javax.xml.transform.dom     javax.management.timer
java.security.interfaces     javax.xml.transform.sax     javax.naming
java.security.spec           javax.xml.transform.stax    javax.naming.directory
java.text                    javax.xml.transform.stream  javax.naming.event
java.text.spi                javax.xml.validation        javax.naming.ldap
java.util                    javax.xml.xpath             javax.naming.spi
java.util.concurrent         org.w3c.dom                 javax.script
java.util.concurrent.atomic  org.w3c.dom.bootstrap       javax.security.auth.kerberos
java.util.concurrent.locks   org.w3c.dom.events          javax.security.sasl
java.util.jar                org.w3c.dom.ls              javax.sql.rowset
java.util.logging            org.xml.sax                 javax.sql.rowset.serial
java.util.regex              org.xml.sax.ext             javax.sql.rowset.spi
java.util.spi                org.xml.sax.helpers         javax.tools
java.util.zip                                            javax.xml.crypto
javax.crypto                                             javax.xml.crypto.dom
javax.crypto.interfaces                                  javax.xml.crypto.dsig
javax.crypto.spec                                        javax.xml.crypto.dsig.dom
javax.net                                                javax.xml.crypto.dsig.keyinfo
javax.net.ssl                                            javax.xml.crypto.dsig.spec
javax.security.auth                                      org.ietf.jgss
javax.security.auth.callback
javax.security.auth.login
javax.security.auth.spi
javax.security.auth.x500
javax.security.cert
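
To make the superset relationship concrete, here is a minimal sketch that works out the smallest profile covering a set of packages.  The class and method names are my own invention for illustration, and the lookup map holds only a handful of rows hand-copied from the table above, so treat it as a toy rather than a complete tool.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ProfileLookup {
    // Partial sample of the table above: package name -> profile that introduces it.
    // This is NOT the complete package list.
    private static final Map<String, Integer> PROFILE_OF = new HashMap<>();
    static {
        for (String p : Arrays.asList("java.io", "java.lang", "java.net", "java.util"))
            PROFILE_OF.put(p, 1);
        for (String p : Arrays.asList("java.rmi", "java.sql", "javax.xml.parsers"))
            PROFILE_OF.put(p, 2);
        for (String p : Arrays.asList("java.lang.management", "javax.naming", "javax.script"))
            PROFILE_OF.put(p, 3);
    }

    /**
     * Returns 1, 2 or 3 for the smallest profile that covers every package,
     * or 0 if a package is missing from the sample map.
     */
    static int smallestProfile(List<String> packagesUsed) {
        int needed = 1;
        for (String pkg : packagesUsed) {
            Integer level = PROFILE_OF.get(pkg);
            if (level == null) {
                return 0;
            }
            // Each profile is a superset of the one before it, so the
            // highest level seen is the one the application needs.
            needed = Math.max(needed, level);
        }
        return needed;
    }

    public static void main(String[] args) {
        System.out.println("compact" + smallestProfile(Arrays.asList("java.io", "java.util")));
        System.out.println("compact" + smallestProfile(Arrays.asList("java.util", "java.sql")));
    }
}
```

Because the profiles nest, only the maximum matters: a single compact3 package pulls the whole application up to compact3.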

You may ask: what savings can be realized by using compact profiles?  As Java 8 is in pre-release stage, numbers will change over time, but let's take a look at a snapshot early access build of Java SE-Embedded 8 for ARMv5/Linux.  A reasonably configured compact1 profile comes in at less than 14MB.  Compact2 is about 18MB and compact3 is in the neighborhood of 21MB.  For reference, the latest Java 7u21 SE Embedded ARMv5/Linux environment requires 45MB.

So at less than one-third the original size of the already space-optimized Java SE-Embedded release, you have a very capable runtime environment.  If you need the additional functionality provided by the compact2 and compact3 profiles or even the full VM, you have the option of deploying your application with them instead.

In the next installment, we'll look at Compact Profiles in a bit more detail.


Tuesday Oct 09, 2012

Raspberry Pi and Java SE: A Platform for the Masses

One of the more exciting developments in the embedded systems world has been the announcement and availability of the Raspberry Pi, a very capable computer that is no bigger than a credit card.  Priced at $35 US, the device drew such significant initial demand that very long back orders quickly ensued. After months of patiently waiting, mine finally arrived. 

Those initial growing pains appear to have been fixed, so availability now should be much more reasonable. At a very high level, here are some of the important specs:

  • Broadcom BCM2835 System on a chip (SoC)
  • ARM1176JZFS, with floating point, running at 700MHz
  • Videocore 4 GPU capable of BluRay quality playback
  • 256MB RAM
  • 2 USB ports and Ethernet
  • Boots from SD card
  • Linux distributions (e.g. Debian) available

So what's taking place with respect to the Java platform and Raspberry Pi?

  • A Java SE Embedded binary suitable for the Raspberry Pi is available for download (Arm v6/7) here.  Note, this is based on the armel architecture, a variety of Arm designed to support floating point through a compatibility library that operates on more platforms, but can hamper performance.  In order to use this Java SE binary, select the available Debian distribution for your Raspberry Pi.
  • The more recent Raspbian distribution is based on the armhf (hard float) architecture, which provides for more efficient hardware-based floating point operations.  However armhf is not binary compatible with armel.  As of the writing of this blog, Java SE Embedded binaries are not yet publicly available for the armhf-based Raspbian distro, but as mentioned in Henrik Stahl's blog, an armhf release is in the works.
  • As demonstrated at the just-completed JavaOne 2012 San Francisco event, the graphics processing unit inside the Raspberry Pi is very capable indeed, and makes for an excellent candidate for JavaFX.  As such, plans also call for a Pi-optimized version of JavaFX in a future release too.
A thriving community around the Raspberry Pi has developed at light speed, and as evidenced by the packed attendance at Pi-specific sessions at Java One 2012, the interest in Java for this platform is following suit. So stay tuned for more developments...


Monday Aug 13, 2012

Java One 2012 Java SE Embedded Hands On Lab Returns!

After successful runs at Java One 2011 San Francisco and Tokyo, The Java SE Embedded Hands On Lab returns for Java One 2012.  If you're attending the Java One event in San Francisco (Sept 30 - Oct 4), please consider signing up for this session.  As an added incentive, we will be raffling off a couple of the Plug Computer devices that you'll gain experience with during this lab.  Seating is limited to 100 students, so register early.

Here's an overview:

This hands-on lab aims to show that developers already familiar with the Java develop/debug/deploy lifecycle can apply those same skills to develop Java applications, using Java SE Embedded, on embedded devices. The participants in the lab will:

    • Have their own individual embedded device so they can gain valuable hands-on experience
    • Turn their embedded device into a web container, using off-the-shelf software
    • Learn how to deploy embedded Java applications, developed with an IDE, onto their device
    • Learn how embedded Java applications can be remotely debugged from their desktop IDE
    • Learn how to remotely monitor and manage embedded Java applications from their desktop

The course description can be found here:
HOL 7889: Java SE Embedded Development Made Easy

In addition, 2012 marks the first year that we will have a venue specifically tailored for the Java embedded community.  Entitled Java Embedded @ Java One, this event takes place during the JavaOne/OpenWorld week in San Francisco on October 3-4.  To quote from the Java Embedded @ Java One URL:

The conference will feature dedicated business-focused content from Oracle discussing how Java Embedded delivers a secure, optimized environment ideal for multiple network-based devices, as well as meaningful industry-focused sessions from peers who are already successfully utilizing Java Embedded.

So if you want to participate in what many consider to be the next big trend in computing -- the internet of things -- come join us 10/3-4 in San Francisco.

Friday Jun 22, 2012

Healthcare Mobile Database Synchronization Demonstration

Like many of you, I learn best by getting my hands dirty.  When confronted with the task of understanding a new set of products and technologies and figuring out how they might apply to a vertical industry like healthcare, I set out to create a demonstration.  The video that follows aims to show how the Oracle embedded software portfolio can be applied to a healthcare application.  The demonstration utilizes among others, Java SE Embedded, Berkeley DB, Apache Tomcat, Oracle 11gR2 and Oracle Database Mobile Server.

Eric Jensen gives a great critique and description of the demo here.  To sum it up, we aim to show how live medical data can be collected on a medical device, stored in a local database, synchronized to a master database and furthermore propagated to a mobile phone (Android) application.  Come take a look!


Monday Mar 19, 2012

Take Two: Comparing JVMs on ARM/Linux

Although the intent of the previous article, entitled Comparing JVMs on ARM/Linux, was to introduce and highlight the availability of the HotSpot server compiler (referred to as c2) for Java SE-Embedded ARM v7,  it seems, based on feedback, that everyone was more interested in the OpenJDK comparisons to Java SE-E.  But there were two main concerns:

  • The fact that the previous article compared Java SE-E 7 against OpenJDK 6 might be construed as an unlevel playing field because version 7 is newer and therefore potentially more optimized.
  • That the generic compiler settings chosen to build the OpenJDK implementations did not put those versions in a particularly favorable light.

With those considerations in mind, we'll institute the following changes to this version of the benchmarking:

  1. In order to help alleviate an additional concern that there is some sort of benchmark bias, we'll use a different suite, called DaCapo.  Funded and supported by many prestigious organizations, DaCapo's aim is to benchmark real world applications.  Further information about DaCapo can be found at http://dacapobench.org.
  2. At the suggestion of Xerxes Ranby, who has been a great help through this entire exercise, a newer Linux distribution will be used to assure that the OpenJDK implementations were built with more optimal compiler settings.  The Linux distribution in this instance is Ubuntu 11.10 Oneiric Ocelot.
  3. Having experienced difficulties getting Ubuntu 11.10 to run on the original D2Plug ARMv7 platform, for these benchmarks, we'll switch to an embedded system that has a supported Ubuntu 11.10 release.  That platform is the Freescale i.MX53 Quick Start Board.  It has an ARMv7 Cortex-A8 processor running at 1GHz with 1GB RAM.
  4. We'll limit comparisons to 4 JVM implementations:
    • Java SE-E 7 Update 2 c1 compiler (default)
    • Java SE-E 6 Update 30 (c1 compiler is the only option)
    • OpenJDK 6 IcedTea6 1.11pre 6b23~pre11-0ubuntu1.11.10.2 CACAO build 1.1.0pre2
    • OpenJDK 6 IcedTea6 1.11pre 6b23~pre11-0ubuntu1.11.10.2 JamVM build-1.6.0-devel

Certain OpenJDK implementations were eliminated from this round of testing for the simple reason that their performance was not competitive.  The Java SE 7u2 c2 compiler was also removed because although quite respectable, it did not perform as well as the c1 compilers.  Recall that c2 works optimally in long-lived situations.  Many of these benchmarks completed in a relatively short period of time.  To get a feel for where c2 shines, take a look at the first chart in this blog.

The first chart that follows includes performance of all benchmark runs on all platforms.  Later on we'll look more at individual tests.  In all runs, smaller means faster.  The DaCapo aficionado may notice that only 10 of the 14 DaCapo tests for this version were executed.  The reason for this is that these 10 tests represent the only ones successfully completed by all 4 JVMs.  Only the Java SE-E 6u30 could successfully run all of the tests.  Both OpenJDK instances not only failed to complete certain tests, but also experienced VM aborts.

One of the first observations that can be made between Java SE-E 6 and 7 is that, for all intents and purposes, they are on par with regard to performance.  While it is a fact that successive Java SE releases add further optimizations, it is also true that Java SE 7 introduces additional complexity to the Java platform, thus balancing out any potential performance gains at this point.  We are still early into Java SE 7.  We would expect further performance enhancements for Java SE-E 7 in future updates.

In comparing Java SE-E to OpenJDK performance, among both OpenJDK VMs, Cacao results are respectable in 4 of the 10 tests.  The charts that follow show the individual results of those four tests.  Both Java SE-E versions do win every test and outperform Cacao in the range of 9% to 55%.

For the remaining 6 tests, Java SE-E significantly outperforms Cacao in the range of 114% to 311%.

So it looks like OpenJDK results are mixed for this round of benchmarks.  In some cases, performance looks to have improved.  But in a majority of instances, OpenJDK still lags behind Java SE-Embedded considerably.

Time to put on my asbestos suit.  Let the flames begin...

Tuesday Feb 14, 2012

Comparing JVMs on ARM/Linux

For quite some time, Java Standard Edition releases have included both client and server bytecode compilers (referred to as c1 and c2 respectively), whereas Java SE-Embedded binaries only contained the client c1 compiler.  The rationale for excluding c2 stems from the fact that (1) eliminating optional components saves space, where in the embedded world, space is at a premium, and (2) embedded platforms were not given serious consideration for handling server-like workloads.  But all that is about to change.  In anticipation of the ARM processor's legitimate entrance into the server market (see Calxeda), Oracle has, with the latest update of Java SE-Embedded (7u2), made the c2 compiler available for ARMv7/Linux platforms, further enhancing performance for a large class of traditional server applications. 

These two compilers go about their business in different ways.  Of the two, c1 is a lighter optimizing compiler, but has faster start up.  It delivers excellent performance and as the default bytecode compiler, works extremely well in almost all situations.  Compared to c1, c2 is the more aggressive optimizer and is suited for long-lived java processes.  Although slower at start up, it can be shown to achieve better performance over time.  As a case in point, take a look at the graph that follows.

One of the most popular Java-based applications, Apache Tomcat, was installed on an ARMv7/Linux device.  The chart shows the relative performance, as defined by mean HTTP request time, of the Tomcat server run with the c1 client compiler (red line) and the c2 server compiler (blue line).  The HTTP request load was generated by an external system on a dedicated network utilizing the ab (Apache Bench) program.  The closer the response time is to zero, the better.  You can see that for the initial run of 25,000 HTTP requests, the c1 compiler produces faster average response times than c2.  It takes time for the c2 compiler to "warm up", but once the threshold of 50,000 or so requests is met, the c2 compiler performance is superior to c1.  At 250,000 HTTP requests, mean response time for the c2-based Tomcat server instance is 14% faster than its c1 counterpart.

It is important to realize that c2 assumes, and indeed requires, more resources (i.e. memory).  Our sample device, with 1GB RAM, was more than adequate for these rounds of tests.  Of course your mileage may vary, but if you have the right hardware and the right workload, give c2 a further look.
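
If you'd like to watch the warm-up effect yourself without setting up Tomcat and ab, a rough sketch follows.  It is not the benchmark used for the chart above: the workload is an arbitrary arithmetic loop I made up, and the class name, batch count and batch size are all placeholders.  It simply times successive batches of identical work so you can see whether later batches get faster.

```java
class WarmUpTimer {
    // Accumulates results so the JIT cannot discard the calls to work().
    static long sink;

    // Arbitrary CPU-bound work standing in for request handling.
    static long work(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (i * 31L) % 7;
        }
        return sum;
    }

    /** Times consecutive batches of identical work; returns nanoseconds per batch. */
    static long[] timeBatches(int batches, int callsPerBatch) {
        long[] elapsed = new long[batches];
        for (int b = 0; b < batches; b++) {
            long start = System.nanoTime();
            for (int c = 0; c < callsPerBatch; c++) {
                sink += work(1_000);
            }
            elapsed[b] = System.nanoTime() - start;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // On HotSpot this property typically names the Client or Server VM.
        System.out.println("VM: " + System.getProperty("java.vm.name"));
        long[] t = timeBatches(10, 2_000);
        for (int b = 0; b < t.length; b++) {
            System.out.printf("batch %d: %d us%n", b, t[b] / 1_000);
        }
    }
}
```

Assuming your runtime ships both compilers, running it once with the standard -client launcher option and once with -server should give a feel for the trade-off, though a toy loop like this says little about a real server workload.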

While discussing these results with a few of my compadres, it was suggested that OpenJDK and some of its variants be included in this comparison.  The following chart shows mean http request times for 6 different configurations:

  1. Java SE Embedded 7u2 c1 Client Compiler
  2. Java SE Embedded 7u2 c2 Server Compiler
  3. OpenJDK Zero VM (build 20.0-b12, mixed mode) OpenJDK 1.6.0_24-b24 (IcedTea6 1.12pre)
  4. JamVM (build 1.6.0-devel, inline-threaded interpreter with stack-caching) OpenJDK 1.6.0_24-b24 (IcedTea6 1.12pre)
  5. CACAO (build 1.1.0pre2, compiled mode) OpenJDK 1.6.0_24-b24 (IcedTea6 1.12pre)
  6. Interpreter only: OpenJDK Zero VM (build 20.0-b12, interpreted mode) OpenJDK 1.6.0_24-b24 (IcedTea6 1.12pre)

Results remain pretty much unchanged, so only the first 4 runs (25K-100K requests) are shown.  As can be seen, the Java SE-E VMs are on the order of 3-5x faster than their OpenJDK counterparts irrespective of the bytecode compiler chosen.  One additional promising VM called shark was not included in these tests because, although it built from source successfully, it failed to run Apache Tomcat.  In defense of shark, the ARM version may still be in development (i.e. non-stable) mode.

Creating a really fast virtual machine is hard work and takes a lot of time to perfect.  Considering the resources expended by Oracle (and formerly Sun), it is no surprise that the commercial Java SE VMs are excellent performers.  But the extent to which they outperform their OpenJDK counterparts is surprising.  It would be no shock if someone in the know could demonstrate better OpenJDK results.  But herein lies one considerable problem:  it is an exercise in patience and perseverance just to locate and build a proper OpenJDK platform suitable for a particular CPU/Linux configuration.  No offense would be taken if corrections were presented, and a straightforward mechanism to support these OpenJDK builds were provided.

Tuesday Jan 03, 2012

Tomcat Micro Cluster

The term Micro Server has been bandied about recently as a means to provide a certain class of server functionality. As embedded systems continue their inexorable drive towards better performance, and standard hardware/software architectures become ubiquitous, the notion of using low-cost, low-power, small-footprint devices as servers becomes quite realistic.  Just as data center managers have utilized multitudes of affordable rack mount servers to provide scalability, why not duplicate that effort with these off-the-shelf devices?

The video that follows takes the Micro Server to its next logical evolution: The Micro Cluster.  Built from commodity hardware (and by commodity I mean The Home Depot), the cluster board has a rack mount form factor that can house 12 Plug Computers.  As the Java SE HotSpot Virtual Machine is available for the Plug Computer (ARMv5/Linux), we'll utilize Apache Tomcat to demonstrate a Tomcat Micro Cluster.  Over time, as the individual compute nodes increase in performance and capacity, this should become even more compelling.

Monday Oct 24, 2011

Java ONE 2011 Hands on Lab for Java SE Embedded

Now that the dust has settled, sincere thanks go out to my compadres (you know who you are) for helping make The Java One 2011 Java SE Embedded Hands on Lab such a success.  In fact it was so well received, our peers in Asia are already planning on replicating the effort for JavaOne Tokyo in April 2012.  In addition to the Tokyo event,  we hope to provide future opportunities for this venue elsewhere.  In the interim, we'd seriously consider hosting this lab, albeit on a smaller scale (Java One had 105 networked devices and workstations), for interested customers.  To give you a feel for the lab contents, here's a synopsis:

Java One 2011 Hands on Lab 24642: Java SE Embedded Development Made Easy

This Hands-on Lab aims to show that developers already familiar with the Java develop/debug/deploy lifecycle can apply those same skills to develop Java applications, using Java SE Embedded, on embedded devices.  Each participant in the lab will:

  • Have their own individual embedded device to gain valuable hands on experience
  • Turn their embedded device into a Java Servlet container
  • Learn how to deploy embedded Java applications, developed with the NetBeans IDE, onto their device
  • Learn how embedded Java applications can be remotely debugged from their desktop NetBeans IDE
  • Learn how to remotely monitor and manage embedded Java applications from their desktop
If the logistics for setting up a lab prove to be a bit too much, as an alternative, we've given quite a few live presentations/demonstrations with similar flair.  So please, by all means, contact me at james.connors@oracle.com, if you're interested in learning more.  For those of you who run developer user groups, most notably Java User Groups, and are looking for a speaker at your next meeting, please consider us.  We will not disappoint.


Friday Aug 19, 2011

Serial Port Communication for Java SE Embedded

The need to communicate with devices connected to serial ports is a common application requirement.  Falling outside the purview of the Java SE platform, serial and parallel port communication has been addressed with a project called RXTX.  (In the past, you may have known this as javacomm).  With RXTX,  Java developers access serial ports through the RXTXcomm.jar file.  Alongside this jar file, an underlying native layer must be provided to interface with the operating system's UART ports.  For the usual suspects (Windows, Linux/x86, MacOS, Solaris/Sparc), pre-compiled binaries are readily available.  To host this on an alternative platform, some (hopefully minimal) amount of work is required.

Here's hoping the following notes/observations might aid in helping you to build RXTX for an embedded device utilizing one of our Java SE Embedded binaries.  The device used for this particular implementation is my current favorite: the Plug Computer.

Notes on Getting RX/TX 2.1-7-r2 Working on a Plug Computer

1. At this early juncture with Java 7, be wary of mixing Java 7 with code from older versions of Java. The class files generated by the JDK7 javac compiler contain an updated version byte with a value that results in older (Java 6 and before) JVMs refusing to load these classes.

2. The RXTX download location http://rxtx.qbang.org/wiki/index.php/Download has binaries for many platforms including Arm variants, but none that worked for the Plug Computer, so one had to be built from source.

3. Using the native GCC for the Plug Computer and the RXTX source, binaries (native shared objects) were compiled for the armv5tel-unknown-linux-gnu platform.

4. The RXTX "stable" source code found at the aforementioned site is based on version rxtx 2.1-7r2.  This code appears to be pretty long in the tooth, in that it has no knowledge of Java 6.  Some changes need to be made to accommodate a JDK 6 environment.  Without these modifications, RXTX will not build with JDK6.

SUGGESTED FIX, most elegant, not recommended:
Edit the configure.in file in the source directory and look for the following:

    case $JAVA_VERSION in
    1.2*|1.3*|1.4*|1.5*)

and change the second line to:

    1.2*|1.3*|1.4*|1.5*|1.6*)

Upon modification, the autogen.sh script found in the rxtx source directory must be re-run to recreate the ./configure script.  Unfortunately, this requires loading the autoconf, automake and libtool packages (plus dependencies) and ended up resulting in libtool incompatibilities when running the resultant ./configure script.

RECOMMENDED FIX:
Instead, edit ./configure and search for the occurrences (there is more than one) of

    case $JAVA_VERSION in
    1.2*|1.3*|1.4*|1.5*)

and change the second line to:

    1.2*|1.3*|1.4*|1.5*|1.6*)

Run './configure', then 'make' to generate the RXTXcomm.jar and platform specific .so shared object libraries.

5. You may also notice in the output of the make, that there were compilation errors for source files which failed to find the meaning of "UTS_RELEASE".  This results in some of the shared object files not being created.  These pertain to the non-serial aspects of RXTX.  As we were only interested in librxtxSerial.so, this was no problem for us.

6. Once built, move the following files into the following directories:

    # cd rxtx-2.1-7-r2/
    # cp RXTXcomm.jar $JAVA_HOME/lib/ext
    # cd armv5tel-unknown-linux-gnu/.libs/
    # cp librxtxSerial-2.1-7.so $JAVA_HOME/lib/arm
    # cd $JAVA_HOME/lib/arm
    # ln -s librxtxSerial-2.1-7.so librxtxSerial.so

Now Java applications which utilize RXTX should run without any java command-line additions.
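
A quick way to sanity-check the installation is to see whether the RXTX entry-point class can be loaded at all.  The sketch below is my own helper, not part of RXTX; it uses plain reflection so it compiles and runs even where RXTX is absent.  I'm assuming gnu.io.CommPortIdentifier as the class to probe, which is the package naming used by RXTX 2.1, and I believe initializing that class also attempts to load the native library, but treat that last part as an assumption.

```java
class RxtxCheck {
    /** Returns true if the named class can be loaded and initialized. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: the jar is not visible to the JVM.
            // LinkageError (e.g. UnsatisfiedLinkError): the class was found,
            // but initialization failed, for instance on a missing native library.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed entry-point class for RXTX 2.1.
        String cls = "gnu.io.CommPortIdentifier";
        System.out.println(cls + (isLoadable(cls) ? " is loadable" : " could NOT be loaded"));
    }
}
```

If this reports that the class could not be loaded, re-check the location of RXTXcomm.jar and the librxtxSerial.so symbolic link from the steps above.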

The RXTXcomm.jar file can be downloaded here.  To spare you the effort, a few pre-built versions of  librxtxSerial-2.1-7.so are provided at this location:

If you've gone through this exercise on any additional architectures, send them my way and I'll post them here.
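
One footnote to note 1 above: if you suspect a version mismatch, you can read the version straight out of a class file.  The following is a small sketch of my own (the class name is made up) that reads the standard class file header: a four-byte magic number, then a two-byte minor version and a two-byte major version.  Major version 50 corresponds to Java 6 and 51 to Java 7.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class ClassVersion {
    /** Reads a class file header and returns its major version (50 = Java 6, 51 = Java 7). */
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file (bad magic number)");
        }
        data.readUnsignedShort();         // minor version, ignored here
        return data.readUnsignedShort();  // major version
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClassVersion <file.class>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(args[0] + ": major version " + majorVersion(in));
        }
    }
}
```

A class that reports 51 will be refused by a Java 6 JVM, which is the situation note 1 warns about.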

Wednesday Jun 01, 2011

Java SE Embedded Development Made Easy

Slowly but surely this old dog (who can learn new tricks, but at a snail's pace) came to the realization that although still quite relevant, a whole generation of people prefer not to read lengthy writings, but would rather digest information in small pieces using new media formats.  Thus the rationale for the following blog...

Certainly no thespian when it comes to public speaking, I will say this:  based upon my experience demonstrating Java SE on embedded devices, people have definitely expressed genuine interest.  Maybe it was the cool device (i.e. Plug Computer) which was used, or maybe this combination of hardware and software inspired the audience to think of the possibilities presented by this new platform.  Either way, I thought it might make sense to capture a shortened presentation/demonstration session.  Following is a 30 minute session broken down into two 15 minute videos (because YouTube only allows videos of 15 minutes or less for mere mortals). They aim to demonstrate how developers already familiar with the Java SE development paradigm can leverage that knowledge to seamlessly develop on very capable embedded processors.  Happy viewing!


About

Jim Connors
