Monday Mar 17, 2014

An Embedded Java 8 Lambda Expression Microbenchmark

It's been a long road, but Java 8 has finally arrived.  Much has been written and said about the new features contained in this release; perhaps the most important of these is the introduction of Lambda Expressions.  Lambdas are now intimately integrated into the Java platform and they have the potential to aid developers in the traditionally tricky realm of parallel programming.

Following closely behind, Compact Profiles promise to open up the tremendous benefits of Java Standard Edition compatibility to embedded platforms previously thought to be too small.  Can you see where this is heading?  It might be interesting to use these two technologies simultaneously and see how well they work together.  What follows is the description of a small program and its performance measurements -- a microbenchmark if you will -- that aims to highlight how programming with the new Lambda Expression paradigm can be beneficial not only for typical desktops and servers, but also for a growing number of embedded platforms.

The Hardware/OS Platform(s)

Of primary interest for this article is the Boundary Devices BD-SL-i.MX6 single board computer.  It is a quad-core ARM® Cortex™-A9 based system with 1GB RAM running an armhf  Debian Linux distribution.  At the time of this article's publication, its list price is US $199.


What makes it more interesting is that we'll not only run Java 8 Lambda Expressions on device, we'll do it within the confines of the new Java 8 Compact1 profile.  The static footprint of this Java runtime environment is 10½ MB.

A second system, altogether different in capability and capacity from our embedded device, will be used to compare and contrast execution behavior across disparate hardware and OS environments.  The system in question is a Toshiba Tecra R840 laptop running Windows 7/64-bit.  It has a dual-core Intel® Core™ i5-2520M processor with 8GB RAM and will use the standard Java 8 Runtime Environment (JRE) for Windows 64-bit.

The Application

As the basis for our rudimentary application we need a sample dataset, and this link provides an ideal (and fictional) database of employee records.  Among the available formats, a comma-delimited CSV file is supplied with approximately 300,000 entries.  Our sample application will read this file and store the employee records into a LinkedList<EmployeeRec>.  The EmployeeRec has the following fields:

public class EmployeeRec {
    private String id;
    private String birthDate;
    private String lastName;
    private String firstName;
    private String gender;
    private String hireDate;
    // accessors such as getGender() and getAge() omitted
}

With this data structure initialized, our application is asked to perform one simple task:  calculate the average age of all male employees.
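
The article does not show how the CSV file is parsed or how getAge() is derived from birthDate, so the following is only a minimal sketch under stated assumptions: each line holds six comma-separated fields in the same order as the EmployeeRec fields, there are no quoted commas, birthDate is in ISO yyyy-MM-dd form, and age means whole years as of a reference date.  The names EmployeeLoader, Rec, parseLine, load and ageOn are hypothetical and do not come from the sample project.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch: the article's real parsing code and getAge() are not shown.
class EmployeeLoader {

    // Minimal stand-in for the article's EmployeeRec.
    static class Rec {
        final String id, birthDate, lastName, firstName, gender, hireDate;

        Rec(String[] f) {
            id = f[0]; birthDate = f[1]; lastName = f[2];
            firstName = f[3]; gender = f[4]; hireDate = f[5];
        }

        String getGender() { return gender; }

        // Assumption: age is the number of whole years between birthDate and 'today'.
        long ageOn(LocalDate today) {
            return ChronoUnit.YEARS.between(LocalDate.parse(birthDate), today);
        }
    }

    // Assumption: plain CSV, exactly six fields per line, no quoting.
    static Rec parseLine(String line) {
        String[] f = line.split(",", -1);
        if (f.length != 6) {
            throw new IllegalArgumentException("bad record: " + line);
        }
        return new Rec(f);
    }

    // Collects the parsed records into a LinkedList, as the article describes.
    static List<Rec> load(List<String> lines) {
        List<Rec> out = new LinkedList<>();
        for (String line : lines) {
            out.add(parseLine(line));
        }
        return out;
    }
}
```

Passing the reference date into ageOn(), rather than reading the clock inside the method, keeps the result reproducible from one benchmark run to the next.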

Old School

First off let's perform this calculation in a way that predates the availability of Lambda Expressions.  We'll call this version OldSchool.  The code performing the "average age of all male employees" calculation looks like this:

double sumAge = 0;
long numMales = 0;
for (EmployeeRec emp : employeeList) {
    if (emp.getGender().equals("M")) {
        sumAge += emp.getAge();
        numMales += 1;
    }
}
double avgAge = sumAge / numMales;

Lambda Expression Version 1

Our second variation will use a Lambda expression to perform the identical calculation.  We'll call this version Lambda stream().  The key statement in Java 8 looks like this:

double avgAge = employeeList.stream()
                .filter(s -> s.getGender().equals("M"))
                .mapToDouble(s -> s.getAge())
                .average()
                .getAsDouble();

Lambda Expression Version 2

Our final variation uses the preceding Lambda Expression with one slight modification: it replaces the stream() method call with the parallelStream() method, offering the potential to split the task into smaller units running on separate threads.  We'll call this version Lambda parallelStream(). The Java 8 statement looks as follows:

double avgAge = employeeList.parallelStream()
                .filter(s -> s.getGender().equals("M"))
                .mapToDouble(s -> s.getAge())
                .average()
                .getAsDouble();
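
To see the three variations side by side, here is a small self-contained sketch that runs them on a tiny in-memory list.  It assumes the stream pipelines finish with average().getAsDouble(), and it uses a hypothetical Emp class as a stand-in for EmployeeRec, since only part of that class appears above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in types; the article's EmployeeRec is only partly shown.
class AverageAgeDemo {

    static class Emp {
        private final String gender;
        private final double age;

        Emp(String gender, double age) { this.gender = gender; this.age = age; }

        String getGender() { return gender; }
        double getAge() { return age; }
    }

    // Pre-Lambda loop (the "OldSchool" variation).
    static double oldSchool(List<Emp> list) {
        double sumAge = 0;
        long numMales = 0;
        for (Emp emp : list) {
            if (emp.getGender().equals("M")) {
                sumAge += emp.getAge();
                numMales += 1;
            }
        }
        return sumAge / numMales;
    }

    // Serial stream variation.
    static double serial(List<Emp> list) {
        return list.stream()
                .filter(s -> s.getGender().equals("M"))
                .mapToDouble(s -> s.getAge())
                .average()
                .getAsDouble();
    }

    // Parallel stream variation: only the source call changes.
    static double parallel(List<Emp> list) {
        return list.parallelStream()
                .filter(s -> s.getGender().equals("M"))
                .mapToDouble(s -> s.getAge())
                .average()
                .getAsDouble();
    }

    public static void main(String[] args) {
        List<Emp> list = Arrays.asList(
                new Emp("M", 30), new Emp("F", 45), new Emp("M", 50));
        System.out.println(oldSchool(list)); // 40.0
        System.out.println(serial(list));    // 40.0
        System.out.println(parallel(list));  // 40.0
    }
}
```

One behavioral difference is worth knowing: if the list contains no male employees, the loop version divides 0.0 by 0 and yields NaN, whereas getAsDouble() on the empty OptionalDouble throws NoSuchElementException.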

Initial Test Results

The charts that follow display execution times of the sample problem solved via our three aforementioned variations.  The left chart represents times recorded on the ARM Cortex-A9 processor while the right chart shows recorded times for the Intel Core-i5.  The smaller the result, the faster.  Both examples indicate that there is some overhead to utilizing a serial Lambda stream() over and above the old school pre-Lambda solution.  As far as parallelStream() goes, it's a mixed bag.  For the Cortex-A9, the parallelStream() operation is negligibly faster than the old school solution, whereas for the Core-i5, the overhead incurred by parallelStream() actually makes the solution slower.
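
How the execution times were recorded is not shown here.  One plausible approach, offered purely as a sketch and not as the harness behind these charts, is to wrap each variation in a DoubleSupplier, run it a few times untimed so the JIT compiler can warm up, and then keep the best wall-clock time over several timed runs.  The class Stopwatch, the method bestMillis and the sink field are hypothetical names.

```java
import java.util.function.DoubleSupplier;

// Hypothetical timing helper; not the harness used for the article's charts.
class Stopwatch {

    // Results are stored here so the JIT cannot discard the work as dead code.
    static volatile double sink;

    // Runs 'task' untimed 'warmups' times, then returns the best wall-clock
    // time in milliseconds observed over 'runs' timed executions.
    static double bestMillis(DoubleSupplier task, int warmups, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be at least 1");
        }
        for (int i = 0; i < warmups; i++) {
            sink = task.getAsDouble();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink = task.getAsDouble();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Placeholder workload; a real run would pass one of the three variations.
        double ms = bestMillis(() -> {
            double s = 0;
            for (int i = 1; i <= 1_000_000; i++) {
                s += Math.sqrt(i);
            }
            return s;
        }, 3, 5);
        System.out.printf("best of 5: %.3f ms%n", ms);
    }
}
```

Taking the best of several runs reduces noise from garbage collection and other background activity, though a single cold run may be closer to what a short-lived embedded application actually experiences.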

Without any further investigation, one might conclude that parallel streams may not be worth the effort. But what if performing a trivial calculation on a list of 300,000 employees simply isn't enough work to show the benefits of parallelization?  For this next series of tests, we'll increase the computational load to see how performance might be affected.

Adding More Work to the Test

For this version of the test, we'll solve the same problem, that is to say, calculate the average age of all males, but add a varying amount of intermediate computation.  We can variably increase the number of required compute cycles by introducing the following identity method to our programs:

/*
 * Rube Goldberg way of calculating identity of 'val',
 * assuming number is positive
 */
private static double identity(double val) {
    double result = 0;
    for (int i=0; i < loopCount; i++) {
        result += Math.sqrt(Math.abs(Math.pow(val, 2)));
    }
    return result / loopCount;
}


As this method takes the square root of the square of a number, it is in essence an expensive identity function. By changing the value of loopCount (this is done via command-line option), we can change the number of times this loop executes per identity() invocation.  This method is inserted into our code, for example with the Lambda parallelStream() version, as follows:

double avgAge = employeeList.parallelStream()
                .filter(s -> s.getGender().equals("M"))
                .mapToDouble(s -> identity(s.getAge()))
                .average()
                .getAsDouble();

A modification identical to the identity() call shown above is also applied to both the Old School and Lambda stream() variations.  The charts that follow display execution times for three separate runs of our microbenchmark, each with a different value assigned to the internal loopCount variable in our Rube Goldberg identity() function.

For the Cortex-A9, you can clearly see the performance advantage of parallelStream() when the loop count is set to 100, and it becomes even more striking when the loop count is increased to 500.  For the Core-i5, it takes a lot more work to realize the benefits of parallelStream().  Not until the loop count is set to 50,000 do the performance advantages become apparent.  The Core-i5 is so much faster and only has two cores; consequently the amount of effort needed to overcome the initial overhead of parallelStream() is much more significant.
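
To explore this trade-off on your own hardware, the following sketch sweeps loopCount and times the serial and parallel pipelines, after first printing the core count and the parallelism of the common fork/join pool that parallel streams use by default.  It is not the article's benchmark: it generates synthetic ages with IntStream instead of reading the employee list, passes loopCount as a parameter rather than setting it from the command line, and assumes loopCount is at least 1.  The names ParallelSweep and average are hypothetical.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

// Hypothetical sketch, not the article's benchmark.
class ParallelSweep {

    // Same idea as the article's identity(), with loopCount passed in.
    // Assumption: loopCount >= 1 (0 would divide zero by zero).
    static double identity(double val, int loopCount) {
        double result = 0;
        for (int i = 0; i < loopCount; i++) {
            result += Math.sqrt(Math.abs(Math.pow(val, 2)));
        }
        return result / loopCount;
    }

    // Averages n synthetic ages (20..59, repeating), serially or in parallel.
    static double average(boolean parallel, int n, int loopCount) {
        IntStream ages = IntStream.range(0, n).map(i -> 20 + i % 40);
        if (parallel) {
            ages = ages.parallel();
        }
        return ages.mapToDouble(a -> identity(a, loopCount))
                .average()
                .getAsDouble();
    }

    public static void main(String[] args) {
        System.out.println("cores: " + Runtime.getRuntime().availableProcessors()
                + ", common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
        for (int loopCount : new int[] {1, 100, 500}) {
            for (boolean parallel : new boolean[] {false, true}) {
                long start = System.nanoTime();
                double avg = average(parallel, 300_000, loopCount);
                long ms = (System.nanoTime() - start) / 1_000_000;
                System.out.println("loopCount=" + loopCount
                        + " parallel=" + parallel
                        + " avg=" + avg + " ms=" + ms);
            }
        }
    }
}
```

The loopCount at which the parallel rows start to win should differ from machine to machine, which is the effect described above for the Cortex-A9 and the Core-i5.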


The sample code used in this article is available as a NetBeans project.  As the project includes a CSV file with over 300,000 entries, it is larger than one might expect.  The hosting site prohibits storing files larger than 2MB in size, so this project source has been compressed and split into three parts.  Here are the links:

Just concatenate the three downloaded files together to recreate the original file.  In Linux, the command would look something like this:

$ cat >


A great deal of effort has been put into making Java 8 a much more universal platform.  Our simple example here demonstrates that even an embedded Java runtime environment as small as 10½ MB can take advantage of the latest advances to the platform.  This is just the beginning.  There is much more work to be done to further enhance the performance characteristics of parallel stream Lambda Expressions.  We look forward to future enhancements.

Monday Mar 10, 2014

Introducing the EJDK

In lockstep with the introduction of Compact Profiles, Java 8 includes a new distribution mechanism for Java SE Embedded called the EJDK.  As the potential exists to confuse the EJDK with the standard JDK (Java Development Kit), it makes sense to dedicate a few words to highlighting how these two packages differ in form and function.


The venerable Java Development Kit is the mainstay of Java developers.  It incorporates not only a standard Java Runtime Environment (JRE), but also includes critical tools required by those same developers.  For example, among many others, the JDK comes with a Java compiler (javac), a Java console application (jconsole), the Java debugger (jdb) and the Java archive utility (jar).  It also serves as the underpinnings for very popular Java Integrated Development Environments (IDEs) such as NetBeans, Eclipse, JDeveloper and IntelliJ to name a few.

Like Java, the Java Development Kit is constantly evolving, and Java 8 brings about its fair share of enhancements to the JDK.  For Java 8, javac can now be instructed (via the -profile command-line option) to ensure that your source code is compatible with a specific compact profile.  Furthermore, the Java 8 JDK comes with a useful new tool called jdeps, providing a means to analyze your compiled class and jar files for dependencies.


The EJDK is new to Java 8, and although similar in name to the JDK, it serves quite a different purpose.  Prior to Java 8, supported Java SE-Embedded runtime platforms were provided as binaries by Oracle.  With the advent of Compact Profiles, the number of possible binary options per supported platform would simply be too unwieldy.  Rather than furnishing binaries for each of the possible combinations, an EJDK will be supplied for each supported Java SE-Embedded platform.  It contains the tools needed to create the profile you wish to use.

The EJDK is designed to be run on either Windows or Linux/Unix platforms alongside a Java runtime environment.  It contains a wrapper called jrecreate (a shell script for Unix/Linux and jrecreate.bat for Windows) whose function is to create deployable compact profile instances. In the examples that follow, we'll show two sample invocations.

First off, let's briefly take a look at the contents of a typical EJDK.   For our first example, we've installed the EJDK on a linux/x86 system.   Listing the contents of the ejdk1.8.0/ directory, we see a subdirectory named linux_arm_vfp_hflt/.  This tells us what platform this instance of the EJDK supports.  For all our examples we'll use an EJDK that creates compact profiles suitable for the Linux/ARM hard-float platform, often referred to as armhf.

$ ls ejdk1.8.0
bin  doc  lib  linux_arm_vfp_hflt

Looking one level deeper into the bin/ directory, we see jrecreate.bat along with its Unix/Linux shell-script counterpart:

$ ls ejdk1.8.0/bin

As we're on a Linux system, let's use the shell script to create a compact profile:

$ ./ejdk1.8.0/bin/ --profile compact1 --dest compact1-minimal --vm minimal

Briefly reviewing this invocation, the --profile compact1 option instructs jrecreate to use the Compact1 profile.  The --profile option accepts [compact1 | compact2 | compact3]  as an argument. The --dest compact1-minimal option specifies the name of the destination directory containing the newly generated profile.  Note that the directory argument to --dest must not exist prior to invocation.  Finally, the --vm minimal option tells jrecreate to use the minimal (i.e. the smallest) virtual machine for this instance.  The --vm option accepts  [minimal | client | server | all] as an argument.  Running the complete command, we get the following output:

$ ./ejdk1.8.0/bin/ --profile compact1 --dest compact1-minimal --vm minimal
Building JRE using Options {
   ejdk-home: /home/java8/ejdk1.8.0
    dest: /home/java8/compact1-minimal
    target: linux_arm_vfp_hflt
    vm: minimal
    runtime: compact1 profile
    debug: false
    keep-debug-info: false
    no-compression: false
    dry-run: false
    verbose: false
    extension: []

Target JRE Size is 10,595 KB (on disk usage may be greater).
Embedded JRE created successfully

This creates a Compact1 profile distribution of about 10½ MB in the compact1-minimal/ directory.  For our second example, we'll create a profile based on Compact2 and the client VM, this time from a Windows 7/64-bit system:

c:\demo>ejdk1.8.0\bin\jrecreate.bat --profile compact2 --dest compact2-client --vm client
Building JRE using Options {
    ejdk-home: c:\demo\ejdk1.8.0\bin\..
    dest: c:\demo\compact2-client
    target: linux_arm_vfp_hflt
    vm: client
    runtime: compact2 profile
    debug: false
    keep-debug-info: false
    no-compression: false
    dry-run: false
    verbose: false
    extension: []

Target JRE Size is 17,552 KB (on disk usage may be greater).
Embedded JRE created successfully

This Compact2 instance is created in the compact2-client/ directory and has an approximate footprint of 17 ½ MB.  Additional options to jrecreate are available for further customization.

Finally, let's migrate the generated profiles over to a real device.  As a host platform we'll use none other than the ubiquitous Raspberry Pi.  Here's a listing of the two profiles and their size (in 1K blocks) on the filesystem:

pi@pi0 ~/java8 $ ls
compact1-minimal  compact2-client

pi@pi0 ~/java8 $ du -sk compact*
10616   compact1-minimal
17660   compact2-client

And here's what each version outputs when java -version is run:

pi@pi0 ~/java8 $ ./compact1-minimal/bin/java -version
java version "1.8.0"
Java(TM) SE Embedded Runtime Environment (build 1.8.0-b127, profile compact1, headless)
Java HotSpot(TM) Embedded Minimal VM (build 25.0-b69, mixed mode)

pi@pi0 ~/java8 $ ./compact2-client/bin/java -version
java version "1.8.0"
Java(TM) SE Embedded Runtime Environment (build 1.8.0-b127, profile compact2, headless)
Java HotSpot(TM) Embedded Client VM (build 25.0-b69, mixed mode)
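
An application can also find out from the inside which runtime it was launched on.  The sketch below reads a handful of standard system properties and joins them into one line.  The class name RuntimeInfo and the method describe are hypothetical, and whether the active compact profile is exposed through a system property is not something this example assumes.

```java
// Hypothetical helper: reports some of the information that `java -version`
// prints, using standard system properties.
class RuntimeInfo {

    static String describe() {
        return "java.version=" + System.getProperty("java.version")
                + ", java.vm.name=" + System.getProperty("java.vm.name")
                + ", java.vm.version=" + System.getProperty("java.vm.version")
                + ", os.arch=" + System.getProperty("os.arch");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Because it touches only java.lang, this class should run unchanged on either of the two runtimes generated above.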

In conclusion, you are encouraged to experiment with the EJDK.  It will very quickly give you a feel for the compact profile configuration options available for your device.


Jim Connors

