Friday Dec 06, 2013

Java SE Embedded Pricing Explained

You're probably asking yourself, "Pricing?  Really?  In a techie blog?", and I would normally agree wholeheartedly with your assessment.  But in this one instance the topic might be worthy of a few words.  There is, as the expression goes, no such thing as a free lunch.  Whether you pay for software outright, or roll your own with open source projects, a cost must be paid.

Like clockwork, we regularly receive inquiries for Java embedded information that go something like this:

Dear Oracle,  We've downloaded and evaluated Java SE-Embedded and have found it to be a very appealing platform to run our embedded application.  We understand this is commercial software; before we decide to deploy our solution with your runtime, can you give us a feel for the royalties associated with shipping x number of units?

Seems pretty straightforward, right?  Well, yes, except that in the past Oracle required the potential customer to sign a non-disclosure agreement prior to receiving any embedded pricing information.  It didn't matter if the customer was interested in deploying ten units or ten thousand, they all had to go through this process.  Now certain aspects of pricing may still require confidential agreements, but why not make quantity 1 list prices available?   With the release of this document, that pricing information is now public.

The evidence is out there, both anecdotal and measured, demonstrating that Oracle's Java SE-Embedded platform is unquestionably superior in quality and performance to the OpenJDK variants.  For the latest example, take a look at this blog entry.  So the question becomes: is it actually more affordable to pay for a commercial platform that is fully supported, faster, and more reliable, or to opt for a "free" platform and support it yourself?

So What Does Java SE-Embedded Cost?

The universal answer to such a question is: it depends.  That is to say, it depends upon the capability of the embedded processor.  Before we lose you, let's show the list prices for Java embedded licensing on three platforms and then explain how we arrived at the numbers.  As of the posting of this entry, 6 December 2013, here they are:

  1. Per-unit cost for a Raspberry Pi: US $0.71
  2. Per-unit cost for system based on Intel Atom Z510P: US $2.68
  3. Per-unit cost for a Compulab Trim-Slice: US $5.36

How Does It Work?

The bullet points below describe the process; afterwards, we'll show how we arrived at our three sample platform prices.

  • Pricing is done on a per-core basis.
  • Processors are classified based on their capability and assigned a core factor.  The more capable the processor, the higher the core factor.
  • Per-core pricing is determined by multiplying the standard per-core Java embedded price by the core factor.
  • A 19% Software Update License & Support Fee is automatically added onto each system.

The core factor table that follows, found in the Oracle Java Embedded Global Price List, dated September 20, 2013, groups processors of similar capabilities into buckets called chip classes.  Each chip class is assigned a core factor.


Example 1

To compute the per-unit cost, use this formula:

Oracle Java Embedded per-core license fee  *  core factor  *  number of cores  *  support uplift

The standard per-core license fee is always $300.  The Raspberry Pi is a Class I device and therefore has a core factor of .002.  There is only one core in the Raspberry Pi, and the Software Update License & Support fee is always 19%.  So plugging in the numbers, we get:

$300  *  .002  *  1  *  1.19  =  $0.714

Example 2

The processor in this example, the Intel Atom Z510P, is a Class II device and has a core factor of .0075.  Using the same formula from Example 1, here's what we get:

$300  *  .0075  *  1  *  1.19  =  $2.6775

Example 3

The processor for the Trim-Slice is based on the ARM Cortex-A9, a Class II device.  Furthermore it is a dual-core system.  Using the same formula as the previous examples, we arrive at the following per-unit pricing:

$300  *  .0075  *  2  *  1.19  = $5.355
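
To make the arithmetic concrete, here's a small Java sketch that reproduces the three examples above.  The $300 per-core fee, the core factors and the 19% support uplift come straight from the formula; the class and method names are purely illustrative.

    public class EmbeddedPricing {

        // Formula: per-core license fee * core factor * number of cores * support uplift
        static double perUnitCost(double coreFactor, int cores) {
            final double PER_CORE_FEE = 300.0;   // standard Oracle Java Embedded per-core fee
            final double SUPPORT_UPLIFT = 1.19;  // 19% Software Update License & Support fee
            return PER_CORE_FEE * coreFactor * cores * SUPPORT_UPLIFT;
        }

        public static void main(String[] args) {
            System.out.printf("Raspberry Pi (Class I, 1 core):          $%.4f%n", perUnitCost(0.002, 1));
            System.out.printf("Intel Atom Z510P (Class II, 1 core):     $%.4f%n", perUnitCost(0.0075, 1));
            System.out.printf("Compulab Trim-Slice (Class II, 2 cores): $%.4f%n", perUnitCost(0.0075, 2));
        }
    }

Running it prints $0.7140, $2.6775 and $5.3550 respectively, matching the worked examples.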

Conclusion

With your hardware specs handy, you should now have enough information to make a reasonable estimate of Oracle Java embedded licensing costs.  At minimum, it could be a help in your "buy vs. roll your own" decision-making process.  And of course, if you have any questions, don't be afraid to ask.


Tuesday Sep 17, 2013

Comparing Linux/Arm JVMs Revisited

It's been about 18 months since we last compared Linux/Arm JVMs, and the formal release of the much-anticipated Java SE Embedded for Arm hard float binary makes this a good time to revisit JVM performance.  The information and results that follow highlight the following comparisons:

  1. Java SE-E Arm VFP (armel) vs. Arm Hard Float (armhf)
  2. Java SE-E armhf Client Compiler (c1) vs. armhf Server Compiler (c2)
  3. And last but certainly not least ... Java SE-E 7u40 armhf vs. OpenJDK armhf

The Benchmark

For the sake of simplicity and consistency, we'll use a subset of the DaCapo benchmark suite.  It's an open source suite of real-world applications that puts a good strain on a system from both a processor and a memory workload perspective.  We are aware of customers who use DaCapo to gauge performance, and its availability and ease of use enable anyone interested to run their own set of tests in fairly short order.
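
If you'd like to reproduce the runs yourself, the sketch below shows one way to time individual DaCapo component applications with whatever java binary is first on your PATH.  The jar name (dacapo-9.12-bach.jar), the -s size flag and the benchmark names reflect the publicly available DaCapo harness, but treat them as assumptions and adjust for the version you download.

    // Minimal sketch: time a few DaCapo component applications using the java
    // binary found on the PATH.  Assumes the DaCapo jar (here dacapo-9.12-bach.jar)
    // sits in the working directory; adjust the jar name, benchmark list and
    // -s size flag for your own setup.
    public class DaCapoTimer {
        public static void main(String[] args) throws Exception {
            String[] benchmarks = { "avrora", "luindex", "xalan" };
            for (String bench : benchmarks) {
                ProcessBuilder pb = new ProcessBuilder(
                        "java", "-jar", "dacapo-9.12-bach.jar", "-s", "default", bench);
                pb.inheritIO();                   // show the harness output as it runs
                long start = System.nanoTime();
                int exit = pb.start().waitFor();
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.out.printf("%s: exit=%d, wall clock=%d ms%n", bench, exit, elapsedMs);
            }
        }
    }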

The Hardware

It would have been grand to run all these benchmarks on one platform, most notably the beloved Raspberry Pi, but unfortunately it has its limitations:

  • There is no Java SE-E server compiler (c2) for the Raspberry Pi.  Why?  Because the Pi is based on an ARMv6 instruction set whereas the Java SE-E c2 compiler requires a minimum ARMv7 instruction set.
  • Demonstrating how rapidly advances are being made in the hardware arena, the Raspberry Pi, within the context of these tests, is quite a humble platform.  With 512MB RAM, it runs out of memory when running some of the large DaCapo component applications.

For these tests we'll primarily use a quad-core Cortex-A9 based system, and for one test we'll utilize a single-core Marvell Armada system, just to compare what effect the number of cores has on server compiler performance.  The devices in question are:

  1. Boundary Devices BD-SL-i.MX6, quad core 1GHz Cortex-A9 (Freescale i.MX6), 1GB RAM, Debian Wheezy distribution, 3.0.35 kernel (for both armel and armhf configurations)
  2. GlobalScale D2Plug, single core 800MHz ARMv6/v7 processor (Marvell PXA510), 1GB RAM, Debian Wheezy distribution, 3.5.3-cubox-di+ kernel for armhf

Java SE-E armel vs. armhf

The chart that follows compares the relative performance of the armel Java SE-E 7u40 JRE with the armhf Java SE-E 7u40 JRE for 8 of the DaCapo component applications.  These tests were conducted on the Boundary Devices BD-SL-i.MX6.  Both the armel and armhf environments were based on the Debian Wheezy distribution running a 3.0.35 kernel.  For all charts, the smaller the result, the faster the run.

In all 8 tests the armhf binary is faster, most only slightly, and in one case (eclipse) by a few percentage points.  The big performance gain associated with the armhf standard deals with floating point operations, in particular the passing of arguments directly in floating point registers.  The gains realized by the newer armhf standard will be seen more in the native application realm than in Java SE-Embedded, primarily because the Java SE-E armel VM already uses FP registers for Java floating point methods.  There are still, however, certain floating point workloads that may show a modest performance increase (in the single-digit percent range) with Java SE-E armhf over Java SE-E armel.
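
If you want to probe the floating point difference on your own boards, a micro-benchmark that leans on floating point argument passing is easy to write.  The sketch below is hypothetical and, as with any micro-benchmark, the absolute numbers matter far less than the armel-vs-armhf ratio on the same hardware.

    // Hypothetical micro-benchmark stressing floating point argument passing:
    // a small method with several double parameters called in a tight loop.
    // Run the same class on the armel and armhf JREs and compare elapsed times.
    public class FpArgBench {
        static double blend(double a, double b, double c, double d) {
            return (a * b + c) / (d + 1.0);
        }

        public static void main(String[] args) {
            double acc = 0.0;
            long start = System.nanoTime();
            for (int i = 1; i <= 50_000_000; i++) {
                acc += blend(i * 0.5, i * 0.25, acc, i);
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("Result " + acc + " computed in " + elapsedMs + " ms");
        }
    }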

Java SE-E Client Compiler (c1) vs. Server Compiler (c2)

In this section, we'll show test results for two different platforms: first a single-core system, followed by the same tests on a quad-core system.  To further demonstrate how workload changes performance, we'll take advantage of the ability to run the DaCapo component applications in three different modes: small, default (medium) and large.  The first chart displays the aggregate time required to run the tests in the three modes, utilizing both the 7u40 client (c1) compiler and the server (c2) compiler.  As expected, c1 outperforms c2 by a wide margin for the tests that run only briefly.  As the total time to run the tests increases from small to large, the c2 compiler gets a chance to "warm up" and close the gap in performance.  But it never does catch up.

Contrast the first chart with the one that follows, where the small, medium and large versions of the tests were run on a quad-core system.  The c2 compiler is better able to utilize the additional compute resources supplied by this platform, the result being that the initial gap in performance between c1 and c2 for the small version of the test is only 19%.  By the time we reach the large version, c2 outperforms c1 by 7%.  The moral of the story: given enough resources, the server compiler might be the better VM for your workload if it is a long-lived process.
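
As an aside, if you're ever unsure which compiler a given run actually picked up, the java.vm.name system property is a quick sanity check; where both compilers are installed, the -client and -server launcher flags select between them.  A trivial sketch:

    // Prints which HotSpot VM (client or server) is running this process.
    // Launch it with "java -client WhichVM" or "java -server WhichVM" where both
    // compilers are available; the exact strings vary by JRE build.
    public class WhichVM {
        public static void main(String[] args) {
            System.out.println("java.vm.name    = " + System.getProperty("java.vm.name"));
            System.out.println("java.vm.version = " + System.getProperty("java.vm.version"));
            System.out.println("java.version    = " + System.getProperty("java.version"));
        }
    }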

Java SE-E 7u40 armhf vs. OpenJDK armhf

For this final section, we'll break out performance on an application-by-application basis for the following JRE/VMs:

  • Java SE Embedded 7u40 armhf Client Compiler (c1)
  • Java SE Embedded 7u40 armhf Server Compiler (c2)
  • OpenJDK 7 IcedTea 2.3.10 7u25-2.3.10-1~deb7u1 OpenJDK Zero VM (build 22.0-b10, mixed mode)
  • OpenJDK 7 IcedTea 2.3.10 7u25-2.3.10-1~deb7u1 JamVM (build 1.6.0-devel, inline-threaded interpreter with stack-caching)
  • OpenJDK 6 IcedTea6 1.12.6 6b27-1.12.6-1~deb7u1 OpenJDK Zero VM (build 20.0-b12, mixed mode)
  • OpenJDK 6 IcedTea6 1.12.6 6b27-1.12.6-1~deb7u1 JamVM (build 1.6.0-devel, inline-threaded interpreter with stack-caching)
  • OpenJDK 6 IcedTea6 1.12.6 6b27-1.12.6-1~deb7u1 CACAO (build 1.6.0+r68fe50ac34ec, compiled mode)

The OpenJDK packages were pulled from the Debian Wheezy distribution.

It appears the bulk of the OpenJDK/ARM performance work still revolves around the OpenJDK 6 platform, even though Java 7 was released over two years ago (and Java 8 is coming soon).  Regardless, Java SE still outperforms OpenJDK in most tests by a wide margin, and perhaps more importantly appears to be the much more reliable platform, considering the number of tests that failed with the OpenJDK variants.  As demonstrated in previous benchmark results, the older armel OpenJDK VMs appear to be more stable than the armhf versions tested here.  Considering that the stated direction of the major Linux distributions is to migrate towards the armhf binary standard, this is a bit eye-opening.

As always, comments are welcome.



Monday Mar 19, 2012

Take Two: Comparing JVMs on ARM/Linux

Although the intent of the previous article, entitled Comparing JVMs on ARM/Linux, was to introduce and highlight the availability of the HotSpot server compiler (referred to as c2) for Java SE-Embedded ARM v7,  it seems, based on feedback, that everyone was more interested in the OpenJDK comparisons to Java SE-E.  But there were two main concerns:

  • The fact that the previous article compared Java SE-E 7 against OpenJDK 6 might be construed as an uneven playing field, because version 7 is newer and therefore potentially more optimized.
  • That the generic compiler settings chosen to build the OpenJDK implementations did not put those versions in a particularly favorable light.

With those considerations in mind, we'll institute the following changes to this version of the benchmarking:

  1. In order to help alleviate an additional concern that there is some sort of benchmark bias, we'll use a different suite, DaCapo, which is funded and supported by many prestigious organizations and aims to benchmark real-world applications.  Further information about DaCapo can be found at http://dacapobench.org.
  2. At the suggestion of Xerxes Ranby, who has been a great help through this entire exercise, a newer Linux distribution will be used to assure that the OpenJDK implementations were built with more optimal compiler settings.  The Linux distribution in this instance is Ubuntu 11.10 Oneiric Ocelot.
  3. Having experienced difficulties getting Ubuntu 11.10 to run on the original D2Plug ARMv7 platform, we'll switch for these benchmarks to an embedded system that has a supported Ubuntu 11.10 release.  That platform is the Freescale i.MX53 Quick Start Board.  It has an ARMv7 Cortex-A8 processor running at 1GHz with 1GB RAM.
  4. We'll limit comparisons to 4 JVM implementations:
    • Java SE-E 7 Update 2 c1 compiler (default)
    • Java SE-E 6 Update 30 (c1 compiler is the only option)
    • OpenJDK 6 IcedTea6 1.11pre 6b23~pre11-0ubuntu1.11.10.2 CACAO build 1.1.0pre2
    • OpenJDK 6 IcedTea6 1.11pre 6b23~pre11-0ubuntu1.11.10.2 JamVM build-1.6.0-devel

Certain OpenJDK implementations were eliminated from this round of testing for the simple reason that their performance was not competitive.  The Java SE 7u2 c2 compiler was also removed because although quite respectable, it did not perform as well as the c1 compilers.  Recall that c2 works optimally in long-lived situations.  Many of these benchmarks completed in a relatively short period of time.  To get a feel for where c2 shines, take a look at the first chart in this blog.

The first chart that follows includes performance of all benchmark runs on all platforms.  Later on we'll look more at individual tests.  In all runs, smaller means faster.  The DaCapo aficionado may notice that only 10 of the 14 DaCapo tests for this version were executed.  The reason is that these 10 tests are the only ones successfully completed by all 4 JVMs.  Only Java SE-E 6u30 could successfully run all of the tests.  Both OpenJDK instances not only failed to complete certain tests, but also experienced VM aborts.

One of the first observations that can be made between Java SE-E 6 and 7 is that, for all intents and purposes, they are on par with regard to performance.  While it is a fact that successive Java SE releases add further optimizations, it is also true that Java SE 7 introduces additional complexity to the Java platform, balancing out any potential performance gains at this point.  We are still early into Java SE 7 and would expect further performance enhancements for Java SE-E 7 in future updates.

In comparing Java SE-E to OpenJDK performance, of the two OpenJDK VMs, CACAO produces respectable results in 4 of the 10 tests.  The charts that follow show the individual results of those four tests.  Both Java SE-E versions still win every test, outperforming CACAO in the range of 9% to 55%.

For the remaining 6 tests, Java SE-E significantly outperforms CACAO, in the range of 114% to 311%.

So it looks like OpenJDK results are mixed for this round of benchmarks.  In some cases, performance looks to have improved.  But in a majority of instances, OpenJDK still lags behind Java SE-Embedded considerably.

Time to put on my asbestos suit.  Let the flames begin...

Tuesday Feb 14, 2012

Comparing JVMs on ARM/Linux

For quite some time, Java Standard Edition releases have included both client and server bytecode compilers (referred to as c1 and c2 respectively), whereas Java SE-Embedded binaries only contained the client c1 compiler.  The rationale for excluding c2 stems from the fact that (1) eliminating optional components saves space, where in the embedded world, space is at a premium, and (2) embedded platforms were not given serious consideration for handling server-like workloads.  But all that is about to change.  In anticipation of the ARM processor's legitimate entrance into the server market (see Calxeda), Oracle has, with the latest update of Java SE-Embedded (7u2), made the c2 compiler available for ARMv7/Linux platforms, further enhancing performance for a large class of traditional server applications. 

These two compilers go about their business in different ways.  Of the two, c1 is the lighter optimizing compiler and has faster startup.  It delivers excellent performance and, as the default bytecode compiler, works extremely well in almost all situations.  Compared to c1, c2 is the more aggressive optimizer and is suited for long-lived Java processes.  Although slower at startup, it can be shown to achieve better performance over time.  As a case in point, take a look at the graph that follows.

One of the most popular Java-based applications, Apache Tomcat, was installed on an ARMv7/Linux device.  The chart shows the relative performance, as defined by mean HTTP request time, of the Tomcat server run with the c1 client compiler (red line) and the c2 server compiler (blue line).  The HTTP request load was generated by an external system on a dedicated network utilizing the ab (Apache Bench) program.  The closer the response time is to zero, the better.  You can see that for the initial run of 25,000 HTTP requests, the c1 compiler produces faster average response times than c2.  It takes time for the c2 compiler to "warm up", but once the threshold of 50,000 or so requests is met, c2 performance is superior to c1.  At 250,000 HTTP requests, the mean response time for the c2-based Tomcat server instance is 14% faster than its c1 counterpart.
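
For readers who would rather stay in Java than reach for ab, a very rough, single-threaded stand-in for that measurement might look like the sketch below; it simply issues sequential GET requests and reports the mean response time.  The URL and request count are placeholders, and unlike ab it applies no concurrency.

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Rough, single-threaded stand-in for ab: issues GET requests against a
    // Tomcat instance and reports the mean response time.  The URL and request
    // count are placeholders -- adjust them for your own setup.
    public class MeanRequestTime {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://192.168.1.100:8080/");   // hypothetical Tomcat address
            int requests = 25_000;
            long totalNanos = 0;
            byte[] buf = new byte[8192];
            for (int i = 0; i < requests; i++) {
                long start = System.nanoTime();
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                try (InputStream in = conn.getInputStream()) {
                    while (in.read(buf) != -1) { /* drain the response */ }
                }
                conn.disconnect();
                totalNanos += System.nanoTime() - start;
            }
            System.out.printf("Mean response time: %.3f ms over %d requests%n",
                    totalNanos / 1_000_000.0 / requests, requests);
        }
    }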

It is important to realize that c2 assumes, and indeed requires, more resources (i.e. memory).  Our sample device, with 1GB RAM, was more than adequate for these rounds of tests.  Of course your mileage may vary, but if you have the right hardware and the right workload, give c2 a further look.

While discussing these results with a few of my compadres, it was suggested that OpenJDK and some of its variants be included in this comparison.  The following chart shows mean HTTP request times for 6 different configurations:

  1. Java SE Embedded 7u2 c1 Client Compiler
  2. Java SE Embedded 7u2 c2 Server Compiler
  3. OpenJDK Zero VM (build 20.0-b12, mixed mode) OpenJDK 1.6.0_24-b24 (IcedTea6 1.12pre)
  4. JamVM (build 1.6.0-devel, inline-threaded interpreter with stack-caching) OpenJDK 1.6.0_24-b24 (IcedTea6 1.12pre)
  5. CACAO (build 1.1.0pre2, compiled mode) OpenJDK 1.6.0_24-b24 (IcedTea6 1.12pre)
  6. Interpreter only: OpenJDK Zero VM (build 20.0-b12, interpreted mode) OpenJDK 1.6.0_24-b24 (IcedTea6 1.12pre)

Results remain pretty much unchanged, so only the first 4 runs (25K-100K requests) are shown.  As can be seen, the Java SE-E VMs are on the order of 3-5x faster than their OpenJDK counterparts, irrespective of the bytecode compiler chosen.  One additional promising VM, called Shark, was not included in these tests because, although it built from source successfully, it failed to run Apache Tomcat.  In defense of Shark, the ARM version may still be in development (i.e. non-stable) mode.

Creating a really fast virtual machine is hard work and takes a lot of time to perfect.  Considering the resources expended by Oracle (and formerly Sun), it is no surprise that the commercial Java SE VMs are excellent performers.  But the extent to which they outperform their OpenJDK counterparts is surprising.  It would be no shock if someone in the know could demonstrate better OpenJDK results.  But herein lies one considerable problem:  it is an exercise in patience and perseverance just to locate and build a proper OpenJDK platform suitable for a particular CPU/Linux configuration.  No offense would be taken if corrections were presented, and a straightforward mechanism to support these OpenJDK builds were provided.

Thursday Dec 31, 2009

GlassFish on a Handheld

Until now, the idea of running something akin to a Java EE application server on a small handheld device would have been greeted with ridicule.  Suddenly that notion doesn't seem so ridiculous when considering the recent technology that's been made available.  In particular, the following software advances make this pipe dream more of a reality:

  • Java Standard Edition for Embedded Devices: A series of Java virtual machines are available from Sun for many of the popular embedded hardware/OS platforms. They are not only Java SE compatible, but have been space optimized from a static footprint and RAM perspective to perform in embedded environments.  To give you an idea of some of those optimizations, read this.
  • Java Enterprise Edition 6 Platform Specification and the Web Profile:  The Java EE 6 specification allows for the creation of a subset of the component technologies, called "profiles".  The first of these has been dubbed the Web Profile and contains the common technologies required to create small to medium web applications.  Rather than having to use a full blown Java EE application server in all its glory, you can take advantage of a significantly smaller, less complex framework.
  • Embedded GlassFish: This capability, which is now part of GlassFish v3, enables you to run GlassFish inside your Java application, as opposed to the other way around.  Simply put, there is no need to install GlassFish or create GlassFish domains in this scenario.  Instead, you include an instance of glassfish-embedded-web.jar in your classpath, make a few GlassFish Embedded API calls from your standard Java application, and voila! you've got a web application up and running.
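
To give a sense of what "a few GlassFish Embedded API calls" looks like in practice, here is a minimal sketch using the org.glassfish.embeddable API that later GlassFish 3.x embedded jars expose.  The 3.0-era Embedded.java referenced further down uses a slightly older API, so treat this as illustrative rather than a drop-in replacement; the port and war file name are assumptions.

    import java.io.File;
    import org.glassfish.embeddable.Deployer;
    import org.glassfish.embeddable.GlassFish;
    import org.glassfish.embeddable.GlassFishProperties;
    import org.glassfish.embeddable.GlassFishRuntime;

    // Minimal embedded GlassFish launcher using the org.glassfish.embeddable API.
    // The HTTP port and war file name are placeholders; the 3.0-era jar used in
    // this post exposes an older API, so treat this as an illustrative sketch.
    public class EmbeddedWebApp {
        public static void main(String[] args) throws Exception {
            GlassFishProperties props = new GlassFishProperties();
            props.setPort("http-listener", 8888);          // serve HTTP on port 8888

            GlassFish glassfish = GlassFishRuntime.bootstrap().newGlassFish(props);
            glassfish.start();

            Deployer deployer = glassfish.getDeployer();
            deployer.deploy(new File("hello.war"));        // deploy the sample web app

            System.out.println("hello.war deployed; press Ctrl-C to stop");
            Thread.sleep(Long.MAX_VALUE);                  // keep the server running
        }
    }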

The Hardware

Rather than opting for one of the many embedded development platforms around (because I'm cheap), I instead decided to investigate what was available from a handheld perspective and see if that environment could be adapted to suit my needs.  After some searching, it looked like the Nokia N810 just might fit the bill.  Courtesy of my buddy Eric Bruno, here's a picture of the N810:

To get a feel for this very capable device, check out Eric's Article.  What most interested me was that (1) it has 128MB RAM and (2) a 400MHz ARMv6 processor, (3) it runs a common embedded version of Linux (maemo), (4) a version of Java SE Embedded (from Sun) runs fine on this platform, and (5) it can be had for a relatively affordable price on eBay.

The Operating System

The Nokia N810 is powered by the maemo distribution, an open source platform with a thriving community of developers.  Knowing full well that any attempt to get a web application up and running on this device would stretch its resources to the limit, it was necessary to reclaim as much RAM as possible before starting out.  Here's a description of some of the kludgery involved:

  1. You'll need to download and install some additional applications which can be retrieved from the N810's Application Manager program.  They include: rootsh to enable root access to the device and openssh-client and openssh-server to remotely access the device.
  2. A quick and dirty way to reclaim RAM is to shut down the X-server and kill all of the windowing applications that run in parallel. There are certainly more elegant ways to do this, but in addition to being cheap, I'm lazy too.  What you quickly find out is that any attempt to manually kill off some of these processes results in a reboot of the tablet.  Why? Because by default, the N810 includes a watchdog process that monitors the state of the system.  If it detects any irregularities, it forces a restart.
  3. You can get around this problem by putting the device into what is called "R&D" mode.  This is achieved by downloading the "flasher" utility from maemo.org and establishing a USB connection between the N810 and your host computer.  Directions for this process can be found here.
  4. Once established, you can invoke the following flasher command:  flasher3.5 --set-rd-flags=no-lifeguard-reset. If this was done successfully, you'll notice that a wrench appears on the tablet screen when it is rebooted.
  5. Once in R&D mode you'll have to remotely ssh into the device via the WiFi connection. The following script called set-headless.sh has been provided to kill off the windowing system.  After executing this script, the N810 in effect becomes a headless device.  The only way to communicate with it is through the network.

The Environment

Here's what was required to get the web application up and running:

  1. Ssh into the device.
  2. Download an evaluation copy of Java SE Embedded (ARMv6 Linux - Headless).  For this example the download file was gunzip'ed and untar'ed into the N810 device's /usr directory resulting in a new /usr/ejre1.6.0_10 directory.
  3. Download a copy of glassfish-embedded-web-3.0.jar and place this file in the /usr/ejre1.6.0_10/lib directory.
  4. Modify your PATH variable to include /usr/ejre1.6.0_10/bin and set your JAVA_HOME variable to /usr/ejre1.6.0_10
  5. Create a temporary directory, for this example we'll create a /root/tmp directory.
  6. Compile the following Java source file, Embedded.java (a slightly hacked version of the original provided by Alexis Moussine-Pouchkine), on a JDK-equipped system.
  7. Create a glassfish directory under /root/tmp/ and place the compiled Embedded.class file there
  8. Download the sample hello web application, hello.war, and place it in the /root/tmp directory.
  9. Reclaim as much RAM as possible by running the set-headless.sh script
  10. Run the web application from inside the /root/tmp directory via the following command-line invocation:
     # java -cp /usr/ejre1.6.0_10/lib/glassfish-embedded-web-3.0.jar:. glassfish/Embedded hello.war 600

As the N810 cannot match even the most modest of laptops in terms of performance, be prepared to wait around a little before the application is ready.  Check this output to see what is printed out to the console during invocation.

For this run, the N810 was assigned a WiFi IP address of 192.168.1.102, thus the browser is pointed to that address with port 8888.  Here's what comes up:

And interacting with the web app produces this:

So this is obviously not ready for prime time, but it does open up a whole lot more possibilities in the near future.

Happy New Year! 

Thursday Jan 31, 2008

Good Things Come To Those Who Wait

Java RTS 2.1 EA (Early Access) marks the arrival of a commitment made some time back, namely that Sun would provide a version of the Java Real-Time System for Linux.  Perhaps to some it was a long time in the making, but in fact there are at least two good reasons why a Linux version wasn't available till now:

  1. Until recently, there was no standard Linux release/kernel which had true real-time support.  Typically the versions available were non-standard and did not constitute any considerable volume.  Mainstream Linux distributions are only now incorporating the necessary real-time underpinnings.
  2. Porting the Java Real-Time System VM to additional platforms is non-trivial.

Support and testing for Java RTS 2.1 EA is currently limited to the shipping SUSE Linux Enterprise Real Time 10 platform and the upcoming Red Hat Enterprise MRG 1.0 release.  It is, however, possible that other versions of Linux could run Java RTS 2.1 EA, as it utilizes the real-time POSIX programming interface.  At minimum they would require a 2.6.21 kernel or greater and glibc 2.5 or greater.  In addition, the latest RT patch would also be needed.
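
As a small aside, the kernel requirement is easy to sanity-check from Java itself, since on Linux the os.version system property reports the running kernel release; glibc and the RT patch still have to be verified outside the JVM.  A hypothetical check:

    // Hypothetical sanity check: on Linux, os.version returns the kernel release
    // (e.g. "2.6.24-rt").  glibc and the RT patch must be checked outside the JVM.
    public class KernelCheck {
        public static void main(String[] args) {
            String kernel = System.getProperty("os.version");
            String[] v = kernel.split("[.-]");
            int major = Integer.parseInt(v[0]);
            int minor = Integer.parseInt(v[1]);
            int patch = Integer.parseInt(v[2]);
            boolean ok = major > 2 || (major == 2 && (minor > 6 || (minor == 6 && patch >= 21)));
            System.out.println("Running kernel " + kernel + "; meets 2.6.21+ requirement: " + ok);
        }
    }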

This announcement pertains only to Linux, but of course a 2.1 EA version for both Solaris x86/x64 and SPARC will be forthcoming shortly.  In the interim, a version of Java RTS 2.0 update 1 is readily available.  Documentation for both Java RTS 2.0 and 2.1 EA can be found here.

Regardless of platform, an evaluation version of the software is available for download at the Java RTS download page.

 
