Everything you want and need to know about Oracle SPARC systems performance

  • May 12, 2017

Java Streams 3x to 22x Faster with Hardware Acceleration on Oracle's SPARC Servers, a Step-by-Step Guide

Brian Whitney
Principal Software Engineer

Thanks to Karthik Ganesan for providing the majority of this content.

The Java streams abstraction in JDK 8 provides an easy way to write and efficiently execute aggregate operations on large datasets. A key driver for this work is making parallelism more accessible to developers. In addition, Java Stream programs are often more concise, understandable and maintainable.

Oracle's SPARC M7 and S7 processors can seamlessly make Java Streams programs 3x to 22x faster than x86. This is accomplished using the streamoffload Oracle Solaris package. See the blog entry "Accelerating Java Streams Performance Using SPARC with DAX" for more information.

The SPARC M7 and S7 processors are designed with “Software in Silicon” functions that can accelerate the execution of key features of Java Streams. These SPARC processors include a hardware unit called the DAX (Data Analytics Accelerator) that accelerates scanning, filtering, and other operations.

Minimum requirement for acceleration

  • Oracle Solaris and above
  • SPARC M7, SPARC S7, or SPARC T7 server
  • Java 8

Installation instructions

  • pkg set-publisher -G '*' -g http://pkg.oracle.com/solaris/release solaris
  • pkg install streamoffload (this displays the license agreement; to accept it and install, use the following command)
  • pkg install --accept streamoffload

Once this package has been installed, three new files should appear in the /usr/lib/streamoffload directory: comOracleStream.jar, libstreamoffload.so and a README.


Usage instructions

  1. Add import com.oracle.stream.*; to the Java program. Use the DaxIntStream.of() function to create the stream instead of the standard IntStream interface. The program can continue to use an IntStream handle for the stream.
  2. Compile with the library:
    javac -cp /usr/lib/streamoffload/comOracleStream.jar {Program.java}
  3. Include the following in the Java run command:
    -cp .:/usr/lib/streamoffload/comOracleStream.jar -DaccelLibPath=/usr/lib/streamoffload -d64

Supported Java Streams Operations

  • filter()
  • map(ternary) (only if the output space is confined to the values 1 and 0)
  • toArray()
  • anyMatch()
  • allMatch()
  • noneMatch()
  • filter().count()
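
The supported operations above can be illustrated with a minimal sketch. It uses only the standard java.util.stream.IntStream API so that it runs on any JDK 8 system; on a SPARC system with streamoffload installed, the assumption is that DaxIntStream.of(data) would replace IntStream.of(data), as described in the usage steps. The class name, method names, and bounds are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class SupportedOpsSketch {
    // Static fields ("global variables") used as bounds in comparison-only predicates.
    static int lower = 0;
    static int upper = 15;

    // filter().count(): how many values lie strictly between lower and upper.
    static long countInRange(int[] data) {
        // Assumption: with streamoffload, DaxIntStream.of(data) replaces IntStream.of(data).
        return IntStream.of(data).parallel()
                .filter(x -> x > lower && x < upper)
                .count();
    }

    // filter() followed by toArray(): the selected values, in encounter order.
    static int[] selectInRange(int[] data) {
        return IntStream.of(data).parallel()
                .filter(x -> x > lower && x < upper)
                .toArray();
    }

    // map(ternary) whose output is confined to 1 and 0, followed by toArray().
    static int[] flagInRange(int[] data) {
        return IntStream.of(data).parallel()
                .map(x -> (x > lower && x < upper) ? 1 : 0)
                .toArray();
    }

    // anyMatch() with the same comparison-only predicate.
    static boolean anyInRange(int[] data) {
        return IntStream.of(data).parallel()
                .anyMatch(x -> x > lower && x < upper);
    }

    public static void main(String[] args) {
        int[] data = {3, 20, 7, -1, 14, 15};
        System.out.println(countInRange(data));                   // 3
        System.out.println(Arrays.toString(selectInRange(data))); // [3, 7, 14]
        System.out.println(Arrays.toString(flagInRange(data)));   // [1, 0, 1, 0, 1, 0]
        System.out.println(anyInRange(data));                     // true
    }
}
```

Each method keeps its pipeline limited to operations from the list, and each predicate uses only comparisons and logical operators.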

Example Java code:

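import com.oracle.stream.*;
import java.util.stream.IntStream;

public class Test1 {
   static int myGlobal = 0;
   public static void main(String[] args) {
      int[] intArray = new int[1_000_000];
      // Init array: fill intArray with the input data here
      IntStream myStream = DaxIntStream.of(intArray).parallel();
      // Comparison-only predicate that reads a static (global) variable
      boolean matched = myStream.allMatch(x -> (x>myGlobal)&&(x<15));
      System.out.println(matched);
   }
}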

Compile using:

javac -cp /usr/lib/streamoffload/comOracleStream.jar Test1.java

Run using:

java -XX:LargePageSizeInBytes=256m -XX:+UseLargePages -d64 \
-cp .:/usr/lib/streamoffload/comOracleStream.jar \
-DaccelLibPath=/usr/lib/streamoffload Test1

To measure performance without hardware acceleration, one can use the flag



Best Practices

  • Only pipelines composed entirely of the supported operations listed above are accelerated, so move those operations into a separate pipeline.
  • Underlying source data of the pipeline should be an integer array
  • It is essential that the stream be marked as parallel so the streamoffload library knows it is acceptable to do operations in parallel.
  • For best performance when using the streamoffload library, it is often preferable to use large pages for larger input sizes and smaller pages for smaller input sizes. One can do this using the Java flags -XX:LargePageSizeInBytes -XX:+UseLargePages. A good rule of thumb is to use the smallest page size within which the entire input data will fit.
  • Streamoffload supports lambda expressions under certain conditions. The following are supported in the lambda expression: comparison operators (<, >, <=, >=, ==, !=), the || and && logical operators, global variables, constants, final static local variables, and the scalar argument to the lambda.
  • The following are not accelerated in lambda expressions: arithmetic operations inside predicates, and instance or non-static local variables. The variable restriction can be worked around by assigning instance variables and local variables to a final static variable before use in the lambda.
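
Two of the practices above, separating supported operations into their own pipeline and keeping local variables out of the predicate, can be sketched as follows. This is an illustration under stated assumptions rather than the library's prescribed pattern: it uses the standard IntStream API so that it runs anywhere, it assumes DaxIntStream.of(data) would replace the first IntStream.of(data) on a SPARC system with streamoffload, and it uses a plain static field (a global variable, which the list above names as supported) because a final static field cannot be reassigned at run time. The class, method, and field names are hypothetical.

```java
import java.util.stream.IntStream;

public class PipelineSplitSketch {
    // Static field standing in for a value that would otherwise be a local variable.
    // Note: shared mutable state, so this sketch is not safe for concurrent callers.
    static int threshold;

    static long sumOfSquaresAbove(int[] data, int localThreshold) {
        // Workaround: copy the local value into a static field before the lambda reads it.
        threshold = localThreshold;

        // Pipeline 1: supported operations only (filter + toArray), comparison-only predicate.
        // Assumption: with streamoffload, DaxIntStream.of(data) replaces IntStream.of(data).
        int[] selected = IntStream.of(data).parallel()
                .filter(x -> x > threshold)
                .toArray();

        // Pipeline 2: the arithmetic stays in an ordinary stream over the filtered result.
        return IntStream.of(selected)
                .mapToLong(x -> (long) x * x)
                .sum();
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5};
        System.out.println(sumOfSquaresAbove(data, 2)); // 9 + 16 + 25 = 50
    }
}
```

The first pipeline contains nothing outside the supported list, so it is the part that would be eligible for offload; the squaring and summing run afterward on the smaller filtered array.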
