SPARC T4 OpenSSL Engine

Cryptography is a major component of secure e-commerce.
Since cryptography is compute-intensive and adds a significant load to applications such as SSL web servers (https), crypto performance is an important factor.
Providing accelerated crypto hardware greatly helps these applications and should lead to wider adoption of cryptography, at lower cost, in e-commerce and other applications.

The SPARC T4 microprocessor adds new instructions that perform several cryptographic functions in hardware.
These instructions are used in a new built-in OpenSSL 1.0 engine available in Solaris 11, the t4 engine.
The new crypto instructions differ from the approach of previous generations of SPARC hardware, which have separate crypto processing units.

Previous generation: SPARC T3

The SPARC T3 provided on-chip cryptography in separate co-processors, 1 per core.
The co-processor supports MD5, SHA-1, SHA-256, SHA-512, CRC-32 digests, and DES, DES3, AES, Kasumi, Galois Field (for GCM), RSA-2048, and ECC cryptography.
The co-processor units have the advantage of off-loading compute-intensive crypto operations from the CPU.
However, this comes at the cost of high overhead in some cases.
These units are managed and used by the Solaris kernel, which is fine for kernel crypto
(such as ZFS disk crypto, Kernel SSL (KSSL), or IPsec network crypto).

However, for user-space crypto, such as Apache SSL or OpenSSL,
crypto operations have to go through kernel context switching and buffer copying, which takes thousands of cycles.
This startup overhead isn't noticeable for large amounts of data, but for small packets, say 256 bytes or 1 Kbyte, the startup overhead negates the crypto acceleration.
This is unfortunate because a lot of secure Internet communication consists of several small packets encrypted on-the-fly.
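
To make the small-packet problem concrete, here is a rough cost model in Python. It is only a sketch: the function name and all three parameters are hypothetical, and it assumes a fixed per-call overhead plus a constant per-byte cost, which real hardware only approximates.

```python
def break_even_bytes(overhead_cycles: float,
                     sw_cycles_per_byte: float,
                     hw_cycles_per_byte: float) -> float:
    """Request size (bytes) above which off-loading starts to pay off.

    Off-loading wins when
        overhead + n * hw_cycles_per_byte < n * sw_cycles_per_byte,
    that is, when n > overhead / (sw_cycles_per_byte - hw_cycles_per_byte).
    All inputs are illustrative parameters, not measured values.
    """
    saving_per_byte = sw_cycles_per_byte - hw_cycles_per_byte
    if saving_per_byte <= 0:
        raise ValueError("hardware must be cheaper per byte than software")
    return overhead_cycles / saving_per_byte
```

The larger the fixed overhead is relative to the per-byte saving, the larger a request must be before the co-processor helps at all.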

Here's a diagram that illustrates the data flow:

SPARC T3 data flow

Application —> Solaris Kernel —> Hyper-privileged call —> SPARC T3 on-chip crypto co-processor —> Solaris Kernel —> Application

The SPARC T4 approach

SPARC T4 microprocessor: 8 cores, 64 threads, and more

To solve the problem of kernel and buffer overhead, the SPARC microelectronics team designed new crypto instructions. These new instructions are non-privileged: any program can use them, with no kernel environment, root permissions, or special setup needed. Because the cryptography is performed directly in hardware instead of by hundreds or thousands of low-level instructions, crypto operations are much faster.

Here's a diagram that illustrates the data flow:

SPARC T4 data flow

Application —> SPARC T4 crypto instruction(s) —> Application

New T4 crypto instructions include:

aes_kexpand0, aes_kexpand1, aes_kexpand2
These perform "key expansion", expanding the 128-, 192-, or 256-bit user-provided key into a "key schedule" used internally during encryption and decryption. aes_kexpand2 is used just for AES-256. The other two aes_kexpand instructions are used for all three key lengths: AES-128, AES-192, and AES-256.

aes_eround01, aes_eround23, aes_eround01_l, aes_eround23_l
Used for AES encryption "rounds" or transformations. According to the AES standard (FIPS 197), the number of rounds used (10, 12, or 14) varies according to AES key length, since use of larger keys presumably indicates a desire for more robust encryption at the cost of more computation.

aes_dround01, aes_dround23, aes_dround01_l, aes_dround23_l
Used for AES decryption "rounds" in a similar way as with encryption.

Unlike the official reference implementation of AES, SPARC T4 hardware decryption is implemented so that it uses only the encryption key schedule; it doesn't need a separate decryption key schedule.
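
The relationship between key length, round count, and key schedule size described above comes straight from FIPS 197 and can be written down in a few lines of Python. This is an illustrative sketch only; the function names are made up for this example and have nothing to do with the t4 engine source.

```python
# AES parameters from FIPS 197: the state is always Nb = 4 words of 32 bits.
NB_WORDS = 4
WORD_BYTES = 4


def aes_rounds(key_bits: int) -> int:
    """Number of rounds Nr for a given AES key length (FIPS 197: Nr = Nk + 6)."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES keys are 128, 192, or 256 bits")
    nk = key_bits // 32          # key length in 32-bit words: 4, 6, or 8
    return nk + 6                # 10, 12, or 14 rounds


def key_schedule_bytes(key_bits: int) -> int:
    """Size of the expanded key schedule: Nb * (Nr + 1) words."""
    return NB_WORDS * (aes_rounds(key_bits) + 1) * WORD_BYTES


for bits in (128, 192, 256):
    print(bits, aes_rounds(bits), key_schedule_bytes(bits))
```

Because T4 decryption reuses the encryption key schedule, only one schedule of this size needs to be kept per key.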

Since the aes_eround and aes_dround instructions are implemented in hardware, they take a constant number of cycles and are not vulnerable to side-channel timing attacks that attempt to discern some bits of data from the time taken to encrypt or decrypt the data.

Other T4 crypto instructions, not used in the t4 engine, are for DES/DES3, Kasumi, Camellia, Montgomery multiply/square (for RSA Bignum), and CRC32c checksums.
T4 also has md5, sha1, and sha2 digest instructions.
All of the T4 crypto and digest instructions are intended for use in hand-coded assembly called by a high-level language (usually C).

Solaris and OpenSSL Software Optimizations

Having SPARC T4 hardware crypto instructions is all well and good, but how do we access it?
The software is available with Solaris 11 and is used automatically
if you are running Solaris on a SPARC T4.
It is used internally in the kernel through kernel crypto modules.
It is available in user space through the PKCS#11 library.

For OpenSSL on Solaris 11, T4 crypto is available directly with a new built-in OpenSSL 1.0 engine, called the "t4 engine."
This avoids the extra overhead of going through the Solaris OpenSSL pkcs11 engine,
which accesses Solaris crypto operations.
Instead, T4 assembly is included directly in the new t4 engine.
Rather than living in a separate library in /lib/openssl/engines/,
the t4 engine is "built-in", meaning it is included directly in OpenSSL's libcrypto.so.1.0.0 library.
This reduces overhead and removes the need to manually specify the t4 engine.
Since the engine is built-in (that is, in libcrypto.so.1.0.0), the openssl -engine command line flag or API call is not needed to access the engine—the t4 engine is used automatically on T4 hardware.

Here's a diagram that illustrates the data flow:

OpenSSL T4 engine data flow

OpenSSL —> OpenSSL t4 engine —> OpenSSL

Ciphers supported by OpenSSL t4 engine

The OpenSSL t4 engine auto-detects whether it's running on T4 hardware and, if so, uses T4 encryption instructions
for these ciphers:
AES-128-CBC, AES-192-CBC, AES-256-CBC,
AES-128-CFB128, AES-192-CFB128, AES-256-CFB128,
AES-128-CTR, AES-192-CTR, AES-256-CTR,
AES-128-ECB, AES-192-ECB, and AES-256-ECB.
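
If you want a script to check whether a given cipher is on this list, a small Python lookup like the one below would do. It is a sketch: the set is built from the names exactly as spelled in this article (OpenSSL's own cipher spellings may differ), and the helper name is hypothetical.

```python
# Cipher names as listed in this article: three key sizes times four modes.
# OpenSSL's own spellings may differ; check against your build.
T4_ENGINE_CIPHERS = frozenset(
    f"AES-{bits}-{mode}"
    for bits in (128, 192, 256)
    for mode in ("CBC", "CFB128", "CTR", "ECB")
)


def is_t4_engine_cipher(name: str) -> bool:
    """True if the cipher name (case-insensitive) is in the article's list."""
    return name.upper() in T4_ENGINE_CIPHERS
```

For example, is_t4_engine_cipher("aes-256-ctr") is true, while a DES or GCM cipher name is not in the set.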

Implementation of the OpenSSL t4 engine

The assembly language routines are the same as used in the Solaris kernel and userland PKCS#11 library, but copied to the OpenSSL t4 engine to reduce overhead.
A minimal amount of "glue" code in the t4 engine works between the OpenSSL libcrypto.so.1.0.0 library and the assembly functions.
The t4 engine code is separate from the base OpenSSL code and requires patching only a few source files to use it. That means OpenSSL can be more easily updated to future versions without losing the performance from the built-in t4 engine.

OpenSSL t4 engine Performance

Here are some graphs of t4 engine performance that I measured by running

openssl speed -evp $algorithm

where $algorithm is the crypto algorithm.
These use the 64-bit version of openssl on Solaris 11 with a T4 processor running at 2.85 GHz.
"Before" is openssl without the t4 engine and "after" is openssl with the t4 engine.
The numbers are MBytes/second.
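
If you want to repeat this kind of measurement, the following Python sketch runs the same command and turns the summary table into MBytes/second. It rests on some assumptions: that the summary printed by openssl speed consists of a header line starting with "type" followed by a line of values suffixed with "k" (thousands of bytes per second), and that one MByte is 10^6 bytes. The function names are made up for this example.

```python
import re
import subprocess


def parse_speed_summary(output: str) -> dict:
    """Return {block_size_bytes: MBytes_per_second} from the summary table."""
    lines = output.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("type") and i + 1 < len(lines):
            sizes = [int(n) for n in re.findall(r"(\d+) bytes", line)]
            # Values look like "12345.67k": thousands of bytes per second.
            rates = [float(v) for v in re.findall(r"\s([\d.]+)k", lines[i + 1])]
            return {size: rate / 1000.0 for size, rate in zip(sizes, rates)}
    raise ValueError("no summary table found in openssl speed output")


def speedup(before: dict, after: dict) -> dict:
    """Ratio after/before for every block size present in both runs."""
    return {size: after[size] / before[size] for size in before if size in after}


def run_speed(algorithm: str) -> dict:
    """Run 'openssl speed -evp <algorithm>' and parse its summary table."""
    result = subprocess.run(
        ["openssl", "speed", "-evp", algorithm],
        capture_output=True, text=True, check=True,
    )
    return parse_speed_summary(result.stdout)
```

Calling run_speed("aes-128-cbc") on two OpenSSL builds and passing both results to speedup() gives the per-block-size ratio between them.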

AES-CBC Performance

Chart showing SPARC T4 OpenSSL engine performance for AES CBC mode

(Higher is better; "before"=OpenSSL on T4 without T4 engine software, "after"=OpenSSL T4 engine)

As the chart above shows,
AES performance improves dramatically using the t4 engine. For AES-128-CBC, performance improves
~12x for 8 Kbytes of data and
~6x for 16 bytes of data.
Similar results occur with AES-192-CBC and AES-256-CBC and for ECB and CFB128 modes.

Verifying the OpenSSL t4 engine is present

The easiest way to determine if you are running the t4 engine is to type "openssl engine"
on the command line.
No configuration, API, or command line options are needed to use the OpenSSL t4 engine.
If you are running on SPARC T4 hardware with Solaris 11 FCS, you'll see this output indicating you are using the t4 engine:

sparc-t4 $ openssl engine
(t4) SPARC T4 engine support
(dynamic) Dynamic engine loading support
(pkcs11) PKCS #11 engine support

If you are running on SPARC without T4 hardware, you'll see this output indicating the hardware can't support the t4 engine:

sparc-t2 $ openssl engine
(t4) SPARC T4 engine support (no T4)
(dynamic) Dynamic engine loading support
(pkcs11) PKCS #11 engine support

For Solaris on AMD or Intel platforms or for older versions of Solaris OpenSSL software, you won't see any t4 engine line at all.
Third-party OpenSSL software (built yourself or from outside Oracle) will not have the t4 engine either.
Solaris 11 FCS comes with OpenSSL version 1.0.0e. The output of typing
"openssl version"
should be "OpenSSL 1.0.0e 6 Sep 2011".
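
The three cases above (engine usable, engine present but no T4 hardware, no t4 engine line at all) can be told apart in a script. Here is a minimal Python sketch that classifies the text printed by "openssl engine"; it assumes the output looks like the samples shown in this article, and the function name and return strings are invented for illustration.

```python
def t4_engine_status(engine_output: str) -> str:
    """Classify 'openssl engine' output as shown in this article.

    Returns "available" for a plain "(t4) ..." line, "no-t4-hardware"
    when that line ends in "(no T4)", and "absent" when there is no
    "(t4)" line at all.
    """
    for line in engine_output.splitlines():
        line = line.strip()
        if line.startswith("(t4)"):
            if line.endswith("(no T4)"):
                return "no-t4-hardware"
            return "available"
    return "absent"
```

To use it, capture the output of "openssl engine" (for example with subprocess.run) and pass the text to t4_engine_status().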

64- and 32-bit OpenSSL

OpenSSL comes in both 32- and 64-bit binaries.
The 64-bit executable is now the default, at /usr/bin/openssl, with the OpenSSL 64-bit libraries at
/lib/sparcv9/libcrypto.so.1.0.0 and libssl.so.1.0.0.
The 32-bit executable is at /usr/bin/sparcv7/openssl and
the libraries are at
/lib/libcrypto.so.1.0.0 and libssl.so.1.0.0.


The OpenSSL t4 engine is available in Solaris 11 for both the 64- and 32-bit versions of OpenSSL.
The OpenSSL t4 engine is not available with Solaris 10, although the SPARC T4 is supported and used in the latest Solaris 10 updates with the OpenSSL pkcs11 engine and the Solaris kernel.
You must have a processor that supports SPARC AES instructions; otherwise OpenSSL will fall back to the older, slower AES implementation without AES instructions.
Processors that support AES instructions are those in the SPARC T4 processor family. The easiest way to determine if the processor supports AES is with the isainfo -v command. Look for "sparcv9" and "aes" in the output:

$ isainfo -v
64-bit sparcv9 applications
crc32c cbcond pause mont mpmul sha512 sha256 sha1 md5 camellia kasumi
des aes ima hpc vis3 fmaf asi_blk_init vis2 vis popc
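
The same check is easy to script. The Python sketch below looks for "sparcv9" and "aes" as whole words in the text printed by isainfo -v; matching whole tokens rather than substrings avoids false hits on other capability names. The function name is hypothetical, and the sketch assumes the output format shown above.

```python
def supports_sparc_aes(isainfo_output: str) -> bool:
    """True if 'isainfo -v' output lists both sparcv9 and the aes capability."""
    tokens = isainfo_output.split()   # split on any whitespace, including newlines
    return "sparcv9" in tokens and "aes" in tokens
```

Feed it the captured output of isainfo -v; on a machine without the AES instructions the "aes" token is missing and the function returns False.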

Future Availability and Commitment

The new crypto instructions mentioned here are available on the SPARC T4 microprocessor.
However, these new instructions should be considered "evolving" in that there's no guarantee the instructions
will be present or unchanged in future SPARC processors.
That's to allow the possibility for optimizations and tuning in the future.
That's not a problem to those using Solaris software, including OpenSSL,
as future software will support future hardware.
But it is a consideration to those coding and using crypto instructions
directly in their code.
Applications should verify that T4 AES is available by using the getisax(2) system call and checking for the AV_SPARC_AES bit (or the similar bit for digests) defined in the header file /usr/include/sys/auxv_SPARC.h.


The Solaris 11 OpenSSL t4 engine provides another interface to access powerful SPARC T4 hardware cryptography,
in addition to Solaris userland PKCS#11 libraries and Solaris crypto kernel modules.

Update for Solaris 11.2 (May 2014)

With the Solaris 11.2 update, the OpenSSL t4 engine has been removed. That's because the upstream OpenSSL community has integrated T4 crypto instructions directly in the mainline OpenSSL crypto code, so a separate t4 engine is no longer needed or used. OpenSSL automatically detects if it's running on a SPARC processor with crypto instructions (which now includes not just the T4, but T5, M5, and M6 SPARC processors).
For more information see Misaki Miyashita's blog, "OpenSSL on Oracle Solaris 11.2".
