Update 2016: everything here applies to subsequent Oracle SPARC processors since T4, including the SPARC M7/T7 line. Further optimizations and speedups are "under the hood."
To review, the SPARC T4 processor includes a crypto unit that supports several crypto instructions.
For hardware crypto these include 11 AES instructions, 4 xmul* instructions (for AES GCM carryless multiply), mont for Montgomery multiply (optimizes RSA and DSA), and 5 des_* instructions (for DES3).
For hardware hash algorithm optimization, the T4 has the md5, sha1, sha256, and sha512 instructions (the last two are also used for SHA-224 and SHA-384).
First off, it's easy to tell if the processor has the T4 crypto instructions: use the isainfo -v command and look for "sparcv9" and "aes" (and other hash and crypto algorithms) in the output:
$ isainfo -v
These instructions are non-privileged, so they are available for direct use in user-level applications and libraries (such as OpenSSL).
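The same check can be made from a program that has captured the text printed by isainfo -v. Here is a minimal sketch in C, assuming the output is already in a string; the helper name isa_has_feature() is hypothetical and not part of Solaris. It matches whole words only, so "aes" is not found inside a longer word.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if `feature` appears as a whole whitespace-delimited word
 * in `isainfo_output`, the captured text of "isainfo -v".
 * Hypothetical helper for illustration only.
 */
bool isa_has_feature(const char *isainfo_output, const char *feature)
{
    static const char delims[] = " \t\r\n";
    size_t want = strlen(feature);
    const char *p = isainfo_output;

    if (want == 0)
        return false;

    while (*p != '\0') {
        size_t len;

        p += strspn(p, delims);      /* skip whitespace before the word */
        len = strcspn(p, delims);    /* length of the next word */
        if (len == want && strncmp(p, feature, want) == 0)
            return true;
        p += len;
    }
    return false;
}
```

A caller would test for both words named above, for example isa_has_feature(text, "sparcv9") && isa_has_feature(text, "aes").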
Here is the "openssl speed -evp" command shown with the built-in t4 engine and with the pkcs11 engine.
Both run the T4 AES instructions, but the t4 engine is faster than the pkcs11 engine because it has less overhead (especially for smaller packet sizes):
t-4 $ /usr/bin/openssl version
Note: The "-evp" flag tells "openssl speed" to use the OpenSSL "EnVeloPe" API, which gives more accurate results.
That's because it tells OpenSSL to use the same API that external programs use when calling OpenSSL libcrypto functions, evp(3openssl).
OK, good enough, the isainfo(1) command shows the instructions are present, but how does one know if they are being used?
Chi-Chang Lin, who works on Oracle Solaris performance, wrote a DTrace script to show if T4 instructions are being executed.
To show the T4 instructions are being used, run the following DTrace script. Look for functions named "t4" and "yf" in the output. The OpenSSL T4 engine uses functions named "t4" and the PKCS#11 engine uses functions named "yf".
To demonstrate, I'll first run "openssl speed" with the built-in t4 engine, then with the pkcs11 engine. The performance numbers are not valid because the DTrace probes slow things down.
t-4 # dtrace -Z -n \
So, as shown above the OpenSSL built-in t4 engine executes t4_* functions (which are hand-coded assembly executing the T4 AES instructions) and the OpenSSL pkcs11 engine executes *yf* functions (which are similar assembly functions for OpenSSL).
The OpenSSL t4 engine is used automatically with the /usr/bin/openssl command line.
Chi-Chang Lin also points out that
if you're calling the OpenSSL API (libcrypto.so) from a program, you must call
ENGINE_load_builtin_engines(), otherwise the built-in t4 engine will not be loaded
(do not call ENGINE_set_default()).
The benchmark program "openssl speed -evp" also calls ENGINE_load_builtin_engines(), so one does not need to specify the "-engine" option for built-in OpenSSL engines such as the t4 engine.
To use the t4 engine optimization, use the OpenSSL EnVeloPe (EVP) API, evp(3openssl), and the header openssl/evp.h.
For crypto, use EVP_EncryptInit(), EVP_EncryptUpdate(), EVP_EncryptFinal() (or the matching EVP_Decrypt*() functions) and EVP_CIPHER_CTX_cleanup().
For digest, use EVP_DigestInit(), EVP_DigestUpdate(), EVP_DigestFinal() and EVP_MD_CTX_cleanup().
These functions automatically call the t4 engine functions when present.
Your best bet for hash algorithms is to use the Solaris native libmd(3LIB) library and its header files md5.h, sha1.h, and sha2.h. This library automatically uses the T4 hash and crypto instructions when running on a T4 and is more efficient (has less overhead) than either the OpenSSL EVP or the PKCS#11 libraries.
The OpenSSL t4 engine is available with Solaris 11 and 11.1. For Solaris 10 08/11 (U10), you need to use the OpenSSL pkcs11 engine. The OpenSSL t4 engine is distributed only with the version of OpenSSL distributed with Solaris (and not third-party or self-compiled versions of OpenSSL).
The OpenSSL t4 engine implements the AES cipher for Solaris 11, released 11/2011.
For Solaris 11.1, released 11/2012, the OpenSSL t4 engine adds optimization for the MD5, SHA-1, and SHA-2 hash algorithms, and DES-3.
Although the T4 processor has Camellia and Kasumi block cipher instructions, these are not implemented in the OpenSSL t4 engine.
The following charts may help view availability of optimizations.
The first chart shows what's available with Solaris CLIs and APIs, the second chart shows what's available in Solaris OpenSSL.
This table shows Solaris native CLI and API support.
As such, they are all available with the OpenSSL pkcs11 engine.
CLIs: "openssl -engine pkcs11", encrypt(1), decrypt(1), mac(1), digest(1), md5sum(1), sha1sum(1), sha224sum(1), sha256sum(1), sha384sum(1), sha512sum(1)
APIs: PKCS#11 library libpkcs11(3LIB) (includes OpenSSL pkcs11 engine), libmd(3LIB), and Solaris kernel modules
|Algorithm||Solaris 10 08/11||Solaris 11||Solaris 11.1|
|AES-ECB, AES-CBC, AES-CTR, AES-CFB128||X||X||X|
|DES3-ECB, DES3-CBC, DES2-ECB, DES2-CBC, DES-ECB, DES-CBC||X||X||X|
|bignum Montgomery multiply (RSA, DSA)||X||X||X|
|MD5, SHA-1, SHA-256, SHA-384, SHA-512||X||X||X|
Update (2014): the T4 engine has been removed in S11.2 and replaced with a native T4 optimization in the base OpenSSL code.
This table is for the Solaris OpenSSL built-in t4 engine.
Algorithms listed above are also available through the OpenSSL pkcs11 engine.
APIs: openssl(5), engine(3openssl), evp(3openssl), libcrypto crypto(3openssl)
|Algorithm||Solaris 11||Solaris 11|
|AES-ECB, AES-CBC, AES-CTR, AES-CFB128||X||X||X|
|DES3-ECB, DES3-CBC, DES-ECB, DES-CBC||X|
|bignum Montgomery multiply (RSA, DSA)||X|
|MD5, SHA-1, SHA-256, SHA-384, SHA-512||X||X|
Most of the T4 assembly code that called the new T4 crypto instructions
was written by Ferenc Rákóczi of the Solaris Security group,
with assistance from others.
You can download the Solaris source for this and other parts of Solaris as
a few zip files at the
Oracle Download website.
The relevant source files are generally under directories
Solaris 11 binaries
(including updates) are available from the
Oracle Solaris 11 download website.
OpenSSL t4 engine
The source for the OpenSSL t4 engine,
which is based on the Solaris source above, is viewable through the
OpenGrok source code browser in directory
You can download the source from the same website or through Mercurial source code management, hg(1).
Oracle Solaris with SPARC T4 provides a rich set of accelerated cryptographic and hash algorithms.
Using the latest update, Solaris 11.1, provides the best set of optimized algorithms,
but alternatives are often available, sometimes slightly slower,
for releases back to Solaris 10 08/11 (U10).
Solaris 11.3 Update. Starting with Solaris 11.3, the OpenSSL t4 engine was removed and replaced with native OpenSSL code optimized for SPARC T4+. Unfortunately, the native code doesn't use a consistent function name for all its T4 code, but the SPARC AES functions have a _sparcv9_ prefix that can be tested. I updated the example above to probe for _sparcv9_*() functions as well.
Eric Reid gives another method of detecting SPARC crypto instructions, using cpustat(1M) and cputrack(1) sampling. This method detects use not only by the Solaris Cryptographic Framework and OpenSSL, as described above, but also direct use of SPARC crypto instructions by any other software running in Solaris user space. It does not track use of SPARC crypto instructions by the Solaris kernel. It also counts floating-point instruction use, if any.
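To make the naming patterns concrete, here is a small C sketch that sorts a function name taken from the DTrace output into the three groups mentioned in this post: t4_* for the OpenSSL t4 engine, *yf* for the pkcs11 engine, and the _sparcv9_ prefix for the native code. The enum and the function name classify_function() are hypothetical. The matching is deliberately crude (a substring test for "yf"), so treat the result as a hint rather than proof.

```c
#include <string.h>

enum crypto_impl {
    IMPL_UNKNOWN,         /* none of the patterns matched */
    IMPL_T4_ENGINE,       /* OpenSSL built-in t4 engine: t4_* */
    IMPL_PKCS11,          /* OpenSSL pkcs11 engine path: *yf* */
    IMPL_NATIVE_SPARCV9   /* native OpenSSL code: _sparcv9_* */
};

/*
 * Classify a bare function name (no module prefix) by the naming
 * patterns described in this post. Hypothetical helper.
 */
enum crypto_impl classify_function(const char *name)
{
    /* Check the prefixes first so the loose "yf" test cannot shadow them. */
    if (strncmp(name, "_sparcv9_", 9) == 0)
        return IMPL_NATIVE_SPARCV9;
    if (strncmp(name, "t4_", 3) == 0)
        return IMPL_T4_ENGINE;
    if (strstr(name, "yf") != NULL)
        return IMPL_PKCS11;
    return IMPL_UNKNOWN;
}
```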
See also these earlier blogs.