By Darren Moffat-Oracle on Nov 22, 2011
Since we released hardware crypto acceleration for SPARC T4 and Intel AES-NI support we have had a common question come up: 'How do I test without the hardware crypto acceleration?'.
Initially this came up just for development use so developers can do unit testing on a machine that has hardware offload but still cover the code paths for a machine that doesn't (our integration and release testing would run on all supported types of hardware anyway). I've also seen it asked in a customer context too so that we can show that there is a performance gain from the hardware crypto acceleration, (not just the fact that SPARC T4 much faster performing processor than T3) and measure what it is for their application.
With SPARC T2/T3 we could easily disable the hardware crypto offload by running 'cryptoadm disable provider=n2cp/0'. We can't do that with SPARC T4 or with Intel AES-NI because in both of those classes of processor the encryption doesn't require a device driver instead it is unprivileged user land callable instructions.
Turns out there is away to do this by using features of the Solaris runtime loader (ld.so.1). First I need to expose a little bit of implementation detail about how the Solaris Cryptographic Framework is implemented in Solaris 11. One of the new Solaris 11 features of the linker/loader is the ability to have a single ELF object that has multiple different implementations of the same functions that are selected at runtime based on the capabilities of the machine. The alternate to this is having the application coded to call getisax() and make the choice itself. We use this functionality of the linker/loader when we build the userland libraries for the Solaris Cryptographic Framework (specifically libmd.so, and the unfortunately misnamed due to historical reasons libsoftcrypto.so)
The Solaris linker/loader allows control of a lot of its functionality via environment variables, we can use that to control the version of the cryptographic functions we run. To do this we simply export the LD_HWCAP environment variable with values that tell ld.so.1 to not select the HWCAP section matching certain features even if isainfo says they are present.
For SPARC T4 that would be:
export LD_HWCAP="-aes -des -md5 -sha256 -sha512 -mont -mpmul"
and for Intel systems with AES-NI support:
This will work for consumers of the Solaris Cryptographic Framework that use the Solaris PKCS#11 libraries or use libmd.so interfaces directly. It also works for the Oracle DB and Java JCE. However does not work for the default enabled OpenSSL "t4" or "aes-ni" engines (unfortunately) because they do explicit calls to getisax() themselves rather than using multiple ELF cap sections.
However we can still use OpenSSL to demonstrate this by explicitly selecting "pkcs11" engine using only a single process and thread.
$ openssl speed -engine pkcs11 -evp aes-128-cbc ... type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes aes-128-cbc 54170.81k 187416.00k 489725.70k 805445.63k 1018880.00k$ LD_HWCAP="-aes" openssl speed -engine pkcs11 -evp aes-128-cbc ... type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes aes-128-cbc 29376.37k 58328.13k 79031.55k 86738.26k 89191.77k
We can clearly see the difference this makes in the case where AES offload to the SPARC T4 was disabled. The "t4" engine is faster than the pkcs11 one because there is less overhead (again on a SPARC T4-1 using only a single process/thread - using -multi you will get even bigger numbers).
$ openssl speed -evp aes-128-cbc ... type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes aes-128-cbc 85526.61k 89298.84k 91970.30k 92662.78k 92842.67k
Note these above openssl speed output is not intended to show the actual performance of any particular benchmark just that there is a significant improvement from using hardware acceleration on SPARC T4. For cryptographic performance benchmarks see the http://blogs.oracle.com/BestPerf/ postings.