Intel AES-NI Optimization on Solaris
By danx on Nov 17, 2010
This AES encryption flash animation is useful to visualize AES encryption and understand how AES operates (Enrique Zabala, Universidad ORT, Uruguay).
Since 2001, AES has been widely adopted and is now a part of several data communication standards, such as WPA2 for Wi-Fi, IPsec for secure Internet transmission, SSH 2 for file and terminal access, and TLS for secure web connections.
To improve performance, Intel added 6 new instructions to the Intel64 instruction set, called AES-NI (for AES New Instructions). The AES-NI instructions first became available on the "Westmere" architecture microprocessors (some low-end Westmere chips for mobile/laptop use don't have AES-NI). Westmere processors are part of the Intel "Core" processor family and include the Xeon 5600 processors introduced in 2010. Oracle's Sun Fire X4170 M2 and X4270 M2 are two systems that use Xeon 5600 processors.
Previously, for OpenSolaris 2008.11/Solaris Nevada (build 93), I optimized AES by replacing the C implementation with hand-tuned assembly. The C code it replaced was the optimized reference implementation furnished by the authors of AES and first made available in Solaris 10. The assembly code I used was based on Dr. Brian Gladman's AES implementation, which was also faster than the OpenSSL assembly. For details see my previous blog post, Optimizing OpenSolaris With Open Source: AES (2008).
Intel AES-NI Instruction Set
Intel AES-NI consists of 6 instructions: AESENC, AESENCLAST, AESDEC, AESDECLAST, AESKEYGENASSIST, and AESIMC. AESENC performs one round of encryption (which consists of these steps: substitute bytes, shift rows, mix columns, and add (xor) round key). AESENCLAST performs the final encryption round, which doesn't mix columns. Similarly, AESDEC performs one round of decryption and AESDECLAST performs the final decryption round.
The two remaining instructions perform key expansion of the user key, formatting it for internal use by the algorithm. The AESKEYGENASSIST instruction helps generate the round keys used for encryption. The AESIMC instruction then converts the encryption round keys, with an operation called Inverse Mix Columns, to a form suitable for decryption.
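As a rough illustration of how the six instructions fit together, here is a minimal AES-128 single-block sketch written with the C compiler intrinsics from <wmmintrin.h> rather than the assembly used in Solaris. The function names (aes128_expand_key and so on) are hypothetical and are not taken from the Solaris or OpenSSL source. The per-function "target" attribute assumes GCC or Clang on x86, and the caller is assumed to have already checked that the CPU supports AES-NI.

```c
#include <stdint.h>
#include <string.h>
#include <emmintrin.h>   /* SSE2: __m128i, loads/stores, byte shifts, XOR */
#include <wmmintrin.h>   /* AES-NI intrinsics */

/* GCC/Clang extension (assumption): enable AES-NI per function, so no -maes flag is needed. */
#define AESNI __attribute__((target("aes,sse2")))

/* Fold one AESKEYGENASSIST result into the previous AES-128 round key. */
AESNI static __m128i next_round_key(__m128i key, __m128i assist)
{
    assist = _mm_shuffle_epi32(assist, 0xff);           /* broadcast word 3 */
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));   /* running XOR of the */
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));   /* four 32-bit words  */
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    return _mm_xor_si128(key, assist);
}

/* Expand a 16-byte user key into 11 encryption round keys (AESKEYGENASSIST).
 * The round constant must be a compile-time immediate, hence the unrolling. */
AESNI void aes128_expand_key(const uint8_t key[16], __m128i rk[11])
{
    rk[0]  = _mm_loadu_si128((const __m128i *)key);
    rk[1]  = next_round_key(rk[0], _mm_aeskeygenassist_si128(rk[0], 0x01));
    rk[2]  = next_round_key(rk[1], _mm_aeskeygenassist_si128(rk[1], 0x02));
    rk[3]  = next_round_key(rk[2], _mm_aeskeygenassist_si128(rk[2], 0x04));
    rk[4]  = next_round_key(rk[3], _mm_aeskeygenassist_si128(rk[3], 0x08));
    rk[5]  = next_round_key(rk[4], _mm_aeskeygenassist_si128(rk[4], 0x10));
    rk[6]  = next_round_key(rk[5], _mm_aeskeygenassist_si128(rk[5], 0x20));
    rk[7]  = next_round_key(rk[6], _mm_aeskeygenassist_si128(rk[6], 0x40));
    rk[8]  = next_round_key(rk[7], _mm_aeskeygenassist_si128(rk[7], 0x80));
    rk[9]  = next_round_key(rk[8], _mm_aeskeygenassist_si128(rk[8], 0x1b));
    rk[10] = next_round_key(rk[9], _mm_aeskeygenassist_si128(rk[9], 0x36));
}

/* Encrypt one block: initial xor, 9 x AESENC, then AESENCLAST. */
AESNI void aes128_encrypt_block(const __m128i rk[11],
                                const uint8_t in[16], uint8_t out[16])
{
    __m128i s = _mm_loadu_si128((const __m128i *)in);
    s = _mm_xor_si128(s, rk[0]);
    for (int i = 1; i < 10; i++)
        s = _mm_aesenc_si128(s, rk[i]);
    s = _mm_aesenclast_si128(s, rk[10]);
    _mm_storeu_si128((__m128i *)out, s);
}

/* Decrypt one block: round keys are applied in reverse order, and the
 * middle ones are first converted with AESIMC (Inverse Mix Columns). */
AESNI void aes128_decrypt_block(const __m128i rk[11],
                                const uint8_t in[16], uint8_t out[16])
{
    __m128i s = _mm_loadu_si128((const __m128i *)in);
    s = _mm_xor_si128(s, rk[10]);
    for (int i = 9; i >= 1; i--)
        s = _mm_aesdec_si128(s, _mm_aesimc_si128(rk[i]));
    s = _mm_aesdeclast_si128(s, rk[0]);
    _mm_storeu_si128((__m128i *)out, s);
}
```

With the FIPS-197 Appendix C.1 test vector (key 000102...0f, plaintext 00112233...ff), aes128_encrypt_block should produce 69c4e0d86a7b0430d8cdb78070b4c55a. A production implementation would convert the decryption round keys once at key-setup time instead of calling AESIMC for every block.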
Cache Attack prevention with AES-NI
The most highly optimized AES software implementations, including Dr. Gladman's, are vulnerable to timing attacks because they use large lookup tables. By pre-loading the microprocessor cache with the AES table entries and measuring the encryption time, one can find which table entries were accessed. This information could be used to help reveal the secret key (although doing so is still difficult). Current software mitigation techniques against cache attacks carry significant performance penalties. However, AES-NI prevents such attacks because AES-NI instruction latency is fixed and data-independent.
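To make the trade-off concrete, here is a small sketch contrasting a data-dependent table lookup with one possible software mitigation, a full-table scan. This is only an illustration of the general idea and not the mitigation used by any particular library; both function names are hypothetical, and a real implementation would also need to confirm that the compiler does not optimize the scan back into an indexed load.

```c
#include <stdint.h>

/* Fast lookup: which cache line is touched depends on the secret index,
 * which is what a cache-timing attacker tries to observe. */
uint8_t lookup_fast(const uint8_t table[256], uint8_t secret_index)
{
    return table[secret_index];
}

/* Mitigated lookup: every entry is read on every call, and the wanted
 * one is selected with a branch-free mask. The memory access pattern no
 * longer depends on the secret, at the cost of 256 reads instead of 1. */
uint8_t lookup_scan(const uint8_t table[256], uint8_t secret_index)
{
    uint8_t result = 0;
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t diff = i ^ secret_index;            /* 0 only when i matches */
        uint8_t mask = (uint8_t)((diff - 1) >> 8);   /* 0xFF on match, else 0x00 */
        result |= (uint8_t)(table[i] & mask);
    }
    return result;
}
```

Both functions return the same value; the second simply does far more work per lookup, which is the kind of penalty that fixed-latency AES-NI instructions avoid.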
Implementing AES-NI support in Solaris required a number of prerequisites, briefly:
- getisax(2) and the kernel's x86_feature/x86_featureset bit array needed to be expanded to detect and record the presence of Intel AES-NI instructions (CR 6750666). These bits are set by Solaris from the CPUID instruction.
- The Solaris amd64 assemblers, as(1) and fbe(1), needed to support the new AES-NI instructions (CR 6740663). The disassembler, dis(1), was also extended to display AES-NI (CR 6762031). Also, GNU binutils was updated to 2.19 to get the latest version of the GNU assembler, gas(1), with AES-NI support (CR 6644870).
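For readers who want to see what the CPUID check looks like, here is a minimal userland sketch. It is not the Solaris kernel code and does not use getisax(2); it assumes GCC or Clang, whose <cpuid.h> header provides the __get_cpuid() helper, and the two function names are hypothetical. AES-NI support is reported in bit 25 of ECX for CPUID leaf 1.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* CPUID leaf 1 reports AES-NI support in bit 25 of ECX. */
#define CPUID1_ECX_AESNI (UINT32_C(1) << 25)

/* Pure helper: test the AES-NI bit in a raw ECX value. */
bool ecx_has_aesni(uint32_t ecx)
{
    return (ecx & CPUID1_ECX_AESNI) != 0;
}

/* Query the running CPU; reports false on non-x86 builds. */
bool cpu_has_aesni(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;            /* leaf 1 not available */
    return ecx_has_aesni(ecx);
#else
    return false;
#endif
}
```

A program would typically call cpu_has_aesni() once at startup and pick the AES-NI or the table-based code path accordingly.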
Intel provided an implementation for OpenSSL that optimizes AES using assembly with the AES-NI instructions and the 128-bit %xmm registers, %xmm0-%xmm15. The Solaris implementation is basically the same as the one in OpenSSL, with minor differences in source: the function parameters are reordered and the OpenSSL structure types are replaced with those defined in Solaris. For userland threads, the kernel saves and restores the %xmm registers. However, these registers are not saved or restored when the kernel swaps kernel threads, so I added code to save and restore them on a 16-byte-aligned (0 mod 16) stack when necessary (that is, when Intel Control Register bit CR0.TS isn't set).
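The "0 mod 16" requirement comes from the aligned 128-bit store instructions, which need an address that is a multiple of 16. The real code does this in assembly on the kernel stack; the following is only a C sketch of the address arithmetic, with hypothetical names, showing how a save area can be over-allocated and then rounded up to a 16-byte boundary.

```c
#include <stddef.h>
#include <stdint.h>

#define XMM_SIZE  16   /* bytes per %xmm register */
#define XMM_ALIGN 16   /* alignment needed for aligned 128-bit stores */

/* Hypothetical sizing macro: room for n registers plus alignment slack. */
#define XMM_SAVE_BYTES(n) ((n) * XMM_SIZE + XMM_ALIGN - 1)

/* Round an address up to the next multiple of 16 ("0 mod 16"). */
static inline uint8_t *align_up_16(uint8_t *p)
{
    uintptr_t addr = (uintptr_t)p;
    size_t pad = (XMM_ALIGN - (addr % XMM_ALIGN)) % XMM_ALIGN;
    return p + pad;
}
```

For example, a buffer declared as uint8_t raw[XMM_SAVE_BYTES(2)] always contains a 16-byte-aligned region of 32 bytes starting at align_up_16(raw), wherever raw itself happens to land.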
Everyone likes pretty color charts, so here they are. I ran these on Solaris 11 running on a Piketon Platform system with an Intel Clarkdale processor, which is part of the Westmere processor architecture family. The "before" case is Solaris 11, unmodified. Keep in mind that the "before" case already has been optimized with hand-coded amd64 assembly. The "after" case has AES-NI instructions integrated into the Solaris Crypto Framework, which is the PKCS11 library in userland and the "aes" module in the kernel.
Userland library performance
The first chart compares AES128, AES192, and AES256 before and after the AES-NI optimization using the libpkcs11.so/libsoftcrypto.so libraries. The time shown is user time, in seconds, on a quiet system running an internal micro-benchmark, aesspeed (lower is better). Runtime improved by 79%, 74%, and 79%, respectively.
Solaris kernel performance
This chart shows Solaris kernel performance using kernel module "aes". This micro-benchmark runs AES128 and AES256 in 4 threads for 5000 iterations on 1024 bytes of data. Numbers are in millions of bytes per second (higher is better). Performance improved here by 26% and 56%, respectively.
Solaris kernel performance
Finally, another Solaris kernel micro-benchmark. This one is similar to the previous one, except it's running AES128 with 64 bytes of data on 1, 2, 3, and 4 threads. Performance improved by 50% for the 1-, 3-, and 4-thread cases. The 2-thread case looks like an outlier.
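The userland numbers are runtime improvements while the kernel numbers are throughput improvements, so the percentages are not directly comparable. The sketch below converts both to a speedup factor, assuming that "runtime improved by 79%" means the runtime dropped by 79% of its original value. The function names are hypothetical.

```c
/* Runtime reduced by pct percent: new = old * (1 - pct/100),
 * so speedup = old / new = 1 / (1 - pct/100). */
double speedup_from_runtime_reduction(double pct)
{
    return 1.0 / (1.0 - pct / 100.0);
}

/* Throughput increased by pct percent: speedup = 1 + pct/100. */
double speedup_from_throughput_gain(double pct)
{
    return 1.0 + pct / 100.0;
}
```

Under that reading, a 79% runtime reduction is roughly a 4.8x speedup and a 74% reduction roughly 3.8x, while the kernel throughput gains of 26% and 56% correspond to 1.26x and 1.56x.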
Availability in Solaris
This feature is available only for Solaris x86 64-bit, running on an Intel microprocessor that supports the AES-NI instruction set.
- I integrated AES-NI optimization in Solaris build snv_114 (see Change Request CR 6767618), so it's available in Oracle Solaris 11 Express 2010.11.
- I also back-ported AES-NI optimization to Solaris 10 10/09 (aka update 8).
- AES-NI is available by default in Java through Java Cryptography Extension (JCE)'s PKCS#11 extension. PKCS#11 is an industry standard interface supported by Solaris and used by default by JCE. For more information see Ramesh Nagappan's blog Java Cryptography on Intel Westmere: Solaris Advantage.
- Intel has a detailed white paper on AES-NI written by Shay Gueron, Intel Advanced Encryption Standard (AES) Instructions Set (2008, 2010).
- Jeffrey Rott of Intel has a brief overview of AES-NI, "Intel Advanced Encryption Standard Instructions (AES-NI)" (2010).
- The assembly source is in file $SRC/common/crypto/aes/amd64/aes_intel.s. This is common code for both the userland library, libsoftcrypto.so, and kernel module aes.
Disclaimer: the statements in this blog are my personal views, not those of my employer.