Using the UltraSPARC hardware cryptographic accelerators


A brief synopsis of how to leverage the UltraSPARC hardware cryptographic accelerators from your application.


Introduction


Sun's UltraSPARC T1, T2 and T2Plus processors include high-performance on-chip hardware cryptographic accelerators. These accelerators can substantially reduce the overheads normally associated with cryptography and secure operation.

On the UltraSPARC T1, T2 and T2Plus processors, there is one cryptographic accelerator per core, so an 8-core processor provides 8 accelerators. The algorithms supported by these accelerators vary by processor, as shown in the following table:


Algorithm                   UltraSPARC T1    UltraSPARC T2/T2Plus

Public-key algorithms
  RSA, DSA, DH              Yes              Yes
  ECC, ECDSA, ECDH          X                Yes

Symmetric algorithms
  RC4                       X                Yes
  DES, 3DES                 X                Yes
  AES-{128,192,256}         X                Yes

Cryptographic hashes
  MD5                       X                Yes
  SHA-1                     X                Yes
  SHA-224/256               X                Yes

(X = not supported)


The public-key operations are performed by the accelerator's modular arithmetic unit (MAU), while symmetric cipher and cryptographic hash operations are performed by the accelerator's cipher and hash unit (CHU). The UltraSPARC T1 accelerators consist of just an MAU, while the UltraSPARC T2/T2Plus accelerators have both an MAU and a CHU, which can operate in parallel. The accelerators run at the core frequency (in parallel with the core) and deliver cryptographic performance that is typically an order of magnitude better than can be achieved in software on traditional processors, as illustrated in the following table:


Algorithm      UltraSPARC T1 (1.2GHz)                      UltraSPARC T2/T2Plus (1.4GHz)

RSA-1024       20,000 sign operations/sec/chip (8-core)    37,000 sign operations/sec/chip (8-core)
AES-128-CBC    X                                           44Gb/s/chip (8-core)
SHA-1          X                                           32Gb/s/chip (8-core)
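Because there is one accelerator per core, the per-chip figures in the table above can be turned into a rough per-accelerator estimate by dividing by the core count. The following is a minimal sketch of that arithmetic. The class and method names are illustrative only, and it assumes the load is spread evenly across all 8 accelerators, which the table itself does not state.

```java
final class PerCoreThroughput {
    // Assumption: 8 cores per chip, one accelerator per core, load spread evenly.
    static final int CORES_PER_CHIP = 8;

    /** Divides a per-chip figure evenly across the per-core accelerators. */
    static double perCore(double perChip) {
        return perChip / CORES_PER_CHIP;
    }

    public static void main(String[] args) {
        // Per-chip values are taken directly from the performance table.
        System.out.println("T1 RSA-1024:     " + perCore(20000) + " sign ops/sec/core");
        System.out.println("T2 RSA-1024:     " + perCore(37000) + " sign ops/sec/core");
        System.out.println("T2 AES-128-CBC:  " + perCore(44) + " Gb/s/core");
        System.out.println("T2 SHA-1:        " + perCore(32) + " Gb/s/core");
    }
}
```

These averages are only a sizing aid: an application that drives a single accelerator from one thread should not expect the full per-chip figure.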


This article describes how to code your application so that it can leverage these hardware accelerators. Many important applications already leverage the UltraSPARC hardware accelerators, either out of the box or with minimal configuration. These include the Sun Java System Web Server, the Apache web server, KSSL and IPsec, to name but a few. Details of how to configure these applications are provided in a Sun cryptographic blueprint [1].


Using the UltraSPARC hardware cryptographic accelerators

Access to the cryptographic accelerators is controlled by the Solaris Cryptographic Framework. For non-privileged applications, access is via the userland cryptographic framework (UCF), while for kernel modules (such as KSSL or IPsec) access is via the kernel cryptographic framework (KCF). This article focuses on the userland cryptographic framework.

The userland cryptographic framework exposes a PKCS#11 [2] compliant API to non-privileged userland applications. Applications can interact directly with the UCF via the PKCS#11 interface, or indirectly via:

    • Java Cryptography Extension (JCE)

    • OpenSSL

    • Network Security Services (NSS)

The remainder of this article focuses on how to interact with the UCF directly and indirectly via JCE, OpenSSL and NSS.


Direct interaction with UCF

For PKCS#11 compliant applications, libpkcs11.so is the gateway to the UCF, and it's just a simple matter of linking against this library (located in /usr/lib). Given the fairly widespread use of the PKCS#11 interface, especially with traditional off-chip cryptographic accelerators (such as Sun's SCA6000 card), many applications already leverage PKCS#11. If an application doesn't already use the PKCS#11 interface, it is fairly straightforward to modify it, and documents showing example implementations are readily available [3].


Offload via OpenSSL

If the application uses OpenSSL for its cryptographic requirements (and many do), access to the accelerators can be achieved by using a version of OpenSSL that has been modified to support the PKCS#11 engine. A patched version of OpenSSL is supplied with Solaris 10 and is located in /usr/sfw/lib, allowing application compilation as follows:


cc -fast -I /usr/sfw/include -L /usr/sfw/lib -lcrypto aes_test.c -o aes_test.out


For operations that are to be offloaded, it is necessary to restrict use to the EVP_ functions and to explicitly select the PKCS#11 engine. Something like the following works for bulk ciphers (the process for RSA is similar):


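#include <openssl/engine.h>
#include <openssl/evp.h>

ENGINE *e;
EVP_CIPHER_CTX ctx;

ENGINE_load_builtin_engines();                   /* register the engines built into libcrypto */
e = ENGINE_by_id("pkcs11");                      /* returns NULL if the PKCS#11 engine is unavailable */
ENGINE_set_default_ciphers(e);                   /* route EVP cipher operations through the engine */

EVP_CIPHER_CTX_init(&ctx);
EVP_EncryptInit(&ctx, EVP_des_cbc(), key, iv);   /* key and iv are supplied by the application */
EVP_EncryptUpdate (.....);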


PKCS#11 engine patches are available from OpenSSL.org for a number of different versions of OpenSSL, if the version of OpenSSL that ships with Solaris isn't suitable [4].


Offload via JCE

For applications that use the Java Cryptography Extension (JCE), the application should simply be configured to use the SunPKCS11-Solaris provider. For applications to use the hardware accelerators automatically, it is only necessary to ensure that sun.security.pkcs11.SunPKCS11 is configured as the first provider in the $JAVA_HOME/jre/lib/security/java.security file.
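One way to confirm the provider ordering from inside a running JVM is to walk the installed provider list. The following is a minimal sketch; the class and method names are illustrative, and it assumes that the PKCS#11 provider of interest has a name beginning with "SunPKCS11".

```java
import java.security.Provider;
import java.security.Security;

final class ProviderOrder {
    /**
     * Returns the 1-based preference position of the first installed provider
     * whose name starts with the given prefix, or -1 if none is installed.
     */
    static int positionOf(String namePrefix) {
        // getProviders() returns the providers in preference order.
        Provider[] providers = Security.getProviders();
        for (int i = 0; i < providers.length; i++) {
            if (providers[i].getName().startsWith(namePrefix)) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int pos = positionOf("SunPKCS11");
        if (pos == -1) {
            System.out.println("No SunPKCS11 provider is installed in this JVM");
        } else {
            System.out.println("SunPKCS11 provider found at position " + pos);
        }
    }
}
```

A result of 1 indicates that the PKCS#11 provider is consulted first; any other value means another provider may satisfy a request before it.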


The SunPKCS11-Solaris provider can also be explicitly selected as follows:


String provider = "SunPKCS11-Solaris";

Cipher aescipher = Cipher.getInstance("AES/ECB/NoPadding", provider);


It should be noted that the SunPKCS11-Solaris provider currently only offloads a subset of the chaining modes supported by the hardware, so make sure that the chaining mode and padding mode are supported [5]. The modes supported by the hardware accelerators are illustrated in the following table:


Cipher      Supported chaining modes

AES         ECB, CBC, CTR
DES/3DES    ECB, CBC, CFB64
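The mode table and the explicit provider selection shown earlier in this section can be combined into a small helper. The following is a sketch rather than a definitive implementation: the class and method names are illustrative, the mode sets are copied from the table above (they describe the hardware, not necessarily what the provider offloads, so [5] remains the authority), and the fallback to the default provider is an assumption made so the code also runs on systems without SunPKCS11-Solaris.

```java
import java.security.GeneralSecurityException;
import java.security.NoSuchAlgorithmException;
import java.security.Security;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

final class OffloadCheck {
    static final String PROVIDER = "SunPKCS11-Solaris";

    // Hardware chaining modes from the table; "DESEDE" is the JCE name for 3DES.
    private static final Map<String, Set<String>> HW_MODES = new HashMap<String, Set<String>>();
    static {
        Set<String> desModes = new HashSet<String>(Arrays.asList("ECB", "CBC", "CFB64"));
        HW_MODES.put("AES", new HashSet<String>(Arrays.asList("ECB", "CBC", "CTR")));
        HW_MODES.put("DES", desModes);
        HW_MODES.put("DESEDE", desModes);
    }

    /** True if a "cipher/mode/padding" string names a chaining mode listed in the table. */
    static boolean hardwareSupportsMode(String transformation) {
        String[] parts = transformation.split("/");
        if (parts.length < 2) {
            return false; // mode not stated explicitly, so nothing to check
        }
        Set<String> modes = HW_MODES.get(parts[0].toUpperCase(Locale.ROOT));
        return modes != null && modes.contains(parts[1].toUpperCase(Locale.ROOT));
    }

    /** Prefers SunPKCS11-Solaris when it is installed; otherwise uses the default provider. */
    static Cipher getCipher(String transformation) throws GeneralSecurityException {
        if (Security.getProvider(PROVIDER) != null) {
            try {
                return Cipher.getInstance(transformation, PROVIDER);
            } catch (NoSuchAlgorithmException e) {
                // Provider is installed but does not offer this transformation.
            }
        }
        return Cipher.getInstance(transformation);
    }

    /** Encrypts one 16-byte block with AES/ECB/NoPadding using the preferred provider. */
    static byte[] encryptBlock(byte[] key, byte[] block) {
        try {
            Cipher cipher = getCipher("AES/ECB/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
            return cipher.doFinal(block);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES encryption failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("AES/CTR in table: " + hardwareSupportsMode("AES/CTR/NoPadding"));
        System.out.println("AES/GCM in table: " + hardwareSupportsMode("AES/GCM/NoPadding"));
        System.out.println("Ciphertext bytes: " + encryptBlock(new byte[16], new byte[16]).length);
    }
}
```

Checking the mode before requesting the cipher makes it obvious when a transformation will silently run in software instead of on the accelerator.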



Offload via NSS

For NSS to use the hardware cryptographic accelerators, the Solaris cryptographic framework must be added as a provider for NSS. This is achieved by modifying the appropriate NSS security databases. As an example, the following shows how Firefox can offload RSA operations to the hardware:


/usr/sfw/bin/modutil -dbdir /home/sprack/.mozilla/firefox/r5s548iw.default/ -add "Solaris Crypto Framework" -libfile /usr/lib/libpkcs11.so -mechanisms RSA

/usr/sfw/bin/modutil -dbdir /home/sprack/.mozilla/firefox/r5s548iw.default/ -enable "Solaris Crypto Framework"


The -mechanisms option indicates that the Solaris Cryptographic Framework should be the default provider for RSA operations [6].


Observability

When operations are submitted to the cryptographic framework, the framework routes them, as appropriate, to the Niagara cryptographic provider (ncp) device driver for public-key operations, and to the Niagara-2 cryptographic provider (n2cp) device driver for symmetric cipher and cryptographic hash operations. These device drivers then perform the actual offload to the hardware accelerators and return the results to the framework. The interaction between these drivers and the cryptographic framework is controlled via cryptoadm.

kstat can be used to provide insight into the cryptographic operations that ncp and n2cp are handling, as follows:


kstat -m ncp | less

kstat -m n2cp | less


Additionally, cputrack can be utilized to determine the activity of the hardware accelerators directly (use cputrack -h to determine which counters to track).


Concluding thoughts

Cryptographic processing overheads are finding their way into an ever wider array of applications as security becomes ever more important. By providing on-chip hardware cryptographic accelerators, the UltraSPARC processors can vastly reduce these overheads, and in many situations enable respectable performance even when operating securely.

Through the Cryptographic Framework, Solaris provides a simple way for applications to leverage the benefits of the UltraSPARC hardware accelerators while continuing to ensure application portability.



References


[1] Using the cryptographic accelerators in the UltraSPARC T1 and T2 processors

[2] PKCS #11: Cryptographic Token Interface Standard

[3] The Solaris cryptographic framework

[4] Miscellaneous OpenSSL Contributions

[5] Sun PKCS#11 Provider's Supported Algorithms

[6] Configuring Solaris Cryptographic Framework and Sun Java System Web Server 7 on Systems With UltraSPARC T1 Processors



Comments:

The numbers are really impressive :)

I'm quite familiar with Cell BE, and it is reported that AES (CTR) has a bandwidth of 15.7Gb/s on 8 cores (SPU) [1] and that RSA-1024 does 662.7 signatures per second on 16 cores [2].

I assume that it's not truly comparable, but nevertheless it illustrates quite well how good your results are.

[1] http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/3F88DA69A1C0AC40872570AB00570985
[2] http://eprint.iacr.org/2007/061.pdf

Posted by seb on May 29, 2008 at 09:53 PM PDT #

About

Dr. Spracklen is a senior staff engineer in the Architecture Technology Group (Sun Microelectronics), that is focused on architecting and modeling next-generation SPARC processors. His current focus is hardware accelerators.
