Wednesday Aug 26, 2009

Hot Chips -- next-generation crypto accelerator

My Hot Chips presentation on the next-generation UltraSPARC security accelerator can be found here.

Thursday Jun 04, 2009

Improved crypto scaling on T2+

Some great work by Krishna Yenduri has led to nice improvements in multi-socket bulk cipher performance on UltraSPARC T2+ processors. The improvements are available in the current build, snv_117. Krishna has performance data for scaling on a 4-socket T5440 system in his recent blog. Using the same kernel microbenchmark, the following plot shows the scaling on a dual-socket UltraSPARC T2 Plus system:

In this test, the requesting threads are scheduled by Solaris (rather than bound to specific cores), so Solaris will tend to distribute the threads evenly across the 16 cores in the system. This explains why you get this rapid increase in aggregate cryptographic throughput as the number of threads is increased. If the first 8 threads were bound to core 0, the second 8 to core 1 and so on, the scaling would be essentially linear as the cores are added.
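The difference between the two placements can be sketched with a little arithmetic. This is only an illustration, not the benchmark itself: it assumes 16 cores with one accelerator each and 8 hardware strands per core, and it idealizes the Solaris scheduler as a perfectly even spread. The class and method names are hypothetical.

```java
public class ThreadPlacement {
    static final int CORES = 16;            // dual-socket T2 Plus: 2 x 8 cores, one accelerator per core
    static final int STRANDS_PER_CORE = 8;  // hardware strands per core

    // Idealized even spread: each new thread lands on an unused core while one exists.
    static int busyAcceleratorsSpread(int threads) {
        return Math.min(threads, CORES);
    }

    // Threads bound in groups of 8: core 0 fills first, then core 1, and so on.
    static int busyAcceleratorsBound(int threads) {
        int cores = (threads + STRANDS_PER_CORE - 1) / STRANDS_PER_CORE;
        return Math.min(cores, CORES);
    }

    public static void main(String[] args) {
        for (int t : new int[] {1, 8, 16, 32, 64, 128}) {
            System.out.printf("%3d threads: spread=%2d bound=%2d accelerators in use%n",
                    t, busyAcceleratorsSpread(t), busyAcceleratorsBound(t));
        }
    }
}
```

Under these assumptions, 16 spread threads already reach all 16 accelerators, while 16 bound threads reach only 2. The count of accelerators in use is only a rough proxy for throughput, but it shows why the spread case climbs quickly and the bound case climbs steadily.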

So, a 2-socket T2+ system is delivering around 9GBytes/second. Not bad, given most other dual-socket systems can deliver at most around 2GB/s. Further, from the above it is apparent that we hit 9GB/s on the T2+ system with fewer than 50% of the HW strands being utilized.

Monday Jun 01, 2009

Hot Chips 21 -- Sun's next-generation HW security accelerator

I'll be presenting Sun's next-generation on-chip UltraSPARC security accelerator at this year's Hot Chips. The preliminary program can be found here.

Monday Sep 22, 2008

OpenSSH & T2 (contd)

Following on from the recent post about modifying OpenSSL to enable OpenSSH to take advantage of the UltraSPARC T2 crypto accelerators, I should also mention that it is possible to just use the PKCS11-engine-modified OpenSSL that Sun provides. You should use the --with-ssl-engine option when you configure OpenSSH. Further, it may just be my mistake, but I am having problems getting OpenSSH to use the PKCS11 engine unless I modify openssl-compat.c. In the unmodified code, ssh_SSLeay_add_all_algorithms() does:

/* Enable use of crypto hardware */

I changed this to:

ENGINE *pkcengine;
/* Enable use of crypto hardware */
pkcengine = ENGINE_by_id("pkcs11");

and things started working fine. I need to find some cycles to go back and see whether I had things misconfigured.

Thursday Sep 18, 2008

OpenSSH and T2

Following on from the last entry about recent enhancements to SunSSH that enable it to take advantage of the UltraSPARC T2 cryptographic accelerators: for those who use OpenSSH, it's also possible to leverage the T2 cryptographic accelerators. One simple way to achieve this, without modifying OpenSSH itself, is just to use a version of OpenSSL that has been modified to take advantage of the HW crypto. For the standard aes-128-cbc operating mode, the simplest way to achieve this is to modify aes_cbc.c to call libpkcs11. It's about a 10-line modification and can be applied to any version of OpenSSL. I will post the required code later today here.

Monday Sep 15, 2008

SSH (& scp etc) gets faster on T2 processors

Great to see from Jan's recent blog entry that SunSSH has been enhanced to take advantage of the UltraSPARC T2 hardware cryptographic accelerators – see here for more details.

I will spend some time playing with this later this week and report more generally on the performance benefits I observe.

Monday Jun 30, 2008

Crypto wiki

I've been gradually expanding the crypto wiki (which can be found here); adding additional info and some code examples. Please let me know what additional information would be useful to add, how the wiki could be improved, and even add your own thoughts....

Wednesday Jun 04, 2008

Crypto performance wiki

I've started a wiki to capture the more pertinent info on UltraSPARC crypto performance in a more organized form.

Thursday May 29, 2008

Using the UltraSPARC hardware cryptographic accelerators

A brief synopsis of how to leverage the UltraSPARC hardware cryptographic accelerators from your application.


Sun's UltraSPARC T1, T2 and T2Plus processors support high-performance hardware cryptographic accelerators on chip. These accelerators can greatly reduce the normally significant overheads associated with cryptography and secure operation.

On the UltraSPARC T1, T2 and T2Plus processors, there is one cryptographic accelerator per core, such that an 8-core processor provides 8 accelerators. The algorithms supported by these accelerators vary with processor and are illustrated in the following table:



UltraSPARC T2/T2Plus

Public-key algorithms: RSA, DSA, DH, ECC

Symmetric algorithms: RC4, DES, 3DES, AES-128/192/256

Cryptographic hashes: MD5, SHA-1, SHA-256

The public-key operations are performed by the accelerator's modular arithmetic unit (MAU), while symmetric cipher and cryptographic hash operations are performed by the accelerator's cipher and hash unit (CHU). The UltraSPARC T1 accelerators are composed of just a MAU, while the UltraSPARC T2/T2Plus accelerators have both a MAU and a CHU, which can operate in parallel. The accelerators operate at the core frequency (in parallel with the core) and are capable of delivering cryptographic performance that is typically an order of magnitude better than can be achieved on traditional processors in software, as is illustrated in the following table:


UltraSPARC T1 (1.2GHz): 20,000 sign operations/sec/chip (8-core)

UltraSPARC T2/T2Plus (1.4GHz): 37,000 sign operations/sec/chip (8-core)

44Gb/s/chip (8-core)

32Gb/s/chip (8-core)

This article describes how to code your application such that it can leverage these hardware accelerators. Many important applications will already leverage the UltraSPARC hardware accelerators, either directly out-of-the-box or with minimal configuration. These include the Sun Studio webserver, the Apache webserver, KSSL and IPsec, to name but a few. More details of how to configure these applications are provided in a Sun cryptographic blueprint [1].

Using the UltraSPARC hardware cryptographic accelerators

Access to the cryptographic accelerators is controlled by the Solaris Cryptographic Framework. For non-privileged applications, access is via the userland cryptographic framework (UCF), while for kernel modules (such as KSSL or IPsec) access is via the kernel cryptographic framework (KCF). This article focuses on the userland cryptographic framework.

The Userland Cryptographic Framework exposes a PKCS#11 [2] compliant API to non-privileged userland applications. Applications can interact directly with the UCF via the PKCS#11 interface, or indirectly via:

    • Java Cryptography Extension (JCE)

    • OpenSSL

    • Network Security Services (NSS)

The remainder of this article focuses on how to interact with the UCF directly and indirectly via JCE, OpenSSL and NSS.

Direct interaction with UCF

For PKCS#11 compliant applications, the framework's PKCS#11 library [located in /usr/lib] is the gateway to the UCF, and it's just a simple matter of linking against this library. Given the fairly widespread use of the PKCS#11 interface, especially with respect to traditional off-chip cryptographic accelerators (such as Sun's SCA6000 card), many applications already leverage PKCS#11. If an application doesn't already use the PKCS#11 interface, it is pretty straightforward to modify the application, with documents showing example implementations readily available [3].

Offload via OpenSSL

If the application uses OpenSSL for its cryptographic requirements (and many do), access to the accelerators can be achieved by using a version of OpenSSL that has been modified to support the PKCS#11 engine. A patched version of OpenSSL is supplied with Solaris 10 and is located in /usr/sfw/lib, allowing application compilation as follows:

cc -fast -I /usr/sfw/include -L /usr/sfw/lib -lcrypto aes_test.c -o aes_test.out

For operations that are to be offloaded, it is necessary to restrict use to the EVP_ functions and explicitly indicate the use of the PKCS11 engine; something like the following works for bulk ciphers (the process for RSA is similar):



/* Look up the PKCS#11 engine */
e = ENGINE_by_id("pkcs11");

/* Use only the EVP_ interface for the cipher operations */
EVP_CIPHER_CTX_init(&ctx);
EVP_EncryptInit(&ctx, EVP_des_cbc(), key, iv);
EVP_EncryptUpdate(.....);

PKCS#11 engine patches are available for a number of different versions of OpenSSL, if the version of OpenSSL that ships with Solaris isn't suitable [4].

Offload via JCE

For applications that utilize the Java Cryptography Extension (JCE), the application should simply be configured to utilize the SunPKCS11-Solaris provider. Accordingly, in order for applications to use the hardware accelerators automatically, it is just necessary to ensure that this provider is configured as the first provider in the security properties file under $JAVA_HOME/jre/lib/security/.

The SunPKCS11-Solaris provider can also be explicitly selected as follows:

String provider = "SunPKCS11-Solaris";

Cipher aescipher = Cipher.getInstance("AES/ECB/NoPadding", provider);
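To check where a provider sits in the preference order on a given JVM, the installed providers can be listed with the standard java.security API. A minimal sketch; the class and method names are hypothetical, and on a system without the SunPKCS11-Solaris provider the reported position is simply -1.

```java
import java.security.Provider;
import java.security.Security;

public class ProviderOrder {
    // Returns the 1-based preference position of the named provider, or -1 if it is not installed.
    static int positionOf(String name) {
        Provider[] providers = Security.getProviders(); // returned in preference order
        for (int i = 0; i < providers.length; i++) {
            if (providers[i].getName().equals(name)) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Provider[] providers = Security.getProviders();
        for (int i = 0; i < providers.length; i++) {
            System.out.println((i + 1) + ": " + providers[i].getName());
        }
        System.out.println("SunPKCS11-Solaris position: " + positionOf("SunPKCS11-Solaris"));
    }
}
```

A position of 1 means a Cipher.getInstance() call without an explicit provider will try that provider first.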

It should be noted that the SunPKCS11-Solaris provider currently only offloads a subset of the chaining modes supported by the hardware, so make sure that the chaining mode and padding mode are supported [5]. The modes supported by the hardware accelerators are illustrated in the following table:


Supported chaining modes

DES/3DES: ECB, CBC, CFB64

AES: ECB, CBC, CTR


Offload via NSS

In order for NSS to use the hardware cryptographic accelerators, the Solaris cryptographic framework should be added as a provider for NSS. This is achieved by modifying the appropriate NSS security databases. As an example, the following illustrates how Firefox can offload RSA operations to the hardware:

/usr/sfw/bin/modutil -dbdir /home/sprack/.mozilla/firefox/r5s548iw.default/ -add "Solaris Crypto Framework" -libfile /usr/lib/ -mechanisms RSA

/usr/sfw/bin/modutil -dbdir /home/sprack/.mozilla/firefox/r5s548iw.default/ -enable "Solaris Crypto Framework"

The use of the -mechanisms option indicates that the Solaris Cryptographic Framework should be the default provider for RSA operations [6].


When operations are submitted to the cryptographic framework, the framework will, as appropriate, route processing for these operations to the Niagara cryptographic provider (ncp) device driver for public-key operations, and to the Niagara-2 cryptographic provider (n2cp) device driver for symmetric cipher and cryptographic hash operations. These device drivers then perform the actual offload to the hardware accelerators and return the results to the framework. The interaction between these drivers and the cryptographic framework is controlled via cryptoadm.

kstat can be used to provide insight into the cryptographic operations that ncp and n2cp are handling, as follows:

kstat -m ncp | less

kstat -m n2cp | less

Additionally, cputrack can be utilized to determine the activity of the hardware accelerators directly (use cputrack -h to determine which counters to track).

Concluding thoughts

Cryptographic processing overheads are finding their way into an ever wider array of applications as security becomes ever more important. By providing on-chip hardware cryptographic accelerators, the UltraSPARC processors can vastly reduce these overheads, and in many situations enable respectable performance even when operating securely.

Via the Cryptographic Framework, Solaris provides a simple way for applications to leverage the benefits of the UltraSPARC hardware accelerators, while continuing to ensure application portability.


[1] Using the cryptographic accelerators in the UltraSPARC T1 and T2 processors

[2] PKCS #11: Cryptographic Token Interface Standard

[3] The Solaris cryptographic framework

[4] Miscellaneous OpenSSL Contributions

[5] Sun PKCS#11 Provider's Supported Algorithms

[6] Configuring Solaris Cryptographic Framework and Sun Java System Web Server 7 on Systems With UltraSPARC T1 Processors

Wednesday May 14, 2008

Java and hardware cryptographic acceleration

I've just been experimenting with the Java Cryptography Extension (JCE) on the UltraSPARC T2 processor, and it is important to remember which algorithms/modes/padding are supported for offload to the cryptographic hardware. While the UltraSPARC T2 processor supports most common chaining modes, offloads from JCE occur via the SunPKCS11-Solaris provider. The supported algorithms/modes/padding are somewhat more restrictive and are listed here. If an unsupported mode is specified, the operation will not be offloaded to the HW, but will be performed in software.

If the SunPKCS11-Solaris provider is explicitly selected:

String provider = "SunPKCS11-Solaris";

Cipher aescipher = Cipher.getInstance("AES/ECB/NoPadding", provider);

then an exception is thrown when an unsupported mode is requested.
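One way to handle that exception is to try the explicit provider first and fall back to the default provider list when the request is rejected. A minimal sketch using only the standard JCE API; the class and method names are hypothetical, and the fallback policy is an assumption rather than anything the provider mandates.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;

public class CipherSelect {
    // Tries the named provider first; falls back to the default providers
    // if the provider is missing or rejects the transformation.
    static Cipher getCipher(String transformation, String provider) {
        try {
            return Cipher.getInstance(transformation, provider);
        } catch (GeneralSecurityException notOffloaded) {
            // NoSuchProviderException, NoSuchAlgorithmException or NoSuchPaddingException
            try {
                return Cipher.getInstance(transformation);
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException("No provider supports " + transformation, e);
            }
        }
    }

    public static void main(String[] args) {
        Cipher c = getCipher("AES/ECB/NoPadding", "SunPKCS11-Solaris");
        System.out.println("Using provider: " + c.getProvider().getName());
    }
}
```

Printing the provider name makes it easy to see whether a given transformation ended up on the SunPKCS11-Solaris provider or on a software provider.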

Thursday May 08, 2008

T2 HW crypto and Java

As stated in an earlier entry, when running on an UltraSPARC T2 processor, applications using the Java Cryptography Extension (JCE) should (when applicable) automatically leverage the on-chip cryptographic accelerators.

Following a recent conversation with a Java Guru, you should check the following, if you experience problems:

Java on Solaris automatically sets SunPKCS11-Solaris (which calls into
the Solaris Crypto Framework) as the default security provider, so you
need to do nothing.

This begins from some version of J2SE 5.0. You can go look at the
${java.home}/lib/security/ file. There should be one line
look like:

An interesting article on using AES from Java can be found here.

Wednesday Apr 30, 2008

T2 HW crypto and SPECweb2005

I typically witter on about crypto performance at the microbenchmark level, but I was recently browsing the SPECweb05 results and I was impressed to see how the T2 performs, especially on the Banking workload, which is 100% HTTPS:



    • 1 x T2 [1.4GHz]

    • 2 x Quad-core Opteron Processor (2356) [2.3GHz]

    • 2 x Quad-core Xeon Processor X5460 [3.2GHz]

    • 4 x Quad-core Xeon Processor X7350 [3.0GHz]


Pretty Impressive! So a single-socket UltraSPARC T2 processor provides equivalent performance to 4-socket x64 systems containing Quad-core processors! On a per socket basis, T2 outperforms the competition by over 2.7X!

Now, this performance leadership is not all down to the HW crypto support; I'm sure the on-chip NICs and the abundance of threads help somewhat too. However, the cryptographic overheads associated with HTTPS are pretty significant: RSA ops for session establishment, and then RC4 and MD5 operations (these are the algorithms used for SPECweb2005 anyway) to secure and authenticate the subsequent traffic. In fact, looking at the following figures:

Figure 1: Relative costs in an HTTPS transaction for different file sizes. Referenced from here

Figure 2: Typical breakdown of overheads for SPECweb2005 banking

it is apparent that a significant proportion of the total application-level overheads are associated with cryptographic processing. It's therefore not surprising that providing HW support to accelerate cryptographic processing provides a significant performance advantage to the UltraSPARC T2 processor on SPECweb05 banking...

It's nice to see that the good microbenchmark numbers actually translate into significant gains at an application level....

Wednesday Apr 09, 2008

Crypto acceleration on multi-chip UltraSPARC T2 Plus systems

As I've mentioned in previous entries, Sun's latest UltraSPARC T2 Plus processors, which are launched today, continue to provide hardware acceleration for a wide variety of important cryptographic operations.

Acceleration is provided in an identical manner to the original UltraSPARC T2 processor - each core has its own hardware cryptographic accelerator that provides support for public-key operations (RSA, DSA, DH, ECC), bulk ciphers (RC4, DES, 3DES, AES-{128/192/256}) and secure hashes (MD5, SHA-1, SHA-256). For the bulk ciphers the currently supported chaining modes are ECB, CBC and CFB64 for DES/3DES and ECB, CBC, and CTR for AES.

The Sun SPARC Enterprise T5240 and T5140 Servers both support 2 UltraSPARC T2 Plus processors for a total of up to 16 cryptographic accelerators per system. Access to the accelerators is via the Solaris Cryptographic Framework (either directly, or indirectly via Java, OpenSSL or NSS) and the framework will automatically load balance requests across the 16 accelerators.

For both the T5240 and the T5140 the accelerators provide an aggregate throughput of up to 80Gb/s of AES-128 (enabling wire-speed encryption), and over 70,000 RSA-1024 sign operations/sec. And this performance can be delivered while the processor is largely idle and available for other processing, essentially eliminating the normally significant overheads associated with crypto processing (zero-cost security!).
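For a rough sense of scale, the aggregate figures above can be divided across the 16 accelerators. This is a back-of-the-envelope sketch that assumes the load is spread perfectly evenly and treats the quoted numbers as exact; the class and method names are hypothetical.

```java
public class PerAccelerator {
    // Even split of an aggregate rate across the accelerators.
    static double perAccelerator(double aggregate, int accelerators) {
        return aggregate / accelerators;
    }

    // Converts gigabits per second to gigabytes per second.
    static double gigabitsToGigabytes(double gigabitsPerSecond) {
        return gigabitsPerSecond / 8.0;
    }

    public static void main(String[] args) {
        int accelerators = 16; // 2 processors x 8 cores, one accelerator per core
        System.out.println("AES-128 per accelerator (Gb/s): " + perAccelerator(80.0, accelerators));
        System.out.println("AES-128 aggregate (GB/s): " + gigabitsToGigabytes(80.0));
        System.out.println("RSA-1024 sign ops/sec per accelerator: " + perAccelerator(70000.0, accelerators));
    }
}
```

Under these assumptions each accelerator accounts for about 5Gb/s of AES-128 and a little over 4,000 RSA-1024 sign operations per second.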

Thursday Oct 11, 2007

Detailed UltraSPARC T2 RSA performance

Interesting to note that:

1) The UltraSPARC can hit the HW peak accelerator performance with the majority of the threads idle, allowing other useful work to be conducted while the RSA operations are being performed.

UltraSPARC T2 crypto performance outstrips traditional processors

Comparing the crypto performance of a 2-socket quad-core system against a single-socket UltraSPARC T2 processor shows the very significant performance advantage this CMT processor has over more traditional processors.


Dr. Spracklen is a senior staff engineer in the Architecture Technology Group (Sun Microelectronics), which is focused on architecting and modeling next-generation SPARC processors. His current focus is hardware accelerators.

