Wednesday Aug 26, 2009

Hot Chips -- next-generation crypto accelerator

My Hot Chips presentation on the next-generation UltraSPARC security accelerator can be found here.

Thursday Jun 04, 2009

Improved crypto scaling on T2+

Some great work by Krishna Yenduri has led to nice improvements in multi-socket bulk cipher performance on UltraSPARC T2+ processors. The improvements are available in the current build, snv_117. Krishna has performance data for scaling on a 4-socket T5440 system in his recent blog post. Using the same kernel microbenchmark, the following plot shows the scaling on a dual-socket UltraSPARC T2 Plus system:

In this test, the requesting threads are scheduled by Solaris (rather than bound to specific cores), so Solaris tends to distribute the threads evenly across the 16 cores in the system. This explains why aggregate cryptographic throughput increases so rapidly as the number of threads is increased. If the first 8 threads were bound to core 0, the second 8 to core 1, and so on, the scaling would be essentially linear as cores are added.
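The placement argument can be made concrete with a toy C model. It is an idealization, not the actual Solaris dispatcher policy: "spread" assumes one thread per core until every core is busy, and "packed" assumes threads 0-7 are bound to core 0, threads 8-15 to core 1, and so on (8 strands per core, 16 cores). Because each T2+ core has its own crypto unit, the number of active cores is a rough proxy for how many accelerators are in use. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Idealized model (an assumption, not the real Solaris dispatcher):
 * count the cores that have at least one busy thread. */

/* Scheduler-spread placement: one thread per core until every core is busy. */
static int active_cores_spread(int threads, int cores)
{
    assert(threads >= 0 && cores > 0);
    return threads < cores ? threads : cores;
}

/* Packed placement: threads 0-7 bound to core 0, 8-15 to core 1, etc. */
static int active_cores_packed(int threads, int cores, int strands_per_core)
{
    assert(threads >= 0 && cores > 0 && strands_per_core > 0);
    int used = (threads + strands_per_core - 1) / strands_per_core;
    return used < cores ? used : cores;
}

/* Print both models for 8, 16, 32, ... threads up to one per strand. */
void print_placement_table(int cores, int strands_per_core)
{
    printf("threads  spread  packed\n");
    for (int t = 8; t <= cores * strands_per_core; t *= 2)
        printf("%7d  %6d  %6d\n", t,
               active_cores_spread(t, cores),
               active_cores_packed(t, cores, strands_per_core));
}
```

With 16 threads, the spread model keeps all 16 cores (and their crypto units) busy, while the packed model uses only 2. That gap is why the unbound curve climbs so steeply before flattening out.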

So, a 2-socket T2+ system is delivering around 9GBytes/second. Not bad, given that most other dual-socket systems can deliver at most around 2GB/s. Further, from the above it is apparent that we hit 9GB/s on the T2+ system with less than 50% of the HW strands being utilized.
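The arithmetic behind the 50% figure and the implied per-core rate, using the dual-socket configuration described above (16 cores with 8 strands each), works out as follows:

```latex
\begin{aligned}
\text{strands} &= 2~\text{sockets} \times 8~\tfrac{\text{cores}}{\text{socket}} \times 8~\tfrac{\text{strands}}{\text{core}} = 128,
  \qquad 50\% \times 128 = 64 \\
\frac{9~\text{GB/s}}{16~\text{cores}} &\approx 0.56~\text{GB/s per core},
  \qquad \frac{9~\text{GB/s}}{2~\text{GB/s}} = 4.5
\end{aligned}
```

So the 9GB/s is reached with fewer than 64 active threads, at roughly 4.5X the 2GB/s quoted for other dual-socket systems.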

Monday Jun 01, 2009

Hot Chips 21 -- Sun's next-generation HW security accelerator

I'll be presenting Sun's next-generation on-chip UltraSPARC security accelerator at this year's Hot Chips. The preliminary program can be found here.

T2 acceleration of encrypt()

Following from my recent post mentioning the acceleration of encrypt/decrypt and OpenSSL enc using the T2 crypto HW (here), I ran some basic tests to see what kind of uptick was achieved:

Large file processing. File in /tmp

(1) openssl perf test (SW crypto)

timex /usr/sfw/bin/openssl enc -aes-128-cbc -k testpass -in /tmp/ -out /tmp/

(2) openssl perf test (HW crypto)

timex /usr/sfw/bin/openssl enc -aes-128-cbc -k testpass -engine pkcs11 -in /tmp/ -out /tmp/

(3) encrypt perf test (HW crypto)

timex encrypt -a aes -i /tmp/ -o /tmp/

Comparing (1) with (2), I saw about a 4X improvement in performance when using the T2 HW crypto. Comparing (1) with (3), I saw a 2.5X improvement. So, a fairly decent performance improvement! I looked into why encrypt is currently being outperformed by OpenSSL, and it appears to be due to buffer sizing: OpenSSL uses a buffer twice as large as the one encrypt uses to read(), encrypt, and write() the file data. I modified encrypt to use a 64KB buffer and saw its improvement over (1) increase to over 7X.
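Why buffer size matters can be sketched as a plain POSIX read()/transform/write() loop: every buffer-sized chunk is one round trip, and with the HW provider each round trip carries a fixed request overhead. This is a minimal sketch, not encrypt's actual source; process_stream, write_all and flip_bits are hypothetical names, and flip_bits is a placeholder that stands in for the AES-128-CBC call (it is not encryption).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Per-chunk transform; in encrypt this would be the cipher call. */
typedef void (*chunk_fn)(unsigned char *buf, size_t len);

/* Placeholder transform: NOT encryption, it just flips every bit. */
void flip_bits(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= 0xFF;
}

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const unsigned char *buf, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += w;
        len -= (size_t)w;
    }
    return 0;
}

/* read() -> transform -> write() loop. Returns the number of chunks
 * processed (one per round trip), or -1 on error. */
long process_stream(int in_fd, int out_fd, size_t bufsize, chunk_fn fn)
{
    assert(bufsize > 0);
    unsigned char *buf = malloc(bufsize);
    long chunks = 0;

    if (buf == NULL)
        return -1;
    for (;;) {
        ssize_t n = read(in_fd, buf, bufsize);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0) {               /* 0 means EOF, < 0 means error */
            if (n < 0)
                chunks = -1;
            break;
        }
        fn(buf, (size_t)n);
        if (write_all(out_fd, buf, (size_t)n) < 0) {
            chunks = -1;
            break;
        }
        chunks++;
    }
    free(buf);
    return chunks;
}
```

For example, a 256KB file takes 8 round trips with a 32KB buffer but only 4 with a 64KB buffer, so doubling the buffer halves the number of per-request overheads paid for the same amount of data.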

So, it looks like you can get a serious performance boost from the HW crypto when encrypting large files such as ZFS snapshots. In fact, for the above experiment, a simple “cp /tmp/ /tmp/” is less than 2X faster than using the enhanced version of encrypt, which also performs AES-128-CBC encryption of the data.

Thursday May 28, 2009

Securing data in the cloud

An interesting project to back up and securely store ZFS snapshots in the cloud can be found here. This is a great opportunity for the UltraSPARC T2 cryptographic hardware accelerators, which can significantly accelerate encryption of the ZFS snapshot. The shell script that automates the process uses the encrypt command, which automatically uses the UltraSPARC T2 cryptographic accelerators.

Thursday May 14, 2009

New security book

An interesting book on Solaris security can be found here. According to the blurb, it covers "the main security features in the Solaris operating system, including roles and privileges, cryptographic services, network security, auditing, and Solaris Trusted Extensions".

Tuesday Feb 17, 2009

T2 crypto paper

A paper on the UltraSPARC T2 crypto hardware and the Solaris cryptographic framework will be presented at the upcoming International Workshop on Multicore Software Engineering. Details on the workshop can be found here.

Wednesday Dec 10, 2008

T2 IPsec & crypto_taskq_threads

When running IPsec on the UltraSPARC T2, performance can frequently be improved by increasing the number of worker threads provided by the Solaris kernel crypto framework. The number of worker threads is controlled by the crypto_taskq_threads variable. This can be set in /etc/system or altered using mdb (n2cp should be unloaded and reloaded after changing via mdb).

UltraSPARC T2 IPsec performance when using the HW crypto accelerators is, not surprisingly, pretty impressive, especially if you use jumbo frames.

Tuesday Dec 09, 2008

Encryption please...

Yesterday's NPR On Point program discussed security; it can be found here. Mostly obvious stuff, but good to see some of these issues getting air time.

Wednesday Nov 19, 2008

async crypto performance from userland

Support for async crypto operations is not provided via the userland cryptographic framework. However, it is fairly simple to create a driver that a userland app can use to gain access to the kernel cryptographic framework and its async support. Performance is pretty good: if you look at how the framework is implemented, requests to the hardware are passed down to the kernel framework via /dev/crypto anyway. You could probably talk to /dev/crypto directly (see here), but there are also plenty of simple driver examples that can easily be enhanced to provide this functionality.

Tuesday Nov 18, 2008


I've been investigating IPsec performance on the UltraSPARC T2 and have found uperf (which can be found here) to be very helpful -- especially for multi-threaded stress testing. Currently, I've got two T5220 systems connected directly by 10GbE and I'm investigating peak IPsec performance...

Improved T2 single-thread crypto via async operations

I've been comparing the sync and async APIs to the kernel cryptographic framework and if you are interested in improving single-thread crypto performance on the UltraSPARC T2, async can be interesting:

[Plot: 8KB objects, crypto_encrypt_mac() operations (3DES, MD5); x-axis: # operations; y-axis: performance improvement (async perf / sync perf)]

So, if you can have multiple outstanding crypto operations per thread, the async API is a good way to go, potentially improving crypto performance by over 4.7X. If you only have one outstanding request per thread, sync delivers better performance because it avoids the Solaris interrupt overheads.

Monday Nov 17, 2008

Peak performance with AES counter mode -- (looking at libsrtp)

I've been playing with libsrtp recently, experimenting with enhancing the library to use the T2 HW crypto. Generally, SRTP uses AES counter mode. Looking at the libsrtp code, there is a keystream_buffer which is XORed with the packet stream. Once the buffer is emptied, it is refilled. Currently, the buffer is 128 bits, i.e. 1 block. This approach is not too inefficient when performing AES in SW, but it will lead to suboptimal performance when using crypto HW. There are typically some SW overheads associated with accessing the crypto hardware, so performance generally increases with the size of the object being processed. Accordingly, in libsrtp it is preferable to increase the size of the keystream_buffer considerably (e.g. to 8KB) so that refills are performed much less frequently.
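The effect of the buffer size can be sketched with a generic counter-mode keystream buffer in C. This is a minimal sketch under stated assumptions: keystream_t, ks_init, ks_xor and placeholder_fill are hypothetical names (not libsrtp's structures), placeholder_fill is a stand-in for AES encryption of consecutive counter blocks (it is not a cipher), and each fill call models one request to the crypto HW. The sketch models a single contiguous counter-mode stream and ignores how SRTP derives a fresh IV per packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK 16  /* AES block size in bytes */

/* Produces len bytes (a multiple of BLOCK) of keystream for consecutive
 * counter values, advancing *counter by one per block. */
typedef void (*keystream_fill_fn)(uint8_t *buf, size_t len, uint64_t *counter);

typedef struct {
    uint8_t *buf;            /* buffered keystream */
    size_t size;             /* buffer capacity, multiple of BLOCK */
    size_t pos;              /* next unused keystream byte */
    uint64_t counter;        /* next counter block to encrypt */
    size_t refills;          /* number of fill calls (HW requests) made */
    keystream_fill_fn fill;
} keystream_t;

/* Placeholder for AES_k(counter): NOT a cipher, illustration only. */
void placeholder_fill(uint8_t *buf, size_t len, uint64_t *counter)
{
    for (size_t off = 0; off < len; off += BLOCK) {
        uint64_t c = (*counter)++;
        for (int i = 0; i < BLOCK; i++)
            buf[off + i] = (uint8_t)(c * 131u + (uint64_t)i * 37u);
    }
}

int ks_init(keystream_t *ks, size_t size, keystream_fill_fn fill)
{
    assert(size > 0 && size % BLOCK == 0);
    ks->buf = malloc(size);
    if (ks->buf == NULL)
        return -1;
    ks->size = size;
    ks->pos = size;          /* start empty so the first use triggers a fill */
    ks->counter = 0;
    ks->refills = 0;
    ks->fill = fill;
    return 0;
}

void ks_free(keystream_t *ks)
{
    free(ks->buf);
    ks->buf = NULL;
}

/* XOR len bytes of data with the keystream, refilling the whole buffer in
 * one call whenever it runs dry. Encryption and decryption are identical. */
void ks_xor(keystream_t *ks, uint8_t *data, size_t len)
{
    while (len > 0) {
        if (ks->pos == ks->size) {
            ks->fill(ks->buf, ks->size, &ks->counter);
            ks->pos = 0;
            ks->refills++;
        }
        size_t n = ks->size - ks->pos;
        if (n > len)
            n = len;
        for (size_t i = 0; i < n; i++)
            data[i] ^= ks->buf[ks->pos + i];
        ks->pos += n;
        data += n;
        len -= n;
    }
}
```

Because the counter blocks are consumed in the same order either way, a 16-byte buffer and an 8KB buffer produce identical output; the only difference is the number of fills. For 16000 bytes processed in 160-byte pieces, that is 1000 fills with a one-block buffer versus 2 with an 8KB buffer.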

Monday Sep 22, 2008

OpenSSH & T2 (contd)

Following from the recent post discussing modifying OpenSSL to enable OpenSSH to take advantage of the UltraSPARC T2 crypto accelerators, I should also mention that it is possible simply to use the PKCS11-engine-enabled OpenSSL that Sun provides. You should use the --with-ssl-engine option when you configure OpenSSH. Further, it may just be my mistake, but I am having problems getting OpenSSH to use the PKCS11 engine unless I modify openssl-compat.c. In the unmodified code, ssh_SSLeay_add_all_algorithms() does:

/* Enable use of crypto hardware */

I changed this to:

ENGINE *pkcengine;
/* Enable use of crypto hardware */
pkcengine = ENGINE_by_id("pkcs11");

and things started working fine. I need to find some cycles to go back and see whether I had things misconfigured.

Thursday Sep 18, 2008

OpenSSH and T2

Following from the last entry about recent enhancements to SunSSH that enable it to take advantage of the UltraSPARC T2 cryptographic accelerators: for those who use OpenSSH, it's also possible to leverage the T2 cryptographic accelerators. One simple way to achieve this, without modifying OpenSSH itself, is to use a version of OpenSSL that has been modified to take advantage of the HW crypto. For the standard aes-128-cbc operating mode, the simplest way to achieve this is to modify aes_cbc.c to call libpkcs11. It's about a 10-line modification and can be applied to any version of OpenSSL. I will post the required code here later today.


Dr. Spracklen is a senior staff engineer in the Architecture Technology Group (Sun Microelectronics), which is focused on architecting and modeling next-generation SPARC processors. His current focus is hardware accelerators.

