Wednesday Aug 26, 2009

Hot Chips -- next-generation crypto accelerator

My Hot Chips presentation on the next-generation UltraSPARC security accelerator can be found here.

Thursday Jun 04, 2009

Improved crypto scaling on T2+

Some great work by Krishna Yenduri has led to nice improvements in multi-socket bulk cipher performance on UltraSPARC T2+ processors. The improvements are available in the current build, snv_117. Krishna has performance data for scaling on a 4-socket T5440 system in his recent blog. Using the same kernel microbenchmark, the following plot shows the scaling on a dual-socket UltraSPARC T2 Plus system:

In this test, the requesting threads are scheduled by Solaris (rather than bound to specific cores), so Solaris tends to distribute the threads evenly across the 16 cores in the system. This explains the rapid increase in aggregate cryptographic throughput as the number of threads is increased. If the first 8 threads were bound to core 0, the second 8 to core 1, and so on, the scaling would instead be essentially linear as cores are added.

So, a 2-socket T2+ system is delivering around 9GB/s. Not bad, given that most other dual-socket systems can deliver at most around 2GB/s. Further, it is apparent from the above that we hit 9GB/s on the T2+ system with less than 50% of the HW strands being utilized.
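To see why the unbound case ramps up so quickly, here is a toy model in C. It assumes (for illustration only, not measured data) that aggregate throughput tracks the number of cores with at least one requesting thread, since each T2+ core's 8 strands share one crypto unit. The function names are hypothetical. For reference, the 2-socket system has 2 × 8 cores × 8 strands = 128 strands, so "less than 50%" means fewer than 64 requesting threads.

```c
/* Toy model (assumption for illustration): aggregate bulk-cipher
 * throughput tracks the number of cores with at least one requesting
 * thread, because each core's 8 strands share one crypto unit. */
#define STRANDS_PER_CORE 8

/* Packed placement: threads 0-7 bound to core 0, 8-15 to core 1, ... */
static int active_cores_packed(int threads, int cores)
{
    int used = (threads + STRANDS_PER_CORE - 1) / STRANDS_PER_CORE;
    return used < cores ? used : cores;
}

/* Spread placement: each new thread lands on an idle core while one is
 * available (an approximation of the Solaris behaviour described above). */
static int active_cores_spread(int threads, int cores)
{
    return threads < cores ? threads : cores;
}
```

With 16 cores and 8 threads, the spread placement keeps 8 crypto units busy while the packed placement uses only core 0; the packed case does not reach all 16 cores until 121 threads.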

Monday Jun 01, 2009

Hot Chips 21 -- Sun's next-generation HW security accelerator

I'll be presenting Sun's next-generation on-chip UltraSPARC security accelerator at this year's Hot Chips. The preliminary program can be found here.

T2 acceleration of encrypt()

Following on from my recent post mentioning the acceleration of encrypt/decrypt and OpenSSL enc using the T2 crypto HW (here), I ran some basic tests to see what kind of speedup was achieved:

Large-file processing (file in /tmp):

(1) openssl perf test (SW crypto)

timex /usr/sfw/bin/openssl enc -aes-128-cbc -k testpass -in /tmp/ -out /tmp/

(2) openssl perf test (HW crypto)

timex /usr/sfw/bin/openssl enc -aes-128-cbc -k testpass -engine pkcs11 -in /tmp/ -out /tmp/

(3) encrypt perf test (HW crypto)

timex encrypt -a aes -i /tmp/ -o /tmp/

Comparing (1) with (2), I saw about a 4X improvement in performance when using the T2 HW crypto. Comparing (1) with (3), I saw a 2.5X improvement. So, a fairly decent performance improvement! I looked into why encrypt is currently being outperformed by OpenSSL, and it appears to be due to buffer sizing: OpenSSL uses a buffer 2X larger than the one encrypt uses to read(), encrypt, and write() the file data. When I modified encrypt to use a 64KB buffer, its improvement over (1) increased to over 7X.
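The buffer-size effect comes from the shape of a read()/transform/write() loop like the sketch below. This is not the actual encrypt source: process_stream and toy_xor_transform are hypothetical names, and the XOR is only a placeholder for the real cipher call. Each pass through the loop pays a fixed cost (two system calls plus, with HW crypto, one request to the accelerator), so doubling the buffer halves the number of passes for a large file.

```c
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical per-chunk transform standing in for the cipher call. */
typedef void (*transform_fn)(unsigned char *buf, size_t len);

/* Toy placeholder, NOT encryption: XORs every byte with 0x5A. A real
 * AES-CBC transform would also carry the chaining state between chunks. */
static void toy_xor_transform(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= 0x5A;
}

/* Read in_fd through a bufsz-byte buffer, transform each chunk, and write
 * it to out_fd. Returns the number of read() calls that returned data
 * (i.e. loop passes), or -1 on error. */
static long process_stream(int in_fd, int out_fd, size_t bufsz,
                           transform_fn xform)
{
    unsigned char *buf = malloc(bufsz);
    long passes = 0;
    ssize_t n;

    if (buf == NULL)
        return -1;
    while ((n = read(in_fd, buf, bufsz)) > 0) {
        passes++;
        xform(buf, (size_t)n);
        /* write() may be partial, so loop until the whole chunk is out. */
        for (ssize_t off = 0; off < n;) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                free(buf);
                return -1;
            }
            off += w;
        }
    }
    free(buf);
    return n < 0 ? -1 : passes;
}
```

For a 1GB (2^30-byte) regular file, a 32KB buffer means 32768 passes through this loop, while a 64KB buffer means 16384.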

So, it looks like you can get a serious performance boost from the HW crypto when encrypting large files such as ZFS snapshots. In fact, for the above experiment, a simple “cp /tmp/ /tmp/” is less than 2X faster than using the enhanced version of encrypt to both copy and AES-128-CBC encrypt the data.

Thursday May 28, 2009

Offchip bandwidth enhancement using compression

My slides on using light-weight compression to enhance the available offchip bandwidth of future processors can now be found here. By way of introduction, we found that light-weight compression schemes can improve the effective offchip bandwidth by over 3X on a wide variety of important workloads.
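This post doesn't say which compression schemes were evaluated, so the sketch below uses an illustrative scheme of my own choosing (zero-word elimination on a 64-byte line: a 16-bit presence mask followed by only the non-zero 32-bit words); the function names are hypothetical. It also shows the arithmetic behind a bandwidth gain: effective bandwidth is the raw bandwidth scaled by the compression ratio, so a 3X gain corresponds to transferring data at about a third of its original size.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_WORDS 16 /* a 64-byte line viewed as 16 x 32-bit words */
#define LINE_BYTES (LINE_WORDS * sizeof(uint32_t))

/* Illustrative zero-word elimination: a 16-bit presence mask followed by
 * only the non-zero words. Returns the compressed size in bytes. */
static size_t zero_word_compressed_bytes(const uint32_t line[LINE_WORDS])
{
    size_t bytes = sizeof(uint16_t); /* presence mask */
    for (size_t i = 0; i < LINE_WORDS; i++)
        if (line[i] != 0)
            bytes += sizeof(uint32_t);
    return bytes;
}

/* Effective bandwidth = raw bandwidth x (uncompressed / compressed size). */
static double effective_bandwidth(double raw, size_t compressed_bytes)
{
    return raw * (double)LINE_BYTES / (double)compressed_bytes;
}
```

A line with only 4 non-zero words shrinks from 64 to 18 bytes (about 3.6X). A line with no zero words grows to 66 bytes, so a practical design would fall back to sending such lines uncompressed.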

Securing data in the cloud

An interesting project to back up and securely store ZFS snapshots in the cloud can be found here. This is a great opportunity for the UltraSPARC T2 cryptographic hardware accelerators, which can significantly accelerate encryption of the ZFS snapshot. The shell script that automates the process uses the encrypt command, which automatically uses the UltraSPARC T2 cryptographic accelerators.

Thursday May 14, 2009

New security book

Interesting book on Solaris security can be found here. According to the blurb, it covers "the main security features in the Solaris operating system, including roles and privileges, cryptographic services, network security, auditing, and Solaris Trusted Extensions".

Thursday Mar 19, 2009

Interesting collation of stuff

An interesting new book, the Developers Edge, brings together a good collection of technical articles harvested from the Sun Blogosphere. Naturally, it includes some information on T2 crypto. The book can be found here.

Monday Mar 16, 2009

Offchip bandwidth and compression presentation

I'm presenting on leveraging compression to increase the effective offchip bandwidth of multicore processors at this week's Multicore Expo in Santa Clara. Details here.

Tuesday Feb 17, 2009

T2 crypto paper

A paper on the UltraSPARC T2 crypto hardware and the Solaris cryptographic framework will be presented at the upcoming International Workshop on Multicore Software Engineering. Details on the workshop can be found here.

Wednesday Dec 10, 2008

T2 IPsec & crypto_taskq_threads

When running IPsec on the UltraSPARC T2, performance can frequently be improved by increasing the number of worker threads provided by the Solaris kernel crypto framework. The number of worker threads is controlled by the crypto_taskq_threads variable, which can be set in /etc/system or altered using mdb (if you change it via mdb, n2cp should be unloaded and reloaded afterwards).

UltraSPARC T2 IPsec performance when using the HW crypto accelerators is, not surprisingly, pretty impressive, especially if you use jumbo frames.

Tuesday Dec 09, 2008

Encryption please...

Yesterday's NPR On Point program discussed security; it can be found here. Mostly obvious stuff, but good to see some of these issues getting air time.

Wednesday Nov 19, 2008

async crypto performance from userland

Support for async crypto operations is not provided by the userland cryptographic framework. However, it is straightforward to write a simple driver that a userland app can use to access the kernel cryptographic framework and its async support. Performance is pretty good; if you look at how the framework is implemented, requests to the hardware are passed down to the kernel framework via /dev/crypto anyway. You could probably talk to /dev/crypto directly (see here), but there are also plenty of simple driver examples that can easily be enhanced to provide this functionality.

Tuesday Nov 18, 2008


I've been investigating IPsec performance on the UltraSPARC T2 and have found uperf (which can be found here) to be very helpful -- especially for multi-threaded stress testing. Currently, I've got two T5220 systems connected directly by 10GbE and I'm investigating peak IPsec performance...

Improved T2 single-thread crypto via async operations

I've been comparing the sync and async APIs to the kernel cryptographic framework and if you are interested in improving single-thread crypto performance on the UltraSPARC T2, async can be interesting:

Table: 8KB objects, crypto_encrypt_mac() operations (3DES, MD5). Columns: number of outstanding operations; performance improvement (async perf / sync perf).

So, if you have the opportunity to handle multiple outstanding crypto operations per thread, using the async API is a good way to go, potentially improving crypto performance by over 4.7X. If you only have one outstanding request per thread, then sync delivers better performance because it avoids the Solaris interrupt overheads.
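The pattern that makes async pay off is one thread keeping several requests in flight and blocking only once. The sketch below illustrates that pattern under stated assumptions: it is NOT the Solaris kernel crypto API, all names (engine_submit, run_batch, on_done, QMAX) are hypothetical, and a worker thread stands in for the crypto hardware, invoking a per-request completion callback.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical async interface: each request carries a completion
 * callback; a worker thread stands in for the crypto hardware. */
typedef void (*done_fn)(void *arg, int status);
struct request { int id; done_fn done; void *arg; };

#define QMAX 64

struct engine {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    struct request q[QMAX];
    size_t head, tail, count;
    int stop;
    pthread_t worker;
};

static void *engine_main(void *p)
{
    struct engine *e = p;
    for (;;) {
        pthread_mutex_lock(&e->lock);
        while (e->count == 0 && !e->stop)
            pthread_cond_wait(&e->nonempty, &e->lock);
        if (e->count == 0) { /* stop requested and queue drained */
            pthread_mutex_unlock(&e->lock);
            return NULL;
        }
        struct request r = e->q[e->head];
        e->head = (e->head + 1) % QMAX;
        e->count--;
        pthread_mutex_unlock(&e->lock);
        /* The cipher/MAC work on the object would happen here. */
        r.done(r.arg, 0); /* completion notification */
    }
}

static int engine_start(struct engine *e)
{
    e->head = e->tail = e->count = 0;
    e->stop = 0;
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->nonempty, NULL);
    return pthread_create(&e->worker, NULL, engine_main, e);
}

static void engine_stop(struct engine *e)
{
    pthread_mutex_lock(&e->lock);
    e->stop = 1;
    pthread_cond_signal(&e->nonempty);
    pthread_mutex_unlock(&e->lock);
    pthread_join(e->worker, NULL);
    pthread_cond_destroy(&e->nonempty);
    pthread_mutex_destroy(&e->lock);
}

/* Async submit: queues the request and returns without waiting.
 * Callers keep at most QMAX requests in flight, so the ring never fills. */
static void engine_submit(struct engine *e, struct request r)
{
    pthread_mutex_lock(&e->lock);
    e->q[e->tail] = r;
    e->tail = (e->tail + 1) % QMAX;
    e->count++;
    pthread_cond_signal(&e->nonempty);
    pthread_mutex_unlock(&e->lock);
}

struct batch { pthread_mutex_t lock; pthread_cond_t all_done; int completed, target; };

static void on_done(void *arg, int status)
{
    struct batch *b = arg;
    (void)status;
    pthread_mutex_lock(&b->lock);
    if (++b->completed == b->target)
        pthread_cond_signal(&b->all_done);
    pthread_mutex_unlock(&b->lock);
}

/* Submit n requests back to back, then block once until all complete.
 * Returns the number completed, or -1 if n is outside [0, QMAX]. */
static int run_batch(struct engine *e, int n)
{
    struct batch b = { .completed = 0, .target = n };
    if (n < 0 || n > QMAX)
        return -1;
    pthread_mutex_init(&b.lock, NULL);
    pthread_cond_init(&b.all_done, NULL);
    for (int i = 0; i < n; i++)
        engine_submit(e, (struct request){ i, on_done, &b });
    pthread_mutex_lock(&b.lock);
    while (b.completed < b.target)
        pthread_cond_wait(&b.all_done, &b.lock);
    pthread_mutex_unlock(&b.lock);
    pthread_cond_destroy(&b.all_done);
    pthread_mutex_destroy(&b.lock);
    return b.completed;
}
```

A sync caller would block once per request; here the submitting thread blocks once per batch. The per-request callback and wakeup are the software analogue of the completion overhead that makes sync the better choice when only one request is outstanding.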


Dr. Spracklen is a senior staff engineer in the Architecture Technology Group (Sun Microelectronics), which is focused on architecting and modeling next-generation SPARC processors. His current focus is hardware accelerators.

