RSA Performance of Sun Fire T2000
By chichang_lin on Dec 06, 2005
You might have heard that UltraSPARC T1 has special hardware circuitry to accelerate certain crypto operations. In this blog I will show you what operations it is good at, and how good.
UltraSPARC T1 comes with a Modular Arithmetic Unit (MAU) in each core, which accelerates the expensive modular arithmetic operations found in public key crypto algorithms such as RSA, DSA and DH. In Solaris, the MAU is accessed through the Niagara Cryptographic Provider (NCP) within the Solaris Cryptographic Framework (SCF). Currently only RSA (up to 2048 bits) and DSA (up to 1024 bits) are supported by NCP.
On the Sun Fire T2000/T1000 with the UltraSPARC T1 processor, you can readily get a glimpse of the fast RSA operations performed by the MAU. Here is an example for the popular 1024-bit and 2048-bit RSA key sizes on a Sun Fire T2000 with an 8-core 1.2 GHz UltraSPARC T1:
    [watercloset]~> /usr/sfw/bin/openssl speed rsa1024 rsa2048 -engine pkcs11
    engine "pkcs11" set.
    Doing 1024 bit private rsa's for 10s: 10332 1024 bit private RSA's in 0.45s
    Doing 1024 bit public rsa's for 10s: 25550 1024 bit public RSA's in 0.89s
    Doing 2048 bit private rsa's for 10s: 2371 2048 bit private RSA's in 0.11s
    Doing 2048 bit public rsa's for 10s: 10308 2048 bit public RSA's in 0.37s
    OpenSSL 0.9.7d 17 Mar 2004
    built on: date not available
    options:bn(64,32) md2(int) rc4(ptr,char) des(ptr,risc1,16,long) aes(partial) blowfish(ptr)
    compiler: information not available
    available timing options: TIMES TIMEB HZ=100 [sysconf value]
    timing function used: times
                      sign    verify    sign/s verify/s
    rsa 1024 bits   0.0000s   0.0000s  22960.0  28707.9
    rsa 2048 bits   0.0000s   0.0000s  21554.5  27859.5

This invokes the OpenSSL speed test bundled with Solaris. The OpenSSL bundled with Solaris has a built-in PKCS#11 engine, which is necessary to access SCF (and thus the MAU); if you download the OpenSSL package and build it yourself, you will not be able to take advantage of the MAU because that build does not have the PKCS#11 engine.

Let's examine the performance numbers above. What we just did was to test the single-threaded RSA performance. Each RSA operation is run for 10 seconds. However, due to timing errors in the OpenSSL speed test in the single-threaded case, the throughput numbers at the bottom cannot be trusted when the operations are done in hardware: each one is the operation count divided by the small time the test measured (for example, 10332 / 0.45s), not by the 10 seconds the test actually ran. Dividing each count by 10 seconds instead, we get:
                      sign    verify    sign/s verify/s
    rsa 1024 bits   0.0000s   0.0000s   1033.2   2555.0
    rsa 2048 bits   0.0000s   0.0000s    237.1   1030.8
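The recalculation is just each operation count divided by the 10-second run time. Here is a minimal Python sketch of it, with the counts copied from the "Doing ... for 10s" lines in the output above; the names `counts` and `ops_per_second` are only illustrative:

```python
# Operation counts reported by the single-threaded `openssl speed` run.
RUN_SECONDS = 10.0  # each test ran for 10 seconds

counts = {
    ("rsa 1024", "sign"): 10332,
    ("rsa 1024", "verify"): 25550,
    ("rsa 2048", "sign"): 2371,
    ("rsa 2048", "verify"): 10308,
}


def ops_per_second(count, seconds=RUN_SECONDS):
    """Throughput based on the run time rather than the measured time."""
    return count / seconds


throughput = {key: ops_per_second(n) for key, n in counts.items()}

for (key_size, op), rate in throughput.items():
    print(f"{key_size} {op}: {rate:.1f} ops/s")
```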
Are these numbers good? They are actually very good. Take the 1024-bit RSA sign number, 1033.2 signs/s, and compare it with the number on a 3.6 GHz Xeon Dell box: 843.0 signs/s. UltraSPARC T1 offers over 20% more RSA sign performance at 1/3 of the clock rate, and uses less power. Note that this is a single-threaded test. As shown below, UltraSPARC T1 really dwarfs others in the multi-process test.
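For readers who want to check the comparison, the arithmetic can be sketched in a few lines of Python. The rates and clock speeds are the ones quoted in this post; the variable names are made up for the example:

```python
# 1024-bit RSA signs per second, single-threaded, as quoted in the text.
t1_sign_rate = 1033.2    # UltraSPARC T1 at 1.2 GHz, using the MAU
xeon_sign_rate = 843.0   # Xeon at 3.6 GHz

t1_clock_ghz = 1.2
xeon_clock_ghz = 3.6

# Fractional throughput gain of the T1 over the Xeon.
advantage = t1_sign_rate / xeon_sign_rate - 1.0

# T1 clock rate expressed as a fraction of the Xeon clock rate.
clock_ratio = t1_clock_ghz / xeon_clock_ghz

print(f"T1 sign throughput advantage: {advantage:.1%}")
print(f"T1 clock relative to Xeon: {clock_ratio:.2f}")
```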
Now, let's look at multi-process RSA performance. This is where UltraSPARC T1 really shines. Run the OpenSSL speed test again, this time with the "-multi" option to start multiple processes that perform RSA operations concurrently:
    [watercloset]~> /usr/sfw/bin/openssl speed rsa1024 rsa2048 -engine pkcs11 -multi 32
    [ intermediate output snipped..... ]
                      sign    verify    sign/s verify/s
    rsa 1024 bits   0.0001s   0.0000s  12871.3  45148.1
    rsa 2048 bits   0.0004s   0.0000s   2299.6  20425.3
We used 32 processes to fully saturate the 32 hardware threads on the UltraSPARC T1 and get the maximum throughput. Compare this with the results on a 2-way 3.6 GHz Xeon Dell PowerEdge 2850 (with hyperthreading on):
    wgs93-187:~ openssl speed rsa1024 rsa2048 -multi 4
    [ intermediate output snipped..... ]
                      sign    verify    sign/s verify/s
    rsa 1024 bits   0.0005s   0.0000s   1943.2  34632.7
    rsa 2048 bits   0.0031s   0.0001s    327.4  10891.9
For the 1024-bit RSA sign operation (as commonly used in web server SSL handshaking), the Sun Fire T2000 outperforms the Dell PowerEdge 2850 by a whopping 6x and more! UltraSPARC T1 also excels when compared with the Sun Crypto Accelerator 4000, which can do 8000 1024-bit RSA signs/s. And remember, all this comes with just the Sun Fire T2000/T1000 box; no extra crypto accelerator card is needed.
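The ratios behind this comparison can be worked out from the two "-multi" result tables. Below is a small Python sketch; the throughput figures are copied from this post, while the dictionary layout and the `speedups` helper are just one way to organize them:

```python
# (sign/s, verify/s) from the two -multi runs quoted in this post.
t2000 = {"rsa1024": (12871.3, 45148.1), "rsa2048": (2299.6, 20425.3)}
pe2850 = {"rsa1024": (1943.2, 34632.7), "rsa2048": (327.4, 10891.9)}

# 1024-bit RSA signs/s quoted for the Sun Crypto Accelerator 4000.
SCA4000_RSA1024_SIGN = 8000.0


def speedups(a, b):
    """Return {key size: (sign ratio, verify ratio)} of system a over b."""
    return {k: (a[k][0] / b[k][0], a[k][1] / b[k][1]) for k in a}


ratios = speedups(t2000, pe2850)
for key_size, (sign, verify) in ratios.items():
    print(f"{key_size}: sign {sign:.2f}x, verify {verify:.2f}x")

vs_card = t2000["rsa1024"][0] / SCA4000_RSA1024_SIGN
print(f"rsa1024 sign vs. Sun Crypto Accelerator 4000: {vs_card:.2f}x")
```

Running this also shows the verify ratios, which come out well below the sign ratios for both key sizes.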
In summary, if RSA/DSA operations consume a significant share of the CPU cycles in your application (e.g. HTTPS), the Sun Fire T2000/T1000 with UltraSPARC T1 will offer you the biggest bang for the buck with its per-core MAU and unique 8-core CMT architecture.