High-performance SHA-1

In my recent CommunityOne Microparallelism presentation, one of the case studies discusses how to convert high-ILP code written for superscalar processors into TLP implementations for CMT processors. The case study uses the SPARC implementation of SHA-1 that I wrote several years ago. The code, tuned for sun4u processors, can be found in OpenSolaris. The message expansion portion of the SHA-1 computation is performed in parallel with the compression function portion using the VIS instructions. The SIMD nature of the VIS instructions is not leveraged, merely the fact that they allow integer operations to be performed on the FP pipelines. As a result, the IPC on an UltraSPARC IV+ processor increases from around 2 to almost 4, improving performance by over 1.7X.
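To make the two portions concrete, here is a minimal portable C sketch of SHA-1 split into message expansion and compression. It is not the tuned sun4u code and uses no VIS instructions; the function names `sha1_expand` and `sha1_compress` are hypothetical, chosen for illustration. The point is that the expansion touches only the schedule `W`, while the compression rounds only read `W` and update the five-word state, so the two instruction streams are largely independent.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned n) { return (x << n) | (x >> (32 - n)); }

/* Message expansion: turn one 64-byte block into the 80-word schedule W.
 * (Hypothetical helper name; reads the block as big-endian words.) */
void sha1_expand(const uint8_t block[64], uint32_t W[80]) {
    for (int t = 0; t < 16; t++)
        W[t] = ((uint32_t)block[4 * t] << 24) | ((uint32_t)block[4 * t + 1] << 16) |
               ((uint32_t)block[4 * t + 2] << 8) | (uint32_t)block[4 * t + 3];
    for (int t = 16; t < 80; t++)
        W[t] = rotl32(W[t - 3] ^ W[t - 8] ^ W[t - 14] ^ W[t - 16], 1);
}

/* Compression: 80 rounds that only read W and update the state h.
 * (Hypothetical helper name.) */
void sha1_compress(uint32_t h[5], const uint32_t W[80]) {
    uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
    for (int t = 0; t < 80; t++) {
        uint32_t f, k;
        if (t < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
        else if (t < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
        else if (t < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
        uint32_t tmp = rotl32(a, 5) + f + e + k + W[t];
        e = d; d = c; c = rotl32(b, 30); b = a; a = tmp;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}
```

Round t of the compression needs only `W[t]`, so the expansion can run a few words ahead of it. That is the independence the VIS version exploits by placing the two streams on different pipelines.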


On CMT processors such as the T2, this approach doesn't deliver optimal performance. However, given their low inter-thread synchronization costs, one can instead consider performing the two portions of the SHA-1 computation on two threads.
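A rough sketch of the two-thread idea in portable C follows. It is an illustration under several assumptions, not the implementation from the presentation: it uses POSIX threads and C11 atomics rather than anything SPARC-specific, it hands work over one whole block at a time through a two-slot buffer (a tuned version might synchronize at a different granularity), it expects the caller to supply an already padded message, and the names `sha1_pipe`, `expand_thread` and `sha1_two_threads` are hypothetical.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 2 /* double buffer of expanded message schedules */

typedef struct {
    const uint8_t *data;    /* already padded input, nblocks * 64 bytes */
    size_t nblocks;
    uint32_t W[SLOTS][80];  /* schedules handed from thread 1 to thread 2 */
    atomic_size_t produced; /* number of blocks expanded so far */
    atomic_size_t consumed; /* number of blocks compressed so far */
} sha1_pipe;

static uint32_t rol(uint32_t x, unsigned n) { return (x << n) | (x >> (32 - n)); }

/* Thread 1: message expansion only. */
static void *expand_thread(void *arg) {
    sha1_pipe *p = arg;
    for (size_t i = 0; i < p->nblocks; i++) {
        /* Wait until the slot about to be overwritten has been compressed. */
        while (i - atomic_load_explicit(&p->consumed, memory_order_acquire) >= SLOTS)
            sched_yield();
        const uint8_t *b = p->data + 64 * i;
        uint32_t *W = p->W[i % SLOTS];
        for (int t = 0; t < 16; t++)
            W[t] = ((uint32_t)b[4 * t] << 24) | ((uint32_t)b[4 * t + 1] << 16) |
                   ((uint32_t)b[4 * t + 2] << 8) | (uint32_t)b[4 * t + 3];
        for (int t = 16; t < 80; t++)
            W[t] = rol(W[t - 3] ^ W[t - 8] ^ W[t - 14] ^ W[t - 16], 1);
        atomic_store_explicit(&p->produced, i + 1, memory_order_release);
    }
    return NULL;
}

/* Thread 2 (the caller): compression only. Returns 0 on success. */
int sha1_two_threads(const uint8_t *padded, size_t nblocks, uint32_t h[5]) {
    sha1_pipe p = { .data = padded, .nblocks = nblocks };
    atomic_init(&p.produced, 0);
    atomic_init(&p.consumed, 0);
    pthread_t tid;
    if (pthread_create(&tid, NULL, expand_thread, &p) != 0)
        return -1;
    h[0] = 0x67452301u; h[1] = 0xEFCDAB89u; h[2] = 0x98BADCFEu;
    h[3] = 0x10325476u; h[4] = 0xC3D2E1F0u;
    for (size_t i = 0; i < nblocks; i++) {
        /* Wait until the schedule for block i is ready. */
        while (atomic_load_explicit(&p.produced, memory_order_acquire) <= i)
            sched_yield();
        const uint32_t *W = p.W[i % SLOTS];
        uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
        for (int t = 0; t < 80; t++) {
            uint32_t f, k;
            if (t < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
            else if (t < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
            else if (t < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
            else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
            uint32_t tmp = rol(a, 5) + f + e + k + W[t];
            e = d; d = c; c = rol(b, 30); b = a; a = tmp;
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
        atomic_store_explicit(&p.consumed, i + 1, memory_order_release);
    }
    pthread_join(tid, NULL);
    return 0;
}
```

With this split, the expansion thread works on block i+1 while the compression thread is still processing block i. Whether that pays off depends on how cheap the handoff is, which is why the approach suits processors with low inter-thread synchronization costs; on a conventional multiprocessor the polling in this sketch could easily cost more than it saves.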

About

Dr. Spracklen is a senior staff engineer in the Architecture Technology Group (Sun Microelectronics), that is focused on architecting and modeling next-generation SPARC processors. His current focus is hardware accelerators.
