Monday Oct 03, 2016

Amdahl's Law In Reverse Mode

This blog post is about using Amdahl's Law in a somewhat unconventional way. Typically, this law is used to estimate parallel performance from the measured execution time on a single thread and the time on "p" threads, with p > 1. In most cases, p = 2.

Here we derive the formulas to estimate the execution time on a lower number of threads, assuming there are two measured execution times using "p" and "q" threads. We call this reverse mode. 

Through a real-world example it is shown how this can be used in practice. In this particular case it is used to estimate the single thread time. 
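As a rough illustration of the idea, here is a minimal sketch in Python. It assumes the standard form of Amdahl's Law, T(n) = T(1) * ((1 - f) + f/n), where f is the parallel fraction. The function names amdahl_reverse and predict_time are made up for this sketch, and the formulas in the full post may be written differently.

```python
def amdahl_reverse(p, t_p, q, t_q):
    """Estimate the single-thread time T(1) and the parallel fraction f
    from two measured times: t_p on p threads and t_q on q threads.

    Assumes T(n) = serial + parallel / n (Amdahl's Law), so the two
    measurements give two linear equations in two unknowns.
    """
    if p < 1 or q < 1 or p == q:
        raise ValueError("p and q must be distinct positive thread counts")
    # Subtracting the two equations eliminates the serial part.
    parallel = (t_p - t_q) / (1.0 / p - 1.0 / q)
    serial = t_p - parallel / p
    t_1 = serial + parallel
    return t_1, parallel / t_1


def predict_time(t_1, f, n):
    """Amdahl's Law in the usual direction: estimated time on n threads."""
    return t_1 * ((1.0 - f) + f / n)


if __name__ == "__main__":
    # Made-up measurements, for illustration only.
    t_1, f = amdahl_reverse(4, 32.5, 8, 21.25)
    print(f"estimated T(1) = {t_1:.2f}, parallel fraction f = {f:.3f}")
    print(f"estimated T(2) = {predict_time(t_1, f, 2):.2f}")
```

With the made-up inputs above (32.5 seconds on 4 threads and 21.25 seconds on 8 threads), the sketch estimates a single-thread time of 100 seconds and a parallel fraction of 0.9. Real measurements include overheads that Amdahl's Law does not model, so the result should be treated as an estimate only.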

If you would like to learn more about why we did this and how the formulas can be used, click on [Read More] below!


[Read More]

Monday Feb 16, 2009

The seminar on "Combinatorial Scientific Computing" at Schloß Dagstuhl in Germany

This seminar was held February 1-6, 2009, at Schloß Dagstuhl in Wadern, Germany. The Dagstuhl seminars are small-scale events. Attendance is by invitation only. The goal is not only to exchange information, but also to encourage discussions and to get to know the attendees better. This approach worked out really well. Below are my impressions of the scientific aspects of this event.

All the seminar information can be found at the workshop web site. This is also where you can find all the presentation material. I've posted my slides, but will also write an extended abstract for the proceedings. This will be more like a short paper.

The two major new things I learned at this event were how many areas use combinatorial analysis, and that many of the algorithms are characterized by random memory access on large data sets.

Regarding the former, I was, for example, surprised to hear that the analysis of social networks boils down to a combinatorial problem. When you think about it, though, there is a natural link between the two. A new aspect, however, is that these networks, like LinkedIn or Hives, are so huge. Nobody really knows what they look like, and a deeper analysis of their structure can be revealing.

The computational aspects are quite interesting and challenging. In particular, traditional cache-based architectures do not perform well at all, due to the irregular memory access patterns combined with the ever-growing size of the data sets. For the same reason, it is also a challenge for a cc-NUMA architecture to perform well.
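To make the irregular access pattern a little more concrete, here is a small Python sketch of a breadth-first search over a graph stored in compressed sparse row (CSR) form. It is not taken from any of the seminar talks. The function name bfs_csr and the tiny example graph are made up for illustration.

```python
from collections import deque


def bfs_csr(row_ptr, col_idx, source):
    """Breadth-first search on a graph in CSR form.

    row_ptr[v] .. row_ptr[v + 1] delimits the neighbors of vertex v
    inside col_idx. Returns the hop distance from source to every
    vertex, with -1 for vertices that cannot be reached.
    """
    n = len(row_ptr) - 1
    dist = [-1] * n
    dist[source] = 0
    queue = deque([source])
    while queue:
        v = queue.popleft()
        # The neighbor list itself is read sequentially ...
        for k in range(row_ptr[v], row_ptr[v + 1]):
            w = col_idx[k]
            # ... but dist[w] is indexed by a value loaded from memory,
            # so consecutive accesses can land anywhere in the array.
            if dist[w] == -1:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


if __name__ == "__main__":
    # Undirected path 0 - 3 - 1 - 4 - 2, stored as CSR adjacency lists.
    row_ptr = [0, 1, 3, 4, 6, 8]
    col_idx = [3, 3, 4, 4, 0, 1, 1, 2]
    print(bfs_csr(row_ptr, col_idx, 0))
```

On a graph with many millions of vertices, the data-dependent accesses to dist have little spatial or temporal locality, which is the kind of pattern that caches handle poorly.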

Instead, heavily threaded architectures using latency-hiding techniques shine on these kinds of applications. Even an old system like the Tera MTA performs relatively well, despite its low clock rate. Several presenters reported excellent results on Niagara 2 and Victoria Falls based systems. For more details, I can highly recommend the talk given by Prof. David Bader from the Georgia Institute of Technology. The slides can be found here.

 

About


Ruud van der Pas is a Senior Principal Software Engineer in the SPARC Microelectronics organization at Oracle. His focus is on application performance, for both single-threaded and multi-threaded programs. He is also a co-author of the book Using OpenMP.
