Visualising Performance


There are several things that interest me. Filesystem and datapath software design is one. Computer performance is another: datapath performance in particular, of course, but also the whole stack. A third is open source software that helps in improving performance: load generators, probes and monitors, and mathematical and graphical software for jobs such as statistical manipulation, queueing theory and simulation. I'm not alone here. Adrian Cockcroft, author of perhaps the primary source on Solaris performance, has blogged on this topic.

What do I mean by visualising performance? Well, look at the following table, extracted from the Lustre wiki: data gleaned from a netperf benchmark of 10-gigabit Ethernet interfaces, varying the send size and the size of the socket buffer:

MBytes/s                         Socket Buffer Size
Send Size    128K    256K    512K      1M      2M      4M      8M     16M
8K         212.79  260.79  273.72  314.31  362.51  349.24  358.81  376.20
16K        218.68  259.10  273.53  314.24  362.34  348.82  358.39  376.09
32K        213.63  260.07  273.29  329.90  362.17  349.00  358.63  376.01
64K        221.17  263.98  273.31  316.10  361.51  348.74  358.11  375.91
128K       224.50  266.42  273.96  313.34  362.08  348.88  358.20  376.39
256K       221.97  260.96  275.27  290.05  361.51  348.68  357.97  376.48
512K       222.43  265.68  274.28  289.10  361.28  348.95  358.14  376.37
1M         226.24  266.02  275.66  295.67  361.64  348.70  357.93  376.71

This is a common enough scenario. There is one dependent variable, the throughput of the connection, and two independent variables - the size of the socket buffer and the size of the send. I had to look at that table for quite a while before I could see the result - the relationship. This is very common in benchmarking. If anything, two causal factors is on the light side; the mount parameters for a filesystem can run to a dozen or more.

OK, so this example is not one that is going to set the world alight, but it's in the public domain, which helps. I have to get drunk with people who, in terms of scientific visualisation, have bigger fish to fry. But these days we (Sun) have bigger fish on the chopping board - especially petabyte storage and grids, both of the compute and storage varieties. You cannot build these things in the lab on a whim; you have to model, and modelling means visualisation.

I found the graph produced by the following code more intuitive:

require(lattice)

g_data <- read.table(choose.files("*.csv"), header = TRUE)

print(wireframe(mbs ~ soc_buf_kb * send_kb, data = g_data,
	zlab = "MB/s",
	ylab = "Send size (Kb)",
	xlab = "Socket buffer size (Kb)",
	drape = TRUE,
	colorkey = TRUE))

The code above is for R, a free software environment for statistical computing and graphics, of which more below. I think the key message is "This is not a lot of code" (to 'fess up, I did have to deprocess the pretty-printed table back to CSV). So this more or less tells us that one of the variables has little effect. But we can do better than this:

g_data <- read.table(choose.files("*.csv"), header = TRUE)
print(splom(~ g_data))

This gives us a scatterplot matrix. In two lines of code we can compare every variable in the test against every other, and the relationships leap from the page. In our case there are only three dimensions, but trellis graphics (in S-PLUS, the commercial version) or lattice graphics (in R) give us several graphical methods for visually exploring our data.

What does it tell us? That after a certain point, increasing the size of the buffer provides no further boost in throughput. This is important as kernel memory is a finite resource.
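To put a number on that, here is a sketch in R (using the same long-format layout and my column names from above; I've keyed in just the 8K and 1M send-size rows of the table): average the throughput per socket buffer size and watch the gain flatten out.

```r
# Two rows of the netperf table above (8K and 1M send sizes), long format.
g_data <- data.frame(
  soc_buf_kb = rep(c(128, 256, 512, 1024, 2048, 4096, 8192, 16384), times = 2),
  send_kb    = rep(c(8, 1024), each = 8),
  mbs        = c(212.79, 260.79, 273.72, 314.31, 362.51, 349.24, 358.81, 376.20,
                 226.24, 266.02, 275.66, 295.67, 361.64, 348.70, 357.93, 376.71)
)

# Mean throughput per buffer size: roughly 140 MB/s of gain up to the 2M
# buffer, then only about 14 MB/s more all the way out to 16M.
means <- aggregate(mbs ~ soc_buf_kb, data = g_data, FUN = mean)
print(means)
```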

Then it's just a matter of drilling down for the "management summary" (but, 'fessing up again, I am daintily sidestepping the thorny topic of non-linear regression analysis - another day):

xyplot(mbs ~ soc_buf_kb, data = g_data,
	aspect = "xy",
	ylab = "MB/s",
	xlab = "Socket buffer size (Kb)")

So then. My elevator pitch for R:

  • It's free (as in speech, not just beer, yadda yadda), and there is a good community around it.
  • It has vector maths and matrices built in, so no more loops, nested loops, nested nested... [repeat till fade].
  • All the regression, correlation, smoothing and modelling mathematical grind, and all the presentation graphics, have already been attended to.
  • It interfaces to Java (and C, and [your shared-library language of choice here]).
  • It is object-oriented, which is handy for someone who wants to represent, e.g., a storage array or compute node both as a piece of graphics (icon, connectors, etc.) and as a chunk of maths.
  • It's a home from home for those who like a command-line environment. Intractable (write-only) code to rival Perl can be written if one leans to the beard-stroking, sandal-wearing edge of the technology community.
  • It incorporates the Tcl/Tk libraries, so you can write fully formed standalone GUI applications in it.
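To illustrate the vector-maths point: converting a row of the table's throughput figures from MB/s to Gbit/s takes one expression, no loop.

```r
# First four throughputs from the 8K send-size row of the table, in MB/s.
mbs <- c(212.79, 260.79, 273.72, 314.31)

# Whole-vector arithmetic: MB/s -> Gbit/s in a single expression.
gbits <- mbs * 8 / 1024
print(round(gbits, 2))   # 1.66 2.04 2.14 2.46
```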

When all is said and done, it's really good for performance and capacity-planning "exploration"; later on I'll measure an elephant for you in pretty quick time in R.

So endeth my first blog; respect and gratitude to David Levy for the requisite motivational arse-kicking, and to Simon Dachtler for finding time to produce my banner graphic while still keeping the Far East manufacturing economy ticking over.


Comments:

Always nice to see netperf mentioned :)

Posted by rick jones on August 24, 2005 at 07:01 PM BST #

http://blogs.sun.com/eclectic/

Posted by Wayne Horkan on October 02, 2007 at 06:49 PM BST #

i think you have a intresting website it is very complicated it is me rosie your friend simon dachtler i am his daughter rosie you know me and nice web site

Posted by rosie dachtler on December 23, 2008 at 09:24 AM GMT #
