Wednesday Sep 30, 2009

Harman turns fire on Sun decision

Nice headline, but just to clarify, it's not this Harman or that Sun :)

Monday Sep 28, 2009

So long, and thanks for all the fish!

Tomorrow I will lose update access to this site, so I've copied all my content to http://harpeople.com/mt, where my blog will live on. Please drop by when you get the chance.

I am now available for hire (all interesting full-time, contracting or consulting opportunities will be considered). Further details and references are available at http://tinyurl.com/pgdh-linkedin. My obfuscated private email address is phil dot harman at gmail dot com.

Thank you for your interest,

Phil

p.s. I have turned comments off across the blog, because I will be unable to moderate them.

Tuesday Sep 08, 2009

Low latency computing with Solaris and DTrace

Over the past couple of years I've helped a number of financial institutions identify and eliminate or significantly reduce sources of latency and jitter in their systems. In the City there's currently something akin to an arms race, as banks seek to gain a competitive edge by shaving microseconds off transaction times. It's all about beating the competition to make the right trade at the right price. A saving of one millisecond can be worth millions in profit. And the regulatory bodies need to be able to keep up with this new low latency world too.

Code path length is only part of the picture (although an important one). However, processor architectures with challenging single thread performance (such as Sun's T-series systems) are still able to offer competitive advantage in scenarios where quick access to CPU resource is a bigger factor. Your mileage will vary.

When it comes to jitter I've seen a fair amount of naivety. Just because I have 32 cores in a system doesn't mean I won't see issues such as preemption, interrupt pinning, thundering herds and thread migration. Thankfully, DTrace provides the kind of observability I need to identify and quantify such issues, and Solaris often has the features needed to ameliorate them, often without the need to change application code.

I generally find that there is a lot of "low hanging fruit", and am often able to demonstrate a dramatic reduction in jitter and absolute latency in a short amount of time. You may have seen some pretty big claims for DTrace, but in my experience it is hard to over-hype what can be achieved. It's not just about shaving milliseconds off transaction times, but about reducing the amount of hardware that needs to be thrown at the problem.

DTrace for dummies - Complexity

DTrace is many things to many people. To me it is a tool for engaging with complexity. Sure, there's an important place for the DTrace Toolkit, advanced OpenStorage analytics, Chime and other wonderful technologies built on DTrace (most of which don't even come close to exposing the user to the more low-level, cranium-challenging detail), but for me DTrace remains "The One True Tool" (as one Slashdot reviewer put it) and the means by which I can ask an arbitrary question and get an instant answer.

When presenting DTrace to a new audience, I see my primary goal as creating desire. Nothing worth having comes easily. Getting to grips with DTrace involves a steep learning curve. Before exposing candidates to potentially overwhelming detail, I need to show them why the gain is going to be worth the pain. It's also useful to sow some seeds of self-doubt and insecurity, to establish my authority as the teacher they can trust. So I generally start by talking about complexity.

All I'm going to blog here is one of my favourite complexity stories. It is best done live, with lots of stuff scrolling up a green screen, and plenty of theatrical flair. However, for the purpose of this post I've done the UNIX thing and used a pipe into the wc(1) command. I'm sorry if it loses something in the telling, but the base data is still interesting.

I usually start by talking about how complexity has increased during my time at Sun. In the good old days when we all programmed in C it was possible for one person to have a handle on the whole system. But today's world is very different. In a bid to connect with the old timers, we start talking about "Hello World!". I then show how good the truss(1) utility is at exposing some of the implementation detail.

We then move on to a Java implementation. The code looks similar, and it is functionally equivalent. Although both the C and Java versions complete in far less than a second, even the casual observer can see that the Java variant is slower. I then start digging deeper with truss(1). First, we compare just the number of system calls, then the number of inter-library function calls, and lastly, the number of intra-library function calls.

This post is really just the raw data, to underline two points: first, that today's software environments are a lot more complex than we often give them credit for; and second, that we need a new generation of tools to engage with this level of complexity. For added fun, I've added Perl and Python data to the mix. Enjoy!

The Code

opensolaris$ head -10 hello.c hello.pl hello.py hello.java
==> hello.c <==
#include <stdio.h>

int
main(int argc, char *argv[])
{
	(void) printf("Hello World!\n");
}

==> hello.pl <==
#!/usr/bin/perl

print "Hello World!\n";

==> hello.py <==
#!/usr/bin/python

print "Hello World!"

==> hello.java <==
public class hello {
	public static void main(String args[]) {
		System.out.println("Hello World!");
	}
}

It works!

opensolaris$ ./hello
Hello World!
opensolaris$ ./hello.pl
Hello World!
opensolaris$ ./hello.py
Hello World!
opensolaris$ java hello 
Hello World!

Syscalls

opensolaris$ truss ./hello 2>&1 | wc -l   
33
opensolaris$ truss ./hello.pl 2>&1 | wc -l
118
opensolaris$ truss ./hello.py 2>&1 | wc -l
660
opensolaris$ truss java hello 2>&1 | wc -l     
2209

Inter-library calls

opensolaris$ truss -t!all -u : ./hello 2>&1 | wc -l
9
opensolaris$ truss -t!all -u : ./hello.pl 2>&1 | wc -l
232
opensolaris$ truss -t!all -u : ./hello.py 2>&1 | wc -l
31578
opensolaris$ truss -t!all -u : java hello 2>&1 | wc -l     
12055
Note: these numbers need to be divided by two (see the raw output for why).

Intra-library calls

opensolaris$ truss -t!all -u :: ./hello 2>&1 | wc -l    
329
opensolaris$ truss -t!all -u :: ./hello.pl 2>&1 | wc -l 
10337
opensolaris$ truss -t!all -u :: ./hello.py 2>&1 | wc -l
548908
opensolaris$ truss -t!all -u :: java hello 2>&1 | wc -l    
4142645
Note: these numbers also need to be divided by two (see above).
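The halving can be done mechanically. Below is a minimal Python sketch that takes the line counts from the transcripts above and halves them. It assumes the reason for dividing by two is that truss -u logs one line on entry to and one on return from each traced function; that is an assumption, since the raw output is not reproduced here. A few lines in each count are not calls at all (the program's own output is merged in by 2>&1), so the results are approximations. The names raw_counts, approx_calls and report are illustrative only.

```python
# Line counts copied from the truss transcripts in this post.
raw_counts = {
    "inter-library (-u :)": {"C": 9, "Perl": 232, "Python": 31578, "Java": 12055},
    "intra-library (-u ::)": {"C": 329, "Perl": 10337, "Python": 548908, "Java": 4142645},
}


def approx_calls(line_count):
    """Halve a line count, assuming two lines (entry and return) per call."""
    return line_count // 2


def report(counts):
    """Return rows of (experiment, language, lines, approximate calls)."""
    rows = []
    for experiment, per_language in counts.items():
        for language, lines in per_language.items():
            rows.append((experiment, language, lines, approx_calls(lines)))
    return rows


if __name__ == "__main__":
    for experiment, language, lines, calls in report(raw_counts):
        print(f"{experiment:24} {language:8} {lines:>9} lines  ~{calls:>9} calls")
```

Integer division is used because some of the counts are odd, which is itself a hint that not every line belongs to an entry/return pair.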

Context

opensolaris$ uname -a
SunOS opensolaris 5.11 snv_111b i86pc i386 i86pc Solaris

Conclusion

Of course the above gives no indication of how long each experiment took. Yes, I could have wrapped the experiment with ptime(1), but I'll leave that as an exercise for the reader. When I use this illustration with a live audience, it's generally sufficient to allow the longest case to continue to scroll up the screen for the rest of the presentation.
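For readers without ptime(1) to hand, here is a rough Python stand-in for wrapping a command with a timer. It is only a sketch: it measures elapsed wall-clock time, whereas ptime(1) also reports user and system time. The function name time_command is hypothetical, and the command timed in the example is a placeholder rather than one of the experiments above.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once, discarding its output; return (exit_status, seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    elapsed = time.perf_counter() - start
    return completed.returncode, elapsed


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere Python does.
    # Substitute e.g. ["./hello"] or ["java", "hello"] to time the experiments.
    status, seconds = time_command([sys.executable, "-c", "print('Hello World!')"])
    print(f"exit={status} real={seconds:.3f}s")
```

Discarding the output matters when timing a traced run: writing millions of lines to a terminal would dominate the measurement.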

At this point, I generally move on. Usually, I say some kind words about high level languages, abstraction, code reuse etc. I am not out to knock Java. That's not the point. The point is complexity. I then move on to how DTrace can help us to engage with complexity. I'd do that here, but I hope that I'll continue to be asked to speak on the subject, and I don't want to give it all away here, just now.

Monday Sep 07, 2009

About

My employment at Sun will end on September 30th, 2009. This was my choice (I was made an offer I simply couldn't refuse). I am currently exploring future employment options, and am open to offers and suggestions. I see this and recent posts to my blog as a legitimate way to "set out my stall". The remainder of this post is background copied from materials supporting my recent promotion...

Who

Hi, my name is Phil Harman and I'm a Senior Staff Engineer and Principal Field Technologist (PFT) attached to the Systems Practice in Sun UK. I joined Sun in February 1989, and have been an OS Ambassador for most of the intervening years. When I joined the Systems Practice in November 2007, I also became a Technical Systems (TS) Ambassador.

Solaris, holistic systems performance and extreme multithreading are my long term interests and areas of expertise. After about 5 years in the UK Performance Centre, and a spell in the Products and Technologies Specialists Group (a forerunner of the Systems Practice), I spent four years in Performance and Availability Engineering (PAE), before moving on to the Solaris Kernel Performance Group.

It's official: I'm an inventor, and I have the patent to prove it! I'm also co-architect of the OpenSolaris project libMicro, and consequently became originator of the slogan: "If Linux is faster, it's a Solaris bug!".

How

I have a reputation for being a passionate evangelist for Sun technology. I joined the company because I was nuts about UNIX and impressed with SPARC (I still am both). I hate FUD and shallow "me too" marketing. I love the moral high ground, and believe our customers deserve the truth, not wishful thinking. I believe Sun can and does make a difference, and that our disruptive technologies deliver real business value (in 20 years I have been involved in many such examples).

My holistic approach to systems performance means that I am also a "people person". I detest dehumanising business practices. "I am not a number", so if you [just] "want information", "you won't get it!". I like to get up close and personal with my customers, to see the whites of their eyes, because like House MD, I know that "people lie".

Where

I live and work in North Wales, in the UK. Over the past few years I seem to have spent more time in the Menlo Park, California offices than in my designated office in Sale, Cheshire. When I'm not working from home, I am often on site with customers (generally in London ... it's only a 2.5 hour journey, and I can work on the train).

If you need to contact me, Namefinder is your friend (if you're a Sun employee, but only until the end of this month). Email is generally best, but in emergencies I'm usually near my mobile phone. My private email address is phil.harman-AT-gmail-DOT-com.

What

I talk a lot. If you need a Sun technology pitch, then I'm your man! I'm most at home with Solaris, SPARC, CMT, holistic systems performance and multithreading, but I take an active interest in other areas of Sun innovation. However, I prefer to speak about what I know in depth because I think people want to hear from speakers with authority, integrity and passion. I don't take myself too seriously, but I will only use my own slide deck, thank you!

I take on a lot of performance work. This could be a proof of concept, or perhaps rolling up my sleeves to deal with some melt-down escalation or other. I am very data driven, and have many tricks up my sleeve for obtaining it by hook or by crook. In my various roles in the field and in engineering, on job rotations and through the ambassador programme, as an international conference speaker and with many customers internationally, I have built a broad network of useful contacts among whom I often function as a rabble rouser or dating agency.

Until recently I was spending a lot of time (generally once per week, for the last couple of years) explaining the CMT value proposition to customers who didn't quite get it, sometimes with major repercussions. Within the last year I have spent quite a lot of time (sic) reducing latency and jitter in realtime trading systems, and increasing throughput on large backend systems.

As part of the UK Systems Practice, my primary focus was on leveraging systems sales. As such I tended not to take on chargeable work (the ROI didn't add up, and I'd rather be out selling Sun technology elsewhere). However, I did see myself as an SMI citizen first and foremost (I grew up during the golden days of Sun's "can do" ethos, in McNealy's "to ask permission is to seek denial" culture).

Thursday Sep 03, 2009

Going out on a high note

Yes, I know it's hardly fair (posting so many entries in such a short time), but I don't think I've ever made it into the "Popular Blogs" roll of honour before. Last time I looked I was 13th, with gazillions of hits. How nice to go out on a high note!

Twenty years at Sun


Photo credit: Milton Stephenson.

Here I am, receiving my 20 years award from Brian Hackett at the (very) last UK Systems Practice team meeting in London on May 15th. I chose the Bose iPod dock thingy as my gift, and am really pleased with it. The little pin thingy is quite cute too, but the certificate signed by MLP is another matter entirely!

The bulk of the preceding posts are taken from an internal site I put together as part of my (successful) application to become a Principal Field Technologist. The material highlights some of the fun stuff I've done at Sun over twenty years, but it's just too much detail to include in a CV/resume :)

Lakeland Holistic Performance Workshop

Convinced of the need for more people who share my holistic view of systems performance (not simply people who know some DTrace syntax), last year I organised an autumn mentoring workshop in the Lake District for a dozen hand-picked folk from Sun UK. I hope to do the same again soon, so if you are interested please send me an email introducing yourself and explaining why you think you qualify.

SUPerG, Oracle World, Sun Tech Days, JavaU, CEC, Developer Days, OSUGs, etc

Since my initial "baptism of fire" at the Sun UK User Group / UKUUG meeting all those years ago (see below) I have really gotten into this presenting thing, and to audiences both large and small, internal and external. Here are some of my more outrageously well received subjects ...

  • Solaris: where innovation happens (my current pitch - see above)
  • Solaris: greater than the sum of its parts
  • DTrace for Dummies (a comedy double act with Jon Haslam)
  • libMicro: we scare because we care
  • A Brief History of Threads (underlining Sun's leadership in multithreading)
  • Solaris 7: sixty-four reasons to upgrade (that's bits, stupid!)

Almost an author

My name has made it into a number of books, but has yet to make it onto the front cover. In addition to directing the photo shoot for the covers of the second edition of Solaris Internals (that T2000 prototype was never the same again), and of Solaris Performance and Tools (DTrace is child's play, even Jon Haslam can do it), I also merited a special mention for contributions to the chapter on the Solaris process model.

To my shame, I have, at times, referred some of my more truculent customers to the "acknowledgments" sections of these, and of Cockcroft's Sun Performance Tuning (second edition), with a cursory "I taught them all they know, so shut up, and do as I say!" This always has the desired effect (some have even asked me to sign their copy, and if you can find one I haven't signed, it is worth a fortune)!

The book I don't tend to mention is boohoo - a dot.com story from concept to catastrophe where I am erroneously credited with the sale of an E10K. My actual advice was "fix your code, because as it is, it won't scale on an E10K" (but even that wasn't enough to save them from disaster).

"Ambassador, you're really spoiling us!"

I was an OS Ambassador from before Solaris 2.0 shipped, and I still have the golden edition signed by Bill Joy to prove it! In my biased opinion, OS Ambassadors has been the most successful of the ambassador programmes, bringing tangible value to the field and engineering alike. Our conferences became a forum for change, and sometimes served as a watering hole for different engineering groups, working on similar projects in total isolation (we were excellent match-makers).

As my experience and confidence grew, I became more vocal and more of a driver. When folk like Richard McDougall (who can forget his VxVM vs SDS coin?) moved on to other things and the original Ambassador Group Boards were formed, I joined the leadership team. I too moved on when I joined PAE, but I maintained "honorary ambassador" status until returning to the field.

Back in the UK, Chris Gerhard and I started "uk-solaris" as a forum for all with a technical interest in Solaris from various field and engineering roles. At our first meeting we "treated" everyone to Ferrero Rocher "chocolates", which provoked the famous line from one of the cheesiest TV adverts ever (preserved for posterity here).

Putting something back

My first two putbacks into Solaris were a huge learning experience, and my respect for those who do this kind of engineering day in, day out grew immensely. I'd recommend the experience to anyone who needs a better understanding of the process ...

  • 4991763 getenv doesn't scale
  • 5105528 fix for 4915617 breaks simple multiprocess rwlock test case
  • 5105683 fix for 4915617 should be kinder with uncontended shared rwlocks
  • 6209711 thread error detection false positives possible with shared mutexes

libMicro: we scare because we care

In some ways, libMicro was a reaction to LMbench (which Bart Smaalders and I considered unscientific and a pain in the neck), but we really wanted to write a useful tool which could produce compelling data to drive improvements in Solaris. The result has exceeded our expectations dramatically. Not only has libMicro produced data for many "Linux is faster than Solaris at xxx" bugs, but it also kick-started Sun's interest in the AMD Opteron processor (as well as helping the adoption of SPARC64).

libMicro also has the distinction of being one of the first open source projects hosted under Mercurial on the opensolaris.org collaboration website. It is still used extensively within Sun, and the code has also proven to be a useful reference for those wanting to write multithreaded applications. Today libMicro can be found alive and well here, and even our competitors are using it!

PRISM and the patent

Before Solaris could have large page support for program text and data, we needed a business case. PRISM stands for Process Relocation in Intimate Shared Memory, and was my first big innovation whilst in PAE. The idea is simple: stop the process, copy a region of small pages somewhere, unmap the source region, remap the source region with large pages, copy the data back, and then allow the process to continue. At the time ISM was the only source of large pages.

My first solution used the LD_PRELOAD shared library interposition technique, but quickly moved on to LD_AUDIT interposition because this provides more fine-grained control. Operating at process startup (with the inclusion of an optional dummy malloc() and free() to preallocate the heap before the relocation took place), PRISM generated plenty of useful data to fuel the MPSS and Large Pages OOB projects. It also highlighted the usefulness of local copies of readonly text and data for large scale NUMA machines.

The PRISM library helped some of our published CPU benchmark numbers, and so had to be shipped with some versions of our compilers. This triggered the patent filing process, with my patent finally being awarded a year or so later.

About five years later, with MPSS and Large Pages OOB in place, I revisited the PRISM idea with Shatter, a tool to break up large pages into smaller ones. This contributed to Nicolai Kosche's dataspace profiling initiative (DProfile), which was trying to understand the effect of page colouring on performance.

A brief history of threads

Before joining PAE (Performance and Availability Engineering), I worked with a major European database vendor on their kernel scalability (on behalf of a mutual customer, a leading media company). We were fighting limitations in an aging implementation of Sun's pioneering two-level thread model (something which became known as "old and broken libthread"). During one of my OS Ambassador trips, I visited Bryan Cantrill and Roger Faulkner, and discovered that Bryan had sketched and Roger had prototyped a new implementation based on a one-level model. I then used the customer as the business case for introducing the one-level "alternate" implementation in Solaris 8 (under /usr/lib/lwp).

By the time I joined PAE, the new implementation had gained quite a reputation for fixing scalability and stability problems with many multithreaded applications. PAE had many fans of the two-level concept, so I found myself immediately in conflict with some of my new colleagues. But I stuck to my guns and was able to win most of them over to the one-level model. I then worked with Roger, Bart Smaalders and others to make the one-level model the only implementation in Solaris 9. Part of my contribution to this effort was to write the technical whitepaper Multithreading in the Solaris Operating Environment:

  • The original version on www.sun.com [pdf]
  • The revised version as presented at SUPerG [pdf]
This paper has become a widely quoted document of how we do multithreading, and is still relevant today. Of course, the new thread implementation paved the way for Roger's 1600 file putback to unify the Solaris process model, making threads first class citizens in Solaris - something Linux may actually never achieve!
