Saturday Jun 05, 2010

Visualizing System Latency

I've just had an article published in ACMQ: Visualizing System Latency, which demonstrates latency analysis using heat maps in Analytics from Oracle's Sun Open Storage appliances. These have revealed details about system performance that were previously not visible, and show how effective a simple visualization can be. As many of these details are new, they are still being studied and are not yet understood.

One detail can now be explained, thanks to Pete Harllee, who offered an explanation for the faint line at the top of Figure 4: these are I/Os whose LBA range spans two tracks on the drive. (I should have realized sooner that this can happen, since these are 8 Kbyte writes on a drive with 512-byte sectors.) The additional latency is the expected track-to-track seek time during the I/O, as the LBAs start writing on one track and finish on the next.
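
As a rough illustration of why only a faint line appears, here is a minimal Python sketch of how often an 8 Kbyte write would cross a track boundary. The sectors-per-track value is hypothetical (real drives vary it by zone), and the sketch assumes that write start offsets are spread evenly across sector positions within a track, which may not hold for a given file system or drive.

```python
def sectors_per_io(io_bytes, sector_bytes):
    """Number of whole sectors covered by one I/O."""
    return io_bytes // sector_bytes


def fraction_spanning_tracks(io_sectors, sectors_per_track):
    """Fraction of start positions within a track for which an I/O of
    io_sectors continues onto the next track.

    Assumes start positions are uniformly distributed over the sectors
    of a track (an assumption made for illustration only).
    """
    if sectors_per_track <= 0:
        raise ValueError("sectors_per_track must be positive")
    # An I/O starting at sector s occupies s .. s + io_sectors - 1, so it
    # crosses the boundary for the last (io_sectors - 1) start positions.
    crossing_starts = max(io_sectors - 1, 0)
    return min(1.0, crossing_starts / sectors_per_track)


if __name__ == "__main__":
    n = sectors_per_io(8192, 512)   # 8 Kbyte write, 512-byte sectors
    hypothetical_track = 1000       # hypothetical sectors per track
    share = fraction_spanning_tracks(n, hypothetical_track)
    print(f"{n} sectors per write, {share:.1%} of writes span two tracks")
```

Under these assumptions only a small share of the writes pay the extra track-to-track seek, which would fit a faint line rather than a dense band.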

The resolution of the screenshots was reduced to fit the online article, which preserved the patterns that the article was describing but not ancillary text; the original resolution screenshots are linked in the article, and are also listed here:

  • Figure 1: NFS Latency When Enabling SSD-based Cache Devices
  • Figure 2: Synchronous Writes to a Striped Pool of Disks
  • Figure 3: Single Disk Latency from a Striped Pool of Disks
  • Figure 4: Synchronous Write Latency to a Single-disk Pool
  • Figure 5: Synchronous Write Latency to a Two-disk Pool
  • Figure 6: Synchronous Writes to a Mirrored Pool of Disks
  • Figure 7: Sequential Disk Reads, Stepping Disk Count
  • Figure 8: Repeated Disk Reads, Stepping Disk Count
  • Figure 9: High Latency I/O

The article has been picked up by sites including Slashdot and Vizworld.

Saturday May 15, 2010

Performance Instrumentation Counters: short talk

Performance Instrumentation Counters (PICs) allow CPU internals to be observed, and are especially useful for identifying why exactly CPUs are busy - not just that they are. I've blogged about them before, as part of analyzing HyperTransport utilization and CPI (Cycles-per-Instruction). There are a number of performance analysis questions that can only be answered via PICs, whether by using the command line cpustat/cputrack tools, developer suites such as Oracle Sun Studio, or DTrace. They include observing:

  • CPI: cycles per instruction
  • Memory bus utilization
  • I/O bus utilization (between the CPUs and I/O controllers)
  • CPU interconnect bus utilization
  • Level 1 cache (I$/D$) hit/miss rate
  • Level 2 cache (E$) hit/miss rate
  • Level 3 cache (if present) hit/miss rate
  • MMU events: TLB/TSB hit/miss rate
  • CPU stall cycles for other reasons (thermal?)
  • ... and more
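
Several of the items above are simple ratios of raw counter values. The following is a minimal Python sketch of two of them, CPI and a cache hit rate. The function names and the sample numbers are hypothetical, and the counter values are assumed to have been collected over the same interval by whatever tool is in use; the event names needed to collect them are processor specific and are not shown.

```python
def cpi(cycles, instructions):
    """Cycles per instruction over one sampling interval."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def hit_rate(accesses, misses):
    """Cache (or TLB) hit rate as a fraction between 0 and 1."""
    if accesses <= 0:
        raise ValueError("accesses must be positive")
    if not 0 <= misses <= accesses:
        raise ValueError("misses must be between 0 and accesses")
    return (accesses - misses) / accesses


if __name__ == "__main__":
    # Hypothetical counter values for one interval.
    print(f"CPI: {cpi(3_000_000, 1_000_000):.2f}")
    print(f"Hit rate: {hit_rate(1000, 50):.1%}")
```

A higher CPI means more cycles spent per instruction, often because of stalls, so it is usually read alongside the cache and MMU miss rates rather than on its own.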

This information is useful not just for developers writing code (who are typically more familiar with PICs from using Oracle Sun Studio), but also for system administrators doing performance analysis and capacity planning.

I've recently been doing more performance analysis with PICs and taking advantage of PAPI (Performance Application Programming Interface), which provides generic counters that are both easy to identify and work across different platforms. Over the years I've maintained a collection of cpustat-based scripts to answer questions from the above list. These scripts were written for specific architectures and became out of date when new processor types were introduced. PAPI solves this - I'm now writing a suite of cpustat-based scripts that use PAPI counters (out of necessity - performance analysis is my job), and that will work across different and future processor types. If I can, I'll post them here.

And now for the reason for this post: Roch Bourbonnais, Jim Mauro and I were recently in the same place at the same time, and used the opportunity to record a few informal talks about performance topics on video. These talks weren't prepared beforehand; we just chatted about what we knew at the time, including advice and tips. This talk is on PICs:

download part 1 for iPod

download part 2 for iPod

I'm a fan of informal video talks, and I hope to do more - they are an efficient way to disseminate information. And for busy people like myself, it can be the difference between never documenting a topic and providing something - albeit informal - to help others out. Based just on my own experience, the time it has taken to produce different formats of content has been:

  • Informal talk: 0.5 - 1 hour
  • Blog post: 1 - 10 hours
  • Formal presentation: 3 - 10 hours
  • Published article: 3 - 30+ hours
  • Whitepaper: 5 - 50+ hours
  • Book: (months)

In fact, it has taken twice as long to write this blog post about the videos as it took to plan and film them.

Documentation is another passion of mine, and we are doing some very cool things in the Fishworks product to create documentation deliverables in a smart and efficient way, which can be the topic of another blog post...


Brendan Gregg, Fishworks engineer

