By realneel on Jan 07, 2008
If you have ever worked with performance data, you will pretty soon realize that
- Performance Data can get huge.
- Consider a benchmark running for 30 minutes on a 64-core system with hundreds of disks attached and multiple network interfaces. If you collect mpstat output at 10-second intervals for the whole run, you end up with more than 11,000 lines of data! (That is 400 Ctrl-F's if you are using vi in a regular-sized terminal.) If you also collect data from tools like vmstat, iostat, trapstat, busstat, cpustat, etc., you will end up with much more! Going through each of them line by line is not a scalable approach.
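The arithmetic behind that line count is easy to check. A quick sketch, assuming mpstat emits one line per core plus a header line each sampling interval:

```python
# Rough line-count estimate for mpstat output over a benchmark run.
cores = 64
run_seconds = 30 * 60      # 30-minute run
interval = 10              # mpstat sampling interval in seconds

samples = run_seconds // interval   # 180 sampling intervals
lines = samples * (cores + 1)       # one line per core plus a header per interval
print(lines)                        # prints 11700 -- over 11,000 lines
```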
- Performance Data is interrelated.
- The tool outputs are just different views of the system's behavior. We want to look at the system as a whole, rather than at its individual views. If your incoming network packet rate peaks, the interrupt rate in your mpstat output most likely peaks too. We may want to see whether throughput was impacted as a result of a burst of writes to our disks, etc.
- Some performance data makes sense visually.
- For large data sets, a visual view gives a quick summary of the data. As Tim Cook puts it, "the human brain is a powerful pattern-recognition machine - graphs allow you to spot things you would never see in numbers (like waves of CPU migrations moving across different cores)". Look at the bottom of this blog for more details.
- Performance Data should be queryable
- We want to be able to query, or ask questions of, the performance data. For example, you might want to know "What are my hot disks?". Traditionally, people have answered such questions by writing custom scripts using sed/awk/perl. This gets tedious very fast. We need a better way of asking questions. In Fenxi, we store the data in a database, and questions are formulated in SQL.
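The post does not show Fenxi's actual schema, but the idea can be sketched with SQLite and a hypothetical iostat table. The "hot disks" question becomes one SQL statement instead of a custom script:

```python
import sqlite3

# Hypothetical schema: one row per (timestamp, disk) sample, loosely
# modeled on iostat output. Fenxi's real schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iostat (ts INTEGER, disk TEXT, busy_pct REAL)")
conn.executemany(
    "INSERT INTO iostat VALUES (?, ?, ?)",
    [(0, "sd0", 12.0), (0, "sd1", 97.5), (0, "sd2", 4.0),
     (10, "sd0", 15.0), (10, "sd1", 99.0), (10, "sd2", 6.0)],
)

# "What are my hot disks?" -- disks whose average utilization exceeds 90%.
hot = conn.execute(
    "SELECT disk, AVG(busy_pct) FROM iostat "
    "GROUP BY disk HAVING AVG(busy_pct) > 90"
).fetchall()
print(hot)   # [('sd1', 98.25)]
```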
- Performance Data should be comparable, averageable, etc.
- Since I work in the performance group at Sun, we run a lot of benchmarks. Since the goal of [most] benchmarks is to maximize the performance of a system, we are constantly trying out new changes to the system. Typically, we change a parameter, repeat the benchmark, and see whether performance has improved. To do that, we need to be able to compare runs side by side and average results across repeated runs.
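A minimal sketch of that compare-two-runs workflow, with made-up throughput numbers standing in for real benchmark results:

```python
from statistics import mean

# Hypothetical throughput samples (ops/sec) from repeated benchmark runs,
# before and after changing a tuning parameter.
baseline = [5120, 5080, 5150]
tuned = [5430, 5390, 5460]

base_avg, tuned_avg = mean(baseline), mean(tuned)
change_pct = 100.0 * (tuned_avg - base_avg) / base_avg
print(f"baseline {base_avg:.0f}, tuned {tuned_avg:.0f} ({change_pct:+.1f}%)")
```

Averaging repeated runs before comparing helps smooth out run-to-run noise, which is why "averageable" matters alongside "comparable".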
- Performance Data should be sharable.
- We rarely work in isolation. We should be able to share data with our peers and collaborate on finding performance fixes.
Fenxi tries to solve all of the above problems.
You can see a sample database run processed by Fenxi. I urge you to check it out!