System Profiling 101 : Getting started using sys_diag v.7.04
By tjobson on Aug 03, 2007
The following entry is a variation of an article that I created for this month's Sun "Technocrat" publication (Aug. 2007).
This posting demonstrates the art of system profiling (from a high-level overview) by introducing a few sample screenshots of the sys_diag .html report (its header, Dashboard, and Table of Contents), showing how, in a few minutes, sys_diag can present you with an accurate system profile!
Note that the .html report samples presented here match the command line output from my previous blog postings (from the same run of sys_diag).
If you haven't had the chance to try out sys_diag yet, this should give you the highlights of what you can expect in the .html report header sections.
Enjoy and let me know what you think,
Real world PROFILING ..
So, what is profiling? In the real world, you can define profiling in many ways: from the "profile" of the person standing next to you (what you see from your vantage point) to the personality "profiles" that we've all heard of in psychology (characterization based upon key attributes).
In your standard dictionary you'll find a definition such as this (from Dictionary.com) :
PROFILE : (-noun)
- the outline or contour of the human face, esp. the face viewed from one side.
- a verbal, arithmetical, or graphic summary or analysis of the history, status, etc., of a process, activity, relationship, or set of characteristics: a biochemical profile of a patient's blood; a profile of national consumer spending.
- a set of characteristics or qualities that identify a type or category of person or thing: a profile of a typical allergy sufferer.
- Psychology. a description of behavioral and personality traits of a person compared with accepted norms or standards.
System Profiling .. in the World of Computing
Well, in the world of technology, and more specifically computing, "profiling" takes on its own connotation, though one similar to many of the more technical definitions noted above.
To some, system profiling simply includes a high level summary of resource utilization and bottleneck identification of a system during some period of data collection (point in time or over a duration).
"Broad Spectrum" System Profiling ..
System profiling to me is the characterization of a system as a whole, given a set of data, either for one event/point in time, or over a duration. This characterization goes beyond workload alone (you'll typically hear the term "workload characterization"), which is why I call it "Broad (or Full) Spectrum" profiling. It takes into account :
- Configuration characteristics (of all sub-systems/components within the "application environment" being profiled).
- System Performance metrics captured, reflecting the variations in system/subsystem activity measurements (utilization, contention, throughput, latency, etc...).
- Workload Characterization : Details correlating the workload that was ongoing during the data collection (workload characteristics such as TPS, response times, Mbps, ...) with the measurements taken. Beyond the internal system workload identified, external sources also need to be correlated.
- A Characterized (Summarized) "Profile" of overall System Efficiency and "health" based upon Performance Analysis findings (system/sub-system Avg/Peaks.., Utilization vs. Workload, etc ...)
- Notable Events and/or Exceptions encountered from the data available.
- ... etc ...
"Narrow Spectrum" System Profiling ..
This is in contrast to "Narrow Spectrum" (Focused) System Profiling, where attention is focused on a very "narrow" and specific area of interest for analysis (typically in determining a Root Cause where a specific bottleneck is known to exist within a sub-system or specific component of the system).
Look for more details on this and much more in an upcoming blog entry that more thoroughly delineates the distinction between Profiling, Workload Characterization, Performance Analysis, and Capacity Planning.
For now, enjoy the following discussion on how sys_diag can have you profiling in no time at all ... :)
Profiling with sys_diag ...
Automating Solaris Performance and Configuration Analysis
In the arena of Performance and Configuration Analysis, the freely available Solaris utility “sys_diag” offers the capability to automatically capture this information in a single .html report (also .txt and .ps) after running one easy to use ksh script. Typically, several utilities/tools need to be run separately, requiring manual collection, aggregation and correlation of the data, prior to conducting the analysis of data.
Sys_diag automates this legwork by running over 100 built-in Solaris commands/utilities (depending on the parameters used) and presenting the data as a structured report. The report opens with a summarized header, a color-coded “dashboard” (broken down into high-level workload characterization and sub-system findings), and a Table of Contents, all with links to the corresponding report sections containing the detailed configuration and/or performance analysis findings.
sys_diag's HTML Report Header :
sys_diag's HTML Performance "Dashboard" :
The following sample .html system performance “Dashboard” (a portion of which is shown below) reflects the 4 key sub-systems (CPU/Kernel Profiling, Memory, IO, and Networking) as a summarized depiction of sub-system “health”, based on a list of rules / thresholds that the captured data is compared against during post-processing.
These rules and thresholds are listed in the Performance Analysis section (Section #24) and can be easily tuned / modified to offer more stringent or lenient identification of performance exceptions that contribute to the Green/ Yellow / Red (OK / Warning / Critical) color-coded status within each dashboard section. Within each section are listed the key performance metrics and a summary of exceptions, along with Average and Peak (High Water Mark) values present during the collection/sampling period. At the end of each section is a list of key “links” to the substantiating detailed data analysis within the report.
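To make the idea of rule-based color coding concrete, here is a minimal Python sketch of how an Average and a Peak value could each be mapped onto the Green / Yellow / Red scale. It is an illustration only, not sys_diag's code: the threshold values, the function names, and the sample data are all made up for this example, and the real rules are the ones listed in the report's Performance Analysis section.

```python
from statistics import mean

# Hypothetical thresholds (percent busy); NOT sys_diag's actual rule values.
WARN_PCT = 70.0
CRIT_PCT = 90.0


def summarize(samples):
    """Return (average, peak) for a list of utilization samples."""
    return mean(samples), max(samples)


def status(value, warn=WARN_PCT, crit=CRIT_PCT):
    """Map a single value onto the Green / Yellow / Red scale."""
    if value >= crit:
        return "Red"      # Critical
    if value >= warn:
        return "Yellow"   # Warning
    return "Green"        # OK


# Made-up CPU utilization samples for one collection period.
cpu_busy = [35.0, 42.0, 95.0, 60.0, 48.0]
avg, peak = summarize(cpu_busy)
print(f"avg={avg:.1f}% ({status(avg)})  peak={peak:.1f}% ({status(peak)})")
```

Lowering the two thresholds makes the check more stringent and raising them makes it more lenient, which is the kind of tuning described above.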
When run for performance data gathering (-g or -G), 2 types of performance data are captured :
- vmstat, mpstat, iostat, netstat, ... data for a duration (-T total_secs), captured at the specified sampling interval (-I interval_#secs). If -I / -T are not specified, the default is 5 minutes of data capture at 2 second intervals.
- 3 point-in-time detailed snapshots (beginning, mid point, end point). If -G is used and the OS is Solaris 10, then DTrace and detailed lockstat, pmap/pfiles, cputrack, ... snapshots will be taken (beyond the core “-g” snapshots that include ps, netstat -i, vxstat, kstat, ...).
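As a quick sanity check on what a given -T / -I combination implies, the arithmetic can be sketched in a few lines of Python. This is only back-of-the-envelope math on the two values: the function name is hypothetical, and the assumption that the three snapshots fall at exactly 0%, 50%, and 100% of the duration is mine, not something taken from the script.

```python
def sampling_plan(total_secs=300, interval_secs=2):
    """Return (sample_count, snapshot_offsets_in_secs) for one run.

    Defaults mirror the documented default of 5 minutes at 2 second
    intervals. Snapshot offsets are an assumed start / middle / end.
    """
    if total_secs <= 0 or interval_secs <= 0:
        raise ValueError("duration and interval must be positive")
    sample_count = total_secs // interval_secs
    snapshots = (0, total_secs // 2, total_secs)
    return sample_count, snapshots


count, snaps = sampling_plan()
print(f"{count} samples per tool, snapshots at {snaps} seconds")
```

With the defaults this works out to 150 samples per tool, which is worth keeping in mind when choosing a longer duration or a shorter interval, since the raw data files grow accordingly.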
Sys_diag has been run on virtually all models of Sun systems running Solaris 2.6 or later (from x86 laptops up to fully loaded E25Ks), offering extensive Solaris 10 configuration and performance data, including DTrace snapshots. It creates a single .tar.Z compressed archive (including all raw, snapshot and post-processed data files) that can be emailed or ftp'd for performing system configuration and/or performance analysis off-site, from virtually anywhere.
This is one of the key characteristics that sys_diag offers: it saves a LOT of time by not requiring many separate manual runs, collection, and correlation of data, or any 3rd party tools, libraries, or agents to be installed on a system; you only download the "sys_diag" ksh script itself. Virtually no learning curve is required for loading and running it and for producing a basic performance profile, including high level sub-system bottlenecks (deeper root cause correlation might require some level of advanced system administration knowledge, though virtually all the data needed will have already been captured by sys_diag).
This utility has been used extensively in the field over the last several years, run on literally hundreds of production systems as part of escalation root cause analysis, in addition to providing the basis for dozens of Architectural and/or Performance Assessments (including formal Capacity Planning / Benchmarking). Graphing of the data captured (vmstat, netstat...) is also easy to do using StarOffice as explained in the README file that sys_diag creates.
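If you would rather prepare the data for a spreadsheet yourself, the following Python sketch pulls the CPU columns out of vmstat-style text and writes them as CSV. It is not the procedure from the README, and it makes some assumptions: the sample text is made up in the usual Solaris vmstat layout (where the last three columns are us, sy, and id), the function names are hypothetical, and the exact layout of the raw files that sys_diag writes may differ.

```python
import csv
import io


def vmstat_cpu_rows(text):
    """Yield (sample_no, usr, sys, idle) from vmstat-style text."""
    n = 0
    for line in text.splitlines():
        fields = line.split()
        # Header lines start with words such as "kthr" or "r"; skip them.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        n += 1
        # Assumes the last three columns are us, sy, id.
        usr, sys_, idle = (int(f) for f in fields[-3:])
        yield n, usr, sys_, idle


def to_csv(text):
    """Return CSV text with one row per vmstat sample."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sample", "usr", "sys", "idle"])
    writer.writerows(vmstat_cpu_rows(text))
    return out.getvalue()


# Made-up sample data, for illustration only.
SAMPLE = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 0 0 0 812340 102344  3  12  0  0  0  0  0  0  0  0  0  310  450  200  5  2 93
 1 0 0 811900 101980  5  20  0  0  0  0  0  1  0  0  0  402  980  260 40 10 50
"""
print(to_csv(SAMPLE))
```

The resulting CSV can be opened directly in a spreadsheet and charted, one line per column.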
sys_diag's HTML Report "Table of Contents" :
The screenshot below shows the Table of Contents and related sections available within the .html report :
Although this tool isn't meant to replace long-term historical Performance Trending and Capacity Planning packages (TeamQuest, etc.), it provides a very robust starting point (and is actually much better at point-in-time workload characterization and root cause analysis of bottlenecks, where very granular, detailed data correlation is required).
Over the time that sys_diag has been posted on BigAdmin, many Sun customers around the globe have downloaded it and commented positively on their experiences with it. For more information, or to download and try it out for yourself, the following URLs should help you get started :
The latest release of sys_diag (v.7.04) is available from BigAdmin
(unpackaged ksh) at :
sys_diag is also available as part of the "UnixPackages.com" Distribution
(packaged with the README) at :
The following recent blog postings provide an extended overview of sys_diag and its capabilities :
Solaris Performance Analysis and Monitoring Tools... at what cost ? http://blogs.oracle.com/toddjobson/entry/solaris_performance_monitoring_tools
What is sys_diag ?? .. Automating Solaris Performance Profiling and Workload Characterization.
sys_diag v.7.04 command line output ...
NOTE: Read the ksh script header pages or the README file prior to using, and ALWAYS test first on a representative non-production system, as is the best practice when making ANY production environment changes... ;)
(Copyright 2007, Todd A. Jobson)