Saturday Jan 14, 2017

What does sys_diag v8.3 offer ?.. The Easiest, most Complete, Automated Solaris Performance Profiling and Workload Characterization tool..


The following is an excerpt from the full sys_diag_Users_Guide.pdf which gives a
high level overview of the sys_diag capabilities and command line arguments /
usage examples.

I've created this over many years to automate and reduce the amount of time it takes to gather and correlate system data for conducting off-site (remote) Performance/Configuration/Consolidation.. or other forms of Architectural Analysis. 

With sys_diag, all you need to do is download the ksh script.. and you're on your way.

After it's run you can click on the .html report to explore the automated findings, or take the single generated .tar.Z and upload or email it for remote analysis (-G even includes a wide variety of DTrace examination and the deepest probing available).

Read the following introduction, or for the complete deep dive, download the  sys_diag_Users_Guide.pdf .

Even better.. download the latest version of   sys_diag  and try it out ! (read through the .ksh header for complete chronology of Update history, usage, and agreements, ..).    

NOTE:  for either download, right-click on the links above and Save-As.  (then # uncompress sys_diag.Z on a Solaris system)


sys_diag v.8.3g  Overview :


sys_diag is a Solaris utility (ksh/awk/javascript) that can perform several functions, among them : system configuration 'snapshot' and reporting (detailed or high-level) along-side performance data capture (over some specified duration or point in time PEAK PERIOD 'snapshot'). Most significantly, after the data is captured, it automatically does correlation, analysis, and reporting of findings/exceptions (based upon configurable thresholds that can be easily changed within the script header). The output provides a single .html report with a color-coded “dashboard” that includes auto-generated chart summaries of findings, along-side system configuration and snapshot details.

Each run of sys_diag creates a local sub-directory where all datafiles captured or created (analysis, reports, graphs generated) are stored. Upon completion, sys_diag creates a compressed archive of that directory as a single .tar.Z for examination externally.

The report format is provided in .html, and .txt as a single file for easy review (without requiring trudging through several subdirectories of separate files potentially thousands of lines long each, to manually correlate and review for hours /days.. before manually generating the assessment report and/or any graphs needed). This tool will literally save you a week of analysis for complicated configurations that require diagnosis.

sys_diag has previously been run on Solaris 2.x (and above) platforms, and today should be capable of being run on any x86 or SPARC Solaris 8+ system. Version 8.3 includes reporting on new Solaris 11.3 capabilities (zones, LDOM’s/OVM, SRM, zfspools, fmd, ipfilter/ipnat, link aggregation, Dtrace probing, etc...).

Beyond the Solaris configuration reporting commands (System/storage HW config, OS config, kernel tunables, network/IPMP/Trunking config, ZFS/FS/VM/NFS, users/groups, security, NameSvcs, pkgs, patches, errors/warnings, and system/network performance metrics), sys_diag also captures relevant application configuration details, such as Sun Cluster 2.x/3.x, Veritas VCS/VM/vxfs, Oracle .ora/RAC/CRS/listener.., MySQL.., along with other detailed configuration capture of key files (and tracking of changes via -t), etc.

Of all the capabilities, the greatest benefits are found by being able to run this single ksh script on a system and do the analysis from one single report/ file offline/elsewhere.

Since sys_diag is a ksh script (using awk for post-processing the data and javascript for dynamic HTML/chart generation), no packages need to be installed, only using standard built-in Solaris Utilities, allowing for the widest range of support.

Version 8.3g of sys_diag offers built-in dynamic HTML generation with both javascript embedded dashboard charts, as well as stand-alone .gr.html files for each individual chart.   Additionally, the vmstat, iostat, and netstat data is exported in an import-friendly text (.gr.txt) format for creating custom graphs from within OpenOffice or Excel.

Regarding the system overhead, sys_diag runs all commands serially (waiting for each command to complete before running the next), impacting system performance the same as if an admin were typing these commands one at a time on a console. The only exception is the background vmstat/mpstat/iostat/netstat (-g) performance gathering of metrics at the specified sampling interval (-I) and total duration (-T), which generally has negligible overhead on a "healthy" system.

Workflow (order of execution) of a typical sys_diag run (with arguments "-g -I1 -l") :

This example uses a 1 second sampling Interval (-I) and the DEFAULT Total duration (-T)
of 5 minutes (-T 300) to gather performance data (-g) and create a long (-l)
configuration report. *All Commands are run serially, except Background Collection*
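To make the relationship between -I and -T concrete, here is a minimal sketch of the arithmetic. It assumes each background collector emits roughly one sample per interval for the whole duration; the function name sample_count is purely illustrative and is not part of sys_diag.

```shell
#!/bin/sh
# Hypothetical helper (NOT part of sys_diag): approximate number of samples
# each background collector produces for a given -I (interval) and -T (duration).
sample_count() {
    interval=${1:-5}      # documented default for -I is 5 seconds
    total=${2:-300}       # documented default for -T is 300 seconds
    echo $(( $total / $interval ))
}

sample_count 1          # -I 1 with the default -T 300
sample_count 5 300      # the plain -g defaults
sample_count 1 600      # -I 1 -T 600
```

A smaller interval gives finer-grained charts at the cost of larger data files, so the sample count is worth estimating before a long run.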

- Extract README_sys_diag.txt (note this is an older summarized version of the complete Users_Guide)

- Beginning BME (0=Begin/1=Midpt/2=EndPt) Profiling SNAPSHOT (#0) 
*ONLY IF Not Excluded via “-x”, & using verbosity via “-v”

(to profile the system point-in-time SNAPs serially with prstat, ps, iostat, netstat, zpool, tcpstat,.. *before any background collection is started*).

- Initiate BACKGROUND Data Collection at a (“-I x”) x second sampling rate for total duration default 300 seconds (5mins) or t Total Seconds via “-T t”.   [data gathered includes vmstat, mpstat, iostat, netstat, .. and if non-gz capped:  also zonestat]

- WAIT until the MidPoint of performance data gathering

- Initiate BME Midpoint Profiling SNAPSHOT (#1), *ONLY IF Not Excluded via “-x”, & using Deep Verbosity via “-V”, AND IF >3mins of Total duration remains

- WAIT for Background Data Collection to Complete

- Initiate BME Endpoint Profiling SNAPSHOT (#2), *ONLY IF Not Excluded via “-x”, & using verbosity via “-v”.

- Capture System Configuration Data for report (following the TOC Table of Contents Outline)

- Post-Process the Performance data gathered to identify exceptions.

- Generate both the embedded .HTML Javascript charts and stand-alone .html and .gr.txt files (for Excel/OpenOffice custom import chart creation)

- *Generate the complete .html report*

- Identify the Data_Directory Path, the HTML Report File link that can be opened for examination

- Create a compressed tar.Z archive of DataDirectory (all+ sys_diag & perflog)

*NOTE : See Section 12 for the actual command line output running sys_diag *

sys_diag is generally run from the same directory (eg. /var/tmp) that will have enough available disk space for storing the data directories and archives (however, the data directory and all files can be removed after each run using -C).   When always run from the same directory, a single sys_diag_perflog.out file is appended to as a system chronology of performance each time sys_diag is run, that can later be referred to.

NOTE: For the best .html viewing experience, Do NOT use MS Internet Explorer browser as it varies in support of HTML stds for formatting and iframe file inclusion (ending up opening many windows vs embedding output files in a single .html report)

** USE Chrome, Firefox as recommended browsers ** (for best viewing open full screen)

3.0 Command Line Arguments & available parameters :


# sys_diag [-a -A -c -C -d_ -D -f_ -g -G -H -I_ -l -L_ -n -N -o_ -p -P -q -s -S -T_ -t -u -v -V -h|-?]
-a 	Application / DB Configs (included in -l/-A, Oracle/RAC/MySQL/SunRay ..)
-A 	ALL Options are turned on, except Debug and -u
-b 	Generate a Performance Thresholds "Baseline" profile (see -B or default fname used)
-B (1 | 2) Use Baseline file Threshold Analysis Calculation (1=Range HWM, 2=StdDev)
-c 	Configuration details (included in -l/-A)
-C 	Cleanup Files and remove Directory if tar works
-d path Base directory for data directory / files
-D 	Debug mode (ksh set -x .. echo statements/variables/evaluations)
-e email_addr Emails sys_diag .tar.Z file upon completion (assuming sendmail is configured)
-f input_file Used with -t to list configuration files to Track changes of
-g 	gather Performance data (def: 5 sec samples for 5 mins, unless -I |-T exist)
-G 	GATHER Extended Perf data (S10+ Dtrace, lockstats+, pmap/pfiles) vs -g
-h | -? Help / Command Usage (this listing) / Version_#
-H 	HA configuration and stats (Solaris Cluster, VCS, ..)
-I secs Perf Gathering Sample Interval (default is 5 secs)
-l 	Long Listing (most details, but not -g|-G,-v|-V,-A,-t,-D)
-L label_descr_nospaces (Descriptive Label For Report)
-n 	Network configuration and stats (also included in -l/-A except ndd settings)
-N 	No Graph generation in HTML Reports.
-o outfile Output filename (stored under sub-dir created)
-p cminp  Specify Individual Performance Subsystems for data capture (for -g | -G).
	[eg “-p cminp” selects All (CPU|Mem|IO|Net|Process), “-p cn” only cpu & net]
-P -d ./data_dir_path Post-process the Perf data skipped with -S and finish .html rpt
-q 	Quiet mode, disables command line output. (*not yet fully implemented*)
-s 	Security configuration
-S 	SKIP POST PROCESSing of Performance data (use -P -d data_dir to complete)
-t 	Track configuration / cfg_file changes (Saves/Rpts cfg/file chgs *see -f)
-T secs Perf Gathering Total Duration (default is 300 secs =5 mins)
-u 	unTar ed: (do NOT create a tar file)
-v 	Extended verbosity level 1 (for -g perf gathering, examines more top procs,
	Also adds pmap/pfiles/ptree, and lightweight lockstat to BME SNAPSHOTS).
-V Deep Verbosity level 2 (adds path_to_inst, netwk dev settings, snoop..)
	Longer message/error/log listings. Additionally, the probe duration for
	Dtrace and lockstat sampling is widened from 2 seconds (during -G) to 5 seconds
	(if -G && -V). Ping is also run against the default route and
	If -g|-G & -V, then mdb memory usage is captured (page cache, kernel, anon..).
-x 	Excludes lockstat, intrstat, plockstat (DTrace usage),pfiles & mdb from
	-g|-G performance data gathering, also skipping Midpt BME snapshots.


	BOTH of the following command line syntax examples are functionally the same (order/spacing doesn’t matter):
	eg. 	./sys_diag -g -v -I 1 -T 600 -l
		./sys_diag -g -l -I1 -T600 -v

	NOTE: NO args equates to a brief rpt with NO Performance capture (No -A,-g/I,-l,-t,-D,-V,..)
	** Also, note that option/parameter ordering is flexible, as well as use of white
	space before arguments to parameters (or not). The only requirement is to list
	every option/parameter separately with a preceding - (-g -l , but not -gl).


** EXIT Status ** (Return Code) :
0 if OK, non-zero if an error occurred or Performance EXCEEDED Thresholds! were found.
IF Performance Gathering and Analysis (-g|-G) has Noted EXCEEDED Thresholds! THEN a bitmask is
produced of the following Conditions (added together to produce a single integer
exit/return code) :

RED (Critical) CPU Alarm : return_code = return_code + 1
RED (Critical) Memory Alarm : return_code = return_code + 2
RED (Critical) StorageIO Alrm : return_code = return_code + 4
RED (Critical) Network Alarm : return_code = return_code + 8
YELLOW (Warning) CPU Alarm : return_code = return_code + 16
YELLOW (Warning) Memory Alarm : return_code = return_code + 32
YELLOW (Warning) StorageIO Alrm : return_code = return_code + 64
YELLOW (Warning) Network Alarm : return_code = return_code + 128


Therefore, if you take the return code and start by subtracting the highest values, you can identify which subsystems (cpu/memory/storageIO/network) had alarms.

 eg. root# echo $? will give you the exit code of the last run command/utility


Therefore, if sys_diag returned an exit code of 129, then that depicts :
return_code - 128 shows that Network Warnings (YELLOW) were present.. and
return_code - 1 shows CPU (RED) Critical Alarms


(essentially, start subtracting the largest exceptions, and take the remainder
and go down the list.. so an exit code of 5 would have been RED_IO & RED_CPU)
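The subtraction procedure above is equivalent to testing each bit of the return code. Here is a minimal sketch, assuming the bit values listed above and a shell with POSIX arithmetic (ksh, bash, dash). The function name decode_sys_diag_rc and the short labels (RED_CPU, YELLOW_NET, ..) are illustrative and not part of sys_diag. It also assumes the non-zero code came from threshold alarms rather than some other error.

```shell
#!/bin/sh
# Hypothetical helper (NOT part of sys_diag): decode the bitmask exit code
# into the alarm conditions documented above.
decode_sys_diag_rc() {
    rc=$1
    out=""
    # bit-value:label pairs, in the order the alarms are documented
    for pair in 1:RED_CPU 2:RED_MEM 4:RED_IO 8:RED_NET \
                16:YELLOW_CPU 32:YELLOW_MEM 64:YELLOW_IO 128:YELLOW_NET
    do
        bit=${pair%%:*}
        name=${pair#*:}
        # a non-zero bitwise AND means this alarm contributed to the code
        if [ $(( $rc & $bit )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    echo "${out:-NONE}"
}

decode_sys_diag_rc 129   # prints: RED_CPU YELLOW_NET
decode_sys_diag_rc 5     # prints: RED_CPU RED_IO
```

In practice this would be called right after a run, eg. ./sys_diag -g ; decode_sys_diag_rc $?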


4.0 Common Command Line Usage examples :

./sys_diag -l 		Creates a LONG (detailed) configuration snapshot report in both 
			HTML (.html) and Text formats (.out). Without -l, the config report
			created has minimal system cfg details.  Note, that -l (as with 
			most cmd line arguments) can be added when capturing performance 
			data to create a more complete rpt.
./sys_diag -g 		gathers performance data at the default sampling rate of 5 secs for 
			a total duration of 5 mins, creating a color coded HTML rpt with 
			header/ Dashboard Summary section and performance details/ findings/
			exceptions found. Also runs the BME starting/endpoint snapshots 
			(before/after background data gathering of vm/mp/io/netstat..).
			*This example will NOT create detailed configuration report sections.
			NOTE: -g is meant to gather perf data without overhead, therefore only 
				1 second lockstat samples are taken. Use -G and/or -V for more 
				detailed system probing (see examples and notes below). Using 
				-v/-V with -g adds pmap/pfiles snapshots, vs. using -G to also 
				capture Dtrace and extended lockstat probing.
			** Any time that sys_diag is run with either -g or -G, the performance
			* dashboard/summary section of the command line output is appended to
			* the file sys_diag_perflog.out, which gets copied and archived as
			** part of the final .tar.Z output file.
./sys_diag -g -l -I 1 -T 600 	Gathers perf data at 1 sec samples for 10 mins and Also does 
				basic BME Begin/Midpt/Endpoint sampling, and creates a long/
				detailed configuration report.
./sys_diag -l -g -C 		Creates a long configuration snapshot report, gathers basic 
				performance data/analysis, and Cleans up (aka removes the 
				data directory) after data directory archive compression (.tar.Z).
./sys_diag -d base_directory_path -l … (-d changes the data directory location to be created)
./sys_diag -G -l -T 600 	Gathers DEEP performance & Dtrace/lockstat/pmap data at the 
				default Interval (sampling rate of 5 secs) for 10 mins 
				(including the std data gathering from -g).
		*NOTE:  this runs all Dtrace/Lockstat/Pmap probing during BME snapshot intervals 
			(beginning_0/midpoint_1 w -V/ and endpoint_#2 snapshots), limiting probing
			overhead to BEFORE/AFTER the standard data gathering begins (vmstat, 
			mpstat, iostat, netstat, .. from -g).  The MIDPOINT probing occurs at a 
			known point as not to confuse this activity for other system processing.
		*Because of this, standard data collection may not start for 30+ seconds, or until
		 the beginning snapshot (snapshot_#0) is complete. 
		 (-g snapshot_#0 activities only take a couple seconds to complete, since they 
		  do not include any Dtrace/lockstat.. beyond 1 sec samples).
./sys_diag -G -V -I 1 -T 600 	Gathers DEEP, VERBOSE, performance & Dtrace/lockstat/pmap data 
				at 1 sec sample intervals for 10 mins (uses 5 second Dtrace and
				Lockstat snapshots, vs. 2 second probing with -G alone)
				(in addition to the standard data gathering from -g).
./sys_diag -g -l -S 	(gathers perf data, runs long config rpt, and SKIPS Post-Processing 
			 and .html report generation)
	NOTE: * This allows for completing the post-processing/analysis activities either 
		on another system, or at a later time, as long as the data_directory
		exists (which can be extracted from the .tar.Z, then referred to via
		-d data_dir_path ). ** See the next example using -P -d data_path **
./sys_diag -P -d ./data_dir_path    (Completes Skipped Post-Processing & .html rpt creation)


This has been an invaluable asset used to characterize / diagnose / analyze workloads across literally hundreds of systems within many of the top Fortune 100 datacenters.  As would be expected, the obligations, support, and implications of use are the sole responsibility of the user, as is documented within the header of sys_diag. As a standard “best practice”, this and/or any new workload introduced to a system should always be tested first in a non-production environment for validation and familiarity.

Enjoy, and let me know if you have any Q's or suggestions !


.. see our next installment for  samples of the new dynamically auto-generated embedded and stand-alone .html javascript charts .. also found within the complete  sys_diag_Users_Guide.pdf

Monday May 02, 2016

OpenStack FULL distribution included/supported in Solaris 11.2 and beyond !!

The FULL OpenStack distribution has been available since Solaris 11.2 (2014) !  

Once again, proving that Solaris is THE single most complete, mission critical, linearly scalable, binary compatible.. Operating System on the planet !    (beyond the dozens of Solaris-ONLY capabilities that blow away Linux.. here is but another, at NO COST to you)

Many references & WP's are available at the following link :

For a good high-level overview of what OpenStack in Solaris offers watch this video : 

Sunday Sep 30, 2007

The Many Flavors of System Latency.. along the Critical Path of Peak Performance

From an article that I wrote last month, published in the September 2007 issue of Sun's Technocrat, this examination of System Latency starts where we left off with the last discussion What is Performance ? .. in the Real World .  That discussion identified the following list of key attributes and metrics that most in the IT world associate with optimal system performance :
  • Response Times (Client GUI's, Client/Server Transactions, Service Transactions, ..) Measured as "acceptable" Latency.
  • Throughput (how much Volume of data can be pushed through a specific subsystem.. IO, Network, etc...)
  • Transaction Rates (DataBase, Application Services, Infrastructure / OS / Network.. Services, etc.).  These can be either rates per Second, Hour, or even Day... measuring various service-related transactions.
  • Failure Rates (# or Frequency of exceeding High or Low Water Marks .. aka Threshold Exceptions)
  • Resource Utilization (CPU Kernel vs. User vs. Idle, Memory Consumption, etc..)
  • Startup Time (System HW, OS boot, Volume Mgmt Mirroring, Filesystem validation, Cluster Data Services, etc..)
  • FailOver / Recovery Time (HA clustered DataServices, Disaster Recovery of a Geographic Service, ..)  Time to recover a failed Service (includes recovery and/or startup time of restoring the failed Service)
  • etc ...

Each of the attributes and perceived gauges of performance listed above have their own intrinsic relationships and dependencies to specific subsystems and components... in turn reflecting a type of "latency" (delay in response). It is these latencies that are investigated and examined for root cause and correlation as the basis for most Performance Analysis activities.

How do you define Latency ?

In the past, the most commonly used terminology relating to latency within the field of Computer Science had been "Rotational Latency". This was due to the huge discrepancy between the responsiveness of an operation requiring mechanical movement, vs. the flow of electrons between components, where previously the discrepancy was astronomical (nanoseconds vs. milliseconds).  Although the most common bottlenecks do typically relate to physical disk-based I/O latency, the paradigm of latency is shifting.  With today's built-in HW caching controllers and memory resident DB's, (along with other optimizations at the HW, media, drivers, and protocols...), the gap has narrowed. Realize that in 1 nanosecond (1 billionth of a second), electricity can travel approximately one foot down a wire (approaching the speed of light). 

However, given the industry's latest cpu's running multiple cores at clock speeds upwards of multiple GigaHertz (with >= 1 thread per core,  each theoretically executing > 1+ billion  instructions per second...), many bottlenecks can  now easily be realized within memory, where the densities have increased dramatically, the distances across huge supercomputer buses (and grids) have expanded dramatically, and most significantly.. the latency of memory has not decreased at the same rate as cpu speed increases. In order to best investigate system latency, we first need to define it and fully understand what we're dealing with.


  • Latency (noun) :  The delay, or time that it takes prior to a function, operation, and/or transaction occurring.  (my own definition)
  • Latent (adj) :  Present or potential but not evident or active.
  • Bottleneck (noun) :  A place or stage in a process at which progress is impeded.
  • Throughput (noun) :  Output relative to input; the amount of data passing through a system from input to output.
  • Bandwidth (noun) :  The amount of data that can be passed along a communications channel in a given period of time.

(definitions cited from


The "Application Environment" and its basic subsystems :


Once again, the all-inclusive entity that we need to realize and examine in its entirety is the "Application Environment", and its standard subsystems :

  • OS / Kernel (System processing)
  • Processors / CPU's
  • Memory
  • Storage related I/O
  • Network related I/O
  • Application (User) SW


The "Critical Path" of (End-to-End) System Performance :

Although system performance might frequently be associated with one (or a few) system metrics, we must take 10 steps back and realize that overall system performance is one long inter-related sequence of events (both parallel and sequential). Depending on the type of workload and services running within an Application Environment, the Critical Path might vary, as each system has its own performance profile and related "personality". Using the typical OLTP RDBMS environment as an example, the Critical Path would include everything (and ALL Latencies incurred) between :

Client Node / User -> Client GUI -> Client Application / Services -> Client OS / Kernel -> Client HW -> NICs -> Client LAN -> (network / naming services, etc.. ) -> WAN (switches, routers, ...) -> ... Network Load Balancing Devices

-> Middleware / Tier(s) -> Web Server(s) -> Application Server(s) -> Directory, Naming, NFS... Servers/Services->

-> RDBMS Server(s) [Infrastructure Svcs, Application SW, OS / kernel, VM, FS / Cache, Device Drivers, System HW, HBA's, ...] -> External SAN /NAS I/O [ Switches, Zones/Paths, Array(s), Controllers, HW Cache, LUN(s), Disk Drives, .. ] -> RDBMS Svr ... LAN ...... -> ... and back to the Client Node through the WAN, etc... <<-

(NOTE: MANY sub-system components / interactions are left out in this example of a transaction and response between a client and DB Server)


Categories of Latency :

Latency, in and of itself, simply refers to a delay of sorts.  In the realm of Performance Analysis and Workload Characterization, an association can generally be made between certain types of latency and a specific sub-system "bottleneck".  However, in many cases the underlying "root causes" of bottlenecks are the result of several overlapping conditions, none of which individually cause performance degradation, but together can result in a bottleneck. It is for this reason that performance analysis is typically an iterative exercise, where the removal of one bottleneck can easily result in the creation of another "hot spot" elsewhere, requiring further investigation and /or correlation once a bottleneck has been removed.


Internal vs. External Latency ...

Internal Forms of Latency :

  • CPU Saturation (100% Utilization, High Run Queues, Blocked Kthreads, Cpu Contention ... Migrations / Context Switching / ... SMTX, ..)
  • Memory Contention (100% Utilization, Allocation Latency due to either location, Translation, and/or paging/swapping, ...)
  • OS Kernel Contention Overhead ( aka .. "Thrashing" due to saturation.. )
  • IO Latency ( Hot Spots, High Svc Times, ...)
  • Network Latency
  • OS Infrastructure Service Latency (Telnet, FTP, Naming Svcs, ...)
  • Application SW / Services (Application Libraries, JVM, DB, ...)

External Forms of Latency :

  • SAN or External Storage Devices (Arrays, LUNS, Controllers, Disk Drives, Switches, NAS, ...)
  • LAN/WAN Device Latency (Switches, Routers, Collisions, Duplicate IP's, Media Errors, ....)
  • External Services (DNS, NIS, NFS, LDAP, SNMP, SMTP, DB, ....)
  • Protocol Latency (NACK's, .. Collisions, Errors, etc...)
  • Client Side Latency

Perceived vs. Actual Latency ...

For anyone that has worked in the field with end-users, they have likely experienced scenarios where users will attribute a change in application behavior to a performance issue, in many cases incorrectly. The following is a short list of the top reasons for a lapse in user perception of system performance :

  • Mis-Alignment of user expectations, vantage points, anticipation, etc.. (Responsiveness / Response Times, ...)
  • Deceptive expectations based upon marketing "PEAK" Throughput and/or CPU clock-speed #'s and promised increases in performance.  (high clock speeds do NOT always equate to higher throughput or better overall performance, especially if ANY bottlenecks are present)
  • PEAK Throughput #'s can only be achieved if there is NO bottleneck or related latency along the critical path as described above. The saturation of ANY sub-system will degrade the performance until that bottleneck is removed.

    The PEAK Performance of a system will be dictated by the performance of its most latent and/or contentious components (or sub-systems) along the critical path of system performance. (eg. The PEAK bandwidth of a system is no greater than that of its slowest components along the path of a transaction and all its interactions.)
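The "slowest component" rule can be sketched as taking the minimum bandwidth across every hop on the critical path. The helper name slowest_component and the name=MBps argument format are assumptions made for illustration only. The sample figures are the theoretical peaks quoted in the bandwidth table later in this article.

```shell
#!/bin/sh
# Hypothetical helper: the peak bandwidth of a path is bounded by its slowest hop.
# Arguments are name=MBps pairs (whole numbers); prints the limiting hop and its rate.
slowest_component() {
    min_name=""
    min_bw=""
    for hop in "$@"; do
        name=${hop%%=*}
        bw=${hop#*=}
        # keep the first hop seen, then any hop with a lower bandwidth
        if [ -z "$min_bw" ] || [ "$bw" -lt "$min_bw" ]; then
            min_name=$name
            min_bw=$bw
        fi
    done
    echo "$min_name $min_bw"
}

# 1 Gigabit Ethernet, 64 bit @ 66MHz PCI, Ultra 320 SCSI (theoretical peaks in MBps)
slowest_component gige=125 pci64_66=533 u320scsi=320   # prints: gige 125
```

Removing the limiting hop (eg. moving to a faster network link) simply promotes the next slowest hop to being the bottleneck, which is why the analysis is iterative.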

    As the holy grail of system performance (along with Capacity Planning.. and ROI) dictates, ... a system that allows for as close to 100% of CPU processing time as possible (vs. WAIT events that pause processing) is what every  IT Architect and System Administrator strives for.   This is where systems using CMT (multiple cores per cpu, each with multiple threads per core) shine, allowing for more processing to continue even when many threads are waiting on I/O.



    The Application Environment and its Sub-Systems ... where the bottlenecks can be found


    Within Computing, or more broadly, Information Technology, "latency" and its underlying causes can be tied to one or more specific "sub-systems". The following list reflects the first level of "sub-systems" that you will find for any Application Environment :

    Subsystem / Components | Attributes and key Characteristics | Related Metrics, Measurements, and/or Interactions
    System "Bus" / Backplane | Backplane / centerplane, I/O Bus, etc.. (many types of connectivity and media are possible, all with individual response times and bandwidth properties). | Busstat output, aggregated total throughput #'s (from kstat, etc..)
    Processors / CPU's | # Cores, # HW Threads per core, Clock speed / Frequency in Ghz (cycles per second), Operations (instructions) per Sec, Cache, DMA, etc.. | vmstat, trapstat, cpustat, cputrack, mpstat, ... (Run Queue, Blocked Kthreads, ITLB_Misses, % S/U/Idle Utilization, # lwp's, ...)
    Memory / Cache | Speed/Frequency of Bus, Bandwidth of Bus, Bus Latency, DMA Config, L1/L2/L3 Cache Locations/ Sizes, FS page cache, Physical Proximity of Cache and/or RAM, tmpfs, pagesizes, .. | vmstat, pmap, mdb, kstat, prstat, trapstat, ipcs, pagesize, swap, ... (Cache Misses, DTLB_misses, Page Scan Rate, heap/stack/kernel sizes,..)
    Controllers (NIC's, HBA's, ..) | NIC RX Interrupt Saturation, NIC Overflows, NIC / HBA Caching, HBA SW vs. HW RAID, Bus/Controller Bridges/Switches, DMP, MPxIO, ... | netstat, kstat (RX Pkts / Sec, Network Errors, ...) , iostat, vxstat.. (Response Times, Storage device Svc_times..), lockstat, intrstat, ...
    Disk Based Devices | Boot Devices, RAID LUN's, File Systems (types, block sizes, ...), Volumes, RAID configuration (stripes, mirrors, RAID Level, paths,...), physical fragmentation, MPxIO, etc.. | iostat, vxstat, kstat, dtrace, statspack, .. (%wait, Service Times, blocked kernel threads, ... FS/LUN Hot Spots)
    OS / Kernel | Process Scheduling, Virtual Memory Mgmt, HW Mgmt/Control, Interrupt handling, polling, system calls, ... | vmstat (utilization, interrupts, syscalls, %Sys / % Usr, ...), prstat, top, mpstat, ps, lockstat (for smtx, lock, spin.. contention), ...
    OS Infrastructure Services | FTP, Telnet, BIND/DNS, Naming Svcs, LDAP, Authentication/Authoriz., .. | prstat, ps, svcadm, .. various ..
    Application Services | DB Svr, Web Svr, Application Svr, ... |



Note, if you want a single Solaris utility to do the heavy lifting, performance / workload correlation, and reporting for you, take a look at sys_diag if you haven't already done so (or the README).


Media/ Transport Bandwidth and related Latencies :


The following table demonstrates the wide range of typical operating frequencies and latencies PER Sub-System, Component, and/or Media Type :

Component / Transport Media | Response Time / Frequency / Speed | Throughput / Bandwidth
CPU | > 1+ Giga Hertz (1+ billion cycles per second) * (# cores * HW Threads / core) | >1 billion operations per second (huge theoretical #ops/s per system)
Memory | DDR (PC-3200@200MHz/200MHz bus) ~5ns ; DDR2 (PC2-5300@166MHz/333MHz bus) ~6ns ; DDR2 (PC2-8500@266MHz/533MHz bus) ~3.75ns <TBD> ; nanoseconds (billionths of a second) | DDR-400 Peak Transfer 3.2 GB/s ; DDR2-667 Pk Transfer 5.3 GB/s ; DDR2-1066 Pk Transfer 8.5 GB/s <TBD>
Disk Devices | Service Times : ~5+ ms = ~X ms Latency + Y ms Seek Times (1 millisecond = 1000th of a second) [platter size, # cylinders/ platters, RPM,...] | varies greatly, see below
Ultra 320 SCSI (16 bit) parallel | (high performance, cable & dev limitations..) | Up to 320 MBps
SAS [Serial Attached SCSI] | Future <TBD> | > 300 MBps (>3 Gbps) ; Up to 1200 MBps <TBD>
SATA [Serial ATA] | low cost, higher capacity (poor performance) ; Future <TBD> | Up to 300 MBps ; Up to 600 MBps <TBD>
USB 2.0 | 10-200+ Microseconds (1 microsecond [us] = 1 millionth of a second) | up to 480 Mbps (60 MBps), ~40 MBps Real-World Usable
FireWire (IEEE 1394) | | Up to 50 MBps
Fiber Channel (Dual Ch) | 4 Gb (4 / 2 / 1 Gb) *2 ; 8 Gb (8 / 4 / 2 Gb) *2 <TBD> | Up to 1.6 GBps (1 GB Usable) ; Up to 3.2 GBps (1.8 GB Usable)
1 Gigabit Ethernet | ** Latency ~ 50 us [microseconds] ** | 125 MBps (~1 Gbps) theoretical
10 Gigabit Ethernet | | Up to 20 Gbps (<= 9 Gbps Usable)
Infiniband (Dual Ported HCA) | x4 (SDR / DDR) Dual Ported= *2 ; ** Latency < 2 microseconds ** ; x8 (DDR) *2 <TBD> | 2*10Gb= 20 Gbps (16 Gbps Usable) ; Up to 40 Gbps (32 Gbps Usable)
PCI 2.2 | 32 bit @ 33 MHz ; 64 bit @ 33 MHz ; 64 bit @ 66 MHz | 133 MBps ; 266 MBps ; 533 MBps
PCI-X | 64 bit bus width @ 100 MHz (parallel bus) ; 64 bit bus width @ 133 MHz (parallel bus) | Up to 800 MB/s ; 1066 MBps (1 GBps)
PCI Express | v.1 serial bus / bi-directional @ 2.5 GHz ; v.2 @ 5 GHz <TBD> ; (10's -100's of nanoseconds for latencies) | 4 GBps (x16 lanes) one direction ; 8 GBps (x32 lanes) one direction ; Up to 16 GBps bi-directional (x32) ; 32 GBps bi-directional (x32 lanes)
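Several rows above quote both a line rate in bits per second (Mbps / Gbps) and a throughput in bytes per second (MBps / GBps). The conversion is a simple division by 8, sketched below with awk. The function name mbps_to_MBps is illustrative, and the result is a theoretical ceiling only: it ignores protocol and encoding overhead, which is why the "Usable" figures in the table are lower.

```shell
#!/bin/sh
# Hypothetical helper: convert a line rate in megabits/sec (Mbps)
# to megabytes/sec (MBps), using 8 bits per byte.
mbps_to_MBps() {
    awk -v mbps="$1" 'BEGIN { printf "%.1f\n", mbps / 8 }'
}

mbps_to_MBps 480     # USB 2.0            -> 60.0
mbps_to_MBps 1000    # 1 Gigabit Ethernet -> 125.0
```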


Other Considerations Regarding System Latency :

Other considerations regarding system latency that are often overlooked include the following, which offer us a more holistic vantage point of system performance and items that might work against "Peak" system capabilities :

  • For Application SW that supports advanced capabilities such as Infiniband RDMA (Remote Direct Memory Access), interconnect latencies can be virtually eliminated via Application RDMA "kernel bypass".  This would be applicable in an HPC grid and/or possibly  Oracle RAC Deployments, etc. (confirming certifications of SW/HW..).
  • Level of Multi-Threading vs. Monolithic serial or "batch" jobs (If Applications are not Multi-Threaded, then SMP and/or CMT systems with multiple processors / cores will likely always remain under-utilized).
  • Architectural configurations supporting load distribution across multiple devices / paths (cpu's, cores, NIC's, HBA's, Switches, LUNs, Drives, ...)
  • System Over Utilization (too much running on one system.. due to under-sizing or over-growth, resulting in system "Thrashing" overhead)
  • External Latency Due to Network and/or SAN I/O Contention
  • Saturated Sub-Systems / Devices (NIC's, HBA's, Ports, Switches, ...) create system overhead handling the contention.
  • Excessive Interrupt Handling (vs. Polling, Msg passing, etc..), resulting in overhead where Interrupt Handling can cause CPU migrations / context switching (interrupts have the HIGHEST priority within the Solaris Kernel, and are handled even before RT processing, preempting running threads if necessary).   Note, this can easily occur with NIC cards/ports that become saturated (> ~25K RX pkts/sec), especially for older drivers and/or over-utilized systems.
  • Java Garbage collection Overhead (sub-par programming practices, or more frequently OLD JVM's, and/or missing compilation optimizations).
  • Use of Binaries that are compiled generically using GCC, vs. HW optimized compilations using Sun's Studio Compilers (Sun Studio 12 can give you 200% + better performance than gcc binaries).
  • Virtualization Overhead (significant overhead relating to traps and library calls... when using VmWare, etc..)
  • System Monitoring Overhead (the cumulative impact of monitoring utilities, tools, system accounting, ... as well as the IO incurred to store that historical performance trending data).
  • OS and/or SW ... Patches, Bugs, Upgrades (newly applied, or possibly missing)
  • Systems that are MIS-tuned, are accidents waiting to happen.  Only Tune kernel/drivers if you KNOW what you are doing, or have been instructed by support to do so (and have FIRST tested on a NON-production system).  I can't tell you how many performance issues I have encountered that were due to administrator "tweaks" to kernel tunables (to the point of taking down entire LAN segments !).  The defaults are generally the BEST starting point unless a world-class benchmarking effort is under-way.


The "Iterative" nature of Performance Analysis and System Tuning

No matter what the root causes are found to be, in the realm of Performance Analysis and system Tuning, ... once you remove one bottleneck, the system processing characteristics will change, resulting in a new performance profile, and new "hot spots" that require further data collection and analysis. The process is iterative, and requires a methodical approach to remediation.

Make certain that ONLY ONE (1) change is made at a time, otherwise, the effects ( + or - ) can not be quantified.

Hopefully at some point in the future we'll be operating at latencies measured in attoseconds (10^-18, or 1 quintillionth of a second), but until then .... Happy tuning :)

For more information regarding Performance Analysis, Capacity Planning, and related Tools, review some of my other postings at :


Copyright 2007  Todd A. Jobson


This blog does not reflect the viewpoint or opinions of Oracle. All comments are personal reflections and responsibility of Todd A. Jobson, Sr. and are implicitly copyrighted from the posted year to current year, to that effect.

