By User12610236-Oracle on Jun 14, 2016
For a high-level review of the latest Oracle M7 CPU/system news and videos, check out the "M7 InfoWall" :
TheRegister speculates on Oracle's system and CPU futures, based upon last year's Hot Chips CPU conference and presentation.
.. interesting stuff, and it could be another game-changer for Oracle ! .. TBD
As several of you have recently asked, the following list should provide an ample starting point to review the 20+ M7 CPU World-Record and/or Best-in-Class benchmark results :
Oracle's M7 CPU is a "Game-Changer" .. as TheRegister notes in a quote from John Fowler, VP of Oracle Systems
(regarding how the M7 microprocessor and its built-in coprocessors that speed up crypto algorithms and database requests stand apart from the generic Intel x86 servers swelling today's data center racks .. and, relative to M7, the poor-performing Power8 CPUs) :
"I don't believe that the million-server data center powered by a hydroelectric dam is the scalable future of enterprise computing," Fowler said. "We'll need to keep doing it, but we also need to invest in new technology so you all don't have to build them."
.. and today, with the introduction of the M7 CPU and related servers, we have exactly that at Oracle !
InfoWorld : "Oracle Makes Its BIGGEST SPARC Announcement Since Buying Sun !!"
M7 Processor and M7/T7 Servers are "Game-Changers" in the following ways :
The recent article from Enterprise Linux News profiles a customer, formerly an all-Linux shop, that has migrated production systems to Solaris 10.
The reasons noted include :
One additional item that should be noted is that the target platform tested used Sun's T1 CPU, which offers 8 cores, each with 4 HW threads of execution (aka.. 32 HW threads within one CPU).
Sun's latest CoolThreads/CMT CPUs (T2s) offer roughly double the performance of the former T1 CPUs (up to 8 cores per CPU, each with 8 HW threads = 64 threads/CPU). Above and beyond the single-socket T5120 and T5220 T2-based systems, Sun now offers dual-socket T2 systems, including the T5140 and T5240 (totaling up to 128 HW threads of execution in a single 1U or 2U system !!!).
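The core/thread math above can be sanity-checked with simple shell arithmetic (counts taken from the T1/T2 specs just mentioned):

```shell
# HW threads = sockets x cores per CPU x threads per core
t1_threads=$((1 * 8 * 4))     # single-socket T1 : 8 cores x 4 threads
t2_threads=$((1 * 8 * 8))     # single-socket T2 : 8 cores x 8 threads
t5240_threads=$((2 * 8 * 8))  # dual-socket T2 (e.g. T5240)
echo "$t1_threads $t2_threads $t5240_threads"   # -> 32 64 128
```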
To read the article in its entirety, click here :
The following post is a close approximation of an article that I published in this month's Sun "Technocrat" (November 2007) issue. Hopefully you'll enjoy this discussion regarding the past and present relationship of CPUs and architecture to system performance.
In today's fast-paced world of ever-increasing demands for system throughput, the foundation of discussion and expectations typically all hinges upon the same topic.. CPU performance. This article will be an examination of CPUs and system architecture, as they relate to performance and capacity planning as a whole. From our last article, "The Many Flavors of System Latency ...", we will extend the context to focus on past and present competing aspects of system/CPU architecture, including a brief history of how we arrived at the current competitive landscape we find ourselves in today. (The photo to the left is of a Sun T1, an 8-core / 32-thread CPU.)
Choosing the right CPU for your Workload :
SPARC-based CMP / CMT CPUs :
The following table provides a high-level comparison of Sun's T2 and T1 CPUs (for the complete Microprocessor Review report on the T2, click here) :
\* r (vmstat, kthr) : Kernel threads runnable, but not executing (best if 0, or at most < # of cores)
\* b (vmstat, kthr) : Blocked kernel threads (typically identifies an IO bottleneck; see also %wt)
\* sy : Number of system calls (calls made into the OS kernel, accounting towards %Sys)
\* in : Number of system interrupts per interval (interrupts have the highest priority on the system)
\* % CPU (U/S/I) : % CPU utilization (% User space / % System kernel / % Idle); % User should be ~2\* % Sys
\* xcal (mpstat) : Per-CPU cross-calls (either for cross-processor interrupts, and/or maintaining CPU virtual memory translation consistency .. aka cache consistency with the MMU and mapped TLB entries, etc.)
\* intr (mpstat) : Per-CPU interrupts (also use intrstat, as well as lockstat, for system-wide interrupt analysis)
\* icsw / csw (mpstat) : Involuntary context switching (icsw reflects preemption..) vs. voluntary context switching (csw)
\* migr (mpstat) : Per-CPU migrations .. a thread migrating off of one CPU and onto another.
\* smtx (mpstat) : Mutex exclusion lock activity (per CPU); lockstat gives the best visibility of this activity.
\* % CPU Waiting (%wt) : % of a single CPU spent waiting (during the sampling interval). See also b under kthr.
\* ITLB misses (trapstat) : % of MMU-related Instruction Translation Lookaside Buffer misses (see also pagesize, pmap, cpustat..)
\* DTLB misses (trapstat) : % of MMU-related Data Translation Lookaside Buffer misses (page size is relevant here as well)
\* cputrack : Various CPU-specific HW event counters (Cache, Instruction level, FP, TLB; man cputrack for your HW-specific counters)
\* cpustat : Various system-wide CPU event counters (man cpustat for your HW-specific counters)
\* busstat : Available system-specific bus device / instance counters & events (use busstat -l for your HW)
\* kstat : ALL kernel statistics are available individually via kstat
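As a quick illustration of how a couple of these metrics are read in practice, the sketch below parses vmstat-style output (canned sample data is used here so it runs anywhere) and flags intervals where the run queue exceeds the core count; the threshold of 8 cores is an assumed value for illustration:

```shell
# Flag intervals where the run queue (r, first column of vmstat's kthr
# group) exceeds the number of cores -- runnable-but-waiting kernel
# threads indicate CPU saturation. The here-doc stands in for `vmstat 1`.
cores=8
cat <<'EOF' |
 r b w   swap  free  re  mf pi po fr de sr  in   sy  cs us sy id
 2 0 0 163424 10822  12  45  0  0  0  0  0 412  989 450 20  5 75
12 0 0 163424 10750  10  40  0  0  0  0  0 980 2100 900 85 10  5
15 1 0 163424 10700  11  42  0  0  0  0  0 999 2300 950 90  9  1
EOF
awk -v cores="$cores" 'NR > 1 && $1 > cores \
    { printf "run queue %d > %d cores (possible CPU saturation)\n", $1, cores }'
```

Swapping the here-doc for a live `vmstat <interval>` pipe gives a minimal saturation monitor.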
The following list provides a set of definitions and examples for some of today's most common independent (industry-accepted) computing benchmarks.
(The benchmark comparison tables originally shown here compared systems including the IBM System p570, HP ProLiant DL360 G5, and IBM p6 570 on Power Consumption (Watts), Performance (SPECjApp JOPS), and Performance / Watt.)
Over the past 12 years with Sun, I have on several occasions been brought into a mission-critical production environment having performance issues just after going live. The most common causes of this include :
Hopefully this article has helped you reflect on the wide variety of CPU options available, as well as how they play such a significant role as the "cornerstone" of system architecture and the overall performance of our customers' production environments. Enjoy, and "let the chips fall (or rise) as they may" !
For more information regarding Performance Analysis, Capacity Planning, and related Tools, see Todd's Blog at : http://blogs.sun.com/toddjobson/category/Performance+and+Capacity+Planning
\* Copyright 2007 Todd A. Jobson \*
Each of the attributes and perceived gauges of performance listed above has its own intrinsic relationships and dependencies to specific subsystems and components... in turn reflecting a type of "latency" (delay in response). It is these latencies that are investigated and examined for root cause and correlation as the basis for most Performance Analysis activities.
How do you define Latency ?
In the past, the most commonly used terminology relating to latency within the field of Computer Science had been "Rotational Latency". This was due to the huge discrepancy between the responsiveness of an operation requiring mechanical movement vs. the flow of electrons between components, where the discrepancy was astronomical (nanoseconds vs. milliseconds). Although the most common bottlenecks do typically relate to physical disk-based I/O latency, the paradigm of latency is shifting. With today's built-in HW caching controllers and memory-resident DBs (along with other optimizations in the HW, media, drivers, and protocols...), the gap has narrowed. Realize that in 1 nanosecond (1 billionth of a second), electricity can travel approximately one foot down a wire (approaching the speed of light).
However, given the industry's latest CPUs running multiple cores at clock speeds upwards of multiple gigahertz (with >= 1 thread per core, each theoretically executing 1+ billion instructions per second...), many bottlenecks can now easily be realized within memory, where densities have increased dramatically, the distances across huge supercomputer buses (and grids) have expanded dramatically, and, most significantly, the latency of memory has not decreased at the same rate as CPU speeds have increased. In order to best investigate system latency, we first need to define it and fully understand what we're dealing with.
(definitions cited from www.dictionary.com)
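To put the CPU-vs-memory gap described above in concrete terms, here is a back-of-the-envelope calculation (the 2 GHz clock and ~100 ns memory latency are assumed illustrative figures, not measurements from any specific system):

```shell
# Cycles stalled per uncached memory access = clock rate (Hz) x latency (s).
# A core ticking 2 billion times per second that waits 100 ns for DRAM
# burns roughly 200 cycles doing nothing -- which is why cache misses
# dominate so many "CPU-bound" workloads.
awk 'BEGIN {
    clock_hz = 2.0e9      # assumed 2 GHz core
    mem_lat  = 100e-9     # assumed ~100 ns main-memory access
    printf "cycles stalled per uncached access: %d\n", clock_hz * mem_lat
}'
```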
The "Application Environment" and its basic subsystems :
Once again, the all-inclusive entity that we need to realize and examine in its entirety is the "Application Environment", and its standard subsystems :
The "Critical Path" of (End-to-End) System Performance :
Although system performance might frequently be associated with one (or a few) system metrics, we must take ten steps back and realize that overall system performance is one long inter-related sequence of events (both parallel and sequential). Depending on the type of workload and services running within an Application Environment, the Critical Path might vary, as each system has its own performance profile and related "personality". Using the typical OLTP RDBMS environment as an example, the Critical Path would include everything (and ALL latencies incurred) between :
Client Node / User -> Client GUI -> Client Application / Services -> Client OS / Kernel -> Client HW -> NICs -> Client LAN -> (network / naming services, etc.. ) -> WAN (switches, routers, ...) -> ... Network Load Balancing Devices
-> Middleware / Tier(s) -> Web Server(s) -> Application Server(s) -> Directory, Naming, NFS... Servers/Services->
-> RDBMS Server(s) [Infrastructure Svcs, Application SW, OS / kernel, VM, FS / Cache, Device Drivers, System HW, HBA's, ...] -> External SAN /NAS I/O [ Switches, Zones/Paths, Array(s), Controllers, HW Cache, LUN(s), Disk Drives, .. ] -> RDBMS Svr ... LAN ...... -> ... and back to the Client Node through the WAN, etc... <<-
(NOTE: MANY sub-system components / interactions are left out in this example of a transaction and response between a client and DB Server)
Categories of Latency :
Latency, in and of itself, simply refers to a delay of sorts. In the realm of Performance Analysis and Workload Characterization, an association can generally be made between certain types of latency and a specific sub-system "bottleneck". However, in many cases the underlying "root causes" of bottlenecks are the result of several overlapping conditions, none of which individually causes performance degradation, but which together can result in a bottleneck. It is for this reason that performance analysis is typically an iterative exercise, where the removal of one bottleneck can easily result in the creation of another "hot spot" elsewhere, requiring further investigation and/or correlation once a bottleneck has been removed.
Internal vs. External Latency ...
Internal Forms of Latency :
External Forms of Latency :
Perceived vs. Actual Latency ...
Anyone who has worked in the field with end users has likely experienced scenarios where users attribute a change in application behavior to a performance issue, in many cases incorrectly. The following is a short list of the top reasons for a lapse in user perception of system performance :
The PEAK performance of a system will be dictated by the performance of its most latent and/or contentious components (or sub-systems) along the critical path of system performance. (e.g. The PEAK bandwidth of a system is no greater than that of its slowest components along the path of a transaction and all its interactions.)
As the holy grail of system performance (along with Capacity Planning .. and ROI) dictates, a system that allows for as close to 100% of CPU time spent processing as possible (vs. WAIT events that pause processing) is what every IT Architect and System Administrator strives for. This is where systems using CMT (multiple cores per CPU, each with multiple threads per core) shine, allowing more processing to continue even when many threads are waiting on I/O.
The Application Environment and its Sub-Systems ... where the bottlenecks can be found
Within Computing, or more broadly, Information Technology, "latency" and its underlying causes can be tied to one or more specific "sub-systems". The following list reflects the first level of "sub-systems" that you will find for any Application Environment :
Subsystem / Components : Attributes and Key Characteristics : Related Metrics, Measurements, and/or Interactions

\* System "Bus" / Backplane : Backplane / centerplane, I/O bus, etc.. (many types of connectivity and media are possible, all with individual response times and bandwidth properties). Metrics : busstat output, aggregated total throughput #'s (from kstat, etc..)
\* CPU(s) : # cores, # HW threads per core, clock speed / frequency in GHz (cycles per second), operations (instructions) per sec, cache, DMA, etc.. Metrics : vmstat, trapstat, cpustat, cputrack, mpstat, ... (run queue, blocked kthreads, ITLB_misses, % S/U/Idle utilization, # lwp's, ...)
\* Memory / Cache : Speed/frequency of bus, bandwidth of bus, bus latency, DMA config, L1/L2/L3 cache locations / sizes, FS page cache, physical proximity of cache and/or RAM, tmpfs, page sizes, ... Metrics : vmstat, pmap, mdb, kstat, prstat, trapstat, ipcs, pagesize, swap, ... (cache misses, DTLB_misses, page scan rate, heap/stack/kernel sizes, ...)
\* Controllers (NIC's, HBA's, ..) : NIC RX interrupt saturation, NIC overflows, NIC / HBA caching, HBA SW vs. HW RAID, bus/controller bridges/switches, DMP, MPxIO, ... Metrics : netstat, kstat (RX pkts / sec, network errors, ...), iostat, vxstat (response times, storage device svc_times, ...), lockstat, intrstat, ...
\* Disk-Based Devices : Boot devices, RAID LUN's, file systems (types, block sizes, ...), volumes, RAID configuration (stripes, mirrors, RAID level, paths, ...), physical fragmentation, MPxIO, etc.. Metrics : iostat, vxstat, kstat, dtrace, statspack, .. (%wait, service times, blocked kernel threads, ... FS/LUN hot spots)
\* OS / Kernel : Process scheduling, virtual memory mgmt, HW mgmt/control, interrupt handling, polling, system calls, ... Metrics : vmstat/mpstat (utilization, interrupts, syscalls, %Sys / %Usr, ...), prstat, top, ps, lockstat (for smtx, lock, spin.. contention), ...
\* OS Infrastructure Services : FTP, Telnet, BIND/DNS, naming svcs, LDAP, authentication/authorization, ... Metrics : svcadm, .. various ..
\* Application Services : DB Svr, Web Svr, Application Svr, ...
Bandwidth and related Latencies :
The following table demonstrates the wide range of typical operating frequencies and latencies PER Sub-System, Component, and/or Media Type :
Component / Transport Media : Response Time / Frequency / Speed : Throughput / Bandwidth

\* CPU : > 1 GHz (1+ billion cycles per second) : > 1 billion operations per second
\* Memory : DDR (PC-3200 @ 200MHz/200MHz bus) ~5 ns; DDR2 (PC2-5300 @ 166MHz/333MHz bus) ~6 ns; DDR2 (PC2-8500 @ 266MHz/533MHz bus) ~3.75 ns (nanoseconds .. billionths of a second) : peak transfer 3.2 GB/s (DDR); peak transfer 8.5 GB/s (DDR2) <TBD>
\* Disk drives : service times ~5+ ms : varies greatly by interface, see below
\* Ultra 320 SCSI (16-bit parallel) : high performance; cable & device limitations : up to 320 MBps
\* SAS [Serial Attached SCSI] : > 300 MBps (> 3 Gbps)
\* SATA [Serial ATA] : low cost, higher capacity (lower performance) : up to 300 MBps

(1 microsecond [us] = 1 millionth of a second)

\* USB 2.0 : up to 480 Mbps (60 MBps theoretical, ~40 MBps in practice)
\* .. : up to 50 MBps
\* Fibre Channel (dual channel) : 4 Gb (4 / 2 / 1 Gb) \*2 : up to 1.6 GBps (1 GB usable); up to 3.2 GBps (1.8 GB usable)
\* 1 Gigabit Ethernet : latency ~ 50 us : 125 MBps (~1 Gbps) theoretical
\* 10 Gigabit Ethernet (dual port) : up to 20 Gbps (<= 9 Gbps usable)
\* InfiniBand (dual-ported HCA) : x4 (SDR / DDR), dual ported = \*2; latency < 2 microseconds : 2\*10 Gb = 20 Gbps (16 Gbps usable) SDR; up to 40 Gbps (32 Gbps usable) DDR
\* PCI : 32-bit @ 33 MHz; 64-bit @ 33 MHz; 64-bit @ 66 MHz : up to 133 / 266 / 533 MBps respectively
\* PCI-X : 64-bit bus width @ 100 MHz (parallel bus); 64-bit bus width @ 133 MHz (parallel bus) : up to 800 MBps; 1066 MBps (1 GBps)
\* PCI Express : serial bus, bi-directional @ 2.5 GHz; v.2 @ 5 GHz <TBD> (10's - 100's of nanoseconds for latencies) : 4 GBps (x16 lanes) one direction; 8 GBps (x32 lanes) one direction; up to 16 GBps bi-directional (x32); 32 GBps bi-directional (x32 lanes, v.2)
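A recurring source of confusion when reading bandwidth figures like those above is bits vs. bytes; the conversion (e.g., 1 Gbps is ~125 MBps theoretical) is just division by 8:

```shell
# Convert link speeds quoted in Mbps to theoretical MBps (divide by 8).
# Real-world throughput is lower still due to protocol overhead.
for mbps in 480 1000 10000; do
    awk -v m="$mbps" 'BEGIN { printf "%5d Mbps = %6.1f MBps theoretical\n", m, m / 8 }'
done
```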
Other Considerations Regarding System Latency :
Other considerations regarding system latency that are often overlooked include the following, which offers us a more holistic vantage point of system performance and of items that might work against "Peak" system capabilities :
The "Iterative" nature of Performance Analysis and System Tuning
No matter what the root causes are found to be, in the realm of Performance Analysis and system Tuning, ... once you remove one bottleneck, the system processing characteristics will change, resulting in a new performance profile, and new "hot spots" that require further data collection and analysis. The process is iterative, and requires a methodical approach to remediation.
Make certain that ONLY ONE (1) change is made at a time; otherwise, the effects (+ or -) cannot be quantified.
Hopefully at some point in the future we'll be operating at latencies measured in attoseconds (10^-18, or 1 quintillionth of a second), but until then .... Happy tuning :)
For more information regarding Performance Analysis, Capacity Planning, and related Tools, review some of my other postings at : http://blogs.sun.com/toddjobson/category/Performance+and+Capacity+Planning
Copyright 2007 Todd A. Jobson
If you haven't been up to date on the latest news regarding Sun's High Availability offerings with Sun Cluster 3.2 .. (even open sourcing it !).. I thought this would be the perfect opportunity for a quick recap with a few key articles, WPs, and related links (many of these are specific to Oracle RAC integration with Sun Cluster) :
\* Sun Cluster 3.2 Offers the Strongest Integration with Oracle RAC 10gR2..
> Sun Cluster 3.2 Software: Making Oracle Database 10G R2 RAC Even More Unbreakable
\* Sun offers up Solaris Cluster as Open Source Gem : (Network World link)
> Sun will post code to High Availability Clusters community on OpenSolaris.org
In case you didn't know, Sun is actually ranked #1 in terms of SW contributions to the Open Source community! Even BEFORE the Java (OpenJDK) or Sun Cluster contributions, the European Commission's FLOSS (Free/Libre Open Source Software) study found that Sun contributes to and participates in more open source projects than any other commercial company, including IBM, RedHat, Novell, and HP. See page 51 of the report for the breakdown : http://ec.europa.eu/enterprise/ict/policy/doc/2006-11-20-flossimpact.pdf
Sun Cluster 3.x is a best-in-class High Availability suite that offers everything from single-node clustering all the way up to global / geographic clustering for disaster recovery. You can find Sun's external Availability Suite page at : http://www.sun.com/software/solaris/cluster/index.xml
The following entry is a variation of an article that I created for this month's Sun "Technocrat" publication (Aug. 2007).
This posting demonstrates the art of system profiling (from a high-level overview) by introducing a few sample screenshots of the sys_diag .html report (its header, Dashboard, and Table of Contents), demonstrating how, in a few minutes, sys_diag can present you with an accurately depicted system profile !
Note, the .html report snapshot samples presented here, match the command line output from my previous blog postings (from the same run of sys_diag).
If you haven't had the chance to try out sys_diag yet, this should give you the highlights of what you can expect in the .html report header sections.
Enjoy and let me know what you think,
So.. what is Profiling.. ?? ... Well, in the real world, you can define profiling in many ways.
.. from the "profile" of the person standing next to you (what you see from your vantage point), to the personality "profiles" that we've all heard of in psychology (characterization based upon key attributes) ..
In your standard dictionary you'll find a definition such as this (from Dictionary.com) :
Well, in the world of technology, and more specifically.. Computing, "profiling" takes on its own connotation, though similar to many of the more technical definitions noted above.
To some, system profiling simply includes a high level summary of resource utilization and bottleneck identification of a system during some period of data collection (point in time or over a duration).
System profiling to me is the characterization of a system as a whole, given a set of data, either for one event/point in time, or over a duration. This characterization goes beyond workload (as you'll typically hear the term "workload characterization"), which is why I call it "Broad (or Full) Spectrum" profiling, more broadly taking into account and including :
This would be in contrast to "Narrow Spectrum" (Focused) System Profiling, where attention to detail is focused in a very "narrow" and specific area of interest for analysis (typically in determining a root cause where a specific bottleneck is known within a sub-system or specific component of the system).
Look for more details on this and much more in an up-coming blog entry more thoroughly delineating the distinction between Profiling, Workload Characterization, Performance Analysis, and Capacity Planning....
For now, enjoy the following discussion on how sys_diag can have you profiling in no time at all ... :)
The following sample .html system performance “Dashboard” (a portion of which is shown below) reflects the 4 key sub-systems (CPU/Kernel Profiling, Memory, IO, and Networking) as a summarized depiction of sub-system “health”, based against a list of rules / thresholds that the captured data is compared to during post-processing.
These rules and thresholds are listed in the Performance Analysis section (Section #24) and can be easily tuned / modified to offer more stringent or lenient identification of performance exceptions that contribute to the Green/ Yellow / Red (OK / Warning / Critical) color-coded status within each dashboard section. Within each section are listed the key performance metrics and a summary of exceptions, along with Average and Peak (High Water Mark) values present during the collection/sampling period. At the end of each section is a list of key “links” to the substantiating detailed data analysis within the report.
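The color-coded status logic described above can be sketched as a simple threshold check; note that the warning/critical cutoffs below are made-up illustrative values, not sys_diag's actual rules:

```shell
# Classify a metric against warning/critical thresholds, in the spirit
# of sys_diag's Green / Yellow / Red dashboard. The thresholds here are
# hypothetical examples only.
classify() {  # usage: classify <value> <warn_at> <crit_at>
    value=$1; warn=$2; crit=$3
    if [ "$value" -ge "$crit" ]; then
        echo "RED (Critical)"
    elif [ "$value" -ge "$warn" ]; then
        echo "YELLOW (Warning)"
    else
        echo "GREEN (OK)"
    fi
}

classify 45 70 90   # e.g. 45% CPU utilization -> GREEN (OK)
classify 75 70 90   #                          -> YELLOW (Warning)
classify 95 70 90   #                          -> RED (Critical)
```

Tightening or loosening the two thresholds per metric is exactly what editing sys_diag's rules section accomplishes.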
When run for performance data gathering (-g or -G), 2 types of performance data are captured :
\* vmstat, mpstat, iostat, netstat, .. data for a duration (-T total_secs), captured at specified sampling rates (-I interval_#secs). The default duration is 5 minutes of data capture @ 2 second intervals if -I / -T are not specified.
\* 3 Point in Time detailed snapshots (beginning, mid point, end point). If -G is used, and Solaris 10 is the OS, then Dtrace and detailed lockstat, pmap/pfiles, cputrack, ... snapshots will be taken (beyond the core “-g” snapshots that include ps, netstat -i, vxstat, kstat, ...).
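The background interval/duration collection pattern described above can be sketched generically; `mycollector` below is a hypothetical stand-in for any of the real commands (vmstat, mpstat, iostat, ...) so the sketch runs anywhere:

```shell
# Launch several collectors in the background for a fixed sample count,
# each writing to its own timestamped file, then wait for all to finish
# (the same shape as sys_diag's -I interval / -T duration collection).
interval=1          # seconds between samples (cf. -I)
count=3             # number of samples (duration = interval x count)
stamp=$(date +%y%m%d_%H%M%S)
outdir=./sysd_demo_$stamp
mkdir -p "$outdir"

mycollector() {     # stand-in for e.g.: vmstat "$interval" "$count"
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "sample $i"
        i=$((i + 1))
        sleep "$interval"
    done
}

mycollector > "$outdir/vm_$stamp.out" 2>&1 &
mycollector > "$outdir/mp_$stamp.out" 2>&1 &
wait                # block until all background collectors complete
ls "$outdir"
```

Running the collectors in parallel is what lets one pass capture CPU, I/O, and network behavior over the same window, so the samples can be correlated afterwards.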
Sys_diag has been run on virtually all models of Sun systems running Solaris 2.6 or > (from x86 laptops up to fully loaded E25K's), offering extensive Solaris 10 configuration and performance data, including DTrace snapshots. It creates a single .tar.Z compressed archive (including all raw, snapshot and post-processed datafiles) that can be emailed/ ftp'd.. for performing system configuration and/or performance analysis off-site.. from virtually anywhere.
This is one of the key characteristics that sys_diag offers.. to save a LOT of time.. not requiring many separate manual runs / collection / correlation of data, or the need for any 3rd party tools, libraries, or agents to be installed on a system other than downloading the "sys_diag" ksh script itself. Virtually no learning curve is required for loading, running, and reflecting basic performance profiling, including high level sub-system bottlenecks (deeper root cause correlation might require some level of advanced system administration knowledge, though virtually all the data needed will have been already captured by sys_diag).
This utility has been used extensively in the field over the last several years, run on literally hundreds of production systems as part of escalation root cause analysis, in addition to providing the basis for dozens of Architectural and/or Performance Assessments (including formal Capacity Planning / Benchmarking). Graphing of the data captured (vmstat, netstat...) is also easy to do using StarOffice, as explained in the README file that sys_diag creates.
The screenshot below shows the Table of Contents and related sections available within the .html report (\* Click to Enlarge \*) :
Although this tool isn't meant to replace long-term historical Performance Trending and Capacity Planning packages (Teamquest, etc..), it provides the foundation and basis for a very robust starting point (and actually is much better at point in time workload characterization and root cause analysis of bottlenecks, where very granular detailed data correlation is required).
Over the time that sys_diag has been posted on BigAdmin, many Sun customers around the globe have downloaded and commented positively on their experiences with it. For more information, or to download and try it out for yourself , the following URL's should help you get started :
The latest release of sys_diag (v.7.04) is available from BigAdmin
(unpackaged ksh) at :
sys_diag is also available as part of the "SunFreeware" Distribution
(packaged with the README) at :
The following recent blog postings provide an extended overview of sys_diag and its capabilities :
Solaris Performance Analysis and Monitoring Tools... at what cost ?...http://blogs.sun.com/toddjobson/entry/solaris_performance_monitoring_tools
What is sys_diag ?? .. Automating Solaris Performance Profiling and Workload Characterization.
sys_diag v.7.04 command line output ...
\*\*Note, read the ksh script header pages or the README file prior to using, and ALWAYS test first on a representative non-production system.. as is the best practice when making ANY production environment changes... ;)
(Copyright 2007, Todd A. Jobson)
As of 7/30/2007, SunFreeware.com will now be including the Solaris utility "sys_diag" as part of their distributions (Solaris 8 -> Solaris 10 for both Sparc and x86).
The format at SunFreeware varies from BigAdmin's distribution only in that BigAdmin provides the raw ksh script, whereas sys_diag on SunFreeware is packaged along with its README file in the standard Solaris package format (pkgadd).
If you're not yet familiar with SunFreeware.com, you should be ! .. check it out asap for a great selection of Solaris freeware (already compiled and packaged for you !!).
The following output was captured recently from running sys_diag v.7.04 on a (Solaris 10u3) Sun Ultra60 2-CPU test system in my lab.
Note the list of utilities run and types of data captured, as well as the final performance summary (a small summary of the complete color coded HTML dashboard available in the full .html report).
sys_diag has been run on virtually every type of Sun system, running Solaris 2.6 -> S10. I have personally conducted dozens of Performance Analysis, Capacity Planning/ Benchmarking, and/or Architectural Assessments using sys_diag in production environments.. x86.. up to fully loaded E25K environments.
The latest release of sys_diag (v.7.04) is available from either BigAdmin (unpackaged ksh) or SunFreeware.com (pkg'd with the README) at the following URL's :
Realize that more than half of sys_diag's benefit is in working from the .html aggregated report file, which links and correlates all the independent data files together with findings and exceptions via a nice color-coded header / dashboard / Table of Contents. (The legwork is all done for you !)
I'll try to get a sample snapshot of a report header/dashboard into an up-coming blog... but for now, just download and test-run sys_diag (v.7.04 is recommended), review the final .html report, and forward any questions/comments back to me, along with RFEs for future releases.
(Read the last sections of the README for a detailed description of all datafiles created/available...)
With a little practice, it should save you many hours.. if not days.. of effort as it does for me.
Enjoy and let me know what you think,
The following example does the deepest level of Performance data Gathering (-G, which includes Dtrace and pmap/pfiles snapshots vs. -g for light-weight perf gathering), Verbose output (-V), in addition to creation of a long/detailed configuration report (-l). The sampling rate used is 1 second intervals (-I1) for a total duration of 298 seconds (-T298).
\*Without -I || -T, the defaults are 2 second samples for 5 minutes total data gathering. Also note that when -G && -V are used together, the initial Dtrace and Lockstat snapshots take a couple minutes to complete, prior to beginning the data collection for 298 seconds (since the duration of probing is expanded with -V to 5 seconds vs 2 seconds with -G alone, or 1 second minimal lockstat sampling using -g ..aka.. no Dtrace probing).
root@/var/tmp # ./sys_diag -G -V -l -I1 -T298
sys_diag:0717_033209: GATHER Extra PERFORMANCE DATA (-G)
sys_diag:0717_033209: VERBOSE (-V)
sys_diag:0717_033209: INTERVAL : 1 second sampling (-I1)
sys_diag:0717_033209: TIME Duration: 298 seconds (-T298)
sys_diag:0717_033209: LONG report (-l)
sys_diag:0717_033209: # Creating ... README_sys_diag.txt ...
sys_diag: ------- Beginning Process SNAPSHOT (# 0) -------
sys_diag:0717_033209: Dtrace: TCP write bytes by process ...(_dtcp_tx Snap 0)
sys_diag:0717_033209: Dtrace: TCP read bytes by process ... (_dtcp_rx Snap 0)
sys_diag:0717_033209: Dtrace: systemwide IO / IO wait... (_diow Snap 0)
sys_diag:0717_033235: Dtrace: Syscall count by process... (_dcalls_ Snap 0)
sys_diag:0717_033243: Dtrace: Syscall count by syscall... (_dsyscall_ Snap 0)
sys_diag:0717_033251: Dtrace: Read bytes by process... (_dR_ Snap 0)
sys_diag:0717_033258: Dtrace: Write bytes by process... (_dW_ Snap 0)
sys_diag:0717_033306: Dtrace: Sysinfo counts by process... (_dsinfo_ Snap 0)
sys_diag:0717_033314: Dtrace: Sdt_counts ... (_dsdtcnt_ Snap 0)
sys_diag:0717_033321: Dtrace: Interupt Times [sdt:::intr].. (_dintrtm_ Snap 0)
sys_diag:0717_033321: # ps -e -o ...(by %CPU) ... Snapshot # 0
sys_diag:0717_033321: # ps -e -o ...(by %MEM) ... Snapshot # 0
sys_diag:0717_033332: # pmap -xs 519 ...
sys_diag:0717_033332: # pmap -S 519 ...
sys_diag:0717_033332: # pmap -r 519 ...
sys_diag:0717_033332: # ptree -a 519 ...
sys_diag:0717_033332: # pfiles 519 ...
sys_diag:0717_033333: Dtrace: IO by process 519 ... (_dpio Snap 0)
sys_diag:0717_033339: # pmap -xs 448 ...
sys_diag:0717_033339: # pmap -S 448 ...
sys_diag:0717_033339: # pmap -r 448 ...
sys_diag:0717_033339: # ptree -a 448 ...
sys_diag:0717_033339: # pfiles 448 ...
sys_diag:0717_033340: Dtrace: IO by process 448 ... (_dpio Snap 0)
sys_diag:0717_033346: # pmap -xs 90 ...
sys_diag:0717_033346: # pmap -S 90 ...
sys_diag:0717_033346: # pmap -r 90 ...
sys_diag:0717_033346: # ptree -a 90 ...
sys_diag:0717_033346: # pfiles 90 ...
sys_diag:0717_033347: Dtrace: IO by process 90 ... (_dpio Snap 0)
sys_diag:0717_033353: # pmap -xs 825 ...
sys_diag:0717_033353: # pmap -S 825 ...
sys_diag:0717_033353: # pmap -r 825 ...
sys_diag:0717_033353: # ptree -a 825 ...
sys_diag:0717_033353: # pfiles 825 ...
sys_diag:0717_033353: Dtrace: IO by process 825 ... (_dpio Snap 0)
sys_diag:0717_033353: # /usr/bin/netstat -i -a ...
sys_diag:0717_033400: # Snapshot Kernel Memory Usage.. ::memstat | mdb -k ...
sys_diag:0717_033409: # /usr/sbin/lockstat -IW -n 100000 -s 13 sleep 5 ...
sys_diag:0717_033419: # /usr/sbin/lockstat -A -n 90000 -D15 sleep 5 ...
sys_diag:0717_033431: # /usr/sbin/lockstat -A -s8 -n 90000 -D10 sleep 5 ...
sys_diag:0717_033446: # /usr/sbin/lockstat -AP -n 90000 -D10 sleep 5 ...
sys_diag:0717_033521: Dtrace: Involuntary Context Switches (icsw) by process .. (_dmpc Snap 0)
sys_diag:0717_033526: Dtrace: Cross CPU Calls (xcal) caused by process ........ (_dmpc Snap 0)
sys_diag:0717_033531: Dtrace: MUTEX try lock (smtx) by lwp/process ............ (_dmpc Snap 0)
sys_diag: --\*\*-- (Background) DATA COLLECTION FOR 298 secs STARTED --\*\*--
sys_diag:0717_033531: # /usr/bin/vmstat -q 1 298 > ./sysd_socrates_070717_0332/sysd_vm_socrates_070717_033209.out 2>&1 &
sys_diag:0717_033531: # /usr/bin/iostat -xn 1 298 > ./sysd_socrates_070717_0332/sysd_io_socrates_070717_033209.out 2>&1 &
sys_diag:0717_033531: # /usr/bin/mpstat -q 1 298 > ./sysd_socrates_070717_0332/sysd_mp_socrates_070717_033209.out 2>&1 &
sys_diag:0717_033537: # /usr/bin/netstat -i -I lo0 1 298 > ./sysd_socrates_070717_0332/sysd_net1_socrates_070717_033537.out 2>&1 &
sys_diag:0717_033537: # /usr/bin/kstat -p -T u -n lo0 1> ./sysd_socrates_070717_0332/sysd_knetb_lo0_socrates_070717_033537.out 2>&1
sys_diag:0717_033538: # /usr/bin/netstat -i -I hme0 1 298 > ./sysd_socrates_070717_0332/sysd_net2_socrates_070717_033538.out 2>&1 &
sys_diag:0717_033538: # /usr/bin/kstat -p -T u -n hme0 1> ./sysd_socrates_070717_0332/sysd_knetb_hme0_socrates_070717_033538.out 2>&1
sys_diag:0717_033538: # /usr/sbin/snoop ...
sys_diag: ------- (Foreground) Gathering System Configuration Details -------
sys_diag:0717_033539: # uname -a ...
sys_diag:0717_033539: # hostid ...
sys_diag:0717_033539: # domainname (DNS) ...
sys_diag:0717_033539: ###### SYSTEM CONFIGURATION / DEVICE INFO ######
sys_diag:0717_033539: # prtdiag ...
sys_diag:0717_033539: # prtconf | grep Memory ...
sys_diag:0717_033539: # /usr/sbin/psrinfo -v ...
sys_diag:0717_033539: # /usr/sbin/psrinfo -pv ...
sys_diag:0717_033539: # /usr/sbin/psrset -q ...
sys_diag:0717_033539: # cfgadm -l ...
sys_diag:0717_033539: # cfgadm -al ...
sys_diag:0717_033539: # cfgadm -v ...
sys_diag:0717_033539: # cfgadm -av | grep memory | grep perm ...
sys_diag:0717_033541: ###### E10K / E25K / SunFire System INFO ######
sys_diag:0717_033541: # Checking Kernel Cage settings ...
sys_diag:0717_033541: # eeprom ...
sys_diag:0717_033541: # /usr/bin/coreadm ...
sys_diag:0717_033541: # /usr/sbin/dumpadm ...
sys_diag:0717_033541: # modinfo ...
sys_diag:0717_033541: # /usr/sbin/lustatus ...
sys_diag:0717_033541: # cat /etc/path_to_inst ...
sys_diag:0717_033542: ###### WORKLOAD CHARACTERIZATION ######
sys_diag:0717_033542: # prstat -c -a 1 1 ...
sys_diag:0717_033542: # prstat -c -J 1 1 ...
sys_diag:0717_033542: # prstat -c -Z 1 1 ...
sys_diag:0717_033542: # prstat -c 1 2 ...
sys_diag:0717_033544: # prstat -c -v 1 3 ...
sys_diag:0717_033546: # ps -e -o ...(by %CPU) ...
sys_diag:0717_033546: # ps -e -o ...(by %MEM) ...
sys_diag:0717_033546: # ps -e -o ...(by LWP) ...
sys_diag:0717_033546: ###### PERFORMANCE PROFILING (System / Kernel) ######
sys_diag:0717_033547: # vmstat 1 5 ...
sys_diag:0717_033551: # /usr/bin/mpstat 1 3 ...
sys_diag:0717_033551: # /usr/bin/isainfo -v ...
sys_diag:0717_033553: # /usr/bin/ipcs -a ...
sys_diag:0717_033553: # /usr/bin/pagesize ...
sys_diag:0717_033553: # swap -l ...
sys_diag:0717_033553: # swap -s ...
sys_diag:0717_033553: # /usr/bin/vmstat -s ...
sys_diag:0717_033553: # /usr/bin/kstat -n system_pages ...
sys_diag:0717_033553: # /usr/bin/kstat -n vm ...
sys_diag:0717_033554: # /usr/sbin/trapstat 1 2 ...
sys_diag:0717_033554: # /usr/sbin/trapstat -t 1 2 ...
sys_diag:0717_033554: # /usr/sbin/trapstat -l ...
sys_diag:0717_033554: # /usr/sbin/trapstat -t 1 2 ...
sys_diag:0717_033554: # /usr/sbin/trapstat -T 1 2 ...
sys_diag:0717_033554: # /usr/sbin/intrstat 1 2 ...
sys_diag:0717_033554: # /usr/bin/vmstat -i ...
sys_diag:0717_033554: ###### KERNEL ZONES/ SRM / Acctg / TUNABLES ######
sys_diag:0717_033554: # /usr/sbin/zoneadm list -v ...
sys_diag:0717_033554: # /usr/bin/projects -l ...
sys_diag:0717_033554: # /usr/sbin/psrset -i ...
sys_diag:0717_033554: # /usr/sbin/psrset -p ...
sys_diag:0717_033554: # /usr/sbin/psrset -q ...
sys_diag:0717_033554: # /usr/sbin/rctladm -l ...
sys_diag:0717_033554: # /usr/bin/priocntl -l ...
sys_diag:0717_033554: # /usr/sbin/acctadm ...
sys_diag:0717_033554: # /usr/sbin/acctadm -r...
sys_diag:0717_033554: # tail -80 /etc/system ...
sys_diag:0717_033554: # sysdef | tail -85 ...
sys_diag:0717_033554: # tail -40 /etc/init.d/sysetup ...
sys_diag:0717_033554: # cat /etc/power.conf ...
sys_diag:0717_033612: ###### STORAGE / ARRAY INFO ######
sys_diag:0717_033612: # prtconf -pv ...
sys_diag:0717_033613: # luxadm probe ...
sys_diag:0717_033614: ###### STORAGE VOLUME MANAGEMENT INFO ######
sys_diag:0717_033614: ###### SOLARIS (SDS/SVM) VOLUME MANAGER Info ######
sys_diag:0717_033614: # /sbin/metadb ...
sys_diag:0717_033614: # /sbin/metastat ...
sys_diag:0717_033614: # /sbin/metastat -p...
sys_diag:0717_033614: ###### Sun STMS / MPxIO Info ######
sys_diag:0717_033614: # cat /kernel/drv/fp.conf ...
sys_diag:0717_033614: # cat /kernel/drv/fcp.conf ...
sys_diag:0717_033614: ###### FILESYSTEM INFO ######
sys_diag:0717_033614: # df ...
sys_diag:0717_033614: # df -k ...
sys_diag:0717_033614: # mount -v ...
sys_diag:0717_033614: # /usr/sbin/showmount -a ...
sys_diag:0717_033614: # cat /etc/vfstab ...
sys_diag:0717_033614: # /usr/bin/cachefsstat ...
sys_diag:0717_033614: ###### I/O STATS ######
sys_diag:0717_033614: # /usr/bin/iostat -nxe 3 2 ...
sys_diag:0717_033614: # /usr/bin/iostat -xcC 3 2 ...
sys_diag:0717_033614: # /usr/bin/iostat -xnE ...
sys_diag:0717_033614: ###### NFS INFO ######
sys_diag:0717_033614: # /usr/bin/nfsstat ...
sys_diag:0717_033614: # /usr/bin/nfsstat -m ...
sys_diag:0717_033614: ###### NETWORKING INFO ######
sys_diag:0717_033614: # cat /etc/hosts ...
sys_diag:0717_033614: # /usr/sbin/ifconfig -a ...
sys_diag:0717_033614: # /usr/bin/netstat -i ...
sys_diag:0717_033614: # /usr/bin/netstat -r ...
sys_diag:0717_033614: # /usr/sbin/arp -a ...
sys_diag:0717_033614: # /usr/sbin/ping -s 192.168.200.1 56 10 ...
sys_diag:0717_033614: # /usr/sbin/ping -s 192.168.200.1 1016 10 ...
sys_diag:0717_033614: # /usr/sbin/ping -s google.com 56 10 ...
sys_diag:0717_033614: # /usr/sbin/ping -s google.com 1016 10 ...
sys_diag:0717_033614: # cat /etc/hostname.hme0 ...
sys_diag:0717_033614: # cat /etc/inet/networks ...
sys_diag:0717_033614: # cat /etc/netmasks ...
sys_diag:0717_033614: # tail -30 /etc/inet/ntp.server ...
sys_diag:0717_033614: # /usr/sbin/dladm show-dev ...
sys_diag:0717_033614: # /usr/sbin/dladm show-link ...
sys_diag:0717_033614: # /usr/sbin/dladm show-aggr ...
sys_diag:0717_033614: # /usr/sbin/pntadm -L ...
sys_diag:0717_033703: # /usr/bin/kstat -c net ...
sys_diag:0717_033703: # ndd -get /dev/tcp ...
sys_diag:0717_033703: # ndd -get /dev/udp ...
sys_diag:0717_033703: # ndd -get /dev/ip ...
sys_diag:0717_033706: # ndd -set /dev/hme instance 0 ...
sys_diag:0717_033706: # ndd -get /dev/hme ...
sys_diag:0717_033706: # /usr/bin/netstat -a ...
sys_diag:0717_033711: # /usr/bin/netstat -s ...
sys_diag:0717_033711: ###### TTY / MODEM INFO ######
sys_diag:0717_033711: # /usr/sbin/pmadm -l ...
sys_diag:0717_033711: # cat /etc/remote ...
sys_diag:0717_033711: # cat /var/adm/aculog ...
sys_diag:0717_033711: ###### USER / ACCOUNT / GROUP Info ######
sys_diag:0717_033711: # w ...
sys_diag:0717_033711: # who -a ...
sys_diag:0717_033711: # cat /etc/passwd ...
sys_diag:0717_033711: # cat /etc/group ...
sys_diag:0717_033711: ###### SERVICES / NAMING RESOLUTION ######
sys_diag:0717_033711: # /usr/bin/svcs -v ...
sys_diag:0717_033711: # cat /etc/services ...
sys_diag:0717_033711: # cat /etc/inetd.conf ...
sys_diag:0717_033711: # cat /etc/inittab ...
sys_diag:0717_033711: # cat /etc/nsswitch.conf ...
sys_diag:0717_033711: # cat /etc/resolv.conf ...
sys_diag:0717_033711: # cat /etc/auto_master ...
sys_diag:0717_033711: # cat /etc/auto_home ...
sys_diag:0717_033712: # /usr/bin/ypwhich ...
sys_diag:0717_033712: # /usr/bin/nisdefaults ...
sys_diag:0717_033712: ###### SECURITY / CONFIG FILES ######
sys_diag:0717_033712: # cat /etc/syslog.conf ...
sys_diag:0717_033712: # cat /etc/pam.conf ...
sys_diag:0717_033712: # cat /etc/default/login ...
sys_diag:0717_033712: # tail -250 /var/adm/sulog ...
sys_diag:0717_033712: # /usr/bin/last reboot ...
sys_diag:0717_033712: # /usr/bin/last -200 ...
sys_diag:0717_033712: # /usr/sbin/ipf -T list ...
sys_diag:0717_033712: # cat /etc/ipf/ipf.conf ...
sys_diag:0717_033712: # cat /etc/ipf/pfil.ap ...
sys_diag:0717_033712: # /usr/sbin/ipnat -vls ...
sys_diag:0717_033713: ###### HA/ CLUSTERING INFO ######
sys_diag:0717_033713: ###### SUN N1 Configuration INFO ######
sys_diag:0717_033713: ###### APPLICATION / ORACLE CONFIG FILES ######
sys_diag:0717_033713: ###### PACKAGE INFO / SOLARIS REGISTRY ######
sys_diag:0717_033713: # /usr/bin/prodreg browse ...
sys_diag:0717_033713: # /usr/bin/pkginfo ...
sys_diag:0717_033713: # /usr/bin/pkginfo -l ...
sys_diag:0717_033713: ###### PATCH INFO ######
sys_diag:0717_033713: # /usr/bin/showrev -p ...
sys_diag:0717_033713: # /usr/sadm/bin/smpatch analyze NOT RUN, passwd required....
sys_diag:0717_033753: ###### CRONTAB FILE LISTINGS ######
sys_diag:0717_033753: ###### FMD / SYSTEM MESSAGE/LOG FILES ######
sys_diag:0717_033753: # /usr/sbin/fmadm config ...
sys_diag:0717_033753: # /usr/sbin/fmdump ...
sys_diag:0717_033753: # /usr/sbin/fmstat ...
sys_diag:0717_033753: # tail -250 /var/adm/messages ...
sys_diag:0717_033753: # /usr/bin/dmesg ...
sys_diag:0717_033753: # tail -500 /var/log/syslog ...
sys_diag:0717_033754: ...WAITING 12 seconds for midpoint data collection...
sys_diag: ------- MidPoint Process SNAPSHOT (# 1) -------
sys_diag:0717_033806: Dtrace: TCP write bytes by process ...(_dtcp_tx Snap 1)
sys_diag:0717_033806: Dtrace: TCP read bytes by process ... (_dtcp_rx Snap 1)
sys_diag:0717_033806: Dtrace: systemwide IO / IO wait... (_diow Snap 1)
sys_diag:0717_033832: Dtrace: Syscall count by process... (_dcalls_ Snap 1)
sys_diag:0717_033840: Dtrace: Syscall count by syscall... (_dsyscall_ Snap 1)
sys_diag:0717_033847: Dtrace: Read bytes by process... (_dR_ Snap 1)
sys_diag:0717_033855: Dtrace: Write bytes by process... (_dW_ Snap 1)
sys_diag:0717_033903: Dtrace: Sysinfo counts by process... (_dsinfo_ Snap 1)
sys_diag:0717_033911: Dtrace: Sdt_counts ... (_dsdtcnt_ Snap 1)
sys_diag:0717_033918: Dtrace: Interupt Times [sdt:::intr].. (_dintrtm_ Snap 1)
sys_diag:0717_033918: # ps -e -o ...(by %CPU) ... Snapshot # 1
sys_diag:0717_033918: # ps -e -o ...(by %MEM) ... Snapshot # 1
sys_diag:0717_033929: # pmap -xs 4188 ...
sys_diag:0717_033929: # pmap -S 4188 ...
sys_diag:0717_033929: # pmap -r 4188 ...
sys_diag:0717_033929: # ptree -a 4188 ...
sys_diag:0717_033929: # pfiles 4188 ...
sys_diag:0717_033929: Dtrace: IO by process 4188 ... (_dpio Snap 1)
sys_diag:0717_033935: # pmap -xs 4181 ...
sys_diag:0717_033935: # pmap -S 4181 ...
sys_diag:0717_033935: # pmap -r 4181 ...
sys_diag:0717_033935: # ptree -a 4181 ...
sys_diag:0717_033935: # pfiles 4181 ...
sys_diag:0717_033936: Dtrace: IO by process 4181 ... (_dpio Snap 1)
sys_diag:0717_033942: # /usr/bin/netstat -i -a ...
sys_diag:0717_033942: # Snapshot Kernel Memory Usage.. ::memstat | mdb -k ...
sys_diag:0717_033952: # /usr/sbin/lockstat -IW -n 100000 -s 13 sleep 5 ...
sys_diag:0717_034002: # /usr/sbin/lockstat -A -n 90000 -D15 sleep 5 ...
sys_diag:0717_034015: # /usr/sbin/lockstat -A -s8 -n 90000 -D10 sleep 5 ...
sys_diag:0717_034037: # /usr/sbin/lockstat -AP -n 90000 -D10 sleep 5 ...
sys_diag:0717_034051: Dtrace: Involuntary Context Switches (icsw) by process .. (_dmpc Snap 1)
sys_diag:0717_034056: Dtrace: Cross CPU Calls (xcal) caused by process ........ (_dmpc Snap 1)
sys_diag:0717_034101: Dtrace: MUTEX try lock (smtx) by lwp/process ............ (_dmpc Snap 1)
sys_diag: ------- EndPoint Process SNAPSHOT (# 2) -------
sys_diag:0717_034101: # /usr/bin/kstat -p -T u -n lo0 2>&1
sys_diag:0717_034101: # /usr/bin/kstat -p -T u -n hme0 2>&1
sys_diag:0717_034107: Dtrace: TCP write bytes by process ...(_dtcp_tx Snap 2)
sys_diag:0717_034107: Dtrace: TCP read bytes by process ... (_dtcp_rx Snap 2)
sys_diag:0717_034107: Dtrace: systemwide IO / IO wait... (_diow Snap 2)
sys_diag:0717_034133: Dtrace: Syscall count by process... (_dcalls_ Snap 2)
sys_diag:0717_034141: Dtrace: Syscall count by syscall... (_dsyscall_ Snap 2)
sys_diag:0717_034149: Dtrace: Read bytes by process... (_dR_ Snap 2)
sys_diag:0717_034156: Dtrace: Write bytes by process... (_dW_ Snap 2)
sys_diag:0717_034204: Dtrace: Sysinfo counts by process... (_dsinfo_ Snap 2)
sys_diag:0717_034212: Dtrace: Sdt_counts ... (_dsdtcnt_ Snap 2)
sys_diag:0717_034220: Dtrace: Interupt Times [sdt:::intr].. (_dintrtm_ Snap 2)
sys_diag:0717_034220: # ps -e -o ...(by %CPU) ... Snapshot # 2
sys_diag:0717_034220: # ps -e -o ...(by %MEM) ... Snapshot # 2
sys_diag:0717_034230: # pmap -xs 519 ...
sys_diag:0717_034230: # pmap -S 519 ...
sys_diag:0717_034230: # pmap -r 519 ...
sys_diag:0717_034230: # ptree -a 519 ...
sys_diag:0717_034230: # pfiles 519 ...
sys_diag:0717_034231: Dtrace: IO by process 519 ... (_dpio Snap 2)
sys_diag:0717_034237: # pmap -xs 448 ...
sys_diag:0717_034237: # pmap -S 448 ...
sys_diag:0717_034237: # pmap -r 448 ...
sys_diag:0717_034237: # ptree -a 448 ...
sys_diag:0717_034237: # pfiles 448 ...
sys_diag:0717_034238: Dtrace: IO by process 448 ... (_dpio Snap 2)
sys_diag:0717_034244: # pmap -xs 90 ...
sys_diag:0717_034244: # pmap -S 90 ...
sys_diag:0717_034244: # pmap -r 90 ...
sys_diag:0717_034244: # ptree -a 90 ...
sys_diag:0717_034244: # pfiles 90 ...
sys_diag:0717_034245: Dtrace: IO by process 90 ... (_dpio Snap 2)
sys_diag:0717_034251: # pmap -xs 825 ...
sys_diag:0717_034251: # pmap -S 825 ...
sys_diag:0717_034251: # pmap -r 825 ...
sys_diag:0717_034251: # ptree -a 825 ...
sys_diag:0717_034251: # pfiles 825 ...
sys_diag:0717_034251: Dtrace: IO by process 825 ... (_dpio Snap 2)
sys_diag:0717_034251: # /usr/bin/netstat -i -a ...
sys_diag:0717_034258: # Snapshot Kernel Memory Usage.. ::memstat | mdb -k ...
sys_diag:0717_034307: # /usr/sbin/lockstat -IW -n 100000 -s 13 sleep 5 ...
sys_diag:0717_034317: # /usr/sbin/lockstat -A -n 90000 -D15 sleep 5 ...
sys_diag:0717_034329: # /usr/sbin/lockstat -A -s8 -n 90000 -D10 sleep 5 ...
sys_diag:0717_034344: # /usr/sbin/lockstat -AP -n 90000 -D10 sleep 5 ...
sys_diag:0717_034358: Dtrace: Involuntary Context Switches (icsw) by process .. (_dmpc Snap 2)
sys_diag:0717_034404: Dtrace: Cross CPU Calls (xcal) caused by process ........ (_dmpc Snap 2)
sys_diag:0717_034408: Dtrace: MUTEX try lock (smtx) by lwp/process ............ (_dmpc Snap 2)
sys_diag:0717_034408: ------- Data Collection COMPLETE -------
sys_diag:0717_034408: ###### SYSTEM ANALYSIS : INITIAL FINDINGS ... ######
sys_diag:0717_034414: ###### PERFORMANCE DATA : POTENTIAL ISSUES ######
_____________________________________________________________________________________
sys_diag:0717_034414: ## Analyzing VMSTAT CPU Datafile :./sysd_socrates_070717_0332/sysd_vm_socrates_070717_033209.out ...
\* NOTE: 2.6936 % : 8 of 297 VMSTAT CPU entries are WARNINGS!! \*
TOTAL CPU AVGS : RUNQ= 0.1 : BThr= 0.0 : USR= 15.0 : SYS= 11.2 : IDLE= 73.5
PEAK CPU HWMs : RUNQ= 8 : BThr= 0 : USR= 51 : SYS= 96 : IDLE= 0
sys_diag:0717_034414: ## Analyzing VMSTAT MEMORY from Datafile :
\* NOTE: 0.673401 % : 2 of 297 VMSTAT MEMORY entries are WARNINGS!! \*
TOTAL MEM AVGS : SR= 0.0 : SWAP_free= 747697.4 K : FREE_RAM= 287786.6 K
PEAK MEM Usage: SR= 0 : SWAP_free= 500128.0 K : FREE_RAM= 57080.0 K
sys_diag:0717_034414: ## Analyzing MPSTAT Datafile : ./sysd_socrates_070717_0332/sysd_mp_\*.out ...
\* NOTE: 5.20134 % : 31 of 596 MPSTAT CPU entries are WARNINGS!! \*
CPU MP AVGS: Wt= 0: Xcal= 736: csw= 120: icsw= 3: migr= 5: smtx= 3: syscl= 1024
PEAK MP HWMs: Wt= 0: Xcal= 51771: csw= 14108: icsw= 32: migr= 55: smtx= 79: syscl= 25836
NOTE: 0.2% CPU cycles handling TLB MISSES (0.0% ITLB_misses: 0.2% DTLB_misses)
sys_diag:0717_034414: ## Analyzing IOSTAT Datafile :
\* NOTE: 14.4578 % : 24 of 166 IOSTAT entries are WARNINGS!! \*
TOP 10 Slowest IO Devices (\* AVG of non-zero device entries \*) :
r/s w/s kr/s kw/s actv wsvc_t asvc_t %w %b device # I/O Samples
32.6 10.8 263.6 24.6 0.8 0.0 13.7 0.0 19 c0t0d0 164
34.0 7.5 10.8 0.0 0.0 0.0 0.0 0.0 0 c0t1d0 2
CONTROLLER IO : AVG and TOTAL Throughput per HBA (\*active/non-zero entries only\*) :
c0 : AVG : 32.6 r/s | 10.8 w/s | 260.6 kr/s | 24.3 kw/s |
c0 : TOTAL: 5408 r | 1790 w | 43258 kr | 4037 kw | 166 entries
sys_diag:0717_034414: ## Analyzing NETSTAT Datafiles : ...
\* lo0 : NOTE: 0 % : 0 of 297 NETSTAT entries are WARNINGS!! \*
\* hme0 : NOTE: 0 % : 0 of 297 NETSTAT entries are WARNINGS!! \*
------------ \*MAX_RX_PKTS\* AVG_RX_PKTS AVG_RX_ERRS AVG_TX_PKTS AVG_TX_ERRS AVG_COLL
NET1 : lo0 : 4 0.0 0.0 0.0 0.0 0.0
------------ \*MAX_RX_PKTS\* AVG_RX_PKTS AVG_RX_ERRS AVG_TX_PKTS AVG_TX_ERRS AVG_COLL
NET2 : hme0 : 14 0.4 0.0 0.4 0.0 0.0
: hme0 : TOT_RX_Bytes TOT_TX_Bytes TOT_RX_Packets TOT_TX_Packets TOTAL_Seconds
22210 30348 124 112 328
: hme0:1: TOT_RX_Packets TOT_TX_Packets
: hme0:1: 0 0
NOTE: \*\* 2 ESTABLISHED connections (sockets) exist\*\*
\* NOTE: CPU=GRN : MEM=GRN : IO=YEL : NET=GRN \*
sys_diag:0717_034417: ... gen_html_hdr ...
sys_diag:0717_034417: ... gen_html_rpt ...
sys_diag:0717_034419: ## Generating TAR file : ./sysd_socrates_070717_0332.tar ...
tar -cvf ./sysd_socrates_070717_0332.tar ./sysd_socrates_070717_0332 1>/dev/null
compress ./sysd_socrates_070717_0332.tar
Data files have been TARed and compressed in :
\*\*\* ./sysd_socrates_070717_0332.tar.Z \*\*\*
------- Sys_Diag Complete -------
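Each entry in the log above carries a `sys_diag:MMDD_HHMMSS:` prefix. If you want to post-process such a log yourself, the timestamp can be split off with standard tools. A minimal sketch (the prefix format is assumed from the excerpt above; the sample line is copied from it):

```shell
#!/bin/sh
# Pull the MMDD_HHMMSS timestamp out of a sys_diag log line.
# The sed pattern assumes the "sys_diag:MMDD_HHMMSS:" prefix
# shown throughout the log excerpt above.
line='sys_diag:0717_033539: # uname -a ...'
ts=$(printf '%s\n' "$line" | sed -n 's/^sys_diag:\([0-9]\{4\}_[0-9]\{6\}\):.*/\1/p')
echo "$ts"    # 0717_033539
```

The same pattern can be fed a whole perflog file to build a timeline of which probes ran when.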
The following is an excerpt from the README_sys_diag.txt file, which gives a
high level overview of the sys_diag capabilities, command line arguments, and
usage examples. I've created this utility over many years to automate and reduce
the amount of time it takes to gather and correlate system data for conducting
off-site (remote) Performance and Configuration Analysis. With sys_diag,
all you need to do is download the ksh script and you're on your way. After
it's run, you get a single .tar.Z that you can upload or email for remote
analysis (-G even includes a wide variety of DTrace examination).
Read the following introduction; more specific examples follow.
sys_diag v.7.04 Overview :
BACKGROUND / INTRODUCTION :
sys_diag is a Solaris utility (ksh script) that can perform several functions, most notably capturing a detailed system configuration snapshot and gathering, analyzing, and reporting on system performance data.
\*\* See : http://blogs.sun.com/toddjobson/ for other entries relating to system performance,
capacity planning, and systems architecture / availability.
\*\* For the latest release of sys_diag, see either BigAdmin or SunFreeware.com at the following URLs :
Common Command Line usage and available parameters :
COMMAND USAGE :
# sys_diag [-a -A -c -C -d_ -D -f_ -g -G -H -I_ -l -L_ -n -o_ -p -P -s -S -T_ -t -u -v -V -h|-? ]
-a Application details (included in -l/-A)
-A ALL Options are turned on, except Debug and -u
-c Configuration details (included in -l/-A)
-C Cleanup Files and remove Directory if tar works
-d path Base directory for data directory / files
-D Debug mode (ksh set -x .. echo statements/variables/evaluations)
-f input_file Used with -t to list configuration files to Track changes for
-g gather Performance data (2 sec intervals for 5 mins, unless -I |-T exist)
-G GATHER Extra Perf data (S10 Dtrace, more lockstats, pmap/pfiles) vs -g
-h | -? Help / Command Usage (this listing)
-H HA config and stats
-I secs Perf Gathering Sample Interval (default is 2 secs)
-l Long Listing (most details, but not -g,-V,-A,-t,-D)
-L label_descr_nospaces (Descriptive Label For Report)
-n Network configuration and stats (also included in -l/-A except ndd settings)
-o outfile Output filename (stored under sub-dir created)
-p Generate Postscript Report, along with .txt, and .html
-P -d ./data_dir_path Post-process the Perf data skipped with -S and finish .html rpt
-s Security configuration
-S SKIP POST PROCESSing of Performance data (use -P -d data_dir to complete)
-t Track configuration / cfg_file changes (Saves/Rpts cfg/file chgs \*see -f)
-T secs Perf Gathering Total Duration (default is 300 secs =5 mins)
-u unTarred (do NOT create a tar file)
-v version Information for sys_diag
-V Verbose Mode (adds path_to_inst, network dev's ndd settings, mdb, snoop..)
Longer message/error/log listings. Additionally, pmap is run if -g ||-G,
and the probe duration for Dtrace and lockstat sampling is widened
from 2 seconds (during -G) to 5 seconds (if -G && -V). Ping is
also run against the default route and google.com to gauge latency.
NOTE: NO args equates to a brief rpt (No -A,-g/I,-l,-t,-D,-V,..)
\*\* Also, note that option/parameter ordering is flexible, as is the use of white
space before arguments to parameters (or not). The only requirement is to list
every option/parameter separately with a preceding - (-g -l , but not -gl).
BOTH of the following command line syntax examples are functionally equivalent :
eg. ./sys_diag -g -I 1 -T 1800 -t -f ./config_files -l
./sys_diag -g -l -t -f./config_files -I1 -T1800
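When tracking configuration changes with -t, the -f input file is simply a list of configuration files to watch, one per line. A hypothetical ./config_files list (the file names here are illustrative, chosen from files the report already collects):

```
/etc/system
/etc/vfstab
/etc/nsswitch.conf
/etc/resolv.conf
```

Running ./sys_diag -t -f ./config_files would then save and report changes to these files.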
------------------------------------------------------------------------------------------------
eg. Common Usage :
./sys_diag -l Creates a LONG /detailed configuration rpt (.html/.txt)
Without -l, the config report created has basic system cfg details.
./sys_diag -g -l gathers performance data at the default sampling rate of 2 secs for
a total duration of 5 mins, adding a color coded performance header/
Dashboard Summary section and any performance findings/
exceptions found to the long (-l) cfg rpt. Also takes (3) starting/
midpt/endpoint snapshots using minimal lockstat/kstat (1sec)
NOTE: -g is meant to gather perf data without overhead, therefore
only 1 second lockstat samples are taken. Use -G and/or -V
for more detailed system probing (see examples and notes below)
Using -V with -g adds pmap/pfiles snapshots, vs. using -G
to also capture Dtrace and extended lockstat probing.
\* Any time that sys_diag is run with either -g or -G, the command
line output is appended to the file sys_diag_perflog.out, which
gets copied and archived as part of the final .tar.Z output file.
./sys_diag -g -I 1 -T 600 -l gathers perf data at 1 sec samples for 10 mins and
creates a long config rpt as noted above. Also does
basic start/mid/endpoint sampling using lockstat/kstat/pmap.
./sys_diag -l -C creates long config rpt, and Cleans up..
aka removes the data directory after tar.Z completes
./sys_diag -d base_directory_path (changes the base dir for datafiles from curr dir)
./sys_diag -G -I 1 -T 600 -l Gathers DEEP performance & Dtrace/lockstat/pmap data
at 1 sec samples for 10 mins & creates a long cfg rpt
(in addition to the standard data gathering from -g).
\*NOTE: this runs all Dtrace/Lockstat/Pmap probing during 3 snapshot intervals
(beginning_0/midpoint_1/ and endpoint_#2 snapshots), limiting probing
overhead to BEFORE/AFTER the standard data gathering begins
(vmstat, mpstat, iostat, netstat, .. from -g).
The MIDPOINT probing occurs at a known point, so as not to confuse this
activity with other system processing.
\*Because of this, standard data collection may not start for 30+ seconds,
or until the beginning snapshot (snapshot_#0) is complete.
(-g snapshot_#0 activities only take a couple seconds to complete, since
they do not include any Dtrace/lockstat.. beyond 1 sec samples).
./sys_diag -G -V -I 1 -T 600 Gathers DEEP, VERBOSE, performance & Dtrace/lockstat/pmap
data at 1 sec samples for 10 mins (using 5 second Dtrace and
Lockstat snapshots, vs. 2 second probes for -G alone)
(in addition to the standard data gathering from -g).
./sys_diag -g -l -S (gathers perf data, runs long config rpt, and SKIPS Post-Processing
and .html report generation)
\*\* This allows for completing the post-processing/analysis activities
either on another system, or at a later time, as long as the data_directory
exists (which can be extracted from the .tar.Z, then referred to as
-d data_dir_path ). \*\* See the next example using -P -d data_path \*\*
./sys_diag -P -d ./data_dir_path (Completes Skipped Post-Processing & .html rpt creation)
In the area of Performance Analysis and related Monitoring tools, you'll find a plethora available for the Solaris environment, each with its own intrinsic costs.
The Benefits of Accurate, Detailed, and Complete Data Gathering ...
\*\* NOTE: .. a key Attribute often left out is the ACCURACY and RELEVANCE of the performance data captured (based upon the time it was captured, the sampling rates, and the level of detail provided).
This in many instances requires weighing the costs of having point-in-time "detailed" event snapshots (where the sampling intervals are very narrow.. per second, etc.) vs. long-term historical trending data (where samples are aggregated and averaged over longer timeframes, minimizing the storage requirements but also smoothing out the Peak load visibility). For example, if you use a toolset or individual utility that can capture performance data at 1 second intervals, you will see a very granular view of system utilization and PEAK load activity (resource consumption, contention events, etc.).. vs. using a historical trending toolset that can only save data at 1, 5, or 10 minute Averages (due to the constraints of storage space available for the long periods of data that must be kept).
This might not seem like much would be missed; however, even the difference between 1 second and 1 minute samples can be astronomical. Consider 80 samples at 95% idle and 20 samples at 100% utilization (0% idle) with a huge run queue: these get "smoothed" into one aggregated sample where the box "appears" only 24% utilized (76% idle), although the system is thrashing 20% of the time. Even within the period of a single second, over a billion instructions get run on modern CPUs clocked at GHz+ rates (billions of cycles per second).. and only one aggregated sample covers that period.
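The smoothing arithmetic above can be checked directly. A quick sketch in POSIX shell, using the same 80/20 sample counts as the text:

```shell
#!/bin/sh
# 80 one-second samples at 5% busy (95% idle), plus 20 one-second
# samples at 100% busy, averaged into a single aggregated sample.
busy=$(( (80 * 5 + 20 * 100) / 100 ))
idle=$(( 100 - busy ))
echo "aggregated view: ${busy}% utilized / ${idle}% idle"   # 24% / 76%
# ..yet the system was fully saturated for 20 of the 100 seconds.
```

The aggregated number looks benign even though one sample in five came from a saturated system, which is exactly the peak-visibility loss described above.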
For complete end-to-end Capacity Planning and Performance Analysis capabilities, BOTH types of data are generally required (long-term trending for Capacity Planning purposes via graphs, etc. vs. short-term detailed drill-down of system activity during point-in-time PEAK LOAD periods, allowing for detailed performance and utilization assessment / correlation).
Without detailed and granular data during peak periods, there can be no real correlation of root causes or specific bottlenecks. In the same regard, without long-term historical data that shows growth rates in capacity and cycles (patterns and models) of utilization and Peak activity, accurate Capacity Planning isn't feasible.
If the data captured doesn't include peak activity, or the granularity of samples is too sparse (not reflecting peak events), then that data can only be useful for defining a BASELINE of Average Utilization.
MANY, many .. tools
A wide variety of performance tools can be found, from high-end, end-to-end third-party products such as TeamQuest (which provides a graphical, historical vantage point) that need to be purchased, installed, and trained on.. to the OS built-in utilities and the freely available open source / public domain varieties.
However, either way you go, be prepared for the required learning curve, along with the extensive manual process and time required to identify and run the utilities before you can capture the data and begin the extensive correlation process across several disparate utilities (before you even get to the analysis of your findings).
Either approach has its advantages and disadvantages, along with its strengths and weaknesses (3rd party purchased suites might save you time in graphical aggregation and correlation, but tend to limit the level of detail and granularity available vs. what the OS utilities will provide).
The basic list of KEY "built-in" tools historically available for monitoring performance applies to nearly any Unix/Linux distribution, including the following partial list of common utilities, organized by the basic breakdown of computing subsystems :
\*\* CPU / Kernel Utilization :
--> vmstat (vm system cpu and kernel utilization metrics \*\* a great starting pt \*\*)
--> mpstat (multi processor .. per cpu performance statistics)
\*\* Memory / Kernel Utilization :
\*\* I/O Performance :
--> iostat (Standard IO.. ufs, .. IO performance utility)
--> vxstat (Veritas vxfs filesystem IO performance)
\*\* Network Utilization :
\*\* Process / Kernel :
\* sar (provides most basic types of high level performance metrics, assuming that system accounting is turned on, which does incur some level of system overhead when always running)
SOLARIS 10 ... Above and Beyond other Unix / Linux Distributions ...
In addition to the basic toolsets, Solaris 10 provides the following key additions, which set it apart from the other Unix / Linux variants.
\*\* DTrace (Dynamic Tracing via "D" language scripting and probe/providers)
__ Dtrace is the "Electron microscope" of performance analysis for a Solaris 10 system
See the DTraceToolkit for a long list of specific Dtrace scripts (several of which are used
within sys_diag, among others created)
\*\* lockstat (uses the kernel dtrace infrastructure) Summarizes system lock/mutex contention
\*\* Mdb (Modular Debugger)
\* kstat (Kernel statistics .. counters, etc..)
\* cpustat / cputrack (cpu statistics, system-wide or per process)
\* intrstat, trapstat (interrupt and system trap, I/DTLB_miss statistics, ..)
\* ... & many more.. [this list will be re-done in a future blog with a more thorough breakdown.. ]
The Time Saving.. automated nature... of SYS_DIAG :)
Over the past several years, I have created a utility called "sys_diag" that can automatically capture performance statistics using nearly all available system utilities, then aggregate the data, perform analysis, and generate an HTML report of findings (with a color coded dashboard and links to the detailed analysis findings). Sys_diag creates a single .tar.Z compressed archive that can be emailed/ftp'd for performing system configuration and/or performance analysis off-site, from virtually anywhere, saving a LOT of time and not requiring any 3rd party tools or agents to be installed on the system beyond downloading the "sys_diag" ksh script itself. Virtually no learning curve is required for loading, running, and reviewing basic performance profiling, including high level subsystem bottlenecks (deeper root cause correlation might require some level of advanced sys admin knowledge).
Beyond performance analysis, sys_diag can also generate a detailed configuration snapshot report covering OS, HW, Storage, SW, and 3PP configuration attributes, among the several other capabilities that it provides.
\*\* See the next blog entry for more details and examples on sys_diag \*\*.
The published repository and high level description of sys_diag is always available at BigAdmin using the following URL :
(Copyright 2007 Todd A. Jobson)
When we think of "Performance", the term can take on many connotations...
In the context of computing, the dictionary defines it as : (http://dictionary.reference.com/browse/performance)
PERFORMANCE (noun) :
"The manner in which or the efficiency with which something reacts or fulfills its intended purpose." or "the execution or accomplishment of work, acts, feats, etc."
From this definition, it can readily be seen that the "efficiency" and overall "utilization" of resources are key characteristics of the "performance" of a system (also leaving room for some subjective interpretation).
Real World Performance.. and the holistic viewpoint
The other key aspect of assessing performance, whether in the real world or that of a computing system, relates directly to the volume of productive OUTPUT a system produces over a duration of TIME.
In the arena of Information Technology, as with real world performance (autos, economics, the human body, etc.), the entity as a whole needs to be examined, allowing for symptoms to be identified in one or more areas.. aka "sub-systems". Hence, the complete Integrated "system" as a whole comes to life with its own unique dynamics and patterns that need to be examined.
(eg. one analogy might be the "performance" of a race car, dependent upon the design/architecture of the vehicle and all its constituent components.. the chassis [weight, flexibility, ..], the steering [responsiveness, turn ratio, ..], the engine [horsepower, air/fuel intake, exhaust output, ..], the transmission [gear ratios, latency in shifting, MTBF of clutch, ..], braking [responsiveness, 60-0 secs, ..], tires [Gs on the skidpad, wear rate, ..].. and overall performance [0-60 acceleration, MPG, top speed, slalom speed, ..]. Individually each can be measured easily, but as a whole the INTEGRATED "system dynamics" come into play).
The same can be said of (and is analogous to) most "systems"... hence, looking at the environment as a whole is crucial ...
The "Application Environment" ...
That holistic entity in the arena of computing is called the "Application Environment", comprising all the systems and underlying nested/encapsulated sub-systems. In the IT world, an Application Environment is composed of all the underlying infrastructure that together provides and supports the "Service(s)" (environments, systems, networks, storage, OS, application software, etc...).
"Perceived" Performance and Expectations :
For any system (or environment), the ultimate gauge is the "perception" of its performance, relating to whether or not it can fulfill the expectations of its client user community.
How efficient, proficient, and/or productive we perceive something to be is, in large part, a product of our vantage point (perception) and how we judge or evaluate it... according to our expectations, pre-conceived notions (rules), and the means available to us for measuring it (tools, etc..).
The perception of one impatient user doesn't always accurately reflect the responsiveness, efficiency, or other attributes for evaluating the performance / workload characterization of a system.
Understanding, Metrics, and Measurement ...
From this vantage point, it becomes evident that in assessing a system there must be measurement of key attributes.. aka.. METRICS... and in order to define the key metrics that can/should be monitored, we must first UNDERSTAND the system and how it works (components, mechanics, inputs / outputs, among other items that can be measured).
Hence, "If you can't understand it... you can't effectively measure it, .. and if you can't measure it.. you can't assess it..". (T.Jobson 7/2006)
Requirements dictate Measurements... driven by Service Level Commitments :
Of the various vantage points from which a system's performance is gauged, the following attributes (relating to specific Metrics that can be sampled) are typically those upon which Service Level Agreements (SLA's) and/or Commitments (SLC's) are based (reflecting Customer Requirements and "acceptable" Thresholds.. ) :
For any exceptions to the "acceptable" thresholds listed above, SLA's typically reflect PENALTIES ($$$).
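To make the idea of sampled metrics versus "acceptable" thresholds concrete, here is a minimal Python sketch that checks response-time samples against a hypothetical SLA target. The 500 ms limit, the 95th-percentile goal, and the sample values are all illustrative assumptions, not figures from any actual SLA:

```python
import math

# Hypothetical sketch: evaluate sampled response times against an SLA
# threshold. The 500 ms / 95th-percentile figures are illustrative only.

def percentile(samples, pct):
    """Return the pct-th percentile of samples (nearest-rank method)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

def sla_met(response_times_ms, threshold_ms=500.0, pct=95):
    """True if the pct-th percentile response time is within the threshold."""
    return percentile(response_times_ms, pct) <= threshold_ms

# 20 sampled response times (ms); one slow outlier at 610 ms.
samples = [120, 180, 95, 240, 610, 150, 130, 170, 200, 110,
           140, 160, 175, 185, 210, 105, 125, 145, 155, 250]

print(percentile(samples, 95))   # the one outlier falls outside the 95th pct
print(sla_met(samples))
```

Note that a percentile-based threshold tolerates occasional outliers, whereas a threshold on the maximum (the 100th percentile) would flag this same sample set as a violation.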
Latency ... the heart of a Bottleneck ...
Each of the attributes and perceived gauges of performance listed above has its own intrinsic relationships and dependencies to specific subsystems and components... in turn reflecting a type of "latency" (delay in response). It is these latencies that are investigated and examined for root cause and correlation as the basis for most Performance Analysis activities.
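As one way of illustrating how per-subsystem latencies roll up into a bottleneck, the following Python sketch decomposes a hypothetical end-to-end response time into component delays and flags the dominant contributor. The subsystem names and millisecond values are invented for illustration:

```python
# Hypothetical sketch: break an end-to-end response time into
# per-subsystem latencies and identify the dominant contributor
# (the "bottleneck"). Names and millisecond values are illustrative.

def bottleneck(latencies_ms):
    """Return (subsystem, fraction_of_total) for the largest contributor."""
    total = sum(latencies_ms.values())
    name = max(latencies_ms, key=latencies_ms.get)
    return name, latencies_ms[name] / total

breakdown = {"network": 4.0, "cpu": 12.0, "disk_io": 38.0, "db_locks": 6.0}
name, share = bottleneck(breakdown)
print("%s accounts for %.0f%% of total latency" % (name, share * 100))
```

In practice a breakdown like this is what root-cause correlation drives toward: once the dominant latency is isolated to one subsystem, that subsystem's own components can be decomposed the same way.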
** STAY TUNED **.. Look for my upcoming blog entry on "The many Flavors of system Latency..".
Future blog entries will expand upon this baseline definition of performance.. so keep your eyes peeled.. and look at the world around you from as many vantage points as possible... Perspective is key.. hand in hand with understanding the world around us... Don't be afraid to ask why.. and dig deeper.. there's typically a reason for everything if you look at it with an open mind.. understanding the fundamentals first !
Todd ;) :)
(Copyright 2007, Todd A. Jobson)
This blog does not reflect the viewpoint or opinions of Oracle. All comments are personal reflections and responsibility of Todd A. Jobson, Sr. and are implicitly copyrighted from the posted year to current year, to that effect.