Sunday Sep 30, 2007

The Many Flavors of System Latency.. along the Critical Path of Peak Performance


Adapted from an article that I wrote last month, published in the September 2007 issue of Sun's Technocrat, this examination of System Latency picks up where the last discussion, "What is Performance ? .. in the Real World", left off.  That discussion identified the following key attributes and metrics that most in the IT world associate with optimal system performance :
  • Response Times (Client GUI's, Client/Server Transactions, Service Transactions, ..) Measured as "acceptable" Latency.
  • Throughput (how much Volume of data can be pushed through a specific subsystem.. IO, Network, etc...)
  • Transaction Rates (Database, Application Services, Infrastructure / OS / Network Services, etc.).  These can be rates per second, hour, or even day... measuring various service-related transactions.
  • Failure Rates (# or Frequency of exceeding High or Low Water Marks .. aka Threshold Exceptions)
  • Resource Utilization (CPU Kernel vs. User vs. Idle, Memory Consumption, etc..)
  • Startup Time (System HW, OS boot, Volume Mgmt Mirroring, Filesystem validation, Cluster Data Services, etc..)
  • FailOver / Recovery Time (HA clustered DataServices, Disaster Recovery of a Geographic Service, ..)  Time to recover a failed Service (includes recovery and/or startup time of restoring the failed Service)
  • etc ...

Each of the attributes and perceived gauges of performance listed above has its own intrinsic relationships and dependencies on specific subsystems and components... each in turn reflecting a type of "latency" (delay in response). It is these latencies that are investigated and examined for root cause and correlation as the basis of most Performance Analysis activities.
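Most of these latencies surface directly in the service-time and utilization columns of the stock Solaris utilities. A minimal first-pass sketch (5-second sampling interval; device and interface names will obviously vary per system) :

    iostat -xnz 5      # asvc_t = average I/O service time (ms), %w / %b per device
    netstat -i 5       # packet rates and errors per interface
    vmstat 5           # run queue (r), blocked kthreads (b), paging, %usr / %sys / %idle

These are only the surface-level gauges; the rest of this article digs into where the underlying delays actually come from.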

How do you define Latency ?

In the past, the most commonly used term relating to latency within the field of Computer Science was "Rotational Latency", owing to the huge (once astronomical) discrepancy between the responsiveness of an operation requiring mechanical movement and the flow of electrons between components (milliseconds vs. nanoseconds).  Although the most common bottlenecks still typically relate to physical disk-based I/O latency, the paradigm of latency is shifting.  With today's built-in HW caching controllers and memory-resident DBs (along with other optimizations in the HW, media, drivers, and protocols...), the gap has narrowed. Realize that in 1 nanosecond (1 billionth of a second), electricity can travel approximately one foot down a wire (approaching the speed of light).

However, given the industry's latest CPUs running multiple cores at clock speeds of multiple GigaHertz (with one or more HW threads per core, each theoretically executing over a billion instructions per second...), many bottlenecks can now easily be realized within memory : densities have increased dramatically, the distances across huge supercomputer buses (and grids) have expanded dramatically, and, most significantly, memory latency has not decreased at the same rate that CPU speeds have increased. In order to best investigate system latency, we first need to define it and fully understand what we're dealing with.

LATENCY :

  • noun               The delay, or time that it takes prior to a function, operation, and/or transaction occurring.  (my own definition)
  • adj   (Latent)   Present or potential but not evident or active.
BOTTLENECK :
  • noun               A place or stage in a process at which progress is impeded.
THROUGHPUT :
  • noun              Output relative to input; the amount of data passing through a system from input to output.
BANDWIDTH :
  • noun              The amount of data that can be passed along a communications channel in a given period of time.

(definitions cited from www.dictionary.com)

 

The "Application Environment" and it's basic subsystems :

 

Once again, the all-inclusive entity that we need to recognize and examine in its entirety is the "Application Environment", along with its standard subsystems :

  • OS / Kernel (System processing)
  • Processors / CPU's
  • Memory
  • Storage related I/O
  • Network related I/O
  • Application (User) SW

 

The "Critical Path" of (End-to-End) System Performance :

Although system performance might frequently be associated with one (or a few) system metrics, we must take ten steps back and realize that overall system performance is one long inter-related sequence of events (both parallel and sequential). Depending on the type of workload and services running within an Application Environment, the Critical Path might vary, as each system has its own performance profile and related "personality".  Using the typical OLTP RDBMS environment as an example, the Critical Path would include everything (and ALL latencies incurred) between :

Client Node / User -> Client GUI -> Client Application / Services -> Client OS / Kernel -> Client HW -> NICs -> Client LAN -> (network / naming services, etc.. ) -> WAN (switches, routers, ...) -> ... Network Load Balancing Devices

-> Middleware / Tier(s) -> Web Server(s) -> Application Server(s) -> Directory, Naming, NFS... Servers/Services->

-> RDBMS Server(s) [Infrastructure Svcs, Application SW, OS / kernel, VM, FS / Cache, Device Drivers, System HW, HBA's, ...] -> External SAN /NAS I/O [ Switches, Zones/Paths, Array(s), Controllers, HW Cache, LUN(s), Disk Drives, .. ] -> RDBMS Svr ... LAN ...... -> ... and back to the Client Node through the WAN, etc... <<-

(NOTE: MANY sub-system components / interactions are left out in this example of a transaction and response between a client and DB Server)
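Before any deep analysis, it's often worth putting rough numbers on a few of those hops. A minimal sketch using stock Solaris networking tools (the host name dbserver below is just a placeholder for your DB tier) :

    # Round-trip network latency from the client tier to the DB tier
    # (Solaris ping : -s = statistics mode, 56-byte payload, 10 probes)
    ping -s dbserver 56 10

    # Per-hop latency through the intervening switches / routers / WAN links
    traceroute dbserver

Subtracting the measured network round trip from the end-user response time gives a rough first-cut split between network latency and everything incurred inside the server tiers.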

 

Categories of Latency :

Latency, in and of itself, simply refers to a delay of sorts.  In the realm of Performance Analysis and Workload Characterization, an association can generally be made between certain types of latency and a specific sub-system "bottleneck".  However, in many cases the underlying "root causes" of bottlenecks are the result of several overlapping conditions, none of which individually causes performance degradation, but which together can result in a bottleneck. It is for this reason that performance analysis is typically an iterative exercise, where the removal of one bottleneck can easily result in the creation of another "hot spot" elsewhere, requiring further investigation and/or correlation once a bottleneck has been removed.

 

Internal vs. External Latency ...

Internal Forms of Latency :

  • CPU Saturation (100% Utilization, High Run Queues, Blocked Kthreads, Cpu Contention ... Migrations / Context Switching / ... SMTX, ..)
  • Memory Contention (100% Utilization, Allocation Latency due to either location, Translation, and/or paging/swapping, ...)
  • OS Kernel Contention Overhead ( aka .. "Thrashing" due to saturation.. )
  • IO Latency ( Hot Spots, High Svc Times, ...)
  • Network Latency
  • OS Infrastructure Service Latency (Telnet, FTP, Naming Svcs, ...)
  • Application SW / Services (Application Libraries, JVM, DB, ...)

External Forms of Latency :

  • SAN or External Storage Devices (Arrays, LUNS, Controllers, Disk Drives, Switches, NAS, ...)
  • LAN/WAN Device Latency (Switches, Routers, Collisions, Duplicate IP's, Media Errors, ....)
  • External Services (DNS, NIS, NFS, LDAP, SNMP, SMTP, DB, ...)
  • Protocol Latency (NACK's, .. Collisions, Errors, etc...)
  • Client Side Latency
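A few quick, first-pass checks for several of the forms of latency listed above; a minimal sketch using stock Solaris utilities at 5-second intervals (lockstat and trapstat require root, and trapstat is SPARC-specific) :

    vmstat 5             # CPU saturation : run queue (r), blocked kthreads (b), %usr/%sys/%idle
    mpstat 5             # per-CPU contention : xcal, migrations, involuntary csw, smtx
    iostat -xnz 5        # I/O latency hot spots : asvc_t (service time, ms), %w, %b per device
    trapstat -T 5        # memory-translation stalls : per-pagesize TLB miss rates and % time
    lockstat sleep 5     # kernel lock / mutex / spin contention over a 5-second window

None of these assigns root cause by itself, but together they usually point at which internal sub-system deserves the deeper (DTrace-level) look.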


Perceived vs. Actual Latency ...

Anyone who has worked in the field with end users has likely experienced scenarios where users attribute a change in application behavior to a performance issue, in many cases incorrectly. The following is a short list of the top reasons for a gap between user perception and actual system performance :

  • Misalignment of user expectations, vantage points, anticipation, etc. (Responsiveness / Response Times, ...)
  • Deceptive expectations based upon marketing "PEAK" Throughput and/or CPU clock-speed #'s and promised increases in performance.  (high clock speeds do NOT always equate to higher throughput or better overall performance, especially if ANY bottlenecks are present)
  • PEAK Throughput #'s can only be achieved if there is NO bottleneck or related latency along the critical path as described above. The saturation of ANY sub-system will degrade the performance until that bottleneck is removed.

The PEAK Performance of a system will be dictated by the performance of its most latent and/or contentious components (or sub-systems) along the critical path of system performance.  (e.g. the PEAK bandwidth of a system is no greater than that of its slowest component along the path of a transaction and all of its interactions.)

As the holy grail of system performance (along with Capacity Planning.. and ROI) dictates, a system that spends as close to 100% of its CPU time as possible doing useful processing (vs. WAIT events that pause processing) is what every IT Architect and System Administrator strives for.  This is where systems using CMT (multiple cores per CPU, each with multiple HW threads per core) shine, allowing more processing to continue even when many threads are waiting on I/O.
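On Solaris, microstate accounting makes that processing-vs.-waiting split directly visible per thread; a minimal sketch :

    # Per-LWP microstates at 5-second intervals :
    #   USR / SYS = useful work,  LAT = waiting for a CPU (run-queue latency),
    #   SLP = sleeping (frequently I/O waits),  LCK = waiting on locks
    prstat -mL 5

A workload showing high SLP or LAT percentages is leaving CPU cycles on the table, which is exactly the gap that CMT hardware (or the removal of the underlying I/O / lock bottleneck) is meant to close.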

     

     

The Application Environment and its Sub-Systems ... where the bottlenecks can be found

     

Within Computing, or more broadly Information Technology, "latency" and its underlying causes can be tied to one or more specific "sub-systems". The following table reflects the first level of "sub-systems" that you will find in any Application Environment :

Subsystem / Components  |  Attributes and key Characteristics  |  Related Metrics, Measurements, and/or Interactions

  • System "Bus" / Backplane  |  Backplane / centerplane, I/O bus, etc. (many types of connectivity and media are possible, each with its own response-time and bandwidth properties)  |  busstat output, aggregated total throughput #'s (from kstat, etc.)

  • CPUs  |  # cores, # HW threads per core, clock speed / frequency in GHz (cycles per second), operations (instructions) per second, cache, DMA, etc.  |  vmstat, trapstat, cpustat, cputrack, mpstat, ... (run queue, blocked kthreads, ITLB_Misses, % Sys/Usr/Idle utilization, # lwps, ...)

  • Memory / Cache  |  speed / frequency of bus, bandwidth of bus, bus latency, DMA config, L1/L2/L3 cache locations / sizes, FS page cache, physical proximity of cache and/or RAM, tmpfs, page sizes, ...  |  vmstat, pmap, mdb, kstat, prstat, trapstat, ipcs, pagesize, swap, ... (cache misses, DTLB_misses, page scan rate, heap/stack/kernel sizes, ...)

  • Controllers (NICs, HBAs, ...)  |  NIC RX interrupt saturation, NIC overflows, NIC / HBA caching, HBA SW vs. HW RAID, bus/controller bridges/switches, DMP, MPxIO, ...  |  netstat, kstat (RX pkts / sec, network errors, ...), iostat, vxstat (response times, storage device svc_times, ...), lockstat, intrstat, ...

  • Disk-Based Devices  |  boot devices, RAID LUNs, file systems (types, block sizes, ...), volumes, RAID configuration (stripes, mirrors, RAID level, paths, ...), physical fragmentation, MPxIO, etc.  |  iostat, vxstat, kstat, dtrace, statspack, ... (%wait, service times, blocked kernel threads, FS/LUN hot spots, ...)

  • OS / Kernel  |  process scheduling, virtual memory mgmt, HW mgmt/control, interrupt handling, polling, system calls, ...  |  vmstat (utilization, interrupts, syscalls, %Sys / %Usr, ...), prstat, top, mpstat, ps, lockstat (for smtx / lock / spin contention), ...

  • OS Infrastructure Services  |  FTP, Telnet, BIND/DNS, naming svcs, LDAP, authentication / authorization, ...  |  prstat, ps, svcadm, ... various

  • Application Services  |  DB server, web server, application server, ...  |  various ...
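Many of the utilities in the right-hand column only tell you that a sub-system is busy; DTrace (Solaris 10 and later) lets you ask which workload is driving it. Two hedged one-liner sketches (run as root; Ctrl-C prints the aggregations) :

    # Which executables are issuing the most system calls right now ?
    dtrace -n 'syscall:::entry { @[execname] = count(); }'

    # Which processes are initiating the most physical disk I/O ?
    dtrace -n 'io:::start { @[execname] = count(); }'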

     

Note : if you want a single Solaris utility to do the heavy lifting, performance / workload correlation, and reporting for you, take a look at sys_diag if you haven't already done so (or the README).

 

Media/ Transport Bandwidth and related Latencies :

 

The following table demonstrates the wide range of typical operating frequencies and latencies PER Sub-System, Component, and/or Media Type :

Component / Transport Media  |  Response Time / Frequency / Speed  |  Throughput / Bandwidth

  • CPU  |  > 1 GigaHertz (1+ billion cycles per second) * (# cores * # HW threads per core)  |  > 1 billion operations per second (huge theoretical # ops/sec per system)

  • Memory  |  DDR (PC-3200 @ 200 MHz / 200 MHz bus) ~5 ns ;  DDR2 (PC2-5300 @ 166 MHz / 333 MHz bus) ~6 ns ;  DDR2 (PC2-8500 @ 266 MHz / 533 MHz bus) ~3.75 ns <TBD>  -- nanoseconds (billionths of a second)  |  DDR-400 peak transfer 3.2 GB/s ;  DDR2-667 peak transfer 5.3 GB/s ;  DDR2-1066 peak transfer 8.5 GB/s <TBD>

  • Disk Devices  |  service times ~5+ ms  =  ~X ms rotational latency + Y ms seek time  (1 millisecond = 1/1000 of a second)  [platter size, # cylinders / platters, RPM, ...]  |  varies greatly, see below

  • Ultra 320 SCSI (16-bit parallel)  |  high performance ; cable & device limitations  |  up to 320 MBps

  • SAS [Serial Attached SCSI]  |  current ;  future <TBD>  |  > 300 MBps (> 3 Gbps) ;  up to 1200 MBps <TBD>

  • SATA [Serial ATA]  |  low cost, higher capacity (lower performance) ;  future <TBD>  |  up to 300 MBps ;  up to 600 MBps <TBD>

  • USB 2.0  |  10 - 200+ microseconds  (1 microsecond [us] = 1 millionth of a second)  |  up to 480 Mbps (60 MBps) ;  ~40 MBps real-world usable

  • FireWire (IEEE 1394)  |  --  |  up to 50 MBps

  • Fibre Channel (dual channel)  |  4 Gb (4 / 2 / 1 Gb) * 2 ;  8 Gb (8 / 4 / 2 Gb) * 2  <TBD>  |  up to 1.6 GBps (~1 GBps usable) ;  up to 3.2 GBps (~1.8 GBps usable)

  • 1 Gigabit Ethernet  |  latency ~50 us [microseconds]  |  125 MBps (~1 Gbps) theoretical

  • 10 Gigabit Ethernet  |  --  |  up to 20 Gbps (<= 9 Gbps usable)

  • InfiniBand (dual-ported HCA)  |  x4 (SDR / DDR), dual ported = * 2 ;  latency < 2 microseconds ;  x8 (DDR) * 2  <TBD>  |  2 * 10 Gb = 20 Gbps (16 Gbps usable) ;  up to 40 Gbps (32 Gbps usable)

  • PCI 2.2  |  32-bit @ 33 MHz ;  64-bit @ 33 MHz ;  64-bit @ 66 MHz  |  133 MBps ;  266 MBps ;  533 MBps

  • PCI-X  |  64-bit bus @ 100 MHz (parallel bus) ;  64-bit bus @ 133 MHz (parallel bus)  |  up to 800 MBps ;  1066 MBps (~1 GBps)

  • PCI Express  |  v.1 serial bus, bi-directional @ 2.5 GHz ;  v.2 @ 5 GHz <TBD>  (latencies in the 10's - 100's of nanoseconds)  |  4 GBps (x16 lanes, one direction) ;  8 GBps (x32 lanes, one direction) ;  up to 16 GBps bi-directional (x32 lanes) ;  v.2 : 32 GBps bi-directional (x32 lanes)
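As a worked example of the Disk Devices row above : average rotational latency is half a revolution, so a 15,000 RPM drive waits (60 / 15000) / 2 ≈ 2 ms on average, and adding a typical ~3-4 ms average seek lands right at the ~5+ ms service time quoted (before any queuing). The same arithmetic, sketched as shell one-liners :

    # Average rotational latency (ms) = (60 / RPM) / 2, scaled to milliseconds
    echo "scale=2; 60 * 1000 / 15000 / 2" | bc    # ~2.00 ms  (15K RPM FC/SAS drive)
    echo "scale=2; 60 * 1000 / 7200 / 2" | bc     # ~4.16 ms  (7,200 RPM SATA drive)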

 

Other Considerations Regarding System Latency :

Other considerations regarding system latency that are often overlooked include the following, which offer a more holistic vantage point of system performance and of the items that might work against "Peak" system capabilities :

  • For Application SW that supports advanced capabilities such as Infiniband RDMA (Remote Direct Memory Access), interconnect latencies can be virtually eliminated via Application RDMA "kernel bypass".  This would be applicable in an HPC grid and/or possibly  Oracle RAC Deployments, etc. (confirming certifications of SW/HW..).
  • Level of Multi-Threading vs. Monolithic serial or "batch" jobs (If Applications are not Multi-Threaded, then SMP and/or CMT systems with multiple processors / cores will likely always remain under-utilized).
  • Architectural configurations supporting load distribution across multiple devices / paths (cpu's, cores, NIC's, HBA's, Switches, LUNs, Drives, ...)
  • System Over Utilization (too much running on one system.. due to under-sizing or over-growth, resulting in system "Thrashing" overhead)
  • External Latency Due to Network and/or SAN I/O Contention
  • Saturated Sub-Systems / Devices (NIC's, HBA's, Ports, Switches, ...) create system overhead handling the contention.
  • Excessive Interrupt Handling (vs. Polling, Msg passing, etc..), resulting in overhead where Interrupt Handling can cause CPU migrations / context switching (interrupts have the HIGHEST priority within the Solaris Kernel, and are handled even before RT processing, preempting running threads if necessary).  Note, this can easily occur with NIC cards/ports that become saturated (> ~25K RX pkts/sec), especially with older drivers and/or over-utilized systems (a quick check for this follows this list).
  • Java Garbage collection Overhead (sub-par programming practices, or more frequently OLD JVM's, and/or missing compilation optimizations).
  • Use of Binaries that are compiled generically using GCC, vs. HW optimized compilations using Sun's Studio Compilers (Sun Studio 12 can give you 200% + better performance than gcc binaries).
  • Virtualization Overhead (significant overhead relating to traps and library calls... when using VmWare, etc..)
  • System Monitoring Overhead (the cumulative impact of monitoring utilities, tools, system accounting, ... as well as the IO incurred to store that historical performance trending data).
  • OS and/or SW ... Patches, Bugs, Upgrades (newly applied, or possibly missing)
  • Systems that are MIS-tuned are accidents waiting to happen.  Only tune kernel/driver parameters if you KNOW what you are doing, or have been instructed by support to do so (and have FIRST tested on a NON-production system).  I can't tell you how many performance issues I have encountered that were due to administrator "tweaks" of kernel tunables (to the point of taking down entire LAN segments !).  The defaults are generally the BEST starting point unless a world-class benchmarking effort is under way.
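A minimal sketch of the interrupt-saturation check referenced above (Solaris 10; intrstat requires root, and interface names vary per system) :

    # Which CPUs are spending time in interrupt handlers, and for which devices ?
    intrstat 5

    # Per-interface packet rates -- compare the input-packet rate against the
    # ~25K RX pkts/sec rule of thumb above (add -I <interface> for a specific NIC)
    netstat -i 5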

 

The "Iterative" nature of Performance Analysis and System Tuning

No matter what the root causes are found to be, in the realm of Performance Analysis and system Tuning, ... once you remove one bottleneck, the system processing characteristics will change, resulting in a new performance profile, and new "hot spots" that require further data collection and analysis. The process is iterative, and requires a methodical approach to remediation.

Make certain that ONLY ONE (1) change is made at a time, otherwise, the effects ( + or - ) can not be quantified.

Hopefully at some point in the future we'll be operating at latencies measured in attoseconds (10^-18, or 1 quintillionth of a second), but until then .... Happy tuning :)

For more information regarding Performance Analysis, Capacity Planning, and related Tools, review some of my other postings at :  http://blogs.sun.com/toddjobson/category/Performance+and+Capacity+Planning

 

Copyright 2007  Todd A. Jobson

Friday Aug 03, 2007

System Profiling 101 : Getting started using sys_diag v.7.04


The following entry is a variation of an article that I created for this month's Sun "Technocrat" publication (Aug. 2007).

This posting demonstrates the art of system profiling (from a high-level overview) by introducing a few sample screenshots of the sys_diag .html report (its header, Dashboard, and Table of Contents), showing how, in a few minutes, sys_diag can present you with an accurately depicted system profile !

Note : the .html report snapshot samples presented here match the command-line output from my previous blog postings (from the same run of sys_diag).

If you haven't had the chance to try out sys_diag yet, this should give you the highlights of what you can expect in the .html report header sections.

Enjoy and let me know what you think,

Todd



Real world PROFILING ..

So.. what is Profiling.. ?? ... Well, in the real world, you can define profiling in many ways.

.. from the "profile" of the person standing next to you (what you see from your vantage point), to the personality "profiles" that we've all heard of in psychology (characterization based upon key attributes) ..

 
In your standard dictionary you'll find a definition such as this (from Dictionary.com) :

PROFILE :
     (-noun)

  •  the outline or contour of the human face, esp. the face viewed from one side.
  •  a verbal, arithmetical, or graphic summary or analysis of the history,
    status, etc., of a process, activity, relationship, or set of
    characteristics: a biochemical profile of a patient's blood; a profile of national consumer spending.
  •  a set of characteristics or qualities that identify a type or category of person or thing: a profile of a typical allergy sufferer.
  • Psychology. a description of behavioral and personality traits of a person compared with accepted norms or standards.

System Profiling .. in the World of Computing

Well, in the world of technology, and more specifically Computing, "profiling" takes on its own connotation, though one similar to many of the more technical definitions noted above.

To some, system profiling simply includes a high level summary of resource utilization and bottleneck identification of a system during some period of data collection (point in time or over a duration).

"Broad Spectrum" System Profiling ..

System profiling to me is the characterization of a system as a whole, given a set of data, either for one event/point in time, or over a duration.  This characterization goes beyond workload (as you'll typically hear the term "workload characterization"), which is why I call it "Broad (or Full) Spectrum" profiling, more broadly taking into account and including :

  • Configuration characteristics (of all sub-systems/components within the "application environment" being profiled).
  • System Performance metrics captured, reflecting the variations in system/subsystem activity measurements (utilization, contention, throughput, latency, etc...).
  • Workload Characterization : Details correlating the Workload that was ongoing during the data collection (workload characteristics.. TPS/ Response Times/ Mbps/ ...), with the measurements taken.   (beyond the internal system workload identified, External sources need to be correlated)
  • A Characterized (Summarized) "Profile" of overall System Efficiency and "health" based upon Performance Analysis findings (system/sub-system Avg/Peaks.., Utilization vs. Workload, etc ...)
  • Notable Events and/or Exceptions encountered from the data available.
  • ... etc ...

"Narrow Spectrum" System Profiling ..

This would be in contrast to "Narrow Spectrum" (Focused) System Profiling, where attention is focused on a very "narrow" and specific area of interest for analysis (typically in determining a root cause, where a specific bottleneck is known to exist within a sub-system or specific component of the system).

Note the common themes of defining Requirements, the "Application Environment", etc.. as presented in my previous postings .. (eg.  "What is Performance ? .. in the Real World", etc..).. and likely to be common themes.. in the Real World.. ;)

 
Look for more details on this and much more in an up-coming blog entry more thoroughly delineating the distinction between Profiling, Workload Characterization, Performance Analysis, and Capacity Planning....  

For now, enjoy the following discussion on how sys_diag can have you profiling in no time at all ... :)



Profiling with sys_diag ...

Automating Solaris Performance and Configuration Analysis


In the arena of Performance and Configuration Analysis, the freely available Solaris utility “sys_diag” offers the capability to automatically capture this information in a single .html report (also .txt and .ps) after running one easy to use ksh script. Typically, several utilities/tools need to be run separately, requiring manual collection, aggregation and correlation of the data, prior to conducting the analysis of data.

Sys_diag automates this legwork, by running over 100 Solaris built in commands/utilities (depending on the parameters used) and presenting the data as a structured report with a summarized header, a color-coded “dashboard” (broken down by high level workload characterization, sub-system findings, followed by a Table of Contents), all with links to the corresponding report sections with detailed configuration and/or performance analysis findings.
 

sys_diag 's  HTML Report Header :



sys_diag 's  HTML Performance "Dashboard" :

 The following sample .html system performance “Dashboard” (a portion of which is shown below) reflects the 4 key sub-systems (CPU/Kernel Profiling, Memory, IO, and Networking) as a summarized depiction of sub-system “health”, based against a list of rules / thresholds that the captured data is compared to during post-processing.

These rules and thresholds are listed in the Performance Analysis section (Section #24) and can be easily tuned / modified to offer more stringent or lenient identification of performance exceptions that contribute to the Green/ Yellow / Red (OK / Warning / Critical) color-coded status within each dashboard section. Within each section are listed the key performance metrics and a summary of exceptions, along with Average and Peak (High Water Mark) values present during the collection/sampling period. At the end of each section is a list of key “links” to the substantiating detailed data analysis within the report.  




When run for performance data gathering (-g or -G), 2 types of performance data are captured :

  • vmstat, mpstat, iostat, netstat, ... data for a duration (-T total_secs), captured at a specified sampling rate (-I interval_#secs). The default is 5 minutes of data capture @ 2-second intervals if -I / -T are not specified.

  • 3 point-in-time detailed snapshots (beginning, mid-point, end-point). If -G is used, and Solaris 10 is the OS, then DTrace and detailed lockstat, pmap/pfiles, cputrack, ... snapshots will be taken (beyond the core "-g" snapshots that include ps, netstat -i, vxstat, kstat, ...).
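Putting those flags together, a typical invocation might look like the following. This is only a hedged sketch based on the options described above -- see the script header pages / README for the authoritative option list -- and assumes the script is run as root from the directory it was downloaded to :

    # 10 minutes of performance sampling at 5-second intervals, with the deeper
    # Solaris 10 snapshots (DTrace, lockstat, pmap/pfiles, cputrack, ...) enabled :
    ./sys_diag -G -I 5 -T 600

    # Or simply take the defaults (-g : 5 minutes of sampling @ 2-second intervals) :
    ./sys_diag -g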
 

 
Sys_diag has been run on virtually all models of Sun systems running Solaris 2.6 or later (from x86 laptops up to fully loaded E25Ks), offering extensive Solaris 10 configuration and performance data, including DTrace snapshots. It creates a single .tar.Z compressed archive (including all raw, snapshot, and post-processed datafiles) that can be emailed / ftp'd.. for performing system configuration and/or performance analysis off-site.. from virtually anywhere.
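Off-site, the .tar.Z archive unpacks with the standard Solaris tools; a minimal sketch (substitute the actual archive name that sys_diag reports when it finishes) :

    zcat sys_diag_archive.tar.Z | tar xvf -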

This is one of the key benefits that sys_diag offers : it saves a LOT of time by not requiring many separate manual runs / collections / correlations of data, nor any 3rd-party tools, libraries, or agents to be installed on a system beyond downloading the "sys_diag" ksh script itself. Virtually no learning curve is required for loading it, running it, and reviewing basic performance profiling, including high-level sub-system bottlenecks (deeper root-cause correlation might require some level of advanced system administration knowledge, though virtually all the data needed will already have been captured by sys_diag).

This utility has been used extensively in the field over the last several years, run on literally hundreds of production systems as part of escalation root cause analysis, in addition to providing the basis for dozens of Architectural and/or Performance Assessments (including formal Capacity Planning / Benchmarking).  Graphing of the data captured (vmstat, netstat, ...) is also easy to do using StarOffice, as explained in the README file that sys_diag creates.
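For quick ad-hoc graphing outside of StarOffice, the interval data can also be pulled apart with nawk; a minimal sketch (the capture file name is hypothetical, and the column positions assume standard Solaris vmstat output, where the last three columns are us / sy / id) :

    # Extract run queue, free memory, and CPU usr/sys/idle from a vmstat capture,
    # skipping the repeating header lines, for import into a spreadsheet :
    nawk '$1 ~ /^[0-9]+$/ { print $1, $5, $(NF-2), $(NF-1), $NF }' vmstat_capture.out > vmstat_graph.txt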


sys_diag 's  HTML Report "Table of Contents" :

The screenshot below shows the Table of Contents and related sections available within the .html report :

 

Although this tool isn't meant to replace long-term historical Performance Trending and Capacity Planning packages (TeamQuest, etc.), it provides the foundation and basis for a very robust starting point (and it is actually much better at point-in-time workload characterization and root-cause analysis of bottlenecks, where very granular, detailed data correlation is required).

Over the time that sys_diag has been posted on BigAdmin, many Sun customers around the globe have downloaded it and commented positively on their experiences with it. For more information, or to download and try it out for yourself, the following URLs should help you get started :

 

 

 

The latest release of sys_diag (v.7.04) is available from BigAdmin
(unpackaged ksh) at :

http://www.sun.com/bigadmin/jsp/descFile.jsp?url=descAll/sys_diag__solaris_c
http://www.sun.com/bigadmin/scripts/submittedScripts/sys_diag.txt


sys_diag  is also available as part of the "SunFreeware" Distribution
(packaged with the README) at :

http://www.sunfreeware.com/programlistsparc10.html#sys_diag


The following recent blog postings provide an extended overview of sys_diag and its capabilities :

Solaris Performance Analysis and Monitoring Tools... at what cost ?
http://blogs.sun.com/toddjobson/entry/solaris_performance_monitoring_tools

What is sys_diag ?? .. Automating Solaris Performance Profiling and Workload Characterization.
http://blogs.sun.com/toddjobson/entry/what_is_sys_diag_automating
sys_diag v.7.04 command line output ...
http://blogs.sun.com/toddjobson/entry/sys_diag_v_7_04

 

Note : read the ksh script header pages or the README file prior to using, and ALWAYS test first on a representative non-production system.. as is the best practice when making ANY production environment changes... ;)

(Copyright 2007, Todd A. Jobson) 




About

This blog does not reflect the viewpoint or opinions of Oracle or Sun Microsystems. All comments are personal reflections and responsibility of Todd A. Jobson, and are copyrighted from the posted year to current year, to that effect.
