Wednesday May 07, 2008

Pmonitor, Pms.sh and the almighty ps command

Originally, Stracer (the predecessor to Dirtracer) included a small shell script called Pmonitor, written by Ben Gooley, then a Directory Server Support Engineer.  Pmonitor (Process Monitor) was a lightweight script that used the "ps" command with a series of switches to retrieve the virtual size (vsz), resident set size (rss) and the percentage of CPU time used relative to CPU time available (pcpu).

man ps:
     vsz   The total size of the process in  virtual  memory,  in
           kilobytes.

     rss   The resident set size of the process, in kilobytes.

     pcpu  The ratio of CPU time used recently to CPU time avail-
           able  in  the  same period, expressed as a percentage.
           The  meaning  of  ``recently''  in  this  context   is
           unspecified.  The  CPU time available is determined in
           an unspecified manner.

Pmonitor was great for seeing an average CPU % busy (a rudimentary load figure) as well as tracking the process's memory footprint over time.  This helped Sun Support Engineers shed light on a possible memory leak if the process size never shrank.  It was a bit hard to see any real trend in just the raw data, so many Engineers plotted the data to see it visually over time.

#./pmonitor 22241 1 5
DATE   -  [TIME] ------- PID   VSZ   RSS   PCPU
05/06-[14:27:50] ------- 22241 86296 56360  0.0
05/06-[14:27:51] ------- 22241 86296 56360  0.0
05/06-[14:27:52] ------- 22241 86296 56360  0.0
05/06-[14:27:53] ------- 22241 86296 56360  0.0
05/06-[14:27:54] ------- 22241 86296 56360  0.0
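
To make the mechanics concrete, here is a minimal sketch of a Pmonitor-style sampler.  It is not the original Pmonitor script; the function name sample_process is hypothetical, and the only thing taken from the text above is the idea of polling "ps" for vsz, rss and pcpu at a fixed interval for a fixed number of checks.

```shell
#!/bin/sh
# Hypothetical re-creation of a Pmonitor-style sampler; not the original script.
# Usage: sample_process PID INTERVAL_SECONDS CHECKS
sample_process() {
    pid=$1; interval=$2; checks=$3
    echo "DATE   -  [TIME] ------- PID   VSZ   RSS   PCPU"
    i=0
    while [ "$i" -lt "$checks" ]; do
        # A trailing '=' on each column name suppresses the ps header line.
        stats=$(ps -o vsz= -o rss= -o pcpu= -p "$pid") || return 1
        # $stats is left unquoted so the shell collapses the column padding.
        echo "$(date '+%m/%d-[%H:%M:%S]') ------- $pid" $stats
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        [ "$i" -lt "$checks" ] && sleep "$interval"
    done
    return 0
}

# Sample the current shell twice, one second apart.
sample_process $$ 1 2
```

The argument order (pid, interval, checks) mirrors the "./pmonitor 22241 1 5" invocation shown above.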

In walks Mark Reynolds (creator of logconv.pl) with Pms.sh, an enhanced Pmonitor script that adds a "growth" calculation shown in kilobytes (k).

#./pms.sh 22241 1 5
DATE   -  [TIME] ------- PID   VSZ   RSS   PCPU
05/06-[14:28:20] ------- 22241 86296 56360  0.0
05/06-[14:28:21] ------- 22241 86296 56360  0.0    growth:   0 k
05/06-[14:28:22] ------- 22241 86296 56360  0.0    growth:   0 k
05/06-[14:28:23] ------- 22241 86296 56360  0.0    growth:   0 k
05/06-[14:28:24] ------- 22241 86296 56360  0.0    growth:   0 k

With Pms.sh we could now see the growth in the raw data without plotting it all the time.

04/24-[17:55:39] ------- 12489 5310368 5270584  2.3    growth:   0 k
04/24-[17:55:40] ------- 12489 5318432 5277048  5.6    growth:   8064 k
04/24-[17:55:42] ------- 12489 5319104 5277600  2.9    growth:   672 k
04/24-[17:55:56] ------- 12489 5319104 5277600  2.5    growth:   0 k

Not only could we see the growth, we could also see when memory usage dropped.

04/24-[17:56:14] ------- 12489 5319104 5277600  1.0    growth:   0 k
04/24-[17:56:15] ------- 12489 5317560 5276424  3.5    growth:   -1544 k
04/24-[17:56:17] ------- 12489 5317560 5276424  5.4    growth:   0 k
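
The growth figures above match the difference in VSZ between consecutive samples (for example, 5318432 - 5310368 = 8064).  The following is a rough sketch of that calculation, not the actual Pms.sh code; the helper name add_growth is hypothetical, and it assumes whitespace-separated lines where the fourth field is VSZ in kilobytes.

```shell
#!/bin/sh
# Hypothetical helper: annotate Pmonitor-style lines with VSZ growth.
# Assumes field 4 of each sample line is VSZ in kilobytes.
add_growth() {
    awk '
        /^DATE/ { print; next }    # pass the header through unchanged
        {
            if (seen) printf "%s    growth:   %d k\n", $0, $4 - prev
            else      print        # the first sample has no baseline yet
            prev = $4; seen = 1
        }'
}

# VSZ values taken from the samples shown above.
printf '%s\n' \
    '04/24-[17:55:39] ------- 12489 5310368 5270584  2.3' \
    '04/24-[17:55:40] ------- 12489 5318432 5277048  5.6' \
    '04/24-[17:55:42] ------- 12489 5319104 5277600  2.9' \
    '04/24-[17:56:15] ------- 12489 5317560 5276424  3.5' | add_growth
```

A negative result simply means the process gave memory back between two samples.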

I added Pms.sh 2.01 to the Dirtracer bundle starting with Dirtracer release 2.2 and now include Pms.sh 2.02 with Dirtracer 6.0.6.

Uses for Pms.sh:

1) Memory Leaks; see above.
2) Gauging high cpu problems.  Dirtracer has prstats but in some circumstances prstat is not usable or recommended.
3) Looking at the trends of both of these elements over a long period of time.

System Impact:

Negligible.

I tested pms.sh using pms.sh and top as a gauge.

Top reported that pms.sh uses between 0.04% and 0.1% CPU on my Sunblade, and pms.sh itself shows the pcpu as 0.0.

23661 root       1  50    0 1104K  832K sleep   0:00  0.09% pms.sh
23817 root       1   0    0 1104K  832K sleep   0:00  0.04% pms.sh

# ps -aef | grep pms.sh
    root 23661 21979  0 14:48:38 pts/3    0:00 /bin/sh ./pms.sh 22241 1 1000000

#./pms.sh 23661 1 1000
DATE   -  [TIME] ------- PID   VSZ   RSS   PCPU
05/06-[14:48:53] ------- 23661 1104  832  0.0
05/06-[14:48:54] ------- 23661 1104  832  0.0    growth:   0 k
05/06-[14:48:55] ------- 23661 1104  832  0.0    growth:   0 k
05/06-[14:48:57] ------- 23661 1104  832  0.0    growth:   0 k

Dirtracer + Pms.sh:

By default Dirtracer will run Pms.sh with the same NUMBEROFCHECKS and INTERVAL as seen in the dirtracer.config file.  If the Dirtracer mainloop is configured to run for 5 Checks at 5 Second Intervals then Pms.sh will do the same.

*   pms.sh found                        [/export/home/hrdwired/PTS/dirtracertools/pms.sh]
* Access/Error Logging                  [left as is]
* Audit Logging                         [left as is]
* Iostat available.     Executing...    [success]
* Vmstat available.     Executing...    [success]
* Mpstat available.     Executing...    [success]
* pms.sh (pmonitor) executed            [success]
*   pms.sh interval(5) x checks(5)      [pms.sh run time (25 sec.)]
*
* Entering Main Performance Gathering Loop

Config parameters used with Pms.sh:

PMONITOR_INTERVAL:  The Pmonitor Interval allows you to set a smaller Interval than Dirtracer is running with.

If Dirtracer's main loop is configured to run at 30-second INTERVALs but you want to see Pms.sh output at 1-second intervals, set PMONITOR_INTERVAL="1".

PMONITOR_ONLY:  The Pmonitor Only parameter disables the main loop's use of all its normal data capture points such as pstacks, prstats, cn=monitor and cache size searches.  This allows the user to gather a long-term Pms.sh data set without the overhead of many pstacks/prstats, LDAP searches etc.
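
As an illustration, a dirtracer.config fragment combining these settings might look like the following.  The values are examples only, and PMONITOR_ONLY="1" assumes the same 0/1 (off/on) convention used by other Dirtracer parameters; check the self-documented template for the exact form.

```shell
# Illustrative dirtracer.config fragment; values are examples, not defaults.
INTERVAL="30"            # main loop interval in seconds
NUMBEROFCHECKS="5"       # number of main loop checks
PMONITOR_INTERVAL="1"    # Pms.sh samples every second instead of every 30
PMONITOR_ONLY="1"        # assumed: 1 = gather Pms.sh data only
```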

I hope this gives you a detailed view of Pms.sh (Pmonitor) and how it is used within Dirtracer and Sun Support.

[LT]

Tuesday May 06, 2008

Tracing ns-slapd process id's dynamically.

I was asked on the monthly Directory Collaborators call something to the effect of the following...

"It looks like Dirtracer is bound to one process id (pid) in the config file, but what if the server is restarted, does it know the new pid number dynamically"?

Currently, no.  Dirtracer was built to trace one pid at a time, and the pid must be changed in the dirtracer.config file before running.

I thought more on this and came to the conclusion that it is possible in a future version of Dirtracer.  If you have a dirtracer.config or dirtracer.config.last (the config from the last Dirtracer capture run) but the process id has changed, I could backtrack using the pid file located in the <slapd instance name>/logs location to find the current pid.

This would be based on the dirtracer.config file parameter INSTANCENAME.

If the INSTANCENAME parameter was not set (would be in a dirtracer.config.last or dirtracer.config.x) then I would have to abort the run.
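
A rough sketch of that fallback logic is shown here.  This is not Dirtracer code: the function name resolve_slapd_pid is hypothetical, and the pid file name ("pid" under the instance's logs directory) is an assumption that should be checked against the actual instance layout.

```shell
#!/bin/sh
# Hypothetical sketch of the pid fallback; not part of Dirtracer.
# Assumes the pid file is <instance path>/logs/pid.
# Usage: resolve_slapd_pid CONFIGURED_PID INSTANCE_PATH
resolve_slapd_pid() {
    configured_pid=$1
    instance_path=$2
    # Keep the configured pid if that process is still running.
    if [ -n "$configured_pid" ] && ps -p "$configured_pid" >/dev/null 2>&1; then
        echo "$configured_pid"
        return 0
    fi
    # Without an instance there is nothing to look up, so abort.
    [ -n "$instance_path" ] || return 1
    pidfile="$instance_path/logs/pid"
    [ -r "$pidfile" ] || return 1
    current_pid=$(cat "$pidfile")
    # Only trust the pid file if it points at a live process.
    ps -p "$current_pid" >/dev/null 2>&1 || return 1
    echo "$current_pid"
}
```

A non-zero return corresponds to the "abort the run" case: either no instance was given or no live process could be found for it.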

While possible, it would have limitations (gotchas) that could prevent data capture.  If the directory server was hung/deadlocked and an old dirtracer.config file was used with BORKED="0" (Not Hung), then when launched, Dirtracer would attempt an ldapsearch and itself hang waiting on the search to return.

Note: because Dirtracer is a shell script and relies on ldapsearch as opposed to an internal search mechanism, it is limited to ldapsearch's capabilities and therefore does not (currently) have an option to time out a connection.  I have been thinking recently of using my timer function for this issue.
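
For readers wondering what a shell-level timeout could look like, here is a generic watchdog sketch.  It is not the timer function mentioned above (that code is not shown here); run_with_timeout is a hypothetical name, and the approach is simply to run the command in the background and kill it if it outlives the limit.

```shell
#!/bin/sh
# Hypothetical watchdog wrapper; not the author's timer function.
# Usage: run_with_timeout SECONDS command [args...]
run_with_timeout() {
    limit=$1; shift
    "$@" &
    cmd_pid=$!
    # Watchdog: kill the command if it is still running after $limit seconds.
    ( sleep "$limit"; kill "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
    watchdog_pid=$!
    wait "$cmd_pid"
    status=$?
    # The command finished (or was killed), so stop the watchdog.
    kill "$watchdog_pid" 2>/dev/null
    wait "$watchdog_pid" 2>/dev/null
    return "$status"
}

# Example: a command that finishes well inside its limit.
run_with_timeout 2 true && echo "completed in time"
```

A search against a hung server could then be wrapped as run_with_timeout 30 ldapsearch ..., with a return status above 128 indicating the search was killed rather than answered.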

This dynamic pid check would mostly be useful for a couple of scenarios I can think of.

1) Dirtracer runs launched from cron to gather historical/config info, without having to check/recheck the running pid against the config file setting (in the event the DS was restarted).

2) To set up Problem Type config files in advance without the current ns-slapd process's pid number embedded in the config file.  In this case the admin could just run ./dirtracer -f ./dirtracer.config.hang with very short notice and capture data quicker than having to edit the config file and change the pid parameter.

I will think this idea over some more but it has merit so it could make its way into the next release.

[LT]

Monday May 05, 2008

Presenting at the Directory Collaboration Meeting [follow up]

Quick note on today's Directory Collaboration meeting:

Thanks to all who attended.  I received some great questions and comments on Dirtracer...always good to have positive feedback on your product.

In tomorrow's blog I will discuss thoughts around one question asked in today's presentation, "Tracing ns-slapd process id's dynamically".

Thanks again all!

[LT]

Bork Bork Bork

The BORKED parameter.


I realize most of our international users of Dirtracer may not get the reference, but the BORKED parameter got its name in part from the US TV show called The Muppet Show, something I watched as a child.  It is a show made up entirely of puppets (Muppets) whose main character is called Kermit the Frog.

The BORKED parameter is a reference to the Swedish Chef's special Bork-speak...a parody of a wacky chef who speaks in unintelligible Swedish.  In certain tech circles "Borked" is synonymous with "Broken", and when developing this setting's purpose the name just stuck.

http://en.wikipedia.org/wiki/The_Muppet_Show
http://en.wikipedia.org/wiki/Bork_bork_bork

Think of Borked as "Hung" when it comes to its use with Dirtracer.  If a Directory Server process is thought to be hung, then set the BORKED parameter to 1.

    BORKED="1"

Note:  I plan to rename the parameter to HUNG (Config File only) in the next release.

What does BORKED="1" do?

Normally Dirtracer will run the following set of searches and/or modifies.  If BORKED is set to 1 (on), these searches etc. are skipped.  Setting BORKED to 1 helps make sure Dirtracer itself doesn't hang waiting on these ldapsearches to return.  If BORKED is not set to 1 when the Directory Server is suspected of being hung, Dirtracer will not complete its data gathering.

Searches:
    Backend Suffix names; naming contexts
    Backend Database names
    cn=monitor
    cache info searches
    nsds50ruv; replica ruv's
    cn=config info
    rootdse
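
The skip behaviour can be pictured with a small gating wrapper.  This is an illustrative sketch only; maybe_search and its status messages are hypothetical and do not reproduce Dirtracer's internals, and "true" stands in for a real ldapsearch command.

```shell
#!/bin/sh
# Hypothetical gating wrapper; names and messages are illustrative.
BORKED="1"    # 1 = server suspected hung, so skip anything that talks LDAP

# Usage: maybe_search LABEL command [args...]
maybe_search() {
    label=$1; shift
    if [ "$BORKED" = "1" ]; then
        # Never issue the search, so the script cannot block on a hung server.
        echo "$label [skipped]"
        return 0
    fi
    "$@" && echo "$label [success]"
}

# 'true' stands in for the real ldapsearch command line.
maybe_search "rootdse" true
```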

Modifies; only completed if Dirtracer is configured to do so.

    PTS_CONFIG_LOGGING can set the server logging level to the parameter value configured.  This sets the nsslapd-infolog-area or nsslapd-errorlog-level attribute.

    Examples:
        nsslapd-infolog-area: 4 sets Heavy Trace Logging
        nsslapd-infolog-area: 128 sets ACI debugging
        nsslapd-infolog-area: 8192  sets Replication Debug Logging

    See the following link for more info on Logging Levels
        http://docs.sun.com/source/816-6699-10/confattr.html#15873
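
For reference, a logging-level change of this kind is an ordinary LDAP modify against cn=config.  The helper below is a hypothetical sketch (not Dirtracer's code) that only prints the LDIF for such a change; the attribute name and value come from the examples above.

```shell
#!/bin/sh
# Hypothetical helper: emit the LDIF for a logging-level change on cn=config.
# Usage: make_loglevel_ldif ATTRIBUTE VALUE
make_loglevel_ldif() {
    attr=$1; value=$2
    printf 'dn: cn=config\nchangetype: modify\nreplace: %s\n%s: %s\n' \
        "$attr" "$attr" "$value"
}

# Replication debug logging, as in the example above.
make_loglevel_ldif nsslapd-infolog-area 8192
```

The output could then be piped to ldapmodify, for example with -h, -p, -D and -w options for the host, port, bind DN and password of the instance in question; those connection details are site-specific and are left out of the sketch.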

Dirtracer can also turn logging on or off (rarely used).

    TURN_LOGGING_ON="0"             # Turn On access/error logs
    TURN_LOGGING_OFF="0"            # Turn Off access/error logs
    AUDIT_LOGGING_ON="0"            # Turn On audit logs
    AUDIT_LOGGING_OFF="0"           # Turn Off audit logs

As mentioned, Dirtracer only completes the above ldapmodify operations if configured to do so.

[LT]

Thursday May 01, 2008

The DATABIN parameter

I recently had a question from a Front Line Engineer who did not understand how to select a proper location for the DATABIN parameter.

DATABIN="<DATA OUTPUT PATH>"    # Databin main path.
                                # Sub dirs will be created beneath this path.

The DATABIN is the path where you want Dirtracer to store the data it captures.  Special care should be taken when selecting the right path based on the projected size of the data you need to gather.

Sun GDD Directory Dirtracer Reference Guide: Page 17

Disk Usage

Disk space used is almost entirely dependent on the following.

1. How Dirtracer is configured; i.e. what it is asked to gather.
2. How many loops Dirtracer is configured to complete.
  •     cache, monitor searches
  •     netstats, iostats, pstacks, prstats
  •     transaction log ls -l captures.
3. How many access/error and audit logs are captured.  
  •     configured from the GATHER_N_XXXXX_LOGS="N" parameters.
4. How big each of those logs is.  (/var/adm/messages logs included)
5. Shared Memory (MMAP) files
  •     how big the ns-slapd process size is.
6. Cores
  •     how big the ns-slapd process size is.
7. Gcores
  •     how big the ns-slapd process size is.
8. If Dirtracer has REMOVE_TEMP_DATA=0.
  •     saves all temp files in addition to the final tar file.
9. If Dirtracer has SKIP_TAR_GZIP=1.
  •     Skips the final tar/gz, saving half the space it normally uses; i.e. files are otherwise duplicated as they are tarred and gzipped.
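
Given those factors, it can help to check the free space on a candidate DATABIN path before a run.  The following is a hedged sketch and not part of Dirtracer; free_kb and has_room are hypothetical names, and it assumes a df that supports the POSIX -P option.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of Dirtracer.
# Prints the kilobytes available on the filesystem holding PATH.
free_kb() {
    # -P forces the one-line-per-filesystem POSIX format; column 4 is "Available".
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeeds if PATH has at least NEEDED_KB kilobytes available.
has_room() {
    [ "$(free_kb "$1")" -ge "$2" ]
}

# Example: is there at least 1 MB free under /tmp?
has_room /tmp 1024 && echo "/tmp has room"
```

The required size is something only the admin can estimate, for example from the ns-slapd process size when cores or gcores are requested.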


The Engineer was also asked to set up two dirtracer.config files to trace two separate slapd instances at the same time, on the same system.  Would the DATABIN parameters need to be different? No.

Early on I saw a problem with Stracer (the old Dirtracer) when customers would use the same DATABIN to store data: the previously captured files would be overwritten, or you would have multiples of the same files.

I solved this issue by having Dirtracer create a unique time/date based directory under the defined DATABIN path.  Even if Dirtracer is run multiple times on the same system a new sub databin is created to segregate the data.

Example:

1) Set the DATABIN as follows.

DATABIN="/var/tmp/data"

2) Run Dirtracer 3 times and observe the directories created in /var/tmp/data/

root[/var/tmp/data]#ls -l
total 12
drwxr-xr-x  11 root     other       1536 Apr 21 15:38 042108-01
drwxr-xr-x  11 root     other       1536 Apr 21 15:42 042108-02
drwxr-xr-x  11 root     other       1536 Apr 21 15:43 042108-03

root[/var/tmp/data]#find . -name "dirtracer-*gz" -print
./042108-01/dirtracer-834d2699-kaneda-080421-153743.tar.gz
./042108-02/dirtracer-834d2699-kaneda-080421-154144.tar.gz
./042108-03/dirtracer-834d2699-kaneda-080421-154335.tar.gz

You can clearly see how the data is separated and should not collide.
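
One way to get this behaviour in a shell script is sketched below.  It is not Dirtracer's actual implementation; new_sub_databin is a hypothetical name, and the MMDDYY-NN naming is inferred from the directory listing above.

```shell
#!/bin/sh
# Hypothetical sketch of date-stamped sub databin creation; not Dirtracer's code.
# Creates DATABIN/MMDDYY-NN using the first unused two-digit sequence number.
new_sub_databin() {
    databin=$1
    mkdir -p "$databin" || return 1
    stamp=$(date '+%m%d%y')
    n=1
    while [ "$n" -le 99 ]; do
        sub=$(printf '%s/%s-%02d' "$databin" "$stamp" "$n")
        # mkdir fails if the directory already exists, so every run gets its own.
        if mkdir "$sub" 2>/dev/null; then
            echo "$sub"
            return 0
        fi
        n=$((n + 1))
    done
    return 1
}
```

Calling new_sub_databin /var/tmp/data three times on the same day would yield three separate directories ending in -01, -02 and -03.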

[LT]

Configurator, the dirtracer.config.template and their uses.

I was recently asked what the differences are between the dirtracer.config.template and the Configurator script and how they are used.

The previous version of my script, Stracer, used both a config file and a full range of command line switches.  The command line switches confused many, and the config file was not well documented at the time.  As a result we had many runs configured to capture the wrong type of data for the problem type.

Shortly after, I decided to create the "Configurator" and released it with Stracer 1.9.3.  Configurator took the Problem Type encountered by the Customer and translated it into a working dirtracer.config file.  Originally Configurator contained 7 problem type options.  With Configurator 6.0.6 I have added Option 8 for a Configuration Only Capture.

--------------------------------------------------------------------------------
Sun Microsystems Configurator 6.0.6                                       
--------------------------------------------------------------------------------
Please choose the type of problem you are experiencing

Process Hung                            [1]
High CPU                                [2]
Replication                             [3]
Crashing                                [4]
Memory Leak                             [5]
Server Down                             [*]     DISABLED - (SLAPDPID is set)
Basic Capture                           [7]
Config Only Capture                     [8]
--------------------------------------------





NOTE: Now that the Document for Dirtracer has progressed to this point I may have to add a full section for Configurator, even though it's interactive and self-explanatory.

Configurator takes you through the following sections to create a dirtracer.config file:

1) Case Number (if available)
2) Slapd Instance selection.
3) Directory Manager Password entry
4) Data Storage location.  This is the location of the DATABIN parameter where all captured data will be stored.
5) Skip Tar Gzip question
6) Problem Type selection.
    a) Process Hung. Hang detection, Gcore selection
    b) High CPU. CPU % threshold level, Gcore selection
    c) Replication.  Sets replication debug logging (8192)
    d) Crashing.
    e) Memory Leak.
    f) Server Down. DS version [5x|6x], Instance path entry.
    g) Basic Capture
    h) Config Only Capture
7) DS Log capture selection; access, error and audit logs.
8) Dirtracer Runtime selection.
9) Pmonitor (pms.sh) Runtime selection.
10) Configuration Summary
11) Data Capture Size guesstimation.
12) Config file (dirtracer.config) creation.

The Configurator is a good way for those new to Dirtracer to quickly set up a dirtracer.config file for an event.

So what is the difference between the Configurator and the dirtracer.config.template?  Well, Configurator asks questions to set up a ready-to-use dirtracer.config.  The dirtracer.config.template is just that...a template.  The dirtracer.config.template does contain all the parameters that would be set when creating a new dirtracer.config using the Configurator.  However, the dirtracer.config.template has to be edited before it can be used with Dirtracer, and it does not have presets for Problem Types.

Without the following parameters properly set/changed, Dirtracer will exit and alert the admin the file needs to be changed.  Likewise the template contains some default settings.

SLAPDPID="<SLAPD PID>"          # Slapd pid number
MGRPW="<PASSWORD>"              # Mgr password
DATABIN="<DATA OUTPUT PATH>"    # Databin main path.
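
A check along these lines could be sketched as follows.  This is illustrative only; Dirtracer's own validation may differ, and check_required_params is a hypothetical name.  It treats any value still wrapped in <...> as an unedited template placeholder.

```shell
#!/bin/sh
# Hypothetical validation sketch; Dirtracer's own check may differ.
# Usage: check_required_params CONFIG_FILE
check_required_params() {
    config=$1
    status=0
    for name in SLAPDPID MGRPW DATABIN; do
        if grep -q "^$name=\"<.*>\"" "$config"; then
            # The value is still the <...> placeholder from the template.
            echo "$name still has its template placeholder; edit $config" >&2
            status=1
        elif ! grep -q "^$name=" "$config"; then
            echo "$name is missing from $config" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running check_required_params ./dirtracer.config before a capture would return non-zero until all three parameters have been filled in.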

The template can be copied, renamed and edited to contain different parameter settings for the same problem types as seen above.  The dirtracer.config.template is completely self documented so administrators can quickly look at a parameter and select its use (or not).

Hope this was useful.

[LT]

Presenting at the Directory Collaboration Meeting

Good news!

I will be one of the speakers in this month's Directory Collaboration Meeting this coming Monday, May 05, at 12pm ET.  Partners, if you normally attend this, it will be an opportunity to ask me questions on Dirtracer usage and the future of my product.

I will be giving a mini preso on Dirtracer for those who are unfamiliar with it (is this even possible? :)), in addition to covering the new features you can take advantage of in the new version 6.0.6.

Directory Collaboration:
"Regularly scheduled monthly Directory Services collaborative meetings providing information in field experiences, deployment and configuration strategies, knowledge sharing, and questions and answers."

Hope to see you there!

[LT]

Wednesday Apr 30, 2008

Dirtracer 6.0.6 Unleashed!

The latest version of Dirtracer is now available for Customer and Partner download on the BigAdmin System Administration Portal.

The major changes included in 6.0.6 are as follows:

1) Added SKIP_TAR_GZIP Code.  Skips all tar and gzip functions.
 - *   Preparing files - pstack            [skipped - skip tar gzip enabled]

2) Added Carpet Version Type Check. Gets a better DS Type from 6.x
 - * DS Version                            [6.1 - 64 bit - zip install]

 - 6658311: Dirtracer (GDD): 6.0.5 can report the wrong install type - zipinstall vs pkginstall (jes).

3) Reset the ldapsearch path ..uses locateLdapTools function
 - 5.1
 * Ldap Tools Path                       [/data/sunone/ds51sp4/shared/bin]

 - 5.2
 * Ldap Tools Path                       [/opt/ds52p6/shared/bin]

 - 6.x
 * Ldap Tools Path                       [/opt/dsee62/dsrk6/bin]


4) Added a secondary check for backends. Highlights if DS6 & No Backends exist.

 * Backends Found                        [none configured]

5) ps -ae changed to include ps -aef
 * ps -aef                             [success]

6) Add a checkPatch for Sol 8 on 108995-08
 - 108995-08
 - SunOS 5.8: /usr/lib/libproc.so.1 patch
 - http://sunsolve.sun.com/search/document.do?assetkey=1-21-108995-08-1

 - Sol 5.9
 - http://sunsolve.sun.com/search/document.do?assetkey=1-21-117125-02-1
 - http://sunsolve.sun.com/search/document.do?assetkey=1-21-117125-03-1

7) Added CONFIG_ONLY runtime option.  See Ref. Guide for Config Only Capture.

8) Added an ls -laR of the slapd instance
 * Gathering needed customer defined configuration
 *   ls -laR of slapd Instance           [success]

Stay tuned for explanations of the new features and how they can benefit you.

[LT]

Friday Apr 25, 2008

The Dirtracer Blog is here!

Welcome to the new Dirtracer Blog!


I am Lee Trujillo, an Engineer supporting the Sun Java Directory Server since December 2003.  I am also the creator of Dirtracer, Sun's number one tool for tracing issues for our Sun Directory Server.

Early in 2004, I saw that the support organization had no formalized standard for asking for and obtaining data related to issues surrounding the Directory Server, so I created Stracer (Stack Tracer), the predecessor to Dirtracer (Directory Tracer).  Stracer 1.0 consisted of 169 lines of shell script that basically gathered pstacks, prstats and top output from a running ns-slapd process.  By contrast, Dirtracer 6x (2008) is a complex 2,590-line script of functions that can be combined in many ways to gather Directory data based on problem type.

Dirtracer is a troubleshooting tool designed to help reduce resolution time on complex Directory Server problems and to ease the data-gathering process for Sun's customers.

Dirtracer is part of the GDD (Gathering Debug Data) suite of tools and has already been used for years to tackle some of the most persistent, difficult Sun Java Systems Directory Server problems faced in the field.  For problems such as server hangs, crashes, and high cpu utilization, Dirtracer simplifies the sampling of system resources and crash data in order to help identify trends.

Save you and your customers time and aggravation -- discover the power of Dirtracer.

I expect Dirtracer version 6.0.6 to be available later today on the external Big Admin Administration Portal site for Customer and partner download.

[LT]
About

A Tech Blog about the Sun Java Systems Dirtracer Toolkit. Dirtracer and this blog are written and maintained by Lee Trujillo, an Oracle Senior Principal Support Engineer.
