Tuesday Jul 07, 2009

DirTracer Overview on SLX (Sun Learning Exchange)

Great news all,

I have released my first vidcast on Sun's Learning Exchange (SLX) site!

SLX 01 - DirTracer Overview (runtime 7min 32sec)

This is the first in what I hope to be a long series discussing DirTracer, Configurator and Pkgapp.  If you are unfamiliar with DirTracer, this Overview is an easy and fast way to get acquainted with Sun's #1 Directory support tool.

On July 24th I will be releasing the second vidcast, titled "DirTracer Quickstart".  This will walk you through getting DirTracer up and running in the shortest amount of time using Configurator.

I am currently in production on the third vidcast titled "Server Impact".  This is a hot topic when it comes to running DirTracer on production systems.

That's all for now folks,

Enjoy

Lee

Friday Jun 26, 2009

Running DirTracer in the Background

Hello all,

I was approached today by a Partner about running DirTracer in the background.  I believe I have been asked about this once before.

DirTracer was not designed to run in the background, and until today I had never tested it that way.  In fact I just tried it and saw nothing in the databin path, nor in the /tmp area for the runtime log.  I have found a workaround, however.

If I background it or run it under nohup directly, I get no output -> (nohup ./dirtracer -f ./dirtracer.config &).

But if I create a stub script that launches DirTracer with nohup and &, then launch the stub the same way...it works.

myzone root[/opt/dirtracer]#cat backgrounder
#!/bin/sh

nohup ./dirtracer -f ./dirtracer.config &
myzone root[/opt/dirtracer]#nohup ./backgrounder &
[1] 9159
myzone root[/opt/dirtracer]#Sending output to nohup.out

[1]    Done                          ./backgrounder

In most circumstances (99% of the time) you should run DirTracer in the foreground because the runtime is generally short.  I can only see backgrounding the process when you use the MEM_LEAK_TRACKING option, or when a DirTracer run is so long that you will have to close down your terminals and leave.
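For the curious, the stub technique boils down to one reusable pattern: detach the process from the terminal with nohup and send its output somewhere explicit.  Here is a minimal sketch; the function name and log path are my own illustration, not part of DirTracer:

```shell
#!/bin/sh
# run_detached LOG CMD...: start CMD immune to terminal hangups,
# send its stdout/stderr to LOG, and print the background pid.
# The name and log location are illustrative, not a DirTracer convention.
run_detached() {
    log=$1; shift
    nohup "$@" > "$log" 2>&1 &
    echo $!
}

# DirTracer itself could then be launched as:
#   cd /opt/dirtracer
#   run_detached /var/tmp/dirtracer-bg.log ./dirtracer -f ./dirtracer.config
```

Redirecting explicitly also saves you from hunting for the nohup.out file the plain stub produces.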

Thanks to Brian for this great question.


Lee

Tuesday Jun 02, 2009

Pkgapp 3.2 for Solaris available on BigAdmin

 

Hey all,

Quick update to say Pkgapp 3.2 is now available externally on the Big Admin scripts site.  You can download it here.

Currently I am working on Pkgapp 3.2 ports to Linux.

Big Admin Scripts: Sun Java System Category

Big Admin MOTD

Cheers!

Lee

Tuesday May 26, 2009

DirTracer now DS7 compatible!

 Yes, it's true!  DirTracer now works with DS7!

I was fortunate to get access to DS7 during the Directory Masters event in New Jersey.  While there I was able to quickly install DS7 and migrate an old, dead DS6.2 instance.  I ran DT 6.0.6 against this new instance and was ecstatic to see it run with 0 errors.  In reality, DS Development had moved two commands, but DT's defaults had correctly guessed things like the install type (zip/native) and where the dsrk's ldapsearch and ldapmodify were located.

Today I decided to revisit these two items and found they were easy to change; it only took a few minutes.  I had to copy/paste and change approximately seven lines of code (mostly path-based) and that was it.
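The default-guessing described above, where DT probes for the dsrk's ldapsearch and ldapmodify, amounts to a first-match search over candidate directories.  A rough sketch of the idea; the probe order shown is hypothetical, not DT's actual list:

```shell
#!/bin/sh
# find_tool NAME DIR...: print the first directory that contains
# an executable NAME; fail if none do. The candidate directories
# DirTracer really tries are not shown here.
find_tool() {
    name=$1; shift
    for d in "$@"; do
        if [ -x "$d/$name" ]; then
            echo "$d"
            return 0
        fi
    done
    return 1
}

# Hypothetical probe order:
#   find_tool ldapsearch /opt/dsee7/dsrk/bin /opt/dsee62/dsrk6/bin
```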

To mirror the DT version numbering I used when DS6 was released, I will now re-version this as DirTracer 7.0.0, or DT7.  While DT7 works perfectly with DS7, I'm not saying DT7 won't change.  In fact, I'm planning many new features for DT7's first release.

--------------------------------------------------------------------------------
Sun Microsystems dirtracer 7.0.0 Solaris Sparc                        05/26/2009
--------------------------------------------------------------------------------
\* Checking Config File                  [success 1211182548 != 3006111464]
\* Using config file                     [./dirtracer.config]
\* OS Type and Version                   [SunOS 10 sparc]
\* Dirtracer mainloop will run for       [1 sec.]
\* Databin parameter check 1             [success]
\* Databin parameter check 2             [success]
\* Databin Found                         [/var/tmp/dtcapture]
\* Databin used is                       [/var/tmp/dtcapture/052609-02]
\* Relocating runtime log                [success]
\* Case Number                           [NOT AVAILABLE]
\* User is root. Continuing...          
\* PID Check     (18925)                 [success]
\* Process State                         [not hung]
\* High CPU tracking                     [off]
\* Mem Leak tracking                     [off]
\* Config Only Capture                   [on]
\* Pstack Force (-F)                     [off]
\* Instance Name is                      [/opt/dsee62/ldapinst3]
\* Using host name                       [s-6000a-t6300i-zone8-sca11]
\* Using host id                         [84aaaac8]
\* Ldap Tools Path                       [/opt/dsee7/dsrk/bin]
\* DS Version                            [7.0 - 64 bit - zip install]
\* Backends Found                        [sun]
\* Transaction log location              [/opt/dsee62/ldapinst3/db]
\*   Solaris tar found                   [/usr/sbin/tar]
\*   gzip found                          [/bin/gzip]
\*   elfdump found                       [/usr/ccs/bin/elfdump]
\*   pstack found                        [/bin/pstack]
\*   prstat found                        [/bin/prstat]
\*   netstat found                       [/bin/netstat]
\*   iostat found                        [/bin/iostat]
\*   vmstat found                        [/bin/vmstat]
\*   mpstat found                        [/bin/mpstat]
\*   ldapmodify found                    [/opt/dsee7/dsrk/bin/ldapmodify]
\*   ldapsearch found                    [/opt/dsee7/dsrk/bin/ldapsearch]
\*   pms.sh found                        [/opt/dirtracer/dirtracertools/pms.sh]
\* Access/Error Logging                  [left as is]
\* Audit Logging                         [left as is]
\* Iostat available.     Executing...    [success]
\* Vmstat available.     Executing...    [success]
\* Mpstat available.     Executing...    [success]
\* pms.sh (pmonitor) executed            [success]
\*   pms.sh interval(1) x checks(1)      [pms.sh run time (1 sec.)]
\*                                      
\* Entering Main Performance Gathering Loop
\*                                      
\* Loop 0 - 090526-132724                [pstack/prstat override on]
  togo[1sec]-timer[1]
\*
\*                                      
\* Exiting Main Performance Gathering Loop
\*                                      
\* Done gathering stats                  [pms.sh iostats vmstats etc]
\* Access/Error Logging                  [left as is]
\* Audit Logging                         [left as is]
\* Logging level                         [left as is]
\*                                      
\* Gathering needed customer defined configuration
\*   ls -laR of slapd Instance           [success]
\*   nsslapd-db-home-directory           [success]
\*   nsslapd-db-logdirectory             [success]
\*   nsslapd-directory db                [success]
\*   nsslapd-directory sun               [success]
\*                                      
\* Gathering Other Environment Information
\*   ns-slapd version                    [success]
\*   nsds50ruv dc=sun,dc=com             [success]
\*   cn=config search                    [success]
\*   rootdse search                      [success]
\*   idsktune                            [not found]
\*   dirtracer7.0.0                      [success]
\*   /etc/release                        [success]
\*   /etc/system                         cp: cannot access /etc/system
[failed]
\*   /var/adm/messages                   [success]
\*   uname -a                            [success]
\*   uptime -a                           [success]
\*   date/time                           [success]
\*   ifconfig -a                         [success]
\*   mount -p                            [success]
\*   ps -aef                             [success]
\*   pflags                              [failed]
\*   pldd                                [failed]
\*   pmap                                [failed]
\*   pfiles                              [failed]
\*   prtconf -v                          prtconf: devinfo facility not available
[success]
\*   showrev -p                          [success]
\*   swap -s                             [success]
\*   pkginfo -l                          [success]
\*   df -k                               [success]
\*   logs - access: gathering 1 files   
\*       access                          [success]
\*       access rotation                 [success]
\*   logs - error: gathering 1 files    
\*       errors                          [success]
\*       error rotation                  [success]
\*   logs - audit: gathering 1 files    
\*       audit                           [success]
\*       audit rotation                  [success]
\*   config dse.ldif's                   [success]
\*   schema                              [success]
\*   shared mem files                    [skipped]
\*                                      
\* [ALERT 1] Shared Mem files were not included in the dirtracer tar file and were NOT copied/saved to the databin.
\*                                      
\*   ds lib files cksum                  [skipped]
\*   ds lib files elfdump                [skipped]
\* Writing used dirtracer.config.last    [success]
\* Copying dirtracer.config.last         [success]
\* Writing manifest-090526-132724.log    [success]
\*                                      
\* Packaging files                      
\*   Preparing files - pstack            [0 files]
\*   Preparing files - prstat            [0 files]
\*   Preparing files - netstat           [success]
\*   Preparing files - ds-config-data    [success]
\*   Preparing files - ds-logs           [success]
\*   Preparing files - ds-mmap-files     [success]
\*   Preparing files - ds-srch-monitor   [success]
\*   Preparing files - ds-srch-cache     [success]
\*   Preparing files - ds-txnlogs-data   [success]
\*   Preparing files - os-info           [success]
\*   Preparing files - gcore             [0 files]
\*   Preparing files - crash             [0 files]
\*   Preparing Main file                 [success]
\* Leaving my temp files intact         

Operations Complete
--------------------------------------------------------------------------------
1) Dirtracer capture data located in directory [ /var/tmp/dtcapture/052609-02 ]

Upload "only" this file to your supportfiles.sun.com cores directory at Sun

        [ dirtracer-84aaaac8-s-6000a-t6300i-zone8-sca11-090526-132724.tar.gz ]


                                Thank you.
                                Sun Software Technology Service Center (STSC)


NOTES:
1) You can check for updates to this script here:
        BigAdmin - http://www.sun.com/bigadmin/scripts/indexSjs.html

2) Release Notes and Guides located here:
        Docs - http://docs.sun.com/app/docs/doc/820-0437

3) GDD information located here:
        Docs - http://www.sun.com/service/gdd/index.xml

4) Please send all Bugs and RFE's to the following address:
        Subject "dirtracer bug/rfe" - gdd-issue-tracker@sun.com

5) Please send all other questions etc to:
        Subject "dirtracer feedback" - gdd-feedback@sun.com

--------------------------------------------------------------------------------

ALERTS ENCOUNTERED: See /var/tmp/dtcapture/052609-02/ALERTS-090526-132718.log


1) Shared Mem files were not included in the dirtracer tar file and were NOT copied/saved to the databin.

As I mentioned two blogs ago, I was preparing to start releasing DT videos (podcasts) on the Sun Learning Exchange (SLX).  The first draft is complete for my first video, titled "DirTracer Overview".  I will start production of the video later today or tomorrow and hope to release it to SLX quite soon.

Cheers!

Lee

Thursday May 21, 2009

Pkgapp 3.2 for Solaris released!

Hi all!

Pkgapp 3.2 is available internally...yeah!

I finally had time to update the Pkgapp documentation and retest it on various application cores/gcores, and it is now ready for prime time.  I have also submitted it to BigAdmin, where it should be available in the next few days.  If you need it now, please contact your favorite Sun Support representative and request that they send it from the internal download site.  I will also update this post once BigAdmin has uploaded the new version.

Pkgapp 3.2 has 2 big new features...

  1. It includes amd64 libraries that were previously missing, making it easier to debug Java issues on that platform.
  2. It includes a new core truncation feature developed by a member of the Directory Server Sustaining team.  The great news is that the feature works not only on Directory Server cores but on any other core/gcore you may be dealing with!

If the core/gcore is good, the user is given the following message:

\* Checking if corefile is truncated     [CORE gcore.cshttpd.28734 GOOD Core contains 293 segments ]

If the core is truncated, the user is alerted and required to answer yes/no as to whether they understand the data may be totally unusable.

\* Checking if corefile is truncated     [WARNING! core gcore.cshttpd.28734.truncated is truncated (40396864d bytes instead of 40397397d bytes)]
\*                                     
\* ALERT! The core was found to be truncated, it therefore may not be usable
\* pkgapp will continue to gather data and you can still upload the result to Sun
\*                                     
\* Do you understand the core may not be usable? [yes|no] yes
                                        [Answer was -> yes]
\* Continuing...  
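Conceptually, the truncation test compares the core's on-disk size with the size its headers imply, which is how a shortfall like "40396864 bytes instead of 40397397 bytes" is detected.  A bare-bones sketch of that comparison; the real pkgapp derives the expected size from the core itself, which I omit here and pass in as an assumed parameter instead:

```shell
#!/bin/sh
# check_core CORE EXPECTED_BYTES: print GOOD when the file on disk
# is at least EXPECTED_BYTES long, otherwise flag it as truncated.
# EXPECTED_BYTES is assumed to come from the core's ELF headers.
check_core() {
    core=$1 expected=$2
    actual=$(wc -c < "$core" | tr -d ' ')
    if [ "$actual" -lt "$expected" ]; then
        echo "WARNING! core $core is truncated ($actual bytes instead of $expected bytes)"
        return 1
    fi
    echo "CORE $core GOOD"
}
```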

If you have an RSS reader such as NetNewsWire or Google Reader, you can see the latest updates to Big Admin using the following links.

Big Admin Scripts: Sun Java System Category

Big Admin MOTD

Next up...I plan to finalize and release DTR and begin production of my first SLX video for Dirtracer users!

Take care,

Lee

Thursday May 07, 2009

DTR (Dirtracer Reporter) is coming and other news!

Hi all,

Thought I would drop in a quick blog and let you know what I've been up to.

Directory Masters Event in Somerset, New Jersey

As mentioned in my last blog, I was asked to present Dirtracer at the DM event.  The event, hosted by Michael Melore, Wajih Ahmed and Ludovic Poitou, was great!  I met people I've only talked to on the phone or emailed over the past 5 years, along with Sun Partners and Sun Development and Support Engineers...

I presented an overview of Dirtracer and its components as well as my latest project, DTR (Dirtracer Reporter); more on this below.  I have to admit I was very nervous.  In past lives I was a pre-sales engineer and presentations like this were easy, but I haven't presented to a large group since joining Sun; it's just not something I do in my Support role.

In the end I forgot about being nervous and soon was on a roll.

It was great meeting you all; I had a blast.  See Ludo's Sketches blog for more info on this event.

Dirtracer Reporter (DTR)

Over the past couple of months I have been consolidating a few data-reporting tools that I and others have created into one nice package called DTR.  Dirtracer Reporter is a bundle of existing Directory Server data-reporting scripts rolled into a single usable interface.  DTR shines when run against Dirtracer-captured data; however, it can also run against basic Directory Server data such as dse.ldif and log files.

I am in the final test phase, fixing the occasional glitch here and there, and hope to release DTR to BigAdmin in the coming weeks.  More to come on DTR soon.

Sun Learning Exchange (SLX)

One of my other sub-projects, which will kick off soon, is the creation of SLX videos for Dirtracer.  These will be video podcasts showing how Dirtracer is used for various problem types.

 Here is a list of topics I hope to highlight.

  • Dirtracer overview
  • Process Hangs
  • High CPU
  • Replication
  • Crashes
  • Memory Leaks
  • Server Down issues
  • Basic Captures
  • Configuration only captures
  • Configurator for Dirtracer usage

These videos will supplement my blog content as well as give you audio and visual cues as to how Dirtracer looks when running against a specific problem type.

Until my next blog...ciao!

Lee

Monday Apr 27, 2009

Presenting at the Directory Masters Event!

I've been so busy with new tools and support calls I almost forgot to mention...I'm presenting Dirtracer and my new DTR at the Directory Masters Event in New Jersey this coming Wednesday April 29th!

I have been wildly busy the past couple of months getting a new version of Pkgapp and my new DTR (Dirtracer Reporter) ready for prime time...

If you happen to make it to the event this week, stop by and say hi!

Hope to see you there,

Lee

Monday Nov 24, 2008

Pkgapp 3.0 for Solaris Available!

Hello all,

I'm pleased to announce I have released Pkgapp 3.0 for Solaris!

So far Pkgapp 3.0 is only available for Solaris, but I will begin porting the code to Linux and HP-UX.  I wanted to get the Solaris version out first instead of waiting for all the ports to be done...this way at least one Unix flavor is available for use.

You can obtain Pkgapp 3 from Big Admin: Pkgapp 3.0 Solaris

Sun Employees can also find information on Sunspace.

Included files:

  • Pkgapp version 3 Script
  • Pkgapp Reference Guide: SUN-GDD_Pkgapp_Reference_Guide3.0.0.pdf
  • Pkgapp Readme
  • GDD Readme
  • GDD License

For information on what changed, what the usage looks like and a runtime example (using mysql) see my last post - Pkgapp 3.0 code has been finalized!

Enjoy all!

Lee

Wednesday Nov 19, 2008

Pkgapp 3.0 code has been finalized!

Pkgapp3 is the latest update to the Sun GDD tool set. These tools are designed to help customers gather the data needed for the best possible support on their issues.

Pkgapp is a shell tool built by Sun Support Engineering to help gather all OS and application libraries used by Sun Java System (and other) servers at runtime.  Pkgapp can be used to gather these libs from cores produced by crashes, or from gcores captured manually by administrators.

Pkgapp is part of the GDD (Gathering Debug Data) suite of tools and has been used for years to help Sun Support debug tricky crashes, as well as performance and memory leak problems where cores or gcores are available.

New major features in Pkgapp 3.0:

1) Now handles Java Cores and grabs jstack and jinfo.
 - See the -j and -J options

2) Includes all functionality that was available in both pkg_app 2.7 and pkgcore.

3) Can retain or remove the temporary files in /tmp
 - See the -r option

4) Allows pkgapp3 to be placed in a $PATH location; can now be run from any location.

5) Can use relative paths for the following options:
 -c <core file OR pid of a running process>
 -p <full path to, but not including the process binary> (ns-slapd, imapd, httpd etc.)
 -s <Storage; path to store the final tar file in>

6) Retains all lib/binary paths (pkgcore) as well as the pkg_app app/ location for older pkg_app aware applications.

7) Names the final storage path as pkgapp-<date>-<index> to help administrators locate the final tar file faster.
 - Ex: pkgapp3-111708-01/

8) Gathers the following Solaris only proctool commands from the core.
 - pldd
 - pstack
 - pmap
 - pcred
 - pflags
 - pargs

9) Checks whether the core file is truncated and verifies the core file's cksum to ensure the file sizes match.

10) Gathers the following core and pkgapp3 specific data
 - core file cksum
 - core file truncation (good/bad)
 - core file size (ls -l)
 - library list (from the pldd)
 - manifest log (all files included in the final tar)
 - pkgapp3 arguments used.

11) Gathers the following OS specific data
 - coreadm info
 - date/time
 - messages (from /var/adm)
 - pkginfo
 - showrev -p
 - /etc/release (Solaris)
 - ulimit
 - uname -a

12) Creates a coreinfo.sh script to allow Sun Engineers to quickly run "coreinfo" for ns-slapd processes (only).

13) Alerts administrators they must upload 2 files (pkgapp tar & core) separately when they do not use the -i (include core) switch.

14) Will exit with a fatal error if a pldd cannot be properly retrieved from the core and alerts the administrator to run pkgapp3 using a pid as the -c option.

15) Saves the pkgapp3 arguments and runtime log into /var/tmp/pkgapp-history/.  Helps administrators and pkgapp see previous runs.

16) Updated usage and alert messages.
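The pkgapp-<date>-<index> naming from feature 7 can be sketched as a loop that takes the first unused index for today's date.  This is my own illustration of the pattern (two-digit index, mmddyy stamp to match output like pkgapp-112408-05), not pkgapp's actual code:

```shell
#!/bin/sh
# next_databin BASEDIR: create and print the first unused
# BASEDIR/pkgapp-<mmddyy>-<NN> directory. The naming mirrors the
# example output; pkgapp's real logic may differ.
next_databin() {
    base=$1
    stamp=$(date +%m%d%y)
    i=1
    while :; do
        dir=$(printf '%s/pkgapp-%s-%02d' "$base" "$stamp" "$i")
        [ -d "$dir" ] || break
        i=$((i + 1))
    done
    mkdir -p "$dir" && echo "$dir"
}
```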

Here are a couple of previews of how the new code works.

 Example: Usage output

\* ----------------------------------------------------------------------------------
\* Sun Microsystems RSD pkgapp 3.0 Solaris                               [11/24/2008]
\* ----------------------------------------------------------------------------------
pkgapp 3.0, a Sun Microsystems data gathering utility.
Usage:
  pkgapp [options] -c <core file | pid> -p <full path (path only) to process binary> -s [path to write tar file]

Required parameters:
 -c <core file OR pid of a running process>
 -p <full path to, but not including the process binary> (ns-slapd, imapd, httpd etc.)

Optional parameters:
 -i (Include a previously generated core file with the final tar.gz)
 -j (Javacore; process a java core)
 -r (Remove all temp files)
 -q (Quiet)
 -d (Debug)
 -J <JstackPath; path to the jstack (jdk) commands>
        defaults to /usr/jdk/instances/...
 -s <Storage; path to store the final tar file in>

usage:  pkgapp -c <name of the core file> -p <path to process binary>
usage:  pkgapp -c <pid of the running app> -p <path to process binary>

Examples: these are examples mixing various parameters

Directory Server
pkgapp -i -r -c ./core.14740 -p /var/mps/ds52p4/bin/slapd/server/64

Messaging Server
pkgapp -c ./core.3496 -p /opt/SUNWmsgsr/lib

Web Server
pkgapp -c ./core.1092 -p /space/iws70/lib -s /var/crash

Calendar Server
pkgapp -r -c ./core -p /opt/SUNWics5/cal/lib

Sendmail
pkgapp -i -c 512 -p /usr/lib

Mysqld
pkgapp -i -r -c ./core -p /support/mysql-5.0.41/bin

Example: Runtime output

\* ----------------------------------------------------------------------------------
\* Sun Microsystems RSD pkgapp 3.0 Solaris                               [11/24/2008]
\* ----------------------------------------------------------------------------------
\* OS release                            [5.10]
\* Platform                              [SUNW,Sun-Blade-2500]
\* Checking [-c] is a core or pid        [using pid 1409]
\* Process Root                          [/support/mysql-5.0.41/data]
\* Databin parameter [-s] checks         [reset to /var/tmp/dev]
\* Databin found                         [/var/tmp/dev/test/cores/storage]
\* Databin writable check                [success]
\* Databin used/created is               [/var/tmp/dev/test/cores/storage/pkgapp-112408-05]
\* Creating temp area                    [/tmp/pkgapp.2928/]
\* Pid used, no corefile to check       
\* Process binary                        [mysqld]
\* Checking usage history                [not recently run]
\* mysqld binary bit version             [32]
\* Checking path [-p] to binary name     [success, path != binary name]
\* Checking path [-p] is a directory     [success]
\* Locating mysqld                       [success]
\* Checking located mysqld is 32 bit     [success]
\* Binary located                        [/support/mysql-5.0.41/bin/mysqld]
\* Adding binary to pkgapp.pldd          [success]
\* Grabbing pldd                         [success]
\* Grabbing pstack                       [success]
\* Grabbing pmap                         [success]
\* Grabbing pcred                        [success]
\* Grabbing pflags                       [success]
\* Grabbing pargs                        [success]
\* Provide the full path and name to the core file
\* If you do not have a core, enter "none"
\* Example - /data/cores/core.1445:      /var/tmp/dev/cores/core.1409
                                        [Answer was -> /var/tmp/dev/cores/core.1409]
\* Grabbing [-i] core/gcore              [success]
\* Javatools [-j] not set                [skipped]
\* Grabbing /var/adm/messages            [success]
\* Grabbing uname -a                     [success]
\* Grabbing date/time                    [success]
\* Grabbing showrev -p                   [success]
\* Grabbing pkginfo -l                   [success]
\* Grabbing /etc/release                 [success]
\* Grabbing coreadm                      [success]
\* Grabbing ulimit                       [success]
\* Grabbing libs                         [success]
\* Making lib paths app/                 [success]
\* Making lib paths libs/                [success]
\* Linking libraries                     [success]
\* Libraries linked                      [62 ttl]
\*                                      
\* Using hostid for naming .tar.gz       [837872d0]
\* Writing file                          [pkgapp-837872d0-s4u-2500a-brm04-081124-095809.tar.gz]
\*                                      
\* Done gathering files                 
\* Writing dbxrc & opencore.sh files     [success]
\* Writing manifest-081124-095809.log    [success]
\* Writing pkgapp-args-081124-095809     [success]
\* Creating final tarfile                [success]
\* Compressing tarfile                   [success]
\* End of runtime logging               
\* Saving history info                   [/var/tmp/pkgapp-history/history-081124-095809.log]
\* Saving runtime log                    [/var/tmp/pkgapp-history/runtime-081124-095809.log]
\* Removing [-r] temp area/files         [removed]
\*                                      
\* Operations Complete                  
----------------------------------------------------------------------------------
Upload the following file(s) to your supportfiles.sun.com Cores Directory at Sun

1) File(s) located in directory /var/tmp/dev/test/cores/storage/pkgapp-112408-05

                [ pkgapp-837872d0-s4u-2500a-brm04-081124-095809.tar.gz ]



                                Thank you.
                                Sun Software Technology Service Center (STSC)


NOTES:
1) You can check for updates to this script here:
        BigAdmin - http://www.sun.com/bigadmin/scripts/indexSjs.html
2) Release Notes and Guides located here:
        Docs - http://docs.sun.com/app/docs/doc/820-0437
3) GDD information located here:
        Docs - http://www.sun.com/service/gdd/index.xml

4) Please send all Bugs and RFE's to the following address:
        Subject "pkgapp bug/rfe" - gdd-issue-tracker@sun.com

5) Please send all other questions etc to:
        Subject "pkgapp feedback" - gdd-feedback@sun.com


I will announce Pkgapp 3.0 for Solaris's availability in the next few days...stay tuned!

Lee

Monday Aug 18, 2008

Pkg_app 3.0 coming soon.

I have finally had time to revamp the now aging pkg_app 2.7.

I have made Pkg_app 3.0 bigger, better, and faster, and it will now include all the libraries gathered by the older (2006) pkgcore.sh that many have used in the past.

More later, but here is a sneak peek.

--------------------------------------------------------------------------------
\* Sun Microsystems RSD pkg_app 3.0b Solaris[08/18/2008]
--------------------------------------------------------------------------------
\* OS release                            [5.9]
\* Platform                              [SUNW,Sun-Blade-1000]
\* Using core                            [/var/tmp/cores/Pkg_app/core.28907]
\* Process Root                          [/data/sunone/ds52p5/slapd-config]
\* Process binary                        [ns-slapd]
\* ns-slapd binary bit version           [64]
\* Checking path to binary name          [success, path != binary name]
\* Checking path is a directory          [success]
\* Locating ns-slapd                     [success]
\* Checking located ns-slapd is 64 bit   [success]
\* Binary located                        [/data/sunone/ds52p5/bin/slapd/server/64/ns-slapd]
\* Adding binary to pkg_app.pldd         [success]
\* Grabbing pldd                         [success]
\* Grabbing pstack                       [success]
\* Grabbing pmap                         [success]
\* Grabbing pcred                        [success]
\* Grabbing pflags                       [success]
\* Grabbing pargs                        [success]
\* Grabbing core/gcore                   [success]
\* Grabbing /var/adm/messages            [success]
\* Grabbing uname -a                     [success]
\* Grabbing date                         [success]
\* Grabbing showrev -p                   [success]
\* Grabbing libs                         [success]
\* Making lib paths                      [success]
\* Linking libraries                     [success]
\*                                      
\* Databin parameter check 1             [success]
\* Databin parameter check 2             [success]
\* Databin Found                         [/var/tmp/mycore]
\* Databin used/created is               [/var/tmp/mycore/081808-02]
\* Using hostid for naming .tar.gz       [834d2699]
\* Writing file                          [pkg_app-834d2699-kaneda-080818-080751.tar.gz]
\*                                      
\* Done gathering files                 
\* Writing dbx files                     [success]
\* Creating final tarfile                [success]
\* Compressing tarfile                   [success]
\* End of runtime logging               
\* Operations Complete
--------------------------------------------------------------------------------
Upload this file to your supportfiles.sun.com Cores Directory at Sun
File located in directory /var/tmp/mycore/081808-02


                [ pkg_app-834d2699-kaneda-080818-080751.tar.gz ]


                                Thank you.
                                Sun Software Technology Service Center (STSC)


NOTES:
1) You can check for updates to this script here:
        BigAdmin - http://www.sun.com/bigadmin/scripts/indexSjs.html

2) Release Notes and Guides located here:
        Docs - http://docs.sun.com/app/docs/doc/820-0437

3) GDD information located here:
        Docs - http://www.sun.com/service/gdd/index.xml

4) Please send all Bugs and RFE's to the following address:
        Subject "pkg_app bug/rfe" - gdd-issue-tracker@sun.com

5) Please send all other questions etc to:
        Subject "pkg_app feedback" - gdd-feedback@sun.com


root[/var/tmp/mycore/081808-02]#ls -l
total 38798
drwxrwxrwx   3 root     other        512 Aug 18 08:08 core-lib-data
-rwxrwxrwx   1 root     other        487 Aug 18 08:08 opencore
drwxrwxrwx   3 root     other        512 Aug 18 08:08 os-info
-rwxrwxrwx   1 root     other    19828244 Aug 18 08:08 pkg_app-834d2699-kaneda-080818-080751.tar.gz
drwxrwxrwx   2 root     other        512 Aug 18 08:08 pkgapp-info
drwxrwxrwx   2 root     other        512 Aug 18 08:08 proctool-info
-rwxrwxrwx   1 root     other       1575 Aug 18 08:08 runtime-080818-080751.log

 Beta testing within my group should begin this week.

Cheers!

Lee

Wednesday Jul 09, 2008

Suggestion Box

Hello all,

Just a quick update on the Dirtracer Blog front.  I haven't forgotten the blog nor those who are waiting for updates; however, I have been hit with a couple of extremely tough support cases that took a ton of my time.

I decided to set up a Suggestion Box page where readers can suggest topics for me to cover in this blog.

Making a suggestion is quick and easy: you simply post a Comment on this page.

Comment how-to:

  1. Click this Comments link -> Comments
  2. Scroll down to the Comments: area
  3. Type in your Name, Email and the Suggestion in the Comment area
  4. Fill in the anti-spam answer and hit Post

Simple.  I will evaluate each suggestion, and if it's a valid topic I will post a blog on it on a Friday.

Thanks all and I hope to see many suggestions.

LT

Tuesday Jun 10, 2008

Dirtracer + Core/Gcore = Separate upload for Pkg_app data.

Hello all,

Every now and then I run into an issue where the customer used Dirtracer to grab a Gcore (or two) or a Core file from a crash.  Dirtracer users are forgetting to upload the separate pkg_app tar.gz file when one is generated.

It is not a problem per se; rather, users are missing a critical message displayed at the end of a Dirtracer run when a Pkg_app file has been successfully created.

Let's take a look at the messages I am talking about.

Suppose you define one of the following parameters in the dirtracer.config file:

CORE_LOCATION
GRAB_GCORE


Let's use GRAB_GCORE for a quick test and grab one Gcore.

GRAB_GCORE="1"


We then run Dirtracer as normal.

#./dirtracer -f ./dirtracer.config

--------------------------------------------------------------------------------
Sun Microsystems dirtracer 6.0.6 Solaris Sparc                        06/09/2008
--------------------------------------------------------------------------------


Dirtracer then runs through its Main Loop and grabs one Gcore.

* Entering Main Performance Gathering Loop
*
* Loop 0 - 080609-131950
* Loop 1 - 080609-131956
* Loop 2 - 080609-132001
* Loop 3 - 080609-132007
* Loop 4 - 080609-132012
* togo[5sec]-timer[5]
*
* Grabbing gcore 080609-132018          [success]


Because Dirtracer has noted the presence of the grab gcore parameter, it will automatically run Pkg_app to gather the OS and DS libraries needed for debugging.

By default Dirtracer waits 120 seconds for Pkg_app to complete.  If the Core/Gcore's header is overrun (its contents cannot be determined normally) or has other errors, such as truncation, Dirtracer will kill the Pkg_app gathering phase.  Users should check the Pkg_app tar contents to see whether all libs were gathered and take appropriate action.

* Packaging files
*   Preparing files - pkg_app           [waiting 120 sec.]      [success]
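As a sanity check, you can list the Pkg_app archive to confirm the libraries actually made it in. The sketch below is self-contained for demonstration: the small archive it builds is a fabricated stand-in, so in real use skip that part and point TARBALL at the pkg_app-*.tar.gz file Dirtracer produced.

```shell
# Sketch: verify a pkg_app archive contains the shared libraries needed
# for debugging.  The demo archive built here is a stand-in; in real use
# point TARBALL at the pkg_app-*.tar.gz file Dirtracer produced.
workdir=$(mktemp -d)
mkdir -p "$workdir/pkg_app_demo/lib"
touch "$workdir/pkg_app_demo/lib/libc.so.1" \
      "$workdir/pkg_app_demo/lib/libnsl.so.1" \
      "$workdir/pkg_app_demo/opencore"
TARBALL="$workdir/pkg_app_demo.tar.gz"
( cd "$workdir" && tar cf - pkg_app_demo | gzip > "$TARBALL" )

# List the archive and count the shared libraries it captured.
libcount=$(gzip -dc "$TARBALL" | tar tf - | grep -c '\.so')
echo "libraries captured: $libcount"
rm -rf "$workdir"
```

Against a real capture this reduces to `gzip -dc pkg_app-*.tar.gz | tar tf - | grep -c '\.so'`; a count of zero means the gathering phase was likely killed before it finished.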


After the main Dirtracer tar file is created, Dirtracer displays the Operations Complete message and informs the user which files need to be uploaded to Sun.

As you can see below, Dirtracer informs you there is both a Dirtracer file and a "separate" Pkg_app file to upload.  Users should pay special attention to the end messages displayed when Dirtracer finishes; most users miss the second file (Pkg_app), which requires a separate upload.


Operations Complete
--------------------------------------------------------------------------------
1) Dirtracer capture data located in directory [ /var/tmp/data/060908-04 ]

Upload "only" this file to your supportfiles.sun.com cores directory at Sun

        [ dirtracer-834d2699-mysystem-080609-132018.tar.gz ]

2) pkg_app has captured system libs as the result of a gcore or core being found
                As this file may be large, please ftp this separately to Sun
                pkg_app file located in /var/tmp/data/060908-04/pkg_app_data

                [pkg_app834d2699-mysystem-080609-132051.tar.Z ]


You might ask why Pkg_app's tar is not simply included in the main Dirtracer tar file.  The answer is simple: size.  A Dirtracer capture by itself can be big if a customer does not archive their log files on a heavily used system, and the final size can be massive if a Core/Gcore is captured.  Bundling a large Pkg_app file into an already large Dirtracer file simply wasn't an option.

[LT]

Wednesday Jun 04, 2008

Using Dirtracer for Proactive Analysis

BACKGROUND:

One of our Customers was seeing a performance issue with their Siteminder-enabled applications each day around 4:30pm.  While the Customer was using Dirtracer to capture event data, the time it took for Administration staff to see the alerts made capturing the event difficult.  The event, when it occurred, took approximately 60 seconds from start to finish.  By the time the Dirtracers were started, the event had already ended 9-10 minutes earlier.

The Customer set up their BMC Control-M monitoring facility to automate a 4-way Dirtracer capture as soon as the event was noticed.

NOTE: they did not see any alerts/issues with the Directory or Siteminder Servers at this time but they launched Dirtracer via BMC Control M as a precaution only.

With this said, I analyzed the Dirtracers thinking there was a problem.

SYSTEM INFO:

1) The Customer captured data from four systems at the same time: 01, 02, 03 and 04.

01    #dirtracer Started:     080422-174012
02    #dirtracer Started:     080422-174013
03    #dirtracer Started:     080422-174127
04    #dirtracer Started:     080422-174126


2) Connection Load.  I see not all servers are equal here; 01 and 04 are taking a lot less of a load than 02 and 03.

01    91 connections average
02    185 connections constant
03    171 connections constant
04    124 connections average


Summary Connection Snippets: taken from cn=monitor searches.

                Conn       Ttl      Curr    Read   ReqQue     OpsInitiated vs.
File            Peak   Th  Cons     Cons    Waitr  BakLog     OpsCompleted
====================================================================================
01:
[080422-175815] 1984   24  525136       101       0       0          332685-332674=11
[080422-175852] 1984   24  525143       102       0       0          332906-332899=7
[080422-175928] 1984   24  525150       103       0       0          333155-333140=15
[080422-180005] 1984   24  525159       87        0       0          331583-331576=7
[080422-180043] 1984   24  525167       82        0       0          331188-331186=2

02:
[080422-174222] 1984   24  502241       185       0       0          2155474-2155468=6
[080422-174227] 1984   24  502245       185       0       0          2156076-2156073=3
[080422-174232] 1984   24  502249       185       0       0          2156687-2156680=7
[080422-174238] 1984   24  502253       185       2       2          2157573-2157565=8
[080422-174243] 1984   24  502259       185       1       0          2158588-2158583=5

03:
[080422-174513] 2256   24  503749       171       0       0          722384-722378=6
[080422-174518] 2256   24  503755       171       0       0          722607-722603=4
[080422-174522] 2256   24  503759       171       1       0          722890-722886=4
[080422-174527] 2256   24  503762       171       0       0          723230-723226=4
[080422-174531] 2256   24  503767       173       0       1          723506-723498=8

04:
[080422-174633] 8111   24  497842       123       1       0          383898-383884=14
[080422-174644] 8111   24  497848       123       0       2          384516-384511=5
[080422-174655] 8111   24  497855       124       0       1          384996-384993=3
[080422-174706] 8111   24  497860       124       0       1          385318-385313=5
[080422-174717] 8111   24  497864       124       0       0          385745-385739=6


3) The CPU % of each server mostly mirrors the load differences seen above; 01 and 04 are taking a lot less load than 02 and 03.

01    1% average busy
02    16% average busy
03    20% average busy
04    8% average busy

CPU (pcpu) taken from the pms.sh (ps) file.

01:
04/22-[17:41:49] ------- 12489 5370536 5331048  1.1    growth:   0 k

02:
04/22-[17:41:39] ------- 6236 2216344 2193576 15.3    growth:   8 k

03:
04/22-[17:42:41] ------- 7930 2214432 2192016 21.3    growth:   -40 k

04:
04/22-[17:42:48] ------- 10554 3717880 3695216  9.1    growth:   -360 k


4) Iostats (disk performance).  Since the servers use Veritas File System, I cannot tell which disk slices the DS runs on versus which disks show higher-than-normal iostats.

The DS is installed in the following location for all servers:

- /dev/vx/dsk/app/dsserver 1048576   18603  965640     2%    /opt/servers/slapd-master

01    Disks peak to 61% busy     (1 time)   GOOD
02    Disks peak to 58% busy     (1 time)   GOOD
03    Disks peak to 100% busy    (2 times)  FAIR
04    Disks peak to 100% busy    (8 times)  BAD

As a result of the 100% disk busy times the average disk io service time and service wait times increase.

01    avg service time 8.8/ms - wait service 0/ms
02    avg service time 35/ms - wait service 0/ms
03    avg service time 199.5/ms - wait service 0/ms
04    avg service time 105.6/ms - wait service 373.5/ms
01
Tue Apr 22 17:41:07 2008
                   extended device statistics
                 r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
   0.0    2.0    0.0    0.0  0.0  0.0    0.0    3.6   0   0 c1
   1.0    4.0    0.0    0.1  0.0  0.0    0.0    8.7   0   0 c2
  61.0   34.0    0.6    1.4  0.0  0.9    0.0    9.9   0  10 c4
 102.0   47.0    1.1    2.0  0.0  1.3    0.0    8.7   0  10 c5
   0.0    1.0    0.0    0.0  0.0  0.0    0.0    0.5   0   0 c4t0d1
  61.0   32.0    0.6    1.4  0.0  0.9    0.0   10.1   0  60 c4t0d2
   0.0    1.0    0.0    0.0  0.0  0.0    0.0    0.4   0   0 c4t0d5
 102.0   44.0    1.1    1.9  0.0  1.3    0.0    8.8   0  61 c5t1d2
   0.0    3.0    0.0    0.1  0.0  0.0    0.0    0.7   0   0 c5t1d5
   0.0    2.0    0.0    0.0  0.0  0.0    0.0    3.6   0   1 c1t2d0
   0.0    4.0    0.0    0.1  0.0  0.0    0.0    8.5   0   3 c2t2d0
   1.0    0.0    0.0    0.0  0.0  0.0    0.0    9.5   0   1 c2t0d0

02
Tue Apr 22 17:40:37 2008
                   extended device statistics
                 r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
   1.0   23.0    0.0    1.2  0.0  1.0    0.0   41.0   0   2 c1
   0.0   19.0    0.0    1.1  0.0  1.2    0.0   61.1   0   2 c2
   3.0  701.0    0.0    5.8  0.0 24.6    0.0   34.9   0  20 c4
   1.0  648.0    0.0    5.3  0.0 23.5    0.0   36.3   0  20 c5
   0.0    4.0    0.0    0.0  0.0  0.0    0.0    9.6   0   1 c4t0d0
   1.0  351.0    0.0    2.9  0.0 12.4    0.0   35.1   0  58 c4t0d2
   2.0  290.0    0.0    2.3  0.0  9.7    0.0   33.3   0  50 c4t0d3
   0.0   56.0    0.0    0.5  0.0  2.5    0.0   43.9   0  10 c4t0d5
   0.0    4.0    0.0    0.1  0.0  0.0    0.0    7.7   0   1 c5t1d0
   0.0  323.0    0.0    2.6  0.0 11.7    0.0   36.4   0  57 c5t1d2
   1.0  269.0    0.0    2.2  0.0  9.1    0.0   33.8   0  54 c5t1d3
   0.0   52.0    0.0    0.4  0.0  2.6    0.0   50.2   0   9 c5t1d5
   1.0    0.0    0.0    0.0  0.0  0.0    0.0    5.4   0   1 c1t8d0
   0.0    4.0    0.0    0.1  0.0  0.0    0.0    5.9   0   2 c1t2d0
   0.0    9.0    0.0    0.6  0.0  0.6    0.0   61.6   0  13 c1t12d0
   0.0   10.0    0.0    0.6  0.0  0.4    0.0   40.0   0   9 c1t9d0
   0.0    8.0    0.0    0.5  0.0  0.6    0.0   76.2   0  13 c2t12d0
   0.0   11.0    0.0    0.6  0.0  0.6    0.0   50.1   0  12 c2t9d0

03
Tue Apr 22 17:42:06 2008
                   extended device statistics
                 r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
   0.0   34.0    0.0    1.5  0.0  1.7    0.0   49.7   0   3 c1
   0.0   33.0    0.0    1.7  0.0  2.1    0.0   63.3   0   3 c2
   0.0  556.0    0.0    4.5  0.0 89.8    0.0  161.5   0  23 c4
   2.0  279.0    0.0    3.2  0.0 12.2    0.0   43.3   0  15 c5
   0.0    9.0    0.0    0.1  0.0  0.0    0.0    5.3   0   3 c4t8d0
   0.0  103.0    0.0    0.9  0.0  1.1    0.0   11.1   0  38 c4t10d0
   0.0  444.0    0.0    3.5  0.0 88.6    0.0  199.5   0 100 c4t11d0
   2.0  249.0    0.0    2.0  0.0 11.0    0.0   43.8   0  76 c5t10d0
   0.0    9.0    0.0    0.1  0.0  0.0    0.0    4.1   0   3 c5t11d0
   0.0   21.0    0.0    1.1  0.0  1.1    0.0   53.6   0  13 c5t13d0
   0.0   10.0    0.0    0.6  0.0  0.7    0.0   73.7   0  13 c2t11d0
   0.0    8.0    0.0    0.4  0.0  0.4    0.0   52.3   0  11 c2t9d0
   0.0   15.0    0.0    0.7  0.0  0.9    0.0   62.3   0  12 c2t12d0
   0.0    1.0    0.0    0.0  0.0  0.0    0.0    6.8   0   1 c1t2d0
   0.0   11.0    0.0    0.6  0.0  0.5    0.0   44.1   0  13 c1t11d0
   0.0   14.0    0.0    0.6  0.0  0.8    0.0   55.2   0  12 c1t9d0
   0.0    7.0    0.0    0.3  0.0  0.4    0.0   59.3   0  12 c1t12d0
   0.0    1.0    0.0    0.0  0.0  0.0    0.0   12.0   0   1 c1t3d0

04
Tue Apr 22 17:42:23 2008
                   extended device statistics
                 r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
   0.0  299.1    0.0    2.3 100.8 29.8  337.1   99.6  16  17 c1
   0.0  349.1    0.0    2.7 107.5 29.6  308.0   84.9  16  17 c2
   0.0    5.0    0.0    0.1  0.0  0.0    0.0    4.5   0   0 c4
   0.0   48.0    0.0    2.6  0.0  0.6    0.0   12.5   0   2 c5
   0.0    5.0    0.0    0.1  0.0  0.0    0.0    4.5   0   2 c4t8d0
   0.0    2.0    0.0    0.0  0.0  0.0    0.0   12.4   0   2 c5t8d0
   0.0   22.0    0.0    1.2  0.0  0.3    0.0   12.2   0   4 c5t9d0
   0.0   22.0    0.0    1.2  0.0  0.3    0.0   13.7   0   4 c5t10d0
   0.0    1.0    0.0    0.1  0.0  0.0    0.0    4.7   0   0 c5t11d0
   0.0    1.0    0.0    0.1  0.0  0.0    0.0    1.8   0   0 c5t12d0
   0.0    1.0    0.0    0.0  0.0  0.0    0.0   12.5   0   1 c2t11d0
   0.0  181.1    0.0    1.4 51.9 14.6  286.7   80.7  92 100 c2t0d0
   0.0    1.0    0.0    0.0  0.0  0.0    0.0   11.5   0   1 c2t13d0
   0.0    1.0    0.0    0.0  0.0  0.0    0.0    4.1   0   0 c2t4d0
   0.0  164.0    0.0    1.3 55.6 15.0  339.0   91.4 100 100 c2t10d0
   0.0    1.0    0.0    0.0  0.0  0.0    0.0    7.4   0   1 c2t5d0
   0.0    4.0    0.0    0.1  0.0  0.0    0.0    6.5   0   3 c1t11d0
   0.0    1.0    0.0    0.0  0.0  0.0    0.0    7.3   0   1 c1t13d0
   0.0  142.0    0.0    1.1 53.0 15.0  373.2  105.6 100 100 c1t0d0
   0.0    1.0    0.0    0.0  0.0  0.0    0.0   12.3   0   1 c1t4d0
   0.0    1.0    0.0    0.0  0.0  0.0    0.0    6.4   0   1 c1t3d0
   0.0  148.0    0.0    1.1 47.8 14.7  322.9   99.5  92 100 c1t10d0
   0.0    1.0    0.0    0.0  0.0  0.0    0.0    7.7   0   1 c1t2d0
   0.0    1.0    0.0    0.0  0.0  0.0    0.0    9.5   0   1 c1t5d0


5) Other ns-slapd instances running on these servers.  There are five directory server instances on each of the four servers, so I cannot tell whether the other ns-slapd instances are loading the disks to the levels seen above.  Significant load on the unknown instances could impact the disks used by the known instances and affect performance.

6) Dirtracer itself is not able to complete a normal Main Loop check in the configured time; Dirtracer was configured to loop through each check in the Main Loop at 1-second INTERVALs.

Each loop executes a pstack, prstat, netstat, a cn=monitor search and one backend cache info search (1 backend in this case).  Each of these data points is very lightweight and completes sub-second.

On average, each server completes a loop in 5-41 seconds.  It is unknown why these servers cannot complete the loops as fast as configured.

See the following examples: each Loop line shows the date/time stamp when the loop starts.  Note how many seconds passed between loops (it should have been 1 second).

01
* Loop 0 - 080422-174027
* Loop 1 - 080422-174106
* Loop 2 - 080422-174148
* Loop 3 - 080422-174228
* Loop 4 - 080422-174308

02
* Loop 0 - 080422-174030
* Loop 1 - 080422-174036
* Loop 2 - 080422-174041
* Loop 3 - 080422-174045
* Loop 4 - 080422-174050

03
* Loop 0 - 080422-174143
* Loop 1 - 080422-174148
* Loop 2 - 080422-174153
* Loop 3 - 080422-174158
* Loop 4 - 080422-174203

04
* Loop 0 - 080422-174139
* Loop 1 - 080422-174150
* Loop 2 - 080422-174201
* Loop 3 - 080422-174213
* Loop 4 - 080422-174225
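The gaps are easy to quantify: the loop timestamps are YYMMDD-HHMMSS, so a small awk script (a sketch of my own, not part of Dirtracer) can print the seconds between consecutive loops. Feeding it server 01's loops from above:

```shell
# Print the delta in seconds between consecutive Dirtracer loops.
# Timestamps are YYMMDD-HHMMSS; this simple version assumes all loops
# fall on the same day.
deltas=$(awk '{
    split($NF, p, "-")                               # p[2] = HHMMSS
    t = substr(p[2],1,2)*3600 + substr(p[2],3,2)*60 + substr(p[2],5,2)
    if (NR > 1) print t - prev
    prev = t
}' <<'EOF'
* Loop 0 - 080422-174027
* Loop 1 - 080422-174106
* Loop 2 - 080422-174148
* Loop 3 - 080422-174228
* Loop 4 - 080422-174308
EOF
)
echo "$deltas"
```

For server 01 this prints 39, 42, 40 and 40 seconds between loops, a long way from the configured 1-second INTERVAL.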

LDAP INFO:

1) A quick check of the Dirtracer search etimes shows each is sub-second.

On 04 only a few were 0.1 to 0.2 seconds.  The following were taken from the access log snippets captured.

root[/customerdata/6xxxxxxx/Apr22/04-dirtracernxcspld4-080422/ds-logs/working]#cat conn-49* | awk '{print $11}' | grep etime | grep -v 'etime=0\.0'
etime=0.246540
etime=0.118270
etime=0.110770
etime=0.143630

2) Using the logs, I found a Siteminder Policy Server that was configured to use an improper search base (dc=com).  In this data set only 01 has no Siteminder searches using the incorrect base; 02, 03 and 04 all do, with 04 having the most.

01:    None
02:    4604            dc=com searches
03:    5457            dc=com searches
04:    13749          dc=com searches
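Counts like these can be produced with a plain grep over the access logs. The snippet below runs against a tiny fabricated log excerpt so it is self-contained; against the real data it would be something like `grep -c 'SRCH base="dc=com"' access*`.

```shell
# Count searches that use the bad dc=com base.  The three log lines here
# are a fabricated excerpt; point grep at the real access logs instead.
count=$(grep -c 'SRCH base="dc=com"' <<'EOF'
[22/Apr/2008:17:04:52 -0400] conn=496746 op=3 msgId=4 - SRCH base="dc=com" scope=2 filter="(cn=x)" attrs="cn"
[22/Apr/2008:17:04:52 -0400] conn=496746 op=3 msgId=4 - RESULT err=32 tag=101 nentries=0 etime=0.000540
[22/Apr/2008:17:04:52 -0400] conn=496746 op=5 msgId=6 - SRCH base="dc=com" scope=2 filter="(cn=x)" attrs="cn"
EOF
)
echo "dc=com searches: $count"
```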


I was able to track down the IP on only one server (04), as the connections on the others were already running.  I need to check older logs to see whether these come from other IP addresses as well.

10.10.10.1

[22/Apr/2008:17:04:52 -0400] conn=496746 op=-1 msgId=-1 - fd=79 slot=79 LDAP connection from 10.10.10.1 to 10.10.10.181
[22/Apr/2008:17:04:52 -0400] conn=496746 op=0 msgId=1 - BIND dn="uid=siteminder,ou=service,dc=xxxxxx,dc=com" method=128 version=3
[22/Apr/2008:17:04:52 -0400] conn=496746 op=0 msgId=1 - RESULT err=0 tag=97 nentries=0 etime=0.000910 dn="uid=siteminder,ou=service,dc=xxxxxx,dc=com"
...
[22/Apr/2008:17:04:52 -0400] conn=496746 op=3 msgId=4 - SRCH base="dc=com" scope=2 filter="(&(uniqueMember=uid=userremoved,ou=People,dc=xxxxxx,dc=com)(objectClass=groupOfUniqueNames*))" attrs="cn"
[22/Apr/2008:17:04:52 -0400] conn=496746 op=3 msgId=4 - RESULT err=32 tag=101 nentries=0 etime=0.000540
...
[22/Apr/2008:17:04:52 -0400] conn=496746 op=5 msgId=6 - SRCH base="dc=com" scope=2 filter="(&(uniqueMember=uid=userremoved,ou=People,dc=xxxxxx,dc=com)(objectClass=groupOfUniqueNames*))" attrs="cn"
[22/Apr/2008:17:04:52 -0400] conn=496746 op=5 msgId=6 - RESULT err=32 tag=101 nentries=0 etime=0.000480
...
[22/Apr/2008:17:04:52 -0400] conn=496746 op=7 msgId=8 - SRCH base="dc=com" scope=2 filter="(&(uniqueMember=uid=userremoved,ou=People,dc=xxxxxx,dc=com)(objectClass=groupOfUniqueNames*))" attrs="cn"
[22/Apr/2008:17:04:52 -0400] conn=496746 op=7 msgId=8 - RESULT err=32 tag=101 nentries=0 etime=0.000460
...
...


The above connection (conn=496746) alone contains 1195 searches using the incorrect base.

root[/customerdata/6xxxxxx/Apr22/04-dirtracernxcspld4-080422/ds-logs]#gc err=32 conn-496746.log
1195


3) High etimes / high-etime servers.  Overall, server 04 is the worst high-etime server, followed by 01.

Top 5 High Etimes

01
etime=413.555070    1      
etime=300.814310 notes=F    1      
etime=115.912590    1      
etime=39.566400    1     
etime=39.553510 notes=F    1       

02
etime=8.719340    1      
etime=6.210240    1      
etime=6.101450    1      
etime=5.814920    1      
etime=5.802740    1    

03
etime=7.325220    1      
etime=5.766010    1      
etime=5.558690    1      
etime=5.186940    1      
etime=5.147010    1

04
etime=1876.043450 notes=U    1      
etime=1609.753640 notes=U    1      
etime=1275.956850 notes=U    1      
etime=1251.870940 notes=U    1 

We now know 04's highest offending etimes come from four cn=* searches issued by a certain application, IP 10.10.10.225.

nentries=598115                 4
[22/Apr/2008:16:48:50 -0400] conn=496301 op=-1 msgId=-1 - fd=47 slot=47 LDAP connection from 10.10.10.225 to 10.10.10.181
[22/Apr/2008:16:48:50 -0400] conn=496301 op=0 msgId=1 - BIND dn="uid=appl100,ou=service,dc=xxxxxx,dc=com" method=128 version=3
[22/Apr/2008:16:48:50 -0400] conn=496301 op=0 msgId=1 - RESULT err=0 tag=97 nentries=0 etime=0.002810 dn="uid=appl100,ou=service,dc=xxxxxx,dc=com"
[22/Apr/2008:16:48:50 -0400] conn=496301 op=1 msgId=2 - SRCH base="ou=people,dc=xxxxxx,dc=com" scope=1 filter="(cn=*)" attrs="uid givenName sn mail region modifiers uid modifyTimestamp newOSUser OSmanaged OSstatus"
[22/Apr/2008:17:15:39 -0400] conn=496301 op=1 msgId=2 - RESULT err=0 tag=101 nentries=598115 etime=1609.753640 notes=U
[22/Apr/2008:17:15:40 -0400] conn=496301 op=2 msgId=3 - UNBIND
[22/Apr/2008:17:15:40 -0400] conn=496301 op=2 msgId=-1 - closing - U1
[22/Apr/2008:17:15:40 -0400] conn=496301 op=-1 msgId=-1 - closed.


CONFIGURATION INFO:

1) The Dirtracer captures highlighted differences in cache settings.  Servers 02 and 03 each have roughly 2 GB of total cache, whereas 01 has about 3.4 GB and 04 about 2.1 GB.

01
       Total DB Cache:     1,048,576 Kb  (1073741824 bytes)
       Total Entry Cache:  2,441,406 Kb  (2500000000 bytes)
       --------------------------------------------------
       Total Cache Size:   3 Gb,  3,408 Mb,  3,489,982 Kb,  (3573741824 bytes)

02
       Total DB Cache:     1,048,576 Kb  (1073741824 bytes)
       Total Entry Cache:  1,024,000 Kb  (1048576000 bytes)
       --------------------------------------------------
       Total Cache Size:   1 Gb,  2,024 Mb,  2,072,576 Kb,  (2122317824 bytes)

03
       Total DB Cache:     1,048,576 Kb  (1073741824 bytes)
       Total Entry Cache:  1,024,000 Kb  (1048576000 bytes)
       --------------------------------------------------
       Total Cache Size:   1 Gb,  2,024 Mb,  2,072,576 Kb,  (2122317824 bytes)

04
       Total DB Cache:     1,048,576 Kb  (1073741824 bytes)
       Total Entry Cache:  1,126,400 Kb  (1153433600 bytes)
       --------------------------------------------------
       Total Cache Size:   2 Gb,  2,124 Mb,  2,174,976 Kb,  (2227175424 bytes)



2) None of the four servers uses a tmpfs (tmp file system) for its db home; this affects performance on Unix systems.

"UNIX only. Used to fix a situation on UNIX platforms where the operating system endlessly flushes pages. This flushing can be so excessive that performance of the entire system is severely degraded."

   See: nsslapd-db-home-directory http://docs.sun.com/source/816-6699-10/plugattr.html

Each is using the main Veritas partition for the db home.

   db-home-directory: /opt/servers/slapd-master/db
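If a tmpfs mount were available, the change is a single attribute on the ldbm database configuration entry. The path below is illustrative only, and the exact dn can vary by DS version, so treat this as a sketch rather than a drop-in fix:

```
dn: cn=config,cn=ldbm database,cn=plugins,cn=config
changetype: modify
replace: nsslapd-db-home-directory
nsslapd-db-home-directory: /tmp/slapd-master-dbhome
```

The tmpfs would need to be sized to hold the DB home files, and the change takes effect after a DS restart.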

3) Each server uses VxFS (Veritas File System), which has known fsync (file sync) problems when flushing dirty pages from memory/cache to the DBs.  This happens mostly during the transaction checkpoint (txn_checkpoint).  The fsync problem causes an overall slowdown of the DS process, which affects etimes on all other operations while the checkpoint occurs.  I have seen this on Veritas 4.4, and the customer is running an even older version.

   NAME:  VERITAS File System
   VENDOR:  VERITAS Software
   PSTAMP:  VERITAS-3.4FS-2003-01-29PID=110435-08

The problem can be seen via the cn=monitor search in the DS, which shows read waiters and the request queue backlog; see the cn=monitor summary snippet at the top (above).  Anything more than 0 for either is not optimal.  Each of the four servers shows between 1 and 3 read waiters or request queue backlogs at any given time, though each runs at 0 most of the time.

4) I asked a while ago if the customer did the following.

  • Upgraded from Directory Server Patch 4 - YES
  • Used the ActivateFix script in Patch 6 - NO


I saw these errors in the logs from servers 01, 03 and 04.

[22/Apr/2008:17:47:00 -0400] - WARNING<10307> - Replication  - conn=-1 op=-1 msgId=-1 - Configuration warning  Tombstone purging internal search is unindexed in replica dc=xxxxxx,dc=com


It is possible they have hit bug 6175472 and are seeing some of the tombstone-purging performance hit.  The allids threshold is only 4000 on these servers.  Tombstones are DELeted entries waiting to be purged.  If the number of deleted entries goes above 4k, the objectclass index becomes allids, and searches for the tombstone entries are unindexed and take a long time.

6175472 MMR: High performance degradation when purging tombstones

I checked the dse.ldif files to see whether the fix was enabled.  Part of the fix places an nsMatchingRule OID in the index entries, as seen in the dse.ldif.  The Matching Rule is NOT in any index in any of the servers' dse.ldif files:

nsMatchingRule: 1.3.6.1.4.1.42.2.27.9.4.150


01:
root[/customerdata/6xxxxxxx/Apr22/01-dirtracernxcspld1-080422/ds-config-data]#grep -ic 'nsMatchingRule: 1.3.6.1.4.1.42.2.27.9.4.150' dse.ldif
0

02:
root[/customerdata/6xxxxxxx/Apr22/02-dirtracernxcspld2-080422/ds-config-data]#grep -ic 'nsMatchingRule: 1.3.6.1.4.1.42.2.27.9.4.150' dse.ldif
0

03
root[/customerdata/6xxxxxxx/Apr22/03-dirtracernxcspld3-080422/ds-config-data]#grep -ic 'nsMatchingRule: 1.3.6.1.4.1.42.2.27.9.4.150' dse.ldif
0

04
root[/customerdata/6xxxxxxx/Apr22/04-dirtracernxcspld4-080422/ds-config-data]#grep -ic 'nsMatchingRule: 1.3.6.1.4.1.42.2.27.9.4.150' dse.ldif
0

Even though the above data was captured when there were no Directory or Siteminder server problems, the Dirtracer data helped highlight the following:

  1. problematic/incorrect directory server configuration/architecture
  2. possible load balancing problems, i.e. sending more connections to two servers as opposed to being completely load balanced
  3. disk capacity/IO peak issues, possibly from other heavily loaded ns-slapd instances
  4. incorrect application configuration
  5. overall etimes of the servers

[LT]

Tuesday May 27, 2008

Pkg_app in a Clustered Environment

Hello all,

I was recently contacted by a Sun Customer Advocate regarding Pkg_app, Dirtracer and Clustered environments.

First some up-front info on Pkg_app.  Pkg_app was originally created by Rick Zagorski, and I took over its development in 2006.  I took what Rick had coded and added to it, specifically (at the time) to help with Directory Server core/gcore files.  This was released as pkg_app 2.0 and was a mash of Rick's code with some additions of mine.  As you know, I created Dirtracer (Stracer) in 2004 and continue its development to help customers gather data on their Directory Server problems.

The Customer was having issues with Pkg_app and Dirtracer when they attempted to use them in a Clustered Environment.  Even though the customer was running an old version of Pkg_app, Pkg_app and Dirtracer are "not" tested nor certified for use in a Clustered Environment.  I've had only one prior request to make pkg_app/dirtracer Cluster aware, so it has not been at the top of the revision list.

I will be revamping the Pkg_app code to take into account all libraries etc. that Pkgcore gets for use with Java Cores and will look into the Cluster issue.

More news on this later.

[LT]

Wednesday May 21, 2008

Checks, Intervals and Loops Oh My - Part 3

This is my final blog on the Number of Checks and Interval settings within Dirtracer.

Let's discuss one of the most basic elements of Dirtracer: the Main Performance Gathering Loop and two of its governing parameters.

        NUMBEROFCHECKS and INTERVAL

Definitions:

    NUMBEROFCHECKS    # Number of checks: total number of loops
    INTERVAL            # Interval: seconds between loops

These two parameters tell Dirtracer how long to run and how many Main Loop data points to gather.  The settings for these two parameters are totally dependent on the problem type you are gathering for.
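As a rough rule of thumb, the Main Loop's minimum runtime is simply the product of the two, with each loop's own commands adding a little on top. The values below are example settings, not recommendations:

```shell
# Back-of-envelope Main Loop runtime: NUMBEROFCHECKS x INTERVAL seconds,
# plus per-loop overhead.  The values here are example settings only.
NUMBEROFCHECKS=60
INTERVAL=5
runtime=$((NUMBEROFCHECKS * INTERVAL))
echo "minimum Main Loop runtime: ${runtime} seconds"
```

So 60 checks at a 5-second interval means at least a 5-minute capture.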

The following are the data points normally gathered in the main loop; all of these are configurable, so all, some or none may be gathered on any given run.

  • netstat info
  • ls -la from the transaction logs
  • pstacks
  • prstats
  • cn=monitor searches
  • main cache searches; 1 search per loop.
  • db cache searches; 1 search per backend per loop
  • gcore(s)


As mentioned above, an Admin can set the Number of Checks, or "data points", to gather.  Let's look at some example problem types versus the settings you might use to capture them.

The Number of Checks and Interval parameters govern how many loops run and how long the Main Loop runs for; however, they are also used when executing the following commands.  These commands are launched before the Main Loop is engaged and run in the background while the Main Loop is active.

  • Iostats
  • Vmstats
  • Mpstats
  • Pms.sh

See the man pages for iostat, vmstat and mpstat for more information on these commands.
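These stat tools all accept "interval count" arguments, which is why the same two Dirtracer parameters can drive them. The sketch below only builds the command lines, with example settings, rather than running the Solaris-specific tools themselves:

```shell
# Illustration: how NUMBEROFCHECKS and INTERVAL map onto the background
# stat commands.  We only echo the command lines; values are examples.
NUMBEROFCHECKS=60
INTERVAL=5
cmds=$(for tool in "iostat -x" vmstat mpstat; do
    echo "$tool $INTERVAL $NUMBEROFCHECKS &"
done)
echo "$cmds"
```

Each command samples every INTERVAL seconds, NUMBEROFCHECKS times, so the background gatherers finish at roughly the same time as the Main Loop.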

Please see my previous blog on pms.sh (Pmonitor) for more information regarding this script.


[LT]
About

A Tech Blog about the Sun Java Systems Dirtracer Toolkit.  Dirtracer and this blog are written and maintained by Lee Trujillo, an Oracle Senior Principal Support Engineer.
