Wednesday May 25, 2011

DT (DirTracer) 7.1 has been unleashed!

DT 7.1 is finally here!

It has been over a year since I released 7.0, and I am pleased to announce that 7.1 is finally here. It was a long time coming, but I'm happy with the results. DT 7.1 has fixes, rebranding, and new enhancements, including the new Pkgapp 3.5.1, which will help solve Directory Server issues. There were quite a few items I was hoping to add, but as development goes, it's easy to keep adding features. I had to draw a line in the sand and release it.

DT 7.1 for both Solaris and Linux is now available from Oracle Knowledge Document 1291443.1.

Changes in DT 7.1

DirTracer:

  • Rebranded to Oracle from Sun Microsystems.
  • Included the new Pkgapp 3.5.1.
  • Fixed checkBitVersion to determine the correct binary bit version.
  • Fixed checkDSType for carpet issues.
  • Fixed locateLdapTools to locate tools properly.
  • Created locateCarpet to properly find the carpet command.
  • Gather the release file from /etc for all Linux distributions.
  • Fixed how the ns-slapd file is located for 64-bit versions (Linux).
  • Created an appCheck function to locate system commands.
  • Prevented DT from trying to grab a gcore when gdb is not present (Linux).
  • Minor bug fixes.
  • Removed the included ldapsearch and ldapmodify commands and their respective libraries from the tar, as they are no longer needed.

Configurator:

  • Rebranded to Oracle from Sun Microsystems.
  • Fixed checkBitVersion to match DT7.
  • Changed old LOG_IF_HIGH_CPU attrs to HIGH_CPU etc.
  • Changed how I deal with the Instance Selector. Cleaner interface.
  • Removed the need to select gathering logs when option 7 or 8 is selected.
  • Fixed bug 6965060 for customers with very long instance path names (DirTracer: 7.0 Configurator cannot locate dse.ldif if the full instance path is > 80 chars: a ps limit).

Compressor:

  • Compressor is a small script used to tar/gzip captured files after the fact if the SKIP_TAR_GZIP option was used. This is version 1.0 and is included in this bundle; a rough sketch of what it does follows below. See README.FIRST for more information.
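Functionally, Compressor amounts to something like the sketch below. This is not the actual script, just an illustration; the default databin path is made up:

#!/bin/sh
# Sketch only: tar and gzip a databin that was left uncompressed by SKIP_TAR_GZIP.
DATABIN=${1:-/var/tmp/dtcapture/052609-02}   # databin directory to compress

cd `dirname $DATABIN` || exit 1
NAME=`basename $DATABIN`
tar cf $NAME.tar $NAME && gzip $NAME.tar
echo "Created `pwd`/$NAME.tar.gz"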

Enjoy all!

Lee

Thursday Mar 10, 2011

DT7.0 and Pkgapp 3.4 Available via the My Oracle Support portal

Due to popular demand...

I've added the DirTracer and Pkgapp tar files to a Knowledge Portal document so customers can now access and download them from a more reliable location.

Please log in to the My Oracle Support portal and search for the following Doc IDs.

  • DirTracer7 (DT7) - Oracle Support Doc ID 1291443.1
  • Pkgapp 3.4 - Oracle Support Doc ID 1274584.1

This document is a truncated copy of the .pdf file included with the Pkgapp bundle. See the "Download" links near the bottom.

Cheers!

Lee

Wednesday Dec 16, 2009

The New DirTracer (DT7) and Pkgapp (3.4) are here!

Just in time for the holidays!

I am pleased to finally announce new versions of not only Pkgapp, but DirTracer as well.

Updated: March 03, 2011 with the Oracle Support Doc IDs for these.

  • DirTracer7 (DT7) - Oracle Support Doc ID 1291443.1
  • Pkgapp 3.4 - Oracle Support Doc ID 1274584.1

Both products have gone through some serious revisions to make them not only easier on the processes being tracked, but also to make their captured data slimmer. Here is a complete list of the changes to each product.  Please see the included Reference Guide for more detail.

DT7 (DirTracer 7.0)

Improvements

  1. Directory Server 7 compatible. DT7 works with Directory Server versions 5.1, 5.2.x, and 6.x as well as 7.0.
  2. Data capture size reduced by roughly half. Disk usage for a Basic Capture in DT 6.0.7 was 4169 KB; the same capture is now 2191 KB.
  3. Internal tar files removed.
  4. Safer when used against DS processes: removed pflags, pldd, pmap, and pfiles, as they can create locking issues with the DS process on some OS versions. Currently these proc tools are only used against gcores or cores, if provided.
  5. Checks available databin space when a gcore is requested and disables GRAB_GCORE if there is not enough room. This helps prevent customers from inadvertently filling disk space to 100% with a gcore.
  6. New validate data script provided.  Checks the manifest against the captured data.
  7. Now uses Pkgapp 3.3.
  8. Pkgapp captures are now included with final tar.  Customers no longer have to upload a second file for data analysis. 
  9. DT7 now has an epoch check feature that alerts the user if the DT7 version being used is more than 120 days old; a simple reminder to have the customer check Big Admin for a newer version. (A sketch of this kind of check follows this list.)
  10. Renamed the databin from a numbered path to include the name dirtracer.
  11. Changed the name of some config parameters.
  12. Changed how the high cpu tracking is seen.
  13. New data captured.
  14. Added an overall timer which is displayed at the end of the DT7 run.
  15. Renamed the README.dirtracer to README.changelog
  16. Created a text based README.FIRST.  This is a Quickstart Guide if you don't have time to read the full Reference Guide.
  17. Renamed the command file from "dirtracer" to dt7.  Also renamed the config file from "dirtracer.config" to "dt.conf" to make commands easier to type.
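For reference, an age check like the one described in item 9 can be done with a few lines of Bourne shell. This is only a sketch with made-up variable names and values, not DT7's actual code:

#!/bin/sh
# Sketch only: warn when this copy of the tool is more than 120 days old.
RELEASE_EPOCH=1260921600        # hypothetical: epoch seconds stamped in at release time
NOW=`perl -e 'print time'`      # epoch seconds; avoids relying on GNU date +%s
AGE_SECS=`expr $NOW - $RELEASE_EPOCH`
AGE_DAYS=`expr $AGE_SECS / 86400`

if [ $AGE_DAYS -gt 120 ]; then
    echo "NOTE: this copy of dt7 is $AGE_DAYS days old."
    echo "      Please check Big Admin for a newer version."
fi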

Fixes

  1. Fixed the nsds50ruv search to use scope sub rather than scope one.
  2. Fixed (enhanced) the Control C feature so users can Control C once to exit the Main Loop; this advances DT7 to the Post Loop data gathering phase. A second Control C allows the user to exit even the Post Loop. (A sketch of this kind of two-stage handler follows this list.)
  3. Fixed a minor issue with how the values were stored in the dt.conf.last file for the GATHER_N_XXXXX_LOGS parameters.
  4. Fixed the paths to check “carpet” type for Directory Server 7.
  5. Fixed an issue in Configurator which wouldn't return the stty echo if you used Control C.
  6. Fixed an issue where DT7 couldn't derive the correct bit mode (32 or 64bits) when SERVER_DOWN=1.  Both DT7 and Configurator ask the user for this information if it is not available.
  7. Fixed an issue where Pkgapp isn't gathered if DT7 is set to capture 2 gcores and is exited before the second gcore is gathered.
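The two-stage Control C behavior in fix 2 boils down to counting interrupts in a trap handler. The following is a minimal sketch of that pattern in Bourne shell; it is illustrative only and not DT7's actual code:

#!/bin/sh
# Sketch only: first Control C leaves the Main Loop, second one exits entirely.
INTERRUPTS=0

on_intr() {
    INTERRUPTS=`expr $INTERRUPTS + 1`
    if [ $INTERRUPTS -eq 1 ]; then
        echo "Control C caught: leaving the Main Loop, moving on to Post Loop gathering..."
    else
        echo "Second Control C caught: exiting."
        exit 1
    fi
}
trap on_intr 2      # 2 = SIGINT

LOOP=0
while [ $LOOP -lt 100 ] && [ $INTERRUPTS -eq 0 ]; do
    # per-loop data gathering would happen here
    sleep 1
    LOOP=`expr $LOOP + 1`
done

echo "Post Loop data gathering would run here..."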

Pkgapp 3.3

Improvements

  1. Added a new parameter -l.  This takes a known good pldd text file to be used in place of either a pldd against the pid or the core.  This allows admins to use a previously generated pldd text file, instead of attaching to the live pid or a possibly bad core file.
  2. Changed the overall usage: -p is now used for the <pid> and -c for the <core>. Both can be used on Solaris, but -p <pid> is now required on Linux due to a limitation of the ldd command and the lack of pldd on Red Hat and SuSE Linux; ldd misses most libraries that a real process runs with. (Example invocations follow this list.)
  3. Added an Epoch (date of the first run) check to alert users when their pkgapp version is 120 days old.   Just a simple reminder to check if a newer version exists.
  4. Uses the /proc file system to quickly gather library names instead of using pldd etc (when a pid is used).  Solaris - uses pldd on the core if available. Linux - requires a pid to use the /proc fs as ldd is not accurate.
  5. Created a validate-libs.sh file which can be used to validate all the libraries gathered (as seen in the manifest) in the capture paths.
  6. Now copies the library-list and a README to the /var/tmp/pkgapp-history/
  7. Updated the post-run NOTES: section to include new links and information.
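Based on the flags described above, typical invocations look roughly like the following. The pid, core, and pldd file names here are made up for illustration:

# Solaris: gather libraries for a live process
./pkgapp -p 12345

# Solaris: gather libraries referenced by a core or gcore file
./pkgapp -c /var/tmp/core.ns-slapd.12345

# Either platform: feed in a known-good pldd listing instead of attaching
# to the pid or trusting a possibly truncated core
./pkgapp -c /var/tmp/core.ns-slapd.12345 -l /var/tmp/pldd.ns-slapd.txt

# Linux: -p <pid> is required so pkgapp can read /proc/<pid> for the real
# library list (ldd alone misses most of them)
./pkgapp -p 12345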

Fixes

  1. Fixed bug 6881865: Pkgapp 3.2 fails to exec gdb when pkgapp is run from outside its install path.
  2. Fixed bug 6882174: Pkgapp 3.2 fails to gather jstack/jinfo.

What is next?

Due to time constraints (my regular job, etc.) I had to cut off development of DT7 when DS7 was released. I had a bigger list of enhancements I wanted to add but ran out of time. Here are a few of them I want to explore for DT7.1.

  1. Anonymizer. Data anonymizer for sensitive customer data.
  2. ReptopCk inclusion
  3. Search Debug Feature. Set debug: 4 on, play SRCH/MOD etc, set debug off. Great for debugging problem searches/mods.
  4. Add precise timing for all DT executions. Get the date before each command is executed and add it to a log, so you can see each command executed in a linear list.
  5. For DS6 and 7 do an ls -laR on the CARPETPATH and INSTANCEPATH
  6. Rewrite the config file processor
  7. Add a new command processor. This would allow for more flexibility to run a specific search or use a Dtrace script.

More on these later.

I hope the new versions help you out, and remember you can always send me feedback for new improvements or for issues encountered.

 Happy Holidays and Regards,

 Lee

Monday Sep 21, 2009

Server Impact on Sun Learning Exchange

Just a quick note:

I have released SLX 03 - Server Impact. This is a work in progress, and I will be adding a new PDF to it with full statistics showing real server impact from DirTracer. I have the test servers and SLAMD set up and will get the tests run soon.

Cheers!

Lee

Thursday Jul 30, 2009

DirTracer, Pkgapp and SLX

Alo all,

I received good news from the Big Admin admins today: the new DirTracer and Pkgapp versions are online!

The following versions are now available here on Big Admin. For information on the latest changes in this update, please see my previous blog - DT 6.0.7 Coming soon!

DirTracer 6.0.7

Pkgapp 3.2

I have also released the SLX 02 DirTracer Quickstart on SLX (Sun Learning Exchange) - "This video is a Quickstart guide to getting DirTracer6 running in a short amount of time using Configurator 6. I also step through a real example of how DirTracer runs using the config file created by Configurator."

Finally, I am in production of the latest DirTracer SLX video "Server Impact".  Using Slamd, Cacti and DirTracer I plan on showing exactly what impact a Directory Server will see from DirTracer runs based on different run/problem types.  More on this later.

Cheers!

Lee

Thursday Jul 23, 2009

Multiple DirTracer iterations - What not to do.

Hey all,

Quick tech tip here. 

We recently came across a situation where a Customer ran DirTracer 3 separate times in the same second; one for each of the three instances on a single system.  Each config file had the correct configuration for each instance but all three were used at the same time.

Example: Using a wrapper script to launch all 3 in the same second.

#!/bin/sh

nohup ./dirtracer -f ./dirtracer.config.1 &

nohup ./dirtracer -f ./dirtracer.config.2 &

nohup ./dirtracer -f ./dirtracer.config.3 &

As a result, DirTracer is launched 3 times in the same second, each instance sees that there is no existing sub-databin inside /var, and only one sub-databin is created; i.e. /var/072309-01.

<snip>

* Using config file                     [./dirtracer.config.1]
* Using config file                     [./dirtracer.config.2]
* Using config file                     [./dirtracer.config.3]
* Dirtracer mainloop will run for     * Dirtracer mainloop will run for      * Dirtracer mainloop will run for

[100 sec. Between 1 & 2 min.]
[100 sec. Between 1 & 2 min.]
[100 sec. Between 1 & 2 min.]
* Databin parameter check 1             * Databin parameter check 1             * Databin parameter check 1

[success]
[success]
[success]
* Databin parameter check 2             * Databin parameter check 2             * Databin parameter check 2

[success]
[success]
[success]
* Databin Found                         [/var]
* Databin Found                         [/var]
* Databin Found                         [/var]
* Databin used is                       * Databin used is                       * Databin used is

[/var/072309-01]
[/var/072309-01]

[/var/072309-01]

<snip>

To ensure DirTracer runs properly and uses a separate sub databin (as intended) the following workaround can be used.

#!/bin/sh

nohup ./dirtracer -f ./dirtracer.config.1 &

sleep 1

nohup ./dirtracer -f ./dirtracer.config.2 &

sleep 1

nohup ./dirtracer -f ./dirtracer.config.3 &

Giving DirTracer a 1-second delay between executions gives it enough time to notice there is an existing sub-databin in the intended path and increment the sub-databin number (a sketch of this naming logic follows the example below).

Example:

myzone root[/var/tmp/data]# ls -l
drwxr-xr-x  11 root     root          30 Jul 23 14:48 072309-01
drwxr-xr-x  11 root     root          30 Jul 23 14:48 072309-02
drwxr-xr-x  11 root     root          30 Jul 23 14:48 072309-03
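The collision comes from a check-then-create race: each DirTracer computes the next free sub-databin name before any of them has created it. A rough sketch of that kind of naming logic (not DirTracer's actual code) shows why a one-second delay is enough to avoid it:

#!/bin/sh
# Sketch only: pick the next free date-stamped sub-databin.
DATABIN=/var/tmp/data
STAMP=`date +%m%d%y`            # e.g. 072309

N=1
while [ -d $DATABIN/$STAMP-0$N ]; do
    N=`expr $N + 1`
done
# Three copies started in the same second all reach this point with the
# same "next free" name; a 1 second sleep between launches lets each one
# see the directory the previous launch created.
mkdir $DATABIN/$STAMP-0$N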

Hope this helps...

Cheers,

Lee


Tuesday Jul 21, 2009

DT 6.0.7 and Pkgapp 3.2 (Linux) Available!

Hi all,

Good news is always welcome :D

I have finalized testing of not only DirTracer (DT) 6.0.7 but have also finished coding and testing of Pkgapp 3.2 for both Red Hat and SuSE Linux.

See my previous blog DT 6.0.7 Coming soon! or the README.dirtracer for details on what has changed.

Available DirTracer and Pkgapp versions:

  • Solaris Sparc 8/9/10
  • Solaris x86 9/10
  • SuSE Linux 9 U4/10 U1
  • Redhat Linux 3/4/5

Remember: DirTracer comes with Pkgapp (see the dirtracertools path), but for non-Sun Directory Server customers, Pkgapp 3.2 can be downloaded separately.

Currently these versions are only available internally until BigAdmin can update their copies.  Until that happens, contact your Sun Customer Service Representative and have them send you the latest versions.

Enjoy!

Lee

Thursday Jul 16, 2009

DT 6.0.7 Coming soon!

Hi all,

Well, I hopped on the fix for the last update's bug so fast that I decided to just release a new version of DirTracer (DT 6.0.7).

Here is a list of the current changes...

DirTracer:
  1. Fixed bug 6861010: DirTracer 6.0.6 incorrectly identifies the DS as 64-bit if running as 32-bit on AMD64 (Solaris only).
  2. Removed post-loop (not needed) pflags, pmap, pfiles gathering (Solaris only).
  3. Replaced the older Pkg_app 2.7 with the new Pkgapp 3.2 (Solaris Sparc, Solaris x86 and Linux only at this time).
  4. Pkgapp lib tar file now included with the final DirTracer tar.gz
  5. Also added a check to skip grabbing /etc/system on a Solaris 10 Zone when not available.
  6. Added links to my Sun Blog and Sun Learning Exchange Videos for DirTracer to the displayed links when DT completes.

Configurator
  1. Fixed an issue when long path names are used for the server root. Configurator would fail to derive the correct server root and could not locate pwdhash. See bug 6511933 Dirtracer (GDD): greps on dse.ldif parameters fail when ldif wrapping is used.

As the changes only affect a couple of bugs, I will not be reworking the PDF doc guide to reflect the new version information. This also allows me to quickly generate and release new fixes for current issues.

I have made the above changes to the Solaris Sparc, Solaris x86, Linux and HP-UX versions.  I have not coded a new version of Pkgapp 3.2 yet for HP-UX so I will release this version a bit later.  I will update soon when I have released DirTracer internally for download.  Once I do and you wish to get a copy, please contact your Sun Frontline Engineer.

Regards,

Lee

Wednesday Jul 15, 2009

Alert: DirTracer 6.0.6 incorrectly identifies the DS as 64bit if running in as 32bit on AMD64

Hi all,

Quick update on DirTracer. I have known about this issue for a couple of months but haven't had time to deal with it. I have created a bug (see below) and a fix. I will try to get this out soon as DirTracer 6.0.7. The fix is already in DT7 (DirTracer 7).

CR 6861010 DirTracer 6.0.6 incorrectly identifies the DS as 64bit if running in as 32bit on AMD64

Cheers all

Lee

Friday Jun 26, 2009

Running DirTracer in the Background

Hello all,

I was approached today by a Partner about running DirTracer in the background.  I believe I have been asked about this once before.

Currently, running DirTracer in the background is neither tested nor supported. In fact, I just tried it and saw nothing in the databin path, nor in the /tmp area for the runtime log. I have found a way, however.

If I background it or run it under nohup, I get no output -> (nohup ./dirtracer -f ./dirtracer.config &).

But if I create a stub script that launches DirTracer with nohup and & and do the same with the stub...it works.

myzone root[/opt/dirtracer]#cat backgrounder
#!/bin/sh

nohup ./dirtracer -f ./dirtracer.config &
myzone root[/opt/dirtracer]#nohup ./backgrounder &
[1] 9159
myzone root[/opt/dirtracer]#Sending output to nohup.out

[1]    Done                          ./backgrounder

In most circumstances (99% of the time) you should be running DirTracer in the foreground, because the runtime is generally short. I can only see backgrounding the process when you use the MEM_LEAK_TRACKING option or if you have such a long DirTracer run that you will have to close down your terminals and leave.

Thanks to Brian for this great question.


Lee

Tuesday May 26, 2009

DirTracer now DS7 compatible!

Yes, it's true! DirTracer now works with DS7!

I was fortunate to get access to DS7 during the Directory Masters event in New Jersey.  While there I was able to quickly install DS7 and migrate an old dead DS6.2 instance.  I ran DT 6.0.6 against this new instance and was ecstatic to see it ran with 0 errors.  In reality DS Development had moved two commands but DT had used its defaults to guess things like install type (zip/native) and where the dsrk's ldapsearch and ldapmodify were located.

Today I decided to revisit these two items and found it was easy as pie to change; only took a few minutes.  I had to copy/paste and change approximately seven lines of code (mostly path based) and that was it. 

To mirror the DT version I used when DS6 was released, I will now re-version this as DirTracer 7.0.0 or DT7. While DT7 works perfectly with DS7, I'm not saying DT7 won't change. In fact I'm planning many new features for DT7 for its first official release.

--------------------------------------------------------------------------------
Sun Microsystems dirtracer 7.0.0 Solaris Sparc                        05/26/2009
--------------------------------------------------------------------------------
* Checking Config File                  [success 1211182548 != 3006111464]
* Using config file                     [./dirtracer.config]
* OS Type and Version                   [SunOS 10 sparc]
* Dirtracer mainloop will run for       [1 sec.]
* Databin parameter check 1             [success]
* Databin parameter check 2             [success]
* Databin Found                         [/var/tmp/dtcapture]
* Databin used is                       [/var/tmp/dtcapture/052609-02]
* Relocating runtime log                [success]
* Case Number                           [NOT AVAILABLE]
* User is root. Continuing...
* PID Check     (18925)                 [success]
* Process State                         [not hung]
* High CPU tracking                     [off]
* Mem Leak tracking                     [off]
* Config Only Capture                   [on]
* Pstack Force (-F)                     [off]
* Instance Name is                      [/opt/dsee62/ldapinst3]
* Using host name                       [s-6000a-t6300i-zone8-sca11]
* Using host id                         [84aaaac8]
* Ldap Tools Path                       [/opt/dsee7/dsrk/bin]
* DS Version                            [7.0 - 64 bit - zip install]
* Backends Found                        [sun]
* Transaction log location              [/opt/dsee62/ldapinst3/db]
*   Solaris tar found                   [/usr/sbin/tar]
*   gzip found                          [/bin/gzip]
*   elfdump found                       [/usr/ccs/bin/elfdump]
*   pstack found                        [/bin/pstack]
*   prstat found                        [/bin/prstat]
*   netstat found                       [/bin/netstat]
*   iostat found                        [/bin/iostat]
*   vmstat found                        [/bin/vmstat]
*   mpstat found                        [/bin/mpstat]
*   ldapmodify found                    [/opt/dsee7/dsrk/bin/ldapmodify]
*   ldapsearch found                    [/opt/dsee7/dsrk/bin/ldapsearch]
*   pms.sh found                        [/opt/dirtracer/dirtracertools/pms.sh]
* Access/Error Logging                  [left as is]
* Audit Logging                         [left as is]
* Iostat available.     Executing...    [success]
* Vmstat available.     Executing...    [success]
* Mpstat available.     Executing...    [success]
* pms.sh (pmonitor) executed            [success]
*   pms.sh interval(1) x checks(1)      [pms.sh run time (1 sec.)]
*
* Entering Main Performance Gathering Loop
*
* Loop 0 - 090526-132724                [pstack/prstat override on]
  togo[1sec]-timer[1]
*
*
* Exiting Main Performance Gathering Loop
*
* Done gathering stats                  [pms.sh iostats vmstats etc]
* Access/Error Logging                  [left as is]
* Audit Logging                         [left as is]
* Logging level                         [left as is]
*
* Gathering needed customer defined configuration
*   ls -laR of slapd Instance           [success]
*   nsslapd-db-home-directory           [success]
*   nsslapd-db-logdirectory             [success]
*   nsslapd-directory db                [success]
*   nsslapd-directory sun               [success]
*
* Gathering Other Environment Information
*   ns-slapd version                    [success]
*   nsds50ruv dc=sun,dc=com             [success]
*   cn=config search                    [success]
*   rootdse search                      [success]
*   idsktune                            [not found]
*   dirtracer7.0.0                      [success]
*   /etc/release                        [success]
*   /etc/system                         cp: cannot access /etc/system
[failed]
*   /var/adm/messages                   [success]
*   uname -a                            [success]
*   uptime -a                           [success]
*   date/time                           [success]
*   ifconfig -a                         [success]
*   mount -p                            [success]
*   ps -aef                             [success]
*   pflags                              [failed]
*   pldd                                [failed]
*   pmap                                [failed]
*   pfiles                              [failed]
*   prtconf -v                          prtconf: devinfo facility not available
[success]
*   showrev -p                          [success]
*   swap -s                             [success]
*   pkginfo -l                          [success]
*   df -k                               [success]
*   logs - access: gathering 1 files
*       access                          [success]
*       access rotation                 [success]
*   logs - error: gathering 1 files
*       errors                          [success]
*       error rotation                  [success]
*   logs - audit: gathering 1 files
*       audit                           [success]
*       audit rotation                  [success]
*   config dse.ldif's                   [success]
*   schema                              [success]
*   shared mem files                    [skipped]
*
* [ALERT 1] Shared Mem files were not included in the dirtracer tar file and were NOT copied/saved to the databin.
*
*   ds lib files cksum                  [skipped]
*   ds lib files elfdump                [skipped]
* Writing used dirtracer.config.last    [success]
* Copying dirtracer.config.last         [success]
* Writing manifest-090526-132724.log    [success]
*
* Packaging files
*   Preparing files - pstack            [0 files]
*   Preparing files - prstat            [0 files]
*   Preparing files - netstat           [success]
*   Preparing files - ds-config-data    [success]
*   Preparing files - ds-logs           [success]
*   Preparing files - ds-mmap-files     [success]
*   Preparing files - ds-srch-monitor   [success]
*   Preparing files - ds-srch-cache     [success]
*   Preparing files - ds-txnlogs-data   [success]
*   Preparing files - os-info           [success]
*   Preparing files - gcore             [0 files]
*   Preparing files - crash             [0 files]
*   Preparing Main file                 [success]
* Leaving my temp files intact

Operations Complete
--------------------------------------------------------------------------------
1) Dirtracer capture data located in directory [ /var/tmp/dtcapture/052609-02 ]

Upload "only" this file to your supportfiles.sun.com cores directory at Sun

        [ dirtracer-84aaaac8-s-6000a-t6300i-zone8-sca11-090526-132724.tar.gz ]


                                Thank you.
                                Sun Software Technology Service Center (STSC)


NOTES:
1) You can check for updates to this script here:
        BigAdmin - http://www.sun.com/bigadmin/scripts/indexSjs.html

2) Release Notes and Guides located here:
        Docs - http://docs.sun.com/app/docs/doc/820-0437

3) GDD information located here:
        Docs - http://www.sun.com/service/gdd/index.xml

4) Please send all Bugs and RFE's to the following address:
        Subject "dirtracer bug/rfe" - gdd-issue-tracker@sun.com

5) Please send all other questions etc to:
        Subject "dirtracer feedback" - gdd-feedback@sun.com

--------------------------------------------------------------------------------

ALERTS ENCOUNTERED: See /var/tmp/dtcapture/052609-02/ALERTS-090526-132718.log


1) Shared Mem files were not included in the dirtracer tar file and were NOT copied/saved to the databin.

As I mentioned two blogs ago, I was preparing to start releasing DT videos (podcasts?) on the Sun Learning Exchange (SLX). The first draft is complete for my first video, titled "DirTracer Overview". I will start production of the video later today or tomorrow and hope to release it to SLX quite soon.

Cheers!

Lee

Tuesday Jun 10, 2008

Dirtracer + Core/Gcore = Separate upload for Pkg_app data.

Hello all,

Every now and then I run into an issue where the customer used Dirtracer to grab a Gcore (or two) or a Core file from a crash.  Dirtracer users are forgetting to upload the separate pkg_app tar.gz file when one is generated.

It is not a problem per se; rather, users are missing a critical message that is displayed at the end of a Dirtracer run when a Pkg_app file has been successfully created.

Let's take a look at the messages I am talking about.

Suppose you define one of the following parameters in the dirtracer.config file:

CORE_LOCATION
GRAB_GCORE


Let's use GRAB_GCORE for a quick test; let's grab one Gcore.

GRAB_GCORE="1"


We then run Dirtracer as normal.

#./dirtracer -f ./dirtracer.config

--------------------------------------------------------------------------------
Sun Microsystems dirtracer 6.0.6 Solaris Sparc                        06/09/2008
--------------------------------------------------------------------------------


Dirtracer runs through its Main Loop and grabs one Gcore.

* Entering Main Performance Gathering Loop
*
* Loop 0 - 080609-131950
* Loop 1 - 080609-131956
* Loop 2 - 080609-132001
* Loop 3 - 080609-132007
* Loop 4 - 080609-132012
* togo[5sec]-timer[5]
*
* Grabbing gcore 080609-132018          [success]


Because Dirtracer has noted the presence of the grab gcore parameter, it will automatically run Pkg_app to gather the OS and DS libraries needed for debugging.

By default Dirtracer waits 120 seconds for Pkg_app to complete. If for some reason the Core/Gcore's header has an overrun (the contents can't be determined normally) or has other errors, such as being truncated, Dirtracer will kill the Pkg_app gathering phase. Users should check the Pkg_app tar contents to see if all libs were gathered and take appropriate action. (A sketch of this wait-and-kill pattern follows the example output below.)

* Packaging files
*   Preparing files - pkg_app           [waiting 120 sec.]      [success]

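The 120-second wait behaves like a simple poll-and-kill loop. Here is a sketch of that pattern in Bourne shell; the pkg_app arguments are deliberately left as a placeholder since they vary by version, and this is not Dirtracer's actual code:

#!/bin/sh
# Sketch only: give a background job 120 seconds, then kill it.
PKGAPP_CMD="./pkg_app"          # plus whatever arguments your pkg_app version takes
$PKGAPP_CMD > pkg_app.out 2>&1 &
PKGAPP_PID=$!

WAITED=0
while [ $WAITED -lt 120 ]; do
    kill -0 $PKGAPP_PID 2>/dev/null || break    # finished already
    sleep 5
    WAITED=`expr $WAITED + 5`
done

if kill -0 $PKGAPP_PID 2>/dev/null; then
    echo "pkg_app did not finish within 120 seconds; killing it"
    kill $PKGAPP_PID
fi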

After the main Dirtracer tar file is created, it displays the Operations Complete message and informs the user which files need to be uploaded to Sun.

As you can see below, Dirtracer informs you there are both a Dirtracer file and a "separate" Pkg_app file that need to be uploaded. Users should pay special attention to the End Messages displayed when Dirtracer finishes. Most users miss the second file (Pkg_app), which requires a separate upload.


Operations Complete
--------------------------------------------------------------------------------
1) Dirtracer capture data located in directory [ /var/tmp/data/060908-04 ]

Upload "only" this file to your supportfiles.sun.com cores directory at Sun

        [ dirtracer-834d2699-mysystem-080609-132018.tar.gz ]

2) pkg_app has captured system libs as the result of a gcore or core being found
                As this file may be large, please ftp this separately to Sun
                pkg_app file located in /var/tmp/data/060908-04/pkg_app_data

                [pkg_app834d2699-mysystem-080609-132051.tar.Z ]


You might ask why Pkg_app's tar is simply not included with the main Dirtracer tar file. The answer is simple...size. Dirtracer captures by themselves can be big if a customer does not archive their log files on a heavily used system, and the final size can be massive if a Core/Gcore is captured. It simply wasn't an option to include a big Pkg_app file with an already large Dirtracer file.

[LT]

Wednesday Jun 04, 2008

Using Dirtracer for Proactive Analysis

BACKGROUND:

One of our Customers was seeing a performance issue with their Siteminder-enabled applications each day around 4:30 pm. While the Customer was using Dirtracer to capture event data, the time it took for Administration staff to see the alerts made capturing the event difficult. The event, when it occurred, took approximately 60 seconds from start to finish. By the time the Dirtracers were started, the event had already passed 9-10 minutes earlier.

The Customer set up their BMC Control M monitoring facility so they could automate a 4-way Dirtracer capture the moment the event was noticed.

NOTE: they did not see any alerts/issues with the Directory or Siteminder Servers at this time but they launched Dirtracer via BMC Control M as a precaution only.

With this said, I analyzed the Dirtracers thinking there was a problem.

SYSTEM INFO:

1) The Customer captured data from 4 systems at the same time; 01, 02, 03 and system 04.

01    #dirtracer Started:     080422-174012
02    #dirtracer Started:     080422-174013
03    #dirtracer Started:     080422-174127
04    #dirtracer Started:     080422-174126


2) Connection Load. I see not all servers are equal here; 01 and 04 are taking a lot less load than 02 and 03.

01    91 connections average
02    185 connections constant
03    171 connections constant
04    124 connections average


Summary Connection Snippets: taken from cn=monitor searches.

               Conn      Ttl           Curr      Read    ReqQue     OpsInitiated vs.
File            Peak  Th  Cons          Cons      Waitr   BakLog     OpsCompleted
====================================================================================
01:
[080422-175815] 1984   24  525136       101       0       0          332685-332674=11
[080422-175852] 1984   24  525143       102       0       0          332906-332899=7
[080422-175928] 1984   24  525150       103       0       0          333155-333140=15
[080422-180005] 1984   24  525159       87        0       0          331583-331576=7
[080422-180043] 1984   24  525167       82        0       0          331188-331186=2

02: Summary Snippet
[080422-174222] 1984   24  502241       185       0       0          2155474-2155468=6
[080422-174227] 1984   24  502245       185       0       0          2156076-2156073=3
[080422-174232] 1984   24  502249       185       0       0          2156687-2156680=7
[080422-174238] 1984   24  502253       185       2       2          2157573-2157565=8
[080422-174243] 1984   24  502259       185       1       0          2158588-2158583=5

03:
[080422-174513] 2256   24  503749       171       0       0          722384-722378=6
[080422-174518] 2256   24  503755       171       0       0          722607-722603=4
[080422-174522] 2256   24  503759       171       1       0          722890-722886=4
[080422-174527] 2256   24  503762       171       0       0          723230-723226=4
[080422-174531] 2256   24  503767       173       0       1          723506-723498=8

04:
[080422-174633] 8111   24  497842       123       1       0          383898-383884=14
[080422-174644] 8111   24  497848       123       0       2          384516-384511=5
[080422-174655] 8111   24  497855       124       0       1          384996-384993=3
[080422-174706] 8111   24  497860       124       0       1          385318-385313=5
[080422-174717] 8111   24  497864       124       0       0          385745-385739=6


3) The CPU % of each server mostly mirrors the load differences seen above; 01 and 04 are taking a lot less load than 02 and 03.

01    1% average busy
02    16% average busy
03    20% average busy
04    8% average busy

CPU (pcpu) taken from the pms.sh (ps) file.

01:
04/22-[17:41:49] ------- 12489 5370536 5331048  1.1    growth:   0 k

02:
04/22-[17:41:39] ------- 6236 2216344 2193576 15.3    growth:   8 k

03:
04/22-[17:42:41] ------- 7930 2214432 2192016 21.3    growth:   -40 k

04:
04/22-[17:42:48] ------- 10554 3717880 3695216  9.1    growth:   -360 k


4) Iostats (disk performance). Since the servers are using Veritas File System, I cannot tell which disk slices the DS runs on versus which disks show higher-than-normal iostat numbers.

The DS is installed in the following location for all servers:

- /dev/vx/dsk/app/dsserver 1048576   18603  965640     2%    /opt/servers/slapd-master

01    Disks peak to 61% busy     (1 time)   GOOD
02    Disks peak to 58% busy     (1 time)   GOOD
03    Disks peak to 100% busy    (2 times)  FAIR
04    Disks peak to 100% busy    (8 times)  BAD

As a result of the 100% disk busy times the average disk io service time and service wait times increase.

01    avg service time 8.8/ms - wait service 0/ms
02    avg service time 35/ms - wait service 0/ms
03    avg service time 199.5/ms - wait service 0/ms
04    avg service time 105.6/ms - wait service 373.5/ms

01
Tue Apr 22 17:41:07 2008
                    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
   0.0    2.0    0.0    0.0  0.0  0.0    0.0    3.6   0   0 c1
   1.0    4.0    0.0    0.1  0.0  0.0    0.0    8.7   0   0 c2
  61.0   34.0    0.6    1.4  0.0  0.9    0.0    9.9   0  10 c4
 102.0   47.0    1.1    2.0  0.0  1.3    0.0    8.7   0  10 c5
   0.0    1.0    0.0    0.0  0.0  0.0    0.0    0.5   0   0 c4t0d1
  61.0   32.0    0.6    1.4  0.0  0.9    0.0   10.1   0  60 c4t0d2
   0.0    1.0    0.0    0.0  0.0  0.0    0.0    0.4   0   0 c4t0d5
 102.0   44.0    1.1    1.9  0.0  1.3    0.0    8.8   0  61 c5t1d2
   0.0    3.0    0.0    0.1  0.0  0.0    0.0    0.7   0   0 c5t1d5
   0.0    2.0    0.0    0.0  0.0  0.0    0.0    3.6   0   1 c1t2d0
   0.0    4.0    0.0    0.1  0.0  0.0    0.0    8.5   0   3 c2t2d0
   1.0    0.0    0.0    0.0  0.0  0.0    0.0    9.5   0   1 c2t0d0

02
Tue Apr 22 17:40:37 2008
                    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
   1.0   23.0    0.0    1.2  0.0  1.0    0.0   41.0   0   2 c1
   0.0   19.0    0.0    1.1  0.0  1.2    0.0   61.1   0   2 c2
   3.0  701.0    0.0    5.8  0.0 24.6    0.0   34.9   0  20 c4
   1.0  648.0    0.0    5.3  0.0 23.5    0.0   36.3   0  20 c5
   0.0    4.0    0.0    0.0  0.0  0.0    0.0    9.6   0   1 c4t0d0
   1.0  351.0    0.0    2.9  0.0 12.4    0.0   35.1   0  58 c4t0d2
   2.0  290.0    0.0    2.3  0.0  9.7    0.0   33.3   0  50 c4t0d3
   0.0   56.0    0.0    0.5  0.0  2.5    0.0   43.9   0  10 c4t0d5
   0.0    4.0    0.0    0.1  0.0  0.0    0.0    7.7   0   1 c5t1d0
   0.0  323.0    0.0    2.6  0.0 11.7    0.0   36.4   0  57 c5t1d2
   1.0  269.0    0.0    2.2  0.0  9.1    0.0   33.8   0  54 c5t1d3
   0.0   52.0    0.0    0.4  0.0  2.6    0.0   50.2   0   9 c5t1d5
   1.0    0.0    0.0    0.0  0.0  0.0    0.0    5.4   0   1 c1t8d0
   0.0    4.0    0.0    0.1  0.0  0.0    0.0    5.9   0   2 c1t2d0
   0.0    9.0    0.0    0.6  0.0  0.6    0.0   61.6   0  13 c1t12d0
   0.0   10.0    0.0    0.6  0.0  0.4    0.0   40.0   0   9 c1t9d0
   0.0    8.0    0.0    0.5  0.0  0.6    0.0   76.2   0  13 c2t12d0
   0.0   11.0    0.0    0.6  0.0  0.6    0.0   50.1   0  12 c2t9d0

03
Tue Apr 22 17:42:06 2008
                    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
   0.0   34.0    0.0    1.5  0.0  1.7    0.0   49.7   0   3 c1
   0.0   33.0    0.0    1.7  0.0  2.1    0.0   63.3   0   3 c2
   0.0  556.0    0.0    4.5  0.0 89.8    0.0  161.5   0  23 c4
   2.0  279.0    0.0    3.2  0.0 12.2    0.0   43.3   0  15 c5
   0.0    9.0    0.0    0.1  0.0  0.0    0.0    5.3   0   3 c4t8d0
   0.0  103.0    0.0    0.9  0.0  1.1    0.0   11.1   0  38 c4t10d0
   0.0  444.0    0.0    3.5  0.0 88.6    0.0  199.5   0 100 c4t11d0
   2.0  249.0    0.0    2.0  0.0 11.0    0.0   43.8   0  76 c5t10d0
   0.0    9.0    0.0    0.1  0.0  0.0    0.0    4.1   0   3 c5t11d0
   0.0   21.0    0.0    1.1  0.0  1.1    0.0   53.6   0  13 c5t13d0
   0.0   10.0    0.0    0.6  0.0  0.7    0.0   73.7   0  13 c2t11d0
   0.0    8.0    0.0    0.4  0.0  0.4    0.0   52.3   0  11 c2t9d0
   0.0   15.0    0.0    0.7  0.0  0.9    0.0   62.3   0  12 c2t12d0
   0.0    1.0    0.0    0.0  0.0  0.0    0.0    6.8   0   1 c1t2d0
   0.0   11.0    0.0    0.6  0.0  0.5    0.0   44.1   0  13 c1t11d0
   0.0   14.0    0.0    0.6  0.0  0.8    0.0   55.2   0  12 c1t9d0
   0.0    7.0    0.0    0.3  0.0  0.4    0.0   59.3   0  12 c1t12d0
   0.0    1.0    0.0    0.0  0.0  0.0    0.0   12.0   0   1 c1t3d0

04
Tue Apr 22 17:42:23 2008
                    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
   0.0  299.1    0.0    2.3 100.8 29.8  337.1   99.6  16  17 c1
   0.0  349.1    0.0    2.7 107.5 29.6  308.0   84.9  16  17 c2
   0.0    5.0    0.0    0.1  0.0  0.0    0.0    4.5   0   0 c4
   0.0   48.0    0.0    2.6  0.0  0.6    0.0   12.5   0   2 c5
   0.0    5.0    0.0    0.1  0.0  0.0    0.0    4.5   0   2 c4t8d0
   0.0    2.0    0.0    0.0  0.0  0.0    0.0   12.4   0   2 c5t8d0
   0.0   22.0    0.0    1.2  0.0  0.3    0.0   12.2   0   4 c5t9d0
   0.0   22.0    0.0    1.2  0.0  0.3    0.0   13.7   0   4 c5t10d0
   0.0    1.0    0.0    0.1  0.0  0.0    0.0    4.7   0   0 c5t11d0
   0.0    1.0    0.0    0.1  0.0  0.0    0.0    1.8   0   0 c5t12d0
   0.0    1.0    0.0    0.0  0.0  0.0    0.0   12.5   0   1 c2t11d0
   0.0  181.1    0.0    1.4 51.9 14.6  286.7   80.7  92 100 c2t0d0
   0.0    1.0    0.0    0.0  0.0  0.0    0.0   11.5   0   1 c2t13d0
   0.0    1.0    0.0    0.0  0.0  0.0    0.0    4.1   0   0 c2t4d0
   0.0  164.0    0.0    1.3 55.6 15.0  339.0   91.4 100 100 c2t10d0
   0.0    1.0    0.0    0.0  0.0  0.0    0.0    7.4   0   1 c2t5d0
   0.0    4.0    0.0    0.1  0.0  0.0    0.0    6.5   0   3 c1t11d0
   0.0    1.0    0.0    0.0  0.0  0.0    0.0    7.3   0   1 c1t13d0
   0.0  142.0    0.0    1.1 53.0 15.0  373.2  105.6 100 100 c1t0d0
   0.0    1.0    0.0    0.0  0.0  0.0    0.0   12.3   0   1 c1t4d0
   0.0    1.0    0.0    0.0  0.0  0.0    0.0    6.4   0   1 c1t3d0
   0.0  148.0    0.0    1.1 47.8 14.7  322.9   99.5  92 100 c1t10d0
   0.0    1.0    0.0    0.0  0.0  0.0    0.0    7.7   0   1 c1t2d0
   0.0    1.0    0.0    0.0  0.0  0.0    0.0    9.5   0   1 c1t5d0


5) Other ns-slapd instances running on these servers. There are 5 total directory server instances on each of the 4 servers, so I cannot tell whether the other ns-slapd instances are loading the disks to the levels seen above. Significant load on the unknown instances could impact the disks used by the known instances and affect performance.

6) Dirtracer itself is not able to complete a normal MainLoop check in the normal time as configured; Dirtracer was configured to loop through each check in the MainLoop at 1 second INTERVALs.

Each loop executes a pstack, prstat, netstat, cn=monitor search and 1 x backend cache info search (1 backend in this case). Each of these data points is very lightweight and completes sub-second.

On average, each server completes a Loop in 5-41 seconds. It is unknown why these servers cannot complete these loops as fast as configured.

See the following examples: each Loop shows the date/time stamp when the loop starts...note how many seconds had passed between each loop (should have been 1 second between loops).

01
* Loop 0 - 080422-174027
* Loop 1 - 080422-174106
* Loop 2 - 080422-174148
* Loop 3 - 080422-174228
* Loop 4 - 080422-174308

02
* Loop 0 - 080422-174030
* Loop 1 - 080422-174036
* Loop 2 - 080422-174041
* Loop 3 - 080422-174045
* Loop 4 - 080422-174050

03
* Loop 0 - 080422-174143
* Loop 1 - 080422-174148
* Loop 2 - 080422-174153
* Loop 3 - 080422-174158
* Loop 4 - 080422-174203

04
* Loop 0 - 080422-174139
* Loop 1 - 080422-174150
* Loop 2 - 080422-174201
* Loop 3 - 080422-174213
* Loop 4 - 080422-174225

LDAP INFO:

1) A quick check on the Dirtracer search etimes shows each is sub-second.

On 04 only a few were 0.1 to 0.2 seconds. The following were taken from the captured access log snippets.

root[/customerdata/6xxxxxxx/Apr22/04-dirtracernxcspld4-080422/ds-logs/working]#cat conn-49* | awk '{print $11}' | grep etime | grep -v etime=0\.0
etime=0.246540
etime=0.118270
etime=0.110770
etime=0.143630

2) Using the logs I found there was a Siteminder Policy Server configured to use an improper search base (dc=com). For this data set, only 01 doesn't have any Siteminder searches using the incorrect base; 02, 03 and 04 all do, with 04 having the most. (A simple way to reproduce these counts is shown after the list below.)

01:    None
02:    4604            dc=com searches
03:    5457            dc=com searches
04:    13749          dc=com searches
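For reference, counts like these can be pulled straight from the access logs with grep; the log file names here are placeholders for whatever your capture contains:

# count searches that use the bad base, and the resulting err=32 results
grep -c 'SRCH base="dc=com"' access*
grep -c 'err=32' access*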


I was able to track down the IP in only 1 server (04), as the connections in the others were already running. I need to check old logs to see if there are other IP addresses these are coming from.

10.10.10.1

[22/Apr/2008:17:04:52 -0400] conn=496746 op=-1 msgId=-1 - fd=79 slot=79 LDAP connection from 10.10.10.1 to 10.10.10.181
[22/Apr/2008:17:04:52 -0400] conn=496746 op=0 msgId=1 - BIND dn="uid=siteminder,ou=service,dc=xxxxxx,dc=com" method=128 version=3
[22/Apr/2008:17:04:52 -0400] conn=496746 op=0 msgId=1 - RESULT err=0 tag=97 nentries=0 etime=0.000910 dn="uid=siteminder,ou=service,dc=xxxxxx,dc=com"
...
[22/Apr/2008:17:04:52 -0400] conn=496746 op=3 msgId=4 - SRCH base="dc=com" scope=2 filter="(&(uniqueMember=uid=userremoved,ou=People,dc=xxxxxx,dc=com)(objectClass=groupOfUniqueNames*))" attrs="cn"
[22/Apr/2008:17:04:52 -0400] conn=496746 op=3 msgId=4 - RESULT err=32 tag=101 nentries=0 etime=0.000540
...
[22/Apr/2008:17:04:52 -0400] conn=496746 op=5 msgId=6 - SRCH base="dc=com" scope=2 filter="(&(uniqueMember=uid=userremoved,ou=People,dc=xxxxxx,dc=com)(objectClass=groupOfUniqueNames*))" attrs="cn"
[22/Apr/2008:17:04:52 -0400] conn=496746 op=5 msgId=6 - RESULT err=32 tag=101 nentries=0 etime=0.000480
...
[22/Apr/2008:17:04:52 -0400] conn=496746 op=7 msgId=8 - SRCH base="dc=com" scope=2 filter="(&(uniqueMember=uid=userremoved,ou=People,dc=xxxxxx,dc=com)(objectClass=groupOfUniqueNames*))" attrs="cn"
[22/Apr/2008:17:04:52 -0400] conn=496746 op=7 msgId=8 - RESULT err=32 tag=101 nentries=0 etime=0.000460
...


The above connection (conn=496746) alone contains 1195 searches using the incorrect base.

root[/customerdata/6xxxxxx/Apr22/04-dirtracernxcspld4-080422/ds-logs]#gc err=32 conn-496746.log
1195


3) High etimes / high-etime servers. Overall, server 04 is the biggest high-etime server, followed by 01. (A sample one-liner for ranking etimes appears after the lists below.)

Top 5 High Etimes

01
etime=413.555070    1      
etime=300.814310 notes=F    1      
etime=115.912590    1      
etime=39.566400    1     
etime=39.553510 notes=F    1       

02
etime=8.719340    1      
etime=6.210240    1      
etime=6.101450    1      
etime=5.814920    1      
etime=5.802740    1    

03
etime=7.325220    1      
etime=5.766010    1      
etime=5.558690    1      
etime=5.186940    1      
etime=5.147010    1

04
etime=1876.043450 notes=U    1      
etime=1609.753640 notes=U    1      
etime=1275.956850 notes=U    1      
etime=1251.870940 notes=U    1 

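A quick way to rank etimes like this from the access logs is a pipeline along these lines (illustrative; adjust the file names to your capture):

# list the ten largest etimes found in the access logs
grep RESULT access* | sed 's/.*etime=//' | awk '{print $1}' | sort -rn | head -10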
We now know 04's highest offending etimes come from 4 cn=* searches from a certain application, IP 10.10.10.225.

nentries=598115                 4
[22/Apr/2008:16:48:50 -0400] conn=496301 op=-1 msgId=-1 - fd=47 slot=47 LDAP connection from 10.10.10.225 to 10.10.10.181
[22/Apr/2008:16:48:50 -0400] conn=496301 op=0 msgId=1 - BIND dn="uid=appl100,ou=service,dc=xxxxxx,dc=com" method=128 version=3
[22/Apr/2008:16:48:50 -0400] conn=496301 op=0 msgId=1 - RESULT err=0 tag=97 nentries=0 etime=0.002810 dn="uid=appl100,ou=service,dc=xxxxxx,dc=com"
[22/Apr/2008:16:48:50 -0400] conn=496301 op=1 msgId=2 - SRCH base="ou=people,dc=xxxxxx,dc=com" scope=1 filter="(cn=*)" attrs="uid givenName sn mail region modifiers uid modifyTimestamp newOSUser OSmanaged OSstatus"
[22/Apr/2008:17:15:39 -0400] conn=496301 op=1 msgId=2 - RESULT err=0 tag=101 nentries=598115 etime=1609.753640 notes=U
[22/Apr/2008:17:15:40 -0400] conn=496301 op=2 msgId=3 - UNBIND
[22/Apr/2008:17:15:40 -0400] conn=496301 op=2 msgId=-1 - closing - U1
[22/Apr/2008:17:15:40 -0400] conn=496301 op=-1 msgId=-1 - closed.


CONFIGURATION INFO:

1) The Dirtracer captures highlighted differences in cache settings. Servers 02 and 03 have 1 GB of total cache, whereas server 01 has 3 GB and server 04 has 2 GB.

01
       Total DB Cache:     1,048,576 Kb  (1073741824 bytes)
       Total Entry Cache:  2,441,406 Kb  (2500000000 bytes)
       --------------------------------------------------
       Total Cache Size:   3 Gb,  3,408 Mb,  3,489,982 Kb,  (3573741824 bytes)

02
       Total DB Cache:     1,048,576 Kb  (1073741824 bytes)
       Total Entry Cache:  1,024,000 Kb  (1048576000 bytes)
       --------------------------------------------------
       Total Cache Size:   1 Gb,  2,024 Mb,  2,072,576 Kb,  (2122317824 bytes)

03
       Total DB Cache:     1,048,576 Kb  (1073741824 bytes)
       Total Entry Cache:  1,024,000 Kb  (1048576000 bytes)
       --------------------------------------------------
       Total Cache Size:   1 Gb,  2,024 Mb,  2,072,576 Kb,  (2122317824 bytes)

04
       Total DB Cache:     1,048,576 Kb  (1073741824 bytes)
       Total Entry Cache:  1,126,400 Kb  (1153433600 bytes)
       --------------------------------------------------
       Total Cache Size:   2 Gb,  2,124 Mb,  2,174,976 Kb,  (2227175424 bytes)



2) None of the four servers uses a tmpfs (tmp file system) for its db home; this affects performance on Unix systems.

"UNIX only. Used to fix a situation on UNIX platforms where the operating system endlessly flushes pages. This flushing can be so excessive that performance of the entire system is severely degraded."

   See: nsslapd-db-home-directory http://docs.sun.com/source/816-6699-10/plugattr.html

Each is using the main veritas partition for the db home.

   db-home-directory: /opt/servers/slapd-master/db
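If the customer wanted to test a tmpfs-backed db home, the change would look something like the following. This is an example only: I am assuming the DS 5.x location of the attribute (cn=config,cn=ldbm database,cn=plugins,cn=config), the target path is made up, and the server must be restarted for the change to take effect; check the plugattr doc referenced above for your version.

# example only: point the db home at a tmpfs-backed directory such as /tmp
DMPASS=yourpassword             # Directory Manager password (placeholder)
ldapmodify -h localhost -p 389 -D "cn=Directory Manager" -w "$DMPASS" <<EOF
dn: cn=config,cn=ldbm database,cn=plugins,cn=config
changetype: modify
replace: nsslapd-db-home-directory
nsslapd-db-home-directory: /tmp/slapd-master-db
EOF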

3) Each server uses VXFS (Veritas File System), which has known fsync (file sync) problems when flushing Dirty Pages from Memory/Cache to the DBs. This is done mostly during the transaction checkpoint (txn_checkpoint). The fsync problem causes an overall slowdown of the DS process, which will affect etimes on all other operations while the checkpoint occurs. I have seen this on Veritas 4.4, and the customer is running an even older version.

   NAME:  VERITAS File System
   VENDOR:  VERITAS Software
   PSTAMP:  VERITAS-3.4FS-2003-01-29PID=110435-08

The problem can be seen using the cn=monitor search in the DS to show read waiters and the request queue backlog. See the cn=monitor summary snippet at the top (above). Anything more than 0 for either is not optimal. Each of the four servers has between 1 and 3 read waiters or request queue backlogs at any given time, though generally each is running with 0 most of the time.

4) I asked a while ago if the customer did the following.

  • Upgraded from Directory Server Patch 4 - YES
  • Used the ActivateFix script in Patch 6 - NO


I saw these errors in the logs from servers 01, 03 and 04.

[22/Apr/2008:17:47:00 -0400] - WARNING<10307> - Replication  - conn=-1 op=-1 msgId=-1 - Configuration warning  Tombstone purging internal search is unindexed in replica dc=xxxxxx,dc=com


It is possible they have hit bug 6175472 and are seeing some result of the tombstone purging performance hit. The allids threshold is only 4000 on these servers. Tombstones are DELeted entries waiting to be purged. If the number of deleted entries has gone above 4k, the objectclass index becomes allids, and the search for tombstone entries is unindexed and takes a long time.

6175472 MMR: High performance degradation when purging tombstones

I verified the dse.ldif files to see if the fix was enabled. Part of the fix places an nsMatchingRule OID in the index entries, as seen in the dse.ldif. The matching rule is NOT in any indexes of any of the servers' dse.ldif files:

nsMatchingRule: 1.3.6.1.4.1.42.2.27.9.4.150


01:
root[/customerdata/6xxxxxxx/Apr22/01-dirtracernxcspld1-080422/ds-config-data]#grep -ic 'nsMatchingRule: 1.3.6.1.4.1.42.2.27.9.4.150' dse.ldif
0

02:
root[/customerdata/6xxxxxxx/Apr22/02-dirtracernxcspld2-080422/ds-config-data]#grep -ic 'nsMatchingRule: 1.3.6.1.4.1.42.2.27.9.4.150' dse.ldif
0

03
root[/customerdata/6xxxxxxx/Apr22/03-dirtracernxcspld3-080422/ds-config-data]#grep -ic 'nsMatchingRule: 1.3.6.1.4.1.42.2.27.9.4.150' dse.ldif
0

04
root[/customerdata/6xxxxxxx/Apr22/04-dirtracernxcspld4-080422/ds-config-data]#grep -ic 'nsMatchingRule: 1.3.6.1.4.1.42.2.27.9.4.150' dse.ldif
0

Even though the above data was captured when there were no Directory or Siteminder server problems, the Dirtracer data helped to highlight the following:

  1. problematic/incorrect directory server configuration/architecture
  2. possible load balancing problems; i.e. sending more connections to two servers as opposed to being completely load balanced
  3. disk capacity/io peak issues, possibly from other heavily loaded ns-slapd instances
  4. incorrect application configuration
  5. overall etimes of the servers

[LT]

Tuesday May 27, 2008

Pkg_app in a Clustered Environment

Hello all,

I was recently contacted by a Sun Customer Advocate regarding Pkg_app, Dirtracer and Clustered environments.

First, some up-front info on Pkg_app. Pkg_app was originally created by Rick Zagorski, and I took over its development in 2006. I took what Rick had coded and added to it specifically (at the time) to help with Directory Server core/gcore files. This was released as pkg_app 2.0 and was a mash of Rick's code with some additions of mine. As you know, I created Dirtracer (Stracer) in 2004 and continue its development to help customers gather data on their Directory Server problems.

The Customer was having issues with Pkg_app and Dirtracer when they attempted to use them in a Clustered Environment. Even though the customer was running an old version of Pkg_app, Pkg_app and Dirtracer are "not" tested or certified for use in a Clustered Environment. I have only had one prior request to make pkg_app/dirtracer Cluster aware, so it has not been at the top of the revision list.

I will be revamping the Pkg_app code to take into account all libraries etc. that Pkgcore gets for use with Java Cores and will look into the Cluster issue.

More news on this later.

[LT]

Wednesday May 21, 2008

Checks, Intervals and Loops Oh My - Part 3

This is my final blog on the Number of Checks and Interval settings within Dirtracer.

Let's discuss one of the most basic elements of Dirtracer...the Main Performance Gathering Loop and two of its governing parameters.

        NUMBEROFCHECKS and INTERVAL

Definitions:

    NUMBEROFCHECKS    # Number of checks: total number of loops
    INTERVAL            # Interval: seconds between loops

These two parameters tell Dirtracer how long to run and how many Main Loop data points to gather.  The settings for these two parameters are totally dependent on the problem type you are gathering for.
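For example, a dirtracer.config fragment like the one below (the values are illustrative only) produces 30 trips through the Main Loop, 5 seconds apart, for roughly 150 seconds of Main Loop run time:

# dirtracer.config fragment (illustrative values)
NUMBEROFCHECKS="30"     # total number of Main Loop passes
INTERVAL="5"            # seconds between passes
# approximate Main Loop run time = NUMBEROFCHECKS x INTERVAL = 150 seconds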

The following are the data points normally gathered in the main loop; these are all configurable, of course, so that all, some, or none are gathered at any given time.

  • netstat info
  • ls -la from the transaction logs
  • pstacks
  • prstats
  • cn=monitor searches
  • main cache searches; 1 search per loop.
  • db cache searches; 1 search per backend per loop
  • gcore(s)


As mentioned above, an Admin can set the Number of Checks or "data points" to gather. Let's look at some example problem types versus the settings you may want to capture them with.

The Number of Checks and Interval parameters govern how many loops the Main Loop makes and how long it runs for; however, they are also used when executing the following commands. These commands are launched before the Main Loop is engaged and will run in the background while the Main Loop is active.

  • Iostats
  • Vmstats
  • Mpstats
  • Pms.sh

See the man pages for iostat, vmstat and mpstat for more information on these commands.

Please see my previous blog on pms.sh (Pmonitor) for more information regarding this script.


[LT]
About

A Tech Blog about the Sun Java Systems Dirtracer Toolkit. Dirtracer and this blog are written and maintained by Lee Trujillo, an Oracle Senior Principal Support Engineer.
