Thursday Nov 14, 2013

Performance Analysis of Oracle Traffic Director or Web Server

Performance Analysis of Oracle Trafiic Director or Oracle iPlanet Web Server using Oracle Sun Solaris Studio Performance Analyzer and DTrace scripts.

In this blog I will show how to use the Sun Studio Collector and Performance Analyzer and DTrace script to measure the performance.

1. Using Oracle Solaris Studio 12.2 Performance Analyzer with Oracle Traffic Director or Oracle iPlanet Web Server

Thanx to Basant for teaching me collector and analyzer and thanx to Julien for his help on DTrace scripts.

Install Sun Studio 12. Lets say is installed in /opt/SUNWSpro.

1.1 setting up ~/.er.rc

cat ~/.er.rc
dmetrics e.user:i.user:e!wall:i!wall:e.total:i.total:e.system:i.system:e!wait: \
i!wait:e!lock:i!lock:e!text:i!text:e!data:i!data:e!owait:i!owait:!size:!address:name
dsort e.user
scc basic:version:warn:parallel:query:loop:pipe:inline:memops:fe:cg
dcc basic:version:warn:parallel:query:loop:pipe:inline:memops:fe:cg:src
sthresh 75
dthresh 75
name long
view user
tlmode thread:root:depth=10
tldata sample:clock:hwc:heaptrace:synctrace:mpitrace:msgtrace:datarace
setpath $expts:.
tabs functions:callers-callees:source:disasm:timeline:header:
en_desc on

1.2 Collecting Data Using the collect Command

1.2.1& Run the Collector  using the collect command

# collect collect-options program program-arguments

For OTD or Web Server we edit start script bin/startserv to have a new --collect option as shown below 

Red color lines are the new lines I have added.  Copied all the lines in start option, replaced ${SERVER_BIN} (which is trafficd-wdog) by "trafficd". And added "collect $COLLECT_OPTS" before it.

In the section below replace <profiler directory> by the directory where you want the profiler to collect data.

case $COMMAND in
    --start|-start) 
        ${SERVER_BIN} -d "${SERVER_CONFIG_DIR}" \ 
            -r "${OTD_PRODUCT_HOME}" -t "${SERVER_TEMP_DIR}" \
            -u "${SERVER_USER}" ${SVC_OPT} $@
        STATUS=$?
        if [ $STATUS -ne 0 ] ; then 
             exit $STATUS
        fi
        enable_failover
        ;;
    --collect|-collect)
        PATH=$PATH:/opt/SUNWspro/bin
        COLLECT_OPTS="-t 180-300 -F all -d <profiler directory>";
        collect $COLLECT_OPTS trafficd -d "${SERVER_CONFIG_DIR}" \
            -r "${OTD_PRODUCT_HOME}" -t "${SERVER_TEMP_DIR}" \ 
            -u "${SERVER_USER}" ${SVC_OPT} $@
        STATUS=$?
        if [ $STATUS -ne 0 ] ; then
            exit $STATUS
        fi
        ;;

for web server instead of trafficd it will be webservd. 

1.2.2 start the server using --collect option

Since we replaced trafficd-wdog by trafficd, the server will start up without any watchdog process and will run in console, not in background.  

bin/startserv --collect

1.2.3 run some stress tests

1.2.4 After 5-6 minutes stop the server

This will create a directory called (e.g.) 'test.1.er' which contains the experiment.

The default name for a new experiment is test.1.er.

The Collector automatically increments n by one in the names of subsequent experiments.

1.3 Open the profile and start the Oracle Sun Studio 12.2 analyzer

Set DISPLAY env.

$ cd <profiler directory>
$ export JAVA_PATH=/opt/SUNWwbsvr/jdk 
$ /opt/SUNWspro/bin/analyzer test.1.er/

1.4 The er_print utility prints an ASCII version of the various displays supported by the Performance Analyzer.

$ /opt/SUNWspro/bin/er_print -outfile er_print1.out -functions test.1.er 

In this, functions are sorted by "Exclusive User CPU Time"

or

$ /opt/SUNWspro/bin/er_print -outfile er_print2.out \
-metrics e.user:i.user:e.%user:i.%user:e.total:i.total:e.%total:i.%total:name \
-sort i.total -functions test.1.er 

In this, functions are sorted by "Inclusive Total LWP Time"

You can look at these files and figure out which function is taking how much time.

These files look like :

Functions sorted by metric: Exclusive User CPU Time 
Excl.     Incl.     Excl.       Incl.       Excl.     Incl.      Name  
User CPU  User CPU  Total LWP   Total LWP   Sys. CPU  Sys. CPU         
   sec.      sec.         sec.        sec.     sec.      sec.     
851.666   851.666   105477.573  105477.573  106.525   106.525    <Total>
493.435   493.435      558.110     558.110   26.368    26.368    fn1
 56.840   326.308       64.015     368.568    3.042    17.352    fn2
 28.280    28.280       34.574      34.574    1.701     1.701    fn3

...


2. Using DTrace script to see how much time is spent in which function

#!/usr/sbin/dtrace -s
#pragma D option bufsize=1g
#pragma D option specsize=1g
#pragma D option aggsize=1g
#pragma D option dynvarsize=1g

pid$target:::entry
{
 self->ts[probefunc] = timestamp;
}

pid$target:::return
/self->ts[probefunc]/
{
    @time[probefunc, probemod] = sum(timestamp - self->ts[probefunc]);
    self->ts[probefunc] = 0;
}

Note I have given 1g sizes you can tune it as per your machine.

Run this D script using :

#sudo dtrace -s functime.d -p 27910 -o dtrace.log 

where 27910 is the pid of the process you are examining (in this case webservd or trafficd).

This will generate the output as shown below 

  PListFindValue                       libns-httpd40.so                                        3871
  getbucketnum                         libc.so.1                                               4995
  R_SSL_version                        libnnz11.so                                             6106
  http_format_server                   libns-httpd40.so                                        6807
  void SimpleHashTemplateBase::_insert(unsigned long,hashEntry*)  libns-httpd40.so             7059
  long atoi64(const char*)             libns-httpd40.so                                        7288

...


I wrote this wrapper script to report percentages.

To get the last column I used logic of looking for spaces after the string "lib". For LM1`ld.so.1, I temporarily added a hack.

#!/usr/bin/perl
$logfile = "dtrace.log";
$tmpfile = "temp";
open(IN,"<",$logfile) || die "Can not open $logfile: $!";
$total = 0;
$total_rounded = 0;
while (<IN>) {
    chomp($_);
    s/(.*\s*)lib([^\s]*)\s*([0-9]*)/$1lib$2=$3/g;
    my ($a, $b) = split('=');
    $total += $b;
}
print "total = $total\n";
close(IN);
open(OUT,">", $tmpfile) || die "Can not open $tmpfile: $!";
open(IN,"<",$logfile) || die "Can not open $logfile: $!";
while (<IN>) {
    chomp($_);
    s/LM1`ld.so.1/libld.so.1/g;
    s/(.*\s*)lib([^\s]*)\s*([0-9]*)/$1lib$2=$3/g;
    my ($a, $b) = split('=');
    $rounded = sprintf("%.10f", ($b*100)/$total);
    $a =~ s/libld.so.1/LM1-ld.so.1/g;
    print OUT "$b   $rounded%   $a\n";
    $total_rounded += $rounded;
}
close(IN);
close(OUT);
print "total rounded = $total_rounded\n";
`sort -n -r $tmpfile | tee $logfile.sorted`;
`rm $tmpfile`


This produced output of the following format :

20318357230394   16.4117840797%     poll                                      libc.so.1
20317702791746   16.4112554688%     _pollsys                                  libc.so.1
20313615393944   16.4079539474%     __pollsys                                 libc.so.1
2684593654698   2.1684317735%     int DaemonSession::GetConnection()          libns-httpd40.so

...

3. Using DTrace Profile Probes

 #!/usr/sbin/dtrace -s
profile-1000
/pid == $1 && arg1 != NULL/
{
    @proc[umod(arg1), ufunc(arg1),ustack()] = count(); 
}
END
{
    printa(@proc); 
}


run it as :

dtrace -x ustackframes=20 -s profiler-probes.d <pid> -o dtrace.log

It creates output in the format library name, function name, user stack and the count number of times called, newline

  libc.so.1                                           libc.so.1`mutex_lock_impl        
              libc.so.1`mutex_lock_impl+0xaf
              libc.so.1`mutex_lock+0xb
              libumem.so.1`umem_cache_alloc+0x33
              libumem.so.1`umem_alloc+0xaf
              libumem.so.1`malloc+0x2e
              libnnz11.so`sys_malloc+0x23
             1068

....



4. References

Wednesday Oct 07, 2009

About LDAP connections in Sun Web Server 7.0

About LDAP connections in Sun Web Server 7.0 

When I have in my server.xml confiured an Sun LDAP server as given below  :

  <auth-db>
    <name>ldap</name>
    <url>ldap://localhost:1369/o%3DTestCentral</url>
    <property>
        <name>binddn</name>
        <value>cn=Directory Manager</value>
    </property>
    <property>
        <name>bindpw</name>
        <value>...</value><encoded/>
    </property>
  </auth-db>

When I have ACL set

acl "uri=/";
deny (all) user="anyone"; 
allow (all) user="all";  

When I send a request via browser as user "alpha" and its correct password.
First it binds as the user specified in server.xml "binddn" property ( in this case "cn=Directory Manager")
Then sends a LDAP search query with filter "uid=alpha".
Third step it does is it binds as that user "alpha" with the password user specified. If bind is successful, access is allowed.

Here are the entries I get in LDAP server access logs:

[07/Oct/2009:13:16:16 +051800] conn=2 op=-1 msgId=-1 - fd=34 slot=34 LDAP connection from 127.0.0.1 to 127.0.0.1
[07/Oct/2009:13:16:16 +051800] conn=2 op=0 msgId=1 - BIND dn="cn=Directory Manager" method=128 version=3
[07/Oct/2009:13:16:16 +051800] conn=2 op=0 msgId=1 - RESULT err=0 tag=97 nentries=0 etime=0 dn="cn=directory manager"

[07/Oct/2009:13:16:16 +051800] conn=2 op=1 msgId=2 - SRCH base="o=testcentral" scope=2 filter="(uid=alpha)" attrs="c"
[07/Oct/2009:13:16:16 +051800] conn=2 op=1 msgId=2 - RESULT err=0 tag=101 nentries=1 etime=0

[07/Oct/2009:13:16:16 +051800] conn=2 op=2 msgId=3 - BIND dn="uid=alpha,o=TestCentral" method=128 version=3
[07/Oct/2009:13:16:16 +051800] conn=2 op=2 msgId=3 - RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=alpha,o=testcentral"

To understand why we see these three entries, I started debugging Web Server from LASUserEval function and came at get_is_valid_password_ldap function. Note that line numbers in Open Web Server may differ but the concepts are the same :

t@16 (l@16) stopped in get_is_valid_password_ldap at line 455 in file "ldapacl.cpp"
  467       if ((rv = ld->find_userdn(raw_user, basedn, &userdn)) == LDAPU_SUCCESS) {
(dbx) p filter
filter = "uid=alpha"
... 
t@16 (l@16) stopped in LdapSession::find_userdn at line 1253 in file "LdapOps.cpp"
 1253           retval = find(base, LDAP_SCOPE_SUBTREE, filter, attrs, 1 /\* no attrs \*/, res);
(dbx) s
...
t@16 (l@16) stopped in LdapSession::find at line 1174 in file "LdapOps.cpp"
 1174       res = search(base, scope, filter, (char \*\*)attrs, attrsonly, 0);
(dbx) p base
base = 0x6ed6048 "o=TestCentral"
(dbx) p filter
filter = 0xfbfbc98c "uid=alpha"
...
(dbx) n
t@16 (l@16) stopped in LdapSession::search at line 289 in file "LdapSession.cpp"  
289           rv = bindAsDefault();
(dbx) s
t@16 (l@16) stopped in LdapSession::bindAsDefault at line 235 in file "LdapSession.cpp"
  235       int msg_id = ldap_simple_bind(session, ldapRealm->getBindName(), ldapRealm->getBindPwd());
(dbx) p msg_id
msg_id = 1
...
(dbx) n
t@16 (l@16) stopped in LdapSession::bindAsDefault at line 240 in file "LdapSession.cpp"
  240       int ret = ldap_result( session, msg_id, 0, &time_out, &res);
(dbx) p ret
ret = 97
...

As we can see Web Server first tries to bindAsDefault ie. bind as cn=Directory Manager. by looking at LDAP Server logs, we confirm this.

[07/Oct/2009:14:52:03 +051800] conn=4 op=-1 msgId=-1 - fd=34 slot=34 LDAP connection from 127.0.0.1 to 127.0.0.1
[07/Oct/2009:14:52:03 +051800] conn=4 op=0 msgId=1 - BIND dn="cn=Directory Manager" method=128 version=3
[07/Oct/2009:14:52:03 +051800] conn=4 op=0 msgId=1 - RESULT err=0 tag=97 nentries=0 etime=0 dn="cn=directory manager"

When bind is successful, now it tries LDAP search on basedn="o=TestCentra" with filter "uid=alpha"

...
(dbx) n
t@16 (l@16) stopped in LdapSession::search at line 296 in file "LdapSession.cpp"  
296           rv = lastoprv = ldap_search_ext_s(session, base, scope, filter,
297                                      attrs, attrsonly,
298                                      NULL, NULL, &time_out, LDAP_NO_LIMIT, &search_results);
(dbx) p base
base = 0x6ed6048 "o=TestCentral"
(dbx) p scope
scope = 2
(dbx) p filter
filter = 0xfbfbc98c "uid=alpha"
...
(dbx) n
t@16 (l@16) stopped in LdapSession::find at line 1178 in file "LdapOps.cpp"
 1178           break;
(dbx) n
t@16 (l@16) stopped in LdapSession::find at line 1187 in file "LdapOps.cpp"
 1187       numEntries = res->entries();
(dbx) n
t@16 (l@16) stopped in LdapSession::find at line 1189 in file "LdapOps.cpp"
 1189       if (numEntries == 1) {
(dbx) p numEntries
numEntries = 1
(dbx) n
...
(dbx) n
t@16 (l@16) stopped in LdapSession::find_userdn at line 1278 in file "LdapOps.cpp"
 1278       \*dn = entry->DN();
(dbx) p \*dn
\*dn = 0x2d87b0 "uid=alpha,o=TestCentral"
(dbx)
...

Looking at LDAP server access logs it has created this "SRCH" entry :

[07/Oct/2009:14:55:51 +051800] conn=4 op=1 msgId=2 - SRCH base="o=testcentral" scope=2 filter="(uid=alpha)" attrs="c"
[07/Oct/2009:14:55:51 +051800] conn=4 op=1 msgId=2 - RESULT err=0 tag=101 nentries=1 etime=0

So at the end of find_userdn function, we get our userdn "uid=alpha,o=TestCentral". Coming back to debugger

(dbx) p userdn
userdn = 0x2d87b0 "uid=alpha,o=TestCentral"
(dbx)
t@16 (l@16) stopped in get_is_valid_password_ldap at line 531 in file "ldapacl.cpp"
  531                   rv = ld->userdn_password(userdn, raw_pw);
(dbx) s
...
(dbx) n
t@16 (l@16) stopped in LdapSession::userdn_password at line 1052 in file "LdapOps.cpp"
 1052       int msgid = ldap_simple_bind(session, userdn, password);
(dbx) p userdn
userdn = 0x2d87b0 "uid=alpha,o=TestCentral"
...
(dbx) n
t@16 (l@16) stopped in LdapSession::userdn_password at line 1058 in file "LdapOps.cpp"
 1058           rc = ldap_result(session, msgid, 0, &zerotime, &res );
(dbx) n
(dbx) p rc
rc = 97
...

As we can see Web Server tries to bind to LDAP server with userdn "uid=alpha,o=TestCentral" and we can confirm that by looking at LDAP server logs :

[07/Oct/2009:15:00:37 +051800] conn=4 op=2 msgId=3 - BIND dn="uid=alpha,o=TestCentral" method=128 version=3
[07/Oct/2009:15:00:37 +051800] conn=4 op=2 msgId=3 - RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=alpha,o=testcentral"



Also refer Jyri's blog for troubleshooting

Friday Jul 10, 2009

Installing Sun Java System Web Server 7.0 on CentOS 5.3 or Fedora 10-11 or Ubuntu

Installing Sun Java System Web Server 7.0 on CentOS 5.3 or Fedora 10/11 or Ubuntu

In my free time, I just tried out Sun Java System Web Server 7.0 update 5 on CentOS 5.3 and Fedora10 and 11 and Ubuntu. Joe Mccabe has also written a similar blog on this

Download Sun Java System Web Server 7.0 for Linux and extract the contents of tar.gz file

$gunzip  sjsws-7_0u5-linux-i586.tar.gz; tar -xvf sjsws-7_0u5-linux-i586.tar

Run installer, it will fail

$./setup

error while loading shared libraries: libstdc++.so.5: cannot open shared object file: No such file or directory

CentOS

To fix this problem  install "compat-libstdc++-33" package as shown below
$sudo yum -y install compat-libstdc++-33

Now run setup, it will work fine.

Fedora 10/11

Install "compat-libstdc++-33" package as shown below
$sudo yum -y install compat-libstdc++-33

There is one more problem in Fedora 11 but is I heard fixed in 7.0 update 6 (onwards).

If you get error message like

lib/libfreebl3.so: version `NSSRAWHASH_3.12.3' not found (required by /lib64/libcrypt.so.1)

You need workaround as given in https://forums.oracle.com/forums/thread.jspa?threadID=2212337

Ubuntu

You need to install libstdc++5 as shown below

$sudo apt-get install libstdc++5


If you get an error which has something that looks like /bin/domainname not found, you need to install

$sudo apt-get install nis

I have tested on Ubuntu 9.04.

*Note that Sun Java System Web Server 7.0 update 5 or 6 is not officially certified or supported on CentOS or Fedora or Ubuntu, but here's how you can make it work.

References

Outdated links :


Thursday May 14, 2009

Installing Sun Java System Web Server 7.0 on Mandriva

Installing Sun Java System Web Server 7.0 on Mandriva

If you are getting an error "The Runtime User ID does not exist. Specify a valid UNIX user." while installing Sun Java System Web Server on Mandriva, all you have to do is create a softlink as shown below.  Its simple !

#ln -s /bin/id /usr/bin/id

You can also check if /usr/bin/cat exists or not. If it doesn't, then you have to create a softlink as given below

#ln -s /bin/cat /usr/bin/cat
  

\*Note that Sun Java System Web Server is not officially certified or supported on Mandriva

Thursday Mar 15, 2007

Troubleshooting memory leaks and memory corruptions in Sun Java System Web Server 7.0

Troubleshooting memory leaks and memory corruptions in Sun Java System Web Server 7.0

In this blog I am trying to show how to use libumem and watchmalloc to find more information about memory leaks and memory corruptions in Sun Java System Web Server 7.0.

Using libumem to find memory leaks and memory corruptions

1) Disable pools
Add this line in magnus.conf
Init fn="pool-init" disable="true"
Note this is not a supported public interface.
2) Set environment variable UMEM_DEBUG
$export UMEM_DEBUG="default"
Refer man pages for umem_debug (3MALLOC) for more details.
3)  Start the Web Server instance
$./bin/startserv
4) Note down the pid of the webservd process
$ps -eaf |grep webservd
You will see two webservd processes, note down the highest pid.
5) Dump the initial core
$gcore -o core.pid.start pid
6) Run tests , send some requests which cause memory corruption or leaks.
7) Dump core again
gcore -o core.pid.end pid
8) Compare memory allocated between these two should show the leaks.
$mdb core.pid.start  
>::findleaks -d ! cat > findleaks.start.log
( Or you can also run $echo ::findleaks -d |mdb core.pid.start > findleaks.start.log )

$mdb core.pid.end
>::findleaks -d ! cat > findleaks.end.log

compare these two logs manually to see if there are leaks.
If these logs are unreadable you can use Sun studio's c++filt or in mdb give this command as shown below
> $G
C++ symbol demangling enabled
>

9) For memory corruptions, try umem_status, bufctl_audir and umem_verify commands in the last core:
$mdb core.pid.end
>::umem_status
...
If it shows a corrupted buffer try looking at its contents
(For example >26a3640/40X)
or
>>bufferaddress::bufctl_audit
(For example >26a4618::bufctl_audit)

umem_status command, also shows if a buffer is accessed after its already freed. For example, once it showed me
> ::umem_status
Status: ready and active
Concurrency: 2
Logs: (inactive)
Message buffer:
umem allocator: buffer modified after being freed
modification occurred at offset 0x8 (0xdeadbeefdeadbeef replaced by 0xde20be20de20be20)
...

You can also use umem_verify command to see if one of the umem caches has a corrupted buffer.

Using watchmalloc for memory corruptions

1) Disable pools
Add the following line in magnus.conf
Init fn="pool-init" disable="true"
2) Add LD_PRELOAD watchmalloc with MALLOC_DEBUG=WATCH in start script , comment out libumem and libCld portions

$diff -u startserv startserv.watchmalloc
--- startserv     Wed Dec  7 14:03:40 2005
+++ startserv.watchmalloc     Thu Mar 15 13:19:06 2007
@@ -9,6 +9,9 @@
 SERVER_BIN_DIR="/space/wsDec7/iplanet/ias/server/work/B1/SunOS5.8_DBG.OBJ/bin"
 SERVER_LIB_DIR="/space/wsDec7/iplanet/ias/server/work/B1/SunOS5.8_DBG.OBJ/lib"
 SERVER_BIN=webservd-wdog
+LIBWMC=/usr/lib/watchmalloc.so.1
+LD_PRELOAD_32="${LIBWMC} ${LD_PRELOAD}"; export LD_PRELOAD_32
+MALLOC_DEBUG="WATCH"; export MALLOC_DEBUG

 # Add path to server binaries to PATH
 PATH="${SERVER_BIN_DIR}:${SERVER_LIB_DIR}:/bin:${PATH}"; export PATH
@@ -47,26 +50,6 @@
 # Add instance-specific information to SHLIB_PATH for HP-UX
 SHLIB_PATH="${SERVER_LIB_PATH}:${SERVER_JVM_LIBPATH}:${SHLIB_PATH}"; export SHLIB_PATH

-# Preload libumem to improve performance on Solaris 10
-LIBUMEM_32=/usr/lib/libumem.so
-if [ -f "${LIBUMEM_32}" ] ; then
-    if [ `uname -r | sed s/\\\\\\.//` -ge 510 ] ; then
-        LD_PRELOAD_32="${LIBUMEM_32} ${LD_PRELOAD_32}"; export LD_PRELOAD_32
-    fi
-fi
-LIBUMEM_64=/usr/lib/64/libumem.so
-if [ -f "${LIBUMEM_64}" ] ; then
-    if [ `uname -r | sed s/\\\\\\.//` -ge 510 ] ; then
-        LD_PRELOAD_64="${LIBUMEM_64} ${LD_PRELOAD_64}"; export LD_PRELOAD_64
-    fi
-fi
-
-# Preload libCld to resolve -compat=4/-compat=5 C++ ABI issues on Solaris
-LIBCLD="${SERVER_LIB_DIR}/libCld.so"
-if [ -f "${LIBCLD}" ] ; then
-    LD_PRELOAD_32="${LIBCLD} ${LD_PRELOAD_32}"; export LD_PRELOAD_32
-fi
-
 if [ $# -eq 0 ] ; then
     COMMAND=--start;
 elif [ "$1" = "-i" ] ; then
3) Comment out compat4/compat5 NSAPI Init plugins in magnus.conf if they exist (most probably not)
4) Start the Web Server instance
$./bin/startserv
This makes the server extremely slow.
5) Run tests, send requests to the server,
The server may crash at proper places rather than random ones.


In general if you want to see what's happening when you send a request you can try using truss : truss -o truss.log -u "\*" -d -fael -rall -vall -wall -p <pid>

More information about this is in http://docs.sun.com/app/docs/doc/816-5165/truss-1?l=ja&a=view

Note that -u option will show the user-level function call tracing.

Links:
http://sunsolve.sun.com/search/document.do?assetkey=1-9-70641
http://developers.sun.com/solaris/articles/libumem_library.html
About

Meena Vyas

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today