Wednesday Jul 30, 2008

vi style key binding for bash.

Bash supports both emacs and vi style key bindings; emacs is the default. Use "set -o vi" to switch to vi style binding. I have a carpal-tunnel type of problem, so I always use vi style binding. In bash, I want multi-key bindings, e.g. "th" bound to the history command, "t1" to the last command's first argument, and so on. Frankly speaking, I don't like the bash vi binding syntax: it is very cryptic and quite error prone. Here are my bindings:

te : Print this help
tl : list files in the current directory (runs ls)
th : show history of last 8 commands
t0-t9 : last command arguments 0-9
tt : Last command's all argument
tk : Don't run (kill) the current command but save it in history.
tr : reverse-search-history, but continuation must be done with C-r
tc : Equivalent to Ctrl-C: kill the line and go to input mode
tw : next word
tq : history with shortcut macro
ts : print history to execute command
ty : Change editor : For internal use only
tx : Execute perl expression on current statement
z  : vi-search-again
g : goto mark

How it works :
$ <Esc>te
The above will print the help.
$ ls .bash_keybinding.sh
$ vi <Esc>t1
will expand to
$ vi .bash_keybinding.sh

$ vi <Esc>tq
will show the last 25 commands
25 ----> gvim(qp0) dtraceno-op.txt(qp1)
24 ----> cd(qo0) ../books/(qo1)
23 ----> ls(qn0)
22 ----> cd(qm0) solaris(qm1)
20 ----> ls(qk0) -l(qk1)
19 ----> kpdf(qj0) solarispkg.pdf(qj1)
18 ----> kpdf(qi0) solarispkg.pdf(qi1)
16 ----> cd(qg0) ..(qg1)
14 ----> kpdf(qe0) dtrace.pdf(qe1)
12 ----> cd(qc0) ../blogs/(qc1)
9 ----> ls(q90) *.html(q91)
7 ----> cvs(q70) add(q71) dtraceno-op.html(q72)
6 ----> cvs(q60) commit(q61)
5 ----> cd(q50) ~/bin/(q51)
2 ----> gvim(q20) .bash_keybinding.sh(q21)
1 ----> ls(q10) .bash_keybinding.sh(q11)
$ vi <Esc>qi1
will expand to
$ vi solarispkg.pdf

Basically "tq" command generates 25 line history and bound their arguments with "qxx" keys.

The syntax of the bash key binding command is horrible. Suppose I want to bind "tc" to Ctrl-C; here is the bind command I need to use:
bind -m vi-command '"\x74\x63"':"\"\C-a\C-K\""
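For comparison, here is what a more readable spelling of the same binding would look like ("\x74\x63" is just "tc" in hex). This is my own sketch, not the form used in .bash_keybinding.sh:
# sketch: in vi command mode, map "tc" to a macro that jumps to the
# start of the line and kills it
bind -m vi-command '"tc":"\C-a\C-k"'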

For a complicated set of operations, I wrote a perl script bashhelper.pl. Here
is how "tq" is bound:
bind -m vi-command -x '"tq":"history 25 | perl $bashhelper --leader q --format-history"'

Here are the files for complete reference:
.bash_keybinding.sh
bashhelper.pl

How dtrace has no impact on performance when probes are not in use.

dtrace probes can be used in any C application, and more and more server software on Solaris is integrating dtrace probes into its core functionality. A new user's initial fear is that the probes may have an impact on performance. Dtrace replaces probes with "nop" instructions when they are not in use; only when users actually enable the probes do they have some impact on performance. I wrote a simple test application (testapp.c) to see what happens during the runtime of a process:
#include <stdio.h>
#include <unistd.h>
#include <sys/sdt.h>

void func()
{
    DTRACE_PROBE(firsttestapp, query__func);
}
int main()
{
    int i = 0;
    for(i = 0; i < 100; ++i)
    {
        func();
        sleep(2);
        printf("returned from func call\\n");
        fflush(stdout);
    }
}
And a small provider (mytestapp.d):
provider firsttestapp {
    probe query__func();
};
In this test application, func will be called every 2 seconds. func has a dtrace probe which doesn't do anything. I compiled my dtrace program using the following steps:
$ cc -c -xarch=generic -g testapp.c
$ dtrace -G -32 -s mytestapp.d testapp.o
$ cc -xarch=generic -g mytestapp.o testapp.o -o testapp

$ dbx ./testapp
(dbx) dis func
0x00011410: func       :        save     %sp, -96, %sp
0x00011414: func+0x0004:        nop
0x00011418: func+0x0008:        nop
0x0001141c: func+0x000c:        ret
Notice that function "func" doesn't contain any probe call; in its place there are two nop instructions. Let us run the application, then run a dtrace program from another terminal which uses this probe, and see what happens to the application at runtime.
bash-3.00# dtrace -l | grep testapp
61026 firsttestapp26498           testapp                              func query-func
# dtrace -n 'firsttestapp*::func:query-func { @[probefunc] = count(); }'
dtrace: description 'firsttestapp*::func:query-func ' matched 1 probe

(dbx) dis func
0x00011410: func       :        save     %sp, -96, %sp
0x00011414: func+0x0004:        nop
0x00011418: func+0x0008:        ta       %icc,0x00000038
0x0001141c: func+0x000c:        ret
0x00011420: func+0x0010:        restore
Notice that one of the nops has now been replaced with the trap instruction "ta %icc". So when probes are in use, the "nop" instruction is replaced with a trap instruction; when probes are not in use, it remains a simple nop. This is why probes have no runtime impact on an application's performance when they are not in use.
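The same patching can also be watched without keeping a dbx session around; here is a minimal sketch (it assumes a single running testapp process, with mdb reading its command from stdin):
# disassemble func in the live process while the dtrace consumer is
# running in the other terminal (sketch)
echo 'func::dis' | mdb -p `pgrep testapp`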

Tuesday Jul 22, 2008

Compiling Subversion in coolstack 1.3

First set the PATH and add the Studio 12 compiler to it:
# PATH=/usr/sbin:/usr/bin/:/usr/ccs/bin:/opt/SUNWspro/bin/
On my test machine, Studio 12 was installed in /opt/SUNWspro/bin. Now configure and compile neon and subversion.

Compile neon 0.28.2
# CFLAGS="-I/opt/coolstack/include" LDFLAGS="-L/opt/coolstack/lib" \\
        PATH=/opt/coolstack/bin:$PATH ./configure --prefix=/opt/coolstack \\
        --with-libxml2=/opt/coolstack --with-ssl=openssl
# /usr/sfw/bin/gmake
# /usr/sfw/bin/gmake install
Compile subversion 1.5.0
# CFLAGS="-I/opt/coolstack/include" LIBS="-lintl -lgettextlib" \\
          LD_LIBRARY_PATH=/opt/coolstack/lib LDFLAGS="-L/opt/coolstack/lib -R/opt/coolstack/lib" \\
          ./configure --prefix=/opt/coolstack --with-apxs=/opt/coolstack/apache2/bin/apxs \\
                      --with-apr=/opt/coolstack/apache2 --with-apr-util=/opt/coolstack/apache2 \\
                      --with-neon=/opt/coolstack
# /usr/sfw/bin/gmake
# /usr/sfw/bin/gmake install
Note that the gettextlib and intl libraries are required to compile subversion. These libraries are present in the /opt/coolstack/lib directory and are packaged in the CSKlibs package.
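To sanity-check the build, it may be worth confirming that the new svn binary resolves its libraries from /opt/coolstack rather than /usr/sfw. A sketch (the binary path follows from --prefix and is an assumption):
# verify where the freshly built svn picks up its libraries (sketch)
ldd /opt/coolstack/bin/svn | egrep 'neon|apr|xml2'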

Linking improvements in Cool Stack 1.3

CoolStack 1.3 has improved the linking of various components. To build recent versions of apache, php and other web stack components, coolstack needs a lot of base libraries. Some of these libraries are installed in /usr/lib and some in /usr/sfw/lib. Here are the requirements for coolstack:
  • Bundle a newer version of a library than is available in S10 where needed. An example is libxml2: php 5 doesn't compile with the libxml2 shipped with S10.
  • Use as many /usr/lib and /usr/sfw/lib libraries as possible, to avoid a huge package download size.
  • When a library is present at two locations, set the RPATH/RUNPATH in libraries correctly so that the application looks in the /opt/coolstack/lib directory first and then in the /usr/sfw/lib directory.

Duplicated libraries caused a lot of trouble in coolstack. Here are a few issues which we encountered in Cool Stack 1.2.
  1.  CoolStack 1.2 bundled libxml2 but didn't bundle libxslt, and libxslt needs libxml2. Let us run ldd on the php extension xsl.so (I have edited the output to keep it brief):
    bash-3.00# ldd /opt/coolstack/php5/lib/php/extensions/no-debug-non-zts-20060613/xsl.so
    libreadline.so.5 => /opt/coolstack/lib/libreadline.so.5
    libcurses.so.1 => /lib/libcurses.so.1
    libexslt.so.0 => /usr/lib/libexslt.so.0
    libxslt.so.1 => /usr/lib/libxslt.so.1
    libxml2.so.2 => /opt/coolstack/lib/libxml2.so.2
    ...
    libxml2.so.2 => /usr/lib/libxml2.so.2
    ...
    Note that libxml2 is loaded twice. When libxslt.so was loaded, it loaded libxml2.so from the /usr/lib directory, not from /opt/coolstack/lib. The reason is obvious: /usr/lib/libxslt.so looks for a particular libxml2 version (SUNW_1.6) and finds it in the /usr/lib directory. The result was equally obvious: xsl applications failed to work with coolstack 1.2, and php crashed.

  2. CoolStack 1.2 bundled libfreetype but didn't bundle libfontconfig. Php's gd extension needs libgd.so. Here is the (edited) output of ldd on libgd.so:
    bash-3.00# ldd /opt/coolstack/lib/libgd.so
    ...
    libfontconfig.so.1 => /usr/lib/libfontconfig.so.1
    libfreetype.so.6 => /opt/coolstack/lib/libfreetype.so.6
    libpng12.so.0 => /usr/lib/libpng12.so.0
    libiconv.so.2 => /opt/coolstack/lib/libiconv.so.2
    ...
    libfreetype.so.6 => /usr/sfw/lib/libfreetype.so.6
    libexpat.so.0 => /usr/sfw/lib/libexpat.so.0
    ...
    Note that libfreetype.so is loaded twice: once from the /opt/coolstack/lib directory and once from /usr/sfw/lib. /usr/sfw/lib/libfreetype.so was loaded by /usr/lib/libfontconfig.so; if you look at the RPATH of libfontconfig, it looks for libfreetype in the /usr/sfw/lib directory.
    bash-3.00# dump -Lvp /usr/lib/libfontconfig.so.1 | grep PATH
    [9] RUNPATH /usr/sfw/lib
    [10] RPATH /usr/sfw/lib
    I don't know exactly what the effect of the above problem was, but I believe it would have caused problems in gd's functionality.
  3. libexpat.so is present in /usr/sfw/lib and can accidentally take precedence over /opt/coolstack/apache2/lib/libexpat.so.
  4. libmysqlclient.so is present in /usr/sfw/lib, so clients can accidentally load the /usr/sfw/lib version instead of /opt/coolstack/mysql_32bit/lib/mysql/libmysqlclient.so. A quick way to spot such duplicates is sketched below.
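Since ldd prints one line per resolved library, duplicates like the ones above show up as repeated sonames. Here is a minimal sketch for spotting them (the php binary path is just an example target):
# print any soname that resolves more than once for a binary or module (sketch)
ldd /opt/coolstack/php5/bin/php | awk '{print $1}' | sort | uniq -d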
Linking improvements in Cool Stack 1.3:
  • libxslt is now bundled in the CSKruntime package.
  • libfontconfig is now bundled in the CSKlibs package.
  • To handle dependencies on libraries in /usr/sfw/lib, CoolStack 1.3 now creates symlinks in the /opt/coolstack/lib directory. Similarly, symlinks have been created to include header files from /usr/sfw/include.
  • As far as possible, none of the libraries has /usr/sfw/lib in its RPATH/RUNPATH.
  • Almost everywhere /usr/sfw was passed in configure options, /opt/coolstack is now passed instead. E.g. --with-ssl=/usr/sfw is replaced with --with-ssl=/opt/coolstack.
  • For any library which links with libmysqlclient, /opt/coolstack/mysql_32bit/lib/mysql is added to its RPATH.
  • libtool files (.la) create problems when linking other components, therefore they are not bundled in coolstack 1.3.

Monday Jul 21, 2008

Custom SSL in CoolStack 1.3

In CoolStack 1.3, applications which need ssl are linked against /opt/coolstack/lib/libssl.so and /opt/coolstack/lib/libcrypto.so. In previous versions of coolstack, applications were linked directly against /usr/sfw/lib. Several applications use ssl, e.g. apache, mysql, ruby etc.

Solaris 10 has openssl 0.9.7 installed in the /usr/sfw/lib directory. If you have a version of openssl newer than 0.9.7, then in coolstack 1.3 it is very easy to use the custom ssl without any recompilation.

In CoolStack 1.3, /opt/coolstack/lib/libssl.so and libcrypto.so are just symlinks to the ssl libraries in the /usr/sfw/lib directory. Suppose your custom ssl is located in /opt/ssl/lib; then to use it in coolstack 1.3, you need to remove the old symlinks and create new symlinks to the new location of openssl:
# cd /opt/coolstack/lib
# rm libssl.so
# rm libssl.so.0.9.7
# ln -s /opt/ssl/lib/libssl.so libssl.so
# ln -s /opt/ssl/lib/libssl.so.0.9.8 libssl.so.0.9.7

# rm libcrypto.so
# rm libcrypto.so.0.9.7
# ln -s /opt/ssl/lib/libcrypto.so libcrypto.so
# ln -s /opt/ssl/lib/libcrypto.so.0.9.8 libcrypto.so.0.9.7
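Afterwards, it is worth verifying that a CoolStack binary resolves ssl through the new symlinks. A sketch (the httpd path is an assumption):
# confirm libssl/libcrypto now resolve to the custom openssl (sketch)
ldd /opt/coolstack/apache2/bin/httpd | egrep 'libssl|libcrypto'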

Monday Jun 16, 2008

Sun Java System Web Server and dtrace.

dtrace is often used to trace the kernel and applications, so it would be interesting to monitor the Web Server using dtrace. Sun Web Server does not contain built-in dtrace probes, but it is possible to write a small NSAPI plug-in which can be used to monitor the Web Server.
In this blog, I will show how to write a small NSAPI plug-in, load it into the Web Server, and insert a log-stage hook (AddLog). This NSAPI plug-in contains a dtrace probe. When the plug-in is loaded into the Web Server, the Web Server registers the dtrace probe with the kernel. During the execution of an HTTP request, the Web Server invokes AddLog (at the end of the request), which fires the dtrace probe. The probe provides information about the request and the response. Using this probe, it is possible to write a "top"-like utility for the Web Server.
Here is my NSAPI plug-in's code, wbdtrace.c:
/*
 * Copyright (c) 2005, 2008 Sun Microsystems, Inc. All Rights Reserved.
 * Use is subject to license terms.
 */
#include "nsapi.h"
#include <sys/sdt.h>

int NSAPI_PUBLIC dtrace_log_request(pblock *pb, Session *sn, Request *rq)
{
    /* Pass the request, header, response-header and vars pblocks to the
       probe as strings; guard against NULL pblocks. */
    DTRACE_PROBE4(sjsws, log__request,
        (rq && rq->reqpb) ? pblock_pblock2str((pblock*) rq->reqpb, NULL) : NULL,
        (rq && rq->headers) ? pblock_pblock2str((pblock*) rq->headers, NULL) : NULL,
        (rq && rq->srvhdrs) ? pblock_pblock2str((pblock*) rq->srvhdrs, NULL) : NULL,
        (rq && rq->vars) ? pblock_pblock2str((pblock*) rq->vars, NULL) : NULL);
    return REQ_NOACTION;
}
Here is the dtrace provider's code, sjsws.d:
/*
 * Copyright (c) 2005, 2008 Sun Microsystems, Inc. All Rights Reserved.
 * Use is subject to license terms.
 */
provider sjsws {
    probe log__request(string, string, string, string);
};
We now need to compile this plug-in and load it into the Web Server. Here is how I compiled the plug-in:
$ cc -c -KPIC -mt -DXP_UNIX -I/opt2/installed/ws70u1/include wbdtrace.c
$ dtrace -G -s sjsws.d wbdtrace.o
$ cc -KPIC -mt -G -L/opt2/installed/ws70u1/lib -lCrun -lCstd -ldl -lposix4 -lns-httpd40 -o libwbdtrace.so wbdtrace.o sjsws.o
Now I copied the plug-in to the Web Server's plug-in directory:
$ cp libwbdtrace.so /opt/ws70u1/plug-ins/
Now I loaded the plug-in into the Web Server and inserted an AddLog directive into the obj.conf file. Here are the configuration changes required for this plug-in to work.
magnus.conf :
Init fn="load-modules" shlib="libwbdtrace.so" shlib_flags="(global|now)" funcs="dtrace_log_request"

obj.conf :
<Object name="default">
...
AddLog fn="dtrace_log_request"
</Object>

After making the configuration changes I restarted the Web Server :
$ bin/stopserv; bin/startserv
The Web Server is now started and the plug-in is loaded. The dtrace NSAPI plug-in registers the dtrace probe (sjsws provider) with the kernel. Now let us see what the Web Server dtrace probe looks like, using "dtrace -l":
# dtrace -l | grep sjsws
53309 sjsws6523 libwbdtrace.so dtrace_log_request log-request
53310 sjsws6524 libwbdtrace.so dtrace_log_request log-request
Note that both the primordial (53310) and the child process (53309) have registered the probe with the Solaris kernel. But since only the child process serves requests, only the child process will fire the dtrace probes. Now let us write a small dtrace script (viewreq.d) which just prints a message whenever our probe fires.

sjsws*::dtrace_log_request:log-request
{
    @[probefunc] = count();
    printf("\narg0 = %x arg0= %s \n arg1=%s \narg2=%s arg3 = %s\n",
        arg0,
        copyinstr(arg0),
        copyinstr(arg1),
        copyinstr(arg2),
        copyinstr(arg3));
}

dtrace:::END
{
    trace("End of dtrace\n");
}

Now I ran this script and sent an HTTP request, "GET /", from a browser.
# dtrace -s viewreq.d
dtrace: script 'viewreq.d' matched 3 probes
CPU     ID                    FUNCTION:NAME
  1  53310   dtrace_log_request:log-request arg0 = 7553c8 arg0= clf-request="GET / HTTP/1.1" ...
^C
  0      2                           :END End of dtrace
Whenever a request is logged, our dtrace probe fires; our dtrace script just printed the various pblocks.

Based on the above dtrace NSAPI plug-in, it is possible to write monitoring code in scripting languages, e.g. perl. Here are some of the scripts:
countreq.pl
viewbrowser.pl
viewresponsecodes.pl
wbdtrace.pm
wstop.pl
These are perl scripts which encapsulate the dtrace script. Most of the text processing is written in perl; the dtrace probe provides the pblock content, from which the request and response information is obtained.
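The wrapper pattern itself is simple. As a rough shell sketch of the same idea (the field name is illustrative; the real scripts parse the pblocks in perl):
# stream probe output from viewreq.d and post-process it line by line (sketch)
dtrace -q -s viewreq.d | while read line; do
    case "$line" in
        *clf-request*) echo "$line" ;;
    esac
done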

Here is the output of these scripts.

View counts of various types of HTTP requests, with timestamps

# perl countreq.pl
Timestamp  GETs   POSTs  HEADs  TRACEs
12:41:13   0      0      0      0
12:41:18   5469   0      0      0
12:41:23   10000  0      0      0

View counts of various types of HTTP requests, by client browser

# perl viewbrowser.pl
Browser   GETs POSTs HEADs TRACEs
Firefox   19   0     0     0
MSIE      0    0     0     0
Wget      0    0     0     0
Other     303  0     0     0

View HTTP requests' URIs and their response codes

# perl viewresponsecodes.pl
Response Code   Uri                            File Name
200             /index.html                    /index.html
304             /index.html                    /index.html
403             /img/gradation_header_L.gif    /img/gradation_header_L.gif
403             /img/gradation_header_R.gif    /img/gradation_header_R.gif
^C

View HTTP requests' URIs for a particular response code

# perl viewresponsecodes.pl -c 200
Response Code   Uri                            File Name
200             /index.html                    /index.html
^C

A top-like utility for Sun Web Server (similar to apachetop)

# perl wstop.pl -d 10
20:10:40  Requests: 4223  (  422/sec) Bytes: 12930794(1293079/sec)
Requests: GETs: 4223   POSTs: 0      HEADs: 0      TRACE: 0
Responses: 1xx:0      2xx:4025   3xx:68     4xx:130    5xx:0

Requests  Requests/sec Bytes Sent  URI
601       60           7234838      /index.html
545       54           198925       /img/gradation_header_L.gif
498       49           3988482      /img/side_bg.gif
396       39           930204       /img/sun_logo.gif
359       35           56004        /img/gradation_header-btm.gif
342       34           2394         /cgi-bin/test.pl
250       25           10750        /img/a.gif
240       24           96720        /img/gradation_header_R.gif
206       20           52324        /img/footer_R.gif
191       19           58828        /img/gradation_header.gif
130       13           37960        /notfound.html
115       11           15410        /img/content_hline.gif
113       11           29041        /img/footer_L.gif
109       10           207754       /img/sjsws_title_text.gif
68        6            0            /yahoo
60        6            11160        /img/footer.gif
^C
Here are the compiled binaries for the plug-in:
Solaris sparc
Solaris x86

Wednesday Jun 11, 2008

Reverse proxy and redirecting request.

Suppose a reverse proxy instance redirects every request using the following
syntax:
NameTrans fn="redirect" url-prefix="http://host:port/xyz"

This works fine in normal instances, but it creates a problem for the reverse proxy
functionality. The reverse proxy has a thread named route_offline_thread which
uses the "OPTIONS" HTTP method to check the health of the backend instance. The
result of the above is that the OPTIONS request is also redirected, and
route_offline_thread doesn't work properly. There is a very simple workaround
for this situation: just do not redirect the OPTIONS request in the reverse proxy.
Use the following instead:

<If not $internal and not method = "OPTIONS">
    NameTrans fn="redirect" url-prefix="http://host:port/xyz"
</If>

Reverse proxy instance health check (route_offline_thread)

How Sun Web Server reverse proxy manages the backend instances' lifecycle.

Monday Apr 28, 2008

Connection pooling in Sun Java System Web Server

Web Server Architecture : Connection pooling
Connection pooling is one of the cool features of Sun Web Server, and it has been there for a long time. When a request comes to the Web Server, an acceptor thread accepts the connection and puts it into the connection queue. One of the daemon session threads (also called worker threads) pulls the connection from the queue and starts processing the request. After the request is served, if the client has asked for the connection to be kept alive, the connection goes to the keep-alive pollmanager. Keep-alive threads poll all keep-alive connections; whenever a new request arrives on such a connection, the keep-alive threads put the connection back into the connection queue. If there is no further request on the same connection for some period (the keep-alive timeout), the keep-alive threads close the connection.


The reason why Sun Web Server performs so well under high-load conditions, e.g. in SPECweb, is this connection pooling architecture. In Apache Web Server, the prefork and worker MPMs are the most common. One problem with these MPMs is that a connection is bound to a thread, which means the number of concurrent connections is limited to roughly the number of processing threads. If load increases beyond some point, connections start timing out very quickly. The event MPM tries to address this problem.


Tunables:
Here are some of the configuration parameters which directly affect the connection
queue; a server.xml sketch follows the list.
(a) thread-pool --> queue-size
(b) http-listener --> listen-queue-size
(c) keepalive --> timeout
(d) keepalive --> pollinterval
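For reference, here is roughly how those settings sit in server.xml. This is a sketch only, with illustrative values; I am quoting the element spellings from memory, so verify them against your server.xml schema.
<thread-pool>
  <queue-size>1226</queue-size>
</thread-pool>
<http-listener>
  ...
  <listen-queue-size>128</listen-queue-size>
</http-listener>
<keep-alive>
  <timeout>30</timeout>
  <poll-interval>0.001</poll-interval>
</keep-alive>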

listen-queue-size
As I wrote before, acceptor threads accept connections and put them into the connection queue. The question is what happens to new connections when the OS is busy and has not yet scheduled the acceptor thread. The OS kernel maintains TCP connections on behalf of the Web Server process; listen-queue-size is the number of connections the kernel will accept before the application accepts them. If more than listen-queue-size connections arrive before the Web Server calls accept, new connections will be rejected. This is not at all a common situation, but it can happen on very busy systems. Here is a small experiment to demonstrate it:

Step 1 : I sent a stop signal to my child Web Server process so that the acceptor
         thread won't be able to accept any new connections (see the sketch below).
Step 2 : I sent 200 simultaneous requests (using the apache benchmark tool ab)
$ ab -c 200 -n 200 http://hostname/index.html
Step 3 : I ran the netstat -an command to see the connections.
Here is how one of the connection lines looks:
192.18.120.211.3014  129.150.16.164.48150  5312      0 49437      0 ESTABLISHED

As expected, the kernel rejected the new connections. Here is the ab output:
Benchmarking chilidev4.red.iplanet.com (be patient)
apr_poll: The timeout specified has expired (70007)
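For completeness, here is a sketch of the stop signal from Step 1 (<child_webservd_pid> is the worker process pid, e.g. taken from ps output):
# suspend the worker so it can no longer call accept(), then resume it later (sketch)
kill -STOP <child_webservd_pid>
# ... run ab and netstat here ...
kill -CONT <child_webservd_pid>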

What happens if we disable the connection pool:
If we disable the thread-pool, there is no connection queue and there are no daemon session threads; the acceptor threads themselves process the requests. Here is what a typical call stack looks like:
-----------------  lwp# 12 / thread# 12  --------------------
 fd5c1134 pollsys  (fc8efc80, 1, fc8efc00, 0)
 fd55d028 poll     (fc8efc80, 1, 1388, 10624c00, 0, 0) + 7c
 fe3d862c pt_poll_now (fc8efcec, 1, fc8efd1c, fc8efc80, 20, ffffffff) + 4c
 fe3d9ed0 pt_Accept (247408, 1bd990, ffffffff, 15, 0, ffffffff) + cc
 ff17db5c PRFileDesc*ListenSocket::accept(PRNetAddr&,const unsigned) (1bd708, 1bd990, ffffffff, 2060, 245a88, 1bd988) + c
 ff16e67c int DaemonSession::GetConnection(unsigned) (2b7648, ffffffff, ff2ff800, ffffe800, ff26f000, ff2abc00) + 64
 ff16edb0 void DaemonSession::run() (6c0008, 2b7648, 2b7668, 2b76f0, 2000, ffffffff) + 150
 fef76d48 ThreadMain (2b7648, 2a6eb0, 0, fef89c00, ff16ec60, ff2abc64) + 1c
 fe3de024 _pt_root (2a6eb0, fef76d2c, 400, fe3f6ba4, 0, fe3f891c) + d4
 fd5c0368 _lwp_start (0, 0, 0, 0, 0, 0)


A few call stacks:
Let us look at the pstack output. In my test configuration, I set the min/max threads to 2 and disabled the j2ee plugin. Then I dumped the stack using:
$ pstack <child_webservd_pid> | c++filt > pstack.txt

Let us see what the various threads are doing:
1. Acceptor threads call stack :

Here is the call stack of the acceptor thread:
-----------------  lwp# 13 / thread# 13  --------------------
 fd5c1134 pollsys  (fc8bfc88, 1, fc8bfc08, 0)
 fd55d028 poll     (fc8bfc88, 1, 1388, 10624c00, 0, 0) + 7c
 fe3d862c pt_poll_now (fc8bfcf4, 1, fc8bfd24, fc8bfc88, 20, ffffffff) + 4c
 fe3d9ed0 pt_Accept (245368, 20ce50, ffffffff, 15, 1, ffffffff) + cc
 ff17db5c PRFileDesc*ListenSocket::accept(PRNetAddr&,const unsigned) (1bd708, 20ce50, ffffffff, 3, 2ab988, 2ab988) + c
 ff1782a4 void Acceptor::run() (12e488, 245908, 20ce48, 6, 3e8, 45) + 184
 fef76d48 ThreadMain (12e488, 11d828, 0, fef89c00, ff178120, ff2ac444) + 1c
 fe3de024 _pt_root (11d828, fef76d2c, 400, fe3f6ba4, 0, fe3f891c) + d4
 fd5c0368 _lwp_start (0, 0, 0, 0, 0, 0)

2. Idle Daemon session thread's call stack :
Here is the stack trace of an idle daemon session thread:
-----------------  lwp# 16 / thread# 16  --------------------
 fd5c0408 lwp_park (0, 0, 0)
 fd5ba49c cond_wait_queue (50a10, 2cfec8, 0, 0, 0, 0) + 28
 fd5baa1c cond_wait (50a10, 2cfec8, 0, 1c, 0, fcd52d00) + 10
 fd5baa58 pthread_cond_wait (50a10, 2cfec8, 1, fe3f8518, 5fc, 400) + 8
 fe3d79e8 PR_WaitCondVar (50a08, ffffffff, 2a7700, 0, 2ab988, 0) + 64
 ff17797c Connection*ConnectionQueue::GetReady(unsigned) (8bcc8, ffffffff, ffffffff, 8bcc8, 5fc, 2ab968) + c4
 ff16e630 int DaemonSession::GetConnection(unsigned) (2b7648, ffffffff, ff2ff800, 0, ff26f000, ff2abc00) + 18
 ff16edb0 void DaemonSession::run() (746008, 2b7648, 2b7668, 2b76f0, 2000, ffffffff) + 150
 fef76d48 ThreadMain (2b7648, 2a7700, 0, fef89c00, ff16ec60, ff2abc64) + 1c
 fe3de024 _pt_root (2a7700, fef76d2c, 400, fe3f6ba4, 1, fe3f891c) + d4
 fd5c0368 _lwp_start (0, 0, 0, 0, 0, 0)

As the connection queue is empty, this daemon session thread is waiting for a request to arrive in the connection queue.

3. Processing Daemon session thread's call stack :
Here is another daemon session thread, which is processing the "/cgi-bin/test.pl" request:
 fd5c1248 read     (1d, 758088, 2000)
 ff167588 int ChildReader::read(void*,int) (3c9c4, 758088, 2000, 0, b71b00, 1) + 1c
 ff0f2e30 INTnetbuf_next (758028, 1, 2000, 2001, 60, 758028) + 2c
 ff13a364 int cgi_scan_headers(Session*,Request*,void*) (11ee70, 11eee8, 758028, 27c, ff29bbd8, 0) + 84
 ff13abd4 int cgi_parse_output(void*,Session*,Request*) (758028, 11ee70, 11eee8, ff2a1c50, ff29bbd8, 6e) + 1c
 ff13b8cc cgi_send (2c6dc8, 11ee70, 11eee8, 3c948, 1400, ff2a25cc) + 514
 ff10df68 func_exec_str (2354f8, 2c6dc8, 0, fc348, 11eee8, 11ee70) + 1c0
 ff10edc0 INTfunc_exec_directive (3d948, 2c6dc8, 11ee70, 11eee8, 280a28, 1) + 2a0
 ff113b60 INTservact_service (0, 11ee70, 2c6dc8, 0, 11eee8, 27b470) + 374
 ff11472c INTservact_handle_processed (0, 11eee8, 11eee8, 11ee70, 0, 2d6328) + 8c
 ff172064 void HttpRequest::UnacceleratedRespond() (11edc8, 3, 1, ff26f400, 0, 20) + e34
 ff170a24 int HttpRequest::HandleRequest(netbuf*,unsigned) (11edc8, 754008, 11edf8, 2000, 754068, 11edd0) + 7a8
 ff16f10c void DaemonSession::run() (ffffffff, 280a08, 280a28, 280ab0, 1, ffffffff) + 4ac
 fef76d48 ThreadMain (280a08, 266728, 0, fef89c00, ff16ec60, ff2abc64) + 1c
 fe3de024 _pt_root (266728, fef76d2c, 400, fe3f6ba4, 1, fe3f891c) + d4
 fd5c0368 _lwp_start (0, 0, 0, 0, 0, 0)

Note that the above thread is waiting for data from the child cgi process (test.pl).

4. Keep alive thread's call stack :
Let us look at the call stack of a keep-alive thread:
-----------------  lwp# 9 / thread# 9  --------------------
 fd5c0408 lwp_park (0, 0, 0)
 fd5ba49c cond_wait_queue (2c6e98, 7092c8, 0, 0, 0, 0) + 28
 fd5baa1c cond_wait (2c6e98, 7092c8, 0, 1c, 0, fcd51100) + 10
 fd5baa58 pthread_cond_wait (2c6e98, 7092c8, 1, fe3f8518, 5fc, 400) + 8
 fe3d79e8 PR_WaitCondVar (2c6e90, ffffffff, 13e790, 0, 0, 0) + 64
 ff176024 void PollArray::GetPollArray(int*,void**) (25e4c8, fcc3fec0, fcc3fec4, 3, 0, ff2ac000) + 5c
 ff176984 void KAPollThread::run() (15c5a8, 5, 4, ff2ac000, ff2ff8cc, fcc3fec4) + 6c
 fef76d48 ThreadMain (15c5a8, 13e790, 0, fef89c00, ff176918, ff2ac3b8) + 1c
 fe3de024 _pt_root (13e790, fef76d2c, 400, fe3f6ba4, 1, fe3f891c) + d4
 fd5c0368 _lwp_start (0, 0, 0, 0, 0, 0)


Compiling Subversion in coolstack 1.2

Subversion compilation with Cool Stack 1.2 is not very straightforward, because of
a bug in Cool Stack 1.2.

Coolstack apache needs ssl from the /usr/sfw directory, so Coolstack apache is
compiled with the LDFLAGS="-L/opt/coolstack/lib -L/usr/sfw/lib" configure option.

This results in problems in the libapr-1.la and libaprutil-1.la files, so when we
configure/compile subversion, libexpat.so is available in multiple places. This
seems to be a bug in coolstack apache: apr should avoid having /usr/sfw/lib in
its RPATH.

I found a workaround for this problem.

# echo $PATH
/opt/coolstack/lib:/usr/sbin:/usr/bin:/usr/ccs/bin:/opt/SUNWspro/bin

Step 1 : Remove libapr-1.la and libaprutil-1.la from the /opt/coolstack/apache2/lib directory

# cd /opt/coolstack/apache2/lib/
# mv libapr-1.la /tmp/
# mv libaprutil-1.la /tmp/


Step 2 : Compile neon-0.25.5.tar.gz

# /usr/sfw/bin/gtar -zxf neon-0.25.5.tar.gz
# cd neon-0.25.5
# CFLAGS="-I/opt/coolstack/include" LDFLAGS="-L/opt/coolstack/lib" ./configure --prefix=/opt/coolstack --with-libxml2=/opt/coolstack --with-ssl=openssl
# make
# make install

Step 3 : configure subversion

# rm -rf subversion-1.4.6
# /usr/sfw/bin/gtar -jxf subversion-1.4.6.tar.bz2
# cd subversion-1.4.6
# CFLAGS="-I/opt/coolstack/include" LDFLAGS="-L/opt/coolstack/lib" ./configure --prefix=/opt/coolstack --with-apxs=/opt/coolstack/apache2/bin/apxs --with-apr=/opt/coolstack/apache2 --with-apr-util=/opt/coolstack/apache2 --with-neon=/opt/coolstack


Step 4 : Change the Makefile manually: add -lgettextlib and -lintl to SVN_APR_LIBS and SVN_APRUTIL_LIBS

SVN_APR_LIBS =  -L/opt/coolstack/apache2/lib -R/opt/coolstack/apache2/lib -lapr-1 -luuid -lsendfile -lrt -lsocket -lnsl  -lpthread -lgettextlib -lintl
SVN_APRUTIL_LIBS =  -L/opt/coolstack/apache2/lib -R/opt/coolstack/apache2/lib -laprutil-1 -lldap  -lexpat -liconv -lgettextlib -lintl


Step 5 : build and install subversion

# make
# LD_LIBRARY_PATH=/opt/coolstack/apache2/lib:/opt/coolstack/lib:/usr/sfw/lib make install

I tried running "svn help" and it worked. I didn't run any tests on it, but I
hope it works fine with coolstack apache.

Step 6 : Restore libapr-1.la and libaprutil-1.la (optional)

You can add appropriate CFLAGS for optimization.
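For example, a sketch with one reasonable Studio optimization level (-xO3); the remaining configure options are as in Step 3:
# CFLAGS="-xO3 -I/opt/coolstack/include" LDFLAGS="-L/opt/coolstack/lib" ./configure ...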

Saturday Nov 03, 2007

Sun Java System Web Server 7.0 monitoring - Part 1.

Web Server 7.0 has some interesting ways to monitor the server. In previous versions of the Web Server, we typically created a uri, e.g. /.perf, to obtain perfdump output; when we wanted the perfdump output, we sent a /.perf request. The problem with this approach is that if the Web Server is hanging because of some faulty application, or is very busy processing requests, then the perfdump request also hangs and we have to wait for the /.perf request to be processed. In complicated situations we were sometimes never able to obtain the perfdump output at all. Web Server 7.0 comes to the rescue in such situations.
Let us do a simple experiment to simulate a hanging Web Server. First, change the min/max daemon session threads and set them to 2, so that the Web Server can only process 2 requests at a time. Here is the thread-pool element of my server.xml:
  <thread-pool>
    <min-threads>2</min-threads>
    <max-threads>2</max-threads>
  </thread-pool>

After changing server.xml, restart the Web Server. Now create a small cgi script which takes more than 5 minutes to execute; this will help us simulate a hanging web server.

#!/usr/bin/perl
print "Content-Type: text/html\r\n";
print "\r\n";
sleep(300);
print "Hello\r\n";

Now let us send two test.pl requests in parallel, from two browser windows or with a command line tool such as curl:
$ curl --dump-header - -o - http://localhost:8080/cgi-bin/test.pl &
$ curl --dump-header - -o - http://localhost:8080/cgi-bin/test.pl &

The above two requests will be served after 5 minutes. Now let us try to see
if we can obtain the perfdump.
$ curl --dump-header - -o - http://localhost:8080/.perf
<hung>
Ctrl-Z
[3]+  Stopped                 curl --dump-header - -o - http://localhost:8080/.perf
$ bg
[3]+ curl --dump-header - -o - http://localhost:8080/.perf &

Our perfdump (/.perf) request hung. Now let us start the admin server and try to get the perfdump of the instance through the admin server.
$ cd admin-server
$ bin/startserv
...
$ ./bin/wadm get-perfdump --config=myhostname --node=localhost
...
Sessions:
---------------------------------------------------------------------------------------
Process  Status    Client         Age  VS            Method  URI               Function

4952     response  192.168.1.115  110  myhostname  GET     /cgi-bin/test.pl  send-cgi
4952     response  192.168.1.115  109  myhostname  GET     /cgi-bin/test.pl  send-cgi


Voila! It worked. It showed me that there are two test.pl requests running and that they have been running for about 110 seconds.

So how does it work? Why did the /.perf request hang while the get-perfdump wadm command worked? Here is the explanation.

We created 2 daemon session threads and then sent 2 test.pl requests, so all daemon session threads were busy processing requests. When we sent the /.perf request, no daemon session thread was available, so the request went into the connection queue and sat there waiting to be served.

This information is also available in perfdump. Here is the snippet.
ConnectionQueue:
-----------------------------------------
Current/Peak/Limit Queue Length            1/1/1226
Total Connections Queued                   17

Note that there is 1 connection in the connection queue, and that should be our /.perf request.

So it is now clear that if all daemon session threads are busy, the Web Server won't be able to respond to uri-based monitoring requests. That was a big bottleneck in Web Server 6.1 and earlier: when the Web Server hung, we were not able to collect the perfdump output, and it was not easy to find out what requests the Web Server was processing while it hung.

The next question is how the get-perfdump wadm command works. The answer lies in the monitoring architecture. There is a communication channel from the admin server to the Web Server instance, and all monitoring data is passed to the admin server over that channel. The Web Server worker process also has a dedicated thread for serving monitoring requests. Even when all daemon session threads are busy, this thread can generate the performance data and pass it to the admin server. This is exactly what happened when we ran the get-perfdump wadm command.

There are several interesting things to note in the above exercise. The admin server was not running at the time the Web Server hung; we simply started the admin server and ran the get-perfdump command. The communication channel between the admin server and the instance is set up during admin server startup, and this setup is independent of the startup order of the admin server and the Web Server instance: they can be started in any order. Even when the Web Server was busy processing requests, the channel was successfully initialized.
There are other monitoring means, e.g. stats-xml, wadm CLI commands and the administration GUI. They all use the same channel to access monitoring data from the instance.


