Wednesday Mar 23, 2011

This blog has moved

This blog has moved to http://chrisgerhard.wordpress.com/.

Tuesday Jan 26, 2010

Workaround for runaway metacity

Sun Ray on OpenSolaris build 131 requires the same workarounds I previously mentioned.

There is one more that helps with both 130 and 131. With the new gdm set up the login screen now runs "metacity" and occasionally this can get into a loop just consuming CPU. The trigger is that metacity has been sent a signal to terminate but then tries to be a bit too clever and goes into the loop. I've filed this bug so that it can be fixed.

Happily once again you can work around this with a bit of dtrace:

#!/usr/sbin/dtrace -qws

proc:::signal-handle
/ execname == "metacity" && args[0] == 15 / {
        system("logger -t metacity.d -p daemon.notice killing metacity[%d]", pid); 
        raise(9)
}

Sunday Jan 17, 2010

More Sun Ray on OpenSolaris build 130 workarounds

One more thing for the Sun Ray on build 130. Whether this is the last remains to be seen.

Now that gdm is run via smf gnome-session is now run via ctrun(1) so that it gets it's own contract and therefore any problems it has do not result in gdm being restarted.

However the Sun Ray sessions are not started that way. Hence I was seeing all users logged out if just one session had a problem:

So a rather trivial failure such as this:

Jan 16 11:10:47 pearson genunix: [ID 603404 kern.notice] NOTICE: core_log: metacity[7448] core dumped: /var/cores/core.metacity.7448 

would result in gdm restarted:


[ Jan 16 00:06:08 Method "start" exited with status 0. ]
[ Jan 16 11:10:47 Stopping because process dumped core. ]
[ Jan 16 11:10:47 Executing stop method ("/lib/svc/method/svc-gdm stop"). ]
[ Jan 16 11:10:47 Method "stop" exited with status 0. ]

which in turn means all the users were logged out. Ooops.

The solution was simple but like the previous workarounds leaves your warranty in tatters!

# mv /usr/bin/gnome-session /usr/bin/gnome-session.orig
# cat > /usr/bin/gnome-session << EOF
> #!/bin/ksh -p
> exec ctrun -l child \\$0.orig \\$@
> EOF
# chmod 755 /usr/bin/gnome-session

This results in all your gnome sessions having their own contract as their parent is ctrun:

: pearson FSS 33 $; ptree $(pgrep -o -u gdm gnome-session)
22433 /usr/sbin/gdm-binary
  22440 /usr/lib/gdm-simple-slave --display-id /org/gnome/DisplayManager/Displa
    22965 ctrun -l child /usr/bin/gnome-session.orig --autostart=/usr/share/gdm
      22967 /usr/bin/gnome-session.orig --autostart=/usr/share/gdm/autostart/Lo
        23062 /usr/lib/gdm-simple-greeter
        23063 gnome-power-manager
: pearson FSS 34 $; 

and means that any failures are now ring-fenced to just that session.

Monday Jan 11, 2010

More Sun Ray on OpenSolaris build 130

As I have previously mentioned I have Sun Ray "working" on OpenSolaris build 130 at home. There are some minor tweaks required to get things working close to perfectly.

If you are doing this you are already running OpenSolaris build 130 and Sun Ray which is completely unsupported. These changes are also completely unsupported. There was not warranty but if there was one you will void it.

First take a back up. Since you are running OpenSolaris and therefore have ZFS take snapshot of the file system that contains /opt/SUNWut and also use beadm to create a snapshot of the boot environment.

Now to get to a point where you can login on a Sun Ray DTU you need to do this:

ln -s /usr/lib/xorg/libXfont.so.1 /opt/SUNWut/lib
ln -s /usr/lib/xorg/libfontenc.so.1 /opt/SUNWut/lib
rm /usr/lib/xorg/modules/extensions/GL
ln -s ../../../../../var/run/opengl/server \\
/usr/lib/xorg/modules/extensions/GL
mkdir /etc/opt/SUNWut/X11
echo "catalogue:/etc/X11/fontpath.d" > /etc/opt/SUNWut/X11/fontpath
usermod -d /var/lib/gdm gdm

However the utwho command won't work and if you want to use utaction as root you need to follow the instructions in my last post.

Now utwho is extremely useful and for me a requirement as it is used by my access hours script so I wanted to get that working. As with the issues with utaction the first problem is that the script that sets this up expects to run as root but now everything is running as the user "gdm". Again the solution is RBAC.


Follow the instructions from my last post to set up a GDM profile and make the user gdm use it. Then add these lines to /etc/security/exec_attr:

GDM:solaris:cmd:::/etc/opt/SUNWut/gdm/SunRayInit/Default:uid=0
GDM:solaris:cmd:::/etc/opt/SUNWut/gdm/SunRayPostSession/Default:uid=0

and then edit the two file listed above to add in the bold lines below. The example is /etc/opt/SUNWut/gdm/SunRayInit/Default:

#!/bin/ksh
# iterate over the helpers directory
#
# ident "@(#)InitDefault.sh     1.5 09/07/31 SMI"
#
# Copyright 2009 Sun Microsystems, Inc.  All rights reserved.
# Use is subject to license terms.
#

PATH=/usr/bin/X11:/usr/X11R6/bin:/opt/X11R6/bin:/usr/bin:$PATH

if [[ "$_" != "/usr/bin/pfexec" ]] && [[ -x /usr/bin/pfexec ]]

then

        exec /usr/bin/pfexec $0 $@

fi


for i in /etc/opt/SUNWut/gdm/SunRayInit/helpers/\*
do
        if [ -x $i ]; then
            . $i
        fi
done

exit 0

Finally, and quite whether this is required I'm not sure, but the reset-dpy script will not work properly either so make these changes to fix it:


\*\*\* /opt/SUNWut/.zfs/snapshot/month_2009-12-01-01:02/lib/xmgr/gdm/reset-dpy     Tue Oct 20 01:32:31 2009
--- ./reset-dpy Mon Jan 11 13:59:30 2010
\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
\*\*\* 65,70 \*\*\*\*
--- 65,71 ----
        dpys=`gdmdynamic -l | /bin/sed -e 's/:\\([\^,]\*\\),[\^;]\*/\\1/g' -e 's/;/ /g' `
        for dpy in $dpys
        do
+         dpy=${dpy#:}
            if [[ $dpy -ge $MINDISP && $dpy -le $MAXDISP ]]; then
            rm "$PRESESSION_DIR/:$dpy"
              rm "$POSTSESSION_DIR/:$dpy"
root@pearson:/etc/opt/SUNWut/xmgr#       

Now all will work:

: pearson FSS 2 $; utwho -c 
 12.0 Payflex.500a094f00130100         user2      192.168.1.228   P8.00144f57a46f
: pearson FSS 3 $; utwho
 12 Payflex.500a094f00130100             user2     
 14 Payflex.500a094d00130100             user1     
 18 Payflex.500a094c00130100             user3     
 19 Payflex.500a094e00130100             user4     
: pearson FSS 4 $; 

However you have voided your warranty!

Update: 12/1/2010 These problems should be fixed in build 132 so the workarounds should not be needed then.

Sunday Jan 10, 2010

Sometimes being on the bleeding edge you get cut.

While Sun Ray and OpenSolaris build 130 are functional they are not happy. In particular the changes to gdm have resulted in many of the functions of the Sun Ray software not working. Small things like utwho(1) no longer work.

More importantly the login scripts running as the user "gdm" have stopped my scripts that adjust the user shares and stop firefox when users disconnect from working. Since this results in the system being 100% busy all the time an urgent workaround was required.

The workaround uses lots of undocumented features, so I don't expect it to keep working long term but at least it will keep me going until the next upgrade.

The problem of not running as root is trivially solved by using RBAC and then calling utaction via pfexec(1) adding these lines to each of the files:

root@pearson:/root# egrep /etc/security/\*_attr
/etc/security/exec_attr:GDM:solaris:cmd:::/opt/SUNWut/bin/utaction:uid=0
/etc/security/prof_attr:GDM:::Do not assign to users. Profile for GDM so it can run utaction as root:help=Utaction.help
root@pearson:/root# 

Then using usermod to add the GDM profile to the gdm user:

root@pearson:/root# usermod -P GDM gdm

The now the utaction you call from your PostLogin script will be run as root. However instead of passing in the user name, which when the PostLogin script runs you don't know, pass in the name of the Sun Ray session_proc file and read the UID out of there. I have:


function read_session_proc
{
        typeset IFS="="
        typeset key val
        while read key val
        do
                if [[ "$key"="uid" ]]
                then
                        typeset IFS=:
                        typeset u spam
                        getent passwd $val | read u spam
                        print $u
                        break
                fi
        done
}
if [[ "${1#/}" != $1 ]] && [[ -f $1 ]]
then
        USER=$(read_session_proc < $1)
else
        USER=$1
fi

In the adjust shares scripts and this in the PostLogin script (/etc/opt/SUNWut/gdm/SunRayPostLogin/Default):

#!/bin/sh
#
# ident "@(#)PostLoginDefault.sh        1.1 04/05/06 SMI"
#
AD=/usr/local/sbin/adjustshares.workaround
.130

d=${DISPLAY#\*:}
d=${d%.\*}
LOGNAME=/tmp/SUNWut/session_proc/$d
/usr/bin/ctrun -l child -i none /usr/bin/pfexec /opt/SUNWut/bin/utaction -i -c "$AD $LOGNAME 50" -d "$AD $LOGNAME" &


Update: I have added the ctrun otherwise if any of the actions called by utaction dump core then everyone gets logged out. Clearly the core dumps need to be resolved but there is no reason to log everyone out.

Wednesday Jan 06, 2010

ZFS resliver performance improved!

I'm not being lucky with the Western Digital 1Tb disks in my home server.

That is to say I'm extremely pleased I have ZFS as both have now failed and in doing so corrupted data which ZFS has detected (although the users detected the problem first as the performance of the drive became so poor).One of the biggest irritations about replacing drives, apart from having to shut the system down as I don't have hot swap hardware is waiting for the pool to resliver. Previously this has taken in excess of 24 hours to do.

However yesterday's resilver was after I had upgraded to build 130 which has some improvements to the resilver code:


: pearson FSS 1 $; zpool status tank
  pool: tank
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: resilver completed after 6h28m with 0 errors on Wed Jan  6 02:04:17 2010
config:

        NAME           STATE     READ WRITE CKSUM
        tank           ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            c21t0d0s7  ONLINE       0     0     0
            c20t1d0s7  ONLINE       0     0     0
          mirror-1     ONLINE       0     0     0
            c21t1d0    ONLINE       0     0     0  225G resilvered
            c20t0d0    ONLINE       0     0     0

errors: No known data errors
: pearson FSS 2 $; 
Only 6 ½ hours for 225G which while not close to the theoretical maximum is way better than 24 hours and the system was usable while this was going on.

Sunday Jan 03, 2010

Automatic virus scanning with c-icap & ZFS

Now that I have OpenSolaris running on the home server I thought I would take advantage of the virus scanning capabilities using the clamav instance I have running. After downloading, compiling and installing c-icap I was able to get the service up and running quickly using the instructions here.

However using a simple test of trying to copy an entire home directory I would see regular errors of the form:

Jan  2 16:18:49 pearson vscand: [ID 940187 daemon.error] Error receiving data from Scan Engine: Error 0

Which were accompanied by a an error to the application and the error count to vscanadm stats.

From the source it was clear that the recv1 was returning 0, indicating the stream to the virus scan engine had closed the connection. What was not clear was why?

So I ran this D to see if what was in the buffer being read would give a clue:


root@pearson:/root# cat vscan.d 
pid$target::vs_icap_readline:entry
{
        self->buf = arg1;
        self->buflen = arg2;
}
syscall::recv:return /self->buf && arg1 == 0/
{
        this->b = copyin(self->buf, self->buflen);
        trace(stringof(this->b));
}
pid$target::vs_icap_readline:return
/self->buf/
{
        self->buf=0;
        self->buflen=0;
}
root@pearson:/root# 

root@pearson:/root# dtrace -s  vscan.d -p $(pgrep vscand)
dtrace: script 'vscan.d' matched 3 probes
CPU     ID                    FUNCTION:NAME
  1   4344                      recv:return 
             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

The clue was that the error comes back on the very first byte being read. The viruse scan engine is deliberately closing the connection after handling a request which since it had negotiated "keep-alive" it should not.

The solution2 was to set the MaxKeepAliveRequests entry in the c-icap.conf file to be -1 and therefore disable this feature.

1Why is recv being used to read one byte at a time? Insane, a bug will be filed.

2It is in my opinion a bug that the vscand can't cope gracefully with this. Another bug will be filed.

Sunday Dec 13, 2009

Twitter as a sysadmin tool?

After the disk failures I have suffered I have decided to start using twitter monitor the health of my home server and network as this is easier to pick up than on my phone email. Also it provides a nice log.

Using a modified version of the command line twitter script http://blogs.sun.com/sandip/entry/tweeting_from_command_line_using modified so that it will read the account information from a file. I have changed the standard script I have for watching the system to use twitter rather than email to notify me of issues. It posts to the twitter account "syslogathome" which I now follow, you can too but I'm sure you won't want to.

My script to check things is here.

#!/bin/ksh

export PATH=/usr/sbin:/usr/bin

function tweet
{
        echo tweeting $1
        /usr/local/bin/tweet.py "$1"
}
function tweet_services
{
        typeset zone="$1 "
        if [[ "$1" != "global" ]]
        then
                typeset zl="pfexec zlogin $1 "
        else
                typeset zl=""
        fi
        ${zl}svcs -x | nawk '/\^svc:/ { s=$0 } /\^Reason:/ { print s,$0 }' | while read line
        do
                tweet "$zone$line"
        done
}
function tweet_zfs
{
        zpool list -H -o name,health | while read zfs state
        do
                [[ "$state" != "ONLINE" ]] && tweet "$zfs $state"
        done
}
function tweet_disks
{
        export IFS="    "
        kstat -p -m sderr -s "Predictive Failure Analysis" | while read err value
        do
                (( $value != 0 )) && tweet "$err     $value"
        done
}
function tweet_net
{
        typeset speed=$(dladm show-linkprop -co VALUE -p speed nge0)
        if (( $speed != 1000 ))
        then
                tweet "network running in degraded state $speed" 
        fi
}
function tweet_phone
{
        if ! ping phone 1 > /dev/null 2>&1
        then
                tweet "Phone is not responding"
        fi
}

for zone in $(zoneadm list)
do
        tweet_services $zone
done
tweet_zfs
tweet_disks
tweet_net
tweet_phone

Monday Dec 07, 2009

So long NIS+, it was fun

With the push of this feature into Solaris:

6874309 Remove NIS+ from Solaris 
PSARC/2009/530 Removal of NIS+

a bit of Solaris history is made. The namespace that was to replace NIS (YP) has been survived by the system it was to replace.

NIS+ was the default name service in Solaris 2.0 and it was a long while before Sun relented and shipped a NIS (YP) server for the release. As a support engineer however NIS+ was interesting and was reasonably secure.


The flaws however limited it's adoption:

  • Servers could not be in the domain they served. This was eventually fixed however I find it amazing that we have the same situtation now with LDAP where native LDAP clients can't be served by themselves.

  • It was hard. The technical difficulties of getting NIS+ name spaces to work since they both used secure RPC and were used by secure RPC gave it a reputation for being hard to set up and unreliable. The reliability however has been resolved such that there were many large scale deployments that ran successfully.

  • The use of secure RPC made short running programs very expensive if they used NIS+1. So scripts that did NIS+ were slow.

NIS+ allowed me to learn many things:

  • I wrote a NIS+2html gate way that allowed you to navigate an entire NIS+ namespace from a browser (the browser was mosaic) using cgi.

  • An interposing library that allowed you to see all the NIS+ calls being made.

  • A TCL library giving direct access to the NIS+ library calls. This allowed very fast scripting since only one secure RPC session has to be generated.

Unfortunately none of them made it out from Sun as this was long before we became more open.


However it's future looked sealed when it's EOF was announced in Solaris 9 but a surprise reprieve allowed it to live in Solaris 10. It looks like the same will not be true for OpenSolaris. If you are still using NIS+ then you need to be finalizing your plans to move to LDAP!

It seems my baby is unlikely to make it to 21.


So long NIS+. It was fun.

1Each process would have to generate a secure RPC session key and negotiate a secure connection with the server. If the process then only made a single call to the server this session key would then be thrown away.

Wednesday Dec 02, 2009

Tracing getipnodesXXXX calls

When I wrote the D script to decode gethostbyname() and gethostbyaddr() library calls I fully intended to proactive write the script to do getipnodebyname() and getipnodebyaddr() and for that matter all the getXbyY routines. However that spare five minutes never arrived so it was only today while investigating a real issue that I had need for a script to decode getipnodebyaddr(). Fortunately taking the other script and modifying to work with getipnodebyXXXX was not that hard.

It can only decode 5 addresses per call before it runs out of space for DIF as it has to do more than the gethostbyXXXX() version since it has to cope with both IPv4 and IPv6 addresses:

dhcp-10-18-9-247-gmp03# dtrace -32 -CZs gethostbyXXXX.d -c "getent ipnodes ibm.com"
129.42.17.103	ibm.com
129.42.18.103	ibm.com
129.42.16.103	ibm.com
Look up: ibm.com:
Host: ibm.com
	h_address[0]: 0:0:0:0:0:0:0:0:0:0:ff:ff:81:2a:11:67
	h_address[1]: 0:0:0:0:0:0:0:0:0:0:ff:ff:81:2a:12:67
	h_address[2]: 0:0:0:0:0:0:0:0:0:0:ff:ff:81:2a:10:67

dhcp-10-18-9-247-gmp03# dtrace -32 -CZs getipnodebyXXXX.d -c "smbadm list"  
[\*] [CJG]
[\*] [cjg.uk.sun.com]
	[+x6250a-vbox10.cjg.uk.sun.com] [10.18.8.140]
[\*] [CJG] [S-1-5-21-1782774743-1218725973-889210084]
[.] [DHCP-10-18-9-24] [S-1-5-21-277162072-319636157-2443625992]
Look up: x6250a-vbox10:
Host: x6250a-vbox10.cjg.uk.sun.com
	h_address[0]: 10.18.8.140

The script is here. Feel free to use it.

Tuesday Nov 24, 2009

Clear up those temporary files

One of my (many) pet peeves are shell scripts that fail to delete any temporary files they use. Included in this pet peeve are shell scripts that create more temporary files than they absolutely need, in most cases the number is 0 but there are a few cases where you really do need a temporary file but if it is temproary make sure you always delete the file.

The trick here is to use the EXIT trap handler to delete the file. That way if your script is killed (unless it is kill with SIGKILL) it will still clean up. Since you will be using mktemp(1) to create your temporary file and you want to minimize any race condition where the file could be left around you need to do (korn shell):

trap '${TMPFILE:+rm ${TMPFILE}}' EXIT

TMPFILE=$(mktemp /tmp/$0.temp.XXXXXX)

if further down the script you delete or rename the file all you have to do is unset TMPFILE eg:

mv $TMPFILE /etc/foo && unset TMPFILE

Friday Nov 13, 2009

CIFS, ACls, permissions and iTunes

If you share a file system using the CIFS server (not SAMBA) and create a file in that file system using Windows XP the file ends up with these strange permissions and an ACL like this:

: pearson FSS 12 $; ls -vd Bad
d---------+  2 cjg      staff          2 Nov 13 17:11 Bad
     0:user:cjg:list_directory/read_data/add_file/write_data/add_subdirectory
         /append_data/read_xattr/write_xattr/execute/delete_child
         /read_attributes/write_attributes/delete/read_acl/write_acl
         /write_owner/synchronize:allow
     1:group:2147483648:list_directory/read_data/add_file/write_data

         /add_subdirectory/append_data/read_xattr/write_xattr/execute

         /delete_child/read_attributes/write_attributes/delete/read_acl

         /write_acl/write_owner/synchronize:allow

: pearson FSS 13 $; 


The first thing that riles UNIX some users is the lack of any file permissions, although things seem to work fine. The strange group ACL is for the local WINDOWS SYSTEM group. However the odd thing is for me it renders iTunes on the Windows system unable to see the files that it has created.

The solution is to add a default ACL to the root of the file system (well to every object in the file system if the file system is not new) that looks like this:

A+owner@:full_set:fd:allow,everyone@:read_set/execute:fd:allow

So this has the rather pleasant side effect of setting the UNIX permissions to something more recognisable:

: pearson FSS 20 $; ls -vd Good
drwxr-xr-x+  2 cjg      staff          2 Nov 13 18:16 Good
     0:owner@:list_directory/read_data/add_file/write_data/add_subdirectory
         /append_data/read_xattr/write_xattr/execute/delete_child
         /read_attributes/write_attributes/delete/read_acl/write_acl
         /write_owner/synchronize:file_inherit/dir_inherit/inherited:allow
     1:everyone@:list_directory/read_data/read_xattr/execute/read_attributes
         /read_acl:file_inherit/dir_inherit/inherited:allow
: pearson FSS 21 $; 

and the even more pleasant side effect of making iTunes works again!

Thursday Nov 12, 2009

The Kings of Computing use dtrace!

I've said many times that dtrace is not just a wonderful tool for developers and performance gurus. The Kings of Computing, which are of course System Admins, also find it really useful.

There is an ancient version of make called Parallel make that occasionally suffers from a bug (1223984) where it gets into a loop like this:

waitid(P_ALL, 0, 0x08047270, WEXITED|WTRAPPED)	Err#10 ECHILD
alarm(0)					= 30
alarm(30)					= 0
waitid(P_ALL, 0, 0x08047270, WEXITED|WTRAPPED)	Err#10 ECHILD
alarm(0)					= 30
alarm(30)					= 0
waitid(P_ALL, 0, 0x08047270, WEXITED|WTRAPPED)	Err#10 ECHILD

This will then consume a CPU and the users CPU shares. The application is never going to be fixed so the normal advice is not to use it. However since it can be NFS mounted from anywhere I can't reliably delete all copies of it so occasionally we will see run away processes on our build server.

It turns out this is a snip to fix with dtrace. Simply look for cases where the wait system call returns an error and errno is set to ECHILD (10) and if that happens 10 times in a row for the same process and that process does not call fork then stop the process.

The script is simple enough for me to just do it on the command line:


# dtrace -wn 'syscall::waitsys:return / arg1 <= 0 && 
execname == "make.bin" && errno == 10  && waitcount[pid]++ > 20 / {

	stop();

	printf("uid %d pid %d", uid, pid) }

syscall::forksys:return / arg1 > 0 / { waitcount[pid] = 0 }'
dtrace: description 'syscall::waitsys:return ' matched 2 probes
dtrace: allowing destructive actions
CPU     ID                    FUNCTION:NAME
  2  20588                   waitsys:return uid 36580 pid 29252
  3  20588                   waitsys:return uid 36580 pid 2522
  5  20588                   waitsys:return uid 36580 pid 28663
  7  20588                   waitsys:return uid 36580 pid 29884
 10  20588                   waitsys:return uid 36580 pid 941
 15  20588                   waitsys:return uid 36580 pid 1098

This was way easier then messing around with prstat, truss and pstop!

Sunday Nov 08, 2009

Access hour by day of the week

At the request of the users the access hours for Sun Ray users in the house have been relaxed so that on Friday and Saturday nights the Sun Ray's in bedrooms can be used later.

This required that the access hour script be updated to understand the day of the week and hence the access_hour file also is updated in an incompatible way. There is now an extra column representing the days of the week when the rule is applied as the first column after the name of the user. The day of the week field will take a wild card '\*' or ranges (1-5) for Monday to Friday, or lists (1,3,5). Sunday is day 0 as any self respecting geek would have it.

The new access_file I have looks something like this:

    user0:0-4:0001:2300:P8.00144f7dc383
    
    user2:0-4:0630:2300
    
    user3:0-4:0630:2230
    
    user4:0-4:0630:2100
    
    user4:5-6:0630:2200
    

The script is still here: http://blogs.sun.com/chrisg/resource/check_access_hours

Friday Oct 09, 2009

Preparing for OpenSolaris @ home

Since the "nevada" builds of Solaris next are due to end soon and for some time the upgrade of my home server has involved more than a little bit of TLC to get it to work I will be moving to an OpenSolaris build just as soon as I can.

However before I can do this I need to make sure I have all thesoftware to provide home service. This is really a note to myself to I don't forget anything.

  • Exim Mail Transfer Agent (MTA). Since I am using certain encryption routines, virus detection and spamassassin I was unable to use the standard MTA, sendmail, when the system was originaly built and have been using exim, from blastwave. I hope to build and use exim without getting all the cruft that comes from the Blastwave packaged. So far this looks like it will be simple as OpenSolaris now has OpenSSL.

  • An imapd. Currently I have a blastwave version but again I intend to build this from scratch again the addition of OpenSSL and libcrypto should make this easy.

  • Clamav. To protect any Windows systems and to generally not pass on viri to others clamav has been scanning all incoming email. Again I will build this from scratch as I already do.

  • Spamassassin. Again I already build this for nevada so building it for OpenSolaris will be easy.

  • Ddclient. Having dynamic DNS allows me to login remotely and read email.

  • Squeezecenter. This is a big issue and in the past has proved hard to get built thanks to all the perl dependacies. It is for that reason I will continue to run it in a zone so that I don't have to trash the main system. Clearly with all my digital music loaded into the squeezecentre software this has to work.

I'm going to see if I can jump through the legal hoops that will allow me to contribute the builds to the contrib repository via Source Juicer. However as this is my spare time I don't know whether the legal reviews will be funded.

Due to the way OpenSolaris is delivered I also need to be more careful about what I install. rather than being able to choose everything. First I need my list from my laptop. Then in addtion to that I'll need

  • Samba - pkg:/SUNWsmba

  • cups - pkg:/SUNWcups

  • OpenSSL - pkg:/SUNWopenssl

Oh and I'll need the Sun Ray server software.

Wednesday Sep 23, 2009

purple-url-handler and XMPP URLs

I've finally worked out how to drive purple-url-handler. Strictly John worked it out, so I will stand on his shoulders, but for some reason it would not work for me and I now know why and have a workaround.

First you need an XMPP URI on a web page. Some thing like:

xmpp:chrisg_fans@muc.im.sun.com?join

will when clicked in a browser that has the right helper, something OpenSolaris has had for some time, will take your IM client to that room. However with pidgin that is only the case if that room is available in the first XMPP server listed in your list of accounts. So given that this room is on Sun's IM server with the list of accounts looking like this:


It will try and connect to the first XMPP server listed, which is google and hence fail. Changing the order to be:


and then logging in and out and now the link will work. You can drag and drip the entries in pidgin.

Tuesday Sep 15, 2009

Moving to an OpenSolaris Sun Ray

Today I took the plunge and moved from working on our Nevada based Sun Ray Servers to one running OpenSolaris. So that I could get the full OpenSolaris look and feel I first purged my home directory of a number of configuration files and directories using a script like1 this:

#!/bin/ksh -p
TARGET=b4OpenSolaris
test -d $HOME/$TARGET || mkdir $HOME/$TARGET
mv $HOME/.ICEauthority $HOME/$TARGET
mv $HOME/.cache $HOME/$TARGET
mv $HOME/.chewing $HOME/$TARGET
mv $HOME/.config $HOME/$TARGET
mv $HOME/.dbus $HOME/$TARGET
mv $HOME/.dmrc $HOME/$TARGET
mv $HOME/.gconf $HOME/$TARGET
mv $HOME/.gconfd $HOME/$TARGET
mv $HOME/.gksu.lock $HOME/$TARGET
mv $HOME/.gnome2 $HOME/$TARGET
mv $HOME/.gnome2_private $HOME/$TARGET
mv $HOME/.gstreamer-0.10 $HOME/$TARGET
mv $HOME/.gtk-bookmarks $HOME/$TARGET
mv $HOME/.iiim $HOME/$TARGET
mv $HOME/.local $HOME/$TARGET
mv $HOME/.nautilus $HOME/$TARGET
mv $HOME/.printer-groups.xml $HOME/$TARGET
mv $HOME/.rnd $HOME/$TARGET
mv $HOME/.sunstudio $HOME/$TARGET
mv $HOME/.sunw $HOME/$TARGET
mv $HOME/.updatemanager $HOME/$TARGET
mv $HOME/.xesam $HOME/$TARGET
mv $HOME/.xsession-errors $HOME/$TARGET

I generated the list by installing OpenSolaris in a VirtualBox and then logging in and doing a bit of browsing and general usage and then seeing was was created. Additionally “.mozilla” was created but I chose to retain that so that I can keep all the history that is in my browser.

Once logged in I have removed the update-manager icon as I am not the administrator. I have also removed the power notification and network monitor as they provide no useful data on a Sun Ray server.

Using “System->Preferences->Startup Applications” I unchecked the codeina update notifier and added my script for updating my IM status.

So far so good but it is taking a while to get used to the menu being a the top and the window list at the bottom of the screen.

1Like as in similar to and not this exact script as mine had my home directory hard coded into it.

Thursday Sep 10, 2009

Using dtrace to find double frees

Some of the most common failures that result in customer calls are misuses of the memory allocation routines, malloc, calloc, realloc, valloc, memalign and free. There are many ways in which you can misuse these routines and the data that they return and the resulting failures often occur within the routines even though the problem is with the calling program.

I'm not going to discuss here all the ways you can abuse these routines but look at a particular type abuse. The double free. When you allocate memory using these routines it is your responsibility to free it again so that the memory does not “leak”. However you must only free the memory once. Freeing it more than once is a bug and the results of that are undefined.

This very simple code has a double free:

#include <stdlib.h>

void
doit(int n, char \*x)
{
        if (n-- == 0)
                free(x);
        else
                doit(n,x);
}
int
main(int argc, char \*\*argv)
{
        char \*x;
        char \*y;

        x = malloc(100000);
        
        doit(3, x);
        doit(10, x);
}

and if you compile and run that program all appears well;


However a more realistic program could go on to fail in interesting ways leaving you with the difficult task of finding the culprit. It is for that reason the libumem has good checking for double frees:


: exdev.eu FSS 26 $;  LD_PRELOAD=libumem.so.1 /home/cg13442/lang/c/double_free
Abort(coredump)
: exdev.eu FSS 27 $; mdb core
Loading modules: [ libumem.so.1 libc.so.1 ld.so.1 ]
> ::status
debugging core file of double_free (64-bit) from exdev
file: /home/cg13442/lang/c/double_free
initial argv: /home/cg13442/lang/c/double_free
threading model: native threads
status: process terminated by SIGABRT (Abort), pid=18108 uid=14442 code=-1
> ::umem_status
Status:         ready and active
Concurrency:    16
Logs:           (inactive)
Message buffer:
free(e53650): double-free or invalid buffer
stack trace:
libumem.so.1'umem_err_recoverable+0xa6
libumem.so.1'process_free+0x17e
libumem.so.1'free+0x16
double_free'doit+0x3a
double_free'doit+0x4d
double_free'doit+0x4d
double_free'doit+0x4d
double_free'doit+0x4d
double_free'doit+0x4d
double_free'doit+0x4d
double_free'doit+0x4d
double_free'doit+0x4d
double_free'doit+0x4d
double_free'doit+0x4d
double_free'main+0x100
double_free'_start+0x6c

> 

Good though this is there are situations when libumem is not used and others where it can't be used1. In those cases it is useful to be able to use dtrace to do this and any way it is always nice to have more than one arrow in your quiver:


: exdev.eu FSS 54 $; me/cg13442/lang/c/double_free 2> /dev/null              <
/usr/sbin/dtrace -qs doublefree.d -c /home/cg13442/lang/c/double_free 2> /dev/null
Hit Control-C to stop tracing
double free?
	Address: 0xe53650
	Previous free at: 2009 Jun 23 12:23:22, LWP -1
	This     free at: 2009 Jun 23 12:23:22, LWP -1
	Frees 42663 nsec apart
	Allocated 64474 nsec ago by LWP -1

              libumem.so.1`free
              double_free`doit+0x3a
              double_free`doit+0x4d
              double_free`doit+0x4d
              double_free`doit+0x4d
              double_free`doit+0x4d
              double_free`doit+0x4d
              double_free`doit+0x4d
              double_free`doit+0x4d
              double_free`doit+0x4d

: exdev.eu FSS 56 $; 

If run as root you can get the the real LWP values that did the allocation and the frees:

: exdev.eu FSS 63 $; pfexec /usr/sbin/dtrace -qs doublefree.d -c /home/cg1344>
Hit Control-C to stop tracing
double free?
	Address: 0xe53650
	Previous free at: 2009 Jun 23 14:21:29, LWP 1
	This     free at: 2009 Jun 23 14:21:29, LWP 1
	Frees 27543 nsec apart
	Allocated 39366 nsec ago by LWP 1

              libumem.so.1`free
              double_free`doit+0x3a
              double_free`doit+0x4d
              double_free`doit+0x4d
              double_free`doit+0x4d
              double_free`doit+0x4d
              double_free`doit+0x4d
              double_free`doit+0x4d
              double_free`doit+0x4d
              double_free`doit+0x4d

: exdev.eu FSS 64 $;

Here is the script in all it's glory.

#!/usr/sbin/dtrace -qs

BEGIN
{
	printf("Hit Control-C to stop tracing\\n");
}
ERROR 
/ arg4 == DTRACEFLT_KPRIV || arg4 == DTRACEFLT_UPRIV /
{
	lwp = -1;
}

pid$target::realloc:entry,
pid$target::free:entry
{
	self->addr = arg0;
	self->recurse++;
}

pid$target::realloc:return,
pid$target::free:return
/ self->recurse /
{
	self->recurse--;
	self->addr = 0;
}

pid$target::malloc:entry,
pid$target::memalign:entry,
pid$target::valloc:entry,
pid$target::calloc:entry,
pid$target::realloc:entry,
pid$target::realloc:entry,
pid$target::free:entry
/ lwp != -1 && self->lwp == 0 /
{
	self->lwp = curlwpsinfo->pr_lwpid;
}

pid$target::malloc:entry,
pid$target::calloc:entry,
pid$target::realloc:entry,
pid$target::memalign:entry,
pid$target::valloc:entry,
pid$target::free:entry
/ self->lwp == 0 /
{
	self->lwp = lwp;
}

pid$target::malloc:return,
pid$target::calloc:return,
pid$target::realloc:return,
pid$target::memalign:return,
pid$target::valloc:return
{
	alloc_time[arg1] = timestamp;
	allocated[arg1] = 1;
	free_walltime[arg1] = 0LL;
	free_time[arg1] = 0LL;
	free_lwpid[arg1] = 0;
	alloc_lwpid[arg1] = self->lwp;
	self->lwp = 0;
}

pid$target::realloc:entry,
pid$target::free:entry
/ self->recurse == 1 && alloc_time[arg0] && allocated[arg0] == 0 /
{
	printf("double free?\\n");
	printf("\\tAddress: 0x%p\\n", arg0);
	printf("\\tPrevious free at: %Y, LWP %d\\n", free_walltime[arg0],
		free_lwpid[arg0]);
	printf("\\tThis     free at: %Y, LWP %d\\n", walltimestamp,
		self->lwp);
	printf("\\tFrees %d nsec apart\\n", timestamp - free_time[arg0]);
	printf("\\tAllocated %d nsec ago by LWP %d\\n",
		timestamp - alloc_time[arg0], alloc_lwpid[arg0]);

	ustack(10);
}

pid$target::realloc:entry,
pid$target::free:entry
/ self->recurse == 1 && alloc_time[arg0] && allocated[arg0] == 1 /
{
	free_walltime[arg0] = walltimestamp;
	free_time[arg0] = timestamp;
	free_lwpid[arg0] = self->lwp;

	allocated[arg0] = 0;
}

pid$target::free:entry
/self->lwp && self->recurse == 0/
{
	self->lwp = 0;
}

1Most of the cases it “can't” be used is because it finds fatal problems early on in the start up of applications. Then the application writers make bizarre claims that this is a problem with libumem and will tell you it is not supported with their app. In fact the problem is with the application.

Wednesday Sep 09, 2009

Understanding iostat

1Iostat has been around for years and until Dtrace came along and allowed us to look more deeply into the kernel was the tool for analysing how the io subsystem was working in Solaris. However interpreting the output has proved in the past to cause problems.

First if you are looking at latency issues it is vital that you use the smallest time quantum to iostat you can, which as of Solaris 10 is 1 second. Here is a sample of some output produced from “iostat -x 1”:

                  extended device statistics                 
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
sd3       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0 
                 extended device statistics                 
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
sd3       5.0 1026.5    1.6 1024.5  0.0 25.6   24.8   0  23 
                 extended device statistics                 
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
sd3       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0 


The first thing to draw your attention to is the Column “%b” which the manual tells you is:

%b percent of time the disk is busy (transactions in progress)

So in this example the disk was “busy”, ie had at least one transaction (command) in progress for 23% of the time period. Ie 0.23 seconds as the time period was 1 second.

Now look at the “actv” column. Again the manual says:

actv average number of transactions actively being serviced (removed from the queue but not yet completed)
This is the number of I/O operations accepted, but not yet serviced, by the device.
In this example the average number of transactions outstanding for this time quantum was 25.6. Now here is the bit that is so often missed. Since we know that all the transactions actually took place within 0.23 seconds and were not evenly spread across the full second the average queue depth when busy was 100/23 \* 25.6 or 111.3. Thanks to dtrace and this D script you can see the actual IO pattern2:

Even having done the maths iostat smooths out peaks in the IO pattern and thus under reports the peak number of transactions as 103.0 when the true value is 200.
The same is true for the bandwidth. The iostat above comes reports 1031.5 transactions a second (r/s + w/s) again though this does not take into account that all those IO requests happened in 0.23 seconds. So the true figure for the device would be 1031.5 \* 100/23 which is 4485 transations/sec.
If we up the load on the disk a bit then you can conclude more from the iostat:
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
sd3       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0 
                 extended device statistics                 
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
sd3       5.0 2155.7    1.6 2153.7 30.2 93.3   57.1  29  45 
                 extended device statistics                 
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
sd3       0.0 3989.1    0.0 3989.1 44.6 157.2   50.6  41  83 
                 extended device statistics                 
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
sd3       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0 

Since the %w column is non zero, and from the manual %w is:

%w percent of time there are transactions waiting for service (queue non-empty)

This is telling us that the device's active queue was full. So on the third line of the above output the devices queue was full for 0.41 seconds. Since the queue depth is quite easy to find out3 and in this case was 256, you can deduce that the queue depth for that 0.41 seconds was 256. Thus the average for the 0.59 seconds left was (157.2-(0.41\*256))/0.59 which is 88.5. The graph of the dtrace results tells a different story:


These examples demonstrate what can happen if your application dumps a large number of transactions onto a storage device while the through put will be fine and if you look at iostat data things can appear ok if the granularity of the samples is not close to your requirement for latency any problem can be hidden by the statistical nature of iostat.

1Apologies to those who saw a draft of this on my blog briefly.

2The application creating the IO attempts to keep 200 transations in the disk at all the time. It is interesting to see that it fails as it does not get notification of the completion of the IO until all or nearly all the outstanding transactions have completed.

3This command will do it for all the devices on your system:

echo '\*sd_state::walk softstate | ::print -d -at "struct sd_lun" un_throttle' | pfexec mdb -k

however be warned the throttle is dynamic so dtrace gives the real answer.

Monday Sep 07, 2009

Recovering /etc/name_to_major

What do you do if you manage to delete or corrupt /etc/name_to_major? Assuming you don't have a backup a ZFS snapshot or an alternative boot environment, in which case you probably are in the wrong job, you would appear to be in trouble.

First thing is not to panic. Do not reboot the system. If you do that it won't boot and your day has just got a whole lot worse. The data needed to rebuild /etc/name_to_major is in the running kernel so it can be rebuilt from that. If your system an x86 system it is also in the boot archive.

However if you have no boot archive or have over written it with the bad name_to_system this script will extract it from the kernel, all be it slowly:

#!/bin/ksh
i=0
while ((i < 1000 ))
do
print "0t$i::major2name" | mdb -k | read x && echo $x $i
let i=i+1 
done

1Redirect that into a file then move the remains of your /etc/name_to_major out of the way and copy the file in place.

Next time make sure you have a back up or snapshot or alternative boot environment!

1You will see lots of errors of the form “mdb: failed to convert major number to name” these are to be expected. They can be limited to just one by adding “|| break” to the mdb line but that assumes that you have no holes in the major number listings which you may have if you have removed a device, so best to not risk that.

About

This is the old blog of Chris Gerhard. It has mostly moved to http://chrisgerhard.wordpress.com

Search

Archives
« April 2014
MonTueWedThuFriSatSun
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
    
       
Today