Sunday Dec 13, 2009

Twitter as a sysadmin tool?

After the disk failures I have suffered, I have decided to start using twitter to monitor the health of my home server and network, as this is easier to pick up on my phone than email. Also it provides a nice log.

I am using a modified version of the command line twitter script (http://blogs.sun.com/sandip/entry/tweeting_from_command_line_using), changed so that it reads the account information from a file. I have changed the standard script I have for watching the system to use twitter rather than email to notify me of issues. It posts to the twitter account "syslogathome", which I now follow. You can too, but I'm sure you won't want to.

My script to check things is here.

#!/bin/ksh

export PATH=/usr/sbin:/usr/bin

function tweet
{
        echo tweeting $1
        /usr/local/bin/tweet.py "$1"
}
function tweet_services
{
        typeset zone="$1 "
        if [[ "$1" != "global" ]]
        then
                typeset zl="pfexec zlogin $1 "
        else
                typeset zl=""
        fi
        ${zl}svcs -x | nawk '/^svc:/ { s=$0 } /^Reason:/ { print s,$0 }' | while read line
        do
                tweet "$zone$line"
        done
}
function tweet_zfs
{
        zpool list -H -o name,health | while read zfs state
        do
                [[ "$state" != "ONLINE" ]] && tweet "$zfs $state"
        done
}
function tweet_disks
{
        export IFS="$(printf '\t')"   # kstat -p separates the statistic name and value with a tab
        kstat -p -m sderr -s "Predictive Failure Analysis" | while read err value
        do
                (( $value != 0 )) && tweet "$err     $value"
        done
}
function tweet_net
{
        typeset speed=$(dladm show-linkprop -co VALUE -p speed nge0)
        if (( $speed != 1000 ))
        then
                tweet "network running in degraded state $speed" 
        fi
}
function tweet_phone
{
        if ! ping phone 1 > /dev/null 2>&1
        then
                tweet "Phone is not responding"
        fi
}

for zone in $(zoneadm list)
do
        tweet_services $zone
done
tweet_zfs
tweet_disks
tweet_net
tweet_phone

Tuesday Nov 24, 2009

Clear up those temporary files

One of my (many) pet peeves is shell scripts that fail to delete the temporary files they use. Included in this pet peeve are shell scripts that create more temporary files than they absolutely need. In most cases the number needed is 0, but there are a few cases where you really do need a temporary file. If it is temporary, make sure you always delete the file.

The trick here is to use the EXIT trap handler to delete the file. That way if your script is killed (unless it is killed with SIGKILL) it will still clean up. Since you will be using mktemp(1) to create your temporary file, and you want to minimize any race condition where the file could be left around, you need to do (korn shell):

trap '${TMPFILE:+rm ${TMPFILE}}' EXIT

TMPFILE=$(mktemp /tmp/${0##*/}.temp.XXXXXX)

If further down the script you delete or rename the file, all you have to do is unset TMPFILE, e.g.:

mv $TMPFILE /etc/foo && unset TMPFILE
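Putting those pieces together, here is a minimal sketch of the pattern as a complete function. The name sort_via_tmp and the sortdemo file prefix are made up for illustration, and it uses an explicit test in the trap rather than the ${TMPFILE:+...} form so that it should also run in other POSIX shells.

```shell
# Hypothetical example: sort stdin by way of one temporary file.
# The function body is a subshell, so the EXIT trap fires when the
# function finishes and cannot disturb any trap set by the caller.
sort_via_tmp() (
    TMPFILE=
    # Set the trap before creating the file so there is no window in
    # which the file exists but nothing will remove it.
    trap 'if [ -n "$TMPFILE" ]; then rm -f "$TMPFILE"; fi' EXIT
    TMPFILE=$(mktemp "${TMPDIR:-/tmp}/sortdemo.XXXXXX") || exit 1
    cat > "$TMPFILE"
    sort "$TMPFILE"
)
```

Running something like printf 'b\na\n' | sort_via_tmp prints the lines in sorted order and should leave no sortdemo file behind.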

Monday Sep 07, 2009

Recovering /etc/name_to_major

What do you do if you manage to delete or corrupt /etc/name_to_major? Assuming you don't have a backup, a ZFS snapshot or an alternative boot environment (in which case you are probably in the wrong job), you would appear to be in trouble.

First thing is not to panic. Do not reboot the system. If you do that it won't boot and your day has just got a whole lot worse. The data needed to rebuild /etc/name_to_major is in the running kernel, so it can be rebuilt from that. If your system is an x86 system it is also in the boot archive.

However if you have no boot archive, or have overwritten it with the bad name_to_major, this script will extract it from the kernel, albeit slowly:

#!/bin/ksh
i=0
while ((i < 1000 ))
do
print "0t$i::major2name" | mdb -k | read x && echo $x $i
let i=i+1 
done

Redirect that into a file [1], then move the remains of your /etc/name_to_major out of the way and copy the file into place.

Next time make sure you have a backup or snapshot or alternative boot environment!

[1] You will see lots of errors of the form “mdb: failed to convert major number to name”; these are to be expected. They can be limited to just one by adding “|| break” to the mdb line, but that assumes that you have no holes in the major number listings, which you may have if you have removed a device, so it is best not to risk that.

Thursday Aug 27, 2009

Starting remote X applications

Someone has posted a script to start a remote xterm on BigAdmin which exposes a number of issues. I thought it would be better if google stood some chance of finding a better answer, or at least an answer that does not rely on inherently insecure settings.

Remote X applications should be started using ssh -X so that the X traffic is encrypted, and if you add -C it is also compressed, which can be a significant performance boost. So a script to do this could be handy, although to be honest knowing the ssh options or having them set as the default in your .ssh/config is just as easy:

: exdev.eu FSS 31 $; egrep '^(Compress|ForwardX)' ~/.ssh/config
ForwardX11 yes
Compression yes
: exdev.eu FSS 32 $; ssh -f pearson /usr/X11/bin/xterm         
: exdev.eu FSS 33 $; 

or more usefully to start graphical tools:

: exdev.eu FSS 33 $; ssh -f pearson pfexec /usr/sadm/admin/bin/dhcpmgr
: exdev.eu FSS 34 $; 

However if you really want a script to do it, here is one that will, with no need to mess with your .ssh/config:

#!/bin/ksh
REMOTE_PATH=${REMOTE_PATH:-${PATH}}
APP=${0##*/}
if (( $# < 1 )) 
then
        print "USAGE: ${APP} host [args]" >&2
        exit 1
fi
host=$1
shift
exec /usr/bin/ssh -o ClearAllForwardings=yes -C -Xfn $host \
        PATH=${REMOTE_PATH} pfexec ${APP#r} $@

If you save this into a file called “rxterm” then running “rxterm remotehost” will start an xterm on the system remotehost assuming you can ssh to that system.

More entertainingly you can save it as “rdhcpmgr” and it will start the dhcpmgr program on a remote system and securely display it on your current display (assuming your PATH includes /usr/sadm/admin/bin and your profile allows you access to that application). You can use it to start any application by simply naming it after the application in question with a preceding “r”.
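The naming trick relies on two parameter expansions in the script: ${0##*/} strips the directory from the name the script was invoked as, and ${APP#r} strips one leading “r”. A minimal sketch of just that step, using a made-up function name remote_app_name and taking the path as an argument instead of reading $0:

```shell
# Hypothetical helper: work out which remote program a script name stands for.
remote_app_name() {
    app=${1##*/}                # "/usr/local/bin/rxterm" becomes "rxterm"
    printf '%s\n' "${app#r}"    # "rxterm" becomes "xterm"
}

remote_app_name /usr/local/bin/rxterm    # prints xterm
remote_app_name rdhcpmgr                 # prints dhcpmgr
```

So one copy of the script plus a hard link per application (for example ln rxterm rdhcpmgr) is all that is needed.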

Tuesday Aug 04, 2009

Making a simple script faster

Many databases get backed up by simply stopping the database, copying all the data files and then restarting the database. This is fine for things that don't require 24 hour access. However if you are concerned about the time it takes to take the backup then don't do this:

stop_database
cp /data/file1.db .
gzip file1.db
cp /data/file2.db .
gzip file2.db
start_database

Now there are many ways to improve this, using ZFS and snapshots being one of the best, but if you don't want to go there then at the very least stop doing the “cp”. It is completely pointless. The above should just be:

stop_database
gzip < /data/file1.db > file1.db.gz
gzip < /data/file2.db > file2.db.gz
start_database

You can continue to make it faster by backgrounding those gzips if the system has spare capacity while the backup is running, but that is another point. Just stopping those extra copies will make life faster as they are completely unnecessary.
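For what it is worth, backgrounding the gzips could look something like this. It is only a sketch: the function name backup_parallel and the .gz suffix on the output files are my choices, and it does not check the exit status of each gzip, which a real backup script should.

```shell
# Hypothetical helper: gzip each named file into the current directory,
# running all the compressions at the same time.
backup_parallel() {
    for f in "$@"
    do
        gzip < "$f" > "${f##*/}.gz" &
    done
    # Block until every background job has finished (this includes any
    # other background jobs the calling shell had already started).
    wait
}

# Intended use, with stop_database and start_database as in the text:
#   stop_database
#   backup_parallel /data/file1.db /data/file2.db
#   start_database
```

The wait is the important part: without it the database would be restarted while the files are still being read.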

Monday Apr 07, 2008

Mounting a Mirrored root disk when booted from CDROM.

I previously mentioned modifying an underlying mirror. So if you have booted from CDROM (yes I know they are all DVDs now but at least I've stopped saying “tape”) or the network and need to mount a mirrored root disk, then here is how on Solaris 9 [1] and above.


First get a copy of the /kernel/drv/md.conf file. Since mounting a file system in this case will result in rolling the log, even for a read-only mount, this actually breaks my rule. Which is why it is wise to keep a copy of the md.conf file somewhere safe, or failing that on that USB pen drive that you have dropped behind the sofa. It will be in the backup of the root file system you have.

# ufsrestore xf cg13442@1.2.3.4:/backup/root.dump kernel/drv/md.conf
Warning: ./kernel: File exists
Warning: ./kernel/drv: File exists
You have not read any volumes yet.
Unless you know which volume your file(s) are on you should start
with the last volume and work towards the first.
Specify next volume #: 1
set owner/mode for '.'? [yn] n   
Directories already exist, set modes anyway? [yn] n
# 

If you have, like I have at home, backed up your root file system into your ZFS pool, you can have a quick demonstration of how ZFS gets this right: to get the md.conf you just import the pool. You have to use an alternative root as the root is read-only, so it can't create /tank:

# zpool import -R /tmp tank
# ufsrestore xf /tmp/tank/backup/root kernel/drv/md.conf   
Warning: ./kernel: File exists
Warning: ./kernel/drv: File exists
You have not read any volumes yet.
Unless you know which volume your file(s) are on you should start
with the last volume and work towards the first.
Specify next volume #: 1
set owner/mode for '.'? [yn] n
Directories already exist, set modes anyway? [yn] n
#

Now run update_drv(1M) to load the new md.conf and you are away.

# update_drv md
devfsadm: mkdir failed for /dev 0x1ed: Read-only file system

That is it. You can now access your meta devices:

# metastat
d10: Mirror
    Submirror 0: d11
      State: Needs maintenance 
    Submirror 1: d12
      State: Needs maintenance 
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 70078473 blocks (33 GB)

d11: Submirror of d10
    State: Needs maintenance 
    Invoke: metasync d10
    Size: 70078473 blocks (33 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s0          0     No            Okay   Yes 


d12: Submirror of d10
    State: Needs maintenance 
    Invoke: metasync d10
    Size: 70078473 blocks (33 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t1d0s0          0     No            Okay   Yes 


d20: Mirror
    Submirror 0: d21
      State: Needs maintenance 
    Submirror 1: d22
      State: Needs maintenance 
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 1022706 blocks (499 MB)

d21: Submirror of d20
    State: Needs maintenance 
    Invoke: metasync d20
    Size: 1022706 blocks (499 MB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t1d0s1      26001     Yes           Okay   Yes 


d22: Submirror of d20
    State: Needs maintenance 
    Invoke: metasync d20
    Size: 1022706 blocks (499 MB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s1      26001     Yes           Okay   Yes 


Device Relocation Information:
Device   Reloc  Device ID
c1t1d0   Yes    id1,sd@SFUJITSU_MAP3367N_SUN36G_00N024DA____
c1t0d0   Yes    id1,sd@SFUJITSU_MAP3367N_SUN36G_00N022FA____
#

There is a document in the service database formerly known as SunSolve, #202794 (Previously Published As 75210), which claims you need to unload the md driver for Solaris 10. You don't. I am updating that document.

[1] At the time of publishing I have not verified this on Solaris 9 but I think it should work. I have clearly verified it on 10! When I have verified it I will update this post. Update: I can verify this works on Solaris 9.

Sunday Apr 06, 2008

What does the home server do?

I was recently asked what the home server serves. So here is the list:

  1. NAS server. NFS and CIFS (via SAMBA). There is a single Windows system in the house which is increasingly not switched on. NFS for the two laptops that frequent the network. All supported via ZFS on two 400GB drives with literally thousands of snapshots, 44170. Space is beginning to get short thanks to the 10 megapixel SLR camera, so in the not too distant future a disk upgrade will be required.

  2. Sun Ray server. There are (currently) three Sun Rays. One acts as a photo frame and has no keyboard or mouse. The other two provide real interactive use. I can foresee a situation where we have two more Sun Rays.

  3. Email server. SMTP and IMAP via exim and imapd respectively. Clearly this implies spamassassin and an antivirus scanner, ClamAV.

  4. SlimServer. I've just run up a slim server to get better access to internet radio stations. Having a radio player that I can hook up to the hi-fi that is not DAB, i.e. crap [1], would be good. I feel a squeezebox coming soon.

Just occasionally, and every time I ran up VirtualBox, the system would struggle to cope prior to the CPU upgrade, even when using the Fair Share Scheduler. Since the upgrade it has not had any problems with having us all using it.




[1] It is nice to see that I am not alone in realising DAB is crap.

Friday Apr 04, 2008

Sun Ray resource management.

One of the great benefits of running Sun Rays at home is having the sessions always there. Just plug in the card and you get your session as if you were never away. However that also allows you to leave an application chewing CPU cycles when you are away. So to keep the interactive experience as good as possible I employ the same techniques described in “Using Solaris Resource Manager With Sun Ray” blueprint. For a long while I've wondered why IT don't do this. The keepers of our Sun Ray do and it works a treat. Which is a good thing when you share a Sun Ray Server with Tim.

Instead of setting the number of shares to a specific value I use a multiplier, so that those active on a Sun Ray get 10 times the number of shares that they would by default. While this works well it still leaves a significant load on the system from certain applications, specifically flash animations that are left endlessly playing the games that were being played when the user's card was removed. The fair share scheduler does its thing to make CPU allocation fair, but the memory use of those otherwise idle firefox sessions is significant.

So I've taken a leaf out of the BOFH's book and apply some special sanctions to those processes. Alas I may not get a job with the BOFH, as my sanctions are simply to pstop(1) the copies of firefox associated with the user and DISPLAY when they detach and then prun(1) them when the user reconnects. I wondered about using memory resource caps to limit the memory, but that would leave the system's rcapd(1M) battling the memory usage of the firefox processes which are not displaying anything anyway. In the unlikely event that any of the users are using their firefox sessions to simulate nuclear fission or crack SSL, and so would rather they kept running, I'm sure they will get back to me.

So the script I have for doing this is slightly more complex than the one from the Blueprint, since it has to err on the side of caution when stopping users' firefox sessions. To do that it uses pargs(1) to make sure that the firefox sessions are really for this display. In practice I am the only person who might remote display a firefox session from here, and even that is unlikely, but it is the principle. The impact on the system of not trying to run all the disconnected firefox sessions is amazing.
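The script itself is not shown here, but the display check can be sketched roughly as follows. This is a hedged sketch rather than the real thing: the function names are made up, and it assumes the display is identified by a DISPLAY entry in the process environment as printed by pargs -e (lines of the form envp[N]: NAME=value), which may not be how the actual script decides.

```shell
# Hypothetical filter: reads "pargs -e" style output on stdin and succeeds
# only if it contains an environment entry DISPLAY=<first argument>.
env_has_display() {
    awk -v want="DISPLAY=$1" '
        $1 ~ /^envp\[[0-9]+\]:$/ && $2 == want { found = 1 }
        END { exit !found }'
}

# Hypothetical wrapper: apply pstop or prun to the firefox processes of
# one user that belong to one display, and leave all others alone.
# usage: firefox_for_display pstop|prun user display
firefox_for_display() {
    for pid in $(pgrep -u "$2" firefox)
    do
        if pargs -e "$pid" 2>/dev/null | env_has_display "$3"
        then
            "$1" "$pid"
        fi
    done
}
```

The exact string comparison is what errs on the side of caution: a process whose DISPLAY is :30 is not touched when display :3 detaches.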

Thursday Apr 03, 2008

Work IT catching up with home IT

At long last my home directory in the Office has caught up with my home directory at home and the one on my laptop and now lives on ZFS. Even better the admins have delegated snapshot privileges for my home directory to me. So now I have a script that snapshots my home directory every time I insert my smart card:

#!/bin/ksh -p

now=$(date +%F-%T)

exec mkdir $HOME/.zfs/snapshot/user_snap_$now

This is then called using utaction:

utaction -c ~/bin/sh/snap 

Which is in turn started automatically via the session magic that gnome does (Preferences->Sessions->Start Up Programs).


You will notice that I use mkdir to create the snapshot. This is great as it allows me to run the script on an NFS client, but it does prevent me from doing a recursive snapshot, which I would like if I had other file systems.

Update. I just realised that my nautilus script is now useful at work. Cool.

Thursday Mar 27, 2008

Dual Core hits home server

I bit the bullet and bought a new CPU for the home server. It now has an AMD Athlon 64 X2 5000+ Socket AM2 2.6GHz Energy Efficient L2 1MB (2x512KB):

: pearson FSS 2 $; /usr/sbin/psrinfo -v
Status of virtual processor 0 as of: 03/27/2008 08:00:38
  on-line since 03/27/2008 07:47:52.
  The i386 processor operates at 2600 MHz,
        and has an i387 compatible floating point processor.
Status of virtual processor 1 as of: 03/27/2008 08:00:38
  on-line since 03/27/2008 07:48:00.
  The i386 processor operates at 2600 MHz,
        and has an i387 compatible floating point processor.
: pearson FSS 3 $; 

So far so good. Obviously PowerNow no longer works so this is running at full power all the time, which is less than ideal, but the performance should be, and so far is, considerably better than the single 2.2GHz CPU it replaces.

With the exception of PowerNow which is not supported on this Dual Core CPU, Solaris works flawlessly as expected.

Tuesday Mar 25, 2008

Automatic opening a USB disk on Sun Ray

One of my users had a bit of a hissy fit today when she plugged her USB thumb drive into the Sun Ray and it did nothing. That is, it did nothing visible. Behind the scenes the drive had been mounted somewhere but there was no realistic way she could know this.

So I need a way to get the file browser to open when the drive is inserted. A quick google finds “"USB Drive" daemon for Sun Ray sessions” which looks like the answer. The problem I have with this is that it polls to see if there is something mounted. Given my users never log out this would mean this running on average every second. Also the 5 second delay just does not take into account the attention span of a teenager.

There has to be a better way.

My solution is to use dtrace to see when the file system has been mounted and then run nautilus with that directory.

The great thing about Solaris 10 and later is that I can give the script just the privilege that allows it to run dtrace without handing out access to the world. Then of course the script can give that privilege away once it no longer needs it.

So I came up with this script. Save it. Mine is in /usr/local which in turn is a symbolic link to /tank/fs/local. Then add an entry to /etc/security/exec_attr, substituting the correct absolute (i.e. one with no symbolic links in it) path in the line.

Basic Solaris User:solaris:cmd:::/tank/fs/local/bin/utmountd:privs=dtrace_kernel

This gives the script just enough privileges to allow it to work. It then drops the extra privilege so that when it runs nautilus it has no extra privileges.

Then you just have to arrange for users to run the script when they login using:

pfexec /usr/local/bin/utmountd

I have done this by creating a file called /etc/dt/config/Xsession.d/utmountd that contains these lines:


pfexec /usr/local/bin/utmountd &
trap "kill $!" EXIT

I leave making this work for users of CDE as an exercise for the reader.

Tuesday Mar 18, 2008

When is it a good idea to modify an underlying mirror?

Following on from “When to run fsck” and “When to run quotacheck” here is another:

When to modify the individual sub mirrors that make up a mirrored volume?

Answer: Never.

With the Logical volume manager in Solaris you can build a mirror from two sub mirrors:

# metastat d0
d0: Mirror
    Submirror 0: d10
      State: Okay         
    Submirror 1: d11
      State: Okay         
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 20482875 blocks (9.8 GB)

d10: Submirror of d0
    State: Okay         
    Size: 20482875 blocks (9.8 GB)
    Stripe 0:
        Device   Start Block  Dbase        State Reloc Hot Spare
        c1d0s0          0     No            Okay   Yes 


d11: Submirror of d0
    State: Okay         
    Size: 20482875 blocks (9.8 GB)
    Stripe 0:
        Device   Start Block  Dbase        State Reloc Hot Spare
        c5d0s0          0     No            Okay   Yes 


Device Relocation Information:
Device   Reloc  Device ID
c1d0   Yes      id1,cmdk@AST3320620AS=____________3QF09GL1
c5d0   Yes      id1,cmdk@AST3320620AS=____________3QF0A1QD
# 

So here we have the mirror “d0” made up of devices “d10” and “d11”. These devices can be addressed in the file system as /dev/md/rdsk/d0, /dev/md/rdsk/d10 and /dev/md/rdsk/d11 respectively. The block devices are also available if you so desire. While being able to address the underlying disk devices that make up a mirror is interesting and potentially useful, it is only useful if you really know what you are doing.

Reading from the mirrors is o.k. Writing, and that includes just mounting the file system, is not. So if the device is idle you can do:


# cmp /dev/md/rdsk/d10 /dev/md/rdsk/d11


#

Which if it returns 0 [1] gives you a feeling of confidence, although if you are this paranoid, and I am, then ZFS is a much better bet.
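For use in a script the same check can be wrapped up so that the exit status of cmp(1) is turned into a message. A minimal sketch, with a made-up function name; cmp -s is silent and exits 0 only when the two inputs are identical:

```shell
# Hypothetical helper: report whether two files or devices are
# byte-for-byte identical. Reading only, so safe on an idle mirror.
compare_sides() {
    if cmp -s "$1" "$2"
    then
        echo "identical"
    else
        echo "differ or unreadable"
        return 1
    fi
}

# Intended use on the devices from the text:
#   compare_sides /dev/md/rdsk/d10 /dev/md/rdsk/d11
```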


For example if the mirror contains a file system then mounting one side of the mirror and making modifications is a really really bad idea, even if the mirror is unmounted. Once you have made such a modification you would have to make sure the other side of the mirror had exactly the same change at the block level propagated to it. Realistically the only way to achieve that is for you to detach the other mirror and then reattach it to allow it to resync. If you really know what you are doing there are tricks you could do, but I suspect those that really know what they are doing would not get into this mess in the first place.



[1] If it does not then you have to look at how the mirror was constructed before you start to worry. If you did “metainit d0 -m d10 d11” or have grown the metadevice then the mirrors will never have been brought into sync, so only the blocks that have been written to since the operation will compare correctly. Hence this is nothing to worry about. See, I told you that you really do have to know what you are doing.

Tuesday Mar 11, 2008

zone copy, aka zcp.

After messing around with zones for a few minutes it became clear that it would be really useful if there was a zcp command that worked just like scp(1) but used zlogin as the transport rather than using ssh, for those cases when you are root and don't want to mess with ssh authorizations since you know you can zlogin without a password anyway.

Specifically I wanted to be able to do:

# zcp  /etc/resolv.conf bookable-129-156-208-37.uk:/etc

Well it turns out that this is really easy to do. The trick is to let scp(1) do the heavy lifting for you and use zlogin(1) to act as your transport. So I knocked together this script. You need to install it on your path called “zcp” and then make a hard link in the same directory called “zsh”. For example:

# /usr/sfw/bin/wget --quiet http://blogs.sun.com/chrisg/resource/zcp.sh
# cp zcp.sh /usr/local/bin/zcp 
# ln /usr/local/bin/zcp /usr/local/bin/zsh
# chmod 755  /usr/local/bin/zsh

Now the glorious simplicity of zcp. I'll even throw in recursive copy for free:

# zcp -r /etc/inet bookable-129-156-208-37.uk:/tmp
ipqosconf.1.sample   100% |*****************************|  2503       00:00
config.sample        100% |*****************************|  3204       00:00
wanboot.conf.sample  100% |*****************************|  3312       00:00
hosts                100% |*****************************|   286       00:00
ipnodes              100% |*****************************|   286       00:00
netmasks             100% |*****************************|   384       00:00
networks             100% |*****************************|   372       00:00
inetd.conf           100% |*****************************|  1519       00:00
sock2path            100% |*****************************|   566       00:00
protocols            100% |*****************************|  1901       00:00
services             100% |*****************************|  4201       00:00
mipagent.conf-sample 100% |*****************************|  6274       00:00
mipagent.conf.fa-sam 100% |*****************************|  6232       00:00
mipagent.conf.ha-sam 100% |*****************************|  5378       00:00
ntp.client           100% |*****************************|   291       00:02
ntp.server           100% |*****************************|  2809       00:00
slp.conf.example     100% |*****************************|  5750       00:00
ntp.conf             100% |*****************************|   155       00:00
ntp.keys             100% |*****************************|   253       00:00
inetd.conf.orig      100% |*****************************|  6961       00:00
ntp.drift            100% |*****************************|     6       00:00
ipsecalgs            100% |*****************************|   920       00:00
ike.preshared        100% |*****************************|   308       00:00
ipseckeys.sample     100% |*****************************|   510       00:00
datemsk.ndpd         100% |*****************************|    22       00:00
ipsecinit.sample     100% |*****************************|  2380       00:00
ipaddrsel.conf       100% |*****************************|   545       00:00
inetd.conf.preupgrad 100% |*****************************|  6563       00:00
hosts.premerge       100% |*****************************|   112       00:00
ipnodes.premerge     100% |*****************************|    61       00:00
hosts.postmerge      100% |*****************************|   286       00:00
ipqosconf.2.sample   100% |*****************************|  3115       00:00
ipqosconf.3.sample   100% |*****************************|  1097       00:00
# 

I'll file an RFE for this to go into Solaris and update this entry when I have the number.

Update: The Bug ID is 6673792. The script now also supports zsync and zdist, although neither of those has been tested yet.

Wednesday Feb 27, 2008

Latency Bubbles follow up

Following on from the latency bubbles in your IO posting, I have been asked two questions about this post privately:

  1. How can you map those long numbers in the output into readable entries, e.g. sd0?

  2. How can I confirm that disksort has been turned off?

The first one just requires another glob of D:

#pragma D option quiet

#define SD_TO_DEVINFO(un) ((struct dev_info *)((un)->un_sd->sd_dev))

#define DEV_NAME(un) \
        stringof(`devnamesp[SD_TO_DEVINFO(un)->devi_major].dn_name) /* ` */

#define DEV_INST(un) (SD_TO_DEVINFO(un)->devi_instance)


fbt:ssd:ssdstrategy:entry,
fbt:sd:sdstrategy:entry
{
        bstart[(struct buf *)arg0] = timestamp;
}

fbt:ssd:ssdintr:entry,
fbt:sd:sdintr:entry
/ arg0 != 0 /
{
        this->buf = (struct buf *)((struct scsi_pkt *)arg0)->pkt_private;
}

fbt:ssd:ssdintr:entry,
fbt:sd:sdintr:entry
/ this->buf /
{
        this->priv = (struct sd_xbuf *) this->buf->b_private;
}

fbt:ssd:ssdintr:entry,
fbt:sd:sdintr:entry
/ this->priv /
{
        this->un = this->priv->xb_un;
}

fbt:ssd:ssdintr:entry,
fbt:sd:sdintr:entry
/ this->buf && bstart[this->buf] && this->un /
{
        @l[DEV_NAME(this->un), DEV_INST(this->un)] =
                lquantize((timestamp - bstart[this->buf])/1000000, 0,
                60000, 60000);
        @q[DEV_NAME(this->un), DEV_INST(this->un)] =
                quantize((timestamp - bstart[this->buf])/1000000);
                bstart[this->buf] = 0;
}


The second required a little bit of mdb. Yes, you can also get the same from dtrace, but mdb gives the immediate answer, firstly for all the disks that use the sd driver and then for instance 1:

# echo '*sd_state::walk softstate | ::print -at "struct sd_lun" un_f_disksort_disabled' | mdb -k
300000ad46b unsigned un_f_disksort_disabled = 0
60000e23f2b unsigned un_f_disksort_disabled = 0
# echo '*sd_state::softstate 1 | ::print -at "struct sd_lun" un_f_disksort_disabled' | mdb -k
300000ad46b unsigned un_f_disksort_disabled = 0

Saturday Feb 09, 2008

cron code delivered.

At last I have handed over the cron changes to support different timezones to Darren who is sponsoring the effort. I've learned a lot in the process so far of trying to do this work from “outside” of Sun. Mostly that the time required to do even a very small project like this is very great and there are times when you can't just put it down if you are busy. This makes it very difficult when doing this in your own “spare” time and can lead to some spectacularly late nights. The other problems were around keeping a build system running at home. The sometimes long times between working on this resulted in considerable effort to keep up with the various flag days. I also had some tangles with mercurial that did not help.

The ARC case was quite painless even if there were elements of Bike Shed Syndrome in it with real dangers of even greater feature creep. Having actually experienced ARCs internally I was probably better prepared for this than a real external engineer.

I got some really great feedback during the code reviews which has resulted in a better end result.

Now I'm just sitting back and waiting.

Wednesday Dec 27, 2006

When to run quotacheck?

Not quite as often as seeing someone run fsck on a live UFS file system and then regretting it but often enough someone will run quotacheck on a live file system and be surprised by the results. As usual the clue is in the manual for quotacheck:

     quotacheck expects each file system to be checked to have  a
     quota  file  named  quotas in the root directory. If none is
     present, quotacheck will not check the file system.

     quotacheck accesses the character special device  in  calcu-
     lating  the  actual disk usage for each user. Thus, the file
     systems that are checked should be  quiescent  while  quota-
     check is running.


The first paragraph implies that the file system must be mounted (and it must). The second that it is inactive.


So when can you run quotacheck?


In single user mode. Mount the file system and then run it. If you are using UFS logging you should never need to run it if you manage your users correctly, that is to say if you create a user's quota before they can create any file in the file system. If you want to retrospectively add quotas then you have to drop to single user, run quotacheck, then boot multi user.


Once you have quotas enabled and the system is up and running the kernel will keep track of the quotas so you don't need to check them and, like the fsck case, if you do check them you will just introduce a corruption.


Suddenly the ZFS model of a quota for a file system and a file system per user seems like a much better way.



Thursday Jul 13, 2006

scsi.d update

I have added printing of the name of the executable and the process id that initiates an IO so that it is easier to see who is causing all those scsi commands to be sent.

Then for those who just have to have the raw bits to be happy, I have updated scsi.d to also dump out the raw cdb as well.

00000.627329400 fp5:-> 0x2a WRITE(10) address 00:00, lba 0x0001bfe9, len 0x000001, control 0x00 timeout 60 CDBP 600d7881d1c diskomizer64mpis(23849) cdb(10) 2a000001bfe900000100
00000.788444600 fp5:-> 0x2a WRITE(10) address 00:00, lba 0x00de6380, len 0x000010, control 0x00 timeout 60 CDBP 600a4282abc diskomizer64mpis(23847) cdb(10) 2a0000de638000001000
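To check the decoded fields against the raw bytes, the 20 hex digits after cdb(10) can be split by hand. The sketch below assumes the standard 10-byte READ/WRITE layout (byte 0 opcode, bytes 2 to 5 LBA, bytes 7 and 8 transfer length, byte 9 control); decode_cdb10 is a hypothetical helper name and is not part of scsi.d.

```shell
#!/bin/sh
# Sketch: split a 10-byte READ(10)/WRITE(10) CDB, given as 20 hex digits,
# into the fields that scsi.d prints. Assumes the standard 10-byte layout.

decode_cdb10() {
    cdb=$1
    if [ "${#cdb}" -ne 20 ]; then
        echo "expected 20 hex digits, got ${#cdb}" >&2
        return 1
    fi
    opcode=$(printf '%s' "$cdb" | cut -c1-2)     # byte 0
    lba=$(printf '%s' "$cdb" | cut -c5-12)       # bytes 2-5
    len=$(printf '%s' "$cdb" | cut -c15-18)      # bytes 7-8
    control=$(printf '%s' "$cdb" | cut -c19-20)  # byte 9
    echo "opcode 0x$opcode lba 0x$lba len 0x$len control 0x$control"
}

# Example, using the first line of the output above:
# decode_cdb10 2a000001bfe900000100
```

For the first line above this gives opcode 0x2a, lba 0x0001bfe9 and len 0x0001, which agrees with what the script decoded.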

You can find the script here: http://blogs.sun.com/roller/resources/chrisg/scsi.d


Friday Feb 10, 2006

scsi.d script

After some feedback about the format of the output from my DTrace script for looking at SCSI IO, I have now added a timestamp, which helps when sorting the output. The output is now cleaner and hopefully clearer, though it does not fit on an 80 column screen.

00000.844267200 isp1:-> 0x2a (WRITE(10)) address 06:00, lba 0x0143a76e, len 0x000002, control 0x00 timeout 60 CDB 60031134488 SDB 60031134518
00000.844354400 isp1:-> 0x2a (WRITE(10)) address 00:00, lba 0x0143a76e, len 0x000002, control 0x00 timeout 60 CDB 3000cd59e78 SDB 3000cd59f08
00000.848251440 isp1:-> 0x2a (WRITE(10)) address 06:00, lba 0x0143ddd0, len 0x000002, control 0x00 timeout 60 CDB 6001dd1ba50 SDB 6001dd1bae0
00000.848310720 isp1:-> 0x2a (WRITE(10)) address 00:00, lba 0x0143ddd0, len 0x000002, control 0x00 timeout 60 CDB 3001da270f8 SDB 3001da27188
00000.850371280 isp1:<- 0x2a (WRITE(10)) address 00:00, lba 0x0143a76e, len 0x000002, control 0x00 timeout 60 CDB 3000cd59e78 SDB 3000cd59f08, reason 0x0 (COMPLETED) state 0x5f Time 6084us
00000.851151040 isp1:<- 0x2a (WRITE(10)) address 06:00, lba 0x0143a76e, len 0x000002, control 0x00 timeout 60 CDB 60031134488 SDB 60031134518, reason 0x0 (COMPLETED) state 0x5f Time 6927us
00000.853292800 isp1:<- 0x2a (WRITE(10)) address 00:00, lba 0x0143ddd0, len 0x000002, control 0x00 timeout 60 CDB 3001da270f8 SDB 3001da27188, reason 0x0 (COMPLETED) state 0x5f Time 5014us
00000.854442400 isp1:<- 0x2a (WRITE(10)) address 06:00, lba 0x0143ddd0, len 0x000002, control 0x00 timeout 60 CDB 6001dd1ba50 SDB 6001dd1bae0, reason 0x0 (COMPLETED) state 0x5f Time 6226us
00002.839392160 isp1:-> 0x2a (WRITE(10)) address 06:00, lba 0x0143e0b0, len 0x000004, control 0x00 timeout 60 CDB 3001da263c0 SDB 3001da26450
00002.839482480 isp1:-> 0x2a (WRITE(10)) address 00:00, lba 0x0143e0b0, len 0x000004, control 0x00 timeout 60 CDB 60002cb4538 SDB 60002cb45c8
00002.849052160 isp1:<- 0x2a (WRITE(10)) address 00:00, lba 0x0143e0b0, len 0x000004, control 0x00 timeout 60 CDB 60002cb4538 SDB 60002cb45c8, reason 0x0 (COMPLETED) state 0x5f Time 9630us
00002.850171840 isp1:<- 0x2a (WRITE(10)) address 06:00, lba 0x0143e0b0, len 0x000004, control 0x00 timeout 60 CDB 3001da263c0 SDB 3001da26450, reason 0x0 (COMPLETED) state 0x5f Time 10824us
00003.840019440 isp1:-> 0x2a (WRITE(10)) address 06:00, lba 0x0143e200, len 0x000004, control 0x00 timeout 60 CDB 3000cd59e78 SDB 3000cd59f08
00003.840110160 isp1:-> 0x2a (WRITE(10)) address 00:00, lba 0x0143e200, len 0x000004, control 0x00 timeout 60 CDB 30014c0c780 SDB 30014c0c810
00003.846265280 isp1:<- 0x2a (WRITE(10)) address 00:00, lba 0x0143e200, len 0x000004, control 0x00 timeout 60 CDB 30014c0c780 SDB 30014c0c810, reason 0x0 (COMPLETED) state 0x5f Time 6205us
00003.847439680 isp1:<- 0x2a (WRITE(10)) address 06:00, lba 0x0143e200, len 0x000004, control 0x00 timeout 60 CDB 3000cd59e78 SDB 3000cd59f08, reason 0x0 (COMPLETED) state 0x5f Time 7470us

Lots of “fun” games can be played with this. For example, the trace above shows that this system has target 0 and target 6 forming a mirror, making isp1 a single point of failure. Although my favourite is this one:


While running


# dd if=/dev/rdsk/c0t8d0s2 of=/dev/null oseek=1024 iseek=$(( 16#1fffff )) count=2
2+0 records in
2+0 records out
#

I get the following trace:

00001.971470332 qus1:-> 0x00 (TEST UNIT READY) address 08:00, lba 0x00000000, len 0x000000, control 0x00 timeout 60 CDB 300016a94f0 SDB 300016a9520
00001.972324082 qus1:<- 0x00 (TEST UNIT READY) address 08:00, lba 0x00000000, len 0x000000, control 0x00 timeout 60 CDB 300016a94f0 SDB 300016a9520, reason 0x0 (COMPLETED) state 0x17 Time 937us
00001.972433832 qus1:-> 0x00 (TEST UNIT READY) address 08:00, lba 0x00000000, len 0x000000, control 0x00 timeout 60 CDB 300016a9d90 SDB 300016a9dc0
00001.973217082 qus1:<- 0x00 (TEST UNIT READY) address 08:00, lba 0x00000000, len 0x000000, control 0x00 timeout 60 CDB 300016a9d90 SDB 300016a9dc0, reason 0x0 (COMPLETED) state 0x17 Time 826us
00001.973324748 qus1:-> 0x1a (MODE SENSE(6)) address 08:00, lba 0x00000300, len 0x000024, control 0x00 timeout 60 CDB 300016a9380 SDB 300016a93b0
00001.976352165 qus1:<- 0x1a (MODE SENSE(6)) address 08:00, lba 0x00000300, len 0x000024, control 0x00 timeout 60 CDB 300016a9380 SDB 300016a93b0, reason 0x0 (COMPLETED) state 0x5f Time 3070us
00001.976443415 qus1:-> 0x1a (MODE SENSE(6)) address 08:00, lba 0x00000400, len 0x000024, control 0x00 timeout 60 CDB 300016a9ab0 SDB 300016a9ae0
00001.979359665 qus1:<- 0x1a (MODE SENSE(6)) address 08:00, lba 0x00000400, len 0x000024, control 0x00 timeout 60 CDB 300016a9ab0 SDB 300016a9ae0, reason 0x0 (COMPLETED) state 0x5f Time 2959us
00001.979453248 qus1:-> 0x08 (  READ(6)) address 08:00, lba 0x00000000, len 0x000001, control 0x00 timeout 60 CDB 300016a9c20 SDB 300016a9c50
00001.979814748 qus1:<- 0x08 (  READ(6)) address 08:00, lba 0x00000000, len 0x000001, control 0x00 timeout 60 CDB 300016a9c20 SDB 300016a9c50, reason 0x0 (COMPLETED) state 0x5f Time 403us
00001.979898415 qus1:-> 0x08 (  READ(6)) address 08:00, lba 0x00000000, len 0x000001, control 0x00 timeout 60 CDB 300016a90a0 SDB 300016a90d0
00001.980151165 qus1:<- 0x08 (  READ(6)) address 08:00, lba 0x00000000, len 0x000001, control 0x00 timeout 60 CDB 300016a90a0 SDB 300016a90d0, reason 0x0 (COMPLETED) state 0x5f Time 294us
00001.980507332 qus1:-> 0x08 (  READ(6)) address 08:00, lba 0x001fffff, len 0x000001, control 0x00 timeout 60 CDB 300016a9660 SDB 300016a9690
00001.993267665 qus1:<- 0x08 (  READ(6)) address 08:00, lba 0x001fffff, len 0x000001, control 0x00 timeout 60 CDB 300016a9660 SDB 300016a9690, reason 0x0 (COMPLETED) state 0x5f Time 12804us
00001.993382498 qus1:-> 0x28 ( READ(10)) address 08:00, lba 0x00200000, len 0x000001, control 0x00 timeout 60 CDB 300016a9940 SDB 300016a9970
00001.999256915 qus1:<- 0x28 ( READ(10)) address 08:00, lba 0x00200000, len 0x000001, control 0x00 timeout 60 CDB 300016a9940 SDB 300016a9970, reason 0x0 (COMPLETED) state 0x5f Time 5921us


I like it as you can see the transition from READ(6) to READ(10) as it moves from LBA 0x1fffff to 0x200000. Did I mention needing to get out more?
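The switch happens there because a READ(6) CDB only has room for a 21 bit LBA and an 8 bit transfer length, so 0x1fffff is the last block it can address. The sketch below models just that format limit. It is not the driver's actual selection logic, read_opcode is a hypothetical name, and capping the count at 255 is a conservative assumption of mine (in READ(6) a length byte of 0 means 256 blocks).

```shell
#!/bin/sh
# Sketch: which read command is needed for a given LBA and block count.
# Models the CDB format limits only, not what any particular driver does.

read_opcode() {
    # $1 = starting LBA (decimal or 0x hex), $2 = number of blocks
    lba=$(( $1 ))
    count=$(( $2 ))
    if [ "$lba" -le $(( 0x1fffff )) ] && [ "$count" -le 255 ]; then
        echo "READ(6)"     # fits in 21 bits of LBA and 8 bits of length
    else
        echo "READ(10)"    # needs the 32 bit LBA field
    fi
}

# Examples, matching the last two reads in the trace:
# read_opcode 0x1fffff 1
# read_opcode 0x200000 1
```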


You can get the script here. Still to do are correct decoding of CDBs bigger than 10 bytes, which is not a problem for my current systems, and more detailed decoding of CDBs that are not reads and writes.


Friday Apr 01, 2005

When to run fsck

Not when the file system is mounted!

I've been banging my head against this one off and on for a few weeks. I got an email from an engineer who was talking to a customer (who are always right) saying that when they ran fsck on a live file system it would report errors:

    # fsck /
    ** /dev/vx/rdsk/rootvol
    ** Currently Mounted on /
    ** Phase 1 - Check Blocks and Sizes
    ** Phase 2 - Check Pathnames
    ** Phase 3 - Check Connectivity
    UNREF DIRECTORY I=5522736 OWNER=root MODE=40755
    SIZE=512 MTIME=Mar 31 13:07 2005
    CLEAR? y

    ** Phase 4 - Check Reference Counts
    ** Phase 5 - Check Cyl groups

    67265 files, 1771351 used, 68625795 free (14451 frags, 8576418 blocks, 0.0% fragmentation)

    ***** FILE SYSTEM WAS MODIFIED *****

I kept telling them that running fsck on a live file system can and probably will generate these “errors”. The kernel's in-memory copy of the file system is correct and eventually it will bring the on-disk copy back in line. However, by answering yes they have now corrupted the on-disk copy of the file system and, to make things worse, the kernel does not know this, so it may not run fsck when the system boots. The warnings section of the fsck and fsck_ufs manual pages gives you a hint that this is a bad thing to do.

The reason they were running fsck was to check the consistency of the file system prior to adding a patch. The right way to do that would be to run pkgchk.

There are times when it is safe to run fsck on a live file system, but they are rare and involve lockfs. Before you do, make sure you really understand what you are doing; my bet is that if you do know, you won't really want to.
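One way to avoid doing this by accident is a small wrapper that refuses to start fsck when the target shows up in the mount table. This is only a sketch: is_mounted and safe_fsck are hypothetical names, and it assumes the mount table lives in /etc/mnttab with the mount point in the second column (on other systems point MNTTAB at the equivalent file).

```shell
#!/bin/sh
# Sketch: refuse to run fsck on a file system that is currently mounted.
# Assumes a mount table with the mount point in column 2.

MNTTAB=${MNTTAB:-/etc/mnttab}

is_mounted() {
    # $1 = mount point; succeeds if it is listed in the mount table
    while read -r dev mp rest; do
        [ "$mp" = "$1" ] && return 0
    done < "$MNTTAB"
    return 1
}

safe_fsck() {
    # $1 = mount point to check
    if is_mounted "$1"; then
        echo "refusing to fsck $1: file system is mounted" >&2
        return 1
    fi
    fsck "$1"
}

# Example (commented out so that sourcing this file runs nothing):
# safe_fsck /export/home
```

The wrapper only stops the obvious mistake. It does nothing for the rare lockfs cases mentioned above.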

I believe the message is now understood by all involved but I'm trying to make sure by adding it to the blogosphere.

About

This is the old blog of Chris Gerhard. It has mostly moved to http://chrisgerhard.wordpress.com
