Wednesday Sep 24, 2008

Decoding NFS v2 and v3 file handles.

This entry has been sitting in my draft queue for over a year, mainly because it is no longer relevant: NFSv4 should have rendered the script useless. The rest of this entry refers to NFSv2 and NFSv3 file handles only.

How can you decode an NFS filehandle?

NFS file handles are opaque, so only the server that hands them out can draw firm conclusions from them. However, since the implementation in SunOS has not changed, it is possible to write a script that will turn a file handle handed out by a server running Solaris into an inode number and device. I wrote that script way back when, and only today someone made good use of it, so here it is for everyone.

The script had not been touched in over 10 years until I added the CDDL, but it should still be able to understand messages files and snoop -v output and then decode the file handles.


This snoop was taken while accessing the file “passwd” in /export/home on the server:


: s4u-10-gmp03.eu TS 19 $; /usr/sbin/snoop -p 3,3 -i /tmp/snoop.cg13442 -v |  decodefh | grep NFS
RPC:  Program = 100003 (NFS), version = 3, procedure = 4
NFS:  ----- Sun NFS -----
NFS:  
NFS:  Proc = 4 (Check access permission)
NFS:  File handle = [8CB2]
NFS:   0080000000000002000A000000019DAC03419521000A000000019DA96E637436
decodefh: SunOS NFS server file handle decodes as: maj=32,min=0, inode=105900
NFS:  Access bits = 0x0000002d
NFS:    .... ...1 = Read
NFS:    .... ..0. = (no lookup)
NFS:    .... .1.. = Modify
NFS:    .... 1... = Extend
NFS:    ...0 .... = (no delete)
NFS:    ..1. .... = Execute
NFS:  

Now, taking this information to the server, you need to find the shared file system that has major number 32 and minor number 0, and then look for the file with inode number 105900:


# share
-               /export/home   rw   ""  
# df /export/home
/                  (/dev/dsk/c0t0d0s0 ):13091934 blocks   894926 files
# ls -lL /dev/dsk/c0t0d0s0
brw-r-----   1 root     sys       32,  0 Aug 22 15:11 /dev/dsk/c0t0d0s0
# find /export/home -inum 105900
/export/home/passwd
# 

Clearly this is a trivial example but you get the idea.

The script also understands messages files:

$ grep 'nfs:.*702911' /var/adm/messages | head -2 | decodefh
Sep 21 03:14:34 vi64-netrax4450a-gmp03 nfs: [ID 702911 kern.notice] (file handle: d41cd448 a3dd9683 a00 2040000 1000000 a00 2000000 2000000)
decodefh: SunOS NFS server file handle decodes as: maj=13575,min=54344, inode=33816576
Sep 21 08:34:11 vi64-netrax4450a-gmp03 nfs: [ID 702911 kern.notice] (file handle: d41cd448 a3dd9683 a00 2040000 1000000 a00 2000000 2000000)
decodefh: SunOS NFS server file handle decodes as: maj=13575,min=54344, inode=33816576
$ 

and finally can take the file handle from the command line:


$ decodefh 0080000000000002000A000000019DAC03419521000A000000019DA96E637436   
0080000000000002000A000000019DAC03419521000A000000019DA96E637436
decodefh: SunOS NFS server file handle decodes as: maj=32,min=0, inode=105900
$ 

So here is the script: http://blogs.sun.com/chrisg/resource/decodefh.sh
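
The heart of the decode is simple once you make some assumptions about the layout, which at least hold for the handle above: the first 32-bit word of the handle is the compressed dev_t of the exported file system (major device in the top bits, minor in the low 18 bits) and the 32-bit word starting at hex digit 25 is the inode number from the UFS fid. A rough sketch of just that part, with none of the snoop and messages parsing the real script does:

#!/bin/ksh -p
#
# Sketch only: assumes the Solaris UFS NFSv2/v3 file handle layout
# described above. The real decodefh.sh, linked above, does much more.
fh=$1                                            # 64 hex digits from snoop

dev=$(( 16#$(print "$fh" | cut -c1-8) ))         # compressed 32-bit dev_t
maj=$(( dev / 262144 ))                          # 262144 = 2^18 (NBITSMINOR32)
min=$(( dev % 262144 ))
ino=$(( 16#$(print "$fh" | cut -c25-32) ))       # inode field of the UFS fid

print "maj=$maj,min=$min, inode=$ino"

Fed the handle from the snoop above, it prints maj=32,min=0, inode=105900, matching the decodefh output.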

Remember this will only work for filehandles generated by NFS servers running Solaris and only for NFS versions 2 & 3. It is possible that the format could change in the future but at the time of writing and for the last 13 years it has been stable.

Tuesday May 22, 2007

Sun Ray firmware version

Peter gives an undocumented way to find out the firmware version of your Sun Ray. I have three problems with this.

  1. It uses an undocumented interface which therefore is likely to change

  2. It does not work for me.

  3. It involves cut'n'paste when we have nawk out there.

Here is how I would do the same. I look forward to someone from thinkthin1 telling me the correct way:

/opt/SUNWut/bin/utwho -c | nawk '$3 == ENVIRON["LOGNAME"] { 
        system(sprintf("/opt/SUNWut/sbin/utquery %s\n", $4)) 
}'

Should I worry that I can type this kind of thing into the command line?


1I know I am down as a thinkthin author. It is a vanity thing. I was invited to be an author and accepted the invitation before thinking it through. I use Sun Ray. I love Sun Ray but what I do day to day, or try to do day to day is write about things I am expert in, therefore following 50% of the advice I was given when I started blogging. The thing is I don't claim to be expert on Sun Ray. I know enough to be dangerous so have never felt the urge to post a “Sun Ray” specific expert article.

Tuesday Mar 20, 2007

mdb pipes.

Sometimes I find myself doing things that I have been doing for years and just wonder whether by now there is some tool I have missed that makes the old way redundant. So on the off chance that I have missed something, I'll document the way I often drive mdb (or more often mdb+, a slightly improved mdb that I keep wishing would get open sourced) in the hope some bright spark will point me at a better way. Failing that, someone might find this useful.

When looking at crash dumps I often want to take some data, run it through a text processing tool and then pipe the result back into mdb. Given that the startup time for mdb, particularly when running under kenv, is very significant, you don't want to do the obvious thing and put mdb in a typical pipeline. So while I can do this:

nawk 'BEGIN { printf("::load chain\n") }
/buf addr/ {
        printf("%s::print -at buf_t av_back | ::if scsi_pkt pkt_comp == ssdintr and pkt_address.a_hba_tran->tran_tgt_init == ssfcp_scsi_tgt_init |::print -at scsi_pkt  pkt_ha_private | ::print -at ssfcp_pkt cmd_flags cmd_timeout cmd_next\n",
        $3) }'  act.0 | kenv -x explorer_dir mdb+ 0

It is a pain, as kenv has to process the explorer to build the correct environment and, for large dumps, load the dump into memory only to throw it away.

So instead I start mdb as a cooperating process in the korn shell:

kenv -x explorer_dir mdb+ |&

Then I have a shell function called “mdbc” that will submit commands into the cooperating process and read the results back. So the above becomes:

nawk 'BEGIN { printf("::load chain\n") }
/buf addr/ {
        printf("%s::print -at buf_t av_back | ::if scsi_pkt pkt_comp == ssdintr and pkt_address.a_hba_tran->tran_tgt_init == ssfcp_scsi_tgt_init |::print -at scsi_pkt  pkt_ha_private | ::print -at ssfcp_pkt cmd_flags cmd_timeout cmd_next\n",
        $3) }'  act.0 | mdbc

or I can do

mdbc lbolt::print

Just by way of an example to show why I bother, compare the times of these two equivalent commands:

: dredd TS 243 $; time echo  lbolt::print | kenv -x explorer mdb+ 0

real    0m37.03s
user    0m7.02s
sys     0m7.26s
: dredd TS 244 $; time mdbc lbolt::print > /dev/null                                                              

real    0m0.01s
user    0m0.00s
sys     0m0.01s
: dredd TS 245 $; 


And just to show that I get the right results from the mdbc command:

: dredd TS 245 $; time mdbc lbolt::print            
0x496619a9

real    0m0.01s
user    0m0.00s
sys     0m0.01s
: dredd TS 246 $; 

However, like talk, it does have a 1980s feel to it, so I look forward to hearing the error of my ways.

If you think you might find it useful the shell function is here.
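
The real function is behind that link; as a rough illustration of the idea only, a minimal version, assuming mdb+ is already running as the shell's co-process as shown above and using mdb's ::echo dcmd as an end-of-output marker, might look something like this:

# Minimal sketch of an mdbc-style function. Assumes mdb+ was started as the
# co-process (kenv -x explorer_dir mdb+ |&); the real function is more careful.
function mdbc
{
        typeset line

        if (( $# > 0 ))
        then
                print -p "$*"           # command supplied as arguments
        else
                while read -r line      # or a stream of commands on stdin
                do
                        print -p "$line"
                done
        fi
        print -p '::echo __MDBC_DONE__' # marker so we know when to stop reading

        while read -p line              # read the co-process back until the marker
        do
                [[ "$line" = __MDBC_DONE__ ]] && break
                print -r -- "$line"
        done
}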

Tags:

Thursday Nov 23, 2006

A faster ZFS snapshot massacre

I moved the zfs snapshot script into the office and started running it on our build system. Being a cautious type when it comes to other people's data, I ran the clean up script in “do nothing” mode so I could be sure it was not cleaning snapshots that it should not. After a while running like this we had over 150,000 snapshots of 114 file systems, which meant that zfs list was now taking a long time to run.

So long, in fact, that the clean up script was not making forward progress against snapshots being created every 10 minutes. So I now have a new clean up script. It is functionally identical to the old one but a lot faster. Unfortunately I have now cleaned out the snapshots, so the times are not what they were (zfs list was taking 14 minutes), but the difference is still easy to see.

When run with the option to do nothing the old script:

# time /root/zfs_snap_clean > /tmp/zfsd2

real    2m23.32s
user    0m21.79s
sys     1m1.58s
#

And the new:

# time ./zfs_cleanup -n > /tmp/zfsd

real    0m7.88s
user    0m2.40s
sys     0m4.75s
#

which is a result.


As you can see, the new script is mostly a nawk script and, more importantly, only calls the zfs command once to get all the information about the snapshots:


#!/bin/ksh -p
#
# Copyright 2006 Sun Microsystems, Inc.  All rights reserved.
# Use is subject to license terms.
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License, Version 1.0 only
# (the "License").  You may not use this file except in compliance
# with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#	Script to clean up snapshots created by the script from this blog
#	entry:
#
#	http://blogs.sun.com/chrisg/entry/cleaning_up_zfs_snapshots
#
#	or using the command given in this entry to create snapshots when
#	users mount a file system using SAMBA:
#
#	http://blogs.sun.com/chrisg/entry/samba_meets_zfs
#
#	Chris.Gerhard@sun.com 23/11/2006
#

PATH=$PATH:$(dirname $0)

while getopts n c
do
	case $c in
	n) DO_NOTHING=1 ;;
	\?) echo "$0 [-n] [filesystems]"
		exit 1 ;;
	esac
done
shift $(($OPTIND - 1))
if (( $# == 0))
then
	set - $(zpool list -Ho name)
fi


export NUMBER_OF_SNAPSHOTS_boot=${NUMBER_OF_SNAPSHOTS:-10}
export DAYS_TO_KEEP_boot=${DAYS_TO_KEEP:-365}

export NUMBER_OF_SNAPSHOTS_smb=${NUMBER_OF_SNAPSHOTS:-100}
export DAYS_TO_KEEP_smb=${DAYS_TO_KEEP:-14}

export NUMBER_OF_SNAPSHOTS_month=${NUMBER_OF_SNAPSHOTS:-24}
export DAYS_TO_KEEP_month=365

export NUMBER_OF_SNAPSHOTS_day=${NUMBER_OF_SNAPSHOTS:-$((28 * 2))}
export DAYS_TO_KEEP_day=${DAYS_TO_KEEP:-28}

export NUMBER_OF_SNAPSHOTS_hour=$((7 * 24 * 2))
export DAYS_TO_KEEP_hour=$((7))

export NUMBER_OF_SNAPSHOTS_minute=$((100))
export DAYS_TO_KEEP_minute=$((1))


zfs get -Hrpo name,value creation $@ | sort -r -n -k 2 |\
	nawk -v now=$(convert2secs $(date)) -v do_nothing=${DO_NOTHING:-0} '
function ttg(time)
{
	return (now - (time * 24 * 60 * 60));
}
BEGIN {
	time_to_go["smb"]=ttg(ENVIRON["DAYS_TO_KEEP_smb"]);
	time_to_go["boot"]=ttg(ENVIRON["DAYS_TO_KEEP_boot"]);
	time_to_go["minute"]=ttg(ENVIRON["DAYS_TO_KEEP_minute"]);
	time_to_go["hour"]=ttg(ENVIRON["DAYS_TO_KEEP_hour"]);
	time_to_go["day"]=ttg(ENVIRON["DAYS_TO_KEEP_day"]);
	time_to_go["month"]=ttg(ENVIRON["DAYS_TO_KEEP_month"]);
	number_of_snapshots["smb"]=ENVIRON["NUMBER_OF_SNAPSHOTS_smb"];
	number_of_snapshots["boot"]=ENVIRON["NUMBER_OF_SNAPSHOTS_boot"];
	number_of_snapshots["minute"]=ENVIRON["NUMBER_OF_SNAPSHOTS_minute"];
	number_of_snapshots["hour"]=ENVIRON["NUMBER_OF_SNAPSHOTS_hour"];
	number_of_snapshots["day"]=ENVIRON["NUMBER_OF_SNAPSHOTS_day"];
	number_of_snapshots["month"]=ENVIRON["NUMBER_OF_SNAPSHOTS_month"];
} 
/.*@.*/ { 
	split($1, a, "@");
	split(a[2], b, "_");
	if (number_of_snapshots[b[1]] != 0 &&
		++snap_count[a[1], b[1]] > number_of_snapshots[b[1]] &&
		time_to_go[b[1]] > $2) {
		str= sprintf("zfs destroy %s\n", $1);
		printf(str);
		if (do_nothing == 0) {
			system(str);
		}
	}
}'
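
By way of a usage note: with no arguments it works through every pool returned by zpool list, any arguments are treated as the file systems to examine instead, and -n only prints the zfs destroy commands it would run. So a cautious first run against a single pool (pool name made up here) looks like:

# print, but do not run, the destroys for one pool
./zfs_cleanup -n tank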

Tags:

Saturday Nov 11, 2006

ZFS snapshot massacre.

As the number of snapshots grows I started wondering how much space they are really taking up on the home server. This also pretty much shows how much data gets modified after being initially created. I would guess not much, as the majority of the data on the server would be:

  1. Solaris install images. Essentially read only.

  2. Photographs.

  3. Music mostly in the form of iTunes directories.

Running this command line gives the result:

zfs get -rHp used $(zpool list -H -o name ) |\
nawk '/@/ && $2 == "used" { tot++; total_space+=$3 ;\
        if ( $3 == 0 ) { empty++ }} \
END { printf("%d snapshots\n%d empty snapshots\n%2.2f G in %d snapshots\n", tot, \
        empty, total_space/(1024^3), tot - empty ) }'
68239 snapshots
63414 empty snapshots
2.13 G in 4825 snapshots
: pearson TS 15 $; zfs get used $(zpool list -H -o name )
NAME  PROPERTY  VALUE  SOURCE
tank  used      91.2G  -
: pearson TS 16 $;

So I only have 2.13G of data saved in snapshots out of 91.2G of data. Not really a surprising result. The biggest user of space for snapshots is one file system: the one that contains planetcycling.org. As the planet gets updated every 30 minutes and the data is only indirectly controlled by me, I'm not shocked by this. I would expect the amount to stabilize over time, and to that end I will note the current usage:


zfs get -rHp used tank/fs/web|\
nawk '/@/ && $2 == "used" { tot++; total_space+=$3 ;\
        if ( $3 == 0 ) { empty++ }} \
END { printf("%d snapshots\n%d empty snapshots\n%2.2f G in %d snapshots\n", tot,
        empty, total_space/(1024^3), tot - empty ) }'
1436 snapshots
789 empty snapshots
0.98 G in 647 snapshots

All this caused me to look a bit harder at the zfs_snapshot_clean script, as it appeared to be keeping some really old snapshots from classes where I did not expect that. While the 68,000 snapshots were having no negative impact on the running of the system, it was not right. There were two issues. First, it was sorting the list of snapshots using the snapshot creation time, which was correct, but in reverse order, which was not. Secondly, I was keeping a lot more of the hourly snapshots than I intended.


After fixing this and running the script (you can download it from here) there was a bit of a snapshot massacre, leading to a lot fewer snapshots:


zfs get -rHp used $(zpool list -H -o name ) |\
nawk '/@/ && $2 == "used" { tot++; total_space+=$3 ;\
        if ( $3 == 0 ) { empty++ }} \
END { printf("%d snapshots\n%d empty snapshots\n%2.2f G in %d snapshots\n", tot, \
        empty, total_space/(1024^3), tot - empty ) }'
25512 snapshots
23445 empty snapshots
2.20 G in 2067 snapshots

Only 25,000 snapshots, which is much better, and most of them remain empty.

Tags:

Friday Aug 12, 2005

coreadm and %d

Coreadm(1M) has a really nice feature that can be used to restrict the core dumps that are collected to just those from the system itself. Typically I'm not interested in a global core file from an application that someone is developing, so I would like to limit the cores to just those created by programs that are delivered as part of Solaris.

If you use %d as part of the pattern it will expand to the name of the directory where the binary lives. If that directory does not exist then no global core file is written.

So in this example if /usr/bin/ls were to dump core (not that it does) using this setup:

# coreadm
     global core file pattern: /var/cores/%d/%f.%p.%u
     global core file content: default
       init core file pattern: core
       init core file content: default
            global core dumps: enabled
       per-process core dumps: enabled
      global setid core dumps: disabled
 per-process setid core dumps: disabled
     global core dump logging: enabled
#

The global core only gets dumped if /var/cores/usr/bin exists.
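
For reference, a global pattern like the one shown above can be configured with something along these lines (run as root; this command is not part of the output above, so check coreadm(1M) for the exact options on your release):

# set the global core pattern and enable global core dumps and logging
coreadm -g /var/cores/%d/%f.%p.%u -e global -e log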

The downside of this is that not all the executables you may be interested in are in /usr/bin or other obvious locations, so they could fail to dump core because the directory does not yet exist.

So I knocked together this little script to create all the directories that I will ever need:

nawk '$2 == "f" { printf("file %s\\n", $1) }'  /var/sadm/install/contents |\\
ksh -p |\\
nawk 'function dirname(path) {
        len= length(path);
        for (n = 1 ; index(substr(path, len - (1+ n), n), "/") == 0; n++) ;

        return(substr(path, 1,  len - (2+ n)));
}
/ELF.*executable/ {
        x[dirname($1)]=1
}
END {
        for (y in x ) {
                printf("test -d /var/cores%s || mkdir -p /var/cores%s\\n",
                        y, y)
        }
}' | ksh -x

My suspicion is that there is probably a better way. So let me know.

Tags:

Thursday May 26, 2005

Todays quick question

When running find down a single file system how can you exclude a particular directory?

For example you wish to search /var but do not want to search /var/spool/mqueue. The problem here is that if you do the obvious:

find /var \( -name mqueue -prune \) -mount -print

find will prune anything called “mqueue”, not just /var/spool/mqueue.

The solution is to use the -inum option to find:


find /var \( -inum $(ls -di /var/spool/mqueue | nawk '{ print $1 }' ) -prune \) -mount -print

There must be a better way, so let me know what it is.

Tag:

Thursday May 12, 2005

du -sh to du -sk

This morning's wake-up problem in the email was this: given a file that contains the output of “du -sh” for a number of files, how can you sort the output based on the size of the files?


The simplest solution is not to use “du -sh” but “du -sk” and then pipe it to sort(1). However, that du could take a while to run, so it might just be quicker to use this nawk script (sadly I typed this straight in):


nawk '{
        if ($1 ~ ".*G") {
                sub("G", "", $1); $1*=1024*1024
        } else {
                if ($1 ~ ".*M") {
                        sub("M", "", $1); $1*=1024
                } else {
                        if ($1 ~ ".*K") {
                                sub("K", "", $1)
                        }
                }
        }
        printf("%dK %s\n", $1, $2)
}'  /var/tmp/input_file | sort -n

Which takes this:


 1.1G   /export/home/.dh/user1
 1.1G   /export/home/.dh/user2
   4K   /export/home/.dh/user3
 4.0G   /export/home/.dh/user4
 7.7G   /export/home/.dh/user5
  12K   /export/home/.dh/user6
  12K   /export/home/.dh/user7
  48M   /export/home/.dh/user8
 102M   /export/home/.dh/user9
 130M   /export/home/.dh/user10
 519M   /export/home/.dh/user11

And gives you this:


4K /export/home/.dh/user3
12K /export/home/.dh/user6
12K /export/home/.dh/user7
49152K /export/home/.dh/user8
104448K /export/home/.dh/user9
133120K /export/home/.dh/user10
531456K /export/home/.dh/user11
1153433K /export/home/.dh/user1
1153433K /export/home/.dh/user2
4194304K /export/home/.dh/user4
8074035K /export/home/.dh/user5

I leave the conversion back into human readable form as an exercise for the reader.
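
If you don't fancy the exercise, a rough nawk sketch of the reverse conversion, to be bolted on after the sort, could look like this:

nawk '{
        sub("K$", "", $1)                       # strip the trailing K
        if ($1 >= 1024 * 1024) {
                printf("%4.1fG\t%s\n", $1 / (1024 * 1024), $2)
        } else if ($1 >= 1024) {
                printf("%4.0fM\t%s\n", $1 / 1024, $2)
        } else {
                printf("%4dK\t%s\n", $1, $2)
        }
}'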

Tag: , awk

Tuesday Apr 12, 2005

More group aggregation

Our internal departmental blog aggregator is up and running thanks to the planetplanet.org software, a twiki running on the Sun webserver and a very short nawk script. I know it should have been perl, but as is so often the case nawk will do. Specifically, this nawk will do:

nawk '$0 == "%META:TOPICPARENT{name=\\"TWikiUsers\\"}%" {
        user=1
        next
}
$0 ~ "\^%META:TOPICPARENT" {
        user=0
}
$1 == "\*" && $2 == "Name:" {
        sub("[ \\t]\*\\\\\*[ \\t]Name:[ \\t]","");
        n=$0
}
user && /\\\* PlanetPts [Ff]eed/  {
        printf("\\n[%s]\\nname = %s\\n", $NF, n)}' 

Users just have to put an entry in their home page on the twiki; the nawk script sucks those entries out and writes the config file for the planet software, which then aggregates them all. They then all get displayed within the same twiki thanks to the %INCLUDE% variable, so that they appear to be part of the main web.
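
For what it's worth, the stanzas the script writes are just the usual planetplanet config entries, one per feed, along these lines (feed URL and name made up):

[http://blogs.example.com/auser/feed/entries/rss]
name = A User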

I'm sure there are better ways to do this and with a bit more infrastructure help it would be cool to be able to register your blog feed in LDAP and then have an aggregator use that to produce planets based on reporting structures. However for us I think this will do for now.

Friday Apr 01, 2005

grep piped into awk

This has been one of my bugbears for years and it comes up in email all the time. I blame it on the VAX I used years ago, which was so slow that this really mattered at the time; now it is mostly just because it is wrong. This is what I mean, grep piped into awk:

grep foo | awk '{ print $1 }'



Why? Because awk can do it all for you, saving a pipe and a fork and exec:

nawk '/foo/ { print $1 }' 



does exactly the same thing; I use nawk as it is just better. It gets worse when I see:

grep foo | grep -v bar | awk '{ print $1 }'

which should be:

nawk '/foo/ && ! /bar/ { print $1 }'

Mostly these just come about when typed on the command line, but when I see them in scripts it causes me to roll my eyes. They led me to come up with these short rules for pipes:

If your pipe line contains    Use
grep and awk                  nawk
grep and sed                  nawk
awk and sed                   nawk
awk, sed and grep             nawk
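
For instance (a made-up example), a pipeline that mixes all three collapses into a single nawk process:

# instead of this
grep foo file | grep -v bar | sed 's/old/new/' | awk '{ print $1, $3 }'

# one nawk does the same job
nawk '/foo/ && ! /bar/ { sub("old", "new"); print $1, $3 }' file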



Like the pirates' code, these are not so much rules as guidelines.

About

This is the old blog of Chris Gerhard. It has mostly moved to http://chrisgerhard.wordpress.com
