Wednesday Sep 24, 2008

Decoding NFS v2 and v3 file handles.

This entry has been sitting in my draft queue for over a year, mainly because it should no longer be relevant: NFSv4 should have rendered the script useless. The rest of this entry refers to NFSv2 and NFSv3 file handles only.

How can you decode an NFS filehandle?

NFS file handles are opaque, so only the server that hands them out can draw firm conclusions from them. However, since the implementation in SunOS has not changed, it is possible to write a script that turns a file handle handed out by a server running Solaris into an inode number and device. Way back when, I wrote that script, and only today someone made good use of it, so here it is for everyone.

The script had not been touched in over 10 years until I added the CDDL, but it should still be able to understand messages files and snoop -v output and decode the file handles in them.


This snoop was taken while accessing the file “passwd” that was in /export/home on the server:


: s4u-10-gmp03.eu TS 19 $; /usr/sbin/snoop -p 3,3 -i /tmp/snoop.cg13442 -v |  decodefh | grep NFS
RPC:  Program = 100003 (NFS), version = 3, procedure = 4
NFS:  ----- Sun NFS -----
NFS:  
NFS:  Proc = 4 (Check access permission)
NFS:  File handle = [8CB2]
NFS:   0080000000000002000A000000019DAC03419521000A000000019DA96E637436
decodefh: SunOS NFS server file handle decodes as: maj=32,min=0, inode=105900
NFS:  Access bits = 0x0000002d
NFS:    .... ...1 = Read
NFS:    .... ..0. = (no lookup)
NFS:    .... .1.. = Modify
NFS:    .... 1... = Extend
NFS:    ...0 .... = (no delete)
NFS:    ..1. .... = Execute
NFS:  

Now, taking this information to the server, you need to find the shared file system that has major number 32 and minor number 0, and then look for the file with inode number 105900:


# share
-               /export/home   rw   ""  
# df /export/home
/                  (/dev/dsk/c0t0d0s0 ):13091934 blocks   894926 files
# ls -lL /dev/dsk/c0t0d0s0
brw-r-----   1 root     sys       32,  0 Aug 22 15:11 /dev/dsk/c0t0d0s0
# find /export/home -inum 105900
/export/home/passwd
# 

Clearly this is a trivial example but you get the idea.
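The server-side lookup above can also be sketched in Python. This is only an illustration, not part of decodefh: the function name `find_by_inode` and its `dev` parameter are hypothetical, and it compares against the device of each file rather than reading `ls -lL` output.

```python
import os


def find_by_inode(root, inode, dev=None):
    """Yield paths under root whose inode number matches.

    If dev is given as a (major, minor) pair, only entries that live
    on that device are reported, like the manual check against ls -lL.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable, skip it
            if st.st_ino != inode:
                continue
            if dev is not None:
                if (os.major(st.st_dev), os.minor(st.st_dev)) != dev:
                    continue
            yield path
```

For the example in this entry the call would be `list(find_by_inode("/export/home", 105900, (32, 0)))`. Passing the device pair matters when the tree crosses file systems, since inode numbers are only unique within one file system.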

The script also understands messages files:

$ grep 'nfs:.*702911' /var/adm/messages | head -2 | decodefh
Sep 21 03:14:34 vi64-netrax4450a-gmp03 nfs: [ID 702911 kern.notice] (file handle: d41cd448 a3dd9683 a00 2040000 1000000 a00 2000000 2000000)
decodefh: SunOS NFS server file handle decodes as: maj=13575,min=54344, inode=33816576
Sep 21 08:34:11 vi64-netrax4450a-gmp03 nfs: [ID 702911 kern.notice] (file handle: d41cd448 a3dd9683 a00 2040000 1000000 a00 2000000 2000000)
decodefh: SunOS NFS server file handle decodes as: maj=13575,min=54344, inode=33816576
$ 

and finally it can take the file handle from the command line:


$ decodefh 0080000000000002000A000000019DAC03419521000A000000019DA96E637436   
0080000000000002000A000000019DAC03419521000A000000019DA96E637436
decodefh: SunOS NFS server file handle decodes as: maj=32,min=0, inode=105900
$ 

So here is the script: http://blogs.sun.com/chrisg/resource/decodefh.sh

Remember this will only work for filehandles generated by NFS servers running Solaris and only for NFS versions 2 & 3. It is possible that the format could change in the future but at the time of writing and for the last 13 years it has been stable.
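For readers who just want to see the arithmetic, here is a minimal Python sketch of the decoding. It is not the decodefh script. The layout is an assumption inferred from the sample output in this entry: the handle is treated as a sequence of 32-bit words, word 0 is taken to be a compressed device number with a 14-bit major and an 18-bit minor, and word 3 is taken to be the inode number. The function names are hypothetical.

```python
MINOR_BITS = 18  # assumed: 32-bit device number, 14-bit major, 18-bit minor


def decode_words(words):
    """Decode a handle given as a list of 32-bit integers."""
    dev = words[0]                         # assumed: compressed device number
    major = dev >> MINOR_BITS
    minor = dev & ((1 << MINOR_BITS) - 1)
    inode = words[3]                       # assumed: inode is the fourth word
    return major, minor, inode


def decode_hex(handle):
    """Decode a handle given as one hex string, as printed by snoop -v."""
    words = [int(handle[i:i + 8], 16) for i in range(0, len(handle), 8)]
    return decode_words(words)


def decode_messages(text):
    """Decode the space-separated hex words from an nfs kernel message."""
    return decode_words([int(word, 16) for word in text.split()])
```

With the handle from the snoop example, `decode_hex(...)` returns `(32, 0, 105900)`, and `decode_messages("d41cd448 a3dd9683 a00 2040000 1000000 a00 2000000 2000000")` returns `(13575, 54344, 33816576)`, which agrees with the decodefh output shown above. Anything beyond those two samples is untested.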

Wednesday May 21, 2008

Using mirror mounts to get a better /net

One problem with the automounter is that when you use the /net mount points to mount a server and the admin on that server adds a share, your client won't see that share until the automounter times out the mount. This obviously requires that the mounts are unused, which for a large NFS server could never happen.

So given an NFS server called sa64-zfs-gmp03.eu which is sharing a directory /newpool/cjg, on a client you can do:

#  ls /net/sa64-zfs-gmp03.eu/newpool
cjg
#  ls /net/sa64-zfs-gmp03.eu/newpool/cjg
SPImage         ipmiLog         ppcenv          sel.bin         tmp
SPValueAdd      mcCpu0Core0Log  processLog      summaryLog
evLog           mcCpu1Core0Log  prsLog          swLog
hwLog           mcCpu2Core0Log  pstore          tdulog.tar
# cd  /net/sa64-zfs-gmp03.eu/newpool/cjg
# ls
SPImage         ipmiLog         ppcenv          sel.bin         tmp
SPValueAdd      mcCpu0Core0Log  processLog      summaryLog
evLog           mcCpu1Core0Log  prsLog          swLog
hwLog           mcCpu2Core0Log  pstore          tdulog.tar

However if at this point on the server you create and share a new file system:

# zfs create -o sharenfs=rw newpool/cjg2
# share
-@newpool/cjg   /newpool/cjg   rw   ""  
-@newpool/cjg2  /newpool/cjg2   rw   ""  
# echo foo > /newpool/cjg2/file
# 

You can't now directly access it on the client:

# ls /net/sa64-zfs-gmp03.eu/newpool/cjg2
/net/sa64-zfs-gmp03.eu/newpool/cjg2: No such file or directory
#

Now we all know you can work around this by using aliases for the server or even different capitalization:

# ls /net/SA64-zfs-gmp03.eu/newpool/cjg2
file
# 

however lots of users just won't buy that and I don't blame them.

With the advent of mirror mounts in NFSv4 you can do a lot better, and there is an RFE (4107375) for the automounter to do this for you, which looks like it would be simple on a client that can do mirror mounts. Until that is done, here is a work-around. Create a file “/etc/auto_mirror” that contains this line:

* &:/

Then add this line to auto_master:

/mirror auto_mirror  -nosuid,nobrowse,vers=4

or add a new key to an existing automount table:

: s4u-nv-gmp03.eu TS 50 $; nismatch mirror auto_share
mirror / -fstype=autofs,nosuid,nobrowse auto_mirror.org_dir.cte.sun.com.
: s4u-nv-gmp03.eu TS 51 $; 
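To make the one-line auto_mirror map less cryptic: the `*` key matches any name looked up under the mount point, and `&` in the location is replaced by that name, so each host name mounts the root of that host. The Python below is only a sketch of that substitution, with hypothetical function names; it does not talk to the automounter.

```python
def expand_map_entry(key, location="&:/"):
    """Return the mount location for a lookup key under a '*' map entry."""
    return location.replace("&", key)


def resolve(client_path, mount_root="/mirror", location="&:/"):
    """Split a client path into (what gets mounted, path on the server)."""
    if not client_path.startswith(mount_root + "/"):
        raise ValueError("path is not under " + mount_root)
    rest = client_path[len(mount_root):].lstrip("/")
    key, _, subpath = rest.partition("/")
    return expand_map_entry(key, location), "/" + subpath
```

So `resolve("/mirror/sa64-zfs-gmp03.eu/newpool/cjg2")` gives `("sa64-zfs-gmp03.eu:/", "/newpool/cjg2")`: the automounter only mounts the server's root, and everything below that is left to the NFSv4 client.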

Now if we do the same test this time replacing the “/net” path with the “/mirror” path you get:

# ls /mirror/sa64-zfs-gmp03.eu/newpool/
cjg
# ls /mirror/sa64-zfs-gmp03.eu/newpool/cjg
SPImage         ipmiLog         ppcenv          sel.bin         tmp
SPValueAdd      mcCpu0Core0Log  processLog      summaryLog
evLog           mcCpu1Core0Log  prsLog          swLog
hwLog           mcCpu2Core0Log  pstore          tdulog.tar
# (cd /mirror/sa64-zfs-gmp03.eu/newpool/cjg ; sleep 1000000) &
[1]     10455
# ls /mirror/sa64-zfs-gmp03.eu/newpool/cjg2
/mirror/sa64-zfs-gmp03.eu/newpool/cjg2: No such file or directory

Here I created the new file system on the server and put the file in.

# ls /mirror/sa64-zfs-gmp03.eu/newpool/cjg2
file
# 

If you are an entirely NFSv4 shop then you could change the “/net” mount point to use this.

Wednesday Jul 04, 2007

First use of sharemgr

The NFS server that serves our build environment also serves out some legacy UFS/SVM filesystems via the more traditional method. Or at least it used to. Today I wanted to read the contents of the directory via NFS and I got permission denied. A bit of digging showed that it was no longer shared globally with read-only access; a bit more and I discovered that it was now being shared using sharemgr(1M) and not in the default group.

So instead of editing the /etc/dfs/dfstab file I can now change the shares for the entire group with one command. The problem I had was figuring out exactly what that command was. The bit that took some working out was that I needed the -S sys option, as we are using the “no security at all” AUTH_SYS on this share, which given what it contains is not unreasonable.

# sharemgr list -v

cdfs    enabled nfs
default enabled nfs
zfs     enabled nfs
# sharemgr show -v cdfs 
cdfs
          /cdbuild/images/temp
          /cdbuild/builds
          /cdbuild/images/stored
          /cdbuild/cdfs
# sharemgr show -p cdfs 
cdfs nfs=()
        /cdbuild/images/temp     nfs:sys=(rw="stomper:stomper.UK.Sun.COM:dvdrhost:dvdrhost.UK.Sun.COM:dvdrhost2:dvdrhost2.UK.Sun.COM:pts-cdrw:pts-cdrw.UK.Sun.COM" root="stomper:stomper.UK.Sun.COM:dvdrhost:dvdrhost.UK.Sun.COM:dvdrhost2:dvdrhost2.UK.Sun.COM:pts-cdrw:pts-cdrw.UK.Sun.COM")
        /cdbuild/builds  nfs:sys=(rw="stomper:stomper.UK.Sun.COM:dvdrhost:dvdrhost.UK.Sun.COM:dvdrhost2:dvdrhost2.UK.Sun.COM:pts-cdrw:pts-cdrw.UK.Sun.COM" root="stomper:stomper.UK.Sun.COM:dvdrhost:dvdrhost.UK.Sun.COM:dvdrhost2:dvdrhost2.UK.Sun.COM:pts-cdrw:pts-cdrw.UK.Sun.COM")
        /cdbuild/images/stored   nfs:sys=(root="stomper:stomper.UK.Sun.COM:dvdrhost:dvdrhost.UK.Sun.COM:dvdrhost2:dvdrhost2.UK.Sun.COM:pts-cdrw:pts-cdrw.UK.Sun.COM" rw="stomper:stomper.UK.Sun.COM:dvdrhost:dvdrhost.UK.Sun.COM:dvdrhost2:dvdrhost2.UK.Sun.COM:pts-cdrw:pts-cdrw.UK.Sun.COM")
        /cdbuild/cdfs    nfs:sys=(root="stomper:stomper.UK.Sun.COM:dvdrhost:dvdrhost.UK.Sun.COM:dvdrhost2:dvdrhost2.UK.Sun.COM" rw="stomper:stomper.UK.Sun.COM:dvdrhost:dvdrhost.UK.Sun.COM:dvdrhost2:dvdrhost2.UK.Sun.COM")
# 
# sharemgr set -P nfs -S sys -p ro= cdfs
# sharemgr show -p cdfs      

cdfs nfs=() nfs:sys=(ro="")
        /cdbuild/images/temp     nfs=() nfs:sys=(ro="" rw="stomper:stomper.UK.Sun.COM:dvdrhost:dvdrhost.UK.Sun.COM:dvdrhost2:dvdrhost2.UK.Sun.COM:pts-cdrw:pts-cdrw.UK.Sun.COM" root="stomper:stomper.UK.Sun.COM:dvdrhost:dvdrhost.UK.Sun.COM:dvdrhost2:dvdrhost2.UK.Sun.COM:pts-cdrw:pts-cdrw.UK.Sun.COM")
        /cdbuild/builds  nfs=() nfs:sys=(ro="" rw="stomper:stomper.UK.Sun.COM:dvdrhost:dvdrhost.UK.Sun.COM:dvdrhost2:dvdrhost2.UK.Sun.COM:pts-cdrw:pts-cdrw.UK.Sun.COM" root="stomper:stomper.UK.Sun.COM:dvdrhost:dvdrhost.UK.Sun.COM:dvdrhost2:dvdrhost2.UK.Sun.COM:pts-cdrw:pts-cdrw.UK.Sun.COM")
        /cdbuild/images/stored   nfs=() nfs:sys=(ro="" root="stomper:stomper.UK.Sun.COM:dvdrhost:dvdrhost.UK.Sun.COM:dvdrhost2:dvdrhost2.UK.Sun.COM:pts-cdrw:pts-cdrw.UK.Sun.COM" rw="stomper:stomper.UK.Sun.COM:dvdrhost:dvdrhost.UK.Sun.COM:dvdrhost2:dvdrhost2.UK.Sun.COM:pts-cdrw:pts-cdrw.UK.Sun.COM")
        /cdbuild/cdfs    nfs=() nfs:sys=(ro="" root="stomper:stomper.UK.Sun.COM:dvdrhost:dvdrhost.UK.Sun.COM:dvdrhost2:dvdrhost2.UK.Sun.COM" rw="stomper:stomper.UK.Sun.COM:dvdrhost:dvdrhost.UK.Sun.COM:dvdrhost2:dvdrhost2.UK.Sun.COM")
#

How cool is that to be able to change the share options on four file systems with just one command. No more faffing around with an editor trying to do global edits on a file or generating the file from a database.
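If you want to compare the before and after output by program rather than by eye, the property lists are easy to pick apart. This is a small Python sketch, assuming only the `name="host:host:..."` shape visible in the `sharemgr show -p` output above; `parse_props` is a hypothetical helper, not anything shipped with sharemgr.

```python
import re

# Matches name="value" pairs such as ro="" or rw="hosta:hostb".
PROP = re.compile(r'(\w+)="([^"]*)"')


def parse_props(text):
    """Return {property: [hosts]} for one line of sharemgr show -p output."""
    props = {}
    for name, value in PROP.findall(text):
        # An empty value, as in ro="", becomes an empty host list.
        props[name] = value.split(":") if value else []
    return props
```

Run over the lines after the change, every share should report a `ro` key with an empty list alongside its unchanged `rw` and `root` lists.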

Thursday Apr 19, 2007

NFS futures at LOSUG

We were privileged to have the legendary Calum Mackay talk at the London OpenSolaris User Group last night on the topic of NFS futures, including everything that is in the upcoming NFS v4.1 specification:

  • Parallel NFS aka pNFS

  • Directory delegations

  • Sessions.

He also covered the rest of what is going on with the NFS v4.0 work. In particular the namespace work that he has been doing which will provide Mirror mount support and Referrals.

Mirror mounts will change the way a client behaves when it encounters a directory that is a mount point for another file system on the server. For example given 2 file systems:

/export/home/cjg and /export/home/cjg/Documents

Currently, if you mount /export/home/cjg from a server using NFS v2 or v3 on an NFS client and look in the Documents directory, you see an empty directory. If you then write into it, that can cause loads of confusion and potentially more serious consequences.

However with NFSv4 & mirror mounts now you see something different. The client would automatically mount the sub file systems without recourse to the automounter. This is kind of cool as the layout I describe above is exactly what I want for my home directory. That way when gnome or firefox or thunderbird go pop and corrupt their start up file I can roll back to the last snapshot without it messing up my data that is in Documents.

Referrals have the potential to be as useful and I suspect also the potential to be as dangerous as symbolic links. They allow you to move a file system onto another server and yet the client application can continue to access the original path. The client kernel gets sent a referral and mounts the file system from the new location.

All in all an excellent evening.

Saturday Dec 23, 2006

Build 55 UK keyboard oddity

Build 55 hit my laptop yesterday and the first thing I noticed was that the keyboard type was all wrong. It seemed to think this was a US keyboard which makes life hard thanks to not being able to find the “|” or for that matter the quotes. More to the point it was going to make the thank you letter to my boss saying “Thank you for the £100,000 bonus” really hard to write (I can dream).

A bit of digging and it was cured by setting the “keyboard-layout” setting in the eeprom (which is not really an eeprom at all on x86):

#  eeprom keyboard-layout=UK-English

It was a shame it was not a completely seamless upgrade as I was feeling quite pleased with myself as I had upgraded the home server and remembered to update the /etc/apache2/extra/httpd-vhosts.conf file before activating the new BE, only to discover that I also had to edit the httpd-ssl.conf file in the same directory to disable all thoughts of SSL. Then on the laptop I remembered to put the /etc/X11/gdm/custom.conf file in place before switching so that there is a Shutdown button on the launch menu so had it not been for the keyboard layout this would have been a perfect upgrade.


However I did forget to rescue my nfsmapid and its config files. Now that is done NFS is happy again. I have actually found things that break without it (as opposed to just looking broken). Using ACLs would leave me unable to read directories that I should be able to read (they contain things that, due to Christmas, should not be read by one person).


Monday Dec 11, 2006

Slightly better nfs v4 support for nomadic systems

My laptop wings its way both physically and virtually between work and home on a regular basis, and in both places the network file system of choice is NFS. However the admins of the two places have not agreed what my login name is, one being impersonal and numerical and the other not. The admin at home refuses to change and I'm past asking for such a thing from the admins at work.

Mostly this is not a problem beyond my fingers typing the wrong thing depending on the host I happen to be using. However there is one area where it is a right royal pain and that is NFS.

With NFS v3 and v2 you just had to make sure the numerical user ID was the same and NFS would work. The admin at home grumbled about this but the practical impact won the day. My UID at home matches my UID at work.

With NFS v4 this is no longer enough. NFS v4 passes the owner of an object not as a number but as a string of the form:

user@nfsmapid_domain

Where the nfsmapid_domain is known to nfsmapid(1M). The nfsmapid converts your UID into your login name and generates the string using that and the domain.

The problem is that my laptop is owned by work and so uses my work login name. When the server passes my home login name to the laptop, the laptop does not recognise it and shows the file as owned by nobody:

: principia IA 85 $; ls -la
total 10
drwxr-x--x   2 nobody   staff          2 Dec 11 12:30 .
drwxr-xr-x  51 nobody   sys           92 Dec 11 12:30 ..
: principia IA 86 $; 

Which is irritating (although oddly files created on the client in the directory while appearing to be owned by nobody on the client have the correct ownership on the server. Snoop shows that the UID is actually still used and is passed in the RPC authentication header to the server).


I asked on the nfs list on OpenSolaris.org if there was any way around this and alas there is none. Not taking no for an answer, I pulled the sources down and built a new nfsmapid daemon that has a directory per nfsmapid_domain, which can contain two files, “user” and “group”, holding one-to-one key/value pairs that map local users and groups to remote users and groups.



: principia IA 23 $; cat /etc/nfs/nfsmapdir/thegerhards.com/user
#
# Simple key value pairs.
#
# local_user remote_user
cg13442 cjg
: principia IA 24 $; 
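The idea behind the map file can be sketched in a few lines of Python. This is not the code from the modified daemon, which is C; the function names are hypothetical, and treating a string from another domain as nobody is an assumption made for the illustration.

```python
def load_mappings(lines):
    """Parse 'local remote' pairs, ignoring blank lines and # comments."""
    mapping = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        local, remote = line.split()
        mapping[local] = remote
    return mapping


def to_wire(local_user, domain, mapping):
    """Build the user@domain string that is sent to the server."""
    return f"{mapping.get(local_user, local_user)}@{domain}"


def from_wire(owner, domain, mapping):
    """Map a user@domain string from the server back to a local login."""
    user, _, owner_domain = owner.rpartition("@")
    if owner_domain != domain:
        return "nobody"  # assumed behaviour for a foreign domain
    reverse = {remote: local for local, remote in mapping.items()}
    return reverse.get(user, user)
```

With the file above loaded, `to_wire("cg13442", "thegerhards.com", mapping)` produces `cjg@thegerhards.com`, and `from_wire` turns that string back into `cg13442`. The sketch does no password-database lookup, so a name that is not in the map is passed through unchanged.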


In a full implementation it would need to also map UIDs but for my limited case it is fine. Here you can see on the laptop the directory displays as my work login and then on the server my home login:

: principia IA 25 $; ls -la
total 11
drwxr-x--x   2 cg13442  staff          3 Dec 11 15:36 .
drwxr-xr-x  51 cg13442  sys           92 Dec 11 12:30 ..
-rw-r-----   1 cg13442  staff          0 Dec 11 12:33 x
: principia IA 26 $; 
: principia IA 26 $; ssh  -x cjg@pearson ls -la $(pwd)
total 11
drwxr-x--x   2 cjg      staff          3 Dec 11 15:36 .
drwxr-xr-x  51 cjg      sys           92 Dec 11 12:30 ..
-rw-r-----   1 cjg      staff          0 Dec 11 12:33 x
: principia IA 27 $; 

There is clearly more work to do to get good nfs support for nomadic systems but at least this change gets me back to where I was with NFS v3.

The diffs are here, taken directly from the Mercurial repository.


Wednesday Sep 29, 2004

A tunnel to my automounter

As I have said previously, I really like the automounter, and feel it is my geeky duty to push my luck with what can be done with it.

When in the office we have a standard automounter mount point /share/install which allows access to all the install images of all the software that we have. Now when at home I wanted the same thing but to get the data from the office over ADSL. But then ssh will do compression, which I have found can significantly improve access times. Could I tunnel NFS over ssh and still get the automounter to do its stuff?

First you have to tunnel the NFS tcp port over ssh:

ssh -C -L 6049:nfs-server:2049  myzone.atwork

where nfs-server is the name of the nfs server of the install images and myzone.atwork is the name of a host at work that can access the nfs server.
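Since nothing works unless the ssh session is up, a quick check that the forwarded port is listening can save some head scratching. This is a minimal Python sketch and not part of the original setup: `tunnel_is_up` is a hypothetical helper, and 6049 is simply the local port used in the ssh command above.

```python
import socket


def tunnel_is_up(host="127.0.0.1", port=6049, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, timed out or unreachable
```

It only proves that the local end of the tunnel accepts connections, not that the NFS server at the far end is answering.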

Now thanks to nfs URLs I can mount the file system using:

mount nfs://localhost:6049/export/install

Automounting requires a small amount of hackery to work around a feature of the automounter where it assumes any mount from a “local” address can be achieved using a loopback mount. So the map entry for install looks like this:

install / -fstype=xnfs nfs://127.0.0.1:6049/export/install

Then in /usr/lib/fs/xnfs I have a mount script:

#!/bin/ksh -p
# Pass every argument through to the real mount command unchanged.
exec /usr/sbin/mount "$@"

And voilà, I have automounting NFS over a compressed ssh tunnel, mainly because I can! I can then live upgrade my home system over NFS via an ssh tunnel with compression to each new build as it comes out.

This also has the pleasant side effect of being able to pause any install from the directory by using pstop(1) to stop the ssh process and prun(1) to continue it, which can be useful if I want better interactive performance over the network for a period while the upgrade continues.

About

This is the old blog of Chris Gerhard. It has mostly moved to http://chrisgerhard.wordpress.com
