Saturday Aug 21, 2010

Tulsa Labs


Originally posted on Kool Aid Served Daily
Copyright (C) 2010, Kool Aid Served Daily

Friday Oct 02, 2009

args[ ] may not be referenced because probe description matches an unstable set of probes

I need to be able to see if a mount request generated an error in the mountd daemon. I have a custom kernel that has changed the mount() function call to return the error code. So, now I'm using a very simple DTrace script to catch the error codes:

#!/usr/sbin/dtrace -s

pid$1::mount:return
{
        printf("rc = %d", args[1]);
}

pid$1::audit_mountd_mount:return
{
        printf("rc = %d", args[1]);
}

The issue is what happens when I try to run it:

[root@pnfs-4-11 ~]> ./mountd_res.d `pgrep -x mountd`
dtrace: failed to compile script ./mountd_res.d: line 18: args[ ] may not be referenced because probe description pid100824::mount:return matches an unstable set of probes

What I think this means is that I've got multiple definitions of mount() and not all of them return something. Okay, I can narrow down the probe to just the one I want:

[root@pnfs-4-11 ~]> dtrace -l -f mount
   ID   PROVIDER            MODULE                          FUNCTION NAME
 2378 lx-syscall                                               mount entry
 2379 lx-syscall                                               mount return
13708        fbt           genunix                             mount entry
13709        fbt           genunix                             mount return
31190    syscall                                               mount entry
31191    syscall                                               mount return
65944  pid100824            mountd                             mount entry
65945  pid100824         libc.so.1                             mount entry
65946  pid100824            mountd                             mount return
65947  pid100824         libc.so.1                             mount return
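A listing like this gets long quickly. As a rough sketch (the function name and the column handling are my own assumptions, not part of DTrace), a few lines of Python can group the rows by provider and show which modules define the function, which is the piece needed to narrow the probe description:

```python
from collections import defaultdict

def modules_by_provider(listing):
    """Group `dtrace -l` rows by provider and collect the module names seen."""
    found = defaultdict(set)
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines and the header row (its first column is not numeric).
        if not fields or not fields[0].isdigit():
            continue
        if len(fields) == 5:
            _id, provider, module, _function, _name = fields
        elif len(fields) == 4:
            # Rows such as the syscall probes leave the MODULE column empty.
            _id, provider, _function, _name = fields
            module = ""
        else:
            continue
        found[provider].add(module)
    return {provider: sorted(mods) for provider, mods in found.items()}

LISTING = """\
   ID   PROVIDER            MODULE                          FUNCTION NAME
 2378 lx-syscall                                               mount entry
13708        fbt           genunix                             mount entry
65944  pid100824            mountd                             mount entry
65945  pid100824         libc.so.1                             mount entry
65946  pid100824            mountd                             mount return
65947  pid100824         libc.so.1                             mount return
"""

print(modules_by_provider(LISTING)["pid100824"])
```

Two modules for the same pid provider is the hint that the module field of the probe description needs to be filled in.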

And if I adjust my script:

pid$1:mountd:mount:return
{
        printf("rc = %d", args[1]);
}

We see I've fixed this issue!

[root@pnfs-4-11 ~]> ./mountd_res.d `pgrep -x mountd`
dtrace: failed to compile script ./mountd_res.d: line 18: index 1 is out of range for pid100824:mountd:mount:return args[ ]

Okay, I got my syntax wrong for the return code. The pid provider doesn't supply typed args[ ]; on a return probe, the return value is in arg1:

pid$1:mountd:mount:return
{
        printf("rc = %d", arg1);
}

And now I see the correct output:

[root@pnfs-4-11 ~]> ./mountd_res.d `pgrep -x mountd`
dtrace: script './mountd_res.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0  65946                     mount:return rc = 0
  0  65240        audit_mountd_mount:return rc = 1
  0  65946                     mount:return rc = 0
  0  65240        audit_mountd_mount:return rc = 1



Sunday Aug 30, 2009

Investigating mountd

We've had some interesting questions on nfs-discuss lately about mountd and I thought I'd use a userland script I wrote, snoop, and DTrace to show some interesting properties of mountd.

My Perl script will send UDP mount requests to a server and spoof the client IP. I want to control what I send and I'll sometimes spoof a request from a non-existent machine.

BTW - you will notice I don't talk much about what the share is. Unless I state otherwise, it is:

[root@silver ~]> share | grep tdh
-@tank/home     /export/zfs/tdh   rw   ""  

Need IP to name mappings

We can try a host without a name mapping:

[tdh@slayer ~/src]> host 10.10.20.41
Host 41.20.10.10.in-addr.arpa. not found: 3(NXDOMAIN)
[tdh@slayer ~/src]> sudo ./udp_raw.pl src_addr 10.10.20.42

Note that since we don't have a client to receive the reply, we'll snoop it:

 30   0.01297  10.10.20.42 -> silver       MOUNT3 C Mount /export/zfs/tdh
 31   0.01712       silver -> thens.internal.excfb.com DNS C 42.20.10.10.in-addr.arpa. Internet PTR ?
 32   0.05912 thens.internal.excfb.com -> silver       DNS R  Error: 3(Name Error)
 33   0.00522       silver -> thens.internal.excfb.com DNS C 42.20.10.10.in-addr.arpa. Internet PTR ?
 34   0.00056 thens.internal.excfb.com -> silver       DNS R  Error: 3(Name Error)
 35   0.00777       silver -> 10.10.20.42  MOUNT3 R Mount Permission denied

BTW - right off the bat we can see that mountd tries to resolve the client IP. What happens if there is a reverse entry?

[tdh@slayer ~/src]> host 192.168.4.14
14.4.168.192.in-addr.arpa domain name pointer blast-4-14.internal.excfb.com.
[tdh@slayer ~/src]> sudo ./udp_raw.pl src_addr 192.168.4.14

Well?

 37   0.03198 blast-4-14.internal.excfb.com -> silver       MOUNT3 C Mount /export/zfs/tdh
 38   0.00089       silver -> thens.internal.excfb.com DNS C 14.4.168.192.in-addr.arpa. Internet PTR ?
 39   0.00051 thens.internal.excfb.com -> silver       DNS R 14.4.168.192.in-addr.arpa. Internet PTR blast-4-14.internal.excfb.com.
 40   0.00290       silver -> blast-4-14.internal.excfb.com MOUNT3 R Mount OK FH=01CC Auth=unix

So a client has to have a reverse mapping from IP to host name before we allow a mount to succeed. And we can see that in the source code for usr/src/cmd/fs.d/nfs/mountd/mountd.c:

	getclientsnames(transp, &nb, &clnames);
	if (clnames == NULL || nb == NULL) {
		/*
		 * We failed to get a name for the client, even 'anon',
		 * probably because we ran out of memory. In this situation
		 * it doesn't make sense to allow the mount to succeed.
		 */
		error = EACCES;
		goto reply;
	}
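To make the observed behavior concrete, here is a minimal Python sketch. It models only what the snoop traces show (no PTR record, no mount), not mountd's actual code path. The names client_name, mount_status, FAKE_PTR, and fake_resolver are hypothetical; only socket.gethostbyaddr is a real library call.

```python
import socket

def client_name(ip, resolver=socket.gethostbyaddr):
    """Return the host name for ip, or None when no reverse mapping exists."""
    try:
        name, _aliases, _addrs = resolver(ip)
        return name
    except (socket.herror, socket.gaierror):
        return None

def mount_status(ip, resolver=socket.gethostbyaddr):
    """Model of the traces: a client without a name is refused."""
    return "OK" if client_name(ip, resolver) else "EACCES"

# Hypothetical stand-in for the DNS server, using the one PTR record shown above.
FAKE_PTR = {"192.168.4.14": "blast-4-14.internal.excfb.com"}

def fake_resolver(ip):
    """Mimic gethostbyaddr: return a triple, or raise herror for unknown IPs."""
    if ip in FAKE_PTR:
        return (FAKE_PTR[ip], [], [ip])
    raise socket.herror(1, "Unknown host")

print(mount_status("192.168.4.14", fake_resolver))
print(mount_status("10.10.20.42", fake_resolver))
```

Passing the resolver in as a parameter keeps the sketch runnable without touching a real name server.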

Need to match exactly on hostname

What if I change the share to be:

[root@silver ~]> zfs set sharenfs=rw=blast4-14:blast4-15 tank/home/tdh

Will it work or fail?

[tdh@slayer ~/src]> sudo ./udp_raw.pl src_addr 192.168.4.14
[tdh@slayer ~/src]> sudo ./udp_raw.pl src_addr 192.168.4.15

Note that I wanted to show something warm in the cache and something cold:

 24   0.03500 blast-4-14.internal.excfb.com -> silver       MOUNT3 C Mount /export/zfs/tdh
 25   0.00129       silver -> blast-4-14.internal.excfb.com MOUNT3 R Mount Permission denied
 41   0.03706 blast-4-15.internal.excfb.com -> silver       MOUNT3 C Mount /export/zfs/tdh (retransmit)
 42   0.01419       silver -> thens.internal.excfb.com DNS C 15.4.168.192.in-addr.arpa. Internet PTR ?
 43   0.00048 thens.internal.excfb.com -> silver       DNS R 15.4.168.192.in-addr.arpa. Internet PTR blast-4-15.internal.excfb.com.
 44   0.00089       silver -> blast-4-15.internal.excfb.com MOUNT3 R Mount Permission denied

Two points here: the DNS cache is not flushed when a share reloads, but the NFS auth cache must be. If it were not flushed, we would have gotten permission granted.

Okay, I can show you what is going on by this example:

[root@silver ~]> zfs set sharenfs=rw=blast4-14.internal.excfb.com:blast4-15 tank/home/tdh

And:

 24   0.03769 blast-4-14.internal.excfb.com -> silver       MOUNT3 C Mount /export/zfs/tdh
 25   0.00122       silver -> blast-4-14.internal.excfb.com MOUNT3 R Mount Permission denied
 41   0.03115 blast-4-15.internal.excfb.com -> silver       MOUNT3 C Mount /export/zfs/tdh (retransmit)
 42   0.00092       silver -> blast-4-15.internal.excfb.com MOUNT3 R Mount Permission denied

Okay, we should have gotten permission granted for the first!

[root@silver ~]> zfs set sharenfs=rw=blast4-17.internal.excfb.com:blast4-15 tank/home/tdh

And that fails as well:

 20   0.03105 blast-4-17.internal.excfb.com -> silver       MOUNT3 C Mount /export/zfs/tdh
 21   0.00086       silver -> thens.internal.excfb.com DNS C 17.4.168.192.in-addr.arpa. Internet PTR ?
 22   0.00056 thens.internal.excfb.com -> silver       DNS R 17.4.168.192.in-addr.arpa. Internet PTR blast-4-17.internal.excfb.com.
 23   0.00163       silver -> blast-4-17.internal.excfb.com MOUNT3 R Mount Permission denied

Ohhh! Smug mode off

Okay, I'm interested in what the function in_access_list will tell me. I've got a small DTrace script, which I developed iteratively, that will let me know what is going on here:


#!/usr/sbin/dtrace -Fs

/*
 * Thanks to Peter Harvey;
 * http://blogs.sun.com/peteh/entry/dereferencing_user_space_pointers_in
 *
 *      # ./mountd.d `pgrep -x mountd`
 *
 */

dtrace:::BEGIN
{
        printf("Sampling... Hit Ctrl-C to end.\n");
}

pid$1::check_client_new:return
{
        printf("Access permission is %d", arg1);
}

pid$1::in_access_list:entry
{
        self->trace_me = 1;
        printf("Access list is %s", copyinstr(arg2));
}

pid$1::in_access_list:return
{
        self->trace_me = 0;
        printf("Access permission is %d", arg1);
}

pid$1::strcasecmp:entry
/self->trace_me == 1/
{
        printf("host vs list entry: |%s|, |%s|\n", copyinstr(arg0),
                copyinstr(arg1));
}

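/*
 * This clause also fires on entry, so arg1 is the address of the second
 * string. A strcasecmp:return probe would report the result in arg1.
 */
pid$1::strcasecmp:entry
/self->trace_me == 1/
{
        printf("Comparison is %d", arg1);
}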

Note that I need to use strcasecmp because I can't iterate over the array. I get as a result:

[root@silver ~]> ./mount_trace.sh
+ pgrep -x mountd 
+ /root/mountd.d 634 
dtrace: script '/root/mountd.d' matched 6 probes
CPU FUNCTION                                 
  0 | :BEGIN                                  Sampling... Hit Ctrl-C to end.

  0  -> in_access_list                        Access list is blast4-17.internal.excfb.com:blast4-15
  0    -> strcasecmp                          host vs list entry: |blast4-17.internal.excfb.com|, |blast-4-17.internal.excfb.com|

  0     | strcasecmp:entry                    Comparison is 135156944
  0     | strcasecmp:entry                    host vs list entry: |blast4-15|, |blast-4-17.internal.excfb.com|

  0     | strcasecmp:entry                    Comparison is 135156944
  0    <- in_access_list                      Access permission is 0
^C

D'oh! My share is wrong, wrong I say!
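The comparisons in the trace can be modeled in a few lines of Python. This is only a sketch of the plain host name case as the trace shows it (a case-insensitive, whole-string compare against each colon-separated entry); the real in_access_list may handle other entry forms that I'm not modeling here.

```python
def in_access_list(client_host, access_list):
    """Return True when client_host equals some colon-separated entry, ignoring case."""
    wanted = client_host.lower()
    return any(entry.lower() == wanted
               for entry in access_list.split(":") if entry)

client = "blast-4-17.internal.excfb.com"

# The share as I typed it: the hyphen after "blast" is missing in both entries.
print(in_access_list(client, "blast4-17.internal.excfb.com:blast4-15"))

# The share with the names spelled the way DNS returns them.
print(in_access_list(client, "blast-4-17.internal.excfb.com:blast-4-15"))
```

Note that nothing here appends a domain name or strips one, so a short name in the list never matches a fully qualified client name.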

If instead I try:

[root@silver ~]> zfs set sharenfs=rw=blast-4-17.internal.excfb.com:blast-4-15 tank/home/tdh

We expect blast-4-15 to fail and blast-4-17 to succeed:

 28   0.03211 blast-4-15.internal.excfb.com -> silver       MOUNT3 C Mount /export/zfs/tdh
 29   0.00150       silver -> blast-4-15.internal.excfb.com MOUNT3 R Mount Permission denied
 49   0.03460 blast-4-17.internal.excfb.com -> silver       MOUNT3 C Mount /export/zfs/tdh (retransmit)
 50   0.00087       silver -> blast-4-17.internal.excfb.com MOUNT3 R Mount OK FH=01CC Auth=unix

Which shows again that you need an exact match and that we don't append the domain name to the end of the hostname. What would happen if I added this to the end of the server's /etc/hosts?

192.168.4.14    blast-4-15

I expect it to work. Does it?

[root@silver ~]> grep MOUNT xxx 
 24   0.03176 blast-4-15.internal.excfb.com -> silver       MOUNT3 C Mount /export/zfs/tdh
 27   0.00124       silver -> blast-4-15.internal.excfb.com MOUNT3 R Mount Permission denied
 49   0.00160 blast-4-17.internal.excfb.com -> silver       MOUNT3 C Mount /export/zfs/tdh (retransmit)
 52   0.00098       silver -> blast-4-17.internal.excfb.com MOUNT3 R Mount OK FH=01CC Auth=unix

Hmm, I bet the name entry is cached, which I can show by trying one which is not cached:

[root@silver ~]> zfs set sharenfs=rw=blast-4-17.internal.excfb.com:blast-4-31 tank/home/tdh

And

[root@silver ~]> grep MOUNT xxx
 24   0.03633 blast-4-17.internal.excfb.com -> silver       MOUNT3 C Mount /export/zfs/tdh
 27   0.00102       silver -> blast-4-17.internal.excfb.com MOUNT3 R Mount OK FH=01CC Auth=unix
 45   0.02997   blast-4-31 -> silver       MOUNT3 C Mount /export/zfs/tdh (retransmit)
 46   0.00189       silver -> blast-4-31   MOUNT3 R Mount OK FH=01CC Auth=unix

And we can see the lack of caching because of two clues: the name output in snoop, i.e., "blast-4-31", and the lack of DNS activity between the two packets.

Summary

Some of the behaviour shocked me and I made some stupid mistakes that were hard to track down. As an exercise in triage, it was great. I now have the beginnings of a DTrace debugging tool that I can point people at if they need some help. I'm very, very happy about that part!



Tuesday Jul 28, 2009

DTrace probe not firing

I'm trying to track down whether the client address is ever being set in an NFS request. I've checked with build 117, 112, 109, 85, and now I'm trying 79a. I've got a VMware image running on a laptop. But for some reason my probe isn't loading:

# ./req.d
dtrace: failed to compile script ./req.d: line 3: probe description ::rfs_dispatch:entry does not match any probes
# dtrace -f rfs_dispatch
dtrace: invalid probe specifier rfs_dispatch: probe description ::rfs_dispatch: does not match any probes

A clue can be found here:

# share
#

The clue is that with no shares in place, the 'nfssrv' module is not loaded, so its probes don't exist yet. If we create a share, we see:

# share -F nfs -o rw,anon=0 /export/home
# dtrace -f rfs_dispatch
dtrace: description 'rfs_dispatch' matched 2 probes
^C

# ./req.d
dtrace: script './req.d' matched 1 probe

We have success!



Thursday Jul 16, 2009

Some nasty NFSv3 interactions at mount time between Linux and OpenSolaris

We had a recent integration that exposed some nasty interactions between an OpenSolaris client and a Linux server. There are bugs on both sides, but what I want to do here is document the behavior you'll see and what you can do to fix it.

The first problem was that the fix for 6790413 AUTH_NONE implementation in kernel RPC caused a nasty interaction with a Linux server: the client tried the first security flavor in the array the server returned in reply to the MOUNT request. The issue can be seen here:

[thud@adept nfs]> more /etc/exports
/ *(sync)
/home 192.168.1.0/255.255.255.0(rw,async,no_subtree_check,insecure,no_root_squash)

And a mount request from an OpenSolaris client:

[thud@witch ~]> sudo mount -o vers=3 wont:/home /mnt
[thud@witch ~]> cd /mnt
[thud@witch /mnt]> ls -la
total 35
drwxr-xr-x   3 root     root        4096 Feb 25  2008 .
drwxr-xr-x  27 root     root          30 Jul 17 00:34 ..
drwx------  25 thud      staff       4096 Mar 19 00:22 thud
[thud@witch /mnt]> cd thud
thud: Permission denied.

Why? Well, look at what the server sends back:

MOUNT:----- NFS MOUNT -----
MOUNT:
MOUNT:Proc = 1 (Add mount entry)
MOUNT:Status = 0 (OK)
MOUNT:File handle = [DADF]
MOUNT: 01000700010005000000000053CF6DE4FF1C4572BB2950392EB6993C
MOUNT:Authentication flavor = none,unix,390003,390004,390005
MOUNT:

The OpenSolaris client selected AUTH_NONE, as it was first. If we try this again:

[thud@witch ~]> sudo umount /mnt
[thud@witch ~]> sudo mount -o vers=3,sec=sys wont:/home /mnt
[thud@witch ~]> cd /mnt/thud

We are happy.

Note that this case works for Linux because if there is no command line option, the client will default to AUTH_SYS. It ignores the list from the server.

Well, we discussed whether we wanted to use the default security flavor as defined in nfssec.conf(4) or if we wanted to re-order the array on strongest flavor or if we wanted to do both (i.e., re-order only if the default was not present).

It turns out that you should honor the array's order as much as possible (See Section 2.7 of RFC2623). We've decided to use any option provided on the command line, then the default, and then the first entry in the array. I.e., if no command line option and no default, we consult the server's list. Also, if there is a command line option, it has to be present in the list or the mount fails. If on the other hand the default is not present, then we take the first entry in the list.
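The selection order described above can be sketched in Python. This is a model of the rules as stated in this post, not the mount_nfs source: the function name, the string labels for flavors, and the choice to fail on an empty server list are all my assumptions.

```python
def choose_flavor(server_flavors, cmdline=None, default=None):
    """Pick a security flavor for an NFSv3 mount.

    server_flavors: flavors in the order the server returned them.
    cmdline: the sec= value from the mount command, if any.
    default: the default flavor from nfssec.conf, if any.
    """
    if cmdline is not None:
        # A command line option must be offered by the server.
        if cmdline not in server_flavors:
            raise ValueError("security mode does not match the server")
        return cmdline
    if default is not None and default in server_flavors:
        return default
    if server_flavors:
        # No usable default: honor the server's ordering.
        return server_flavors[0]
    # Assumption: an empty list leaves nothing to choose from.
    raise ValueError("security mode does not match the server")

offered = ["none", "sys", "krb5", "krb5i", "krb5p"]
print(choose_flavor(offered, default="sys"))
print(choose_flavor(offered))
```

With the flavor list from the trace above, a default of sys is picked over the leading none, while a client with no default still falls back to the server's first entry.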

You can track this fix in 6860784 mount_nfs needs to choose default auth first for NFSv3 mounts. If you need relief, for now specify 'sec=sys' on your mount command or add it to your automount maps.

In the meantime, I started a discussion with the Linux NFS developers about the issue (Security negotiation), and it turns out that they decided that returning AUTH_NONE as the first flavor was a bug. This was fixed in nfs-utils (commit 3c1bb23c0379864722e79d19f74c180edcf2c36e in version 1.1.3).

And sure enough, my stock Fedora Core 8 server has version 1.1.0. So I updated my server to Fedora Core 11 to see what would happen. I was actually surprised that, with version 1.1.5, the mount failed:

[root@witch ~]> mount -o vers=3 adept:/home /mnt
nfs mount: security mode does not match the server exporting adept:/home

It turns out that the Linux server is not returning any security flavors with the exact same exports as before!

MOUNT:----- NFS MOUNT -----
MOUNT:
MOUNT:Proc = 1 (Add mount entry)
MOUNT:Status = 0 (OK)
MOUNT:File handle = [DADF]
MOUNT: 01000700010005000000000053CF6DE4FF1C4572BB2950392EB6993C
MOUNT:Authentication flavor =
MOUNT:

Again, this works with a Linux client, and that is because they basically ignore the array of security flavors and try AUTH_SYS by default.

The bug (which I later verified has been seen by others, see Red Hat Bugzilla – Bug 467613 rpc.mountd does not announce any flavors) is that if no 'sec=' is mentioned in the export definition, then no security flavor is set. If we change the export to instead be:

/home 192.168.1.0/255.255.255.0(sec=sys,rw,async,no_subtree_check,insecure,no_root_squash)

Then we restore interoperability.

There is a lesson buried in here: don't just test against your own client/server. Both sides failed that lesson at different points. Also, we do cutting edge pNFS and NFSv4.1 interoperability testing all the time, but we don't with NFSv3. While as developers we may think that development work is over, we do make bug fixes to support customers and we need to be careful to reduce customer pain.



Thursday Feb 12, 2009

Playing with 'sec=none' and using AUTH_NONE

AUTH_NONE is one of the least understood security flavors you can use with NFS (see nfssec(5) for more details). When you share a resource, you can specify the security flavors with 'sec'. You can also specify an anonymous uid with 'anon'. I mention that because the two interact.

The way they interact is that any unauthenticated user id is mapped to the anonymous uid. The primary way to be unauthenticated is to be the root uid on the client and not be in the 'root' access list. As the default for 'anon' is the uid of nobody, this means that the client root typically has only nobody's permissions on the server (and with 'anon=-1', access is denied outright). A server admin can grant clients root permissions by saying 'anon=0' in the share. As we will see, that can be very dangerous.

The secondary way to be unauthenticated requires the share to have 'sec=none' set. share_nfs(1M) states that if the client uses either AUTH_NONE or a security mode that is not in the share, then the NFS request is treated as unauthenticated.
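Here is a small Python model of those two rules, written from the description above rather than from the server source. The function name and parameters are hypothetical, and I'm assuming the Solaris value 60001 for nobody and that 'anon=-1' means access is denied.

```python
UID_NOBODY = 60001  # assumption: the uid of 'nobody' on Solaris

def effective_uid(client_uid, flavor, share_flavors,
                  anon=UID_NOBODY, in_root_list=False):
    """Return the uid the server acts as, or None when access is denied."""
    if flavor == "none" or flavor not in share_flavors:
        # Secondary way: only unauthenticated if the share lists sec=none.
        if "none" not in share_flavors:
            return None
        unauthenticated = True
    else:
        # Primary way: client root that is not in the 'root' access list.
        unauthenticated = (client_uid == 0 and not in_root_list)
    if not unauthenticated:
        return client_uid
    return None if anon == -1 else anon

share = ["sys", "none"]
print(effective_uid(0, "none", share))             # anon not set
print(effective_uid(0, "none", share, anon=0))     # anon=0
print(effective_uid(1000, "none", share, anon=55)) # anon=55, ordinary user
```

The last call is the dangerous shape: with 'sec=none' every user is unauthenticated, so whatever 'anon' is set to applies to all of them.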

Let's try some examples:

On the server

[root@pnfs-9-24 ~]> zfs create pnfs2/sysnone
[root@pnfs-9-24 ~]> zfs create pnfs2/sysnone0
[root@pnfs-9-24 ~]> zfs create pnfs2/sysnone55
[root@pnfs-9-24 ~]> zfs set sharenfs=sec=sys:none,anon=0 pnfs2/sysnone0
[root@pnfs-9-24 ~]> zfs set sharenfs=sec=sys:none pnfs2/sysnone
[root@pnfs-9-24 ~]> zfs set sharenfs=sec=sys:none,anon=55 pnfs2/sysnone55
[root@pnfs-9-24 ~]> chmod 777 /pnfs2/sysnone

Note that we have held off on doing the following:

[root@pnfs-9-24 ~]> chmod 777 /pnfs2/sysnone0
[root@pnfs-9-24 ~]> chmod 777 /pnfs2/sysnone55

And on the client:

[root@pnfs-9-23 ~]> mount -o vers=3,sec=none pnfs-9-24:/pnfs2/sysnone /mnt
[root@pnfs-9-23 ~]> ls -la /mnt
total 8
drwxrwxrwx   2 root     root           2 Feb 12 16:20 .
drwxr-xr-x  35 root     root          37 Feb 12 16:31 ..
[root@pnfs-9-23 ~]> touch /mnt/foo
[root@pnfs-9-23 ~]> ls -la !$
ls -la /mnt/foo
-rw-r--r--   1 nobody   nobody         0 Feb 12 21:58 /mnt/foo
[root@pnfs-9-23 ~]> umount /mnt

Since there was no anon set, the file is owned by nobody.

[root@pnfs-9-23 ~]> mount -o vers=3,sec=none pnfs-9-24:/pnfs2/sysnone0 /mnt
[root@pnfs-9-23 ~]> touch /mnt/foo
[root@pnfs-9-23 ~]> ls -la /mnt/foo
-rw-r--r--   1 root     root           0 Feb 12 22:00 /mnt/foo
[root@pnfs-9-23 ~]> umount /mnt

Since 'anon=0', we are going to use uid 0. I'll point out the danger later.

[root@pnfs-9-23 ~]> mount -o vers=3,sec=none pnfs-9-24:/pnfs2/sysnone55 /mnt
[root@pnfs-9-23 ~]> touch /mnt/foo
touch: cannot create /mnt/foo: Permission denied

What happened here? Well, remember that we didn't set directory permissions, so it is most likely that root owns this 'directory':

[root@pnfs-9-24 ~]> ls -la /pnfs2/sysnone55
total 6
drwxr-xr-x   2 root     root           2 Feb 12 22:15 .
drwxr-xr-x  12 root     root          12 Feb 12 22:15 ..
[root@pnfs-9-24 ~]> chmod 777 /pnfs2/sysnone0
[root@pnfs-9-24 ~]> chmod 777 /pnfs2/sysnone55

The prior example worked because 'anon=0' mapped us to root, which owns the directory. So now:

[root@pnfs-9-23 ~]> touch /mnt/foo
[root@pnfs-9-23 ~]> ls -la /mnt/foo
-rw-r--r--   1 55       55             0 Feb 12 22:01 /mnt/foo
[root@pnfs-9-23 ~]> nfsstat -m /mnt
/mnt from pnfs-9-24:/pnfs2/sysnone55
 Flags:         vers=3,proto=tcp,sec=none,hard,intr,link,symlink,acl,rsize=32768,wsize=32768,retrans=5,timeo=600
 Attr cache:    acregmin=3,acregmax=60,acdirmin=30,acdirmax=60

Let's see some non-root behavior here:

[thud@pnfs-9-23 ~]> touch /mnt/bar
[thud@pnfs-9-23 ~]> ls -la /mnt
total 10
drwxrwxrwx   2 root     root           4 Feb 12 22:18 .
drwxr-xr-x  35 root     root          37 Feb 12 16:31 ..
-rw-r--r--   1 55       55             0 Feb 12 22:18 bar
-rw-r--r--   1 55       55             0 Feb 12 22:01 foo

And if we go back to the prior case (pnfs2/sysnone0):

[thud@pnfs-9-23 ~]> touch /mnt/bar
[thud@pnfs-9-23 ~]> ls -la /mnt
total 10
drwxrwxrwx   2 root     root           4 Feb 12 22:20 .
drwxr-xr-x  35 root     root          37 Feb 12 16:31 ..
-rw-r--r--   1 root     staff          0 Feb 12 22:20 bar
-rw-r--r--   1 root     root           0 Feb 12 22:00 foo

So if we mix 'sec=none' and 'anon=0', it is easy enough to give every remote user root access on the server.

But we haven't examined the real power of 'sec=none' here:

[root@pnfs-9-23 ~]> mount -o vers=3,sec=krb5i pnfs-9-24:/pnfs2/sysnone0 /mnt
nfs mount: security mode does not match the server exporting pnfs-9-24:/pnfs2/sysnone0
[root@pnfs-9-23 ~]> mount -o vers=4,sec=krb5i pnfs-9-24:/pnfs2/sysnone0 /mnt
[root@pnfs-9-23 ~]> nfsstat -m /mnt
/mnt from pnfs-9-24:/pnfs2/sysnone0
 Flags:         vers=4,proto=tcp,sec=krb5i,hard,intr,link,symlink,acl,rsize=1048576,wsize=1048576,retrans=5,timeo=600
 Attr cache:    acregmin=3,acregmax=60,acdirmin=30,acdirmax=60

Okay, our NFS tester extraordinaire, Helen Chao, found a bug here. According to the man pages, you can argue that the v3 mount should have either succeeded or failed. On the one hand, you specified that you wanted the mount to only be krb5i. On the other hand, you told the share to map all unlisted modes to AUTH_NONE. The slight bug here is that we need to clearly document what our expectations are.

The next bug is not as minor - v3 and v4 should have the same behavior. We don't care which side of the fence we fall on about allowed or denied, we just want consistency.

The next bug is very subtle here - 'nfsstat -m' reports that the mount is via krb5i. But is it? (Is this even a bug?)

[root@pnfs-9-23 ~]> touch /mnt/gar
[root@pnfs-9-23 ~]> ls -la /mnt
total 11
drwxrwxrwx   2 root     root           5 Feb 12 22:29 .
drwxr-xr-x  35 root     root          37 Feb 12 16:31 ..
-rw-r--r--   1 root     staff          0 Feb 12 22:20 bar
-rw-r--r--   1 root     root           0 Feb 12 22:00 foo
-rw-r--r--   1 root     root           0 Feb 12 22:29 gar

Can't tell anything there. Can I as a normal user?

[thud@pnfs-9-23 ~]> touch /mnt/googoo
touch: cannot stat /mnt/googoo: Permission denied
[thud@pnfs-9-23 ~]> kinit
Password for thud@NFSV4.SUN.COM: 
[thud@pnfs-9-23 ~]> touch /mnt/googoo
[thud@pnfs-9-23 ~]> ls -al /mnt
total 12
drwxrwxrwx   2 root     root           6 Feb 12 22:32 .
drwxr-xr-x  35 root     root          37 Feb 12 16:31 ..
-rw-r--r--   1 root     staff          0 Feb 12 22:20 bar
-rw-r--r--   1 root     root           0 Feb 12 22:00 foo
-rw-r--r--   1 root     root           0 Feb 12 22:29 gar
-rw-r--r--   1 root     root           0 Feb 12 22:32 googoo

Okay, so it isn't a bug at all - the client is correct here. I.e., the client is using Kerberos to talk to the server. The share does not absolve the server from having to understand Kerberos. We can clearly see that in that the user without a ticket is denied permission to create a file. And we also see that the uid is clearly mapped to the anon uid on the server.



Wednesday Oct 08, 2008

Not able to mount from Fedora Core 9

Helen Chao, a colleague who had never really used Linux, asked me to help configure a kernel. I asked why and she said she needed to test RDMA over NFSv4. It turns out that the stock 2.6.25 kernel with Fedora Core 9 already had the support in it. We followed the directions in nfs-rdma.txt and were not able to get it running.

Helen (a great test engineer) proceeded to investigate from there and couldn't get a simple loopback or NFS mount to succeed.

So I exported the root to all hosts and went to work debugging this issue. A 'rpcinfo -p' on the server showed the expected registered services. The same call from a client failed, but a ping worked:

[th199096@jhereg ~]> rpcinfo -p pnfs-9-30
^C
[th199096@jhereg ~]> rpcinfo -p pnfs-9-30
^C
[th199096@jhereg ~]> sudo mount -o vers=3 pnfs-9-30:/ /mnt
^C
[th199096@jhereg ~]> sudo mount -o vers=3 pnfs-9-30:/ /mnt
nfs mount: pnfs-9-30: : RPC: Rpcbind failure - RPC: Timed out
nfs mount: retrying: /mnt
nfs mount: pnfs-9-30: : RPC: Rpcbind failure - RPC: Timed out
^C
[th199096@jhereg ~]> ping pnfs-9-30
pnfs-9-30 is alive

I thought that perhaps it was a firewall issue and disabled IPTABLES.

No luck and I knew the mount should succeed - I tried it with my home Core 8 box and an OpenSolaris server. It worked, but then again, that Linux box has been configured for ages. Long story short, I asked Chuck Lever for help.

His only suggestion was to turn off SELinux, or as he puts it:

Also disable selinux, just so your systems behave like normal Unix.

So I followed the directions I found here: How to Disable SELinux and now the mount works:

# mount -o vers=3 pnfs-9-30:/ /mnt
nfs mount: pnfs-9-30: : RPC: Rpcbind failure - RPC: Timed out
nfs mount: retrying: /mnt
nfs mount: pnfs-9-30: : RPC: Rpcbind failure - RPC: Timed out
nfs mount: pnfs-9-30: : RPC: Rpcbind failure - RPC: Timed out
nfs mount: /mnt: mounted OK
# 

Most of the help I found with Google on the RPC messages wasn't informative. Either the suggestion was to turn off IPTABLES or there was no reply.



Sunday Oct 05, 2008

The myth of security with AUTH_SYS

AUTH_SYS is an insecure security mode, yet it is commonly used within companies. It can be used as the proverbial open lock on a door - the fact that the lock is there means do not enter. But I've seen people terminated for ignoring that lock.

With that in mind, I want to go over the simple security schemes employed within a company and show why they don't work. The punchline will be of course Kerberos. Speaking of myths, one is that you need NFSv4 in order to deploy Kerberos. You don't - common servers and clients easily speak Kerberos with NFSv3. And ignore NFSv2, please, please.

Export Security

With an export (or share), the most lax security is typically the default:

[root@pnfs-9-26 ~]> zfs create rootpool/export/home/secure
[root@pnfs-9-26 ~]> share
[root@pnfs-9-26 ~]> zfs set sharenfs=on rootpool/export/home/secure
[root@pnfs-9-26 ~]> share
-@rootpool/exp  /export/home/secure   rw   ""  

I.e., every machine in the world can mount pnfs-9-26:/export/home/secure. The reasons for this default are simple:

  1. When the model was first developed
    1. the number of exports and clients was limited
    2. the degree of interconnectedness outside of the organization was limited
  2. It is hard to know beforehand where to limit access.

By default, root has access almost like any other user, but it is mapped to the user nobody. We can see this here if we grant wide open permissions on the export:

[root@pnfs-9-26 ~]> chmod 777 /export/home/secure
[root@pnfs-9-26 ~]> ls -la /export/home/secure
total 6
drwxrwxrwx   2 root     root           2 Oct  5 11:39 .
drwxr-xr-x   5 th199096 staff          6 Oct  5 11:39 ..

We should be able to create a file as anyone from another machine:

[root@jhereg ~]> mount -o vers=3 pnfs-9-26:/export/home/secure /mnt
[root@jhereg ~]> touch /mnt/i_am_root

That worked:

[root@pnfs-9-26 secure]> ls -la
total 7
drwxrwxrwx   2 root     root           3 Oct  5 11:51 .
drwxr-xr-x   5 th199096 staff          6 Oct  5 11:39 ..
-rw-r--r--   1 nobody   nobody         0 Oct  5 11:51 i_am_root

Notice that root has been mapped to nobody. What happens if we do it as a normal user:

[th199096@jhereg ~]> touch /mnt/i_am_jhereg
[th199096@jhereg ~]> touch /mnt/i_am_th199096

And we get the correct user:

[root@pnfs-9-26 secure]> ls -la
total 9
drwxrwxrwx   2 root     root           5 Oct  5 11:54 .
drwxr-xr-x   5 th199096 staff          6 Oct  5 11:39 ..
-rw-r--r--   1 th199096 staff          0 Oct  5 11:54 i_am_jhereg
-rw-r--r--   1 nobody   nobody         0 Oct  5 11:51 i_am_root
-rw-r--r--   1 th199096 staff          0 Oct  5 11:54 i_am_th199096

Now what happens if we try to remove i_am_th199096 as root?

[root@jhereg ~]> rm /mnt/i_am_th199096
rm: /mnt/i_am_th199096: override protection 644 (yes/no)? y

anon

We are allowed to do that, but is it a property of being root or the permissions? We can check this with a simple change of the share:

[root@pnfs-9-26 secure]> zfs set sharenfs=anon=-1 rootpool/export/home/secure
[root@pnfs-9-26 secure]> share
-@rootpool/exp  /export/home/secure   anon=-1   ""  

See share_nfs(1M) for a description of anon. Notice I didn't specify whether rw is set or not. We can retry the delete:

[root@jhereg ~]> rm /mnt/i_am_jhereg
NFS3 getattr failed for pnfs-9-26: RPC: Authentication error; s1 = 13, s2 = 0
rm: /mnt/i_am_jhereg: Permission denied

If you want to make sure to deny root level access to a share, then you need to set anon=-1.

Conversely, if you want to enable root level access to a share, you can set anon=0:

[root@pnfs-9-26 secure]> zfs set sharenfs=anon=0 rootpool/export/home/secure
[root@pnfs-9-26 secure]> share
-@rootpool/exp  /export/home/secure   anon=0   ""  

I've recreated the two files in the background (which shows by the way that rw is the default). And when we test the deletion:

[root@jhereg ~]> rm /mnt/i_am_jhereg
[root@jhereg ~]> 

No pesky question that implies I am not a god!

root=

If I want to allow root access from one host but deny it from all others, I can use the root= access list:

[root@pnfs-9-26 secure]> zfs set sharenfs=root=pnfs-9-25.central.sun.com rootpool/export/home/secure
[root@pnfs-9-26 secure]> share
-@rootpool/exp  /export/home/secure   sec=sys,root=pnfs-9-25   ""  

PS: The sec=sys is stating this is an AUTH_SYS share. Also, since I am using DNS for hosts in /etc/resolv.conf, I need a FQDN.

Try to remove:

[root@jhereg ~]> rm /mnt/i_am_th199096 
rm: /mnt/i_am_th199096: override protection 644 (yes/no)? yes

Since it worked and we got a prompt, it has to be the permission set which is enabling this. If we tighten things down a bit more:

[root@pnfs-9-26 secure]> zfs set sharenfs=root=pnfs-9-25.central.sun.com,anon=-1 rootpool/export/home/secure
[root@pnfs-9-26 secure]> share
-@rootpool/exp  /export/home/secure   anon=-1,sec=sys,root=pnfs-9-25   ""  

We can see we are locked out:

[root@jhereg ~]> rm /mnt/i_am_root
rm: /mnt/i_am_root: Permission denied

versus

[root@pnfs-9-25 ~]> rm /mnt/i_am_root
[root@pnfs-9-25 ~]> 

And yet the other machine reigns supreme.

We'll revisit the effectiveness of root= without anon= when we look at permissions.

rw=

So we can keep machines from getting access altogether by restricting the rw= access list:

[root@pnfs-9-26 ~]> zfs set sharenfs=rw=pnfs-9-25.central.sun.com rootpool/export/home/secure
[root@pnfs-9-26 ~]> share
-@rootpool/exp  /export/home/secure   sec=sys,rw=pnfs-9-25.central.sun.com   ""  

which yields on the two clients:

[root@jhereg ~]> ls -la /mnt
/mnt: Permission denied

and

[root@pnfs-9-25 ~]> ls -la /mnt
drwxrwxrwx   2 root     root           6 Oct  5 19:33 .
drwxr-xr-x  36 root     root          39 Oct  5 19:11 ..
-rw-r--r--   1 th199096 staff          0 Oct  5 13:30 i_am_here
-rw-r--r--   1 th199096 staff          0 Oct  5 13:27 i_am_pnfs-9-25
-rw-r--r--   1 th199096 staff          0 Oct  5 13:27 i_am_pnfs_9_25
-rw-r--r--   1 th199096 staff          0 Oct  5 13:30 i_am_th199096

Note that the client jhereg must be caching a file handle for the root of the export /export/home/secure on the server pnfs-9-26. If it were not, we would have to reissue the mount request, which would fail.

Also note that it is not just the mountd requests which have to check access list permissions. If it were, then the above operation on jhereg would have succeeded. SunOS used to work that way, and the Solaris NFS team changed it back in the 1995/96 time frame; see, for example, Brent Callaghan's presentation at the 1996 Connectathon: NFS Client Authentication. Quickly, the security reason for the change is that otherwise a rogue client which somehow sniffed out a valid file handle would have complete access to all of the information on that share.

ro=

We can likewise grant read only access via the ro= access list.

Access list interactions

All of rw, rw=, ro, and ro= interact as described by share_nfs(1M).

File Permissions

So access lists work on machines. If a machine is able to mount a share from a server, then all users on that client can access everything on that share. Right?

Wrong. The directory and file permissions determine user access. Contrast this with a model derived from a client only having one user logged in at a time. In that situation, it may not be the machine which is important but rather the user.

If I wanted to only grant access to a single user, then I would set the owner of the share to be that user and I would also set the permissions to be 700:

[root@pnfs-9-26 ~]> chown th199096:staff /export/home/secure/
[root@pnfs-9-26 ~]> chmod 700 /export/home/secure/
[root@pnfs-9-26 ~]> ls -la /export/home/secure/
total 10
drwx------   2 th199096 staff          6 Oct  5 19:33 .
drwxr-xr-x   5 th199096 staff          6 Oct  5 11:39 ..
-rw-r--r--   1 th199096 staff          0 Oct  5 13:30 i_am_here
-rw-r--r--   1 th199096 staff          0 Oct  5 13:27 i_am_pnfs-9-25
-rw-r--r--   1 th199096 staff          0 Oct  5 13:27 i_am_pnfs_9_25
-rw-r--r--   1 th199096 staff          0 Oct  5 13:30 i_am_th199096

And let's change the share to be wide open:

[root@pnfs-9-26 ~]> zfs set sharenfs=on rootpool/export/home/secure
[root@pnfs-9-26 ~]> share
-@rootpool/exp  /export/home/secure   rw   ""  

We see root access is denied (because it maps to nobody):

[root@pnfs-9-25 ~]> ls -la /mnt
/mnt: Permission denied
total 3

But on that same machine, th199096 is granted access:

[root@pnfs-9-25 ~]> su - th199096
[th199096@pnfs-9-25 ~]> ls -la /mnt
total 12
drwx------   2 th199096 staff          6 Oct  5 19:33 .
drwxr-xr-x  36 root     root          39 Oct  5 19:11 ..
-rw-r--r--   1 th199096 staff          0 Oct  5 13:30 i_am_here
-rw-r--r--   1 th199096 staff          0 Oct  5 13:27 i_am_pnfs-9-25
-rw-r--r--   1 th199096 staff          0 Oct  5 13:27 i_am_pnfs_9_25
-rw-r--r--   1 th199096 staff          0 Oct  5 13:30 i_am_th199096

By the way, if we grant either root= or anon=0 access, then this all goes out the window:

[root@pnfs-9-26 ~]> zfs set sharenfs=rw,anon=0 rootpool/export/home/secure

yields:

[root@pnfs-9-25 ~]> ls -la /mnt
total 12
drwx------   2 th199096 staff          6 Oct  5 19:33 .
drwxr-xr-x  36 root     root          39 Oct  5 19:11 ..
-rw-r--r--   1 th199096 staff          0 Oct  5 13:30 i_am_here
-rw-r--r--   1 th199096 staff          0 Oct  5 13:27 i_am_pnfs-9-25
-rw-r--r--   1 th199096 staff          0 Oct  5 13:27 i_am_pnfs_9_25
-rw-r--r--   1 th199096 staff          0 Oct  5 13:30 i_am_th199096

A client's root only gets to boss things around if the server grants permission.
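
The root-mapping rules walked through above can be condensed into a small model. This is only a sketch in POSIX sh, not how the server is implemented: the function name effective_uid and its three arguments are made up for illustration, and 60001 is assumed to be the uid of nobody (UID_NOBODY on Solaris).

```shell
#!/bin/sh
# Hypothetical model of the mapping described in the text, not server code.
UID_NOBODY=60001   # assumption: the Solaris uid for "nobody"

# effective_uid CLIENT_UID ANON IN_ROOT_LIST
#   CLIENT_UID    uid the client presents over AUTH_SYS
#   ANON          value of anon= on the share ("" if not set)
#   IN_ROOT_LIST  "yes" if the client host is in the root= access list
# Prints the uid the server would act as, or "denied".
effective_uid() {
    _uid=$1
    _anon=$2
    _in_root_list=$3

    # Non-root users, and root from a host in root=, pass through unchanged.
    if [ "$_uid" -ne 0 ] || [ "$_in_root_list" = yes ]; then
        echo "$_uid"
        return 0
    fi

    # Root from any other host is treated as the anonymous user.
    case "$_anon" in
        "") echo "$UID_NOBODY" ;;        # default: root maps to nobody
        -1) echo "denied"; return 1 ;;   # anon=-1: access is refused
        *)  echo "$_anon" ;;             # anon=0 hands root back to the client
    esac
}
```

Calling effective_uid 0 "" no models the wide open share where root maps to nobody, while effective_uid 0 0 no models the anon=0 case where the 700 permissions stop mattering for root.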

The final myth of AUTH_SYS

Take a server for which the root account is locked down. Assume admins who don't want an inadvertent 'rm -rf /net' to nuke their server, so by default they create shares of the form:

[root@pnfs-9-26 ~]> zfs set sharenfs=rw,anon=-1 rootpool/export/home/secure

And further, at some point someone decides to lock down a share's permissions, i.e., mode 700 with the user th199096 as the owner.

How long would it take someone to get access over AUTH_SYS?

Not long - even though we know root access is out and we can assume they do not know my password. Since we use NIS, they can do a 'ypcat passwd | grep th199096' and grab my uid. Then they only have to create a dummy account with that uid on a test machine.

What if we create a special account, not in NIS? Well, they may not have root access on the server, but if they have any access, then they could cd to the parent directory, issue an 'ls -la', see the user name, and then grep for it out of /etc/passwd.

You could lock down the machine, lock down the NIS database, etc. But the fact remains that if I can mount it, then I can create a simple script to try every UID until I get access. How many servers out there check for getattr storms?
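
For what checking for such a storm might look like, here is a minimal sketch. It assumes you already have some way to record one line per request in the form "client uid"; that input format and the function name flag_uid_storms are hypothetical, and nothing here is an existing Solaris facility.

```shell
#!/bin/sh
# flag_uid_storms LIMIT
# Reads "client uid" pairs on stdin (hypothetical format) and prints every
# client that presented more than LIMIT distinct uids.
flag_uid_storms() {
    awk -v limit="$1" '
        # Count each (client, uid) pair only the first time it is seen.
        !seen[$1, $2]++ { count[$1]++ }
        END {
            for (host in count)
                if (count[host] > limit)
                    print host
        }
    '
}
```

A client that walks through the uid space stands out quickly, since a normal client only ever presents the handful of uids belonging to its real users.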

The answer is to further restrict the access lists. But eventually, if I'm able to gain access to one of the restricted machines or if I can bring up my box with the same IP as one of the restricted machines, I can get access.

Kerberos

But all I need to do to combat this without all of these "extreme" measures is to enable Kerberos on the server:

[root@pnfs-9-26 ~]> zfs set sharenfs=sec=krb5,rw,anon=-1 rootpool/export/home/secure
[root@pnfs-9-26 ~]> share
-@rootpool/exp  /export/home/secure   anon=-1,sec=krb5,rw   ""  

I am the right user (actually, my uid on pnfs-9-25 matches the uid of the user th199096 on pnfs-9-26), but it fails:

[th199096@pnfs-9-25 ~]> ls -al /mnt
NFS3 access failed for pnfs-9-26: RPC: Authentication error; s1 = 13, s2 = 0
/mnt: Permission denied
total 3

Originally posted on Kool Aid Served Daily
Copyright (C) 2008, Kool Aid Served Daily

Beware expectations

I call a "share" an "export" because I learned the terminology at another company, one based on the SunOS style and not the Solaris style. It turns out I have other expectations about how shares work. I thought the following was legal:

[root@pnfs-9-24 ~]> zfs set sharenfs=rw=pnfs-9-25:jhereg rootpool/export/home/secure

And all I got was:

[root@pnfs-9-25 ~]> mount pnfs-9-24:/export/home/secure /mnt
nfs mount: mount: /mnt: Permission denied

I reinstrumented mountd to spit out some debug messages and I saw:

[root@pnfs-9-24 ~]> Oct  5 16:04:27 pnfs-9-24 mountd[1598]: Considering |pnfs-9-25| vs |pnfs-9-25.Central.Sun.COM|
Oct  5 16:04:27 pnfs-9-24 mountd[1598]: Considering |jhereg| vs |pnfs-9-25.Central.Sun.COM|
Oct  5 16:04:27 pnfs-9-24 mountd[1598]: Considering |pnfs-9-25| vs |pnfs-9-25.Central.Sun.COM|
Oct  5 16:04:27 pnfs-9-24 mountd[1598]: Considering |jhereg| vs |pnfs-9-25.Central.Sun.COM|
Oct  5 16:04:27 pnfs-9-24 mountd[1598]: pnfs-9-25.Central.Sun.COM denied access to /export/home/secure

So it never considers the FQDN. Interesting, so what happens if we add it?

[root@pnfs-9-24 ~]> zfs set sharenfs=root=pnfs-9-25.Central.sun.com,anon=-1 rootpool/export/home/secure

We see:

[root@pnfs-9-25 ~]> mount pnfs-9-24:/export/home/secure /mnt
[root@pnfs-9-25 ~]> 

And on the console:

[root@pnfs-9-24 ~]> Oct  5 16:06:27 pnfs-9-24 mountd[1598]: Considering |pnfs-9-25.Central.sun.com| vs |pnfs-9-25.Central.Sun.COM|

By the way, the compare is case insensitive. This took me way longer to track down than I liked. And it had me going down dead-ends with other "bugs".

The share_nfs(1M) man page has this to say:

access_list
    The access_list argument is a colon-separated list whose components may be any number of the following:

    hostname
        The name of a host. With a server configured for DNS or LDAP naming in the nsswitch "hosts" entry, any hostname must be represented as a fully qualified DNS or LDAP name.

And sure enough:

[root@pnfs-9-24 ~]> grep hosts /etc/nsswitch.conf
# "hosts:" and "services:" in this file are used only if the
#hosts:      nis [NOTFOUND=return] files
hosts: files dns
# before searching the hosts databases.
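
The behavior seen in the debug messages can be mimicked with a few lines of POSIX sh. This is a sketch of the observed matching only (colon-separated entries, case-insensitive, whole-name compare against the client's FQDN); the function name host_in_access_list is made up, and the other kinds of access list components the man page allows are not modelled.

```shell
#!/bin/sh
# host_in_access_list ACCESS_LIST CLIENT_FQDN
# Returns 0 if CLIENT_FQDN equals one of the colon-separated entries,
# ignoring case. A short name such as "pnfs-9-25" never matches an FQDN.
host_in_access_list() {
    _want=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    printf '%s\n' "$1" |
        tr ':' '\n' |                   # one entry per line
        tr '[:upper:]' '[:lower:]' |    # the compare is case insensitive
        grep -Fqx -- "$_want"           # fixed string, whole line must match
}
```

With this, host_in_access_list "pnfs-9-25:jhereg" "pnfs-9-25.Central.Sun.COM" fails just as the mount did, while an entry of pnfs-9-25.Central.sun.com matches despite the difference in case.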

Besides RTFMing myself, which I had done earlier, but not well enough, I was struck by the thought that I wish we had made this choice at a previous company. It solves a lot of problems and reduces a lot of name server queries (which caused many of those problems), but it is not as flexible. Consider a multi-homed client thorton which can be either thorton.central.sun.com or thorton.be.central.sun.com. With just rw=thorton, we could leverage the search domains to allow access to both interfaces at once.

But, depending on the ordering in the search domains, we may end up sending more name lookups than we want. Also, I've heard some sysadmins espouse the belief that those interfaces represent different machines. And if you want both to have access, you explicitly grant them both access.


Originally posted on Kool Aid Served Daily
Copyright (C) 2008, Kool Aid Served Daily
About

tdh
