Wednesday Jun 29, 2005

mutex_enter() in an interrupt handler?

I've always assumed that interrupt handlers were absoluteley there to do the minimum work and get out as soon as possible. A question came to me today that challenged that view. The question was "Can I mutex_enter the pidlock while in an interrupt context?" My instant reaction was no way, you should never block in an interrupt context. However it seems I was wrong. Take a look at fas_intr(). You can see virtually the first thing it does is grab a mutex. This got me thinking. There is a fundemental difference between grabbing a lock in an interrupt handler that is private to that code and using a lock which is in general use throughout the kernel. Take this example. Something is holding the pidlock. But for whatever reason (it's timeslice has run out) it has be scheduled off CPU. So we have our foo card installed in our Solaris machine and an interrupt comes in for it. foointr() has been written to grab the mutex for the pidlock to make sure it has a stable view of the process table. So it tries to grab the pidlock and it has to block till the owner frees it. Could be ages, and the interupt wouldn't be serviced in a timely manner. I'd like to say I dredged through the code and found all this out by reading the code, but infact I just read the manual page. This says that you can acquire a mutex, kernel or high level interupt context. However there is a restriction that the mutex must have been initialised as type DRIVER_MUTEX, and have the ddi_iblock_cookie (returned from ddi_get_iblock_cookie(9f)) as the arg to mutex_init(9f). So there you have it. you can use a mutex in a high level interupt handler, just not ones like pidlock Technorati Tag:
Technorati Tag:

Thursday Jun 23, 2005

Debugging automounter problems

I've been working on a fix to an automounter problem for some time. So I'm just testing the final fix and sanity testing the binaries on my laptop. Suddenly my home directory doesn't get mounted. This is of course very strange. It's been working on that laptop for ages so I think it must be my fix.
So naturally the first thing I do is take my binaries off and try again. Few that didn't work either so at least my fix is good. Trouble is I still have an autmounter that doesn't work.
Some things do work however. eg.
$ cd /net//
But somethings don't
$ cd /work/cwb
ksh: /work/cwb: permision denied
$ cd /home/cwb
ksh: /home/cwb: not found
This was rather strange.
So, you may or may not already know this, but there is a way of seeing the diagnostic output from automountd by running the command
$ ls -l /net/=9
/net/=9: No such file or directory
Well actually you can substitute the 9 for another number corresponding to a debug level. Traditionally this sent output to the console window. So I dilligentley start up a console terminal window and it stays resolutely blank. After a little investigation it appears that in converting autofs to SMF, these messages are now sent to the smf log
svcs -l \\\*autofs\\\*
fmri         svc:/system/filesystem/autofs:default
name         Automounter
enabled      true
state        online
next_state   none
state_time   Fri 17 Jun 2005 09:45:35 BST
logfile      /var/svc/log/system-filesystem-autofs:default.log
restarter    svc:/system/svc/restarter:default
contract_id  47 
dependency   require_all/none svc:/system/filesystem/local (online)
dependency   require_all/restart svc:/milestone/name-services (online)
dependency   optional_all/none svc:/network/nfs/client (online)
So I take a look in /var/svc/log/system-filesystem-autofs:default.log and low and behold we have the answer.
t1      Automountd: trace level = 9
t1        do_lookup1: action=2 wildcard=FALSE error=2
t1      LOOKUP REPLY    : status=2
t10     LOOKUP REQUEST: Wed Jun 22 17:50:02 2005
t10       name=cwb[] map=auto_home opts=nobrowse path=/home direct=0
t10       PUSH /etc/auto_home
t10             Executable map: map=/etc/auto_home key=cwb
t10             read_execout: forking .....
t10             read_execout: map=/etc/auto_home key=cwb line=
t10       POP /etc/auto_home
t10       do_lookup1: action=2 wildcard=FALSE error=2
t10     LOOKUP REPLY    : status=2
t10     LOOKUP REQUEST: Wed Jun 22 17:50:02 2005
t10       name=cwb[] map=auto_home opts=nobrowse path=/home direct=0
t10       PUSH /etc/auto_home
t10             Executable map: map=/etc/auto_home key=cwb
t10             read_execout: forking .....
t10             read_execout: map=/etc/auto_home key=cwb line=
t10       POP /etc/auto_home
t10       do_lookup1: action=2 wildcard=FALSE error=2
t10     LOOKUP REPLY    : status=2
Note the PUSH of auto_home. In that section it appears that the map is executeable. But in this case auto_home is just a normal file
$ cat /etc/auto_home
# Home directory map for automounter
cwb     deadarm:/export/home/&
So clearly something has happened to make the automounter believe this is an executeable map (see man automount(1m) for more details of automount maps and executeable ones). Check the permissions on /etc/auto_\*
$ ls -l /etc/auto_\*
-rwxr-xr-x   1 root     root         339 Apr  9 18:11 /etc/auto_direct
-rwxr-xr-x   1 root     bin           66 Jun 23 14:44 /etc/auto_home
-rwxr-xr-x   1 root     bin          128 Jun 22 17:55 /etc/auto_master
So somehow I've managed to get execute permission on these files, so the automounter tries to run them. Hence I get no mounts. Clearing the exec bit
$ chmod 644 /etc/auto_\*
And it all returns to normal. Now make sure I turn off the debugging so as not to fill up my log files :)
$ ls -l /net/=0
/net/=0: No such file or directory
Putting the debug messages in the the smf log file is great. It makes it much easier to debug remotely and you don't have to rty and capture the console output as you're testing. My laptop is now working fine, and my fix is still good. Of course I still don't know why my map files had the mode changed. I suspect I mistyped. I'll pkgchk the system later to see if there are any other strange permissions I've set... Technorati Tag:
Technorati Tag:
Technorati Tag:
Technorati Tag:
Technorati Tag:

Wednesday Jun 22, 2005

UK OpenSolaris User group

On Monday we had the first meeting of the UK OpenSolaris user group. It was interesting to meet some of the other people (ie. non Sun people) who want to get OpenSolaris working. There was a presentation from Simon Phipps about why OpenSource is necessary and good, which was interesting. We also had a free form discussion about anything. The vast majority of this was covering Licensing which is something I have no experise, or to be honest interest in. However there were two really interesting technical points.
1. How will I know exactly which source made up my [Open]Solaris system: I put Open in brackets because this also pertains to comercial solaris and is a problem we've had inside sun for many years. We know what shipped as base Solaris 10 for example, but then we ask customers to apply patches (or they choose to because they have a problem). These patches are generated from what is called the "Patch Gate", which is branched off the main gate at FCS (at which point the main gate is frozen). All patches are generated from this and then the target binaries are put in to the patch and applied to the system. Copies of the source are also take and put in the "Source Patch" which is only available internally. Obviously we can keep a record of the SCCS versions of the code we put in to the patches so that we can go back later and find out. But these are not readily available from the binaries. This would be a good thing to be able to find out from a running system and is probably something we should put in to OpenSolaris to allow users to identify exactley what is on their system.
2. Can we mount ext[23]fs on Solaris: The answer to this is no. At the moment we have no implementation of the ext[23] file systems. It would be useful for migrating people to Solaris, however as a company our development is targetting ZFS. That's not to say it shouldn't be done, so put it up for discussion and see who bites. Try posting to the disucssion forum Or perhaps the file systems one I know that is primarily for ufs, but theres some good filesystems people in that comunity.
I'm now looking forward to the next meeting. Hopefully we will get even more people there and can have some targetted technical discussions
Technorati Tag:
Technorati Tag:
Technorati Tag:
Technorati Tag:
Technorati Tag:

Tuesday Jun 14, 2005

How does my signal handler get to run

With OpenSolaris a reality, I can talk about some of the areas I've had to trouble shoot in the past. I started trying to write blogs and soon found I needed to reference the source code to illustrate the point. This is one area I've had to look at many times when tracking down problems. I've pieced together my understanding from reading the source code, however while writing this blog entry I've used a simple program and some "D" to illustrate what's going on.

Here's the simple program to copy some memory around, receive an alarm signal and run a signal handler.

$  cat alarm.c
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>

void alarm_handler();

        char \*src, \*dest;
        sigset_t set;
        struct sigaction sigact;

        sigact.sa_handler = alarm_handler;
        sigaddset(&sigact.sa_mask, SIGALRM);

        sigact.sa_flags = SA_SIGINFO;

        src = malloc (4096);
        dest = malloc (4096);
        memset(src, '@', 4096);
        while(1) {
                memcpy (src, dest, 4096);

void alarm_handler()
        char txt[]="In alarm handler";

        write (1, txt, sizeof(txt));

memset and memcpy are implemented by using the block load and store functions which use the floating point registers. This was used to emulate a customer's problem.
We know that the signal is actually delivered by the kernel when it is safe to do so, which is usually when you transition from kernel to userland or back (ie entering or leaving a system call). The act of sending a signal simply marks the t_astflag in the thread structure for trap to process later, so we can write a little D to trace anything from trap_cleanup (where the astflag is checked). We also need to check whether we should be checking for a signal at this time (this was new to me as well)
$ cat trap.d
#!/usr/sbin/dtrace -Fs
        curthread->_tu._ts._t_astflag == 1 &&
        curthread->_tu._ts._t_sig_check == 1 &&
        execname == "alarm"
        trace (timestamp);
        self->trace = 1;

        self->trace == 1
        trace (timestamp);
        self->trace = 0;

        self->trace == 1
        trace (timestamp);

        self->trace == 1
        trace (timestamp);

So we run these two together so we can see what happens. I'll annotate this in-line

$ trap.d
dtrace: script 'trap.d' matched 46375 probes
  1  -> trap_cleanup                             94182830223096
  1   | trap_cleanup:entry                       94182830229763
  1    -> sigcheck                               94182830231096
  1    <- sigcheck                               94182830232763

sigcheck is actually part of the ISSIG_PENDING macro which is used to establish if there is a signal there and if we are allowed to do anything with it.
  1    -> issig                                  94182830234346
  1    <- issig                                  94182830235763
  1    -> issig_forreal                          94182830237763
  1      -> schedctl_finish_sigblock             94182830239763
  1      <- schedctl_finish_sigblock             94182830241263
  1      -> fsig                                 94182830244096
  1        -> sigdiffset                         94182830246013
  1        <- sigdiffset                         94182830247346
  1      <- fsig                                 94182830249263
  1      -> fsig                                 94182830250263
  1        -> sigdiffset                         94182830251096
  1        <- sigdiffset                         94182830251930
  1        -> lowbit                             94182830253430
  1        <- lowbit                             94182830254513
  1      <- fsig                                 94182830255263
  1      -> sigdeq                               94182830257513
  1      <- sigdeq                               94182830259430
  1      -> isjobstop                            94182830261013
  1      <- isjobstop                            94182830262430
  1      -> undo_watch_step                      94182830264346
  1      <- undo_watch_step                      94182830266096
  1    <- issig_forreal                          94182830267180
All of this is issig_forreal() a function which extracts the next pending signal and puts it in to p_cursig. We then call psig ()
  1    -> psig                                   94182830268763
  1      -> schedctl_finish_sigblock             94182830269680
  1      <- schedctl_finish_sigblock             94182830270596
Blocks all signals if requested by the userlevel code setting tdp->sc_shared->sc_sigblock

  1      -> sigorset                             94182830273596
  1      <- sigorset                             94182830275013

sendsig() is platform dependant. This is for the sparc version so we go in to sendsig32()
  1      -> sendsig32                            94182830277430
  1        -> flush_user_windows_to_stack        94182830279263
  1        <- flush_user_windows_to_stack        94182830280346
Make sure the stack has all been flushed to the stack so we can safely change the stack pointer. We're going to have to contruct a stack frame for our signal handler so the current registers have to have somewhere to go on the stack. So we need to know how big a stack frame is, we do this by incrementing minstacksz. First of all the floating point registers if the fpu is enabled then the general registers, then the extra register set.
  1        -> xregs_getsize                      94182830282596
  1        <- xregs_getsize                      94182830283930
Watch point handling
  1        -> watch_disable_addr                 94182830285846
  1          -> avl_numnodes                     94182830287096
  1          <- avl_numnodes                     94182830288263
  1        <- watch_disable_addr                 94182830289346
From here we're copying the context (all the register information) in to the area on the stack we've just set up.
  1        -> kmem_alloc                         94182830290513
  1          -> kmem_cache_alloc                 94182830291846
  1          <- kmem_cache_alloc                 94182830293430
  1        <- kmem_alloc                         94182830294680
  1        -> savecontext32                      94182830296430
  1          -> flush_user_windows_to_stack      94182830298680
  1          <- flush_user_windows_to_stack      94182830299513
  1          -> getgregs32                       94182830301763
  1            -> mkpsr                          94182830304680
  1            <- mkpsr                          94182830306096
  1          <- getgregs32                       94182830307596
  1          -> getfpregs                        94182830309430
  1            -> lwp_getdatamodel               94182830310513
  1            <- lwp_getdatamodel               94182830311430
  1          <- getfpregs                        94182830315096
  1          -> fpuregset_nto32                  94182830316513
  1          <- fpuregset_nto32                  94182830318680
  1        <- savecontext32                      94182830320013
  1        -> xregs_setptr32                     94182830321846
  1        <- xregs_setptr32                     94182830323013
  1        -> kmem_alloc                         94182830324013
  1          -> kmem_cache_alloc                 94182830325096
  1          <- kmem_cache_alloc                 94182830326263
  1        <- kmem_alloc                         94182830327180
  1        -> xregs_get                          94182830328846
  1          -> xregs_getgregs                   94182830331263
  1          <- xregs_getgregs                   94182830333013
  1          -> xregs_getgfiller                 94182830334763
  1          <- xregs_getgfiller                 94182830336180
  1        <- xregs_get                          94182830337346
  1        -> xregs_getfpregs                    94182830339346
  1          -> xregs_getfpfiller                94182830341513
  1          <- xregs_getfpfiller                94182830343346
  1        <- xregs_getfpregs                    94182830344846
  1        -> kmem_free                          94182830347013
  1        <- kmem_free                          94182830347930
  1        -> kmem_cache_free                    94182830348930
  1        <- kmem_cache_free                    94182830349846
  1        -> kmem_free                          94182830352263
  1        <- kmem_free                          94182830353013
  1        -> kmem_cache_free                    94182830353930
  1        <- kmem_cache_free                    94182830354846
  1        -> watch_disable_addr                 94182830356513
  1          -> avl_numnodes                     94182830357430
  1          <- avl_numnodes                     94182830358430
  1        <- watch_disable_addr                 94182830359263
  1        -> ucopy                              94182830361013
  1        <- ucopy                              94182830362680
  1      <- sendsig32                            94182830364680
Interestingly we see nothing in the dtrace output for the crucial bit of the code that sets the registers up to run the signal handler. This is because it is simply done by manipulating the register within sendsig32. Here is a snippit of the code that does this
1451        /\*
1452         \* Set up user registers for execution of signal handler.
1453         \*/
1454        rp->r_sp = (uintptr_t)fp;
1455        rp->r_pc = (uintptr_t)hdlr;
1456        rp->r_npc = (uintptr_t)hdlr + 4;
1457        /\* make sure %asi is ASI_PNF \*/
1458        rp->r_tstate &= ~((uint64_t)TSTATE_ASI_MASK << TSTATE_ASI_SHIFT);
1459        rp->r_tstate |= ((uint64_t)ASI_PNF << TSTATE_ASI_SHIFT);
1460        rp->r_o0 = sig;
1461        rp->r_o1 = (uintptr_t)sip_addr;
1462        rp->r_o2 = (uintptr_t)&fp->uc;

  1    <- psig                                   94182830366263
  1  <- trap_cleanup                             94182830368013
So once we return to userland the registers we're using are the ones set at the end of sendsig32. As a result we start running in the signal handler. Simple really. With dtrace it is very simple to see the flow of code through a function. Combined with the opensolaris source browser you can quickly learn about new areas. Essential when tracking down bugs and customer problems.
Technorati Tag:
Technorati Tag:
Technorati Tag:

Chris W Beal


« March 2015