Wednesday Sep 26, 2007

dtrace in destructive mode: working around a non-working syscall

Last week, I was at a customer's site. They are testing our new SPARC Enterprise M8000 Servers and we were in the course of deploying standard applications to the (2) systems under test.

We discovered that one of the applications just would not run on Solaris 10 Update 4. It is an application, let's call it tapewriter, which had been Solaris certified for years, but it does a malformed ioctl request when doing a SCSI inquiry, which it does at the very start trying to access a tape drive.

Analysis: What is happening there?

With truss, we could see tapewriter issues a uscsi ioctl call and the system returns an "invalid argument" error.

ioctl(5, 0x04C9, 0xFFBFF690)                    Err#22 EINVAL

The 0x04c9 function parameter denotes a uscsi request. The third argument points to a struct called uscsi_cmd32. We can learn that from /usr/include/sys/scsi/impl/uscsi.h, and the struct is defined there as well: 

struct uscsi_cmd32 {
        int             uscsi_flags;    /\* read, write, etc. see below \*/
        short           uscsi_status;   /\* resulting status  \*/
        short           uscsi_timeout;  /\* Command Timeout \*/
        caddr32_t       uscsi_cdb;      /\* cdb to send to target \*/
        caddr32_t       uscsi_bufaddr;  /\* i/o source/destination \*/
        size32_t        uscsi_buflen;   /\* size of i/o to take place \*/
        size32_t        uscsi_resid;    /\* resid from i/o operation \*/
        uchar_t         uscsi_cdblen;   /\* # of valid cdb bytes \*/
        uchar_t         uscsi_rqlen;    /\* size of uscsi_rqbuf \*/
        uchar_t         uscsi_rqstatus; /\* status of request sense cmd \*/
        uchar_t         uscsi_rqresid;  /\* resid of request sense cmd \*/
        caddr32_t       uscsi_rqbuf;    /\* request sense buffer \*/
        caddr32_t       uscsi_reserved_5;       /\* Reserved for Future Use \*/
};

Maybe you know as little about SCSI as I do, so let me explain that this is a call to issue generic scsi commands to a device. The scsi command itself is coded in the cdb=command descriptor block. For a SCSI inquiry, which asks "Who's that on the other end of the line?", this looks like e.g. 0x12 0x00 0x00 0x00 x0ff 0x00. 0x12 means inquiry, and 0xff means "my buffer to accomodate data returned is 255 bytes in size". The data returned by the scsi command will be placed in the buffer provided by uscsi_bufaddr. As you want to read data, you will put 0x08 (READ) in uscsi_flags, which tells uscsi what you are about to do.

Sometimes, there is an error message pending on the device that has not been reported yet. This is called the "check condition" state. Before issuing a command, you should check whether the device is ok, and if it is in "check condition" state (the inquiry will report that), you may feel inclined not to issue the command, but another one called "Request Sense", which will get you detailed error status information.

Oh well, that's two requests instead of one. What a load of work. This is why uscsi offers to issue the Request Sense command for you when there is a check condition.

Where's the point? If you request uscsi to do that for you (by setting  USCSI_RQENABLE =0x10000 in uscsi_flags), you have to provide an additional buffer where the Request Sense data can be put.

So why doesn't tapewriter work?

dtrace giving insight

I wrote a small dtrace script that shows the parameters passed to ioctl. It looks like this:

#!/usr/sbin/dtrace -s
#pragma D option quiet

struct uscsi_cmd32      \*ustruct;

syscall::ioctl:entry
/(arg1==0x4c9)&&(execname=="tapewriter")/
{
ustruct=copyin(arg2,sizeof(\*ustruct));
printf ("E pid proc: %i %s\\n",pid,execname);
printf ("flags: %x\\n",ustruct->uscsi_flags);
printf ("uscsi_timeout: %x\\n",ustruct->uscsi_timeout);
printf ("uscsi_cdblen: %x\\n",ustruct->uscsi_cdblen);
printf ("uscsi_buflen: %x\\n",ustruct->uscsi_buflen);
printf ("uscsi_bufaddr: %lx\\n",(unsigned long)ustruct->uscsi_bufaddr);
printf ("uscsi_rqlen: %x\\n",ustruct->uscsi_rqlen);
printf ("uscsi_rqbuf: %lx\\n",(unsigned long)ustruct->uscsi_rqbuf);
printf ("uscsi_cdb: %lx\\n",(unsigned long)ustruct->uscsi_cdb);

first=(char \*)copyin(ustruct->uscsi_cdb,6);
printf ("cdb first byte: %i hex %x\\n",first[0],first[0]);
printf ("cdb 2nd byte: %i hex %x\\n",first[1],first[1]);
printf ("cdb 3rd byte: %i hex %x\\n",first[2],first[2]);
printf ("cdb 4th byte: %i hex %x\\n",first[3],first[3]);
printf ("cdb 5th byte: %i hex %x\\n",first[4],first[4]);
printf ("cdb 6th byte: %i hex %x\\n",first[5],first[5]);
self->ok=1;self->bufadr=arg2;
}

syscall::ioctl:return
/self->ok/
{
 self->ok=0;
 printf ("R pid proc errno: %i %s %i\\n",pid,execname,errno);
 printf ("retval: %i\\n",arg0);
 ustruct=copyin(self->bufadr,sizeof(\*ustruct));
 self->bufadr=0;
 printf ("flags: %x\\n",ustruct->uscsi_flags);
 printf ("uscsi_status: %i\\n",ustruct->uscsi_status);
 printf ("uscsi_timeout: %x\\n",ustruct->uscsi_timeout);
 printf ("uscsi_cdblen: %x\\n",ustruct->uscsi_cdblen);
 printf ("uscsi_buflen: %x\\n",ustruct->uscsi_buflen);
 printf ("uscsi_bufaddr: %lx\\n",(unsigned long)ustruct->uscsi_bufaddr);
 printf ("uscsi_rqlen: %x\\n",ustruct->uscsi_rqlen);
 printf ("uscsi_rqstatus: %x\\n",ustruct->uscsi_rqstatus);
 printf ("uscsi_rqbuf: %lx\\n",(unsigned long)ustruct->uscsi_rqbuf);
 printf ("uscsi_cdb: %lx\\n\\n",(unsigned long)ustruct->uscsi_cdb);
}

This script will show the contents of the uscsi command block as the ioctl is entered and left by tapewriter.

For those who are not familiar with dtrace:

dtrace is a framework and a command. The command dtrace, which is used to execute this script, configures the framework to watch the processor to trigger probes as it executes processes. The script is a "d script", written in "d", the language that is used to describe what dtrace should look for (i.e. the probes that should be watched) and what to do (the actions executed when a probe "fires"). This script watches for processes entering ("entry" probe) or leaving ("return" probe) the ioctl syscall. There is a filter on the entry call that leaves all ioctls aside that are not invoking the uscsi function (0x4c9) or are not done by an executable called "tapewriter".  The self->ok filter on the return probe makes sure we passed the entry call before: self->xxx variables are thread-local variables. self->bufadr is used to remember the third entry parameter, which is the address of the uscsi command struct. Ok, well, I could have used a single variable for this purpose.

For syscall entry probes, arg0...arg5 are variables representing the arguments passed to the syscall. For syscall return probes, arg0 contains the return code to be passed up the callstack by the syscall.

Most of the actions look like c and are self-explanatory.

People writing kernel modules will know about copyin and copyout: These functions copy data from kernel to user address space and vice-versa.

This is what we saw:

root@m8000a:/root/dtrace>./ioctl.d
E pid proc: 17738 tapewriter
flags: 10028
uscsi_status: 0
uscsi_timeout: 7fff
uscsi_cdblen: 6
uscsi_buflen: ff
uscsi_bufaddr: 565598
uscsi_rqlen: 84
uscsi_rqstatus: 0
uscsi_rqbuf: 0
uscsi_cdb: 24d9f0
cdb first byte: 18 hex 12
cdb 2nd byte: 0 hex 0
cdb 3rd byte: 0 hex 0
cdb 4th byte: 0 hex 0
cdb 5th byte: 255 hex ff
cdb 6th byte: 0 hex 0

R pid proc errno: 17738 tapewriter 22
retval: -1
flags: 2
uscsi_status: 0
uscsi_timeout: 454a
uscsi_cdblen: 0
uscsi_buflen: 79736572
uscsi_bufaddr: 416e616c
uscsi_rqlen: 0
uscsi_rqstatus: 0
uscsi_rqbuf: 0
uscsi_cdb: 54617065


The first block of output is written upon entry, the first upon return from the call. The error code of 22 means EINVAL, "Invalid Argument". Did anything catch your eye?

Yep. There is RQENABLE set, but rqbuf is zero. Up to Solaris 10 update 3, Solaris did tolerate this (by simply ignoring the buffer if its null), but with Update 4, the semantics have changed and Solaris now explicitely checks for the buffer if RQENABLE is set. This is where it decides to return EINVAL.

Exercise to the reader: Find these pieces of code in the Solaris source. At about what time has the code been changed? You can browse the source at OpenSolaris.org

Stuck!?

For some reason, we really needed to run tapewriter, otherwise we could not proceed with the test. tapewrite is a third party application, and the manufacturer was already developing a patch (thanks to dtrace we could exactly tell him where to look (I was sure it was the flag as I quickly wrote a c application to check that)). However, this will take some time and it seemed we could not work around that problem.  Switching to U3 was not an option, as that would require re-importing the database, which took nine days.

We were expecting to run into even more problems, as other interfaces may have changed as well. So we had to go on as quickly as possible, bypassing the faulty call.

dtrace saving us ... lots of time

The answer was to have dtrace change tapewriter's call on the fly, clearing the RQENABLE flag.

The revised script looks like this:

#!/usr/sbin/dtrace -s
#pragma D option quiet
#pragma D option destructive

struct uscsi_cmd32      \*ustruct;

syscall::ioctl:entry
/(arg1==0x4c9)&&(execname=="tapewriter")/
{
ustruct=copyin(arg2,sizeof(\*ustruct));
printf ("E pid proc: %i %s\\n",pid,execname);
printf ("flags: %x\\n",ustruct->uscsi_flags);

/\* clear RQENABLE flag \*/
ustruct->uscsi_flags &=~(0x10000);
copyout(ustruct,arg2,sizeof(ustruct->uscsi_flags));

/\* read back struct to verify change \*/
ustruct=copyin(arg2,sizeof(\*ustruct));
printf ("flags: %x\\n",ustruct->uscsi_flags);
printf ("uscsi_timeout: %x\\n",ustruct->uscsi_timeout);
printf ("uscsi_cdblen: %x\\n",ustruct->uscsi_cdblen);
printf ("uscsi_buflen: %x\\n",ustruct->uscsi_buflen);
printf ("uscsi_bufaddr: %lx\\n",(unsigned long)ustruct->uscsi_bufaddr);
printf ("uscsi_rqlen: %x\\n",ustruct->uscsi_rqlen);
printf ("uscsi_rqbuf: %lx\\n",(unsigned long)ustruct->uscsi_rqbuf);
printf ("uscsi_cdb: %lx\\n",(unsigned long)ustruct->uscsi_cdb);

first=(char \*)copyin(ustruct->uscsi_cdb,6);
printf ("cdb first byte: %i hex %x\\n",first[0],first[0]);
printf ("cdb 2nd byte: %i hex %x\\n",first[1],first[1]);
printf ("cdb 3rd byte: %i hex %x\\n",first[2],first[2]);
printf ("cdb 4th byte: %i hex %x\\n",first[3],first[3]);
printf ("cdb 5th byte: %i hex %x\\n",first[4],first[4]);
printf ("cdb 6th byte: %i hex %x\\n",first[5],first[5]);
self->ok=1;self->bufadr=arg2;
}

syscall::ioctl:return
/self->ok/
{
 self->ok=0;
 printf ("R pid proc errno: %i %s %i\\n",pid,execname,errno);
 printf ("retval: %i\\n",arg0);
 ustruct=copyin(self->bufadr,sizeof(\*ustruct));
 self->bufadr=0;
 printf ("flags: %x\\n",ustruct->uscsi_flags);
 printf ("uscsi_status: %i\\n",ustruct->uscsi_status);
 printf ("uscsi_timeout: %x\\n",ustruct->uscsi_timeout);
 printf ("uscsi_cdblen: %x\\n",ustruct->uscsi_cdblen);
 printf ("uscsi_buflen: %x\\n",ustruct->uscsi_buflen);
 printf ("uscsi_bufaddr: %lx\\n",(unsigned long)ustruct->uscsi_bufaddr);
 printf ("uscsi_rqlen: %x\\n",ustruct->uscsi_rqlen);
 printf ("uscsi_rqstatus: %x\\n",ustruct->uscsi_rqstatus);
 printf ("uscsi_rqbuf: %lx\\n",(unsigned long)ustruct->uscsi_rqbuf);
 printf ("uscsi_cdb: %lx\\n\\n",(unsigned long)ustruct->uscsi_cdb);
}


To make dtrace accept the copyout command, I had to include the destructive directive at the start of the script. 

And it worked! It worked so well they decided to set this script up as a pre-exec script to tapewriter, meaning that it will be run automatically every time tapewriter is run. There is a post-exec script that kills dtrace when tapewriter is done.

Disclaimer

This is a rather dirty hack and we could not really forsee what the implications were for the execution of tapewriter. Remember that we did this in a testing environment, and rendering the system unusable would not have had any business impact.

Use with caution! In this case, it just saved about a week of waiting for the patch.

 

About

Dirk Wriedt

Search

Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today
Bookmarks
Software for Solaris
Favourite blogs
Dirks Bookmarks

No bookmarks in folder