Wednesday Sep 09, 2009

Understanding iostat

Iostat has been around for years and, until DTrace came along and allowed us to look more deeply into the kernel, was the tool for analysing how the IO subsystem was working in Solaris. However, interpreting its output has proved problematic in the past.1

First, if you are looking at latency issues it is vital that you use the smallest time quantum iostat allows, which as of Solaris 10 is 1 second. Here is a sample of some output produced from “iostat -x 1”:

                  extended device statistics                 
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
sd3       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0 
                 extended device statistics                 
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
sd3       5.0 1026.5    1.6 1024.5  0.0 25.6   24.8   0  23 
                 extended device statistics                 
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
sd3       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0 


The first thing to draw your attention to is the “%b” column, which the manual tells you is:

%b percent of time the disk is busy (transactions in progress)

So in this example the disk was “busy”, i.e. had at least one transaction (command) in progress, for 23% of the time period: 0.23 seconds, as the time period was 1 second.

Now look at the “actv” column. Again the manual says:

actv average number of transactions actively being serviced (removed from the queue but not yet completed)
This is the number of I/O operations accepted, but not yet serviced, by the device.
In this example the average number of transactions outstanding for this time quantum was 25.6. Now here is the bit that is so often missed. Since we know that all the transactions actually took place within 0.23 seconds, and were not evenly spread across the full second, the average queue depth when busy was 100/23 * 25.6, or 111.3. Thanks to dtrace and this D script you can see the actual IO pattern2:

Even having done the maths, iostat smooths out peaks in the IO pattern and thus under-reports the peak number of transactions as 103.0 when the true value is 200.
The same is true for the bandwidth. The iostat output above reports 1031.5 transactions a second (r/s + w/s); again this does not take into account that all those IO requests happened in 0.23 seconds. So the true figure for the device would be 1031.5 * 100/23, which is 4485 transactions/sec.
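To make the correction mechanical, here is an illustrative Python sketch of the arithmetic above (my own helper, not part of iostat):

```python
def busy_adjusted(value, busy_pct):
    """Scale an iostat per-interval average up to the fraction of the
    interval the device was actually busy (the %b column)."""
    return value * 100.0 / busy_pct

# actv of 25.6 with %b of 23 -> true average queue depth while busy
print(round(busy_adjusted(25.6, 23), 1))    # 111.3
# 1031.5 transactions/s spread over only 0.23 s of activity -> true rate
print(round(busy_adjusted(1031.5, 23)))     # 4485
```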
If we up the load on the disk a bit then you can conclude more from the iostat:
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
sd3       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0 
                 extended device statistics                 
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
sd3       5.0 2155.7    1.6 2153.7 30.2 93.3   57.1  29  45 
                 extended device statistics                 
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
sd3       0.0 3989.1    0.0 3989.1 44.6 157.2   50.6  41  83 
                 extended device statistics                 
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
sd3       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0 

This time the %w column is non-zero, and from the manual %w is:

%w percent of time there are transactions waiting for service (queue non-empty)

This is telling us that the device's active queue was full. So on the third line of the above output the device's queue was full for 0.41 seconds. Since the queue depth is quite easy to find out3, and in this case was 256, you can deduce that the queue depth for that 0.41 seconds was 256. Thus the average for the remaining 0.59 seconds was (157.2 - (0.41 * 256))/0.59, which is 88.5. The graph of the dtrace results tells a different story:
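The same deduction can be scripted; an illustrative Python sketch (the queue depth of 256 comes from the mdb command in footnote 3):

```python
def avg_when_not_full(actv, w_pct, queue_depth):
    """Average active queue depth over the part of the interval when the
    device queue was NOT full, given iostat's actv and %w columns."""
    full_time = w_pct / 100.0          # fraction of the second queue-full
    return (actv - full_time * queue_depth) / (1.0 - full_time)

# actv 157.2, %w 41, queue depth 256 -> average for the remaining 0.59 s
print(round(avg_when_not_full(157.2, 41, 256), 1))   # 88.5
```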


These examples demonstrate what can happen if your application dumps a large number of transactions onto a storage device. While the throughput will be fine, and the iostat data can appear OK, if the granularity of the samples is not close to your latency requirement any problem can be hidden by the statistical nature of iostat.

1Apologies to those who saw a draft of this on my blog briefly.

2The application creating the IO attempts to keep 200 transactions in the disk at all times. It is interesting to see that it fails, as it does not get notification of the completion of the IO until all, or nearly all, the outstanding transactions have completed.

3This command will do it for all the devices on your system:

echo '*sd_state::walk softstate | ::print -d -at "struct sd_lun" un_throttle' | pfexec mdb -k

However, be warned: the throttle is dynamic, so dtrace gives the real answer.

Thursday Sep 03, 2009

Improved sd.conf format

Editing sd.conf has always been somewhat difficult: it is not a documented interface, the format was never intended to be exposed, and it was even architecture specific. Fortunately Micheal documented it, which meant that it was known even if the syntax remained obscure.

However, after ARC case 2008/465 was approved and the changes pushed as part of bug 6518995, you can now use a more human-readable syntax1:

sd-config-list=
        "ATA     VBOX HARDDISK", "disksort:false";

As it turns out the “disksort”2 option, along with throttle-max and throttle-min, are the ones I most often want to tune.
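For example, several tunables can be set for one device in a single entry. A sketch of what that might look like (the device string matches the example above; I believe multiple name:value pairs are comma-separated within the second string, and the throttle values here are purely illustrative):

```
sd-config-list=
        "ATA     VBOX HARDDISK", "disksort:false, throttle-max:32, throttle-min:8";
```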

Here is the current list of tunables lifted straight from the ARC case.


Tunable_Name                 Commitment   Data_Type
cache-nonvolatile            Private      BOOLEAN
controller-type              Private      UINT32
delay-busy                   Committed    UINT32
disksort                     Private      BOOLEAN
timeout-releasereservation   Private      UINT32
reset-lun                    Private      BOOLEAN
retries-busy                 Private      UINT32
retries-timeout              Committed    UINT32
retries-notready             Private      UINT32
retries-reset                Private      UINT32
throttle-max                 Private      UINT32
throttle-min                 Private      UINT32


1This reminds me of the change to /etc/printcap that allowed you to specify the terminal flags as strings rather than as a bitmap. All the mystery seemed to be removed!

2While I used disksort as an example for this case I can't think of any reason why you would have it enabled for a virtual disk in VirtualBox.

Friday Jul 31, 2009

Adding a Dtrace provider to the kernel

Since writing scsi.d I have been pondering whether there should really be a scsi dtrace provider that allows you to do all that scsi.d does and more. The push of 6797025 both removed the main reason for not doing this and gave impetus to do it, as scsi.d needed incompatible changes to use the new return function as the return “probe”.

This is very much a work in progress and may or may not see the light of day due to some other issues around scsi addressing; however, I thought I would document how I added a kernel dtrace provider so that, if you want to, you don't have to do so much searching1.

Adding the probes themselves is simplicity itself using the DTRACE_PROBEN() macros. Following the convention I added this macro:

#define	DTRACE_SCSI_2(name, type1, arg1, type2, arg2)			\
	DTRACE_PROBE2(__scsi_##name, type1, arg1, type2, arg2);

to usr/src/uts/common/sys/sdt.h. Then after including <sys/sdt.h> in each file I put this macro in each of the places I wanted my probes:

 	DTRACE_SCSI_2(transport, struct scsi_pkt *, pkt,
 	    struct scsi_address *, P_TO_ADDR(pkt))

The bit that took a while to find was how to turn these into a provider. To do that edit the file “usr/src/uts/common/dtrace/sdt_subr.c” and create the attribute structure2:

 static dtrace_pattr_t scsi_attr = {
 { DTRACE_STABILITY_EVOLVING, DTRACE_STABILITY_EVOLVING, DTRACE_CLASS_ISA },
 { DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN },
 { DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN },
 { DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_ISA },
 { DTRACE_STABILITY_EVOLVING, DTRACE_STABILITY_EVOLVING, DTRACE_CLASS_ISA },
 };

and add it to the sdt_providers array:


	{ "scsi", "__scsi_", &scsi_attr, 0 },

then add the probes to the sdt_args array:

	{ "scsi", "transport", 0, 0, "struct scsi_pkt *", "scsi_pktinfo_t *"},
	{ "scsi", "transport", 1, 1, "struct scsi_address *", "scsi_addrinfo_t *"},
	{ "scsi", "complete", 0, 0, "struct scsi_pkt *", "scsi_pktinfo_t *"},
	{ "scsi", "complete", 1, 1, "struct scsi_address *", "scsi_addrinfo_t *"},

Finally you need to create a file containing the definitions of the output structures, scsi_pktinfo_t and scsi_addrinfo_t and define translators for them. That goes into /usr/lib/dtrace and I called mine scsa.d (there is already one called scsi.d).

/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */
/*
 * Copyright 2009 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

#pragma D depends_on module scsi
#pragma D depends_on provider scsi

inline char TEST_UNIT_READY = 0x0;
#pragma D binding "1.0" TEST_UNIT_READY
inline char REZERO_UNIT_or_REWIND = 0x0001;
#pragma D binding "1.0" REZERO_UNIT_or_REWIND

inline char SCSI_HBA_ADDR_COMPLEX = 0x0040;
#pragma D binding "1.0" SCSI_HBA_ADDR_COMPLEX

typedef struct scsi_pktinfo {
	caddr_t pkt_ha_private;
	uint_t	pkt_flags;
	int	pkt_time;
	uchar_t *pkt_scbp;
	uchar_t *pkt_cdbp;
	ssize_t pkt_resid;
	uint_t	pkt_state;
	uint_t 	pkt_statistics;
	uchar_t pkt_reason;
	uint_t	pkt_cdblen;
	uint_t	pkt_tgtlen;
	uint_t	pkt_scblen;
} scsi_pktinfo_t;

#pragma D binding "1.0" translator
translator scsi_pktinfo_t  < struct scsi_pkt *P > {
	pkt_ha_private = P->pkt_ha_private;
	pkt_flags = P->pkt_flags;
	pkt_time = P->pkt_time;
	pkt_scbp = P->pkt_scbp;
	pkt_cdbp = P->pkt_cdbp;
	pkt_resid = P->pkt_resid;
	pkt_state = P->pkt_state;
	pkt_statistics = P->pkt_statistics;
	pkt_reason = P->pkt_reason;
	pkt_cdblen = P->pkt_cdblen;
	pkt_tgtlen = P->pkt_tgtlen;
	pkt_scblen = P->pkt_scblen;
};

typedef struct scsi_addrinfo {
	struct scsi_hba_tran	*a_hba_tran;
	ushort_t a_target;	/* ua target */
	uchar_t	 a_lun;		/* ua lun on target */
	struct scsi_device *a_sd;
} scsi_addrinfo_t;

#pragma D binding "1.0" translator
translator scsi_addrinfo_t  < struct scsi_address *A > {
	a_hba_tran = A->a_hba_tran;
	a_target = !(A->a_hba_tran->tran_hba_flags & SCSI_HBA_ADDR_COMPLEX) ?
		A->a.spi.a_target : 0;
	a_lun = !(A->a_hba_tran->tran_hba_flags & SCSI_HBA_ADDR_COMPLEX) ?
		A->a.spi.a_lun : 0;
	a_sd = (A->a_hba_tran->tran_hba_flags & SCSI_HBA_ADDR_COMPLEX) ?
		A->a.a_sd : 0;
};

Again, this is just enough to get going so I can see and use the probes:

jack@v4u-2500b-gmp03:~$ pfexec dtrace -P scsi -l
   ID   PROVIDER            MODULE                          FUNCTION NAME
 1303       scsi              scsi                    scsi_transport transport
 1313       scsi              scsi                 scsi_hba_pkt_comp complete
jack@v4u-2500b-gmp03:~$ 

While this all works well for parallel SCSI, getting the address of devices on fibre channel is not clear to me. If you have any suggestions I'm all ears.

1If there is such a document already in existence then please add a comment. I will just wish I could have found it.

2These may not be the right attributes, but they get me to the point where it compiles and can be used in a PoC.

Friday Mar 20, 2009

Limiting scsi.d output to IOs that take a long time.

I have added support to scsi.d for only reporting on scsi packets that take more than a certain amount of time to complete. This is particularly useful when combined with the PERF_REPORT option, to see just the IO requests that took over a time threshold while still collecting statistics on all the other IO requests.

To use this you have to specify “-D REPORT_OVERTIME=X” where X is the time you wish to report on in nano seconds. Here is an example that will only print the details of IO requests that took more than 250ms:

# scsi.d -D REPORT_OVERTIME=$((250*1000*1000)) -D PERF_REPORT 
Only reporting IOs longer than 250000000ns
Hit Control C to interrupt
00005.388800363 glm0:-> 0x2a WRITE(10) address 00:00, lba 0x00270e67, len 0x000008, control 0x00 timeout 60 CDBP 300002df438 1 sched(0) cdb(10) 2a0000270e6700000800
00005.649494475 glm0:<- 0x2a WRITE(10) address 00:00, lba 0x00270e67, len 0x000008, control 0x00 timeout 60 CDBP 300002df438, reason 0x0 (COMPLETED) pkt_state 0x1f state 0x0 Success Time 260833us
00005.384612799 glm0:-> 0x0a  WRITE(6) address 00:00, lba 0x048564, len 0x000001, control 0x00 timeout 60 CDBP 30002541ac0 1 sched(0) cdb(6) 0a0485640100
00005.716416658 glm0:<- 0x0a  WRITE(6) address 00:00, lba 0x048564, len 0x000001, control 0x00 timeout 60 CDBP 30002541ac0, reason 0x0 (COMPLETED) pkt_state 0x1f state 0x0 Success Time 331939us
00005.385907691 glm0:-> 0x0a  WRITE(6) address 00:00, lba 0x0605b4, len 0x000001, control 0x00 timeout 60 CDBP 300035637a0 1 sched(0) cdb(6) 0a0605b40100
00005.773925990 glm0:<- 0x0a  WRITE(6) address 00:00, lba 0x0605b4, len 0x000001, control 0x00 timeout 60 CDBP 300035637a0, reason 0x0 (COMPLETED) pkt_state 0x1f state 0x0 Success Time 388153us
00005.389078533 glm0:-> 0x2a WRITE(10) address 00:00, lba 0x004b19d3, len 0x000003, control 0x00 timeout 60 CDBP 300002df0f8 1 sched(0) cdb(10) 2a00004b19d300000300
00005.824242527 glm0:<- 0x2a WRITE(10) address 00:00, lba 0x004b19d3, len 0x000003, control 0x00 timeout 60 CDBP 300002df0f8, reason 0x0 (COMPLETED) pkt_state 0x1f state 0x0 Success Time 435303us
^C

  glm                                                       0
           value  ------------- Distribution ------------- count    
         4194304 |                                         0        
         8388608 |                                         3        
        16777216 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        678      
        33554432 |@@@@@                                    98       
        67108864 |@                                        27       
       134217728 |                                         8        
       268435456 |                                         3        
       536870912 |                                         0        

# 

The implementation of this uses dtrace speculations and as such may require some tuning of the various settings. Specifically, I have set the number of speculation buffers to 1000, which should be enough for all but the busiest systems, but if you do reach that limit you can increase them using the following options:


-D NSPEC=N

Set the number of speculations to this value.

-D SPECSIZE=N

Set the size of the speculation buffer. This should be 200 times the value of NSPEC.

-D CLEANRATE=N

Specify the clean rate.

As usual the script is available here. It is version 1.18.

Tuesday Mar 10, 2009

Update to scsi.d

I've just released a new version of scsi.d that fixes a strange bug that only affected Solaris 10 but would result in scsi.d not starting:

# dtrace -qCs scsi.d-1.16  
dtrace: failed to compile script scsi.d-1.16: "/usr/include/sys/scsi/scsi_pkt.h", line 33: incomplete struct/union/enum struct scsi_address: pkt_address
# 

The solution was simple even if I still don't really understand why the header included fine in OpenSolaris but not in 10.

Tuesday Jan 27, 2009

New version of scsi.d required for build 106

This version supports some more filters. Specifically you can now specify these new options:

  • MIN_BLOCK only report on IO to blocks greater than or equal to this value.

  • MAX_BLOCK only report on IO to blocks less than or equal to this value.

This is most useful for limiting your trace to particular block ranges, be they file system boundaries or, as in the case that caused me to add this, the disk label, to see who is trampling on it.

In this contrived example it was format:

pfexec /usr/sbin/dtrace -o /tmp/dt.$$ -Cs  scsi.d -D MAX_BLOCK=3 
<SNIP>
00058.529467684 glm0:-> 0x0a  WRITE(6) address 01:00, lba 0x000000, len 0x000001, control 0x00 timeout 60 CDBP 60012635560 1 format(3985) cdb(6) 0a0000000100
00058.542945891 glm0:<- 0x0a  WRITE(6) address 01:00, lba 0x000000, len 0x000001, control 0x00 timeout 60 CDBP 60012635560, reason 0x0 (COMPLETED) pkt_state 0x1f state 0x0 Success Time 13604us

While this answered my question there are neater ways of answering the question just by using the IO provider:

: s4u-nv-gmp03.eu TS 68 $; pfexec /usr/sbin/dtrace -n 'io:::start / args[0]->b_blkno < 3 && args[0]->b_flags & B_WRITE / { printf("%s %s %d %d", execname, args[1]->dev_statname, args[0]->b_blkno, args[0]->b_bcount) }'
dtrace: description 'io:::start ' matched 6 probes
CPU     ID                    FUNCTION:NAME
  0    629             default_physio:start format sd0 0 512
  0    629             default_physio:start format sd0 0 512
  0    629             default_physio:start format sd0 0 512
  0    629             default_physio:start format sd0 0 512
  0    629             default_physio:start format sd0 0 512
  0    629             default_physio:start format sd0 0 512

Also build 106 of nevada has changed the structure definition for scsi_address and in doing so this breaks scsi.d which has intimate knowledge of scsi_address structures. I have a solution that you can download but in writing it I also filed this bug:

679803 dtrace suffers from macro recursion when including scsi_address.h

which scsi.d has to work around. When that bug is resolved the work around may have to be revisited.

All versions of scsi.d are available here, and this specific version, version 1.16, here.

Thank you to Artem Kachitchkine for bringing the changes to scsi_address.h and their effects on scsi.d to my attention.

Tuesday Dec 23, 2008

Reading random disk blocks with format

Occasionally it is useful to be able to read blocks from a disk when there is no label on it. Most applications won't be able to open the device, as you need to pass the O_NDELAY flag to the open system call.

Luckily it is possible to use format to read arbitrary disk blocks, so you don't have to resort to writing a special application. The trick is to use the read analysis option of format and restrict the blocks to be read down to the blocks that you are interested in. Then, once read, use the print buffer command to output the data:

# format

Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <SUN9.0G cyl 4924 alt 2 hd 27 sec 133>
          /pci@1f,4000/scsi@3/sd@0,0
Specify disk (enter its number): 0
selecting c0t0d0
[disk formatted]
Warning: Current Disk has mounted partitions.


FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> ana


ANALYZE MENU:
        read     - read only test   (doesn't harm SunOS)
        refresh  - read then write  (doesn't harm data)
        test     - pattern testing  (doesn't harm data)
        write    - write then read      (corrupts data)
        compare  - write, read, compare (corrupts data)
        purge    - write, read, write   (corrupts data)
        verify   - write entire disk, then verify (corrupts data)
        print    - display data buffer
        setup    - set analysis parameters
        config   - show analysis parameters
        !<cmd>   - execute <cmd> , then return
        quit
analyze> set
Analyze entire disk[yes]? no
Enter starting block number[0, 0/0/0]: 0
Enter ending block number[0, 0/0/0]: 0
Loop continuously[no]? 
Enter number of passes[2]: 1
Repair defective blocks[yes]? 
Stop after first error[no]? yes
Use random bit patterns[no]? 
Enter number of blocks per transfer[1, 0/0/1]: 
Verify media after formatting[yes]? 
Enable extended messages[no]? 
Restore defect list[yes]? 
Restore disk label[yes]? 

analyze> read
Ready to analyze (won't harm SunOS). This takes a long time, 
but is interruptable with CTRL-C. Continue? y

        pass 0
   0/0/0  

Total of 0 defective blocks repaired.
analyze> print
0x53554e39  0x2e304720  0x63796c20  0x34393234  0x20616c74  0x20322068  
0x64203237  0x20736563  0x20313333  0x00000000  0x00000000  0x00000000  
0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  
0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  
0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  
0x00000000  0x00000000  0x00000001  0x00000000  0x00000000  0x00080002  
0x00000003  0x00010005  0x00000000  0x00010000  0x00010000  0x00010000  
0x00010000  0x00010000  0x00000000  0x00000000  0x00000000  0x600ddeee  
0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  
0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  
0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  
0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  
0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  
0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  
0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  
0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  
0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  
0x00000000  0x00000000  0x00000000  0x1518133e  0x00000000  0x00000001  
0x133c0002  0x001b0085  0x00000000  0x00000000  0x00fdc0a1  0x00001217  
0x00100e03  0x00000000  0x010dcea4  0x00000000  0x00000000  0x00000000  
0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  0x00000000  
0x00000000  0xdabe5344  
analyze> 


Now I know those at the back will recognise this as a Solaris SPARC disk label with a VTOC: the give-away is the VTOC_SANITY value of 0x600ddeee in the middle, after the ASCII label.
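As a check, the ASCII label at the start of the buffer can be decoded directly from the words that format printed; a quick Python sketch:

```python
import struct

# The first nine 32-bit words of the block, as printed by analyze/print.
words = [0x53554e39, 0x2e304720, 0x63796c20, 0x34393234,
         0x20616c74, 0x20322068, 0x64203237, 0x20736563, 0x20313333]

# Pack each word big-endian (SPARC byte order) and decode as ASCII.
ascii_label = b"".join(struct.pack(">I", w) for w in words).decode("ascii")
print(ascii_label)   # SUN9.0G cyl 4924 alt 2 hd 27 sec 133
```

The decoded string matches the disk geometry that format reported when the disk was selected.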

Wednesday Nov 05, 2008

Throttling disks

The disk drivers in Solaris support SCSI tagged queuing and have done for a long time. This enables them to send more than one command to a LUN (logical unit) at a time. The number of commands that can be sent in parallel is limited, throttled, by the disk drivers so that they never send more commands than the LUN can cope with. While it is possible for the LUN to respond with a “queue full” SCSI status to tell the driver that it cannot cope with any more commands, there are significant problems with relying on this approach:

  • Devices connected via fibre channel have to negotiate onto the loop to return the queue full status. This can mean that by the time the device manages to return queue full the host can have sent many more commands. This risks that the LUN can end up with a situation it can not cope with and typically results in the LUN resetting.

  • If the LUN is being accessed from more than one host it is possible for it to return Queue full on the very first command. This makes it hard for the host to know when it will be safe to send a command since there are none outstanding from that host.

  • If the LUN is part of many LUNs on a single target it may share the total pool of commands that can be accepted by all the LUNS and so again could respond with “queue full” on the first command to a LUN.

  • In the two cases above the total number of commands a single host can send a single LUN will vary depending on conditions that the host simply can not know making adaptive algorithms unreliable.

All the above issues result in people taking the safest option and setting the throttle for a device as low as required so that the LUN never needs to send queue full. In some cases as low as 1. This is bad when limited to an individual LUN, it is terrible when done globally on the entire system.

As soon as you get to the point where you hit the throttle two things happen:

  1. You are no longer transferring data over the interconnect (fibre channel, parallel scsi or iscsi) for writes. The data has to wait until another command completes before it can be transferred, which reduces the throughput of the device. Your writes can end up being throttled by reads, and hence tend towards the speed of the spinning disk if the read has to go to the disk, even though you may have a write cache.

  2. The command is queued on the waitq which will increase the latency still further if the queue becomes deep. See here for information about disksort's effect on latency.

Given that the system will regularly dump large numbers of commands on devices for short periods of time, you want those commands to be handled as quickly as possible to minimize applications hanging while their IO completes. If you want to observe the maximum number of commands sent to a device then there is a D script here to do that.
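The latency cost of a low throttle can be illustrated with a toy model (my own simplification, assuming a burst of commands that each take the same service time, with the device completing `throttle` commands per "wave"):

```python
def completion_time_ms(i, throttle, service_ms):
    """Toy model: command i (0-based) in a burst completes in wave
    i // throttle + 1, each wave taking one service time."""
    return (i // throttle + 1) * service_ms

# Last command of a 32-command burst, 5 ms service time each:
print(completion_time_ms(31, 32, 5))   # throttle 32: whole burst in 5 ms
print(completion_time_ms(31, 1, 5))    # throttle 1: the last waits 160 ms
```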

So the advice for configuring storage would be:

Wednesday Oct 29, 2008

Converting sd minor numbers to instance numbers.

I had an email this week about a program that I wrote that would not die. The program is a disk test program that has been around in Sun for a while and, with luck, will be open sourced in the not too distant future, but I digress. The program was hanging: no IO was going on, and even sending it a kill -KILL would not kill it.

Generally if processes don't disappear when sent the signal “KILL” that is not the fault of the program. Since there is nothing the program can do to protect itself from KILL you need to look elsewhere. That elsewhere being in the kernel somewhere. So a crash dump was generated and I was pointed at it.

From the stack it was clear that the program could not die as there were outstanding async IO requests pending and looking at the aio_t confirmed this.

Walking the structures down to the first element on the aio_pollq to find a stuck IO, and then to the buf's dev_t to see where we are hung up, I do this:

> ::pgrep disko | ::print proc_t p_aio | ::print aio_t aio_pollq | ::print aio_req_t aio_req_buf.b_edev | ::devt
     MAJOR       MINOR
        27        9474
> 0t27::major2name
sd

Seeing that minor number rang alarm bells, as the usual way to convert a minor number for the sd driver into an instance is to divide by 8 (the number of partitions), but that would still leave over 1000 devices. Possible, but not likely. Only at that point did it dawn on me that this was an x86 box, which thanks to a long history supports a different number of slices. A short grok in the source and the conversion for x86 is to divide by 64.
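In other words the conversion is just integer division with an architecture-dependent divisor; a trivial sketch:

```python
def sd_minor_to_instance(minor, is_x86):
    """sd minor numbers encode (instance, slice); x86 sd reserves 64
    minor nodes per instance, SPARC 8."""
    return minor // (64 if is_x86 else 8)

print(sd_minor_to_instance(9474, is_x86=True))    # 148 on this x86 box
print(sd_minor_to_instance(9474, is_x86=False))   # the implausible SPARC answer
```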

> 0t9474%0t64=D
                148             

> *sd_state::softstate 0t148 | ::print "struct sd_lun"
{
    un_sd = 0xffffff016b72daa8
    un_rqs_bp = 0xffffff07aef1db80
    un_rqs_pktp = 0xffffff0548871080
    un_sense_isbusy = 0
    un_buf_chain_type = 0x1
    un_uscsi_chain_type = 0x8

What shocked me was how far I could get through a crash dump before taking on board the architecture of the system.

Friday May 30, 2008

A more generic scsi.d

I had reason to want to be able to decode scsi_pkt structures in dtrace again this week and it struck me that that is really what scsi.d does. So I went about modifying it to be able to translate the structures in the more general case. The result is that it is now moderately simple to get it to decode a scsi_pkt structure in any routine that can access them.

If that is the boat you are into, and clearly I am, then download the latest scsi.d script (version 1.13 or later) and then look for the comments “FRAMEWORK” to show you where you need to add your code.

I've also added a few more SCSI commands to the commands that scsi.d will understand. These are mostly from the set of commands used by tape devices.

As an example I have left in the script the probe I was interested in, which was why we are seeing ssd reporting illegal requests. Which command does the target not like?

01054.352580740 scsi_vhci0:ILL 0x1d SEND_DIAGNOSTIC address 00:00, lba 0x100100, len 0x000010, control 0x00 timeout 300 CDBP 6002a2681c0 sched(0) cdb(6) 1d1001001000


Then it is a simple matter to explain why this is not a problem. Get the monitoring software that is generating these commands to not talk to the targets that don't understand them.

Friday Jan 11, 2008

Latency bubbles in your disk IO

The following was written in response to an email from a customer about monitoring IO, prompted by my scsi.d postings. Tim covers where disk IO requests can be queued in his posting titled “Where can I/O queue up in sd/ssd”, which I would recommend as a starting point.

The disk IO sub-systems are built to provide maximum throughput, which is most often the right thing. However, the weakness of tuning for throughput is that occasionally you can get some bizarre behaviour when it comes to latency. The way that optimum IO bandwidth is achieved is by sorting each IO by logical block address (LBA) and then issuing them in that order, to minimize head seek. This is documented in the disksort(9F) manual page.

So if you have a sequence of writes to blocks N, N+1, N+2, N-200, N+3, N+4, N+5, N+6, N+7 in that order, and your LUN has a queue depth, and therefore throttle, of 2.1 The IOs will actually be delivered to the LUN in this order: N, N+1, N+2, N+3, N+4, N+5, N+6, N+7, N-200. Hence there will be a significant latency applied to the IO going to LBA N-200, and in practice it is possible to have IO requests delayed on the waitq for many seconds (I have a pathological test case that can hold them there for the time it takes to perform an IO on nearly every block on the LUN, literally hours). You had better hope that that IO was not your important one!
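The reordering can be sketched as a one-way elevator. This is my own simplification of disksort(9F) (ascending from the current head position, then wrapping to the lowest queued LBA), not the driver's actual code:

```python
def disksort_order(lbas, throttle):
    """Order in which a burst of requests reaches the LUN: the first
    `throttle` go straight out in arrival order; the rest land on the
    waitq and are elevator-sorted by LBA."""
    in_flight = lbas[:throttle]
    waitq = lbas[throttle:]
    head = in_flight[-1]                       # current head position
    ahead = sorted(b for b in waitq if b >= head)
    behind = sorted(b for b in waitq if b < head)
    return in_flight + ahead + behind

N = 1000
print(disksort_order([N, N+1, N+2, N-200, N+3, N+4, N+5, N+6, N+7], 2))
# The write to N-200 is serviced last, as described above.
```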

This issue only comes into play once the disk driver has reached the throttle for the device, as up until that point each IO can be passed straight to the LUN for processing.2 Once the driver has reached the throttle for the LUN it begins queuing IO requests internally and by default will sort them to get maximum throughput. Clearly, the lower the throttle, the sooner you get into this potential scenario.

Now for the good news. For most disk arrays sorting by LBA does not make much sense, since the LUN will be made up of a number of drives and there will be a read cache and a write cache. So for these devices it makes sense to disable disksort and deliver the IO requests to the LUN in the order in which they are delivered to the disk driver. If you look in the source for sd.c you will see that we do this by default for most common arrays. To achieve this there is a flag, “disable disksort”, that can be set in sd.conf or ssd.conf depending on which driver is in use. See Micheal's blog entry about editing sd.conf. While you are reading that entry, note you can use it to set the throttle for individual LUNs, so you do not have to set [s]sd_max_throttle, which would penalize all devices rather than just the one you were aiming for, perhaps one that only has a small queue depth. You will see below why a small queue depth can be a really bad thing.

So how could you spot these latency bubbles?

It will come as no surprise that the answer is dtrace. Using my pathological test case, but set to run for only 10 minutes against a single spindle, the following D produces a clear indication that all is not well:

fbt:ssd:ssdstrategy:entry,
fbt:sd:sdstrategy:entry
{
        start[(struct buf *)arg0] = timestamp;
}

fbt:ssd:ssdintr:entry,
fbt:sd:sdintr:entry
/ start[(this->buf = (struct buf *)((struct scsi_pkt *)arg0)->pkt_private)] != 0 /
{
        this->un = ((struct sd_xbuf *)this->buf->b_private)->xb_un;
        @[this->un] = lquantize((timestamp - start[this->buf])/1000000,
                60000, 600000, 60000);
        @q[this->un] = quantize((timestamp - start[this->buf])/1000000);

        start[this->buf] = 0;
}

This produces the following output3; the times are in milliseconds:


dtrace: script 'ssdrwtime.d' matched 4 probes
^C

 

    6597960853440
           value  ------------- Distribution ------------- count    
         < 60000 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 377204   
           60000 |                                         0        
          120000 |                                         0        
          180000 |                                         0        
          240000 |                                         0        
          300000 |                                         0        
          360000 |                                         0        
          420000 |                                         0        
          480000 |                                         2        
          540000 |                                         300      
       >= 600000 |                                         0        

 
    6597960853440
           value  ------------- Distribution ------------- count    
              -1 |                                         0        
               0 |                                         40       
               1 |                                         9        
               2 |                                         6        
               4 |                                         17       
               8 |                                         23       
              16 |                                         6        
              32 |                                         36       
              64 |@@                                       15407    
             128 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   361660   
             256 |                                         0        
             512 |                                         0        
            1024 |                                         0        
            2048 |                                         0        
            4096 |                                         0        
            8192 |                                         0        
           16384 |                                         0        
           32768 |                                         0        
           65536 |                                         0        
          131072 |                                         0        
          262144 |                                         0        
          524288 |                                         302      
         1048576 |                                         0        

Now recall that my test case is particularly unpleasant, but it demonstrates the point: 300 IO requests took over 9 minutes, and they only got to complete because the test case was shutting down, while the vast majority of IO requests completed in less than 256ms.
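To read those histograms it helps to know how dtrace buckets values; the following toy reimplementations (hypothetical helper names, semantics as I understand them from the DTrace documentation) show why the 9-minute IOs land where they do. quantize() uses power-of-two buckets; lquantize() uses linear buckets between a lower and an upper bound.

```python
# Toy versions of DTrace's quantize()/lquantize() bucketing, for
# positive values only (real quantize also handles negative buckets).

def quantize_bucket(value):
    """Largest power of two <= value; 0 gets its own bucket."""
    if value <= 0:
        return 0
    b = 1
    while b * 2 <= value:
        b *= 2
    return b

def lquantize_bucket(value, low, high, step):
    """Linear bucket for value, with underflow/overflow buckets."""
    if value < low:
        return f"< {low}"
    if value >= high:
        return f">= {high}"
    return low + ((value - low) // step) * step

ms = 540123  # a ~9 minute IO, in milliseconds
print(quantize_bucket(ms))                          # -> 524288
print(lquantize_bucket(ms, 60000, 600000, 60000))   # -> 540000
```

So the 302 IOs in the 524288 bucket of the quantize output are the same slow IOs as the 300-odd in the 540000 bucket of the lquantize output, just bucketed differently.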


Now let's run the same pathological test with disksort disabled:

dtrace: script 'ssdrwtime.d' matched 4 probes
^C

    6597960853440
           value  ------------- Distribution ------------- count    
         < 60000 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 543956   
           60000 |                                         0        


    6597960853440
           value  ------------- Distribution ------------- count    
              -1 |                                         0        
               0 |                                         30       
               1 |                                         21       
               2 |                                         30       
               4 |                                         0        
               8 |                                         0        
              16 |                                         50       
              32 |                                         3        
              64 |                                         384      
             128 |                                         505      
             256 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  531169   
             512 |@                                        11764    
            1024 |                                         0        

Notice that the majority of IO requests now took longer, falling in the 256ms bucket rather than the 128ms bucket, but none of them took many minutes.


Now my test case is pathological, but if you have drives with small queue depths and disksort is still enabled you are open to some quite spectacular latency bubbles. To mitigate this my advice is:

  1. Don't ever set the global [s]sd_max_throttle in /etc/system. Use the [s]sd.conf file to set the appropriate throttle for each device.

  2. Consider what is more important to you: throughput or latency. If it is latency, or if your LUN is on a storage array, then turn off disksort using the [s]sd.conf file.

  3. If you have pathological applications then understand that the IO subsystem can give you throughput or bounded latency, not both. So separate the IO devices that need throughput from those for which latency is more important.

  4. Be aware that even “dumb” disk drives often implement disksort internally, so in some cases they can exhibit similar issues when they have a queue depth of greater than 2.4 In those cases you may find it better to throttle them down to a queue depth of 2 and disable disksort in [s]sd to get the most predictable latency, albeit at the expense of throughput. If this is your issue then you can spot it either by using scsi.d directly or by modifying it to produce aggregations like those above. I'll leave that as an exercise for the reader.


1The queue depth of a LUN is the number of commands that it can handle at the same time. The throttle is usually set to the same number and is used by the disk driver to prevent it from sending more commands than the device can cope with.

2Now the LUN itself may then reorder the IO if it has more than two IOs in its internal queue.

3Edited to remove output for other drives.

4With a queue depth of 2 the drive cannot sort the IO requests: it has to have one active and the next one waiting, and when the active one completes the waiting one will be actioned before a new command can come from the initiator.

Friday Mar 16, 2007

scsi.d updates are like busses.

None for months then three come along one after another. This latest update is a direct result of me failing to test the script on Solaris 10. Some of the things that work on Nevada are not quite there on Solaris 10.

Specifically in nevada you can do this:

this->name = "->"

But not in Solaris 10. So the name now has to be a flag that is then checked later.

As always the new script is in the scsi.d directory; this one is version 1.12.

On the up-side the bugs were reported within hours as there are actually people using this script to do useful work which makes me realise this was worth putting out there.


Wednesday Mar 14, 2007

Bug fix for scsi.d

Today, well last night, I learned that dtrace clause local variables (this->), unlike thread local variables (self->), are not initialised by default. This is all clearly stated in the documentation, had I read it with more care. My thanks to Adam on the OpenSolaris zfs-discuss alias for pointing it out to me. The long and the short of this is that when I changed scsi.d to use clause local variables I needed to also ensure that they were initialised before testing them.

While I was editing it I also added support for Group 6 and 7 SCSI commands. Since these commands are vendor specific it just prints out the first 32 bytes of the CDB, or the whole CDB if it is less than 32 bytes. In the event that you need more than 32 bytes to be printed adding this is trivial. The section of code, with line numbers, that prints out bytes 25 to 32 is listed here:

   554  PRINT_CDB(24)
   555  PRINT_CDB(25)
   556  PRINT_CDB(26)
   557  PRINT_CDB(27)
   558  PRINT_CDB(28)
   559  PRINT_CDB(29)
   560  PRINT_CDB(30)
   561  PRINT_CDB(31)

There are no prizes for working out how to make it print byte 33.

So there is version 1.11 of scsi.d.


Tuesday Mar 13, 2007

scsi.d improvements.

With thanks to Brendan Gregg for the encouragement I have improved the argument handling in scsi.d. I say “improved” carefully, as it is still not right, mainly because I am too stubborn to wrap the thing in a shell script when dtrace can so nearly do what I want. The new argument syntax is as follows:

scsi.d [ @T timeinseconds ] [ hba [ instance ]]

The “@T” is because dtrace won't let you use “-T” as it does not stop argument processing after reading the arguments on the first line of the script. If/When it does I will change it to be “-T”.

So now you can trace a particular HBA without setting a timeout, something you previously could not do, and all the strange quotes are no longer required.

Additionally all the thread local variables have become clause local variables (“self” becomes “this”), which should reduce the impact of the script on the system.

So I bring you version 1.10 of scsi.d.


Monday Feb 19, 2007

16 byte CDB with scsi.d

I've been working to put USCSI support for LUNs greater than 2TB into diskomizer (a test program for testing disks). So while I had a system with a LUN greater than 2TB I thought I would test out scsi.d, as it has the code for all the SCSI commands in it. Alas there was a trivial bug that meant the lookup of the names of commands did not work. So I have spun a new version (1.9). Download it from here.

Now, for the sad out there, here is the transition from a 10 byte CDB to a 16 byte CDB:

00000.015415100 fp5:-> 0x2a WRITE(10) address 40:01, lba 0xfffffffe, len 0x000001, control 0x00 timeout 60 CDBP 60002278ebc dd(11510) cdb(10) 2a00fffffffe00000100
00000.016729900 fp5:<- 0x2a WRITE(10) address 40:01, lba 0xfffffffe, len 0x000001, control 0x00 timeout 60 CDBP 60002278ebc, reason 0x0 (COMPLETED) state 0x1f Time 1360us
00000.016828400 fp5:-> 0x2a WRITE(10) address 40:01, lba 0xffffffff, len 0x000001, control 0x00 timeout 60 CDBP 6000388d3b4 dd(11510) cdb(10) 2a00ffffffff00000100
00000.018059000 fp5:<- 0x2a WRITE(10) address 40:01, lba 0xffffffff, len 0x000001, control 0x00 timeout 60 CDBP 6000388d3b4, reason 0x0 (COMPLETED) state 0x1f Time 1272us
00000.018205900 fp5:-> 0x8a WRITE(16) address 40:01, lba 0x0000000100000000, len 0x000001, control 0x00 timeout 60 CDBP 600060c8704 dd(11510) cdb(16) 8a000000000100000000000000010000
00000.019467100 fp5:<- 0x8a WRITE(16) address 40:01, lba 0x0000000100000000, len 0x000001, control 0x00 timeout 60 CDBP 600060c8704, reason 0x0 (COMPLETED) state 0x1f Time 1320us
00000.019611800 fp5:-> 0x8a WRITE(16) address 40:01, lba 0x0000000100000001, len 0x000001, control 0x00 timeout 60 CDBP 6000227ab6c dd(11510) cdb(16) 8a000000000100000001000000010000
00000.020823300 fp5:<- 0x8a WRITE(16) address 40:01, lba 0x0000000100000001, len 0x000001, control 0x00 timeout 60 CDBP 6000227ab6c, reason 0x0 (COMPLETED) state 0x1f Time 1257us
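The switch happens because WRITE(10) carries the LBA in a 4-byte field, so 0xffffffff is the last block it can address; one block further needs the 8-byte LBA of WRITE(16). A minimal sketch of that selection rule (a simplification: the real driver's choice also depends on target properties and transfer length):

```python
# Pick a WRITE CDB group by LBA: WRITE(10) has a 32-bit LBA field,
# so anything above 0xffffffff needs the 64-bit LBA of WRITE(16).

def write_opcode(lba):
    if lba <= 0xffffffff:
        return (0x2a, "WRITE(10)")
    return (0x8a, "WRITE(16)")

print(write_opcode(0xffffffff)[1])   # -> WRITE(10)
print(write_opcode(0x100000000)[1])  # -> WRITE(16)
```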


I know I need to get out more.

Monday Jan 29, 2007

scsi.d update

A very small update to scsi.d. It allows it to work even when there is some CTF confusion due to having an incorrectly built kernel module loaded, typically via an IDR.

Version 1.8 is available here.

Friday Dec 29, 2006

scsi.d update

A small fix to scsi.d provided by a bloggless colleague results in version 1.7.

My thanks to him.


Thursday Jul 13, 2006

scsi.d update

I have added printing of the name of the executable and the process id that initiates an IO so that it is easier to see who is causing all those scsi commands to be sent.

Then for those who just have to have the raw bits to be happy, I have updated scsi.d to also dump out the raw cdb as well.

00000.627329400 fp5:-> 0x2a WRITE(10) address 00:00, lba 0x0001bfe9, len 0x000001, control 0x00 timeout 60 CDBP 600d7881d1c diskomizer64mpis(23849) cdb(10) 2a000001bfe900000100
00000.788444600 fp5:-> 0x2a WRITE(10) address 00:00, lba 0x00de6380, len 0x000010, control 0x00 timeout 60 CDBP 600a4282abc diskomizer64mpis(23847) cdb(10) 2a0000de638000001000

You can find the script here: http://blogs.sun.com/roller/resources/chrisg/scsi.d



Friday Feb 10, 2006

scsi.d script

After some feedback about the format of the output from my Dtrace script for looking at SCSI IO I have now added a timestamp, which helps with sorting the output. The output is now cleaner and hopefully clearer, though it does not fit on an 80 column screen.

00000.844267200 isp1:-> 0x2a (WRITE(10)) address 06:00, lba 0x0143a76e, len 0x000002, control 0x00 timeout 60 CDB 60031134488 SDB 60031134518
00000.844354400 isp1:-> 0x2a (WRITE(10)) address 00:00, lba 0x0143a76e, len 0x000002, control 0x00 timeout 60 CDB 3000cd59e78 SDB 3000cd59f08
00000.848251440 isp1:-> 0x2a (WRITE(10)) address 06:00, lba 0x0143ddd0, len 0x000002, control 0x00 timeout 60 CDB 6001dd1ba50 SDB 6001dd1bae0
00000.848310720 isp1:-> 0x2a (WRITE(10)) address 00:00, lba 0x0143ddd0, len 0x000002, control 0x00 timeout 60 CDB 3001da270f8 SDB 3001da27188
00000.850371280 isp1:<- 0x2a (WRITE(10)) address 00:00, lba 0x0143a76e, len 0x000002, control 0x00 timeout 60 CDB 3000cd59e78 SDB 3000cd59f08, reason 0x0 (COMPLETED) state 0x5f Time 6084us
00000.851151040 isp1:<- 0x2a (WRITE(10)) address 06:00, lba 0x0143a76e, len 0x000002, control 0x00 timeout 60 CDB 60031134488 SDB 60031134518, reason 0x0 (COMPLETED) state 0x5f Time 6927us
00000.853292800 isp1:<- 0x2a (WRITE(10)) address 00:00, lba 0x0143ddd0, len 0x000002, control 0x00 timeout 60 CDB 3001da270f8 SDB 3001da27188, reason 0x0 (COMPLETED) state 0x5f Time 5014us
00000.854442400 isp1:<- 0x2a (WRITE(10)) address 06:00, lba 0x0143ddd0, len 0x000002, control 0x00 timeout 60 CDB 6001dd1ba50 SDB 6001dd1bae0, reason 0x0 (COMPLETED) state 0x5f Time 6226us
00002.839392160 isp1:-> 0x2a (WRITE(10)) address 06:00, lba 0x0143e0b0, len 0x000004, control 0x00 timeout 60 CDB 3001da263c0 SDB 3001da26450
00002.839482480 isp1:-> 0x2a (WRITE(10)) address 00:00, lba 0x0143e0b0, len 0x000004, control 0x00 timeout 60 CDB 60002cb4538 SDB 60002cb45c8
00002.849052160 isp1:<- 0x2a (WRITE(10)) address 00:00, lba 0x0143e0b0, len 0x000004, control 0x00 timeout 60 CDB 60002cb4538 SDB 60002cb45c8, reason 0x0 (COMPLETED) state 0x5f Time 9630us
00002.850171840 isp1:<- 0x2a (WRITE(10)) address 06:00, lba 0x0143e0b0, len 0x000004, control 0x00 timeout 60 CDB 3001da263c0 SDB 3001da26450, reason 0x0 (COMPLETED) state 0x5f Time 10824us
00003.840019440 isp1:-> 0x2a (WRITE(10)) address 06:00, lba 0x0143e200, len 0x000004, control 0x00 timeout 60 CDB 3000cd59e78 SDB 3000cd59f08
00003.840110160 isp1:-> 0x2a (WRITE(10)) address 00:00, lba 0x0143e200, len 0x000004, control 0x00 timeout 60 CDB 30014c0c780 SDB 30014c0c810
00003.846265280 isp1:<- 0x2a (WRITE(10)) address 00:00, lba 0x0143e200, len 0x000004, control 0x00 timeout 60 CDB 30014c0c780 SDB 30014c0c810, reason 0x0 (COMPLETED) state 0x5f Time 6205us
00003.847439680 isp1:<- 0x2a (WRITE(10)) address 06:00, lba 0x0143e200, len 0x000004, control 0x00 timeout 60 CDB 3000cd59e78 SDB 3000cd59f08, reason 0x0 (COMPLETED) state 0x5f Time 7470us

Lots of “fun” games can be played with this; for example, the above shows that this system has target 0 and target 6 forming a mirror, making isp1 a single point of failure. Although my favourite is this one:


While running


# dd if=/dev/rdsk/c0t8d0s2 of=/dev/null oseek=1024 iseek=$(( 16#1fffff )) count=2
2+0 records in
2+0 records out
#

I get the following trace:


00001.971470332 qus1:-> 0x00 (TEST UNIT READY) address 08:00, lba 0x00000000, len 0x000000, control 0x00 timeout 60 CDB 300016a94f0 SDB 300016a9520
00001.972324082 qus1:<- 0x00 (TEST UNIT READY) address 08:00, lba 0x00000000, len 0x000000, control 0x00 timeout 60 CDB 300016a94f0 SDB 300016a9520, reason 0x0 (COMPLETED) state 0x17 Time 937us
00001.972433832 qus1:-> 0x00 (TEST UNIT READY) address 08:00, lba 0x00000000, len 0x000000, control 0x00 timeout 60 CDB 300016a9d90 SDB 300016a9dc0
00001.973217082 qus1:<- 0x00 (TEST UNIT READY) address 08:00, lba 0x00000000, len 0x000000, control 0x00 timeout 60 CDB 300016a9d90 SDB 300016a9dc0, reason 0x0 (COMPLETED) state 0x17 Time 826us
00001.973324748 qus1:-> 0x1a (MODE SENSE(6)) address 08:00, lba 0x00000300, len 0x000024, control 0x00 timeout 60 CDB 300016a9380 SDB 300016a93b0
00001.976352165 qus1:<- 0x1a (MODE SENSE(6)) address 08:00, lba 0x00000300, len 0x000024, control 0x00 timeout 60 CDB 300016a9380 SDB 300016a93b0, reason 0x0 (COMPLETED) state 0x5f Time 3070us
00001.976443415 qus1:-> 0x1a (MODE SENSE(6)) address 08:00, lba 0x00000400, len 0x000024, control 0x00 timeout 60 CDB 300016a9ab0 SDB 300016a9ae0
00001.979359665 qus1:<- 0x1a (MODE SENSE(6)) address 08:00, lba 0x00000400, len 0x000024, control 0x00 timeout 60 CDB 300016a9ab0 SDB 300016a9ae0, reason 0x0 (COMPLETED) state 0x5f Time 2959us
00001.979453248 qus1:-> 0x08 (  READ(6)) address 08:00, lba 0x00000000, len 0x000001, control 0x00 timeout 60 CDB 300016a9c20 SDB 300016a9c50
00001.979814748 qus1:<- 0x08 (  READ(6)) address 08:00, lba 0x00000000, len 0x000001, control 0x00 timeout 60 CDB 300016a9c20 SDB 300016a9c50, reason 0x0 (COMPLETED) state 0x5f Time 403us
00001.979898415 qus1:-> 0x08 (  READ(6)) address 08:00, lba 0x00000000, len 0x000001, control 0x00 timeout 60 CDB 300016a90a0 SDB 300016a90d0
00001.980151165 qus1:<- 0x08 (  READ(6)) address 08:00, lba 0x00000000, len 0x000001, control 0x00 timeout 60 CDB 300016a90a0 SDB 300016a90d0, reason 0x0 (COMPLETED) state 0x5f Time 294us
00001.980507332 qus1:-> 0x08 (  READ(6)) address 08:00, lba 0x001fffff, len 0x000001, control 0x00 timeout 60 CDB 300016a9660 SDB 300016a9690
00001.993267665 qus1:<- 0x08 (  READ(6)) address 08:00, lba 0x001fffff, len 0x000001, control 0x00 timeout 60 CDB 300016a9660 SDB 300016a9690, reason 0x0 (COMPLETED) state 0x5f Time 12804us
00001.993382498 qus1:-> 0x28 ( READ(10)) address 08:00, lba 0x00200000, len 0x000001, control 0x00 timeout 60 CDB 300016a9940 SDB 300016a9970
00001.999256915 qus1:<- 0x28 ( READ(10)) address 08:00, lba 0x00200000, len 0x000001, control 0x00 timeout 60 CDB 300016a9940 SDB 300016a9970, reason 0x0 (COMPLETED) state 0x5f Time 5921us


I like it as you can see the transition from READ(6) to READ(10) as it moves from LBA 0x1fffff to 0x200000. Did I mention needing to get out more?
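The boundary is the READ(6) CDB format itself: it only has a 21-bit LBA field, so 0x1fffff is the last block it can address. A sketch of the selection rule (again a simplification of what the driver actually does):

```python
# Pick a READ CDB group: READ(6) has a 21-bit LBA and an 8-bit length
# (0 encodes 256 blocks); READ(10) a 32-bit LBA and 16-bit length;
# beyond that you need READ(16).

def read_opcode(lba, nblks):
    if lba <= 0x1fffff and nblks <= 256:
        return (0x08, "READ(6)")
    if lba <= 0xffffffff and nblks <= 0xffff:
        return (0x28, "READ(10)")
    return (0x88, "READ(16)")

print(read_opcode(0x1fffff, 1)[1])  # -> READ(6)
print(read_opcode(0x200000, 1)[1])  # -> READ(10)
```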


You can get the script here. Still to do is correct decoding of CDBs bigger than 10 bytes, which is not a problem for my current systems, and more detailed decoding of CDBs that are not reads and writes.



About

This is the old blog of Chris Gerhard. It has mostly moved to http://chrisgerhard.wordpress.com
