scsi_latency.d, part of OLED, is a tool that shows latency of Disk IO taken by SCSI Mid Layer driver stack towards completion of Success or Failed IO.
The normal time to complete a SCSI packet is about 15ms for legacy spindle disks and in microseconds for Solid State Drives over SATA or NVMe m2 drives. There are times when response time from the target can take seconds to complete causing application failure, like for instance an Oracle Real Application Cluster node can be evicted if the heartbeat IO doesn’t come back within the configured time in seconds, this happens despite having multiple paths to the storage. To diagnose why a SCSI command level granularity is taking a long time to complete on a production system is quite challenging. DTrace can be used to find the latency, we can find granular details like which layer in the Storage driver stack is contributing to longer service time, which commands are taking a long time along with the status of those commands.
Installation
For installation instructions and more information about OLED, please see Oracle Linux Enhanced Diagnostics.
DTrace on Linux can be effectively used to measure latency. Refer to Getting Started with DTrace to install the required rpm for Oracle Linux. It is easier to measure latency with DTrace, by using the probe point when the a SCSI packet is dispatched and the probe point when the SCSI packet has completed. For SCSI you can use the probe point scsi_dispatch_cmd_start() to start the timer, and the probe point scsi_done:entry or scsi_mq_done:entry (if multiqueue is enabled) can be used to stop the timer.
DTrace provides aggregate functions to summarize the large collection of data in power-of-two frequency distribution. We can measure the latency using the ‘quantize’ aggregate function in milliseconds. Below is some sample output, the left column of the histogram ‘value’ output is the latency time in milliseconds (ms) and the right ‘count’ are the number of IO commands completed.
Usage examples:
Script: scsi_latency.d
The script scsi_latency.d takes 1 argument of sampling interval specified in seconds.
# scsi_latency.d 2
Sample Time : 2024 May 3 16:11:01 SCSI Mid layer latency
==============================================================
0:2:1:0 TUR
value ------------- Distribution ------------- count
-1 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
1 | 0
(ms)
0:2:1:0 Inquiry
value ------------- Distribution ------------- count
-1 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4
1 | 0
(ms)
0:2:0:0 Write(10)
value ------------- Distribution ------------- count
-1 | 0
0 |@@@@@@@@@@ 5
1 |@@@@@@@@@@ 5
2 |@@@@@@@@@@@@@@@@ 8
4 |@@@@ 2
8 | 0
(ms)
0:2:1:0 Write(10)
value ------------- Distribution ------------- count
-1 | 0
0 |@@@@@@@@@@@@@@@@ 355
1 |@@ 46
2 |@@@@ 96
4 |@@@@ 99
8 | 5
16 | 1
32 | 4
64 | 4
128 |@ 17
256 |@@@@@@@ 169
512 |@@@@@ 110
1024 | 0
(ms)
0:2:1:0 Read(10)
value ------------- Distribution ------------- count
0 | 0
1 | 1
2 | 0
4 | 0
8 | 6
16 |@ 14
32 |@ 20
64 |@@@ 60
128 |@@@@@@@ 140
256 |@@@@@@@@@ 202
512 |@@@@@@@@@@@@@@@@@@@ 398
1024 |@ 16
2048 | 0
(ms)
Sample Time : 2024 May 3 16:11:01
=============================================================
Device Command OpCode Status Count
=============================================================
0:2:1:0 TUR 0x0 Success 1
0:2:1:0 Inquiry 0x12 Success 4
0:2:0:0 Write(10) 0x2a Success 20
0:2:1:0 Read(10) 0x28 Success 857
0:2:1:0 Write(10) 0x2a Success 906
=============================================================
The above sample was taken when the device “0:2:1:0” was doing a reset while sequential Read and Writes were being issued to the device, we can see READs and WRITES are taking little longer than normal. We also see TUR (Test Unit Read), Inqury times which are sent following the reset. It also shows a summary of a number of commands that were Successful or failed.
Script: scsi_queue.d
This script is used to measure how long the SCSI packet spends time in the SCSI Mid Layer driver queue before submitting to the lower layer adapter driver stack. This happens when the number of IO commands submitted exceeds the queue depth specfied for the LUN/Target. The script takes one argument of sampling interval in seconds.
# scsi_queue.d 2
2022 May 20 16:32:42Wait queue time by disk (ms):
sdfi 7:0:3:13
value ------------- Distribution ------------- count
< 0 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 51
100 |@@@@@@@@ 13
200 | 0
sdem 7:0:2:13
value ------------- Distribution ------------- count
< 0 | 0
0 |@@@@@@@@@@@@@@@@@@@@ 32
100 |@@ 3
200 |@@ 3
300 |@@ 3
400 |@ 2
500 |@@ 3
600 |@ 2
700 |@@ 3
800 |@ 2
900 |@ 2
>= 1000 |@@@@@@ 10
sddz 8:0:3:411
value ------------- Distribution ------------- count
< 0 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@ 57
100 |@@@@@@ 14
200 | 1
300 | 0
400 | 0
500 | 0
600 | 0
700 | 0
800 | 0
900 | 0
>= 1000 |@@@@@@@@@ 21
From the above output we can see the some commands are spending over 1 second in the SCSI Mid Layer queue for lun ‘sdem’ and ‘sddz’.
As always, to report any issue, please open a Service Request to Oracle Linux Support.