Interested in diving into block layer performance? Frequently developers are looking for details about the ongoing I/O load such as:
- Which processes are initiating read/write requests
- The size of the requests
- I/O latency
- Which disks are inundated with requests
- Which sector reads/writes are taking the most time
- How much time I/O spends in the queue
This article focuses on how bcc-tools (bcc stands for BPF Compiler Collection) can be leveraged for block layer analysis. These tools provide useful insight into the system without any coding: they are off-the-shelf tools for identifying performance issues, so there is no need to reinvent the wheel. And if one does need to invent, BCC provides a simple interface for creating custom tooling to analyse the kernel. Either way, these tools are a great reference and starting point for your learning and debugging.
Tools:
To access and start playing with the tools, just install the bcc-tools and kernel-uek-devel packages (an example installation command is shown below). Once installed, the scripts can be found at /usr/share/bcc/tools.
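On Oracle Linux (where kernel-uek-devel is available), the installation might look like the following; the exact package manager command depends on your release:

```
# Oracle Linux example; use yum instead of dnf on older releases
sudo dnf install -y bcc-tools kernel-uek-devel
```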
The following sections describe some of these tools, how they can help with block layer diagnosability, and how they can accelerate the understanding of system performance. We will use the fio tool to generate I/O (an illustrative fio invocation is shown after the list below) and see how the following tools help us achieve our goals:
- biotop
- biosnoop
- biolatency
- bitesize
- biolatpcts
- tplist
- trace
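To have some I/O to observe, a fio job along these lines can be used; the file path, size, and job parameters below are only illustrative and should be adapted to the system under test:

```
# Mixed 4k random read/write load against a test file (adjust path and sizing)
fio --name=blktest --filename=/mnt/test/fio-test.dat --size=1G \
    --rw=randrw --bs=4k --direct=1 --ioengine=libaio --iodepth=16 \
    --numjobs=4 --runtime=60 --time_based
```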
biotop
biotop allows us to look at an overview of which processes are generating I/O on the system; we will use it to examine the system at a high level. It is similar to the top command, except that it covers disks, with a default display interval of one second. It displays the processes performing I/O, sorted by the size of I/O (the Kbytes column), and acts like a snapshot of the processes doing the largest read/write operations together with their average latency.
usage: biotop.py [-h] [-C] [-r MAXROWS] [interval] [count]
The first line contains the time and the load average from the file /proc/loadavg.
Fields displayed:
- PID: Process Id
- COMM: Process name
- D: Operation – Read/Write
- MAJ: Major number of disk
- MIN: Minor number of disk
- DISK: I/O to this disk
- I/O: The number of I/Os done in the current interval
- Kbytes: The total Kbytes completed
- AVGms: Average I/O time in milliseconds
biotop generates the following output, which tells us that the current fio processes are reading from and writing to a couple of dm disks with average latencies approaching or exceeding 1000 ms. This output alone is enough to signal whether our system is slow in processing I/O.
```
# /usr/share/bcc/tools/biotop -C
Tracing... Output every 1 secs. Hit Ctrl-C to end

12:55:01 loadavg: 0.11 0.28 0.36 6/511 11790

PID    COMM            D MAJ MIN DISK       I/O  Kbytes    AVGms
31917  fio             W 252 3   dm-3        13      52  1843.11
31917  fio             R 252 3   dm-3        12      48   869.73
31915  fio             W 252 3   dm-3        11      44  1124.25
1610   kworker/19:1H   R 131 240 sdgj        11      44  3288.80
31917  fio             W 252 42  dm-42        9      36   946.63
31914  fio             W 252 3   dm-3         9      36  1562.52
31915  fio             R 252 3   dm-3         8      32  1718.64
31914  fio             R 252 44  dm-44        7      28  2633.36
```
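As the usage line shows, a refresh interval, an output count, and a row limit can also be supplied; the values below are arbitrary choices for illustration:

```
# Don't clear the screen (-C), show at most 20 rows, refresh every 5 seconds, 12 times
/usr/share/bcc/tools/biotop -C -r 20 5 12
```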
biosnoop
Next, let us look at per-I/O latency. biosnoop traces block I/O and prints one line for each request, including the latency between the request being issued to the device and its completion. The -Q option can be used to also print the time spent in the block queue.
usage: biosnoop.py [-h] [-Q]
Fields displayed:
- T: Operation: Read/Write
- LAT: Latency in milliseconds from the request being issued to the device until its completion
- QUE: Time spent in milliseconds in kernel queue
```
# /usr/share/bcc/tools/biosnoop
TIME(s)     COMM  PID    DISK   T  SECTOR    BYTES   QUE(ms)   LAT(ms)
103.809904  fio   31916  dm-44  W  35688     4096       0.01  21964.53
122.904032  fio   31917  sdhr   R  2385000   4096       0.00  20113.50
122.904040  fio   31917  dm-3   R  2385000   4096       0.00  20113.51
122.904144  fio   31916  sdas   R  241776    4096       0.00  20173.90
122.904159  fio   31916  dm-42  R  241776    4096       0.00  20173.92
122.904280  fio   31915  sdas   R  2034336   4096       0.00  20173.72
122.904288  fio   31915  dm-42  R  2034336   4096       0.00  20173.74
122.947834  fio   31917  sdjp   R  25591480  4096   31170.24   8898.60
122.947840  fio   31917  dm-3   R  25591480  4096       0.00  40068.85
```
Outliers, such as the I/Os taking the most time, can also be filtered out; these are generally the cause of performance concerns. For example, assuming the biosnoop output has been captured to a file named biosnoop.out:

```
awk '$NF > 20000' biosnoop.out
```
The filtered output shows the fio read/write operations that are taking more than 20,000 ms (20 seconds), while the time spent in the block queue is mostly negligible. Latency of this magnitude is huge for any I/O and points to performance issues below the block layer.
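Putting this together, a minimal capture-and-filter workflow might look like the sketch below; the 60-second capture window and the thresholds are arbitrary, and the column positions assume the -Q output format shown above:

```
# Capture per-I/O latencies (including queue time) for about a minute
timeout 60 /usr/share/bcc/tools/biosnoop -Q > biosnoop.out

# I/Os whose device latency (LAT, the last column, in ms) exceeded 20,000 ms
awk '$NF > 20000' biosnoop.out

# I/Os that waited more than 10 ms in the block queue (QUE, second-to-last column)
awk '$(NF-1) > 10' biosnoop.out
```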
biolatency
biolatency is another highly valuable tool for when process information is not necessary and one merely wants to check the time taken by I/Os, irrespective of their details. It prints a histogram of I/O latencies, with the number of I/Os displayed for each latency bucket, and since it supports per-disk latencies it can offer insight into which disk might have performance issues.
usage: biolatency.py [-h] [-T] [-Q] [-m] [-D] [-F] [-j] [interval] [count]
Fields displayed:
- usecs: I/O latency bucket in microseconds
- count: Number of I/Os within the latency bucket
- distribution: Bar chart to visualize the count
```
# /usr/share/bcc/tools/biolatency
Tracing block device I/O... Hit Ctrl-C to end.
^C
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 1        |                                        |
        64 -> 127        : 9391913  |****************************            |
       128 -> 255        : 13087567 |****************************************|
       256 -> 511        : 2535900  |*******                                 |
       512 -> 1023       : 155152   |                                        |
      1024 -> 2047       : 5770     |                                        |
      2048 -> 4095       : 6686     |                                        |
      4096 -> 8191       : 2836     |                                        |
      8192 -> 16383      : 2143     |                                        |
     16384 -> 32767      : 60606    |                                        |
     32768 -> 65535      : 1275     |                                        |
```
Outliers can be spotted: in this case only 1275 I/Os fall into the highest latency bucket, whereas the majority of I/Os complete in roughly 64 to 511 usecs. Unlike biosnoop, which displays latency per I/O, biolatency can be used for a quick glance across the I/O performance of the system.
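Per the usage line, -D breaks the histogram down per disk and -m switches the buckets to milliseconds, with optional interval and count arguments; a possible invocation (interval and count chosen arbitrarily here) would be:

```
# One millisecond-granularity histogram per disk, printed every 10 seconds, 6 times
/usr/share/bcc/tools/biolatency -D -m 10 6
```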
bitesize
A great tool for looking at the block sizes issued by each process: it displays, for each process, the number of I/Os falling into each size bucket. One can also correlate latency outcomes with different block sizes. It is helpful to know which processes are doing large or small I/Os, since extreme I/O sizes can be a cause of performance issues, and this tool helps identify the processes generating such I/Os.
usage: ./bitesize
Fields displayed:
- Kbytes: I/O size range in Kb
- count: Number of I/Os within the Kbytes size bucket
- distribution: Bar chart to visualize the count
```
# /usr/share/bcc/tools/bitesize
Tracing block I/O... Hit Ctrl-C to end.
^C
Process Name = fio
     Kbytes              : count     distribution
         0 -> 1          : 1        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 30144068 |****************************************|
         8 -> 15         : 3        |                                        |
```
The fio command is doing 30144068 I/Os with an I/O size between 4 and 7 Kbytes.
biolatpcts
This tool further breaks the latency down into percentiles and by the type of I/O, printing the I/O latency percentiles for each kind of I/O. By default the latency is the time the I/O spent on the device. This can be changed with the from-rq-alloc and after-rq-alloc options, which measure latency starting from the attempt to allocate the request or from the request allocation itself, respectively.
usage: biolatpcts.py [-h] [-i INTERVAL] [-w {from-rq-alloc,after-rq-alloc,on-device}] [-p PCT,...] [-j] [--verbose] DEV
Fields displayed:
- DEV: Device name being traced
- read: Read latency for the respective percentile
- write: Write latency for the respective percentile
- discard: Discard latency for the respective percentile
- flush: Flush latency for the respective percentile
- Percentile columns: 1-100 percentile columns (can be customized)
```
# /usr/share/bcc/tools/biolatpcts sda
sda            1     5    10    16    25    50    75    84    90    95    99   100
read         5us   5us 465us 505us 535us 615us  78ms  86ms  94ms  98ms 100ms 150ms
write        5us 495us 515us 535us 555us 655us 2.5ms 3.5ms 5.5ms 8.5ms  12ms 150ms
discard        -     -     -     -     -     -     -     -     -     -     -     -
flush          -     -     -     -     -     -     -     -     -     -     -     -
```
From the above table, 100 percent of the read requests on disk sda complete within 150 ms, while only 1 percent of the reads on sda take 5 us or less.
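The reported percentiles and the measurement window can be tuned via the options in the usage line; for example (the percentiles, window, and interval below are illustrative choices):

```
# 50th/90th/99th percentiles for sda, measured from request allocation,
# reported every 5 seconds
/usr/share/bcc/tools/biolatpcts -p 50,90,99 -w after-rq-alloc -i 5 sda
```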
The next two tools are not specific to the block layer but provide a deep look into the kernel (block layer included).
tplist
This tool lists the kernel tracepoints and probes available on the system, which can be used for tracing the block layer.
usage: tplist.py [-h] [-p PID] [-l LIB] [-v] [filter]
The following prints the tracepoints within the block layer; these tracepoints can be used with other tools to gain further insight.
```
# /usr/share/bcc/tools/tplist block*
block:block_touch_buffer
block:block_dirty_buffer
block:block_rq_requeue
block:block_rq_complete
block:block_rq_insert
block:block_rq_issue
block:block_bio_bounce
block:block_bio_complete
block:block_bio_backmerge
block:block_bio_frontmerge
block:block_bio_queue
block:block_getrq
block:block_sleeprq
block:block_plug
block:block_unplug
block:block_split
block:block_bio_remap
block:block_rq_remap
```
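To see which arguments a given tracepoint exposes (useful when building the trace one-liners shown in the next section), the -v flag from the usage line can be used, for example:

```
# Show the arguments of the block_rq_insert tracepoint
/usr/share/bcc/tools/tplist -v 'block:block_rq_insert'
```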
trace
This tool traces the entry/return of functions and can print the stack trace (-K) leading up to the entry/return. It also supports an optional filter condition on each trace event and can print parameter values as well. Multiple functions/probes can be traced at a time.
usage: trace [-h] [-b BUFFER_PAGES] [-p PID] [-L TID] [-v] [-Z STRING_SIZE] [-S] [-M MAX_EVENTS] [-t] [-T] [-K] [-U] [-a] [-I header] probe [probe ...]
Fields displayed:
- PID: Process ID
- TID: Thread ID
- COMM: Process name
- FUNC: function name/tracepoint being traced
- Last column displays the parameter values requested.
For instance, the following output displays the process and the sector number of the I/O.
```
# /usr/share/bcc/tools/trace 't:block:block_rq_insert "sector=%d", args->sector'
PID      TID      COMM   FUNC             -
1492319  1492319  fio    block_rq_insert  sector=80075364
1492319  1492319  fio    block_rq_insert  sector=80075365
1492319  1492319  fio    block_rq_insert  sector=80075366
1492319  1492319  fio    block_rq_insert  sector=80075367
1492319  1492319  fio    block_rq_insert  sector=80075368
1492319  1492319  fio    block_rq_insert  sector=80075369
1492319  1492319  fio    block_rq_insert  sector=80075370
```
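The same tracepoint can be combined with other options from the usage line; for instance, -K adds a kernel stack trace per event, and a predicate in parentheses filters events. The field names used below are assumptions based on the block_rq_insert tracepoint format and should be verified with tplist -v:

```
# Print a kernel stack trace for every request inserted into a block queue
/usr/share/bcc/tools/trace -K 't:block:block_rq_insert'

# Only report inserts of requests larger than 8 sectors (field names assumed;
# verify with: tplist -v block:block_rq_insert)
/usr/share/bcc/tools/trace 't:block:block_rq_insert (args->nr_sector > 8) "sector=%d nr_sector=%d", args->sector, args->nr_sector'
```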
Conclusion
Be sure to check the --help output for each tool, as it displays various options that provide further insight into the block layer. Beyond helping you understand the block layer, the above tools act as a jumping-off point for creating your own tools with bcc-tools. Bear in mind that bcc-tools also includes tools for other kernel areas, which are equally worth exploring.