Introduction

kstack is a command that is used to gather the kernel stack trace from a process or a group of processes. It is a useful troubleshooting tool that can provide insight into the kernel functions that a process is executing. The tool gathers information from the /proc filesystem then organizes and displays that data. kstack is available for Oracle Linux 7, 8, and 9. For this blog we will be using OL8.

Installation

For installation instructions and more information about OLED, please see Oracle Linux Enhanced Diagnostics.

A Little About Kernel Stacks

A kernel stack is a list of kernel functions and represents the current state of program execution. These function calls can give us an idea of what the process is doing and can help in diagnosing bugs. You can look at the kernel stack by simply doing a cat of /proc/<PID>/stack. To look at the stack from your current shell process, you can replace <PID> with $$ as the root user. For example:

# cat /proc/$$/stack
[<0>] do_wait+0x15c/0x2e0
[<0>] kernel_wait4+0xbc/0x168
[<0>] __do_sys_wait4+0xb7/0xca
[<0>] do_syscall_64+0x3b/0x8d
[<0>] entry_SYSCALL_64_after_hwframe+0x63/0x0

Kernel stack traces are read from the bottom up.

Using kstack

Because it’s part of OLED, kstack is executed using the command:

oled kstack

And the man page can be read using the command:

# man oled-kstack

The simplest use for kstack is to dump the stack from one or more processes. This is done with the -p option. Here’s what the kstack command looks like when run against our current shell process.

# oled kstack -p $$
zzz <Sat Sep 30 16:57:18 GMT 2023 - ol8-1

PID:  8521   State:  S   Start Time: 2023/09/30 16:36:49   Cmd: -bash
Stack:
[<0>] do_wait+0x15c/0x2e0
[<0>] kernel_wait4+0xbc/0x168
[<0>] __do_sys_wait4+0xb7/0xca
[<0>] do_syscall_64+0x3b/0x8d
[<0>] entry_SYSCALL_64_after_hwframe+0x63/0x0

That should look familiar. The -p option will take one or more PIDs, separated by commas, and display the stacks for them. In an effort to be concise and efficient, kstack will display processes with duplicate stacks together. Let’s look at an example where we are looking at the stacks from multiple bash processes.

# pgrep bash
4118
8521
8564
8611
8658
8705

# oled kstack -p 4118,8521,8564,8611,8658,8705
zzz <Sat Sep 30 17:02:16 GMT 2023 - ol8-1

PID:  4118   State:  S   Start Time: 2023/09/30 15:09:59   Cmd: -bash
PID:  8521   State:  S   Start Time: 2023/09/30 16:36:49   Cmd: -bash
Stack:
[<0>] do_wait+0x15c/0x2e0
[<0>] kernel_wait4+0xbc/0x168
[<0>] __do_sys_wait4+0xb7/0xca
[<0>] do_syscall_64+0x3b/0x8d
[<0>] entry_SYSCALL_64_after_hwframe+0x63/0x0

PID:  8564   State:  S   Start Time: 2023/09/30 16:38:20   Cmd: -bash
PID:  8611   State:  S   Start Time: 2023/09/30 16:38:22   Cmd: -bash
PID:  8658   State:  S   Start Time: 2023/09/30 16:38:25   Cmd: -bash
PID:  8705   State:  S   Start Time: 2023/09/30 16:38:27   Cmd: -bash
Stack:
[<0>] do_select+0x58d/0x865
[<0>] core_sys_select+0x1aa/0x3bb
[<0>] do_pselect.constprop.0+0xee/0x1a3
[<0>] __x64_sys_pselect6+0x4d/0xc4
[<0>] do_syscall_64+0x3b/0x8d
[<0>] entry_SYSCALL_64_after_hwframe+0x63/0x0

Now it starts to get a little more interesting. The processes are listed above their respective stack traces. Notice that out of the six processes we are checking, two of them have one stack, and the other four have a different one. This makes it easier to read than just running cat /proc/<PID>/stack against each process. Also, for each process, the following data is displayed: – PID – The process ID. – State – This is the process state from /proc/<PID>/status. You might recognize these as the process status as displayed in the top command. – Start Time – When the process was started. – Cmd – The command that was executed.

The -p option works great as long as you know the process ID(s), but processes can be ephemeral and it might be difficult to catch them. You can use the -a option to collect the kernel stack traces for all processes.

# oled kstack -a 

kstack also offers the following options to collect stack trace information based on the state of the process. – -D – Processes in D state or “uninterruptible sleep”. This can happen when the process is waiting on I/O or a lock. – -R – Running or runable processes. – -S – Sleeping (interruptable) processes. – -Z – Zombie processes.

You can combine these options. So, to see all the running processes and those waiting on I/O you could run:

# oled kstack -DR

zzz <Sat Sep 30 18:13:25 GMT 2023 - ol8-1

PID: 10453   State:  D   Start Time: 2023/09/30 17:52:02   Cmd: kworker/u8:2+flush-252:0
Stack:
[<0>] rq_qos_wait+0xbb/0x137
[<0>] wbt_wait+0xaf/0x111
[<0>] __rq_qos_throttle+0x23/0x39
[<0>] blk_mq_submit_bio+0x146/0x561
[<0>] __submit_bio_noacct+0x97/0x211
[<0>] iomap_submit_ioend+0x4e/0x86
[<0>] iomap_writepage_map+0x505/0x80c
[<0>] write_cache_pages+0x1c5/0x509
[<0>] iomap_writepages+0x1c/0x3e
[<0>] xfs_vm_writepages+0x82/0xc0 [xfs]
[<0>] do_writepages+0x7a/0x1ee
[<0>] __writeback_single_inode+0x3d/0x28e
[<0>] writeback_sb_inodes+0x215/0x4a0
[<0>] __writeback_inodes_wb+0x4c/0xeb
[<0>] wb_writeback+0x1db/0x29c
[<0>] wb_do_writeback+0x236/0x2b8
[<0>] wb_workfn+0x5a/0x194
[<0>] process_one_work+0x1f1/0x3c6
[<0>] worker_thread+0x53/0x3e4
[<0>] kthread+0x127/0x144
[<0>] ret_from_fork+0x22/0x2d

PID: 11277   State:  D   Start Time: 2023/09/30 18:13:01   Cmd: dd if=/dev/zero of=./testing bs=1024M count=10
Stack:
[<0>] balance_dirty_pages+0x334/0xd98
[<0>] balance_dirty_pages_ratelimited+0x307/0x3b3
[<0>] iomap_file_buffered_write+0x1a9/0x25a
[<0>] xfs_file_buffered_write+0xc5/0x2e0 [xfs]
[<0>] new_sync_write+0x146/0x1e6
[<0>] vfs_write+0x202/0x291
[<0>] ksys_write+0x67/0xe8
[<0>] do_syscall_64+0x3b/0x8d
[<0>] entry_SYSCALL_64_after_hwframe+0x63/0x0

PID: 11280   State:  R   Start Time: 2023/09/30 18:13:24   Cmd: python3 /usr/libexec/oled-tools/kstack -DR
Stack:
[<0>] proc_pid_stack+0xb3/0x129
[<0>] proc_single_show+0x51/0xcd
[<0>] seq_read_iter+0x12e/0x4e3
[<0>] seq_read+0x131/0x18a
[<0>] vfs_read+0x95/0x196
[<0>] ksys_read+0x67/0xe8
[<0>] do_syscall_64+0x3b/0x8d
[<0>] entry_SYSCALL_64_after_hwframe+0x63/0x0

This example shows us two processes in D state and one running (our kstack command). Let me change the dd command that we see above, and run the command again.

# oled kstack -DR
zzz <Sat Sep 30 18:18:52 GMT 2023 - ol8-1

PID: 11311   State:  R   Start Time: 2023/09/30 18:18:45   Cmd: dd if=/dev/zero of=./testing bs=1 count=10000000000
Stack:


PID: 11314   State:  R   Start Time: 2023/09/30 18:18:52   Cmd: python3 /usr/libexec/oled-tools/kstack -DR
Stack:
[<0>] proc_pid_stack+0xb3/0x129
[<0>] proc_single_show+0x51/0xcd
[<0>] seq_read_iter+0x12e/0x4e3
[<0>] seq_read+0x131/0x18a
[<0>] vfs_read+0x95/0x196
[<0>] ksys_read+0x67/0xe8
[<0>] do_syscall_64+0x3b/0x8d
[<0>] entry_SYSCALL_64_after_hwframe+0x63/0x0

Notice in the example above that we can see the dd command is no longer waiting on I/O, but is in running state, and there is no stack displayed for it. This is because it is running entirely in userspace, there are no kernel function calls being made.

You can also use the -p option with these other options. So to see the stack traces for the running processes and also systemd, you could run the following:

# oled kstack -Rp 1
zzz <Sat Sep 30 19:36:43 GMT 2023 - jyoder-ol8-1

PID:     1   State:  S   Start Time: 2023/09/30 15:09:41   Cmd: /usr/lib/systemd/systemd --switched-root --system --deserialize 17
Stack:
[<0>] ep_poll+0x306/0x3f3
[<0>] do_epoll_wait+0xcf/0xf1
[<0>] __x64_sys_epoll_wait+0x72/0x114
[<0>] do_syscall_64+0x3b/0x8d
[<0>] entry_SYSCALL_64_after_hwframe+0x63/0x0

PID: 13732   State:  R   Start Time: 2023/09/30 19:36:42   Cmd: python3 /usr/libexec/oled-tools/kstack -DRp 1
Stack:
[<0>] proc_pid_stack+0xb3/0x129
[<0>] proc_single_show+0x51/0xcd
[<0>] seq_read_iter+0x12e/0x4e3
[<0>] seq_read+0x131/0x18a
[<0>] vfs_read+0x95/0x196
[<0>] ksys_read+0x67/0xe8
[<0>] do_syscall_64+0x3b/0x8d
[<0>] entry_SYSCALL_64_after_hwframe+0x63/0x0

While this is all good information, sometimes our developers want to see how the process’ stack trace changes over time. kstack can help gather that information with the following options: – -b – Puts kstack in background mode. kstack will run in the background gathering stack traces at a set interval for a set period of time. You might have noticed the line displayed at the top of all our examples so far starts with “zzz” and contains the date and the system name. This is the separator that delineates each sample that kstack runs. This is very useful when we have more than one sample to look through. – -t – The number of minutes for kstack to run in the background. If you don’t specify this option it defaults to 30 minutes. – -i – The number of seconds between samples. The default is 60. The minimum is 10 seconds.

So, if you want to see the stack traces for all running and D state processes, every 10 seconds, for 2 minutes you would execute:

# oled kstack -bDR -i 10 -t 2
2023-09-30 19:14:36.101 INFO - Spawning process to run in background...
2023-09-30 19:14:36.102 INFO - Background PID: 12963
2023-09-30 19:14:36.102 INFO - This script will run up to: 2023-09-30 19:16:36.102537
2023-09-30 19:14:36.103 INFO - The requested data will be written to the directory: /var/oled/kstack

Since we are gathering this data in the background we have to write out the stack traces to a file. This file is located in /var/oled/kstack and is called kstack_HOSTNAME.out. In our case, kstack_ol8-1.out. kstack will compress any old log files and add a time stamp (seconds since epoch) to them each time a new command is run.

# ls -1 /var/oled/kstack
kstack_ol8-1.out
kstack_ol8-1.out-1696101276.gz
kstack_ol8-1.out-1696101727.gz
kstack_ol8-1.out-1696102125.gz

If you want to stop kstack before it is finished, you will need to kill the PID that is displayed when the command is run. For example:

2023-09-30 19:14:36.102 INFO - Background PID: 12963

You can control the destination, size and number of files that kstack creates using the following options: – -d – The directory where the log files will be written to. The default is /var/oled/kstack. – -m – Maximum size of the log file, in megabytes. The default is 1MB. Once this size is reached, the file is rotated. – -n – Number of files to keep. The default is 5. With the default settings, kstack will always use less than 5MB in /var/oled/kstack because the logs are compressed as they are rotated.

kstack also has a fail safe to ensure that it can’t fill up a filesystem. By default kstack will refuse to write to a file system that is at or above 85% utilization. You can override this by using the -x option to specify another percentage.

# df -h .
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/ocivolume-oled   10G  9.4G  633M  94% /var/oled

# oled kstack -bDR -i 10 -t 2
2023-09-30 20:15:03.714 ERROR - Maximum filesystem space reached. Used: 94%, Threshold: 85%

# oled kstack -bDR -i 10 -t 2 -x 95
2023-09-30 20:15:18.69 INFO - Spawning process to run in background...
2023-09-30 20:15:18.69 INFO - Background PID: 14641
2023-09-30 20:15:18.69 INFO - This script will run up to: 2023-09-30 20:17:18.069760
2023-09-30 20:15:18.70 INFO - The requested data will be written to the directory: /var/oled/kstack

Conclusion

kstack is a flexible diagnostic tool for gathering kernel stack trace data. If you run into bugs or performance issues on your system, kstack is one of the options we have for troubleshooting. If you have any issues with kstack, please open a Service Request to Oracle Linux Support.