Introduction
kstack is a command that is used to gather the kernel stack trace from a process or a group of processes. It is a useful troubleshooting tool that can provide insight into the kernel functions that a process is executing. The tool gathers information from the /proc filesystem then organizes and displays that data. kstack is available for Oracle Linux 7, 8, and 9. For this blog we will be using OL8.
Installation
For installation instructions and more information about OLED, please see Oracle Linux Enhanced Diagnostics.
A Little About Kernel Stacks
A kernel stack is a list of kernel functions and represents the current state of program execution. These function calls can give us an idea of what the process is doing and can help in diagnosing bugs. You can look at the kernel stack by simply doing a cat of /proc/<PID>/stack. To look at the stack from your current shell process, you can replace <PID> with $$ as the root user. For example:
# cat /proc/$$/stack [<0>] do_wait+0x15c/0x2e0 [<0>] kernel_wait4+0xbc/0x168 [<0>] __do_sys_wait4+0xb7/0xca [<0>] do_syscall_64+0x3b/0x8d [<0>] entry_SYSCALL_64_after_hwframe+0x63/0x0
Kernel stack traces are read from the bottom up.
Using kstack
Because it’s part of OLED, kstack is executed using the command:
oled kstack
And the man page can be read using the command:
# man oled-kstack
The simplest use for kstack is to dump the stack from one or more processes. This is done with the -p option. Here’s what the kstack command looks like when run against our current shell process.
# oled kstack -p $$ zzz <Sat Sep 30 16:57:18 GMT 2023 - ol8-1 PID: 8521 State: S Start Time: 2023/09/30 16:36:49 Cmd: -bash Stack: [<0>] do_wait+0x15c/0x2e0 [<0>] kernel_wait4+0xbc/0x168 [<0>] __do_sys_wait4+0xb7/0xca [<0>] do_syscall_64+0x3b/0x8d [<0>] entry_SYSCALL_64_after_hwframe+0x63/0x0
That should look familiar. The -p option will take one or more PIDs, separated by commas, and display the stacks for them. In an effort to be concise and efficient, kstack will display processes with duplicate stacks together. Let’s look at an example where we are looking at the stacks from multiple bash processes.
# pgrep bash 4118 8521 8564 8611 8658 8705 # oled kstack -p 4118,8521,8564,8611,8658,8705 zzz <Sat Sep 30 17:02:16 GMT 2023 - ol8-1 PID: 4118 State: S Start Time: 2023/09/30 15:09:59 Cmd: -bash PID: 8521 State: S Start Time: 2023/09/30 16:36:49 Cmd: -bash Stack: [<0>] do_wait+0x15c/0x2e0 [<0>] kernel_wait4+0xbc/0x168 [<0>] __do_sys_wait4+0xb7/0xca [<0>] do_syscall_64+0x3b/0x8d [<0>] entry_SYSCALL_64_after_hwframe+0x63/0x0 PID: 8564 State: S Start Time: 2023/09/30 16:38:20 Cmd: -bash PID: 8611 State: S Start Time: 2023/09/30 16:38:22 Cmd: -bash PID: 8658 State: S Start Time: 2023/09/30 16:38:25 Cmd: -bash PID: 8705 State: S Start Time: 2023/09/30 16:38:27 Cmd: -bash Stack: [<0>] do_select+0x58d/0x865 [<0>] core_sys_select+0x1aa/0x3bb [<0>] do_pselect.constprop.0+0xee/0x1a3 [<0>] __x64_sys_pselect6+0x4d/0xc4 [<0>] do_syscall_64+0x3b/0x8d [<0>] entry_SYSCALL_64_after_hwframe+0x63/0x0
Now it starts to get a little more interesting. The processes are listed above their respective stack traces. Notice that out of the six processes we are checking, two of them have one stack, and the other four have a different one. This makes it easier to read than just running cat /proc/<PID>/stack against each process. Also, for each process, the following data is displayed: – PID – The process ID. – State – This is the process state from /proc/<PID>/status. You might recognize these as the process status as displayed in the top command. – Start Time – When the process was started. – Cmd – The command that was executed.
The -p option works great as long as you know the process ID(s), but processes can be ephemeral and it might be difficult to catch them. You can use the -a option to collect the kernel stack traces for all processes.
# oled kstack -a
kstack also offers the following options to collect stack trace information based on the state of the process. – -D – Processes in D state or “uninterruptible sleep”. This can happen when the process is waiting on I/O or a lock. – -R – Running or runable processes. – -S – Sleeping (interruptable) processes. – -Z – Zombie processes.
You can combine these options. So, to see all the running processes and those waiting on I/O you could run:
# oled kstack -DR zzz <Sat Sep 30 18:13:25 GMT 2023 - ol8-1 PID: 10453 State: D Start Time: 2023/09/30 17:52:02 Cmd: kworker/u8:2+flush-252:0 Stack: [<0>] rq_qos_wait+0xbb/0x137 [<0>] wbt_wait+0xaf/0x111 [<0>] __rq_qos_throttle+0x23/0x39 [<0>] blk_mq_submit_bio+0x146/0x561 [<0>] __submit_bio_noacct+0x97/0x211 [<0>] iomap_submit_ioend+0x4e/0x86 [<0>] iomap_writepage_map+0x505/0x80c [<0>] write_cache_pages+0x1c5/0x509 [<0>] iomap_writepages+0x1c/0x3e [<0>] xfs_vm_writepages+0x82/0xc0 [xfs] [<0>] do_writepages+0x7a/0x1ee [<0>] __writeback_single_inode+0x3d/0x28e [<0>] writeback_sb_inodes+0x215/0x4a0 [<0>] __writeback_inodes_wb+0x4c/0xeb [<0>] wb_writeback+0x1db/0x29c [<0>] wb_do_writeback+0x236/0x2b8 [<0>] wb_workfn+0x5a/0x194 [<0>] process_one_work+0x1f1/0x3c6 [<0>] worker_thread+0x53/0x3e4 [<0>] kthread+0x127/0x144 [<0>] ret_from_fork+0x22/0x2d PID: 11277 State: D Start Time: 2023/09/30 18:13:01 Cmd: dd if=/dev/zero of=./testing bs=1024M count=10 Stack: [<0>] balance_dirty_pages+0x334/0xd98 [<0>] balance_dirty_pages_ratelimited+0x307/0x3b3 [<0>] iomap_file_buffered_write+0x1a9/0x25a [<0>] xfs_file_buffered_write+0xc5/0x2e0 [xfs] [<0>] new_sync_write+0x146/0x1e6 [<0>] vfs_write+0x202/0x291 [<0>] ksys_write+0x67/0xe8 [<0>] do_syscall_64+0x3b/0x8d [<0>] entry_SYSCALL_64_after_hwframe+0x63/0x0 PID: 11280 State: R Start Time: 2023/09/30 18:13:24 Cmd: python3 /usr/libexec/oled-tools/kstack -DR Stack: [<0>] proc_pid_stack+0xb3/0x129 [<0>] proc_single_show+0x51/0xcd [<0>] seq_read_iter+0x12e/0x4e3 [<0>] seq_read+0x131/0x18a [<0>] vfs_read+0x95/0x196 [<0>] ksys_read+0x67/0xe8 [<0>] do_syscall_64+0x3b/0x8d [<0>] entry_SYSCALL_64_after_hwframe+0x63/0x0
This example shows us two processes in D state and one running (our kstack command). Let me change the dd command that we see above, and run the command again.
# oled kstack -DR zzz <Sat Sep 30 18:18:52 GMT 2023 - ol8-1 PID: 11311 State: R Start Time: 2023/09/30 18:18:45 Cmd: dd if=/dev/zero of=./testing bs=1 count=10000000000 Stack: PID: 11314 State: R Start Time: 2023/09/30 18:18:52 Cmd: python3 /usr/libexec/oled-tools/kstack -DR Stack: [<0>] proc_pid_stack+0xb3/0x129 [<0>] proc_single_show+0x51/0xcd [<0>] seq_read_iter+0x12e/0x4e3 [<0>] seq_read+0x131/0x18a [<0>] vfs_read+0x95/0x196 [<0>] ksys_read+0x67/0xe8 [<0>] do_syscall_64+0x3b/0x8d [<0>] entry_SYSCALL_64_after_hwframe+0x63/0x0
Notice in the example above that we can see the dd command is no longer waiting on I/O, but is in running state, and there is no stack displayed for it. This is because it is running entirely in userspace, there are no kernel function calls being made.
You can also use the -p option with these other options. So to see the stack traces for the running processes and also systemd, you could run the following:
# oled kstack -Rp 1 zzz <Sat Sep 30 19:36:43 GMT 2023 - jyoder-ol8-1 PID: 1 State: S Start Time: 2023/09/30 15:09:41 Cmd: /usr/lib/systemd/systemd --switched-root --system --deserialize 17 Stack: [<0>] ep_poll+0x306/0x3f3 [<0>] do_epoll_wait+0xcf/0xf1 [<0>] __x64_sys_epoll_wait+0x72/0x114 [<0>] do_syscall_64+0x3b/0x8d [<0>] entry_SYSCALL_64_after_hwframe+0x63/0x0 PID: 13732 State: R Start Time: 2023/09/30 19:36:42 Cmd: python3 /usr/libexec/oled-tools/kstack -DRp 1 Stack: [<0>] proc_pid_stack+0xb3/0x129 [<0>] proc_single_show+0x51/0xcd [<0>] seq_read_iter+0x12e/0x4e3 [<0>] seq_read+0x131/0x18a [<0>] vfs_read+0x95/0x196 [<0>] ksys_read+0x67/0xe8 [<0>] do_syscall_64+0x3b/0x8d [<0>] entry_SYSCALL_64_after_hwframe+0x63/0x0
While this is all good information, sometimes our developers want to see how the process’ stack trace changes over time. kstack can help gather that information with the following options: – -b – Puts kstack in background mode. kstack will run in the background gathering stack traces at a set interval for a set period of time. You might have noticed the line displayed at the top of all our examples so far starts with “zzz” and contains the date and the system name. This is the separator that delineates each sample that kstack runs. This is very useful when we have more than one sample to look through. – -t – The number of minutes for kstack to run in the background. If you don’t specify this option it defaults to 30 minutes. – -i – The number of seconds between samples. The default is 60. The minimum is 10 seconds.
So, if you want to see the stack traces for all running and D state processes, every 10 seconds, for 2 minutes you would execute:
# oled kstack -bDR -i 10 -t 2 2023-09-30 19:14:36.101 INFO - Spawning process to run in background... 2023-09-30 19:14:36.102 INFO - Background PID: 12963 2023-09-30 19:14:36.102 INFO - This script will run up to: 2023-09-30 19:16:36.102537 2023-09-30 19:14:36.103 INFO - The requested data will be written to the directory: /var/oled/kstack
Since we are gathering this data in the background we have to write out the stack traces to a file. This file is located in /var/oled/kstack and is called kstack_HOSTNAME.out. In our case, kstack_ol8-1.out. kstack will compress any old log files and add a time stamp (seconds since epoch) to them each time a new command is run.
# ls -1 /var/oled/kstack kstack_ol8-1.out kstack_ol8-1.out-1696101276.gz kstack_ol8-1.out-1696101727.gz kstack_ol8-1.out-1696102125.gz
If you want to stop kstack before it is finished, you will need to kill the PID that is displayed when the command is run. For example:
2023-09-30 19:14:36.102 INFO - Background PID: 12963
You can control the destination, size and number of files that kstack creates using the following options: – -d – The directory where the log files will be written to. The default is /var/oled/kstack. – -m – Maximum size of the log file, in megabytes. The default is 1MB. Once this size is reached, the file is rotated. – -n – Number of files to keep. The default is 5. With the default settings, kstack will always use less than 5MB in /var/oled/kstack because the logs are compressed as they are rotated.
kstack also has a fail safe to ensure that it can’t fill up a filesystem. By default kstack will refuse to write to a file system that is at or above 85% utilization. You can override this by using the -x option to specify another percentage.
# df -h . Filesystem Size Used Avail Use% Mounted on /dev/mapper/ocivolume-oled 10G 9.4G 633M 94% /var/oled # oled kstack -bDR -i 10 -t 2 2023-09-30 20:15:03.714 ERROR - Maximum filesystem space reached. Used: 94%, Threshold: 85% # oled kstack -bDR -i 10 -t 2 -x 95 2023-09-30 20:15:18.69 INFO - Spawning process to run in background... 2023-09-30 20:15:18.69 INFO - Background PID: 14641 2023-09-30 20:15:18.69 INFO - This script will run up to: 2023-09-30 20:17:18.069760 2023-09-30 20:15:18.70 INFO - The requested data will be written to the directory: /var/oled/kstack
Conclusion
kstack is a flexible diagnostic tool for gathering kernel stack trace data. If you run into bugs or performance issues on your system, kstack is one of the options we have for troubleshooting. If you have any issues with kstack, please open a Service Request to Oracle Linux Support.