sos report - The Swiss Army Knife of Diagnostic Tools

September 5, 2023 | 20 minute read

Introduction

The sosreport command has been around since 2009. Written in Python, the tool gathers comprehensive diagnostic data from a Linux system. The open source community has modified and improved it over the years, and it is now run with the command sos report. sos report does not modify your system configuration.

Why does Oracle always ask for this?

Quite simply, the output from sos report is a one-stop shop for diagnostic data. It contains most of the data we need to begin troubleshooting a problem on your Linux system. Sometimes it’s all the data we need. Yes, there are circumstances where we will come back and ask for more data, but sos report is the next best thing to troubleshooting live on your system.

In some cases we may ask for more than one sos report. We may want to compare a good system to a bad system, or we may want to see how the system changes over time. The sos report provides a snapshot of the system’s state as well as some historical information.

“Can’t you just ask for the specific files you need?”, I hear you ask. Yes, we could, but we’re only human and chances are we may miss something or decide we need to look at a different file. sos report collects configuration information, log files, and the output from dozens of commands, which saves time and avoids the mistakes that creep in when you collect those things individually by hand. By using sos report we hope to reduce the back and forth that can be annoying and delay the resolution of your issue.

Installation

OL6 and OL7

sudo yum install sos

OL8 and OL9

sudo dnf install sos

Execution

Below are the commands that I prefer when asking someone to run sos report. These commands will ensure we get all the log files from /var/log (not just truncated versions) as well as all the sar data. Historical information is important! Even though the commands differ between some of the OL versions, the collected data is the same.

OL6 and OL7

sudo sosreport --batch --all-logs -k sar.all_sar=on

OL8

sudo sos report --batch --all-logs -k sar.all_sar=on

The --batch option allows you to run sos report without any further keyboard input. Normally sos report will ask you for your name and ticket number. By using --batch, you can avoid this and the output file will contain the system name and a time stamp.

The --all-logs option ensures that we get all the files from /var/log. Without this, some of the files, like /var/log/messages, are truncated.

The -k sar.all_sar=on option tells sos report to collect all the sar files in /var/log/sa, not just the last 14 days.

OL9

sudo sos report --batch --all-logs -e sar -k sar.all_sar=on

The -e sar option turns the sar plugin back on. It is disabled by default in OL9.
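The per-release differences above can be folded into a small helper. This is only a sketch: the function name sos_command_for is made up for illustration and is not part of sos. It simply prints the command line recommended in this article for a given Oracle Linux major release.

```shell
#!/bin/sh
# sos_command_for MAJOR: print the command line this article recommends for
# Oracle Linux release MAJOR (6, 7, 8 or 9). The function name is hypothetical.
sos_command_for() {
    case "$1" in
        6|7) echo "sosreport --batch --all-logs -k sar.all_sar=on" ;;
        8)   echo "sos report --batch --all-logs -k sar.all_sar=on" ;;
        9)   # the sar plugin is disabled by default on OL9, so enable it with -e
             echo "sos report --batch --all-logs -e sar -k sar.all_sar=on" ;;
        *)   echo "unsupported release: $1" >&2; return 1 ;;
    esac
}

# Print the command for OL9.
sos_command_for 9
```

On OL7 and later the major release can usually be read from VERSION_ID in /etc/os-release, so a wrapper could pass that value in rather than hard-coding it.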

Here is the output from an sos report run on OL9 (truncated).

[opc@ol9-1 ~]$ sudo sos report --batch --all-logs -e sar -k sar.all_sar=on

sosreport (version 4.5.3)

This command will collect diagnostic and configuration information from
this Oracle Linux system and installed applications.

An archive containing the collected information will be generated in
/var/tmp/sos.7p70_t00 and may be provided to a Oracle America support
representative.

Any information provided to Oracle America will be treated in accordance
with the published support policies at:

        Distribution Website : https://support.oracle.com/
        Commercial Support   : https://support.oracle.com/

The generated archive may contain data considered sensitive and its
content should be reviewed by the originating organization before being
passed to any third party.

No changes will be made to system configuration.


 Setting up archive ...
 Setting up plugins ...
[plugin:networking] skipped command 'ip -s macsec show': required kmods missing: macsec. Use '--allow-system-changes' to enable collection.
[plugin:networking] skipped command 'ss -peaonmi': required kmods missing: xsk_diag. Use '--allow-system-changes' to enable collection.
[plugin:sar] sar: could not list /var/log/sa
[plugin:sssd] skipped command 'sssctl config-check': required services missing: sssd.
[plugin:sssd] skipped command 'sssctl domain-list': required services missing: sssd.
[plugin:systemd] skipped command 'systemd-resolve --status': required services missing: systemd-resolved.
[plugin:systemd] skipped command 'systemd-resolve --statistics': required services missing: systemd-resolved.
 Running plugins. Please wait ...

  Starting 1/95  alternatives    [Running: alternatives]
  Starting 2/95  anacron         [Running: alternatives anacron]
  Starting 3/95  ata             [Running: alternatives anacron ata]
  Starting 4/95  auditd          [Running: alternatives anacron ata auditd]
  Starting 5/95  bcache          [Running: alternatives ata auditd bcache]
  Starting 6/95  block           [Running: alternatives ata auditd block]
  Starting 7/95  boot            [Running: alternatives ata block boot]
  Starting 8/95  btrfs           [Running: alternatives block boot btrfs]
  Starting 9/95  cgroups         [Running: block boot btrfs cgroups]
  Starting 10/95 chrony          [Running: block boot cgroups chrony]
  Starting 11/95 cloud_init      [Running: block boot cgroups cloud_init]
  Starting 12/95 cockpit         [Running: boot cgroups cloud_init cockpit]
  Starting 13/95 console         [Running: boot cgroups cloud_init console]
  Starting 14/95 cron            [Running: boot cgroups cloud_init cron]
  Starting 15/95 crypto          [Running: boot cgroups cloud_init crypto]
  Starting 16/95 date            [Running: boot cgroups cloud_init date]
  Starting 17/95 dbus            [Running: boot cgroups cloud_init dbus]
  Starting 18/95 devicemapper    [Running: boot cgroups dbus devicemapper]
  Starting 19/95 devices         [Running: boot cgroups devicemapper devices]
  Starting 20/95 dnf             [Running: boot devicemapper devices dnf]
  Starting 21/95 dracut          [Running: boot devicemapper dnf dracut]
  Starting 22/95 ebpf            [Running: boot dnf dracut ebpf]
  Starting 23/95 filesys         [Running: boot dnf ebpf filesys]
  Starting 24/95 firewall_tables [Running: boot dnf ebpf firewall_tables]
  Starting 25/95 firewalld       [Running: boot dnf ebpf firewalld]
  Starting 26/95 fwupd           [Running: dnf ebpf firewalld fwupd]
  Starting 27/95 grub2           [Running: dnf ebpf firewalld grub2]
  Starting 28/95 gssproxy        [Running: dnf ebpf grub2 gssproxy]
  Starting 29/95 hardware        [Running: dnf ebpf grub2 hardware]
  Starting 30/95 host            [Running: dnf grub2 hardware host]
  Starting 31/95 hts             [Running: dnf grub2 hardware hts]
  Starting 32/95 i18n            [Running: dnf grub2 hardware i18n]
  Starting 33/95 iscsi           [Running: dnf grub2 hardware iscsi]
  Starting 34/95 jars            [Running: dnf grub2 hardware jars]
  Starting 35/95 kdump           [Running: dnf grub2 hardware kdump]
<SNIP>
  Starting 80/95 sunrpc          [Running: process processor selinux sunrpc]
  Starting 81/95 system          [Running: process processor selinux system]
  Starting 82/95 systemd         [Running: processor selinux system systemd]
  Starting 83/95 systemtap       [Running: processor selinux systemd systemtap]
  Starting 84/95 sysvipc         [Running: processor selinux systemtap sysvipc]
  Starting 85/95 teamd           [Running: processor selinux systemtap teamd]
  Starting 86/95 tuned           [Running: processor selinux systemtap tuned]
  Starting 87/95 udev            [Running: selinux systemtap tuned udev]
  Starting 88/95 udisks          [Running: systemtap tuned udev udisks]
  Starting 89/95 unpackaged      [Running: systemtap tuned udisks unpackaged]
  Starting 90/95 usb             [Running: systemtap tuned unpackaged usb]
  Starting 91/95 vdo             [Running: systemtap tuned unpackaged vdo]
  Starting 92/95 vhostmd         [Running: systemtap tuned unpackaged vhostmd]
  Starting 93/95 x11             [Running: systemtap tuned unpackaged x11]
  Starting 94/95 xen             [Running: systemtap tuned unpackaged xen]
  Finishing plugins              [Running: systemtap tuned unpackaged]
  Starting 95/95 xfs             [Running: systemtap tuned unpackaged xfs]
  Finishing plugins              [Running: systemtap unpackaged xfs]
  Finishing plugins              [Running: systemtap unpackaged]
  Finishing plugins              [Running: systemtap]

  Finished running plugins

Creating compressed archive...

Your sosreport has been generated and saved in:
    /var/tmp/sosreport-ol9-1-2023-08-05-aaevrlq.tar.xz

 Size   10.52MiB
 Owner  root
 sha256 1cb78f8f8114bd1f11b477eb30ed90b7095a03c648e2613cca8ea20b1102210e

Please send this file to your support representative.

[opc@ol9-1 ~]$

Notice that four plugins run at the same time. This helps reduce the amount of time it takes for sos report to complete. Want to see all the plugins that are active and inactive as well as the available plugin options? Try this:

sudo sos report -l

You can enable (-e) and disable (-n) plugins as you wish from the command line.

In defense of sar

You may be asking yourself why I’m so hung up on sar. Well, I do realize there are much better tools out there for performance monitoring. Performance Co-Pilot for example. However sar provides an entire month’s worth of data in just a few MB. It can provide a nice bird’s-eye view, and can show you trends that would be missed by more granular data which is typically kept for a shorter duration.

Here’s an example of what you can do with a month’s worth of sar data. It’s not as dramatic as some issues I have seen, but the per-day averages for CPU utilization show %user climbing in early August, with the highest values on the 7th and 9th.

202307$ printf "=CPU Util= %s\n" "$(sar -u -f `ls -1rt sa??|head -1` | head -3 | tail -1)";ls -1rt sa?? | while read F; do sar -u -f $F | awk 'BEGIN{S=0} $1 == "Linux"{if (S==0){D=$(NF-3);S=1}} $1 == "Average:"{print D,$0}'; done
=CPU Util= 04:00:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
07/10/2023 Average:        all      0.18      0.00      0.05      0.20      0.00     99.57
07/11/2023 Average:        all      0.18      0.00      0.06      0.30      0.00     99.46
07/11/2023 Average:        all      0.21      0.00      0.04      0.21      0.00     99.53
07/12/2023 Average:        all      0.30      0.00      0.08      0.44      0.00     99.18
07/13/2023 Average:        all      0.37      8.72      0.22      0.41      0.00     90.29
07/14/2023 Average:        all      0.27      2.91      0.10      0.34      0.00     96.38
07/15/2023 Average:        all      0.21      0.00      0.06      0.26      0.00     99.47
07/16/2023 Average:        all      0.29      0.00      0.07      0.33      0.00     99.31
07/17/2023 Average:        all      0.24      0.00      0.07      0.30      0.00     99.39
07/18/2023 Average:        all      1.36      3.72      0.60      0.80      0.00     93.51
07/19/2023 Average:        all      0.47      0.01      0.25      0.94      0.00     98.33
07/20/2023 Average:        all      0.53      3.64      0.48      0.83      0.00     94.52
07/21/2023 Average:        all      0.78      0.00      0.16      0.48      0.00     98.58
07/22/2023 Average:        all      0.48      0.00      0.15      0.46      0.00     98.90
07/23/2023 Average:        all      4.17      0.00      0.47      0.70      0.00     94.65
07/24/2023 Average:        all      2.32      0.00      0.34      0.58      0.00     96.76
07/25/2023 Average:        all      0.60      0.00      0.21      0.53      0.00     98.66
07/26/2023 Average:        all      1.25      0.00      0.39      0.81      0.00     97.55
07/27/2023 Average:        all      0.98      0.00      0.24      0.58      0.00     98.21
08/01/2023 Average:        all      1.34      0.01      0.36      0.81      0.00     97.49
08/02/2023 Average:        all      1.55      0.00      0.43      1.23      0.00     96.79
08/03/2023 Average:        all      2.16      0.00      0.48      1.13      0.00     96.23
08/04/2023 Average:        all      2.26      3.48      0.74      1.13      0.00     92.39
08/05/2023 Average:        all      3.02      0.00      0.59      1.15      0.00     95.25
08/06/2023 Average:        all      6.45      0.00      0.91      1.42      0.00     91.21
08/07/2023 Average:        all      7.59      0.00      1.53      3.45      0.00     87.43
08/08/2023 Average:        all      6.36      0.00      1.10      2.07      0.00     90.47
08/09/2023 Average:        all      8.23      0.00      1.70      3.23      0.00     86.83
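Once the daily averages are in this shape, picking out the busy days can be automated. The following is a minimal sketch, assuming input lines laid out exactly like the listing above (date, "Average:", "all", then %user as the fourth field). The function name flag_busy_days and the threshold of 5 are arbitrary choices for illustration.

```shell
#!/bin/sh
# flag_busy_days THRESHOLD: read per-day average lines on stdin and print the
# date and %user of every day whose %user is greater than THRESHOLD.
flag_busy_days() {
    awk -v limit="$1" '$2 == "Average:" && $4 + 0 > limit + 0 { print $1, $4 }'
}

# Sample rows copied from the listing above.
flag_busy_days 5 <<'EOF'
07/23/2023 Average:        all      4.17      0.00      0.47      0.70      0.00     94.65
08/06/2023 Average:        all      6.45      0.00      0.91      1.42      0.00     91.21
08/09/2023 Average:        all      8.23      0.00      1.70      3.23      0.00     86.83
EOF
```

With these three sample rows only the two August days are printed, because 4.17 is below the threshold.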

If sysstat is not already installed, we recommend installing and enabling it so that sar data is being collected before you need it.

sudo dnf install sysstat
sudo systemctl enable --now sysstat

Output

sos report writes its output files to /var/tmp. You can change this with the --tmp-dir option. On OL6 the files are written to /tmp with no option available. sos report creates two files: the sos report itself, which is a compressed tar archive, and a checksum file that you can use to verify the integrity of the sos report.
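Checking the archive against its checksum file before uploading takes only a few lines. This is a sketch under stated assumptions: the checksum file is named after the archive with a .sha256 suffix (the OL9 run above reports a sha256 digest), and its first whitespace-separated field is the hex digest. The function name verify_sos_archive is hypothetical; adjust the suffix and tool if your release produces a different checksum type.

```shell
#!/bin/sh
# verify_sos_archive ARCHIVE: succeed only if ARCHIVE matches ARCHIVE.sha256.
# Returns 2 if either file is unreadable, 1 on a mismatch, 0 on a match.
verify_sos_archive() (
    archive=$1
    [ -r "$archive" ] && [ -r "$archive.sha256" ] || exit 2
    # "|| true" tolerates a checksum file that lacks a trailing newline.
    read -r expected _ < "$archive.sha256" || true
    actual=$(sha256sum "$archive" | awk '{ print $1 }')
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
)

# Usage, after copying the archive and its checksum file off the system:
#   verify_sos_archive sosreport-ol9-1-2023-08-05-aaevrlq.tar.xz && echo OK
```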

When to run?

Please run sos report as soon as possible after a problem has happened. If your system crashed, run it as soon as the system is back up. You can also run sos report anytime you want a baseline of the system’s configuration. Before making a big change on your system, you can save a copy of the sos report offline so you can refer to it if things go sideways. We have even recommended to some customers that they take a fresh sos report at boot via systemd.
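One way the boot-time collection could be wired up is with a oneshot systemd service. The sketch below only writes a unit file into a directory you name. Everything in it is an illustrative assumption rather than a recommendation from the sos project: the unit name sos-boot-report.service, the /usr/sbin/sos path, and the option set (taken from the OL8 command earlier in this article; OL9 would also need -e sar).

```shell
#!/bin/sh
# write_sos_boot_unit DIR: write a oneshot unit into DIR that collects an sos
# report once per boot. The unit name, binary path and options are assumptions
# made for this example; verify them on your own system first.
write_sos_boot_unit() {
    cat > "$1/sos-boot-report.service" <<'EOF'
[Unit]
Description=Collect an sos report after boot

[Service]
Type=oneshot
ExecStart=/usr/sbin/sos report --batch --all-logs -k sar.all_sar=on

[Install]
WantedBy=multi-user.target
EOF
}

# Usage (as root): write_sos_boot_unit /etc/systemd/system
```

After writing the file to /etc/systemd/system, running systemctl daemon-reload and systemctl enable sos-boot-report.service would arm it for the next boot. Keep an eye on the space used in /var/tmp, since every boot adds another archive.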

Let’s take a peek inside

Once you untar the sos report and cd into the directory it creates you will see several sub-directories and symbolic links. The symbolic links are commonly used information and point to files under the directories. They are placed at the top level for the sake of convenience. Here are some of the symbolic links and directories you will find:

  • cmdline - The command line used to boot the kernel.
  • date - The date/time on the system when the sos report command was run.
  • dmidecode - Hardware information.
  • free - Memory information.
  • hostname - System name. Could be obfuscated if run with --clean (covered later).
  • installed-rpms - List of the installed RPMs and when they were installed.
    • TIP: Want to sort the RPMs by the time they were installed? Try this:

      while read -r R W M D T Y; do EPOCH=$(date +%s -d "${M} ${D} ${T} ${Y}"); printf "%s\t%3s %3s %2d %8s %4d\t%d\n" "${R}" "${W}" "${M}" "${D}" "${T}" "${Y}" "${EPOCH}"; done < installed-rpms | sort -rn -k 7 | less
  • last - Who logged in and when. Plus reboots!
  • lsmod - A list of loaded modules.
  • lspci - Info on installed PCI devices.
  • mount - What filesystems are mounted.
  • uname - Kernel version.
  • uptime - How long the system has been up and load average.
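Because these top-level files have the same names in every sos report, comparing a good system with a bad one is easy to script. Here is a minimal sketch that compares the installed-rpms files of two unpacked sos reports. The helper name rpm_diff is hypothetical, and it assumes the package name-version-release is the first field on each line, as in the examples in this article.

```shell
#!/bin/sh
# rpm_diff GOOD_DIR BAD_DIR: compare the installed-rpms files of two unpacked
# sos reports. Lines starting at column one are packages found only in
# GOOD_DIR; lines indented by a tab are found only in BAD_DIR.
rpm_diff() (
    export LC_ALL=C                      # identical collation for sort and comm
    good=$(mktemp) && bad=$(mktemp) || exit 1
    awk '{ print $1 }' "$1/installed-rpms" | sort > "$good"
    awk '{ print $1 }' "$2/installed-rpms" | sort > "$bad"
    comm -3 "$good" "$bad"               # -3 hides packages present in both
    rm -f "$good" "$bad"
)

# Usage: rpm_diff sosreport-good-host-... sosreport-bad-host-...
```

The same pattern works for other top-level files such as lsmod, although files whose output includes timestamps or counters will need filtering before the comparison is useful.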

Directories

Notice that some of the directories start with sos_. These contain data that was collected by commands run by sos report. The other directories contain files that were copied by sos report.

  • boot - Grub info.
  • etc - System configuration info.
  • proc - Kernel data structures.
  • run - Run time information for processes and services.
  • sos_commands - Contains the majority of data from all the commands that sos report runs. Spend some time checking out this directory and its sub-directories. Here are a few examples:
    • auditd - auditctl and ausearch commands.
    • block - Block commands such as lsblk, blkid, and parted.
    • cgroups - Cgroup listing.
    • devicemapper - Various dmsetup commands.
    • filesys - Data on the mounted file systems.
    • grub2 - Listing of /boot as well as the output from grub2-mkconfig.
    • hardware - Details of the system’s hardware.
    • iscsi - The output from several iscsiadm commands.
    • ksplice - Info about Ksplice patches installed and the effective kernel version.
    • logs - journalctl commands as well as a listing from /var/log.
    • multipath - multipath device information.
    • Well, you get the idea. On OL9 when I ran sos report using the command above there were 79 of these sub directories.
  • sos_logs - Logs from running sos report.
  • sos_reports - html, txt and json files with information about the running of sos report. The html file is interesting as it breaks down all the data that is collected by plugin.
  • sys - Kernel data structure and device information.
  • usr - Information from the /usr file system, including /usr/lib, /usr/libexec, and /usr/share.
  • var - One of the most useful of system directories because it contains /var/log.

Here’s a use case

The customer has been alerted by their monitoring software to a “Link Down” message in their /var/log/messages file. The first thing we should do is gather some basic information about the customer’s system. Thankfully they have uploaded an sos report.

# cd sosreport-my-test-2023-07-12-dfbucbc/

# cat uname
Linux my-test 5.4.17-2136.315.5.el8uek.x86_64 #2 SMP Wed Dec 21 19:38:18 PST 2022 x86_64 x86_64 x86_64 GNU/Linux

# grep Product dmidecode
Product Name: ORACLE SERVER X6-2L

# grep device-mapper-multipath installed-rpms
device-mapper-multipath-0.8.4-22.el8.x86_64 Tue Nov 1 14:56:32 2022

From this data we know that this is an OL8 system running UEK6. We can see the hardware type, and the version of multipath that is running. Now we do some research to discover if there are any known issues with any of these versions. Next, having not found any applicable bugs, we check the error message and, with the PCI bus address, we can check the card type.

# grep -i "link down" var/log/messages
Jul 12 10:54:33 my-test kernel: lpfc 0000:23:00.1: 1:1305 Link Down Event x2 received Data: x2 x20 x800110 x0 x0

# grep "23:00.1" lspci
23:00.1 Fibre Channel [0c04]: Emulex Corporation LPe15000/LPe16000 Series 8Gb/16Gb Fibre Channel Adapter [10df:e200] (rev 30)
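The two lookups above can be chained so that every device reporting a link down event is resolved to its lspci entry in one step. This is a sketch, assuming the kernel message carries a full PCI address with a four-digit domain (0000:23:00.1) while the lspci file omits the domain, as in this example. The function name link_down_adapters is hypothetical.

```shell
#!/bin/sh
# link_down_adapters SOS_DIR: print the lspci line for every PCI device that
# logged a "Link Down" message in SOS_DIR/var/log/messages.
link_down_adapters() (
    cd "$1" || exit 1
    grep -i "link down" var/log/messages |
        grep -Eo '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]' |
        sort -u |
        while read -r addr; do
            # lspci omits the leading PCI domain ("0000:"), so strip it.
            grep "^${addr#*:}" lspci
        done
)

# Usage: link_down_adapters sosreport-my-test-2023-07-12-dfbucbc
```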

We can now check if there are any known issues with this type of adapter. If there aren’t any we can recommend to the customer that they have the hardware checked, but we also want to make sure that multipath did its job.

# grep multipathd var/log/messages | grep "Jul 12"  | grep mpathr
Jul 12 10:54:38 my-test multipathd[5598]: checker failed path 135:224 in map mpathr
Jul 12 10:54:38 my-test multipathd[5598]: mpathr: remaining active paths: 15
Jul 12 10:54:39 my-test multipathd[5598]: mpathr: remaining active paths: 14
Jul 12 10:54:39 my-test multipathd[5598]: mpathr: remaining active paths: 13
Jul 12 10:54:40 my-test multipathd[5598]: checker failed path 128:32 in map mpathr
Jul 12 10:54:40 my-test multipathd[5598]: mpathr: remaining active paths: 12

Here we can see that multipath worked correctly and the customer still has 12 active paths serving data.
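When many maps are involved, scanning the log by eye gets tedious. The sketch below reports the lowest "remaining active paths" count logged for each map, assuming multipathd lines in the same layout as the output above. The function name lowest_remaining_paths is hypothetical.

```shell
#!/bin/sh
# lowest_remaining_paths FILE: for each multipath map mentioned in FILE, print
# the smallest "remaining active paths" count that multipathd logged.
lowest_remaining_paths() {
    awk '/multipathd/ && /remaining active paths:/ {
             map = $(NF - 4); sub(/:$/, "", map)   # "mpathr:" -> "mpathr"
             n = $NF + 0
             if (!(map in low) || n < low[map]) low[map] = n
         }
         END { for (m in low) print m, low[m] }' "$1" | sort
}

# Usage, from inside the unpacked sos report:
#   lowest_remaining_paths var/log/messages
```

A map whose count reaches 0 has lost every path, which is a very different conversation from the one in this use case.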

Obfuscation

If you really, really need to, you can obfuscate the data in sos report. We discourage this as it may make troubleshooting harder. Obfuscation takes certain data from the sos report and replaces it with a generic value, thus hiding the true value. The obfuscation tools will hide hostnames, domains, usernames, IP addresses, and, in the case of OL8 and OL9, user-provided keywords.

Our analysis of your sos report will be based on the obfuscated values. You will need to translate the generic values into real-world values by consulting the translation tables produced at the same time the data is obfuscated.

OL7

On OL7 you can obfuscate an sos report only after it has been collected. You do this with the soscleaner tool. There is no soscleaner tool for OL6.

Installation

sudo yum install soscleaner

Obfuscation after the fact

sudo soscleaner sosreport-jyoder-ol7-1-2023-08-05-qklnggo.tar.xz

Here are the files it produces in /tmp:

  • soscleaner-3194083558969556-dn.csv - Translation table for domain names. Specify domain names to obfuscate with -d or --domain=.
  • soscleaner-3194083558969556-hostname.csv - Translation table for hostnames.
  • soscleaner-3194083558969556-ip.csv - Translation table for IP addresses.
  • soscleaner-3194083558969556.log - Log file from running soscleaner command.
  • soscleaner-3194083558969556.tar.gz - The obfuscated sos report. Upload this file only.

Here’s what the hostname table looks like:

Obfuscated Hostname,Original Hostname
host1.example.com,jyoder-ol7-1.allregionaliads.osdevelopmeniad.oraclevcn.com
host0,jyoder-ol7-1
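Translating a generic value back to the real one is a simple lookup against these tables. Here is a minimal sketch, assuming a CSV laid out like the hostname table above (one header row, then obfuscated,original pairs). The function name real_hostname is hypothetical.

```shell
#!/bin/sh
# real_hostname CSV NAME: print the original hostname that was replaced by
# NAME. Exits non-zero if NAME is not in the table.
real_hostname() {
    awk -F, -v name="$2" 'NR > 1 && $1 == name { print $2; found = 1 }
                          END { exit !found }' "$1"
}

# Usage: real_hostname soscleaner-3194083558969556-hostname.csv host0
```

Keep the translation tables on your side. They are what lets you map our analysis back to your real systems, and uploading them would undo the obfuscation.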

OL8 and OL9

Obfuscation after the fact

sudo sos clean --batch --keywords SecretApp sosreport-ol9-1-2023-08-05-qtskqpx.tar.xz

In this example I’m passing the --keywords option and giving it the name of an application that I want to keep secret (appropriately named “SecretApp”). You can pass --keywords a comma-separated list of additional words that you want obfuscated in the sos report. Every occurrence of these keywords (“SecretApp” in this case) will be obfuscated, in addition to the default obfuscation of hostnames, domains, IPs and usernames. We’ll see an example of what the --keywords option does below. After we run this command the following files are created in /var/tmp:

  • sosreport-host0-2023-08-05-qtskqpx-obfuscated.tar.xz - The obfuscated sos report. Upload this file only.
  • sosreport-host0-2023-08-05-qtskqpx-private_map - The obfuscation mapping in JSON format.
  • sosreport-host0-2023-08-05-qtskqpx-obfuscated.tar.xz.sha256 - Checksum for the obfuscated sos report.
  • sosreport-ol9-1-2023-08-05-qtskqpx-obfuscation.log - The log from the obfuscation process.

Here is an example of the effect of the above command on /var/log/messages:

BEFORE

Aug  5 18:10:33 ol9-1 SecretApp[24833]: Started
Aug  5 18:11:38 ol9-1 SecretApp[24836]: ERROR: Can not continue. Aborting.
Aug  5 18:13:33 ol9-1 dracut[25537]: dracut-057-21.git20230214.0.2.el9
Aug  5 18:13:33 ol9-1 dracut[25539]: Executing: /usr/bin/dracut --list-modules
Aug  5 18:13:44 ol9-1 systemd[1]: Starting Hostname Service...
Aug  5 18:13:44 ol9-1 systemd[1]: Started Hostname Service.

AFTER

Aug  5 18:10:33 host0 obfuscatedword0[24833]: Started
Aug  5 18:11:38 host0 obfuscatedword0[24836]: ERROR: Can not continue. Aborting.
Aug  5 18:13:33 host0 dracut[25537]: dracut-057-21.git20230214.0.2.el9
Aug  5 18:13:33 host0 dracut[25539]: Executing: /usr/bin/dracut --list-modules
Aug  5 18:13:44 host0 systemd[1]: Starting Hostname Service...
Aug  5 18:13:44 host0 systemd[1]: Started Hostname Service.

You can see why it might make troubleshooting a little more cumbersome.

Obfuscation during collection

You can also obfuscate the sos report as you collect it. This option is available for OL8 and OL9 only.

sudo sos report --clean --batch --keywords SecretApp --all-logs -e sar -k sar.all_sar=on

Here’s what files were created:

  • sosreport-host0-2023-08-05-ncgpuos-private_map
  • sosreport-host0-2023-08-05-ncgpuos-obfuscated.tar.xz - Upload this file only.
  • sosreport-host0-2023-08-05-ncgpuos-obfuscated.tar.xz.sha256

Note: The hostname is automatically obfuscated. You can disable this if you want. Check the man page for sos-clean.

sos.conf

Don’t want to have to remember all those command line options every time? Well, you don’t have to. You can configure all the command line options in /etc/sos/sos.conf, and then just run sos report. Here’s an example from an OL9 system using the options I list above.

$ egrep -v "^#|^$" /etc/sos/sos.conf
[global]
batch = yes
[report]
enable-plugins = sar
all-logs = yes
[collect]
[clean]
keywords = SecretApp
[plugin_options]
sar.all_sar=on

Conclusion

I would encourage you to take some time to run a few iterations of sos report on your system and check out all of the data you can find there. Make sure you are always running the latest version. We love sos reports because they make the job easier for everyone. If you have any issues with sos report, please open a Service Request with Oracle Linux Support, and we will get it sorted right away.

Jeffery Yoder
