
News, tips, partners, and perspectives for the Oracle Linux operating system and upstream Linux kernel work

Faster Startup Times with Oracle Linux 7

Steve Sistare is a kernel development architect at Oracle. In this blog post, he gives tips and tricks for making your systems boot faster. 

In October, Pasha Tatashin shared his work on booting the kernel substantially faster, especially for large systems.

However, booting the kernel is only part of the job, as services must also be started in userland. That time can be significant, and it depends on the configuration. Oracle Linux is configured to satisfy a wide range of requirements by default, but if you are willing to tweak the configuration, you can substantially reduce your boot time.

Silence the console

The largest single improvement you can make is to add the "quiet" option to the kernel command line, which stops kernel messages from being printed to the console. These messages can consume many seconds during boot when the console is connected via a serial interface with a low baud rate. On an x86 system running Oracle Linux 7 with the default ILOM serial console rate of 9600 baud, the kernel+initrd+userspace time reported by systemd-analyze dropped from 62.259 seconds to 21.425 seconds after enabling the quiet option. You can instead increase the serial rate to 115200 baud, but doing so is a pain: you must change it in the ILOM, in the BIOS, in grub, and in the kernel command line arguments. Even then, quiet is still a few seconds faster than 115200 baud.

Also, the kernel messages are not lost; they can still be viewed with the dmesg and journalctl commands, provided the kernel boots successfully! Don't use quiet during kernel development, when you need to see the kernel output up to the point of failure. Lastly, you won't see much improvement from quiet when booting VMs that use paravirtualized (as opposed to emulated) console devices, because they are not limited by a serial baud rate.
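As a rough back-of-the-envelope check on why the baud rate matters so much, the Python sketch below estimates the minimum time needed just to push a given number of bytes through a serial port. It assumes 8N1 framing (1 start bit, 8 data bits, 1 stop bit, so 10 bits per character), and the 40,000-byte figure is a hypothetical volume of console output, not a measurement; the real amount depends on your hardware and kernel.

```python
# Back-of-the-envelope estimate of serial console output time.
# Assumption: 8N1 framing, i.e. 1 start + 8 data + 1 stop bit = 10 bits per character.
BITS_PER_CHAR = 10


def serial_print_seconds(num_bytes: int, baud: int, bits_per_char: int = BITS_PER_CHAR) -> float:
    """Minimum time (seconds) needed to transmit num_bytes at the given baud rate."""
    if baud <= 0:
        raise ValueError("baud rate must be positive")
    return num_bytes * bits_per_char / baud


# Hypothetical amount of console output; this is NOT a measured value.
CONSOLE_BYTES = 40_000

for rate in (9600, 115200):
    print(f"{rate:>6} baud: {serial_print_seconds(CONSOLE_BYTES, rate):.1f} s")
```

Whatever the real byte count is, the transmission time scales inversely with the rate, so 9600 baud takes 12 times as long as 115200 baud for the same output.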

To enable the quiet option:

  • Edit /etc/default/grub
  • Add "quiet" to the GRUB_CMDLINE_LINUX string
  • Run "grub2-mkconfig -o /boot/grub2/grub.cfg"
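If you prefer to script the second step, here is a minimal Python sketch. It assumes the value of GRUB_CMDLINE_LINUX is enclosed in double quotes on a single line, which is the usual layout of /etc/default/grub; the function name add_kernel_arg is hypothetical, and the function only returns modified text, leaving the actual file write and the grub2-mkconfig run to you.

```python
import re

# Assumption: GRUB_CMDLINE_LINUX="..." sits on one line with a double-quoted value.
# GRUB_CMDLINE_LINUX_DEFAULT is not matched, because "=" must follow LINUX directly.
_CMDLINE_RE = re.compile(r'^(GRUB_CMDLINE_LINUX=")([^"]*)(")\s*$')


def add_kernel_arg(grub_defaults: str, arg: str = "quiet") -> str:
    """Return /etc/default/grub text with `arg` appended to GRUB_CMDLINE_LINUX.

    No second copy is added if the argument is already present.
    """
    out = []
    for line in grub_defaults.splitlines(keepends=True):
        m = _CMDLINE_RE.match(line.rstrip("\n"))
        if m:
            args = m.group(2).split()
            if arg not in args:
                args.append(arg)
            ending = "\n" if line.endswith("\n") else ""
            line = m.group(1) + " ".join(args) + m.group(3) + ending
        out.append(line)
    return "".join(out)
```

For example, read /etc/default/grub, pass its contents to add_kernel_arg(), compare the result with the original, and write it back as root before running grub2-mkconfig.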

Use up-to-date packages

Use the latest Oracle Linux updates whenever possible to get the latest packages and optimizations.  For example, an older version of the NetworkManager package had a bug where a gratuitous "carrier wait" timeout was enabled, delaying the boot by 5 seconds.  This was fixed in in OL7.4.  Similarly, I analyzed a case where the kbd package was waiting 5 seconds for a PUT_FONT ioctl to succeed, on a KVM guest with no graphical console, so waiting was futile.  Worse yet, this occurred once during the initd boot phase, and again during the userspace phase, for a total delay of 10 seconds.  This is also fixed in OL7.4.

Tune grub

Reduce or eliminate delays in grub.  By default grub pauses for 5 seconds so you can choose a different kernel or modify kernel parameters, but this is configurable.  Set it to 1 if you want this crutch for emergencies.  Set it to 0 if you are running under KVM and have other ways to modify the parameters when a kernel fails to boot, such as by manually mounting your guest image and editing the grub files directly. 

To change the timeout:

  • Edit /etc/default/grub
  • Change GRUB_TIMEOUT=5 to GRUB_TIMEOUT=1 (or 0)
  • Run "grub2-mkconfig -o /boot/grub2/grub.cfg"
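A similar sketch handles the timeout. The helper set_grub_timeout is hypothetical; it rewrites an existing GRUB_TIMEOUT= line or appends one if none exists, and it deliberately rejects negative values, because grub treats -1 as "wait indefinitely", which is the opposite of what we want here.

```python
import re

# Matches only the GRUB_TIMEOUT= line, not e.g. GRUB_TIMEOUT_STYLE=.
_TIMEOUT_RE = re.compile(r"^GRUB_TIMEOUT=.*$", re.MULTILINE)


def set_grub_timeout(grub_defaults: str, seconds: int) -> str:
    """Return /etc/default/grub text with GRUB_TIMEOUT set to `seconds`.

    Replaces an existing GRUB_TIMEOUT= line, or appends one if none exists.
    """
    if seconds < 0:
        raise ValueError("timeout must be 0 or more seconds")
    new_line = f"GRUB_TIMEOUT={seconds}"
    if _TIMEOUT_RE.search(grub_defaults):
        return _TIMEOUT_RE.sub(new_line, grub_defaults)
    # No existing setting: append one, keeping the file newline-terminated.
    sep = "" if (not grub_defaults or grub_defaults.endswith("\n")) else "\n"
    return f"{grub_defaults}{sep}{new_line}\n"
```

As with the kernel arguments, regenerate grub.cfg with grub2-mkconfig after writing the new text back.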

Check autofs

Verify that autofs is configured correctly.  Recent changes in this package cause a 10-second delay at boot time if automount entries in /etc/nsswitch.conf are misconfigured, and you will see a message like "automount[969]: problem reading master map, maximum wait exceeded" in the journal.  See https://lkml.org/lkml/2017/5/26/64 for more details.
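To check for this symptom after a boot, you can scan the current boot's journal for the message quoted above. The Python sketch below is one way to do it; it assumes the message text stays as shown (only the automount PID changes), and boot_journal_lines is a hypothetical helper that simply runs journalctl -b. Reading system messages may require root or membership in the systemd-journal group.

```python
import subprocess

# Message text quoted above; the PID in "automount[969]" varies from boot to boot.
AUTOFS_TIMEOUT_MSG = "problem reading master map, maximum wait exceeded"


def find_autofs_timeouts(journal_lines):
    """Return the journal lines in which automount reports the master-map timeout."""
    return [line for line in journal_lines
            if "automount[" in line and AUTOFS_TIMEOUT_MSG in line]


def boot_journal_lines():
    """Return the journal of the current boot as a list of lines (runs `journalctl -b`)."""
    result = subprocess.run(["journalctl", "-b", "--no-pager"],
                            stdout=subprocess.PIPE, universal_newlines=True, check=True)
    return result.stdout.splitlines()
```

find_autofs_timeouts(boot_journal_lines()) returns an empty list when the delay did not occur; any hits point back to the automount entries in /etc/nsswitch.conf.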

 

After following the above tips, let's see how we are doing. Use the systemd-analyze command to see the overall boot time and the time taken by the top services. Here I boot an OL7.4 KVM guest on an Intel Xeon Platinum CPU:

<... boot and log in to the console ...>

    # systemd-analyze
    Startup finished in 697ms (kernel) + 879ms (initrd) + 3.877s (userspace) = 5.454s
    # systemd-analyze blame
              1.799s kdump.service
              1.166s NetworkManager-wait-online.service
               396ms postfix.service
               321ms network.service
               247ms systemd-udev-settle.service
               244ms tuned.service
               151ms dev-mapper-ol\x2droot.device
               148ms lvm2-monitor.service
               119ms lvm2-pvscan@251:2.service
               113ms plymouth-quit.service
               112ms plymouth-quit-wait.service
               104ms systemd-vconsole-setup.service
               ...
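If you want to track these numbers across many reboots, it can help to parse them. The Python sketch below is written against the output format shown above; it is not an official interface, and other systemd versions may word the line differently (for example, adding firmware and loader phases on UEFI systems, or printing longer times as "1min 2.259s"). The function names are hypothetical.

```python
import re
import subprocess

# One duration token, e.g. "3.877s", "697ms", or the "1min" in "1min 2.259s".
_TOKEN_RE = re.compile(r"(\d+(?:\.\d+)?)(h|min|ms|us|s)\b")
_UNIT_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6}


def parse_duration(text: str) -> float:
    """Convert a systemd duration such as '3.877s', '697ms' or '1min 2.259s' to seconds."""
    tokens = _TOKEN_RE.findall(text)
    if not tokens:
        raise ValueError(f"no duration found in {text!r}")
    return sum(float(value) * _UNIT_SECONDS[unit] for value, unit in tokens)


def parse_startup(line: str) -> dict:
    """Parse a 'Startup finished in ...' line into {phase: seconds, 'total': seconds}."""
    m = re.match(r"Startup finished in (.+) = (.+)$", line.strip())
    if not m:
        raise ValueError("not a systemd-analyze startup line")
    phases = {}
    for part in m.group(1).split(" + "):
        pm = re.match(r"(.+?)\s+\((\w+)\)$", part.strip())  # e.g. "697ms (kernel)"
        if not pm:
            raise ValueError(f"unexpected phase {part!r}")
        phases[pm.group(2)] = parse_duration(pm.group(1))
    phases["total"] = parse_duration(m.group(2))
    return phases


def current_startup() -> dict:
    """Run systemd-analyze and parse its first output line."""
    out = subprocess.run(["systemd-analyze"], stdout=subprocess.PIPE,
                         universal_newlines=True, check=True).stdout
    return parse_startup(out.splitlines()[0])
```

The per-phase values are rounded individually, so they may not add up exactly to the printed total: 697ms + 879ms + 3.877s is 5.453s, while the total above reads 5.454s.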

Tune the bridge

Next, if you are using KVM with a bridge interface between the host and guest, and you know that the network topology has no loops, then disable the spanning tree protocol (STP) on the bridge.  This saves 2 or more seconds of guest boot time.  When STP is enabled, the bridge drops all packets received from a newly discovered address until the forwarding delay has expired; this avoids packet flooding when loops are present.  Hence the guest's DHCP request packet is dropped, leading to dhclient timeout and retry.

To see your bridges:

    # brctl show
    bridge name     bridge id               STP enabled     interfaces
    virbr0          8000.525400fe7ca3       yes             virbr0-nic

To see the forwarding delay:

    # brctl showstp virbr0 | grep forward
    forward delay       2.00       bridge forward delay       2.00

To disable STP:

    # brctl stp virbr0 off
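On a host with several bridges, the Python sketch below picks out the ones that still have STP enabled by reading brctl show output like the example above, and builds (rather than runs) the matching brctl stp commands so you can review them first. It assumes the usual column layout (bridge name, bridge id, STP enabled, interfaces) and is only appropriate where you already know the topology has no loops. The function names are hypothetical.

```python
def stp_enabled_bridges(brctl_show_output: str) -> list:
    """Return bridge names whose 'STP enabled' column is 'yes' in `brctl show` output."""
    bridges = []
    for line in brctl_show_output.splitlines()[1:]:  # first line is the column header
        # Lines listing additional interfaces of a bridge start with whitespace.
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        # fields: bridge name, bridge id, STP enabled, [first interface]
        if len(fields) >= 3 and fields[2] == "yes":
            bridges.append(fields[0])
    return bridges


def stp_off_commands(bridges: list) -> list:
    """Build (but do not execute) the commands that would disable STP on each bridge."""
    return [f"brctl stp {name} off" for name in bridges]
```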

If you are using libvirt, disable STP in the default network configuration so that the change persists across host reboots:

    # virsh net-edit default
    Find the line "<bridge name='virbr0' stp='on' delay='0'/>"
    and change stp='on' to stp='off'.

(At first glance, you might think that delay='0' would solve the problem.  However, the kernel enforces a minimum value of 2 seconds.)
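If you would rather not edit the XML interactively, the same change can be scripted. The Python sketch below (the function name is hypothetical) sets stp='off' on the <bridge> element of a libvirt network definition and leaves delay='0' alone, since the kernel enforces its own minimum anyway. One way to use it: save the output of "virsh net-dumpxml --inactive default" to a file, run the function on its contents, and load the result with "virsh net-define"; as with net-edit, the new setting takes effect the next time the network is started.

```python
import xml.etree.ElementTree as ET


def disable_stp_in_network_xml(xml_text: str) -> str:
    """Return a libvirt network definition with stp='off' on its <bridge> element."""
    root = ET.fromstring(xml_text)
    bridge = root.find("bridge")  # direct child of <network>
    if bridge is None:
        raise ValueError("network definition has no <bridge> element")
    bridge.set("stp", "off")
    # delay is left unchanged; the kernel enforces a 2-second minimum anyway.
    return ET.tostring(root, encoding="unicode")
```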

After disabling STP, we get the following boot times. Note that the NetworkManager-wait-online time is much smaller, because we have eliminated the DHCP delay:

    $ systemd-analyze
    Startup finished in 678ms (kernel) + 903ms (initrd) + 2.746s (userspace) = 4.328s
    $ systemd-analyze blame
              1.700s kdump.service
               397ms postfix.service
               294ms network.service
               252ms systemd-udev-settle.service
               206ms tuned.service
               203ms dev-mapper-ol\x2droot.device
               168ms NetworkManager-wait-online.service
               157ms plymouth-quit-wait.service
               157ms plymouth-quit.service
               135ms lvm2-monitor.service
                95ms systemd-vconsole-setup.service
               ...
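To see at a glance which units gained the most from a change like this, you can diff two blame listings. The Python sketch below is hypothetical helper code, not part of systemd; it assumes the blame format shown above (a duration followed by the unit name) and only compares units that appear in both listings, since a unit missing from one of them may simply have been cut from an abbreviated listing like the "..." above.

```python
import re

# One duration token, e.g. "1.799s", "396ms", or the "1min" in "1min 2.3s".
_TOKEN_RE = re.compile(r"(\d+(?:\.\d+)?)(h|min|ms|us|s)\b")
_UNIT_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6}


def parse_blame(text: str) -> dict:
    """Map unit name -> seconds from `systemd-analyze blame` output."""
    times = {}
    for line in text.splitlines():
        # The unit name is the last field; everything before it is the duration.
        duration, _, unit = line.strip().rpartition(" ")
        tokens = _TOKEN_RE.findall(duration)
        if tokens:  # skips blank lines and "..." placeholders
            times[unit] = sum(float(v) * _UNIT_SECONDS[u] for v, u in tokens)
    return times


def biggest_savings(before: dict, after: dict, top: int = 5) -> list:
    """Return (unit, seconds saved) for units in both listings, largest saving first."""
    common = before.keys() & after.keys()
    deltas = [(unit, before[unit] - after[unit]) for unit in common]
    deltas.sort(key=lambda item: item[1], reverse=True)
    return deltas[:top]
```

Fed with the two listings above, NetworkManager-wait-online.service comes out on top, at 1.166 - 0.168 = 0.998 seconds saved.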

Disable optional services

Disable services you don't need, using "systemctl disable <name.service>". Use the "systemd-analyze blame" command to see the candidates, as shown above. For example, if you don't need a kernel crash dump after a panic, or are willing to wait until a problem occurs to enable it, then disabling kdump.service saves almost 2 seconds of boot time. If you are not running a mail server, disable postfix.service. Unconvinced that tuned.service does any noticeable tuning? Disable it.
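Choosing what to disable is a judgment call, but a short script can at least shortlist the slow services. The Python sketch below (a hypothetical helper, assuming the blame format shown above) keeps only .service units at or above a threshold you choose. Keep in mind that services start in parallel, so the blame times of the services you disable do not simply add up to the wall-clock time you will save.

```python
import re

# One duration token, e.g. "1.799s", "396ms", or the "1min" in "1min 2.3s".
_TOKEN_RE = re.compile(r"(\d+(?:\.\d+)?)(h|min|ms|us|s)\b")
_UNIT_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6}


def slow_services(blame_text: str, threshold_s: float = 0.2) -> list:
    """Return (service, seconds) for *.service units at or above threshold_s,
    in the order systemd-analyze blame printed them (slowest first)."""
    found = []
    for line in blame_text.splitlines():
        duration, _, unit = line.strip().rpartition(" ")
        tokens = _TOKEN_RE.findall(duration)
        if not tokens or not unit.endswith(".service"):
            continue  # blank line, "..." placeholder, or a non-service unit
        seconds = sum(float(v) * _UNIT_SECONDS[u] for v, u in tokens)
        if seconds >= threshold_s:
            found.append((unit, seconds))
    return found
```

Against the second blame listing above with the default 0.2-second threshold, this shortlists kdump.service, postfix.service, network.service, systemd-udev-settle.service and tuned.service; network.service is a good reminder that not everything slow is optional.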

Putting it all together:

    $ systemctl disable kdump.service
    $ systemctl disable postfix.service
    $ systemctl disable tuned.service

    <... reboot and log in to the console ...>
    $ systemd-analyze
    Startup finished in 681ms (kernel) + 863ms (initrd) + 1.136s (userspace) = 2.681s
    $ systemd-analyze blame
               299ms network.service
               273ms systemd-udev-settle.service
               178ms dev-mapper-ol\x2droot.device
               160ms NetworkManager-wait-online.service
               149ms lvm2-monitor.service
               148ms plymouth-quit-wait.service
               136ms plymouth-quit.service
                94ms systemd-vconsole-setup.service
               ...

Cool.
