The Linux kernel contains more than 1,500 tunables – and setting these parameters correctly can significantly improve system performance and utilization! For years, we’ve tried to provide the right suggestions for these tunables, via software release notes and improved default values, but many system loads will benefit from dynamic tuning of these values.
Introducing bpftune, an automatic configurator that monitors your workloads and sets the correct kernel parameter values! bpftune is an open source project, available via dnf install from the Oracle Linux ol_developer repositories and at https://github.com/oracle-samples/bpftune.
bpftune aims to provide lightweight, always-on auto-tuning of system behaviour.
It is currently focused on some of the most common issues with tunables we have run into at Oracle, but with a pluggable infrastructure that is open to contributions. We hope you find it useful too!
Even as the number of sysctls in the kernel grows, individual systems get far less care and administrator attention than they used to; phrases like “cattle, not pets” exemplify this. Given the modern cloud architectures used for most deployments, most systems never see any human administrator interaction after initial provisioning; in fact, given the scale requirements, this is often an explicit design goal: “no ssh’ing in!”.
These two observations are not unrelated; in an earlier era of fewer, larger systems, tuning by administrators was more feasible.
These trends - growing system complexity combined with minimal administrator interaction - suggest a rethink of how tunables are managed.
A lot of lore accumulates around these tunables, and to help clarify why we developed bpftune, consider a straw-man version of the traditional approach to tunables:
“find the set of magic numbers that will work for the system forever”
This is obviously a caricature of how administrators approach the problem, but it does highlight a critical implicit assumption - that systems are static.
And that gets to the “BPF” in bpftune; BPF provides the means to carry out low-overhead observations of a system. So not only can we observe the system and tune appropriately, we can also observe the effect of that tuning and re-tune if necessary. This is a key feature of bpftune which we will return to.
bpftune is a daemon that manages a set of plugin tuners; each is a shared object (.so) loaded at start-up.
Tuners can be enabled or disabled; a tuner is automatically disabled if the admin changes associated tunables manually. Tuners share a global BPF ring buffer which allows posting of events from BPF programs to userspace. For example, if the sysctl tuner sees a sysctl being set, it posts an event. Each tuner has an associated id (set when it is loaded), and events posted contain the tuner id.
Each tuner has a BPF component (built using a BPF skeleton) and a userspace component. The latter has init(), fini() and event_handler() entrypoints. When an event is received, the tuner id identifies the appropriate tuner, and that tuner's event_handler() callback is run. The init(), fini() and event_handler() functions are loaded from the tuner .so object.
bpftune is also available in the ol9_developer and ol8_developer repositories for Oracle Linux and can be installed via:
$ sudo yum install --enablerepo=ol9_developer bpftune
For OL8:
$ sudo yum install --enablerepo=ol8_developer,ol8_UEKR7 bpftune
To start bpftune as a service:
$ sudo service bpftune start
…and to enable it by default at boot:
$ sudo systemctl enable bpftune
bpftune logs to syslog so /var/log/messages will contain details of any tuning carried out.
bpftune can also be run in the foreground as a program; to redirect output to stdout/stderr, run
$ sudo bpftune -s
On exit, bpftune will summarize any tuning done.
Simply starting bpftune and observing changes made via /var/log/messages can be instructive. For example, on a standard VM with sysctl defaults, I ran
$ service bpftune start
…and went about normal development activities such as cloning git trees from upstream, building kernels, etc. From the log we see some of the adjustments bpftune made to accommodate these activities
$ sudo grep bpftune /var/log/messages
...
Apr 19 16:14:59 bpftest bpftune[2778]: bpftune works fully
Apr 19 16:14:59 bpftest bpftune[2778]: bpftune supports per-netns policy (via netns cookie)
Apr 19 16:18:40 bpftest bpftune[2778]: Scenario 'specify bbr congestion control' occurred for tunable 'TCP congestion control' in global ns. Because loss rate has exceeded 1 percent for a connection, use bbr congestion control algorithm instead of default
Apr 19 16:18:40 bpftest bpftune[2778]: due to loss events for 145.40.68.75, specify 'bbr' congestion control algorithm
Apr 19 16:26:53 bpftest bpftune[2778]: Scenario 'need to increase TCP buffer size(s)' occurred for tunable 'net.ipv4.tcp_rmem' in global ns. Need to increase buffer size(s) to maximize throughput
Apr 19 16:26:53 bpftest bpftune[2778]: Due to need to increase max buffer size to maximize throughput change net.ipv4.tcp_rmem(min default max) from (4096 131072 6291456) -> (4096 131072 7864320)
Apr 19 16:26:53 bpftest bpftune[2778]: Scenario 'need to increase TCP buffer size(s)' occurred for tunable 'net.ipv4.tcp_rmem' in global ns. Need to increase buffer size(s) to maximize throughput
Apr 19 16:26:53 bpftest bpftune[2778]: Due to need to increase max buffer size to maximize throughput change net.ipv4.tcp_rmem(min default max) from (4096 131072 7864320) -> (4096 131072 9830400)
Apr 19 16:29:04 bpftest bpftune[2778]: Scenario 'specify bbr congestion control' occurred for tunable 'TCP congestion control' in global ns. Because loss rate has exceeded 1 percent for a connection, use bbr congestion control algorithm instead of default
Apr 19 16:29:04 bpftest bpftune[2778]: due to loss events for 140.91.12.81, specify 'bbr' congestion control algorithm
Developers can find build dependencies, instructions and source code layout, as well as instructions for contributing to this project, at the source repo: https://github.com/oracle-samples/bpftune