Tracking Linux Stable kernels with UEK

January 18, 2022

The Unbreakable Enterprise Kernel (UEK) for Oracle Linux has been built from a Linux Stable or Long Term Stable (LTS) release since its inception, but in the last few years we've moved even closer to the LTS model for the continuous uptake and delivery of bug fixes. Closely tracking LTS brings Oracle Linux customers several advantages, including faster delivery of security patches and close integration with upstream Linux. The benefits of keeping up to date with upstream stable kernels extend beyond security fixes, but in this blog post we'll focus on the security benefits of LTS.

One of the advantages of tracking upstream stable updates is early access to patches for security vulnerabilities in the kernel. Because UEK regularly takes in Linux Stable updates, customers may find their vulnerability scanners reporting that their kernels are already patched for newly identified CVEs. This happens because many Linux kernel patches are assigned a CVE identifier only after the patch has already landed in the upstream Linux kernel and the upstream stable kernels (though perhaps not in all enterprise distributions). A surprising number of CVEs reported against the Linux kernel have a negative number of days-to-resolution, meaning that they were already fixed when the vulnerability was identified.
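
To make "negative days-to-resolution" concrete, here is a minimal Python sketch. It assumes days-to-resolution is measured as the date a fix became available minus the date the CVE was published; the function name and the sample dates are hypothetical and chosen only for illustration.

```python
from datetime import date
from statistics import mean


def days_to_resolution(cve_published: date, fix_available: date) -> int:
    """Days from CVE publication until a fix was available.

    A negative result means the fix already existed when the CVE was published.
    """
    return (fix_available - cve_published).days


# Hypothetical (CVE published, fix available) pairs, for illustration only.
samples = [
    (date(2021, 10, 1), date(2021, 4, 1)),    # fix predates the CVE
    (date(2021, 10, 1), date(2021, 10, 15)),  # fix follows the CVE
]

deltas = [days_to_resolution(cve, fix) for cve, fix in samples]
print("per-CVE days-to-resolution:", deltas)
print("average:", mean(deltas))
```

A single CVE fixed months in advance is enough to pull the average well below zero, which is why averages across many kernel CVEs can end up negative.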

Understanding CVEs in the Linux Kernel

CVE stands for Common Vulnerabilities and Exposures and is a centralized database of security vulnerabilities against a broad array of products. We use CVE as a shorthand for "security bug fixes", and patching known CVEs is often a requirement for certifications and security audits.

In many cases, a code fix for a security vulnerability is indistinguishable from any other bug fix or logic change. And because any impact on Confidentiality, Integrity or Availability counts as a security impact, a wide range of patches can be identified as security fixes.

The number of CVEs reported against Linux has been growing rapidly in the last few years, as shown in this graph. (The graph also shows how Ksplice updates keep up with externally reported CVEs, so you can stay secure without rebooting.)

Looking at the raw numbers, it would seem that the CVE program is working well -- that common vulnerabilities are reported and patched in relevant products, and that announcements (and vulnerability scanners) are updated to reflect this information. But this doesn't tell the whole story, because the majority of these CVE-enumerated patches were already fixed in the Linux kernel (and the LTS trees) at the time they received the CVE identifier.

The CVE Time Machine

Greg Kroah-Hartman, the upstream Linux Stable maintainer, gave an excellent talk and analysis of the problems with CVEs in the Linux kernel at the Kernel Recipes conference in 2019. The full talk, GregKH on "CVEs are dead, long live the CVE", is on YouTube and is both entertaining and still relevant. It highlights the fact that, on average, the fix for a Linux CVE is available more than 100 days before the "request to fix" arrives. Only a fraction of Linux CVEs are reported before a patch is already available.

Consider CVE-2021-45485. This isn't a particularly "newsworthy" CVE, though it did receive a CVSSv3 rating of "High". The vulnerability was assigned a CVE on 24-Dec-2021, but it had already been patched in upstream Linux for seven months, and we shipped the patch in UEK5 and UEK6 in August and September of 2021. The git changelog doesn't mention the CVE because the commit predates the identifier by more than half a year. This vulnerability is just the most recent example of a trend that has been ongoing for years.

A CVE by any other name

Most CVEs are bug fixes. A security vulnerability can be any program issue that affects the confidentiality, integrity or availability of a service. This definition encompasses the vast majority of kernel patches -- and we have experience in this area. In the early years of our work on Oracle Linux and UEK, we tried to triage each stable patch to decide whether to include it in our enterprise distribution. This was a thankless job for the engineers who combed through the kernel's Stable trees, since by definition every patch included in the Linux Stable trees must "fix a real bug".

Here's a subset of the Stable tree's inclusion requirements:

  • It must be obviously correct and tested.
  • It cannot be bigger than 100 lines, with context.
  • It must fix only one thing.
  • It must fix a real bug that bothers people (not a, "This could be a problem" type thing).
  • It or an equivalent fix must already exist in Linus' tree (upstream).
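
Only the mechanical rules in this list can be checked by a script; "obviously correct" and "fixes a real bug" still need human review. The sketch below is a hypothetical pre-check, not part of any stable tooling. It counts hunk lines (context, added and removed) in a git-style unified diff as one reading of "100 lines, with context", and it looks for an upstream reference in the two forms commonly seen in stable backports (`commit <sha> upstream.` and `[ Upstream commit <sha> ]`); both the counting rule and the accepted forms are assumptions.

```python
import re

MAX_STABLE_LINES = 100  # "It cannot be bigger than 100 lines, with context."

# Two common ways a stable backport names its mainline origin (assumed forms).
UPSTREAM_REF = re.compile(
    r"^(?:commit [0-9a-f]{12,40} upstream\.|\[ Upstream commit [0-9a-f]{12,40} \])$",
    re.MULTILINE,
)


def diff_body_lines(diff: str) -> int:
    """Count context, added and removed lines inside the hunks of a git-style diff."""
    count = 0
    in_hunk = False
    for line in diff.splitlines():
        if line.startswith("diff "):
            in_hunk = False  # new file: skip its index/---/+++ header lines
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and line[:1] in (" ", "+", "-"):
            count += 1
    return count


def stable_precheck(message: str, diff: str) -> list:
    """Return the mechanical stable rules this patch appears to violate."""
    problems = []
    if diff_body_lines(diff) > MAX_STABLE_LINES:
        problems.append("patch exceeds 100 lines with context")
    if not UPSTREAM_REF.search(message):
        problems.append("no upstream commit reference")
    return problems
```

An empty list only means the patch passed the checks a script can do; whether it should go into a Stable tree is still a human decision.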

Identifying Fixes in the Kernel

While the CVE process allows some patches to be marked as a "fix" under specific conditions, the kernel has a broader mechanism for identifying that a code change is a "fix" -- the Fixes tag. Developers are encouraged (though not required) to add a Fixes: <commit hash> tag to their submission to note that the patch corrects a problem introduced by a previously submitted patch. These tags form a dependency graph between patches, which can be used to map which commits actually fix which others.
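
As a rough illustration, the Python sketch below extracts Fixes tags from commit messages and builds a map from each fixed commit to the commits that fix it. It assumes the usual trailer form `Fixes: <abbreviated hash> ("subject")` with at least 12 hex digits, and it normalizes every hash to a 12-character prefix so that abbreviated and full hashes compare equal; a real tool would also have to handle prefix collisions and malformed tags. The function names are hypothetical.

```python
import re
from collections import defaultdict

PREFIX = 12  # kernel Fixes: tags use at least 12 hex digits of the hash

# Matches trailers such as: Fixes: 0123456789ab ("subsys: original subject")
FIXES_RE = re.compile(r"^Fixes:\s*([0-9a-f]{12,40})\b", re.MULTILINE)


def fixes_targets(message: str) -> list:
    """Hash prefixes of the commits that this commit message claims to fix."""
    return [sha[:PREFIX] for sha in FIXES_RE.findall(message)]


def build_fix_graph(commits: dict) -> dict:
    """Map each fixed commit (12-char prefix) to the full hashes of commits fixing it.

    `commits` maps a full commit hash to its commit message.
    """
    graph = defaultdict(list)
    for sha, message in commits.items():
        for target in fixes_targets(message):
            graph[target].append(sha)
    return dict(graph)
```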

It's not uncommon for a fix to be submitted for a problem, only for the fix itself to require an additional patch. We've built tooling that scrapes the upstream and LTS kernels for these "fixes for fixes" whenever we bring changes into UEK. With this tool, changes to UEK must not only pass regular code review and testing; we also scan all other commits so we can review, and potentially include, newer patches that fix the patch in question.
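
Finding "fixes for fixes" amounts to a reachability search over the Fixes-tag dependency graph. The sketch below is hypothetical and is not Oracle's tool: given the commits already picked and a map from each fixed commit's hash prefix to the hashes of its fixers, it walks the map breadth-first and returns every direct or transitive fix that has not been picked yet.

```python
from collections import deque

PREFIX = 12  # compare commits by 12-character hash prefix, as Fixes: tags do


def missing_fixes(picked: list, fix_graph: dict) -> list:
    """Return fixes (and fixes-for-fixes) of picked commits that are not picked yet.

    fix_graph maps a fixed commit's hash prefix to the hashes of commits that fix it.
    """
    seen = {sha[:PREFIX] for sha in picked}
    queue = deque(picked)
    missing = []
    while queue:
        sha = queue.popleft()
        for fixer in fix_graph.get(sha[:PREFIX], []):
            key = fixer[:PREFIX]
            if key in seen:  # already picked or already reported
                continue
            seen.add(key)
            missing.append(fixer)
            queue.append(fixer)  # a fix can itself need a fix
    return missing


# Hypothetical graph: B fixes A, C fixes B, D fixes an unrelated commit E.
A, B, C, D, E = ("a" * 12, "b" * 12, "c" * 12, "d" * 12, "e" * 12)
graph = {A: [B], B: [C], E: [D]}
print(missing_fixes([A], graph))  # ['bbbbbbbbbbbb', 'cccccccccccc']
```

Because the search keeps going from each newly found fix, a fix that itself needed a follow-up is reported along with that follow-up.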

How Oracle Linux stays up to date with LTS kernels

Oracle Linux pulls the fixes from the Linux Long Term Stable branches on a monthly cadence for all our active UEK branches. This ensures that UEK stays reasonably up to date with the latest fixes in the Linux kernel. We've experimented with a number of different ways to pull in the changes from Linux Stable, including the curated model (discussed and rejected above), individual cherry-picks of the LTS code, and a few other models. The mechanism we've found to be most effective for tracking upstream LTS kernels is to lag behind the tip of the LTS tree by less than four weeks.
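
One way to express a fixed lag behind the LTS tip is to pick the newest stable release that is at least a given age. In this minimal sketch the lag is a parameter; the 28-day default, the release names and the dates are all assumptions for illustration, standing in for roughly the one-month delay described here.

```python
from datetime import date, timedelta
from typing import Dict, Optional

LAG = timedelta(days=28)  # assumed lag, roughly a one-month delay


def pick_snapshot(releases: Dict[str, date], today: date,
                  lag: timedelta = LAG) -> Optional[str]:
    """Return the newest stable release that is at least `lag` old, or None."""
    cutoff = today - lag
    eligible = [(released, tag) for tag, released in releases.items()
                if released <= cutoff]
    return max(eligible)[1] if eligible else None


# Hypothetical release names and dates, for illustration only.
releases = {
    "stable-1": date(2022, 1, 1),
    "stable-2": date(2022, 1, 20),
    "stable-3": date(2022, 2, 10),
}
print(pick_snapshot(releases, today=date(2022, 2, 20)))  # stable-2
```

Choosing by age rather than by position in the tag list keeps the delay stable even when upstream releases come in bursts.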

By tracking LTS with a one-month delay, we give upstream development a chance to resolve any issues that crop up in the LTS branches. We also use this time to review and resolve any issues that are detected and, more importantly, to scrub the delta between the UEK-LTS snapshot and the LTS HEAD for any relevant Fixes tags, either from LTS or from the mainline kernel itself. Adding in our testing and validation time, patches usually appear in UEK within 8 weeks of their appearance in an LTS tree.
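
The scrub step can be sketched in Python as follows; this is a hypothetical illustration, not Oracle's tooling. It lists commits with plain `git log --format=%H%x00%B%x1e` (full hash and raw message, separated by NUL and record-separator bytes), then reports commits in the delta whose Fixes tag points at something already shipped. One subtlety: Fixes tags name mainline hashes, while a stable tree contains backports with different hashes, so the sketch also credits each shipped commit with the mainline hash it names, assuming the two backport header forms commonly seen in stable trees. All function names are hypothetical.

```python
import re
import subprocess

PREFIX = 12
FIXES_RE = re.compile(r"^Fixes:\s*([0-9a-f]{12,40})\b", re.MULTILINE)
# Forms commonly used by stable backports to name their mainline commit (assumed).
UPSTREAM_RE = re.compile(
    r"^(?:commit ([0-9a-f]{40}) upstream\.|\[ Upstream commit ([0-9a-f]{40}) \])$",
    re.MULTILINE,
)


def git_commits(repo: str, rev_range: str) -> dict:
    """Return {full hash: message} for every commit in rev_range (e.g. "A..B")."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=%H%x00%B%x1e", rev_range],
        check=True, capture_output=True, text=True,
    ).stdout
    commits = {}
    for record in out.split("\x1e"):
        sha, sep, body = record.strip().partition("\x00")
        if sep:
            commits[sha] = body
    return commits


def known_ids(sha: str, message: str) -> set:
    """Prefixes a Fixes: tag might use for this commit: its own hash and its mainline origin."""
    ids = {sha[:PREFIX]}
    for match in UPSTREAM_RE.finditer(message):
        ids.add((match.group(1) or match.group(2))[:PREFIX])
    return ids


def relevant_fixes(snapshot: dict, delta: dict) -> list:
    """Commits in `delta` whose Fixes: tag points at a commit already in `snapshot`."""
    shipped = set()
    for sha, message in snapshot.items():
        shipped |= known_ids(sha, message)
    return [sha for sha, message in delta.items()
            if any(t[:PREFIX] in shipped for t in FIXES_RE.findall(message))]
```

For example, `snapshot` might come from `git_commits(path, "<base>..<uek-lts-snapshot>")` and `delta` from `git_commits(path, "<uek-lts-snapshot>..<lts-head>")`, or from a mainline tree; the placeholders stand for whatever tags or branches a given tree uses.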

This mode of working with Stable has also had an unexpected side benefit for our upstream kernel developers and maintainers. Previously, Oracle's kernel team would commit a patch upstream and would also be expected to provide a backport of that patch to the internal source repository for UEK. Because our kernel is now aligned with mainline stable, developers commit their code once, directly to mainline, and the upstream stable and UEK processes ensure that the same code is included in our enterprise kernels without any extra work. Critical patches jump the queue without going through the full LTS cycle, but the majority of non-critical fixes arrive through the LTS process.

Staying Up-to-Date

Keeping up-to-date with upstream Linux development is a challenge which has been discussed at length in the Linux community. Each distribution vendor faces unique challenges in adopting an "upstream first" model (there were several talks at the Linux Plumbers Conference about the challenges -- and benefits! -- of moving Android to an upstream-first model), but we believe there's no other way of ensuring that Oracle products and customers can take advantage of the latest development in Linux. It also helps attract talented Linux developers to the Oracle Linux team!

Despite our best efforts, we still have to carry distro-specific patches in our kernel. We've had a multi-year effort to streamline those patches, either by submitting them upstream where appropriate or by adopting a framework that forces them to stay current against the latest upstream development. Internally, we call this framework LUCI: Linux Upstream Continuous Integration. Every patch carried in UEK that is not upstream is tracked by LUCI, applied to the upstream kernel and compiled as part of our nightly builds. If a patch stops compiling against upstream, its developer is notified immediately. The developer can then investigate why the patch no longer works against upstream and either fix it or engage in the upstream discussion about why that area of the code is changing (if they aren't already participating in that discussion).
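
LUCI itself is internal, so the following is only a hypothetical sketch of the kind of nightly check described above, not Oracle's implementation. The apply and build steps are passed in as functions (in practice they might wrap `git apply --check` and a kernel `make`), which keeps the loop testable without a kernel tree. It also checks each patch independently, which is a simplification: a real pipeline would apply the carried patches as an ordered series.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class CarriedPatch:
    name: str   # hypothetical patch identifier
    owner: str  # developer to notify when the patch breaks


def nightly_check(
    patches: Iterable[CarriedPatch],
    applies: Callable[[CarriedPatch], bool],
    builds: Callable[[CarriedPatch], bool],
    notify: Callable[[str, str], None],
) -> List[str]:
    """Check each carried patch against today's upstream tree; notify owners of failures."""
    failed = []
    for patch in patches:
        if not applies(patch):
            reason = "no longer applies to upstream"
        elif not builds(patch):
            reason = "no longer compiles against upstream"
        else:
            continue  # patch still applies and builds
        notify(patch.owner, f"{patch.name}: {reason}")
        failed.append(patch.name)
    return failed


# Usage with stand-in functions instead of real git and make calls.
sent = []
patches = [CarriedPatch("patch-a", "alice"), CarriedPatch("patch-b", "bob"),
           CarriedPatch("patch-c", "carol")]
failed = nightly_check(
    patches,
    applies=lambda p: p.name != "patch-a",
    builds=lambda p: p.name != "patch-b",
    notify=lambda owner, msg: sent.append((owner, msg)),
)
print(failed)  # ['patch-a', 'patch-b']
```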

Examples of the code we're tracking in LUCI include code changes that never made it into upstream Linux (IBRS mitigations for Spectre are one example), legacy interfaces/technical debt in the RDMA subsystem (which we are diligently working to unwind), and other necessary and critical patches. By and large, we try to minimize the amount of code in LUCI as it's extra work for developers to track on top of their regular upstream work!

We use LUCI to ensure our code tracks upstream Linux, and we also use it as the development platform to springboard the next version of UEK. LUCI gives our cloud, database, and applications development teams the opportunity to test their product code alongside the latest development in Linux. Before LUCI, it took multiple months to adopt a new version of the Linux kernel; we now have a nightly build of the latest Linux kernel with the relevant UEK changes applied.

Conclusion

Oracle's Linux kernel tracks the upstream Linux Long Term Stable branches on a regular cadence, and we strive to get those patches out to customer systems as quickly as is reasonably possible. We use a model in which we track fixes-for-fixes and add a short time delay. This, combined with enterprise-quality testing, ensures that customers can take advantage of the latest developments in Linux. Closely tracking the Long Term Stable branches also means that customers receive fixes for security vulnerabilities, sometimes months before those patches are classified as security vulnerabilities. We use Linux Upstream Continuous Integration to give developers an incentive to commit their code upstream first, and to minimize the cost of carrying patches outside the Linux kernel. Check out our latest UEK code at https://github.com/oracle/linux-uek or at https://yum.oracle.com.

Greg Marsden

