Introduction
You’ve probably heard about Ksplice, our rebootless kernel patching technology that delivers security updates to your critical systems with zero downtime. Join us for a closer look at how Ksplice helps make even some of the trickiest security patches take effect without a reboot.
In January 2020, Intel published an advisory for CVE-2019-14615 (see also: MITRE, NVD), a bug in the Linux i915 kernel driver that allowed malicious users to obtain sensitive information from other users and other programs.
The i915 driver supports Intel HD integrated graphics, one of the most widely used GPU families in laptops and other portable devices. On those systems, the graphics hardware is integrated into the system CPU. This vulnerability only affects computers that have this hardware, which most server-class CPUs do not include, so it primarily affects laptop and desktop users.
Oracle offers the Ksplice service for free for desktop/laptop users of Ubuntu and Fedora, and as part of our Oracle Linux Premier Support subscription for enterprise datacenter and Cloud use. We wanted to make this security patch available for laptop and desktop users of Ksplice.
Using our Ksplice technology, some patches can be turned into rebootless updates completely automatically; however, every patch must be inspected individually to help ensure the update is correct. On its surface, this particular patch was a relatively simple code change. However, the code being changed executes at module-load time, so applying the patch rebootlessly as written would not actually fix the security issue (or any issue!): the code being changed has already run and won’t run again!
In order to make the patch effective for running kernels we need to fully understand what’s going on and try to find a workaround.
The bug
Let’s take a closer look at the bug and the patch. Here is the commit in question: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bc8a76a152c5f9ef3b48104154a65a68a8b76946
commit bc8a76a152c5f9ef3b48104154a65a68a8b76946
drm/i915/gen9: Clear residual context state on context switch
Intel ID: PSIRT-TA-201910-001
CVEID: CVE-2019-14615
Intel GPU Hardware prior to Gen11 does not clear EU state
during a context switch. This can result in information
leakage between contexts.
For Gen8 and Gen9, hardware provides a mechanism for
fast cleardown of the EU state, by issuing a PIPE_CONTROL
with bit 27 set. We can use this in a context batch buffer
to explicitly cleardown the state on every context switch.
As this workaround is already in place for gen8, we can borrow
the code verbatim for Gen9.
The most important part of the changelog is this:
“Intel GPU Hardware prior to Gen11 does not clear EU state during a context switch. This can result in information leakage between contexts.”
We can also look at the description from NIST NVD:
“Insufficient control flow in certain data structures for some Intel(R) Processors with Intel(R) Processor Graphics may allow an unauthenticated user to potentially enable information disclosure via local access.”
And Ubuntu:
“It was discovered that the Linux kernel did not properly clear data structures on context switches for certain Intel graphics processors. A local attacker could use this to expose sensitive information.”
It’s important to note that “context switch” here refers to a context switch on the GPU, not the CPU. Typically, each program that needs to render something on the screen has its own GPU context.
In this case the bug is that a GPU context (possibly belonging to a different user) can “steal” data from another context because some data is not cleared between context switches. Probably the most realistic real-world scenario here is e.g. that a WebGL context can “screenshot” or steal sensitive information from other programs that are running on the same computer.
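As a rough illustration of this class of bug, here is a toy model in plain C. It is not how the GPU or the i915 driver actually works; the names `toy_gpu`, `toy_run`, `toy_context_switch` and `toy_leaks` are hypothetical and exist only for this sketch. The idea is simply that some scratch state is shared by every context, and unless something wipes it on a context switch, the next context can read what the previous one left behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_SCRATCH_BYTES 16

/* Hypothetical stand-in for execution-unit state shared by all contexts. */
struct toy_gpu {
	char scratch[TOY_SCRATCH_BYTES];
	bool clear_on_switch; /* models the cleardown added by the fix */
};

/* A context "runs" by leaving some of its data in the scratch area. */
static void toy_run(struct toy_gpu *gpu, const char *data)
{
	assert(gpu != NULL && data != NULL);
	strncpy(gpu->scratch, data, TOY_SCRATCH_BYTES - 1);
	gpu->scratch[TOY_SCRATCH_BYTES - 1] = '\0';
}

/* Switch to another context; only the fixed variant wipes the scratch. */
static void toy_context_switch(struct toy_gpu *gpu)
{
	assert(gpu != NULL);
	if (gpu->clear_on_switch)
		memset(gpu->scratch, 0, sizeof(gpu->scratch));
}

/* Returns true if the next context can still see the previous one's data. */
static bool toy_leaks(bool clear_on_switch)
{
	struct toy_gpu gpu = { .clear_on_switch = clear_on_switch };

	toy_run(&gpu, "secret");
	toy_context_switch(&gpu);
	return strcmp(gpu.scratch, "secret") == 0;
}
```

In this model, `toy_leaks(false)` corresponds to the vulnerable behaviour and `toy_leaks(true)` to the behaviour after the cleardown is in place.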
Let’s have a look at the patch itself:
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 75dd0e0367b7a..f0485784afbe0 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2664,6 +2664,14 @@ static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
 	/* WaFlushCoherentL3CacheLinesAtContextSwitch:skl,bxt,glk */
 	batch = gen8_emit_flush_coherentl3_wa(engine, batch);
+	/* WaClearSlmSpaceAtContextSwitch:skl,bxt,kbl,glk,cfl */
+	batch = gen8_emit_pipe_control(batch,
+				       PIPE_CONTROL_FLUSH_L3 |
+				       PIPE_CONTROL_STORE_DATA_INDEX |
+				       PIPE_CONTROL_CS_STALL |
+				       PIPE_CONTROL_QW_WRITE,
+				       LRC_PPHWSP_SCRATCH_ADDR);
+
 	batch = emit_lri(batch, lri, ARRAY_SIZE(lri));
 	/* WaMediaPoolStateCmdInWABB:bxt,glk */
This is a simple patch! It only changes the code, and it only changes a single function in a single place. Usually it’s data structure changes that give Ksplice developers the most headache — code can be patched on the fly simply by calling the modified function instead of the original function, but once you start changing data structure layouts of things that are already in memory you really need to take care. But that is not the case here.
The problem here is that the code that is being patched runs when the driver is initialized. In other words, we can patch the code just fine, but that won’t actually change anything if the function will never be run again!
So what can we do?
Solution
We need to break this down and really understand what is going on before we can work on a fix. For that, we’ll dive into some details of the i915 driver and hardware…
One of the first things we might want to do is to look at the call stack leading up to the function that is being changed. In this case it is very simple — there is only one possible callchain, and it comes directly from the driver’s PCI probe function:
&i915_pci_driver
- i915_pci_probe()
- i915_driver_probe()
- i915_driver_modeset_probe()
- i915_gem_init()
- intel_engines_init()
- intel_execlists_submission_init()
- intel_init_workaround_bb()
- gen9_init_indirectctx_bb()
This also highlights the problem we mentioned where patching the function is not really going to fix the vulnerability; the code is clearly called when the driver is initialized, and so for any running system where this code is loaded it will already have run.
It may also be interesting to look at the function that is being called by the new version of the function:
static inline u32 *gen8_emit_pipe_control(u32 *batch, u32 flags, u32 offset)
{
	memset(batch, 0, 6 * sizeof(u32));
	batch[0] = GFX_OP_PIPE_CONTROL(6);
	batch[1] = flags;
	batch[2] = offset;
	return batch + 6;
}
This is also really simple — the function adds a command to a buffer and returns a pointer to the new end of the buffer.
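To make the “write a command, return the new end” pattern concrete, here is a minimal user-space sketch with the same shape. It is an illustration only: `toy_emit_pipe_control`, `toy_build_batch` and the `TOY_` constants are hypothetical names, and the opcode value is a made-up placeholder rather than the real PIPE_CONTROL encoding. The only value taken from the changelog is that the flag of interest is bit 27.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Made-up placeholder opcode, for illustration only. */
#define TOY_OP_PIPE_CONTROL UINT32_C(0xABCD0000)
/* Bit 27 is the flag the changelog refers to. */
#define TOY_FLUSH_L3 (UINT32_C(1) << 27)
/* Each command occupies six 32-bit words, as in the kernel function. */
#define TOY_PIPE_CONTROL_DWORDS 6

/* Write one command at 'batch' and return a pointer just past it. */
static uint32_t *toy_emit_pipe_control(uint32_t *batch, uint32_t flags,
				       uint32_t offset)
{
	assert(batch != NULL);
	memset(batch, 0, TOY_PIPE_CONTROL_DWORDS * sizeof(uint32_t));
	batch[0] = TOY_OP_PIPE_CONTROL;
	batch[1] = flags;
	batch[2] = offset;
	return batch + TOY_PIPE_CONTROL_DWORDS;
}

/* Emit two commands back to back; returns the number of words written. */
static size_t toy_build_batch(uint32_t *batch)
{
	uint32_t *end = batch;

	end = toy_emit_pipe_control(end, TOY_FLUSH_L3, 0);
	end = toy_emit_pipe_control(end, 0, 0);
	return (size_t)(end - batch);
}
```

Because every emitter returns the new end of the buffer, calls can be chained by reassigning the same pointer, which is exactly how the patched function threads `batch` through its helpers.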
If we recall the changelog and the patch, it mentions “issuing a PIPE_CONTROL with bit 27 set. We can use this in a context batch buffer to explicitly cleardown the state on every context switch”.
Clearly, the new call is issuing this PIPE_CONTROL command with bit 27 set — this bit is PIPE_CONTROL_FLUSH_L3 in the kernel sources. PIPE_CONTROL itself is a command for the GPU. Luckily for us, the i915 hardware is pretty well documented. We’ll find the PIPE_CONTROL command in the Kaby Lake documentation, volume 2a, page 1104 or the Skylake documentation, volume 2a, page 1039.
In gen8_emit_pipe_control() the parameter is named batch, and the function that calls the patched gen9_init_indirectctx_bb() is named intel_init_workaround_bb(). Here, bb is short for “batch buffer”.
What is a batch buffer? Ben Widawsky and Daniel Vetter (both DRM and i915 developers) have each written blog posts on Intel GPU architecture. Here is what they have to say:
“The batchbuffer is nothing more than a set of instructions to be read by the GPU for setting up state, and to a much smaller extent, instructions telling the GPU how to act upon that state.” (i915 Hardware Contexts (and some bits about batchbuffers), Ben Widawsky, 2013)
and:
“As I’ve alluded already, gpu command submission on intel hardware happens by sending a special buffer object with rendering commands to the kernel for execution on the gpu, the so called batch buffer.” (i915/GEM Crashcourse, Part 2, Daniel Vetter, 2012)
These are really very good resources and worth reading for anybody who would like to understand how graphics programming works under the hood.
In any case, through these resources, we arrive at the understanding that the pipe control opcode is emitted into a batch buffer which is executed by the GPU on every context switch. This means that for this Ksplice update, we’re not just patching kernel code, we need to patch actual GPU code!
The batch buffer is pointed to by the “LRC register state” (note the name of the file being patched: intel_lrc.c), where LRC means “logical ring context”. Since the LRC registers are set up during driver/device initialization, we arrive at the following solution sketch:
- First we need to create a new batch buffer containing the correct pipe control instruction.
- Then we need to update the LRC register to point to the new batch buffer.
In DRM and the Linux graphics universe, all memory buffers that are potentially accessed by a GPU need to live in a “GEM object”. GEM stands for “Graphics Execution Manager”, and manages device memory in a uniform way (from the driver’s point of view). This is how the DRM man-page describes it:
“GEM stands for Graphics Execution Manager and is a generic DRM memory-management framework in the kernel, that is used by many different drivers. Gem is designed to manage graphics memory, control access to the graphics device execution context and handle essentially NUMA environment unique to modern graphics hardware. Gem allows multiple applications to share graphics device resources without the need to constantly reload the entire graphics card. Data may be shared between multiple applications with gem ensuring that the correct memory synchronization occurs.”
More concretely, we need something like the following code to create a new GEM object, allocate memory for the new batch buffer, and map the memory on the CPU so we can write to it:
void ksplice_patch_i915_device(struct pci_dev *pdev)
{
	struct drm_i915_private *dev_priv = to_i915(pdev);
	struct intel_engine_cs *engine = dev_priv->engine[RCS];
	int err;

	/* Create a GEM object to back the new workaround batch buffer. */
	struct drm_i915_gem_object *obj = i915_gem_object_create(engine->i915, CTX_WA_BB_OBJ_SIZE);
	if (IS_ERR(obj))
		...

	/* Get a mapping in the global GTT and pin it so the GPU can address it. */
	struct i915_vma *vma = i915_vma_instance(obj, &engine->i915->ggtt.base, NULL);
	if (IS_ERR(vma))
		...
	err = i915_vma_pin(vma, 0, PAGE_SIZE, PIN_GLOBAL | PIN_HIGH);
	if (err)
		...

	/* Map the first page on the CPU so we can write the commands. */
	struct page *page = i915_gem_object_get_dirty_page(vma->obj, 0);
	void *batch_ptr = kmap_atomic(page);
We can then call the new (patched) version of gen9_init_indirectctx_bb() to populate the batch buffer with the correct GPU code:
	struct i915_wa_ctx_bb indirect_ctx_bb;
	indirect_ctx_bb.offset = 0;
	void *batch_end = gen9_init_indirectctx_bb(engine, batch_ptr);
	indirect_ctx_bb.size = batch_end - batch_ptr;
Now, in order to update the LRC registers we need to visit every GPU context on the device and rewrite its register state, as the loop below does. The same has to happen for every i915 device: realistically there should only ever be a single i915 device on a given system, but the driver is written to handle multiple devices, so our code needs to be written that way too:
	u32 ggtt_offset = i915_ggtt_offset(vma);
	struct i915_gem_context *ctx;

	/* Repoint every existing context at the new batch buffer. */
	list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
		struct intel_context *ce = &ctx->engine[RCS];
		void *vaddr;
		u32 *regs;
		...
		vaddr = i915_gem_object_pin_map(ce->state->obj, map_type);
		if (IS_ERR(vaddr))
			...
		/* The register state lives in page LRC_STATE_PN of the context image. */
		regs = vaddr + LRC_STATE_PN * PAGE_SIZE;
		/* Buffer address ORed with the buffer size in cachelines. */
		regs[CTX_RCS_INDIRECT_CTX + 1] = (ggtt_offset + indirect_ctx_bb.offset) | (indirect_ctx_bb.size / CACHELINE_BYTES);
		...
		i915_gem_object_unpin_map(ce->state->obj);
	}
}
We’ve omitted some details (like error handling) from the code shown here.
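The register write in the loop packs two values into one 32-bit word: the buffer address and the buffer size expressed in cachelines. The sketch below shows that arithmetic in isolation. It rests on assumptions that are ours rather than stated above: a 64-byte cacheline, a buffer address aligned to a cacheline (so its low bits are zero), and a size that is a whole number of cachelines small enough to fit in those low bits. The `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cacheline size for this sketch. */
#define TOY_CACHELINE_BYTES UINT32_C(64)

/* Pack a cacheline-aligned buffer address and its size into one word. */
static uint32_t toy_pack_indirect_ctx(uint32_t ggtt_offset, uint32_t bb_offset,
				      uint32_t bb_size)
{
	uint32_t addr = ggtt_offset + bb_offset;
	uint32_t lines = bb_size / TOY_CACHELINE_BYTES;

	/* The OR is only lossless if the address leaves the low bits free... */
	assert(addr % TOY_CACHELINE_BYTES == 0);
	/* ...and the size is a whole number of cachelines that fits in them. */
	assert(bb_size % TOY_CACHELINE_BYTES == 0);
	assert(lines < TOY_CACHELINE_BYTES);

	return addr | lines;
}

/* Recover the address by masking off the low (size) bits. */
static uint32_t toy_unpack_addr(uint32_t reg)
{
	return reg & ~(TOY_CACHELINE_BYTES - 1);
}

/* Recover the size in bytes from the low bits. */
static uint32_t toy_unpack_size(uint32_t reg)
{
	return (reg & (TOY_CACHELINE_BYTES - 1)) * TOY_CACHELINE_BYTES;
}
```

For example, under these assumptions a 192-byte buffer at offset 0x10000 packs to 0x10003: the address in the upper bits and a size of three cachelines in the lower bits.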
Of course, there are also a few other things we need for a complete solution:
- applying the fix only to vulnerable devices (generation 9)
- iterating over all devices using for_each_pci_dev()
- waiting for the GPU to be idle using i915_gem_switch_to_kernel_context() and i915_gem_wait_for_idle()
- locking and other safety checks
Other Ksplice considerations
The Ksplice team has requirements that apply to all Ksplice updates for safety reasons:
- Applying the update should be atomic. In other words, the update must be either applied or not applied; the kernel must never be allowed to run when the update is only half applied.
- The changes performed by the update must always be reversible.
These requirements are pretty strict, but have significant advantages for both us and our users — in the rare case that a security fix causes an issue (e.g. a performance degradation), the update can be backed out and the system will continue running as before.
For requirement #1, the important part is to run all fallible operations before we start patching. In other words, we must, for example, allocate all the memory we need up front (since memory allocation requests can fail) and only switch the system over to the new state in a single atomic update that happens while the system is stopped.
For requirement #2, we simply need to keep the old (unpatched) batch buffers around so that we can switch back to them when reversing the update.
These requirements mean that we need to do a little bit of extra book-keeping, as we need to grab references to things that shouldn’t go away, and we also need somewhere to stash the references to all the resources we’re going to use during the atomic update.
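The shape of that book-keeping can be sketched in user-space C as a prepare/apply/reverse split. This is a simplified model under our own assumptions, not the actual Ksplice code: `toy_ctx` stands in for a GPU context with a single register, `malloc()` stands in for every fallible operation, and all `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-context state: just the register we want to repoint. */
struct toy_ctx {
	uint32_t indirect_ctx_reg;
};

/* Book-keeping gathered before the switch-over and kept for reversal. */
struct toy_update {
	struct toy_ctx **ctxs; /* contexts to patch (borrowed references) */
	uint32_t *old_regs;    /* saved values, needed to undo the update */
	size_t count;
	uint32_t new_reg;
};

/* Step 1: everything that can fail happens here, before any change. */
static struct toy_update *toy_prepare(struct toy_ctx **ctxs, size_t count,
				      uint32_t new_reg)
{
	struct toy_update *u;

	assert(ctxs != NULL && count > 0);
	u = malloc(sizeof(*u));
	if (!u)
		return NULL;
	u->old_regs = malloc(count * sizeof(*u->old_regs));
	if (!u->old_regs) {
		free(u);
		return NULL;
	}
	u->ctxs = ctxs;
	u->count = count;
	u->new_reg = new_reg;
	return u;
}

/* Step 2: the switch-over itself has no failure path. */
static void toy_apply(struct toy_update *u)
{
	for (size_t i = 0; i < u->count; i++) {
		u->old_regs[i] = u->ctxs[i]->indirect_ctx_reg;
		u->ctxs[i]->indirect_ctx_reg = u->new_reg;
	}
}

/* Reversal: restore the saved values, again with no failure path. */
static void toy_reverse(struct toy_update *u)
{
	for (size_t i = 0; i < u->count; i++)
		u->ctxs[i]->indirect_ctx_reg = u->old_regs[i];
}

/* Release the book-keeping once it is no longer needed. */
static void toy_release(struct toy_update *u)
{
	free(u->old_regs);
	free(u);
}
```

The design point is that `toy_apply()` and `toy_reverse()` only copy values that were set aside in advance, so neither can stop halfway because a resource was unavailable.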
Conclusion
CVE-2019-14615 is a security vulnerability with a CVSS v3 score of 5.5 and is rated by Oracle Linux as having “Moderate” impact; this is a vulnerability that could, under certain circumstances, compromise the confidentiality, integrity or availability of resources. Since Ksplice is used on both enterprise systems and desktop platforms (the latter of which is more likely to be using this graphics driver), we wanted to patch it.
This bug also shows how an innocent-looking patch can require deep knowledge of the kernel to patch correctly — and to help ensure it is safe to apply to a running kernel.
While analyzing a patch and a vulnerability, a Ksplice engineer may have to dive into completely unfamiliar code to understand what’s going on; nobody on our team knew much about the i915 driver (or even the graphics stack) when we first saw the patch. The kernel has almost 30 million lines of code, so this is actually the case for most of the patches that the Ksplice team touches!
To our knowledge, as of the date this blog was published, no other live patching service shipped updates for this CVE. This is an example of Oracle’s security-first approach, and a testament to the importance placed on protecting customers’ valuable data.
Additionally, in the course of applying the zero downtime updates for Ubuntu Bionic, we discovered an error and responsibly reported it to the Ubuntu team, who were able to release a patch under a separate vulnerability ID, CVE-2020-8832.
As you can see, there’s a lot more to keeping systems from having to reboot than just applying patches as-is. If this kind of work sounds interesting to you, consider applying for a job with the Ksplice team! Feel free to drop us a line at ksplice-support_ww@oracle.com.