Why virtual machines are hard

Let's review the issues of system state. The hypervisor ("host") runs in the processor's supervisor state (sometimes referred to as "ring 0", among other jargon), in which it has unrestricted access to the architected instruction set of the platform (be it x86, System/370, etc.). Virtual machines ("guests") run in user or problem state, in which only a subset of instructions is permitted. In particular, instructions that alter address translation (virtual memory mapping), perform I/O, or enable or disable interrupts are forbidden, and generate an exception or trap when executed.

So, let's review what happens when a guest is running: it goes about its business until it executes one of these supervisor instructions. This generates a trap (in some contexts called an intercept or privop exception) that causes a context switch to the hypervisor. The hypervisor saves the current state and then inspects the "virtual" state of the guest system. If the guest was running its operating system and was in virtual supervisor state, the privop is emulated (more on that later), and eventually control is returned to the virtual machine (if it's still the highest-priority non-blocked virtual machine).

If the guest was running an application (say, a Linux or Solaris virtual machine under VMware running a user-land process), then this is a program exception that must be handled by the guest operating system, just as would happen on a real (non-virtual) machine if an application issued such an instruction. The exception is reflected to the guest virtual machine: its registers and next-instruction location are set up exactly as they would be if a misbehaving application did this with no virtual machines involved.
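The emulate-or-reflect decision described above can be sketched as a toy model in Python. This is only an illustration, not how any particular hypervisor is written: the `Guest` class, the `handle_trap` function, and the operation names are all hypothetical, and the "saved state" is reduced to a single program counter.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical privileged operation names, for illustration only.
PRIVILEGED_OPS = {"load_page_table", "start_io", "disable_interrupts"}


@dataclass
class Guest:
    name: str
    virtual_supervisor: bool          # True while the guest's own OS is running
    pc: int = 0                       # next-instruction location (toy model)
    pending_exception: Optional[str] = None
    log: list = field(default_factory=list)


def handle_trap(guest: Guest, op: str) -> str:
    """Hypervisor entry point after a privileged op traps out of the guest."""
    if op not in PRIVILEGED_OPS:
        raise ValueError(f"{op} is not a privileged operation in this model")
    if guest.virtual_supervisor:
        # The guest OS issued the privop: emulate it against the guest's
        # virtual state, then step past the instruction.
        guest.log.append(f"emulated {op}")
        guest.pc += 1
        return "emulated"
    # A guest application issued it: reflect a program exception so the
    # guest OS handles it, as it would on a real machine.
    guest.pending_exception = f"privileged-operation exception for {op}"
    guest.virtual_supervisor = True   # the guest's handler runs in its kernel
    return "reflected"
```

Calling `handle_trap(Guest("linux", virtual_supervisor=True), "start_io")` takes the emulation path, while the same call with `virtual_supervisor=False` leaves the program counter alone and queues an exception for the guest operating system.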

Every time a virtual machine executes a privileged operation, then, we have at least two context switches to deal with: from guest to host, and then back again. Several hundred or several thousand instructions are executed, and the cost in clock cycles is higher still: we must discard translation lookaside buffer (TLB) entries because the two environments operate with different virtual-to-real memory mappings, and we probably touch enough code and data operands to displace the L1 cache contents (and perhaps the L2 cache as well).

This describes the simple cost of state switching, typically thousands of clocks per intercept, exclusive of the CPU time the hypervisor spends emulating the guest's instructions. This cost is paid every time the guest does I/O, context switches between its processes, sets or responds to a timer or I/O interrupt, or one of its applications executes a system call. That is thousands of clock cycles spent each time, at a frequency that can add up to thousands of times per second.
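To get a feel for what that adds up to, here is a back-of-the-envelope calculation in Python. The numbers plugged in are assumptions picked for illustration, not measurements, and the function name is made up for this sketch. It counts only the state-switching cost, not the time the hypervisor spends on emulation.

```python
def intercept_overhead(intercepts_per_sec: float,
                       cycles_per_intercept: float,
                       clock_hz: float) -> float:
    """Fraction of one CPU's cycles consumed by guest/host state switching."""
    return intercepts_per_sec * cycles_per_intercept / clock_hz


# Assumed figures: 5,000 intercepts per second, 4,000 cycles each, 2 GHz clock.
overhead = intercept_overhead(5_000, 4_000, 2_000_000_000)
print(f"{overhead:.1%}")  # prints 1.0%
```

With these assumed inputs the switching alone costs about one percent of a CPU; a guest that intercepts ten times as often would lose about ten percent before any emulation work is counted.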

Next time: the three levels of memory needed when a virtual memory guest runs under a hypervisor, and the magic of the shadow page table. G'nite.

About

jsavit
