OpenSolaris on Xen

For the past year, I've been busy on the team that is porting OpenSolaris to run as a fully para-virtualized domain under the Xen hypervisor. The areas I've been concentrating on are changes to virtual and physical memory management and to the mechanisms by which OpenSolaris gets loaded and started, that is, boot.

Memory Management under Xen

The changes to physical memory management translate what OpenSolaris calls a Page Frame Number (PFN, or pfn_t) into a Machine Frame Number (MFN) under Xen before it is used in page tables, in descriptor tables or to program DMA. Under Xen, addresses derived from PFNs are called pseudo-physical addresses and are represented in the kernel with the existing type paddr_t. Not every MFN value the kernel sees can be translated back into a PFN, so the kernel needed a way to distinguish those that can from those that cannot. Several routines were added to the kernel to deal with these translation issues.

The changes to virtual memory management are primarily around:

  • The HAT must translate PFNs into MFNs when creating page table entries and do the reverse translation, MFN to PFN, when examining page tables.
  • Xen requires that page tables that are in active use be mapped read-only. The code to access page tables in the HAT is now aware of when it should be using read-only mappings.
  • The algorithm used for TLB shootdowns changed. Xen provides a single interface that changes a page table entry and invalidates the corresponding TLB entries at the same time. To reduce the differences between the Xen and non-Xen code, the HAT code was restructured.

Some kmdb dcmds have been modified, and new ones introduced, to help manage the difference between PFNs and MFNs during kernel development or crash analysis.

Booting the Kernel

The changes to the way OpenSolaris boots were extensive and complicated. The goal was to make the boot-time code used on bare hardware and the code used under Xen as similar as possible. As part of that approach, we decided to eliminate the separate boot loader, /boot/multiboot, altogether.

Review of Pre-Xen Boot

As a refresher, the pre-Xen version of OpenSolaris gets into memory in the following way on x64 hardware:

  • Grub is used to load the /boot/multiboot program and the boot_archive into memory.
  • multiboot then determines which version of unix in the boot_archive to boot based on what sort of hardware (32 or 64 bit) is present and any command line information passed to it in the menu.lst file.
  • multiboot builds an initial set of 32 bit page tables so that it can load the unix executable at the appropriate place in virtual memory, as described by the unix ELF file. When booting the 64 bit kernel, an optional second layer is used to automatically double map the 32 bit virtual memory into the top of 64 bit virtual memory.
  • The "unix" executable is rather incomplete (i.e., it won't run by itself) but has embedded in it a PT_INTERP section that points to the krtld (kernel runtime loader) module. multiboot combines krtld from the boot_archive with unix as it loads both into memory.
  • Execution actually starts in krtld. Additional modules needed by the kernel are loaded by krtld from the boot_archive. Once the kernel is complete enough to run, execution in the kernel finally begins.
  • multiboot continues to be used, via the BOP_*() interfaces, to manage virtual memory and console I/O until the kernel has initialized itself enough to take over.

New Approach to Boot

This seemed like a lot of code to port to Xen, especially since multiboot is effectively just a memory allocator and ELF file decoder. An additional problem was that multiboot was very much a 32 bit program, but on amd64 platforms the Xen domain is always entered in 64 bit mode. A lot of tedious clean up work would have been required to make multiboot even compile, let alone work, as a 64 bit program. We decided to make the following changes to the way in which we build unix:

  • Link krtld (as well as enough other code) into the unix ELF file at build time. Hence, there is no more PT_INTERP section in unix.
  • We rely on grub to load the unix file directly. For amd64 kernels this relies on grub's a.out hack code to load the 64 bit ELF based on an embedded multiboot header.
  • The unix ELF file's text and data segments now have explicitly specified physical load addresses of 4 Meg and 8 Meg, respectively.
  • A third loadable segment was added to the unix ELF file. The code in this segment is compiled to load and run at address 12 Meg. On bare hardware this code is always 32 bit; under Xen it is native code for the domain. It contains the entry point specified by the ELF header (or by the multiboot header). We call this code "dboot", short for Direct Boot.

If you want to understand these changes more completely, you can read the OpenSolaris makefiles (both Xen and pre-Xen). Another way to compare them is to run elfdump(1) on the unix files that result.

Using this new version of the unix file, the following happens at boot:

  • Grub loads the unix file, either as a 32 bit ELF file or, for 64 bit, using the a.out hack, and transfers control to the dboot code.
  • The dboot code builds page tables that exactly match what the booted kernel (64 bit, 32 bit PAE or 32 bit non-PAE) will use. The page table entries include mappings for the kernel text and data at the correct high virtual memory addresses.
  • On bare hardware (non-Xen), dboot enables paging.
  • The dboot code finally jumps into unix kernel text.
  • The entry point in unix, _start, is provided by i86pc/os/fake_bop.c. As the name implies, this is kernel code that emulates the old BOP_*() interfaces the rest of the kernel startup relies on.

This new boot approach is much smaller and simpler. It also removes many artificial restrictions that startup.c had to deal with, such as a 32 bit allocator in the 64 bit kernel. You can read more about these in Nils' blog.

As an additional clean up, the code to manage console I/O and to deal with boot time page table and memory management was made "common" source between the dboot code and what the kernel needed in early startup.

The big benefit for the Xen port was that the dboot code was easy to port to Xen. Because much of its code is shared with the rest of the kernel, it was designed from the beginning to work in a 64 bit environment.

menu.lst changes

The new way of booting requires you to specify the kernel you want to boot explicitly in your grub menu.lst file. You can see more of what is going on during boot by adding prom_debug=true,kbm_debug=true to the kernel line in your menu.lst file. This is done with the -B option, which takes a comma-separated list of name=value properties, as in these examples:

title 32 bit OpenSolaris with boot time debug output
kernel /platform/i86pc/kernel/unix -B prom_debug=true,kbm_debug=true
module /platform/i86pc/boot_archive

title 64 bit OpenSolaris no debug output, but console I/O to serial port
kernel /platform/i86pc/kernel/amd64/unix -B console=ttya
module /platform/i86pc/boot_archive

Under Xen, you put these settings in the "extra" property of your domain builder configuration file.

