Solaris 10 on x64 Processors: Part 2 - Getting Started

Booting and Startup
Some of the trickier issues with porting Solaris to a new platform architecture originate from some of the decisions we made 15 years ago. This is a complex story, and one that we're soon to drastically improve on x86/x64 systems with newboot , (as demoed recently by Jan Setje-Eilers at the first OpenSolaris User Group meeting) but I'll try to relate the minimum needed so that you understand why we took the path that we did for Solaris on x64 processors.

The Solaris 2 booting system was originally designed back in 1991 in the context of the problems and issues we had in the world of SunOS 4.x, our desires to have the hardware and software organizations in the company execute relatively independently, as well as to support the then-blossoming clone market for SPARC workstations. We wanted to enable both ourselves and 3rd parties to deliver support for new hardware without having to change the Solaris product CD bits - except by adding new driver modules to it. Also remember that speeds and feeds were two orders of magnitude slower then, and that kernel text was a larger proportion of the minimum memory size than it is today so we had to be far more selective about which modules to load as we booted.

The design we came up with starts with the primary bootstrap loading the secondary booter, and the secondary booter putting the core kernel into memory. The kernel then turns around and invokes various services from the secondary booter using the bootops interface to allow the kernel to discover, load and assemble the drivers and filesystem modules it needs as it initializes itself and explores the system it finds itself on. Once it determines it has all the modules it needs to mount the root filesystem, it takes over IO and memory allocation, mounts the root, and (if successful) continues to boot by loading additional kernel modules from that point on.

Note that this early part of the boot process starts out with the secondary boot program being the one true resource allocator i.e. in charge of physical memory, virtual memory and all I/O, and ends with the kernel being that resource allocator at the end. While moving from the former state to the latter sounds simple in principle, it's quite complex in practice because of the incremental nature of the handoff. For example, the DDI (device driver interfaces) aren't usable until the kernel has initialized its infrastructure for managing physical and virtual memory. So we have to load the modules we might need based on the name of the root device given to us by the booter which in turn comes from OpenBoot. Kernel startup somehow has to get most of the VM system initialized and working, yet still allow the boot program and its underlying firmware to do I/O to find the drivers and filesystem modules it needs from the boot filesystem to mount the filesystem. Practically speaking, this entails repeated resynchronization between the boot program and the kernel over the layout and ownership of physical and virtual memory. In other words, the kernel tries to take over physical and virtual memory management while effectively avoiding conflicting with the secondary booter and the firmware using the same resources. It's really quite a dance.

For the x86 port, a similar approach was used, using real-mode drivers that were placed onto a boot floppy as an analogue of the OpenBoot drivers in the SPARC world to construct a primitive device tree data structure analogous to the OpenBoot device tree. (In 1995, for the PowerPC port, we implemented "virtual open firmware" which was an even closer simulation of OpenBoot to make it easier to reuse SPARC boot and configuration code).

Note that the x86 secondary boot program itself runs in protected mode like the kernel; it is responsible for switching to real-mode and back to run the real-mode drivers.

Six years go by ...

During that time, specifically in Solaris 2.5, we made things even more complicated for hardware bringup by splitting the basic kernel into separate modules: unix, genunix and the kernel linker krtld; these make bringup more difficult because genunix and krtld are not relocated until run-time, thus diagnosing hexadecimal addresses where the kernel has fallen over in genunix or in krtld becomes a significant pain, absent a debugger like kmdb or its predecessor kadb.

Now, in 1997 when we created 64-bit Solaris for SPARC, it was relatively simple to make a 64-bit boot program use the OpenBoot interfaces; there is no "mode" switch between operating a SPARC V9 processor in "32-bit" mode or "64-bit" mode - apart from the address mask, the rest of the difference was entirely about software conventions, not hardware per se. So we didn't really have to do anything to the basic boot architecture, this part of the 64-bit Solaris project on SPARC was really quite straightforward, mostly a matter of getting the boot program LP64-clean.

Meanwhile, in late 2003 ...

The Opteron architecture presented us a far more difficult challenge because the processor needed to switch to long mode via an arcane sequence of instructions including switching between different format page tables and descriptor tables. Worse still, the 64-bit Solaris kernel on Opteron would need to turn around and invoke what it would think of as a 64-bit boot program running in long mode in order to fetch modules from the disk (and as discussed above) invoking real-mode code and the BIOS to do so!

Our initial approach was to use the existing, unmodified, protected mode 32-bit booter and have it boot an interposer called vmx that used the 32-bit booter to load the 64-bit kernel into double mapped memory. In this case, "double mapped" means that there's an (up to) 4Gbyte area of physical memory that are (a) mapped by 32-bits of VA in the protected mode page tables and (b) mapped by the bottom 32-bits of VA and the top 4G of VA in the long mode page tables. The interposer then pretended to be a 64-bit booter to the 64-bit kernel. When the 64-bit kernel asked vmx for a boot service via one the bootops vector (fortunately a relatively small and well-behaved interface), vmx quickly switched back to protected mode, then after massaging the arguments appropriately, invoked the bootops of the 32-bit, protected mode booter to provide the service. That service would in turn often result in the protected mode booter switching the processor back to real mode to deliver it. Though our colleagues at AMD winced at the thought of the poor processor switching back and forth hundreds or thousands of times through so many years of x86 history, Opteron didn't mind this at all.

Finally, before we delivered the code to the Solaris 10 gate, William Kucharski, who took the half-thought-out prototype and made all this insanity actually work correctly, integrated the vmx code inside the 32-bit booter so this component is invisible in the final product.

What else could we have done?

We could've made the boot program be completely 64-bit, and thus have that program deal with the mode switching from long, to protected to real to invoke the BIOS and back again. While possible, it would've involved porting a bunch of code in the x86 boot program to LP64, and reworking both protected-mode and real-mode assembler in the boot program to fit into an ELF64 environment. The latter seemed like a lot more work than we wanted, even if we'd somehow managed to convince the assembler and linker to support it.

Another suggestion was to somehow jump into the 64-bit kernel and do magic to allow us to call back to 32-bit boot services there. But that seemed to us to be just a matter of finding a different place to put the code we had in vmx; we thought putting it into boot where it will be eventually reused by the system was better than making that ugliness live in the kernel proper.

The other option on the table was to become dependent on the newboot project which we were planning at the time to bring us into the modern world of open source kernel booting; but we were unwilling to wait, or to force that dependency to be resolved earlier because of schedule risk.

Descriptor Tables and Segmentation

It quickly becomes apparent to students of the x86 architecture that with x64, AMD tried hard to preserve the better parts of the x86 segmentation architecture, while trying to preserve compatibility sufficient to allow switching to and from long mode relatively painlessly. But to the kernel programmer, it only seems to get more complicated. In previous versions of Solaris, we used to build the descriptor tables (almost) statically, with a single pass over the tables to munge IDT and GDT structures, from a form easy to initialize in software, into the form expected by the hardware. Early on, we realized it was worth bringing this up-to-date too, so we discarded what we had (for both 32-bit and 64-bit Solaris) and used the FreeBSD version which use a series of function calls to build table entries piece by piece.

Some of our more amusing early mistakes here included copying various IDT entries for machine exceptions as "trap" type exceptions instead of interrupt-type exceptions which caused real havoc when an interrupt would sneak in before the initial swapgs instruction of the handler. All terribly obvious now, but less than obvious at the time.

One optimization we made was to exploit the fact that the data segment descriptor registers %ds and %es were effectively ignored in long mode. Further, %cs and %ss were always set correctly by the hardware, %fs was unused by the kernel, and %gs has a special instruction to change the base address underlying the segment. Taken together this lead us to a scheme of lazy update of segment registers; we only update segment registers on the way out of the kernel if we know that something needs to be changed.

Next time I'll try and touch on the rest of the work involved with the kernel port proper.

Technorati Tag: OpenSolaris
Technorati Tag: Solaris

Comments:

Sir/Madam, I want to know the flags required to compile 32bit code on x64 Solaris machine, so that i can load my STREAMS module successfully. Error i m getting is: module can't be loaded: No such file or directory.

Posted by Ankur Agrawal on September 21, 2005 at 02:15 PM PDT #

Post a Comment:
  • HTML Syntax: NOT allowed
About

tpm

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today