A new boot architecture
Here I'll attempt to give a very brief overview of the components
that make up the new boot architecture we recently integrated into
New-boot, as it was called during the development phase, utilizes
GRUB as the initial boot loader. At the moment we're using GRUB 0.95
with a couple of patches and a number of bugfixes. The source to this
GRUB is available both via OpenSolaris under usr/src/grub/grub-0.95
as well as via the SUNWgrubS source package on all post new-boot
One of the goals of the project was to establish an interface
between Solaris and the boot-loader allowing for (more) independent
development of either. The multiboot spec, as implemented from the
side of the boot-loader by GRUB, seemed ideally suited to this.
The other side of the spec is implemented by the multiboot kernel /
boot loader simply called multiboot. Its code can be found under: usr/src/psm/stand/boot/i386/common.
From GRUB's perspective it is a truly multiboot compliant kernel. From
the perspective of the Solaris kernel, it is merely a ramdisk loader
and bootstrap. It makes boot-time options passed through GRUB
available as properties and can read and load (gen)unix and krtld from
the ramdisk (more on the ramdisk in a moment).
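To make the "options as properties" idea concrete, here is a minimal sketch of turning a comma-separated name=value list into a small property table. The input format, the struct and the function names (parse_boot_props, get_boot_prop) are assumptions made for illustration and are not taken from the multiboot source. The sketch also ignores quoting, so values that themselves contain commas (device paths, for instance) would need more care.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PROPS 16
#define MAX_LEN   64

/* One boot property: a name and its string value (hypothetical layout). */
struct boot_prop {
    char name[MAX_LEN];
    char value[MAX_LEN];
};

/*
 * Split a "name=value,name=value" list into props[] and return the
 * number of properties stored. Entries without '=', with an empty
 * name, or with an over-long name or value are skipped.
 */
size_t
parse_boot_props(const char *list, struct boot_prop *props, size_t max)
{
    size_t n = 0;

    assert(list != NULL && props != NULL);
    while (*list != '\0' && n < max) {
        size_t len = strcspn(list, ",");          /* length of this entry */
        const char *eq = memchr(list, '=', len);  /* '=' inside the entry */

        if (eq != NULL && eq != list) {
            size_t nlen = (size_t)(eq - list);
            size_t vlen = len - nlen - 1;

            if (nlen < MAX_LEN && vlen < MAX_LEN) {
                memcpy(props[n].name, list, nlen);
                props[n].name[nlen] = '\0';
                memcpy(props[n].value, eq + 1, vlen);
                props[n].value[vlen] = '\0';
                n++;
            }
        }
        list += len;
        if (*list == ',')
            list++;     /* step over the separator */
    }
    return (n);
}

/* Look a property up by name; returns NULL when it is not set. */
const char *
get_boot_prop(const struct boot_prop *props, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(props[i].name, name) == 0)
            return (props[i].value);
    }
    return (NULL);
}
```

A caller would parse the option string once and then query individual names with get_boot_prop(), treating a NULL return as "not set".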
Another goal was to reduce Solaris's reliance on hardware specific
features, specifically the ability to perform read operations from IO
devices early on in boot. Unlike some other operating systems, Solaris
loads all drivers dynamically and assembles the "path" of drivers
required to access the root device dynamically. This means it needs to
be able to access the files that are the driver binaries before it can
access the device they live on directly.
The history of this pre-root-mount IO is that on SPARC such IO is
accomplished by device specific fcode delivered via OBP (IEEE 1275),
or extensions to it delivered on IO adapters. On x86, the lack of such
an OBP was compensated for by bootconf and its collection of
real-mode drivers. To the Solaris kernel, these real-mode drivers
presented a very OBP-like interface, allowing for little divergence in
the kernel configuration code. This meant that in order for a
particular device to be usable as a boot device on x86, it needed two
drivers: one real-mode driver to boot, and then another Solaris kernel
driver to access the device once the system is booted.
Much like OBP on SPARC (or on a PowerMac), x86 systems tend to come
with code that can access bootable devices. This is the BIOS, and in
many cases BIOS option ROMs on adapters. While the very early stage of
boot has always utilized this code, calling back into it during kernel
boot is problematic as it not only involves switching back into real
mode, but restoring enough state for the BIOS to be able to run again
as well as potentially saving and restoring state specific to whatever
IO device is being used.
So the problem is that we need to be able to read and load
arbitrary modules during boot and would like to utilize device
specific code that ships with the hardware that we can't call back
into once we've started to boot. The solution is to simply load
everything we could possibly need at once and then boot with it in
memory (which we don't need system specific drivers to access). The
implementation of this solution involves a ramdisk that is populated
pre-boot (either at install time or if it needs to be, updated
As I alluded to earlier, the multiboot kernel knows how to read
this ramdisk well enough to load krtld (the kernel linker/loader),
and (gen)unix from it. Then krtld, which can also read the ramdisk
thanks to the code in usr/src/uts/common/krtld/bootrd.c,
can bring in modules (and other files) via kobj_open()
until root can be mounted.
Before we can make any real progress towards mounting root, we need
to have an idea of the physical layout of the machine. On SPARC this
is accomplished by looking at the hardware tree that OBP has built
(prtconf -p to view). On x86, this used to be accomplished by code in
bootconf[1] that, when it was done, exported something very
similar to a 1275 device tree.
Luckily all PCI devices can be enumerated quite reliably by parsing
PCI config space. This happens in pci_setup_tree(). If
you look closely at things like pci_reprogram(), you'll notice that we
do a little more than just enumerate them there.
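As a rough illustration of what enumerating by parsing config space means, the sketch below brute-forces every bus/device/function and counts the ones that answer with a real vendor ID. It is not how pci_setup_tree() is written. The pci_cfg_ops accessor interface, pci_count_functions() and the fake_* helpers are all hypothetical names introduced here so the scan can run without hardware. The only facts it relies on are standard PCI ones: an empty slot reads back vendor ID 0xffff, and bit 7 of the header type register marks a multi-function device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_MAX_BUS    256
#define PCI_MAX_DEV    32
#define PCI_MAX_FUNC   8
#define PCI_OFF_VENDOR 0x00   /* vendor ID register */
#define PCI_OFF_HEADER 0x0e   /* header type register */
#define PCI_NO_DEVICE  0xffff /* vendor ID read back from an empty slot */
#define PCI_MULTIFUNC  0x80   /* header type bit: functions 1-7 may exist */

/* Caller-supplied config space accessors (hypothetical interface). */
struct pci_cfg_ops {
    uint16_t (*read16)(void *ctx, int bus, int dev, int func, int off);
    uint8_t  (*read8)(void *ctx, int bus, int dev, int func, int off);
    void *ctx;
};

/* Count every bus/device/function that answers with a real vendor ID. */
size_t
pci_count_functions(const struct pci_cfg_ops *ops)
{
    size_t count = 0;

    assert(ops != NULL && ops->read16 != NULL && ops->read8 != NULL);
    for (int bus = 0; bus < PCI_MAX_BUS; bus++) {
        for (int dev = 0; dev < PCI_MAX_DEV; dev++) {
            uint8_t hdr;
            int nfunc;

            /* No function 0 means there is no device in this slot. */
            if (ops->read16(ops->ctx, bus, dev, 0, PCI_OFF_VENDOR) ==
                PCI_NO_DEVICE)
                continue;
            hdr = ops->read8(ops->ctx, bus, dev, 0, PCI_OFF_HEADER);
            nfunc = (hdr & PCI_MULTIFUNC) ? PCI_MAX_FUNC : 1;
            for (int func = 0; func < nfunc; func++) {
                if (ops->read16(ops->ctx, bus, dev, func,
                    PCI_OFF_VENDOR) != PCI_NO_DEVICE)
                    count++;
            }
        }
    }
    return (count);
}

/* A tiny fake config space so the scan can be tried without hardware. */
struct fake_fn { int bus, dev, func; uint16_t vendor; uint8_t hdr; };
struct fake_cfg { const struct fake_fn *fns; size_t n; };

static const struct fake_fn *
fake_find(void *ctx, int bus, int dev, int func)
{
    const struct fake_cfg *cfg = ctx;

    for (size_t i = 0; i < cfg->n; i++) {
        const struct fake_fn *f = &cfg->fns[i];
        if (f->bus == bus && f->dev == dev && f->func == func)
            return (f);
    }
    return (NULL);
}

uint16_t
fake_read16(void *ctx, int bus, int dev, int func, int off)
{
    const struct fake_fn *f = fake_find(ctx, bus, dev, func);

    return ((f != NULL && off == PCI_OFF_VENDOR) ? f->vendor : PCI_NO_DEVICE);
}

uint8_t
fake_read8(void *ctx, int bus, int dev, int func, int off)
{
    const struct fake_fn *f = fake_find(ctx, bus, dev, func);

    return ((f != NULL && off == PCI_OFF_HEADER) ? f->hdr : 0xff);
}
```

A real enumerator would follow bridges to discover buses rather than probing all 256 of them, and would record far more than a count, but the absent-device and multi-function checks are the core of the walk.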
While PCI accounts for most devices in a modern system, those of us
living in a UNIX world with serial consoles still like to use the
on-board serial ports (which are still ISA) that are still found on
many systems. Similarly, PS/2 ports (8042), while finally starting to
disappear on desktop systems, still account for nearly all integrated
laptop keyboards and pointing devices. So we need to deal with at
least them. The good news is that the need to power manage devices in
the order they are connected (if you power down an HBA, you can't talk
to the disk to power it down) already led to the need for some sort
of system wide description of how devices are interconnected and the
ACPI tables can provide this information.
Before I explain how ISA devices are enumerated, let's take a look
at ACPI. If you're thinking of ACPI as a power management related
spec, you are correct, but it has grown to include things that supersede
the MP tables (how do I find the not yet running processors),
Plug-and-Play interrupt programming and other things that are a
constant source of system specific bugs. Up to and including Solaris
10, we used acpi_intp, which was a home grown ACPI interpreter that
suffered greatly from the vagueness of the original ACPI specs. While
we could have brought it up to speed with respect to the ACPI 2.0
spec, there is still the issue of machine specific bugs. As luck would
have it, Intel had recently made acpica (which is a fairly complete
OS-side ACPI implementation) available under a sufficiently free
license. After much reality checking with various engineers and, not
least, legal review, we decided to dump acpi_intp and incorporate
acpica. It will likely form the basis of future power management
efforts and can now be found under usr/src/uts/i86pc/io/acpica.
Information provided by ACPI is also used in pcplusmp (on MP systems)
and uppc to configure interrupt routing. On multiprocessor systems it also
supplements information found in the MP tables to help us find APICs
and their associated CPUs.
Before I get back to ISA enumeration, one more note on ACPI. I
previously mentioned that the initial ACPI spec was quite vague. This
led to odd implementations not only on the OS side, but also on many
systems that are still in use today. Currently we don't trust any
systems (strictly speaking BIOSes) made before 1999. The startup code
makes that check. Beyond that, David implemented a
mechanism for us to deliver fixed tables as regular files to override
ones delivered by the hardware in case they are hopelessly
broken. This code can be found in AcpiOsTableOverride().
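The decision at the heart of such an override can be sketched as follows: only accept a replacement table read from a file if it carries the same signature as the firmware table and passes the standard ACPI checksum (all bytes of the table sum to zero modulo 256). This is a sketch under stated assumptions and not the code behind AcpiOsTableOverride(). The function names acpi_table_valid() and acpi_pick_table() are hypothetical, and how the replacement file is located and read is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ACPI_SIG_LEN 4    /* 4-byte signature at offset 0 */
#define ACPI_HDR_LEN 36   /* size of the common ACPI table header */

/* Read the 32-bit little-endian table length stored at offset 4. */
static uint32_t
acpi_table_length(const uint8_t *t)
{
    return ((uint32_t)t[4] | (uint32_t)t[5] << 8 |
        (uint32_t)t[6] << 16 | (uint32_t)t[7] << 24);
}

/* A table is well formed when all of its bytes sum to zero modulo 256. */
int
acpi_table_valid(const uint8_t *t, size_t buflen)
{
    uint8_t sum = 0;
    uint32_t len;

    assert(t != NULL);
    if (buflen < ACPI_HDR_LEN)
        return (0);
    len = acpi_table_length(t);
    if (len < ACPI_HDR_LEN || len > buflen)
        return (0);
    for (uint32_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + t[i]);
    return (sum == 0);
}

/*
 * Pick the table to use: the replacement if it has the same signature
 * as the firmware table and checks out, otherwise the firmware table.
 */
const uint8_t *
acpi_pick_table(const uint8_t *fw, const uint8_t *repl, size_t repl_len)
{
    assert(fw != NULL);
    if (repl != NULL && acpi_table_valid(repl, repl_len) &&
        memcmp(fw, repl, ACPI_SIG_LEN) == 0)
        return (repl);
    return (fw);
}
```

Falling back to the firmware table whenever the replacement looks wrong keeps a damaged override file from making a bad situation worse.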
Now, on to ISA enumeration. This happens when the isa nexus is
attached and it sets out to enumerate its children in isa_alloc_nodes(). If
ACPI can be used on this system, acpi_isa_device_enum()
is called and the devinfo nodes are built. Otherwise we revert to the
- Two serial ports.
- One parallel port.
- One i8042 (PS/2) node for mouse and keyboard.
- No floppy (it is known to hard hang if not present).
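The fallback in the list above could be captured as a small static table. The sketch below is an assumption about shape only: the struct, the node names and isa_fallback_nodes() are hypothetical and are not taken from the isa nexus source. The I/O addresses and interrupt lines are the conventional PC values for COM1, COM2, LPT1 and the 8042 controller.

```c
#include <assert.h>
#include <stddef.h>

/* One legacy ISA device to create a node for (hypothetical layout). */
struct isa_default {
    const char *name;    /* node name, illustrative only */
    unsigned    ioaddr;  /* conventional PC I/O base address */
    int         irq;     /* conventional PC interrupt line */
};

/* The fallback set; there is deliberately no floppy entry. */
static const struct isa_default isa_defaults[] = {
    { "serial-a", 0x3f8, 4 },   /* COM1 */
    { "serial-b", 0x2f8, 3 },   /* COM2 */
    { "parallel", 0x378, 7 },   /* LPT1 */
    { "i8042",    0x060, 1 },   /* keyboard IRQ; the mouse uses IRQ 12 */
};

/*
 * Return the fallback table when ACPI cannot be used. When ACPI is
 * usable the ACPI-based enumeration builds the nodes instead, so no
 * defaults are returned.
 */
const struct isa_default *
isa_fallback_nodes(int acpi_usable, size_t *count)
{
    assert(count != NULL);
    if (acpi_usable) {
        *count = 0;
        return (NULL);
    }
    *count = sizeof (isa_defaults) / sizeof (isa_defaults[0]);
    return (isa_defaults);
}
```

Keeping the floppy out of the table, rather than probing for it, mirrors the point in the list: a probe for an absent floppy controller is known to hard hang.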
That pretty much covers device enumeration. Please keep in mind
that this is a 9000ft view and I'm skipping over numerous things that
took many engineers many months to figure out with less than a
Once all the needed drivers and support modules have been loaded,
it becomes time to actually mount root for the first time, read only,
in the kernel via mount_root() (for NFS) or ufs_mountroot().
Strictly speaking, most of the drivers were probably loaded as a result
of mountroot opening the root device and devfs
assembling the required drivers.
The actual root device to be mounted is specified via the bootpath
property. This could be any typical root device, even a metadevice.
If it is not set, we default to mounting root directly on the
In the case when root is mounted on a real device (not the
ramdisk), the ramdisk needs to contain little more than the kernel and
all required drivers. This type of ramdisk image is stored in
/platform/i86pc/boot_archive and needs to be kept in sync
with the kernel binaries on the root device in order to avoid loading
mismatched modules from the root filesystem after root is
mounted[2]. The list of files and directories is kept in
/boot/solaris/filelist.ramdisk on the running system. The
task of syncing the boot archive is handled by bootadm(1M).
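One simple way to think about "in sync" is by timestamp: the archive is stale if any file named in the list was modified after the archive was built. The sketch below shows only that comparison. It is an assumption for illustration and says nothing about how bootadm actually decides; archive_is_stale() is a hypothetical name, and a caller would be expected to gather the modification times itself (for example with stat(2)).

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Return 1 if any listed file is newer than the archive, 0 otherwise.
 * An empty file list is never stale.
 */
int
archive_is_stale(time_t archive_mtime, const time_t *file_mtimes,
    size_t nfiles)
{
    assert(file_mtimes != NULL || nfiles == 0);
    for (size_t i = 0; i < nfiles; i++) {
        /* difftime(a, b) is a - b in seconds. */
        if (difftime(file_mtimes[i], archive_mtime) > 0)
            return (1);
    }
    return (0);
}
```

A timestamp check is cheap but coarse: it cannot tell a touched file from a changed one, which is one reason a real tool might compare contents instead.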
The other case, in which we mount root on the ramdisk, requires the
ramdisk to contain something closer to a minimal system. An example of
this is the install miniroot, but the options are really only limited
by one's time and available main memory.
[1] In a way bootconf was pretty neat; the idea behind it was
that device probing during boot should be interactive so that probe
conflicts could be resolved by the end-user/system-admin on a system
by system basis. This had a lot of value in the bad old days before
self-describing buses, where you had to poke device registers and guess,
based on the device's reaction, what kind of device it was and pray that
some other device wouldn't hard-hang the system if treated the same
[2] Solaris dynamically loads and unloads modules and drivers
on an as-needed basis.