Solaris 10 on x64 Processors: Part 4 - Userland


The amount of work involved in the kernel part of the amd64 project was fairly large; fortunately, the userland part was more straightforward because of our prior work on 64-bit Solaris on SPARC back in 1997. So for this project, once the kernel work, which abstracts the hardware differences between processors, was done, many smaller tasks appeared that were mostly solved by tweaking Makefiles and finding the occasional #ifdef that needed something added or modified. Fortunately, it was also work that could be done in parallel by many people from across the organizations that contribute to the Solaris product.

Of course there were other substantial pieces of work, like the Sun C and C++ compilers and the Java Virtual Machine; though the JVM was already working on 32-bit and 64-bit Solaris on SPARC as well as 32-bit x86, and the Linux port of the JVM had already led that team to explore many of the amd64 code generation issues.

One of the things we tried to do was to be compatible with the amd64 ABI on Linux. As we talked to industry partners, we discovered that there was a variety of interpretations of the term "ABI." Many of the people we talked to outside of Sun thought that "ABI" referred only to register usage, C calling conventions, and data structure sizes and alignments: a specification for compiler and linker writers, with little or nothing beyond that about the system interfaces an application can actually invoke. But the System V ABI is a larger concept than that; it was at least intended to provide a sufficient set of binary specifications that complete application binaries could be built once and run on any ABI-conformant implementation. Thus Sun engineers tend to think of "the ABI" as the complete set of interfaces used by user applications, rather than just compiler conventions; and over the years we expanded this idea of maintaining a binary-compatible interface to applications all the way to the Solaris application guarantee program.

Though we tried to be compatible at this level with Linux on amd64, we discovered a number of issues in the system call and library interfaces that made that difficult, and while we did eliminate gratuitous differences where we could, we eventually settled on a more pragmatic approach. We decided to be completely compatible with the basic "compiler" view of the ABI, to simply try to make it easy to port applications from 32-bit Solaris to 64-bit Solaris and from Solaris on sparcv9 to Solaris on x64, and to leave the thornier problems of full 64-bit Linux application compatibility to the Linux Application Environment (LAE) project.

Threads and Selectors

In previous releases of Solaris, the 32-bit threads library used the %gs selector to allow each LWP in a process to refer to a private LDT entry that provides the per-thread state manipulated by the internals of the thread library. Each LWP gets a different %gs value that selects a different LDT entry; each LDT entry is initialized to point at per-thread state. On LWP context switch, the kernel loads the per-process LDT register to virtualize all this data to the process. Workable, yes, but the obvious inefficiency here was requiring every process to have at least one extra locked-down page to contain a minimal LDT. More serious was the implied upper bound of 8192 LWPs per process (derived from the hardware limit on LDT entries).

For the amd64 port, following the draft ABI document, we needed to use the %fs selector for the analogous purpose in 64-bit processes too. On the 64-bit kernel, we wanted to use the FSBASE and GSBASE MSRs to virtualize the addresses that a specific magic %fs and magic %gs select, and we obviously wanted to use a similar technique for 32-bit applications, and on the 32-bit kernel too. We did this by defining specific %fs and %gs values that point into the GDT, and arranged that context switches update the corresponding underlying base address from predefined LWP-private values, either explicitly by rewriting the relevant GDT entries on the 32-bit kernel, or implicitly via the FSBASE and GSBASE MSRs on the 64-bit kernel. The result of all this work is simpler code that scales cleanly, and the upper bound on the number of LWPs is now derived only from available memory (modulo resource controls, obviously).

Floating point

Most of the prework we had done to establish the SSE capabilities in the 32-bit kernel was readily reused for amd64, modulo some restructuring to allow the same code to be compiled appropriately for the two kernel builds. However, late in the development cycle, the guys in our floating point group pointed out that we weren't capturing the results of floating point exceptions properly, the result of a subtle difference in the way that AMD and Intel processors present information to the kernel after a floating point exception has been acknowledged. Fortunately they noticed this, and we rewrote the handler to be more robust and to behave the same way on both flavors of hardware.

Continuous Integration vs. One Giant Putback

To try to keep our merging and synchronization efforts under control, we did our best to integrate many of the changes we were making directly into the Solaris 10 gate so that the rest of the Solaris development organization could see them. This wasn't a willy-nilly integration of modified files; instead, each putback was a regression-tested subset of the amd64 project that could stand alone if necessary. Perhaps I should explain this a little further. The Solaris organization has, for many years, tried to adhere to the principle of integrating complete projects, that is, changes that can stand alone, even if the follow-on projects are cancelled, fail, or become too delayed to make the release under development. Some of the code reorganization we needed was done this way, as well as most of the items I described as "prework" in part 1. There were also a bunch of code removal projects we did that helped us avoid the work of porting obsolete subsystems and support for drivers. As an aside, it's interesting to muse on exactly who is responsible for getting rid of drivers for obsolete hardware; it's a very unglamorous task, but one that is highly necessary if you aren't to founder under an ever more opaque and untestable collection of crufty old source code.

In the end though, we got to the point where the pain of creating and testing subsets of our changes by hand to create partial projects in Solaris 10 became too much for the team to countenance. Instead, we focussed on creating a single delivery of all our changes in one coherent whole. Our Michigan-based "army of one," Roger Faulkner, did all of this, as well as most of the rest of the heavy lifting in userland, i.e. creating the 64-bit libc and basic C run-time, as well as the threading primitives. Roger really did an amazing job on the project.

Projects of this size and scope are always difficult, and everyone gets even more worried when the changes are integrated towards the end of a release. However, we did bring unprecedented levels of testing to the amd64 project, from some incredible, hard-working test people. Practically speaking, I think we did a reasonable job of getting things right by the end of the release, despite a few last-minute scares around our mishandling of process-private LDTs. Fortunately these were only really needed for various forms of Windows emulation, so we disabled them on the 64-bit kernel for the FCS product; this works now in the Solaris development gate, and a backported fix is working its way through the system.

Not to say that there aren't bugs of course ...

Distributed Development

I think it's worth sharing some of the experiences of how the core team worked on this project. First, when we started, Todd Clayton (the engineering lead, who also did the segmentation work, among other things) and I asked to build a mostly-local team. We asked for that because we believed that time-to-market was critical, and we thought that we could go the fastest with all the key contributors in close proximity. However, for a number of reasons, that was not possible, and we ended up instead with a collection of talented people spread over sites as geographically distributed as New Zealand, Germany, Boston, Michigan, and Colorado, as well as a small majority of the team back in California. To help unify the team and make rapid progress, we came up with the idea of periodically getting the team together physically in one place (either offsite in California or Colorado) and spending a focussed week together. We spent the first week occupying a contiguous block of adjacent offices in another building; the problem was that we didn't really change the dynamics of the way people worked with each other. Our accidental discovery came during our first Colorado meeting, where we ended up in one (large!) training room for our kick-off meeting. Rather than trudge back across campus to where we had reserved office space, we decided to stay put and just start work where we were, and suddenly everything clicked. We stayed in the room for the rest of the week, working closely with each other, immersing ourselves in the project, the team, and what needed to be done. This was very effective, because as well as reinforcing the sense of team during the week away, everyone was able to go back to their home sites and work independently and effectively for many weeks before meeting up again, with only an occasional phone call or email between team members to synchronize.

Looking Back

I've tried to give a reasonable tour of the amd64 project, driven mostly by what stuck in my memory, and biased to some degree by the work I was involved in, but obviously much detail has been omitted or completely forgotten. To the people at Sun whose work or contribution I've either not mentioned, foolishly glossed over, or forgotten completely: sorry, and thanks for your efforts. To the people at AMD that helped support us, another thank you. To our families and loved ones that put up with "one more make," yet more thanks. This was a lot of work, done faster than any of us thought possible, and 2004 was, in truth, well, a bit of a blur.

Technorati Tag: OpenSolaris
Technorati Tag: Solaris

