By kucharsk on Dec 13, 2005
Now I can finally chat a bit about what I've been doing for the last several months since finishing up the work to bring Solaris 10 to amd64 hardware.
BrandZ and Signals
One of the more difficult problems of running a Linux branded process in BrandZ is that of signal delivery. While Solaris and Linux largely support the same signals, they often have different signal numbers, and in one case have different default semantics. Signal delivery is further complicated by differences in signal stack structure and contents and the action taken when a user signal handler exits. Finally, many signal-related structures, such as sigset_t, differ between Solaris and Linux. All of these differences must be accounted for if the signal delivery mechanism is to have any hope of working correctly, and frankly dealing with them made me yearn for the "simpler" days of translating amd64 page table entries back and forth between 32 and 64-bit mode. :-)
Signal number conversion
The simplest transformation that must be done when sending signals is to translate between Linux and Solaris signal numbers.
So for example, when a Linux process sends a signal using the kill(2) system call, the BrandZ infrastructure must translate the signal into its Solaris equivalent before handing control off to the Solaris kill(2). Conversely, when a signal is delivered to a Linux process, BrandZ must convert the signal number from its Solaris value back to its Linux value. This translation of signal numbers at both the time of generation and the time of delivery ensures that the Solaris kernel sees only Solaris signals and that any signals generated by the kernel are seen by Linux processes as the proper signal.
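The two-way translation described above can be sketched as a small lookup table. This is only an illustration, not the BrandZ source: the `sketch_` names are hypothetical, and the table holds a handful of commonly documented x86 signal numbers rather than the full set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry: a Linux signal number and its Solaris equivalent. */
struct sketch_sigmap {
    int lx_signo;
    int solaris_signo;
};

/* Illustrative subset only (x86 values); not the real BrandZ table. */
static const struct sketch_sigmap sketch_sigtab[] = {
    {  1,  1 },  /* SIGHUP: same number on both systems */
    {  2,  2 },  /* SIGINT */
    {  7, 10 },  /* SIGBUS */
    {  9,  9 },  /* SIGKILL */
    { 10, 16 },  /* SIGUSR1 */
    { 12, 17 },  /* SIGUSR2 */
    { 17, 18 },  /* SIGCHLD */
    { 18, 25 },  /* SIGCONT */
    { 19, 23 },  /* SIGSTOP */
    { 30, 19 },  /* SIGPWR */
};
#define SKETCH_NSIGS (sizeof (sketch_sigtab) / sizeof (sketch_sigtab[0]))

/* Used at signal generation time, e.g. before calling the Solaris kill(2). */
static int sketch_lx_to_solaris(int lx_signo)
{
    for (size_t i = 0; i < SKETCH_NSIGS; i++) {
        if (sketch_sigtab[i].lx_signo == lx_signo)
            return sketch_sigtab[i].solaris_signo;
    }
    return -1;  /* not in this illustrative table */
}

/* Used at delivery time, before the Linux process sees the signal. */
static int sketch_solaris_to_lx(int solaris_signo)
{
    for (size_t i = 0; i < SKETCH_NSIGS; i++) {
        if (sketch_sigtab[i].solaris_signo == solaris_signo)
            return sketch_sigtab[i].lx_signo;
    }
    return -1;  /* not in this illustrative table */
}

/* Sanity check: every table entry survives a round trip. */
static void sketch_check_roundtrip(void)
{
    for (size_t i = 0; i < SKETCH_NSIGS; i++) {
        int sol = sketch_lx_to_solaris(sketch_sigtab[i].lx_signo);
        assert(sketch_solaris_to_lx(sol) == sketch_sigtab[i].lx_signo);
    }
}
```

Note how Linux signal 19 (SIGSTOP) and Solaris signal 19 (SIGPWR) are entirely different signals, which is why passing a number through untranslated would be harmful rather than merely wrong.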
Calling user signal handlers
In order to support user-level signal handlers, BrandZ uses a double layer of indirection to process and deliver signals to branded threads.
In a native Solaris process, signal delivery is interposed upon by libc for any thread registering a signal handler. Libc needs to do various bits of magic to provide thread-safe critical regions, so it registers its own handler for the signal, named sigacthandler(), using the sigaction(2) system call.
Adding a Linux branded thread to the mix complicates this behavior further, as when a thread receives a signal, it may be running with a Linux value in the x86 %gs segment register as opposed to the value Solaris threads expect; if control were passed directly to a bit of Solaris code, that code would suffer a segmentation fault the first time it tried to dereference a memory location using %gs.
This need to impose upon the normal Solaris signal handling mechanism means that while the path from signal generation to delivery for a native Solaris thread looks something like:
kernel -> sigacthandler() -> call_user_handler() -> user signal handler
for BrandZ Linux threads, this instead would look like:
kernel -> lx_sigacthandler() -> sigacthandler() -> call_user_handler() -> lx_call_user_handler() -> Linux user signal handler
The new routines mentioned above are:
lx_sigacthandler(): This routine is responsible for setting the %gs segment register to the value Solaris code expects, and jumping to Solaris' libc signal interposition handler, sigacthandler().
lx_call_user_handler(): This routine is responsible for translating Solaris signal numbers to their Linux equivalents, building a Linux signal stack based on the information Solaris has provided, and passing the stack to the registered Linux signal handler. It is, in effect, the Linux thread equivalent to libc's call_user_handler().
void setsigacthandler(void (*new_handler)(int, siginfo_t *, void *), void (**old_handler)(int, siginfo_t *, void *))
Once setsigacthandler() has been executed, all future branded threads the thread may create will automatically have the proper interposition handler installed as the result of any sigaction() call.
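To make the double layer of indirection more concrete, here is a toy model of the delivery path using plain function pointers. It never touches real signals or %gs, every `sketch_` name is hypothetical, and the way `sketch_setsigacthandler()` hands back the previous handler is only my reading of the prototype above, not a statement of how libc implements it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*sketch_handler_t)(int signo);

#define SKETCH_MAX_TRACE 8
static const char *sketch_trace[SKETCH_MAX_TRACE];  /* routines visited, in order */
static size_t sketch_trace_len;
static int sketch_seen_signo;  /* number the "Linux" handler finally received */

static void sketch_record(const char *step)
{
    assert(sketch_trace_len < SKETCH_MAX_TRACE);
    sketch_trace[sketch_trace_len++] = step;
}

/* Stand-in for the handler a Linux application registered. */
static void sketch_linux_user_handler(int lx_signo)
{
    sketch_record("linux_user_handler");
    sketch_seen_signo = lx_signo;
}

/* Stand-in for lx_call_user_handler(): translate, then call the Linux handler. */
static void sketch_lx_call_user_handler(int solaris_signo)
{
    sketch_record("lx_call_user_handler");
    /* Toy translation covering SIGUSR1 only (Solaris 16 -> Linux 10). */
    int lx_signo = (solaris_signo == 16) ? 10 : solaris_signo;
    sketch_linux_user_handler(lx_signo);
}

/* Stand-ins for the two libc routines on the native path. */
static void sketch_call_user_handler(int signo)
{
    sketch_record("call_user_handler");
    sketch_lx_call_user_handler(signo);
}

static void sketch_sigacthandler(int signo)
{
    sketch_record("sigacthandler");
    sketch_call_user_handler(signo);
}

/* What the pretend kernel calls; initially libc's own interposer. */
static sketch_handler_t sketch_entry = sketch_sigacthandler;
static sketch_handler_t sketch_saved_libc_handler;

/* Assumed semantics: install new_handler and return the old one for chaining. */
static void sketch_setsigacthandler(sketch_handler_t new_handler,
    sketch_handler_t *old_handler)
{
    if (old_handler != NULL)
        *old_handler = sketch_entry;
    sketch_entry = new_handler;
}

/* Stand-in for lx_sigacthandler(): real code would fix up %gs here. */
static void sketch_lx_sigacthandler(int signo)
{
    sketch_record("lx_sigacthandler");
    sketch_saved_libc_handler(signo);
}

/* Pretend kernel delivery of a Solaris-numbered signal. */
static void sketch_deliver(int solaris_signo)
{
    sketch_trace_len = 0;
    sketch_entry(solaris_signo);
}
```

Calling `sketch_setsigacthandler(sketch_lx_sigacthandler, &sketch_saved_libc_handler)` and then `sketch_deliver(16)` walks the five routines in the order shown in the BrandZ path above, and the Linux handler sees signal 10.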
Note that none of this interposition is necessary unless a Linux thread registers a user signal handler, as the default action for all signals is the same between Solaris and Linux save for one signal, SIGPWR. To handle this case, BrandZ always installs its own internal signal handler for SIGPWR that performs the Linux default action, namely to terminate the process upon receipt. (Solaris' default action is to ignore SIGPWR.)
Returning from a user signal handler
The process of returning to an interrupted thread of execution from a user signal handler is entirely different between Solaris and Linux. While Solaris generally expects to set the context to the interrupted one on a normal return from a signal handler, Linux instead pushes actual code that calls a specific Linux system call, sigreturn(2), onto the signal handler's stack. Then when a Linux signal handler completes execution, instead of returning through what would in Solaris' libc be a call to setcontext(2), a call to sigreturn() is responsible for accomplishing much the same thing.
This trampoline code:
pop %eax
mov LX_SYS_sigreturn, %eax
int $0x80
is referenced such that when the Linux user signal handler is eventually called, the stack looks like this:
|Pointer to trampoline code|
|Linux signal number|
|Pointer to Linux siginfo_t|
|Pointer to Linux ucontext_t|
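For readers who think better in structs, the stack picture above could be modelled roughly as follows. This is a sketch under assumptions: the `sketch_` names are hypothetical, the fields are 32-bit because the branded process is a 32-bit Linux binary, and I am assuming the diagram lists entries starting from the top of the stack (lowest address), so that the trampoline pointer sits where a return address would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the four words the Linux handler finds on its stack. */
struct sketch_lx_sigframe {
    uint32_t trampoline;   /* pointer to trampoline code (acts as return address) */
    uint32_t lx_signo;     /* Linux signal number */
    uint32_t lx_siginfo;   /* pointer to Linux siginfo_t */
    uint32_t lx_ucontext;  /* pointer to Linux ucontext_t */
};

/*
 * Place a frame on a simulated, downward-growing stack.
 * `stack` is the backing memory and `sp` is an offset into it;
 * the return value is the new (lower) stack pointer offset.
 */
static uint32_t sketch_push_sigframe(uint8_t *stack, uint32_t sp,
    const struct sketch_lx_sigframe *frame)
{
    assert(sp >= sizeof (*frame));
    sp -= (uint32_t)sizeof (*frame);
    memcpy(stack + sp, frame, sizeof (*frame));
    return sp;
}
```

With this layout, a handler returning normally would jump to whatever `trampoline` points at, which is how control ends up in the sigreturn(2) call rather than back in the caller.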
When the trampoline code is executed, BrandZ interposes upon the Linux sigreturn(2) call in order to turn it into the return through the libc call stack that Solaris expects. This is done by the lx_sigreturn() routine, which removes the Linux signal frame from the stack and passes the resulting stack pointer to another routine, lx_sigreturn_tolibc(), which makes libc believe the user signal handler it had called returned.
When control returns to call_user_handler(), a setcontext(2) will be done that (in most cases) returns the thread executing the code back to the location originally interrupted by receipt of the signal.
One final complication in this process is the restoration of the %gs segment register. The proper value is saved when the thread context is originally saved, but prior to BrandZ, code existed in libc to force the value to that expected by libc before calling setcontext(2).
For BrandZ, the code that did so has been removed. While perhaps making faults due to bad user context values for %gs harder to debug (as such bad values will now make applications appear to segmentation fault deep within Solaris' libc), the versatility to properly restore custom %gs values seems worth the trade-off.
Wrapup
So while this may all seem unbelievably complex, it actually works rather well. Best of all, it does so without impacting the performance of native Solaris threads running in other zones on the same machine, one of our primary goals with this project.
BrandZ is, of course, still a work in progress with the actual product still undergoing heavy development, but at least the OpenSolaris release gives you something you can touch, feel and play with.
I suspect you'll find it was well worth the wait.