Stacks with split personalities
By Doug Simon on Aug 26, 2010
After an update to one of our Linux test machines, stack overflow detection stopped working in Maxine on Linux. We employ the same mechanism as HotSpot: placing a yellow guard page at the end of the stack. That is, the lowest-addressed page of the application-accessible part of the stack is mprotected with
PROT_NONE. This is in addition (and adjacent) to the red guard page placed by the thread library itself. On entry to every compiled Java method, a stack banging instruction sequence performs a load at a fixed negative offset from the stack pointer. For example, on x64 the sequence is:
mov r11, [rsp - 0x3000]
If this address falls within the yellow guard page, an OS trap occurs. Control passes to the signal handler registered by the VM which will translate this as a stack overflow and throw a
StackOverflowError. However, as a result of the aforementioned update, we were no longer seeing an OS trap when the yellow guard page was accessed on Linux. Something was changing the protection bits for the guard page. With a little help from gdb and strace, I discovered that when the VM loaded the
libjava.so library from the JDK,
dlopen triggered a call to a rather self-documenting function named
__make_stacks_executable. What was this?! This function (source here) iterates over the list of current threads and calls
mprotect on the entire stack of each thread with the flags
PROT_READ | PROT_WRITE | PROT_EXEC. Of course, this removed the yellow guard page, which is why stack overflow detection failed.
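One way to confirm this kind of change without a debugger is to look up the guard page's address in /proc/<pid>/maps before and after the library is loaded. The following is a minimal sketch, not what was actually used during the investigation (that was gdb and strace). The function name mapping_perms and the addresses in the sample text are made up for illustration.

```python
def mapping_perms(maps_text, address):
    """Return the permission field (e.g. 'rw-p') of the mapping that
    contains `address`, or None if no listed mapping covers it.

    `maps_text` is text in the format of Linux's /proc/<pid>/maps.
    """
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        start_hex, _, end_hex = fields[0].partition("-")
        start, end = int(start_hex, 16), int(end_hex, 16)
        if start <= address < end:
            return fields[1]
    return None


# Hypothetical excerpt: a PROT_NONE guard page directly below a
# read/write thread stack (addresses are invented).
BEFORE = """\
7f5c1a000000-7f5c1a001000 ---p 00000000 00:00 0
7f5c1a001000-7f5c1a801000 rw-p 00000000 00:00 0
"""

# The same region after the whole stack has been mprotect-ed RWX:
# the guard page has been absorbed into one executable mapping.
AFTER = """\
7f5c1a000000-7f5c1a801000 rwxp 00000000 00:00 0
"""

GUARD_ADDRESS = 0x7F5C1A000800  # an address inside the former guard page
print(mapping_perms(BEFORE, GUARD_ADDRESS))
print(mapping_perms(AFTER, GUARD_ADDRESS))
```

On a live process the text would come from reading /proc/self/maps (or /proc/<pid>/maps for another process); a guard page that is still intact shows up with the permissions ---p.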
So, why did the linker decide to take this action? It turns out that it's all to do with the GNU_STACK ELF header, nicely described here and here. The Maxine launcher (
maxvm) and shared library (
libjvm.so) are compiled with the default stack-protection value for the GNU_STACK header which is RW (read/write). This means the thread library (pthreads) ensures all thread stacks are initially created with
PROT_READ | PROT_WRITE protection flags. Now when the linker loads a shared library that does not have a GNU_STACK header (e.g. a legacy library compiled before gcc added this header by default) or has a GNU_STACK header whose value is RWX, then it calls (via a call-back registered by pthreads I think) the aforementioned
__make_stacks_executable function. After this point, all currently active threads no longer have a yellow guard page on their stack. As I discovered, most of the shared libraries in the Linux JDK (build 1.6.0_20-b02) do not have a GNU_STACK header and hence indicate to the linker that they require executable stacks.
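To check which libraries fall into this category, the GNU_STACK entry can be read straight out of the ELF program header table. The sketch below is an illustration only: the function name gnu_stack_flags is made up, and it assumes a 64-bit little-endian ELF image (as on x64 Linux). It returns None when the entry is absent, which is the case that, as described above, makes the linker assume an executable stack is required.

```python
import struct

PT_GNU_STACK = 0x6474E551      # program header type for the GNU_STACK entry
PF_X, PF_W, PF_R = 1, 2, 4     # ELF segment permission bits


def gnu_stack_flags(elf_bytes):
    """Return 'RW', 'RWX', etc. for the GNU_STACK program header of an
    ELF image, or None if the image has no such header.

    Sketch only: handles 64-bit little-endian ELF and nothing else.
    """
    if elf_bytes[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    if elf_bytes[4] != 2 or elf_bytes[5] != 1:
        raise ValueError("sketch handles only 64-bit little-endian ELF")

    # ELF64 header: e_phoff at offset 32, e_phentsize/e_phnum at 54/56.
    (e_phoff,) = struct.unpack_from("<Q", elf_bytes, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf_bytes, 54)

    for i in range(e_phnum):
        # ELF64 program header starts with p_type then p_flags.
        p_type, p_flags = struct.unpack_from(
            "<II", elf_bytes, e_phoff + i * e_phentsize)
        if p_type == PT_GNU_STACK:
            return "".join(
                name for bit, name in ((PF_R, "R"), (PF_W, "W"), (PF_X, "X"))
                if p_flags & bit)
    return None
```

Calling it on the bytes of a shared library (for example the contents of libjava.so read in binary mode) distinguishes the three cases discussed here: "RW", "RWX", or None for a missing header.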
So, what's the solution for Maxine to handle this? A bit of experimentation revealed that if I added the default GNU_STACK header (with the execstack command) to all the shared libraries in the JDK, the Maxine VM stack overflow detection worked again. However, I'm not sure that this is safe in general as I don't know if HotSpot places executable code on the stack (much like gcc does to implement trampolines for local functions). The solution we've adopted for Maxine is to use the
-z execstack option when linking the VM launcher. This basically makes the launcher equivalent to the JDK shared libraries with respect to controlling pthread's use of executable stacks (except that it's done with a non-default value of the GNU_STACK header, not its absence). I'd be curious to know what version of gcc is used to build the JDK for Linux as I'm not aware of a flag to omit this header in the current version of gcc. One (unfortunate) implication of this linker behaviour is that HotSpot (and Maxine) binaries & libraries must always specify executable stacks if they are to continue supporting loading of 3rd party libraries (part of JNI). Either that or invent another way to implement stack overflow detection that does not involve protecting pages on the stack.