Hotspot internals Q&A
By nike on Jul 08, 2007
Now, this blog is mostly a collection of random system programming topics that I find interesting enough to share. But since my full-time job is hacking on the HotSpot JVM, I can also answer questions about VM internals here (as long as they are non-trivial and can be answered in 100-200 words :)).
Please leave your questions as comments to this posting, and I'll try to answer.
Q: Where can I read more about optimizations performed by Hotspot at runtime?
A: This question is somewhat broader in scope than the ones I'm ready to answer here, but I'd like to suggest these links: the HotSpot group page provides some useful info, including a rather simple way to obtain the VM sources and examine them yourself.
Q: What is an OOP?
A: An oop is an ordinary object pointer (a la a C pointer) pointing to an object in the Java heap. In systems that compact (and hence move) objects, oops, unlike plain pointers, need to be updated when the collector moves what they point to, so the VM has to know the location of every oop; it tracks them with so-called oop maps. To complicate the situation a bit: since the VM is in charge of dereferencing oops, it's possible to use a somewhat mangled version of the pointer. For example, on 64-bit systems the VM can store 32-bit values and dereference them using the heap base and the known object alignment, thus addressing up to 32G of heap with 8-byte alignment, and 64G with 16-byte alignment.
Q: What are YOU personally working on in HotSpot?
A: I work in the runtime team; we deal with OS support, synchronization, JNI, and everything else not covered by the JIT compiler team and the garbage collector team. My job includes bug fixing and porting to new platforms, and the big project I'm doing now is so-called compressed oops: using 32-bit values to address objects in the Java heap, thus decreasing memory traffic and footprint on 64-bit systems.
Q: Why are chunked heaps not implemented in HotSpot?
A: That's a very long story, and my personal opinion (not Sun's) is that it's because this feature is hard to implement (for example, the barrier code expects a contiguous heap), and it seems less and less relevant with the migration to 64-bit architectures. Actually, the G1 collector, which is in the productization phase now, uses a heap logically split into several pieces (regions).
Q: Does Hotspot generate SIMD instructions (bug 6536652)?
A: Currently some basic infrastructure for SIMD support is in place, and more vectorization is planned.
Q: Is it possible to see native code generated by JIT compiler?
A: Yes, there are two ways to do that: one using the Serviceability Agent (available now), the other using a disassembler DLL (will be available eventually). For the Serviceability Agent:
- read the documentation in
- build the Serviceability Agent (cd agent/make && make all)
- use the command line script clhsdbproc.sh or the UI version
- in the "Class Browser" you can check compiled code (if any) and view the disassembly
For the disassembler DLL:
- read the documentation in
Q: Hotspot doesn't compile on platform XXX with compiler YYY and libc ZZZ?
A: I intended this Q&A session only for technical questions on VM internals; for bug reports and discussions please use the bug tracking system or mailing lists.
Q: How much additional optimization would be possible for JIT compilers if they had most/all of the high-level code structure? Namely, if we compiled the source code not into bytecode, but into some kind of internal representation?
A: The idea of a high-level intermediate representation as a target for compilation (or other code generation) is pretty old (ask your favorite search engine, starting with G, about "portable intermediate representation"). I think systems like that were built starting from the 80s. For Java this idea was also considered, see for example this paper, but I don't see much benefit from it, other than maybe a more compact representation. For a heavily optimizing compiler it's more important to be able to easily extract what a particular Java program does, and in this sense Java bytecode is acceptable (a tree representation can be constructed easily, if needed).
Q: The JLS in section "17.4.4 Synchronization Order" states that:
"The write of the default value (zero, false or null) to each variable synchronizes-with the first action in every thread."
This implies that a thread is guaranteed never to read the old state of an object that has been collected. I can imagine that a GC, after collection, zeroes out the memory and stops every thread to force a synchronization. But what are the details? In particular, how can the GC force a sync action on a thread to ensure that it really sees the zeroes when it accesses that part of memory? How can the GC thread force another thread to perform an acquire after its own release?
A: Generally, Doug Lea's JSR-133 Cookbook provides very good documentation on the Java memory model. Regarding your question: the GC usually (unless a concurrent collector is used) forces threads to a safepoint, and only then updates memory. A concurrent collector uses atomic operations on the memory locations it modifies while running concurrently, or also uses safepointing. The GC moves objects only during a safepoint, and HotSpot uses a cooperative suspension model. The GC thread forces a Java thread to come to a safepoint by read-protecting a "polling page" (so the safepoint check is just a single memory load instruction). Java threads check the polling page in "safe places", i.e. when all object references are in locations described by oop maps, so that the GC knows their locations and can update them with new object pointers. After a Java thread resumes it sees all reference values updated, because a memory barrier instruction is issued in between, so no stale values can linger in caches. This is a rather complex topic, so I'd suggest you look at safepoint.cpp in the Hotspot source code for a better understanding of the synchronization protocol used by the VM.