Hotspot internals Q&A

So far this blog has mostly been a collection of random system programming topics that I find interesting enough to share. But since my full-time job is hacking on the HotSpot JVM, I can also answer VM internals questions here (as long as they are non-trivial and can be answered in 100-200 words :)).

Please leave your questions as comments to this posting, and I'll try to answer.


Q&A section

  • Q: Where can I read more about optimizations performed by Hotspot at runtime?
    A: This question is somewhat broader in scope than the ones I'm ready to answer here, but I would suggest these links:
    1. Hotspot technology docs
    2. Whitepaper on performance optimizations in 5.0
    If you have more concrete questions, feel free to ask. The HotSpot group page also provides some useful info, including a rather simple way to obtain the VM sources and examine them yourself.
  • Q: What is an OOP?
    A: It is an ordinary object pointer (a la a C pointer) that points to an object in the Java heap. In systems that compact (and hence move) objects, oops, unlike plain pointers, need to be updated when the collector moves whatever they point at, so the VM has to know the location of every oop; it tracks them with so-called oop maps. To complicate the picture a bit, since the VM is in charge of dereferencing an oop, it can use a somewhat mangled version of the pointer: for example, on 64-bit systems it can store 32-bit values and dereference them using the heap base and the known object alignment, thus addressing up to 32G of heap with 8-byte alignment, or 64G with 16-byte alignment (a sketch of the decoding arithmetic appears after this Q&A list).
  • Q: What are YOU personally working on in HotSpot?
    A: I work in the runtime team; we deal with OS support, synchronization, JNI, and everything else that isn't covered by the JIT compiler team and the garbage collector team. My job includes bug fixing and porting to new platforms, and the big project I'm working on now is so-called compressed oops: using 32-bit values to address objects in the Java heap, thus decreasing memory traffic and footprint on 64-bit systems.
  • Q: Why are chunked heaps not implemented in HotSpot?
    A: That's a very long story, and my personal opinion (not Sun's) is that the feature is hard enough to implement (for example, the barrier code expects a contiguous heap) and that it becomes less and less relevant with the migration to 64-bit architectures. Actually, the G1 collector, which is in the productization phase now, does use a heap that is logically split into several pieces (regions).
  • Q: Does HotSpot generate SIMD instructions (bug 6536652)?
    A: Currently there is some basic infrastructure for SIMD support in place, and more vectorization is planned (an example of the kind of loop such vectorization targets appears after this Q&A list).
  • Q: Is it possible to see the native code generated by the JIT compiler?
    A: Yes, there are two ways to do that: using the Serviceability Agent (available now), or using the disassembler DLL (which will be available eventually). For the Serviceability Agent:
    1. read the documentation in agent/doc
    2. build the Serviceability Agent (cd agent/make && make all)
    3. use the command line script clhsdbproc.sh or the UI version hsdbproc.sh
    4. in the "Class Browser" you can check whether a method has compiled code and view its disassembly
  • Q: Hotspot doesn't compile on platform XXX with compiler YYY and libc ZZZ?
    A: I intended this Q&A session only for technical questions on VM internals; for bug reports and discussions, please use the bug tracking system or the mailing lists.
  • Q: How much additional optimization would be possible for JIT compilers if they had most or all of the high-level code structure? Namely, what if we compiled the source code not into bytecode, but into some kind of internal representation?
    A: The idea of a high-level intermediate representation as a target for compilation (or other code generation) is pretty old (ask your favorite search engine starting with G about "portable intermediate representation"). I think systems like that have been built since the 80s. For Java this idea was also considered, see for example this paper, but I don't see much benefit from it, other than perhaps a more compact representation. For a heavily optimizing compiler it is more important to be able to easily extract what a particular Java program does, and in this sense Java bytecode is acceptable (a tree representation can easily be constructed from it, if needed).
  • Q: The JLS in section "17.4.4 Synchronization Order" states that: "The write of the default value (zero, false or null) to each variable synchronizes-with the first action in every thread." This implies that a thread is guaranteed to never read the old state of an object that has been collected. I can imagine that a GC, after collection, zeroes out the memory and stops every thread to force a synchronization. But what are the details? In particular, how can the GC force a sync action on a thread to ensure that it really sees the zeroes when it accesses that part of the memory? How can the GC thread force another thread to perform an acquire after its own release?
    A: Generally, Doug Lea's JSR-133 Cookbook provides very good documentation on the Java memory model. Regarding your question: the GC usually (unless a concurrent collector is used) forces threads to a safepoint, and only then updates memory. A concurrent collector uses atomic operations on the memory locations it modifies while running concurrently, or also relies on safepointing. The GC moves objects only during a safepoint, and HotSpot uses a cooperative suspension model: the GC thread forces a Java thread to come to a safepoint by read-protecting a "polling page", so the safepoint check is just a single memory load instruction. Java threads check the polling page at "safe places", i.e. when all object references are in locations described by oop maps, so the GC knows where they are and can update them with the new object pointers. After a Java thread resumes, it sees all reference values updated, since memory barrier instructions are issued in between, so no stale values can remain in caches. This is a rather complex topic, so I'd suggest you look at safepoint.cpp in the HotSpot source code for a better understanding of the synchronization protocol used by the VM (a toy sketch of the polling-page idea appears after this Q&A list).
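
Regarding the compressed-oops answer above, here is a minimal sketch of the decode/encode arithmetic. The names (heap_base, alignment_shift, decode, encode) are illustrative assumptions, not HotSpot's actual code; the point is only that a 32-bit value plus a known heap base and alignment shift can address 32G of heap with 8-byte alignment, or 64G with 16-byte alignment.

```cpp
#include <cstdint>

// Illustrative sketch of compressed-pointer decoding (names are made up,
// not HotSpot's implementation). A 32-bit "narrow" value is turned back
// into a full 64-bit address using a known heap base and the object
// alignment: with 8-byte alignment the shift is 3, so 2^32 * 8 = 32 GB of
// heap is addressable; with 16-byte alignment (shift 4) it is 64 GB.
struct Object;  // opaque Java object

static uint8_t* heap_base = nullptr;        // set when the heap is reserved
static const unsigned alignment_shift = 3;  // log2 of 8-byte alignment

inline Object* decode(uint32_t narrow) {
    // Scale the 32-bit value by the object alignment and add the heap base.
    return reinterpret_cast<Object*>(
        heap_base + (static_cast<uint64_t>(narrow) << alignment_shift));
}

inline uint32_t encode(Object* obj) {
    // Inverse operation: subtract the base and drop the alignment bits.
    uint64_t offset = reinterpret_cast<uint8_t*>(obj) - heap_base;
    return static_cast<uint32_t>(offset >> alignment_shift);
}
```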
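For the SIMD question: the kind of code such vectorization targets is a simple counted loop doing independent element-wise work over arrays. A sketch of the pattern (written in C++ here purely for illustration; the equivalent Java loop is what a vectorizing JIT would look for):

```cpp
// The shape of loop a vectorizing compiler can turn into packed SIMD
// instructions: no cross-iteration dependencies, unit-stride array access.
// On x86, for example, several of these int additions can be done at once
// with a single packed-add (e.g. SSE2 paddd).
void add_arrays(const int* a, const int* b, int* c, int n) {
    for (int i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```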
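For the safepoint answer, here is a toy sketch of the cooperative polling-page idea, assuming POSIX mmap/mprotect and a SIGSEGV handler. All names are made up for illustration; the real protocol (thread state bookkeeping, oop maps, memory barriers) lives in safepoint.cpp.

```cpp
#include <atomic>
#include <csignal>
#include <sys/mman.h>
#include <unistd.h>

// Toy sketch only -- not HotSpot's implementation. The VM reserves one
// readable page; compiled code performs a single load from it at "safe
// places". To stop Java threads the VM read-protects the page, so the next
// poll faults; the handler parks the thread until the safepoint is over.
static void* polling_page = nullptr;
static std::atomic<bool> safepoint_armed{false};

static void safepoint_handler(int, siginfo_t* info, void*) {
    if (info->si_addr == polling_page) {
        // A real VM would record the thread's state here (oop maps tell the
        // GC where every reference lives) and block on a lock; we just spin
        // until the VM thread disarms the safepoint.
        while (safepoint_armed.load(std::memory_order_acquire)) { /* wait */ }
        // Returning retries the faulting load, which now succeeds.
    }
}

void init_polling_page() {
    polling_page = mmap(nullptr, getpagesize(), PROT_READ,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct sigaction sa = {};
    sa.sa_sigaction = safepoint_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, nullptr);
}

// Emitted at safe places in compiled code: just one memory load.
inline void safepoint_poll() {
    volatile int dummy = *static_cast<volatile int*>(polling_page);
    (void)dummy;
}

// VM thread: begin a safepoint -- every subsequent poll will fault.
void arm_safepoint() {
    safepoint_armed.store(true, std::memory_order_release);
    mprotect(polling_page, getpagesize(), PROT_NONE);
}

// VM thread: safepoint operation done -- let Java threads continue.
void disarm_safepoint() {
    mprotect(polling_page, getpagesize(), PROT_READ);
    safepoint_armed.store(false, std::memory_order_release);
}
```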
Comments:

Where can I read more about optimizations performed by Hotspot at runtime?

Posted by Daniil on July 08, 2007 at 09:32 AM MSD #

Is there a whitepaper on JDK 6 performance in the works? Can you give any details on what Bug 6536652 (SIMD) actually does? I'm curious which SIMD instructions are generated and what kind of Java code will generate them. I'm really interested in how HotSpot can optimise complex algorithmic code (such as in multimedia and signal processing), in the hope that one day it makes the use of native performance libraries unnecessary (such as MediaLib with Java Advanced Imaging). I also wonder if there is any hope of scalarization of objects. For example, for a calculateLocation() method that returns new Point(x, y), I'd love to see the Point object optimised away entirely when the method is inlined, so x and y stay in registers. Lastly, what are YOU personally working on in HotSpot?

Posted by Ben Loud on July 09, 2007 at 02:22 AM MSD #

What is an OOP? Please, DO NOT use any word beginning with O or P in your reply. The current definition "Ordinary Object Pointer" is far too general to throw any light on the subject. As a follow-on question - In the HotSpot [tm] VM, everything could be called an OOP, so how could that be useful?

Posted by guest on July 09, 2007 at 02:52 AM MSD #

Is it possible to see native code generated by runtime compiler?

Posted by Peter on July 09, 2007 at 03:40 AM MSD #

HotSpot is basically about compiling some parts of the bytecode into native instructions. However, here outside of Sun we have no way to check the instructions generated by HotSpot. They are printed via 'disassembler.dll', which is still not downloadable due to license incompatibilities in the past. Since the HotSpot sources are released under the GPL, there should be no more hurdles. However, there is no observable progress in this area. For now we can only talk, talk and talk about the advantages of HotSpot. What is the status of 'disassembler.dll' and where can we watch for it?

Posted by martin on July 09, 2007 at 05:48 AM MSD #

192.18.43.225:

Yes, it should be possible to see the generated native code. You just need:
1. a DEBUG version of the runtime (http://download.java.net/jdk6/binaries/)
2. to turn on printing (-XX:+PrintAssembly)

Now you should see the output, but you will not, due to the missing 'disassembler.dll' (see my previous post).

Posted by martin on July 09, 2007 at 05:56 AM MSD #

Martin, thank you for your reply. I didn't know about the DEBUG version of the runtime. I hope disassembler.dll will be available some day.

Posted by Peter on July 09, 2007 at 10:40 AM MSD #

I love the concept of dynamic FAQ :)
here is my question:
Why are chunked heaps not implemented in HotSpot? Rémi

Posted by Rémi Forax on July 09, 2007 at 10:59 AM MSD #

Hello! When I try to build HotSpot statically (static is specified in the gcc and g++ parameters), I get the following error: ----------------- Linking vm... /usr/bin/ld:libjvm.so.lds:12: parse error collect2: ld returned 1 exit status Linking launcher... /usr/bin/ld: cannot find -ljvm collect2: ld returned 1 exit status gmake[4]: *** [gamma] Error 1 ----------------- Thanks

Posted by Timofey on July 09, 2007 at 12:16 PM MSD #

Hi, thanks for opening your blog up to questions, I have one--is it possible to track, observe or log which methods are getting inlined? There is a discussion over on the JVM Languages Google group where this came up and we didn't know of a way to track this. Thanks, Patrick

Posted by Patrick Wright on July 09, 2007 at 03:12 PM MSD #

Hi Patrick.
Yes, it is possible. -XX:+PrintInlining does the magic (AFAIK). However, you will need a DEBUG version of the JDK - see my posts above on how to get it. Here is the most detailed list of JDK 6 options I have seen. Note that the most interesting options are available only in the DEBUG build.

Posted by martin on July 10, 2007 at 04:39 AM MSD #

Hi! Is it possible to build HotSpot for a system with glibc 2.1.3? I tried to build it statically on a system with glibc 2.3.2 and ran into some trouble... Thanks!!!

Posted by Timofey on July 10, 2007 at 07:47 AM MSD #

The JLS in section "17.4.4 Synchronization Order" states that:
"The write of the default value (zero, false or null) to each variable synchronizes-with the first action in every thread."
This implies that a thread is guaranteed to never read the old state of an object that has been collected. I can imagine that a GC, after collection, zeroes out the memory and stops every thread to force a synchronization. But what are the details? In particular, how can the GC force a sync action on a thread to ensure that it really sees the zeroes when it accesses that part of the memory?

Posted by Michel Onoff on July 18, 2007 at 04:55 AM MSD #

Hello Nikolay. I'm interested in your personal professional opinion on how much additional optimization would be possible for JIT compilers if they had most/all of the high-level code structure. Namely, if we compile the source code not into bytecode, but into some kind of internal representation (a tree of nodes), which is proven to be valid (i.e. names are resolved, types and operations are verified), and in which high-level constructs are preserved (statements, and even design patterns, are annotated in the code somehow). Actually, you can read http://www.symade.org/SOP_and_SymADE.doc (it's in Russian) about the general idea. So, the question is - how much additional optimization is possible using the compilation method described above?

Posted by Maxim Kizub on July 18, 2007 at 02:41 PM MSD #

With regard to your kind answer to question about JLS in section "17.4.4 Synchronization Order", the point that is unclear to me is this:
How can the gc thread force another thread to perform an acquire after its own release?
It is only after it performs an acquire on the same sync object on which the gc thread performs a release that an ordinary Java thread is guaranteed to read the writes of the gc thread (or later release-writes by other threads).

Posted by Michel Onoff on July 20, 2007 at 05:42 AM MSD #

I compiled HotSpot (the C1 compiler) with Visual Studio 2003. I tested HotSpot by running a thread implemented in Java. Before the test, I thought the real startup path of the thread would be the interpreter interpreting the bytecode of the "run" method and then executing it. However, it turns out that when the "FindClass" function is called, it first calls "GenerateOopMap::interp1" and then verifies the bytecode. When the thread starts, I find nothing about interpreting the bytecode. Does that mean that when the class is loaded, the methods of the class are compiled into native code first?

Posted by CC on April 27, 2011 at 10:45 AM MSD #
