In previous issues of Java Magazine, we introduced the basic theoretical concepts of just-in-time (JIT) compilation as well as the Java Microbenchmark Harness (JMH) and the JITWatch open source tool for visualizing and understanding the basic mechanisms provided in the Java HotSpot VM. In this article, we dive into escape analysis (EA), which is one of the more interesting forms of optimization that takes place in the JVM. EA is an automatic analysis of the scope of variables performed by the JVM to enable certain kinds of special optimizations, which we’ll also examine. To follow along, you need only basic familiarity with how the HotSpot JVM works.
To understand the basic idea behind EA, let’s look at the following buggy C code—which is impossible to write in Java, of course:
int * get_the_int() {
    int i = 42;
    return &i;  // BUG: returns the address of a stack-allocated local
}
This C code creates an int on the stack and then returns a pointer to it as the return value of the function. This is incorrect, because the stack frame where the int was stored is destroyed as get_the_int() returns, so you have no way of knowing what is in the memory location if it is accessed at some later time.
Completely eliminating the possibility of these types of bugs was a major safety goal in the design of the Java platform. By design, the JVM does not have a low-level “read memory at location indexed by value” capability. All heap access is done by field name (or array index) relative to a base object. The relevant JVM bytecodes corresponding to these operations include getfield and putfield.
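As a concrete illustration, consider the hypothetical Point class below (invented here, not part of the article's example). Compiling it and disassembling the result with javap -c shows that the getter's field read compiles to a getfield bytecode and the setter's write to a putfield; there is no bytecode for reading an arbitrary memory address.

```java
// Hypothetical example: inspecting this class with `javap -c Point`
// shows getfield for the field read and putfield for the field write.
public class Point {
    private int x;

    public int getX() {
        return x;        // compiles to: aload_0; getfield; ireturn
    }

    public void setX(int x) {
        this.x = x;      // compiles to: aload_0; iload_1; putfield; return
    }

    public static void main(String[] args) {
        Point p = new Point();
        p.setX(42);
        System.out.println(p.getX());
    }
}
```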
Now consider the following bit of Java code:
public class Rect {
    private int w;
    private int h;

    public Rect(int w, int h) {
        this.w = w;
        this.h = h;
    }

    public int area() {
        return w * h;
    }

    public boolean sameArea(Rect other) {
        return this.area() == other.area();
    }

    public static void main(final String[] args) {
        java.util.Random rand = new java.util.Random();
        int sameArea = 0;
        for (int i = 0; i < 100_000_000; i++) {
            Rect r1 = new Rect(rand.nextInt(5), rand.nextInt(5));
            Rect r2 = new Rect(rand.nextInt(5), rand.nextInt(5));
            if (r1.sameArea(r2)) { sameArea++; }
        }
        System.out.println("Same area: " + sameArea);
    }
}
This code creates 100 million pairs of rectangles of random size and counts how many pairs are of equal size. During each iteration of the for loop, a new pair of Rect objects is allocated. You would therefore expect 200 million Rect objects to be allocated in the main method: 100 million each of r1 and r2.
However, if an object is created in one method and used exclusively inside that method—that is, if it is not passed to another method or used as the return value—the runtime can potentially do something smarter. You can say that the object does not escape and the analysis that the runtime (really, the JIT compiler) does is called escape analysis.
If the object does not escape, then the JVM could, for example, do something similar to an “automatic stack allocation” of the object. In this case, the object would not be allocated on the heap and it would never need to be managed by the garbage collector. As soon as the method containing the stack-allocated object returned, the memory that the object used would immediately be freed.
In practice, the HotSpot VM’s C2 JIT compiler does something more sophisticated than stack allocation. Let’s have a look.
Within the HotSpot VM source code, you can see how the EA analysis system classifies the usage of each object:
typedef enum {
    NoEscape     = 1, // An object does not escape method or thread and it is
                      // not passed to call. It could be replaced with scalar.
    ArgEscape    = 2, // An object does not escape method or thread but it is
                      // passed as argument to call or referenced by argument
                      // and it does not escape during call.
    GlobalEscape = 3  // An object escapes the method or thread.
} EscapeState;
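The three states can be sketched in Java with hypothetical methods (the method names and helper below are invented for illustration; the classification shown in the comments is what EA would decide per allocation site, before any inlining is taken into account):

```java
public class EscapeStates {
    static class Rect {
        int w, h;
        Rect(int w, int h) { this.w = w; this.h = h; }
        int area() { return w * h; }
    }

    static int noEscape() {
        Rect r = new Rect(2, 3);  // NoEscape: r never leaves this method and
        return r.w * r.h;         // is never passed to a call, so its fields
    }                             // can be scalar-replaced

    static boolean argEscape() {
        Rect r = new Rect(2, 3);  // ArgEscape: r is passed as an argument
        return hasArea(r, 6);     // but does not escape during the call
    }

    static boolean hasArea(Rect a, int expected) {
        return a.area() == expected;
    }

    static Rect globalEscape() {
        return new Rect(2, 3);    // GlobalEscape: the object is returned, so
    }                             // it outlives the method and must stay on
                                  // the heap

    public static void main(String[] args) {
        System.out.println(noEscape());            // 6
        System.out.println(argEscape());           // true
        System.out.println(globalEscape().area()); // 6
    }
}
```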
The first option suggests that the object can be replaced by a scalar substitute. This elimination is called scalar replacement. This means that the object is broken up into its component fields, which are turned into the equivalent of extra local variables in the method that allocates the object. Once this has been done, another HotSpot VM JIT technique can kick in, which enables these object fields (and the actual local variables) to be stored in CPU registers (or on the stack if necessary).
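To make scalar replacement concrete, here is the transformation applied by hand at the source level (the helper names are invented, and C2 actually performs this on its internal representation after inlining, not on Java source):

```java
public class ScalarReplacementSketch {
    static class Rect {
        int w, h;
        Rect(int w, int h) { this.w = w; this.h = h; }
        int area() { return w * h; }
    }

    // Before: a NoEscape allocation (assuming area() has been inlined).
    static int areaBefore(int w, int h) {
        Rect r = new Rect(w, h);
        return r.area();
    }

    // After: the object is eliminated; its fields become locals, which the
    // register allocator can then keep entirely in CPU registers.
    static int areaAfter(int w, int h) {
        int rw = w;   // field Rect.w, now a scalar
        int rh = h;   // field Rect.h, now a scalar
        return rw * rh;
    }

    public static void main(String[] args) {
        System.out.println(areaBefore(4, 5) == areaAfter(4, 5)); // true
    }
}
```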
One of the major challenges of the Java platform is the sophistication of the execution model. In this case, just by looking at the Java source code, you might naively conclude that the object r1 does not escape the main method but that r2 is passed as an argument to the sameArea method on r1 and so escapes the scope of the main method.
Using the previous classifications, it would appear at first sight that r1 should be treated as NoEscape and r2 as ArgEscape; however, this would be a dangerous conclusion for several reasons.
First of all, recall that method calls in Java are replaced by the Java compiler with invoke bytecodes. These operate by setting up the stack with the destination of the call (known as the receiver object) and with any arguments before the call of the appropriate method is looked up and dispatched (that is, executed).
This means that the receiver object is also passed to the method being called (it becomes the this object in the method that is called). So receiver objects also escape the current scope; in this case, that would mean that both r1 and r2 would be classified as ArgEscape if EA were applied to the code as it appears in the Java source code.
If this were the whole story, it would seem that the feature of allocation elimination is extremely limited. Fortunately, the Java HotSpot VM can do better than this. Let’s look at the detail of the bytecode and see what can be observed.
The method sameArea() is both small (17 bytes of bytecode) and frequently called in the example, thereby making it an ideal candidate to be inlined:
public boolean sameArea(Rect);
Code:
0: aload_0
1: invokevirtual #4 // Method area:()I
4: aload_1
5: invokevirtual #4 // Method area:()I
8: if_icmpne 15
11: iconst_1
12: goto 16
15: iconst_0
16: ireturn
The method makes two further calls to another (easily inlineable) method, area():
public int area();
Code:
0: aload_0
1: getfield #2 // Field w:I
4: aload_0
5: getfield #3 // Field h:I
8: imul
9: ireturn
Using JITWatch or PrintCompilation, you can see that the calls to area() are indeed inlined into their caller, sameArea(), and that sameArea() is in turn inlined into its call site in the loop body of main(). JITWatch provides a useful graphical representation of which methods will be inlined (illustrated in Figure 1).
Figure 1. JITWatch's graphical representation of which methods will be inlined
Remember that the order in which the Java HotSpot VM applies its JIT compiler optimizations is important. Method inlining is one of the first optimizations and is known as a gateway optimization, because it opens the door to other techniques by first bringing related code closer together.
Now that the call to sameArea() and the calls to area() have been inlined, those method scopes no longer exist, and the variables are present only in the scope of main(). This means that EA will no longer treat either r1 or r2 as ArgEscape: both are now classified as NoEscape after the methods have been fully inlined.
This might seem like a counterintuitive result, but you need to bear in mind that the original source code is not what the JIT compiler will use as a starting point. Without this knowledge, it’s easy to draw the wrong conclusion about what is eligible for EA.
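To see what the optimizer is actually working with, here is a hand-written, source-level sketch of what the loop body effectively becomes once sameArea() and area() are inlined and both allocations are scalar-replaced (illustrative only: the JIT compiler operates on its intermediate representation, and the method names here are invented):

```java
import java.util.Random;

public class InlinedLoopSketch {
    static class Rect {
        int w, h;
        Rect(int w, int h) { this.w = w; this.h = h; }
        int area() { return w * h; }
        boolean sameArea(Rect other) { return area() == other.area(); }
    }

    // The loop as written: two Rect allocations per iteration.
    static int countWithObjects(long seed, int iterations) {
        Random rand = new Random(seed);
        int sameArea = 0;
        for (int i = 0; i < iterations; i++) {
            Rect r1 = new Rect(rand.nextInt(5), rand.nextInt(5));
            Rect r2 = new Rect(rand.nextInt(5), rand.nextInt(5));
            if (r1.sameArea(r2)) { sameArea++; }
        }
        return sameArea;
    }

    // What remains after inlining plus scalar replacement: no calls and no
    // allocations, just int arithmetic on values held in registers.
    static int countScalarized(long seed, int iterations) {
        Random rand = new Random(seed);
        int sameArea = 0;
        for (int i = 0; i < iterations; i++) {
            int r1w = rand.nextInt(5), r1h = rand.nextInt(5); // fields of r1
            int r2w = rand.nextInt(5), r2h = rand.nextInt(5); // fields of r2
            if (r1w * r1h == r2w * r2h) { sameArea++; }
        }
        return sameArea;
    }

    public static void main(String[] args) {
        // Both versions draw the same random sequence, so the counts match.
        System.out.println(
            countWithObjects(42L, 10_000) == countScalarized(42L, 10_000));
    }
}
```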
In the previous example, both of these object allocations can avoid using the heap and instead their fields will be treated as individual values. The register allocator will normally place the broken-up object fields directly into registers, but if not enough free registers are available, the remaining fields will be placed on the stack. This situation is known as a stack spill.
To illustrate the power of eliminating heap allocations inside tight loops of code, run this program with and without EA enabled and inspect the activity of the garbage collector.
Because EA is enabled by default in modern JVMs, you need to switch it off for the comparison run by using the JVM flag -XX:-DoEscapeAnalysis.
Here is the garbage collection log with EA enabled (with some extraneous detail removed):
java -XX:+PrintGCDetails Rect
Same area: 18073993
Heap
PSYoungGen total 95744K, used 13462K
eden space 82432K, 16% used
from space 13312K, 0% used
to space 13312K, 0% used
ParOldGen total 218624K, used 0K
object space 218624K, 0% used
Metaspace used 2664K, capacity 4490K, committed 4864K, reserved 1056768K
class space used 286K, capacity 386K, committed 512K, reserved 1048576K
The log shows that there were no GC events at all; instead, the log contains just the heap summary as the process exits. If you look at the GC log from a run without escape analysis enabled, things look quite different:
java -XX:+PrintGCDetails -XX:-DoEscapeAnalysis Rect
[GC (Allocation Failure) [PSYoungGen: 82432K->480K(95744K)] 82432K->488K(314368K),
0.0008348 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
[GC (Allocation Failure) [PSYoungGen: 82912K->464K(95744K)] 82920K->480K(314368K),
0.0007404 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
[Many minor GC collections]
[GC (Allocation Failure) [PSYoungGen: 56352K->0K(55808K)] 56720K->368K(274432K),
0.0004405 secs] [Times: user=0.00 sys=0.01, real=0.00 secs]
[GC (Allocation Failure) [PSYoungGen: 55296K->0K(54784K)] 55664K->368K(273408K),
0.0004537 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
Same area: 18080278
Heap
PSYoungGen total 54784K, used 46674K
eden space 54272K, 86% used
from space 512K, 0% used
to space 512K, 0% used
ParOldGen total 218624K, used 368K
object space 218624K, 0% used
Metaspace used 2665K, capacity 4490K, committed 4864K, reserved 1056768K
class space used 286K, capacity 386K, committed 512K, reserved 1048576K
In this case, you can clearly see the GC events that are caused by allocation failure as the Eden area of memory fills up and needs to be collected.
The addition of EA to the Java HotSpot VM was a useful improvement. When EA was in development, real-world tests showed an additional 3% to 6% performance increase directly attributable to it.
However, for the developer who is also interested in the how and why of platform features, EA provides an interesting insight: it is a feature that depends upon another optimization (automatic inlining) and is essentially useless without it.
The low-level details and the source code of the JVM’s implementation can be found in opto/escape.hpp in the Java HotSpot VM source code. The implementation is a modified form of the algorithm presented in “Escape Analysis for Java” by Jong-Deok Choi, Manish Gupta, Mauricio Serrano, Vugranam C. Sreedhar, and Sam Midkiff, in the Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), November 1999.
Chris Newland (@chriswhocodes) is a Java Champion. He invented and still leads developers on the JITWatch project, an open source log analyzer for visualizing and inspecting just-in-time compilation decisions made by the HotSpot JVM.
Ben Evans (@kittylyst) is a Java Champion and Senior Principal Software Engineer at Red Hat. He has written five books on programming, including Optimizing Java (O'Reilly) and The Well-Grounded Java Developer (Manning). Previously he was Lead Architect for Instrumentation at New Relic, a founder of jClarity (acquired by Microsoft) and a member of the Java SE/EE Executive Committee.