Escape Analysis in the HotSpot JIT Compiler

June 14, 2021 | 9 minute read

Complex analysis of variables’ scope enables a variety of subtle optimizations.

In previous issues of Java Magazine, we introduced the basic theoretical concepts of just-in-time (JIT) compilation as well as the Java Microbenchmark Harness (JMH) and the JITWatch open source tool for visualizing and understanding the basic mechanisms provided in the Java HotSpot VM. In this article, we dive into escape analysis (EA), which is one of the more interesting forms of optimization that takes place in the JVM. EA is an automatic analysis of the scope of variables performed by the JVM to enable certain kinds of special optimizations, which we’ll also examine. To follow along, you need only basic familiarity with how the HotSpot JVM works.

To understand the basic idea behind EA, let’s look at the following buggy C code—which is impossible to write in Java, of course:

int * get_the_int() {
   int i = 42;
   return &i; 
}

This C code creates an int on the stack and then returns a pointer to it as the return value of the function. This is incorrect, because the stack frame where the int was stored is destroyed as get_the_int() returns, so you have no way of knowing what is in the memory location if it is accessed at some later time.

Completely eliminating the possibility of these types of bugs was a major safety goal in the design of the Java platform. By design, the JVM does not have a low-level “read memory at location indexed by value” capability. All heap access is done by field name (or array index) relative to a base object. The relevant JVM bytecodes corresponding to these operations include getfield and putfield.

Now consider the following bit of Java code:

public class Rect { 
   private int w; 
   private int h;
   public Rect(int w, int h) {
      this.w = w; 
      this.h = h; 
   }
   public int area() { 
      return w * h; 
   }
   public boolean sameArea(Rect other) {
      return this.area() == other.area(); 
   }
   public static void main(final String[] args) { 
      java.util.Random rand = new java.util.Random();
      int sameArea = 0;
      for (int i = 0; i < 100_000_000; i++) {
         Rect r1 = new Rect(rand.nextInt(5), rand.nextInt(5)); 
         Rect r2 = new Rect(rand.nextInt(5), rand.nextInt(5));
         if (r1.sameArea(r2)) { sameArea++; }
      }
      System.out.println("Same area: " + sameArea);
   }
}

This code creates 100 million pairs of rectangles of random size and counts how many pairs have equal area. During each iteration of the for loop, a new pair of Rect objects is allocated. You would therefore expect 200 million Rect objects to be allocated in the main method: 100 million each of r1 and r2.

However, if an object is created in one method and used exclusively inside that method (that is, if it is not passed to another method, stored somewhere that outlives the method, or used as the return value), the runtime can potentially do something smarter. In that case, the object is said not to escape, and the analysis that the runtime (really, the JIT compiler) performs to establish this is called escape analysis.
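As a rough illustration of the distinction, the sketch below contrasts an object that stays inside its creating method with two that leave it. The class EscapeKinds, its Point type, and the method names are hypothetical and do not come from the example in this article; the comments describe how the source reads, not what a particular JIT compiler will decide.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names appear in the article's example.
class EscapeKinds {
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Long-lived collection: anything stored here stays reachable after the method returns.
    static final List<Point> REGISTRY = new ArrayList<>();

    // The Point is created and consumed locally. Apart from its own trivial
    // constructor, it is never handed to other code, so it does not escape.
    static int sumLocal(int x, int y) {
        Point p = new Point(x, y);
        return p.x + p.y;
    }

    // The Point is the return value, so it escapes to the caller.
    static Point create(int x, int y) {
        return new Point(x, y);
    }

    // The Point is stored in a static collection, so it escapes the method entirely.
    static void register(int x, int y) {
        REGISTRY.add(new Point(x, y));
    }

    public static void main(String[] args) {
        System.out.println(sumLocal(3, 4));
        System.out.println(create(1, 2).x);
        register(5, 6);
        System.out.println(REGISTRY.size());
    }
}
```

Only the allocation in sumLocal() is a plausible candidate for the optimizations discussed in the rest of this article; the other two objects must remain reachable after their creating method returns.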

If the object does not escape, then the JVM could, for example, do something similar to an “automatic stack allocation” of the object. In this case, the object would not be allocated on the heap and it would never need to be managed by the garbage collector. As soon as the method containing the stack-allocated object returned, the memory that the object used would immediately be freed.

In practice, the HotSpot VM’s C2 JIT compiler does something more sophisticated than stack allocation. Let’s have a look.

Within the HotSpot VM source code, you can see how the EA system classifies the usage of each object:

typedef enum {
   NoEscape = 1,    // An object does not escape method or thread and it is 
                    // not passed to call. It could be replaced with scalar.
   ArgEscape = 2,   // An object does not escape method or thread but it is 
                    // passed as argument to call or referenced by argument 
                    // and it does not escape during call.
   GlobalEscape = 3 // An object escapes the method or thread.
} EscapeState;

The comment on the first option says that a NoEscape object can be replaced with scalars. This form of allocation elimination is called scalar replacement: the object is broken up into its component fields, which are turned into the equivalent of extra local variables in the method that allocates the object. Once this has been done, another HotSpot VM JIT technique can kick in, which enables these object fields (and the actual local variables) to be stored in CPU registers (or on the stack if necessary).
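The effect of scalar replacement can be approximated by hand in source code. The sketch below is only a conceptual picture, since the JIT compiler works on its internal representation rather than on Java source; the class ScalarReplacementSketch, the Vec type, and both method names are hypothetical.

```java
// Hypothetical illustration of what scalar replacement achieves conceptually.
class ScalarReplacementSketch {
    static final class Vec {
        final int dx;
        final int dy;
        Vec(int dx, int dy) { this.dx = dx; this.dy = dy; }
    }

    // As written in source: a temporary Vec is allocated and never leaves the method.
    static int squaredLength(int x1, int y1, int x2, int y2) {
        Vec v = new Vec(x2 - x1, y2 - y1);
        return v.dx * v.dx + v.dy * v.dy;
    }

    // Hand-written equivalent: the fields of the Vec become plain local variables,
    // so there is nothing left to allocate on the heap.
    static int squaredLengthScalars(int x1, int y1, int x2, int y2) {
        int dx = x2 - x1;
        int dy = y2 - y1;
        return dx * dx + dy * dy;
    }

    public static void main(String[] args) {
        System.out.println(squaredLength(0, 0, 3, 4));
        System.out.println(squaredLengthScalars(0, 0, 3, 4));
    }
}
```

Both methods compute the same value; the second simply makes explicit the local variables that scalar replacement would introduce in place of the object's fields.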

One of the major challenges of the Java platform is the sophistication of the execution model. In this case, just by looking at the Java source code, you might naively conclude that the object r1 does not escape the main method but that r2 is passed as an argument to the sameArea method on r1 and so it escapes the scope of the main method.

Using the previous classifications, it would appear at first sight that r1 should be treated as a NoEscape and r2 should be treated as an ArgEscape; however, this would be a dangerous conclusion for several reasons.

First of all, recall that method calls in Java are replaced by the Java compiler with invoke bytecodes. These operate by setting up the stack with the destination of the call (known as the receiver object) and with any arguments before the call of the appropriate method is looked up and dispatched (that is, executed).

This means that the receiver object is also passed to the method being called (it becomes the this object in the method that is called). So receiver objects also escape the current scope; in this case, that would mean that both r1 and r2 would be classified as ArgEscape if EA were to be applied to the code as it appears in the Java source code.
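One way to picture the receiver being passed along is to rewrite the instance method as a static method that takes the receiver as an explicit first parameter. This is only an analogy for how the call looks at the bytecode level; the class ReceiverAsArgument and the method sameAreaStatic are hypothetical, and Rect is a reduced copy of the class shown earlier.

```java
// Hypothetical illustration: the receiver behaves like a hidden first argument.
class ReceiverAsArgument {
    static final class Rect {
        private final int w;
        private final int h;
        Rect(int w, int h) { this.w = w; this.h = h; }
        int area() { return w * h; }

        // Instance form: 'this' is supplied implicitly by the caller.
        boolean sameArea(Rect other) { return this.area() == other.area(); }
    }

    // Static form with the receiver spelled out as a parameter named 'self'.
    // Both r1 and r2 are visibly handed to another method here.
    static boolean sameAreaStatic(Rect self, Rect other) {
        return self.area() == other.area();
    }

    public static void main(String[] args) {
        Rect r1 = new Rect(2, 6);
        Rect r2 = new Rect(3, 4);
        System.out.println(r1.sameArea(r2));
        System.out.println(sameAreaStatic(r1, r2));
    }
}
```

Seen this way, there is no difference between r1 and r2 from the point of view of the calling method: each is an argument to a call.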

If this were the whole story, it would seem that the feature of allocation elimination is extremely limited. Fortunately, the Java HotSpot VM can do better than this. Let’s look at the detail of the bytecode and see what can be observed.

The method sameArea() is both small (17 bytes of bytecode) and frequently called in the example, thereby making it an ideal candidate to be inlined:

public boolean sameArea(Rect); 
   Code:
      0: aload_0 
      1: invokevirtual #4    // Method area:()I
      4: aload_1 
      5: invokevirtual #4    // Method area:()I
      8: if_icmpne     15 
     11: iconst_1 
     12: goto          16 
     15: iconst_0 
     16: ireturn

The method makes two further calls to another (easily inlineable) method area():

public int area(); 
   Code:
      0: aload_0 
      1: getfield      #2    // Field w:I
      4: aload_0 
      5: getfield      #3    // Field h:I
      8: imul 
      9: ireturn

Using JITWatch or PrintCompilation, you can see that the calls to area() are indeed inlined into their caller sameArea() and that sameArea() is in turn inlined into its call site in the loop body of main(). JITWatch provides a useful graphical representation of which methods will be inlined (illustrated in Figure 1).

Figure 1.

Remember that the order in which the Java HotSpot VM applies its JIT compiler optimizations is important. Method inlining is one of the first optimizations and is known as a gateway optimization, because it opens the door to other techniques by first bringing related code closer together.

Now that the call to sameArea() and the calls to area() have been inlined, the method scopes no longer exist, and the variables are present only in the scope of main(). This means that EA will no longer treat either r1 or r2 as an ArgEscape: both are now classified as a NoEscape after the methods have been fully inlined.

This might seem like a counterintuitive result, but you need to bear in mind that the original source code is not what the JIT compiler will use as a starting point. Without this knowledge, it’s easy to draw the wrong conclusion about what is eligible for EA.

In the previous example, both of these object allocations can avoid using the heap and instead their fields will be treated as individual values. The register allocator will normally place the broken-up object fields directly into registers, but if not enough free registers are available, the remaining fields will be placed on the stack. This situation is known as a stack spill.
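To make the combined effect of inlining and scalar replacement concrete, the loop can be rewritten by hand in the form the optimizer is conceptually aiming for. This is a sketch at the source level only, not the compiler's actual output; the class InlinedLoopSketch, both method names, and the seed and iterations parameters are hypothetical additions so that the two versions can be compared on identical random input.

```java
import java.util.Random;

// Hypothetical illustration of the loop before and after inlining plus scalar replacement.
class InlinedLoopSketch {
    static final class Rect {
        private final int w;
        private final int h;
        Rect(int w, int h) { this.w = w; this.h = h; }
        int area() { return w * h; }
        boolean sameArea(Rect other) { return this.area() == other.area(); }
    }

    // The loop as written in the article, with a fixed seed for repeatability.
    static int countWithObjects(long seed, int iterations) {
        Random rand = new Random(seed);
        int same = 0;
        for (int i = 0; i < iterations; i++) {
            Rect r1 = new Rect(rand.nextInt(5), rand.nextInt(5));
            Rect r2 = new Rect(rand.nextInt(5), rand.nextInt(5));
            if (r1.sameArea(r2)) { same++; }
        }
        return same;
    }

    // Hand-inlined, hand-scalar-replaced version: no Rect objects exist at all,
    // only four int locals per iteration, drawn in the same order as above.
    static int countWithScalars(long seed, int iterations) {
        Random rand = new Random(seed);
        int same = 0;
        for (int i = 0; i < iterations; i++) {
            int w1 = rand.nextInt(5);
            int h1 = rand.nextInt(5);
            int w2 = rand.nextInt(5);
            int h2 = rand.nextInt(5);
            if (w1 * h1 == w2 * h2) { same++; }
        }
        return same;
    }

    public static void main(String[] args) {
        System.out.println(countWithObjects(42L, 1_000_000));
        System.out.println(countWithScalars(42L, 1_000_000));
    }
}
```

Because both methods consume the random numbers in the same order, they print the same count for the same seed. The second version shows why the four int values are natural candidates for registers.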

To illustrate the power of eliminating heap allocations inside tight loops of code, run this program with and without EA enabled and inspect the activity of the garbage collector.

Because EA is enabled by default in modern JVMs, you need to switch it off explicitly for the comparison run by using the JVM switch -XX:-DoEscapeAnalysis.

Here is the garbage collection log with EA enabled (with some extraneous detail removed):

java -XX:+PrintGCDetails Rect 
Same area: 18073993 
Heap
 PSYoungGen total 95744K, used 13462K 
  eden space 82432K, 16% used
  from space 13312K, 0% used 
  to   space 13312K, 0% used 
 ParOldGen total 218624K, used 0K
  object space 218624K, 0% used
 Metaspace    used 2664K, capacity 4490K, committed 4864K, reserved 1056768K
  class space used 286K, capacity 386K, committed 512K, reserved 1048576K

The log shows that there were no GC events at all; it contains only the heap summary printed as the process exits. If you look at the GC log from a run with escape analysis disabled, things look quite different:

java -XX:+PrintGCDetails -XX:-DoEscapeAnalysis Rect 
[GC (Allocation Failure) [PSYoungGen: 82432K->480K(95744K)] 82432K->488K(314368K),
0.0008348 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
[GC (Allocation Failure) [PSYoungGen: 82912K->464K(95744K)] 82920K->480K(314368K),
0.0007404 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]

[Many minor GC collections]

[GC (Allocation Failure) [PSYoungGen: 56352K->0K(55808K)] 56720K->368K(274432K),
0.0004405 secs] [Times: user=0.00 sys=0.01, real=0.00 secs]
[GC (Allocation Failure) [PSYoungGen: 55296K->0K(54784K)] 55664K->368K(273408K),
0.0004537 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
Same area: 18080278 
Heap
 PSYoungGen total 54784K, used 46674K 
  eden space 54272K, 86% used 
  from space 512K, 0% used 
  to   space 512K, 0% used 
 ParOldGen total 218624K, used 368K 
  object space 218624K, 0% used
 Metaspace    used 2665K, capacity 4490K, committed 4864K, reserved 1056768K
  Class space used 286K, capacity 386K, committed 512K, reserved 1048576K

In this case, you can clearly see the GC events that are caused by allocation failure as the Eden area of memory fills up and needs to be collected.
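As an alternative to reading GC logs, the number of collections can be sampled from inside the program by using the standard java.lang.management API. The following is a minimal sketch under stated assumptions: the class GcCountProbe and its method names are hypothetical, Rect is a reduced copy of the class shown earlier, and the flag check only looks for the literal switch on the command line rather than querying the JVM's effective setting.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Random;

// Hypothetical probe; not part of the article's original example.
class GcCountProbe {
    static final class Rect {
        private final int w;
        private final int h;
        Rect(int w, int h) { this.w = w; this.h = h; }
        int area() { return w * h; }
        boolean sameArea(Rect other) { return this.area() == other.area(); }
    }

    // Sums the collection counts reported by all collectors in this JVM.
    // getCollectionCount() returns -1 when the count is undefined, so skip those.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) { total += count; }
        }
        return total;
    }

    // True only if the exact switch was passed on the command line.
    static boolean escapeAnalysisDisabledByFlag() {
        return ManagementFactory.getRuntimeMXBean()
                .getInputArguments()
                .contains("-XX:-DoEscapeAnalysis");
    }

    // The allocation-heavy loop from the article, parameterized by iteration count.
    static int runWorkload(int iterations) {
        Random rand = new Random();
        int same = 0;
        for (int i = 0; i < iterations; i++) {
            Rect r1 = new Rect(rand.nextInt(5), rand.nextInt(5));
            Rect r2 = new Rect(rand.nextInt(5), rand.nextInt(5));
            if (r1.sameArea(r2)) { same++; }
        }
        return same;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        int same = runWorkload(100_000_000);
        long after = totalCollections();
        System.out.println("Same area: " + same);
        System.out.println("EA disabled by flag: " + escapeAnalysisDisabledByFlag());
        System.out.println("GC events during loop: " + (after - before));
    }
}
```

Running this once normally and once with -XX:-DoEscapeAnalysis should show the same contrast as the two logs above, although the exact counts depend on the JVM version, the collector, and the heap size.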

Conclusion

The addition of EA to the Java HotSpot VM is a useful improvement. While EA was in development, real-world tests showed a performance increase of 3% to 6% that was directly attributable to it.

However, for the developer who is also interested in the how and why of platform features, EA provides an interesting insight: it is a feature that depends upon another optimization (automatic inlining) and is essentially useless without it.

The low-level details and the source code of the JVM’s implementation can be found in opto/escape.hpp in the Java HotSpot VM source code. It is a modified form of the algorithm presented in the paper “Escape Analysis for Java,” published in the proceedings of the ACM SIGPLAN Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA) conference in November 1999 by Jong-Deok Choi, Manish Gupta, Mauricio Serrano, Vugranam C. Sreedhar, and Sam Midkiff.

Chris Newland

Chris Newland (@chriswhocodes) is a Java Champion. He invented and still leads developers on the JITWatch project, an open source log analyzer for visualizing and inspecting just-in-time compilation decisions made by the HotSpot JVM.

Ben Evans

Ben Evans (@kittylyst) is a Java Champion and Senior Principal Software Engineer at Red Hat. He has written five books on programming, including Optimizing Java (O'Reilly) and The Well-Grounded Java Developer (Manning). Previously he was Lead Architect for Instrumentation at New Relic, a founder of jClarity (acquired by Microsoft) and a member of the Java SE/EE Executive Committee.

