Towards a multi-threaded javac

I just finished a vacation with my family, and in the in-between times, I made significant progress towards a multi-threaded javac.

Before you get too excited, let me qualify that by saying that there is a spectrum of possibilities here, ranging from some low-hanging fruit up to a complete rewrite of the compiler. I'm only talking about the former; there are no plans for the latter.

The difficulty with a multi-threaded javac is that the Java language is quite complicated these days, and as a result the compiler is also quite complicated internally, to cope with all the consequences of interdependent source files. The current compiler is not set up for concurrent operation, and adapting it would be error-prone and destabilizing. (For more details on the compiler's operation, see Compilation Overview on the OpenJDK Compiler Group web pages.)

The low-hanging fruit comes from considering the compilation pipeline in three segments: input, process, and output. The source files can all be read and parsed in parallel, because there are no interdependencies there. (Well, almost none. More on that later.) Likewise, once a class declaration has the contents of its class file prepared internally, it can be written out in the background while the compiler begins work on the next class.

We've known about this low-hanging fruit for a while, and Tom Ball recently submitted a patch with code for parsing source files in parallel. So, faced with a family vacation, I loaded up my laptop with the bits needed to explore this further.

Parallel parsing

If you're parsing files in parallel, the primary conflict is access to the Log, the main class used by the rest of the compiler to generate diagnostics. It is reasonably obvious that even if the source files are being parsed in parallel, you don't want to see any resulting diagnostics interleaved: you want to see all the diagnostics for each file grouped together. Initially, I was thinking of creating custom code in the parser to group parser diagnostics together, but since the scanner (lexer) can also generate diagnostics, it seemed better and less intrusive to give each thread its own custom Log that could save up diagnostics until they can all be reported together.

Previous work on the Log class had made it somewhat "hourglass shaped", with a bunch of methods used by the rest of the compiler to create and report diagnostics, and a back end that knows how to present the diagnostics that are generated. In between is a single "report" method, originally introduced to make it easy to subtype Log to vary the way that diagnostics are presented. Now, however, that method provided an excellent place to divide Log in two: an abstract BasicLog, which provides the front-end API used by the body of the compiler, and subtypes to handle the diagnostics that are reported. The main compiler uses Log as it always did -- one of the Big Rules for the compiler is to minimize change -- but the threads for the new parser front end can now use a new subtype of BasicLog that buffers up diagnostics and reports them together when the work of the thread is complete.
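To make the shape of this concrete, here is a minimal sketch of the idea. The class and method names are my illustration of the pattern, not the actual javac code: a narrow abstract front end whose single report method is the waist of the hourglass, and a buffering subtype for the parser threads.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: the front-end API used by the body of the compiler
// funnels everything through a single report method.
abstract class BasicLog {
    void error(String key, Object... args)   { report(new Diagnostic("error", key, args)); }
    void warning(String key, Object... args) { report(new Diagnostic("warning", key, args)); }

    // The narrow "waist" of the hourglass: subtypes decide what
    // happens to a diagnostic once it has been created.
    abstract void report(Diagnostic d);
}

// Per-thread log used while parsing: diagnostics are saved up and
// forwarded when the thread's work is complete, so diagnostics for
// different files are never interleaved.
class BufferedLog extends BasicLog {
    private final List<Diagnostic> buffer = new ArrayList<Diagnostic>();
    private final BasicLog target;

    BufferedLog(BasicLog target) { this.target = target; }

    void report(Diagnostic d) { buffer.add(d); }   // save for later

    void flush() {
        for (Diagnostic d : buffer)
            target.report(d);                      // all at once, grouped
        buffer.clear();
    }
}

class Diagnostic {
    final String kind, key;
    final Object[] args;
    Diagnostic(String kind, String key, Object... args) {
        this.kind = kind; this.key = key; this.args = args;
    }
}
```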

This refactoring forced one other cleanup in Log, an ugly hangover from the introduction of JSR 199, the Compiler API. The Diagnostic objects that get created carried a hidden reference to the Log that created them, which, if used incorrectly, could provoke NullPointerExceptions or other problems when you tried to access the source code containing the diagnostic. For those that are interested, this is because of an interaction with the Log.useSource method, which sets temporary state in Log. The bottom line is that, one more refactoring later, the DiagnosticSource interface became a much better DiagnosticSource object, providing a much cleaner standalone abstraction for information about the source code containing the location of the diagnostic.
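As a rough illustration of the difference (again, names and details are mine, not the actual javac code): each diagnostic can carry a small self-contained object describing its source, rather than reaching back through the Log that created it.

```java
import java.io.IOException;
import javax.tools.JavaFileObject;

// Sketch of a standalone source abstraction: everything needed to
// relate a diagnostic's position back to its source file, with no
// back-reference to the Log.
class DiagnosticSource {
    private final JavaFileObject file;
    private CharSequence content;   // read lazily, then cached

    DiagnosticSource(JavaFileObject file) { this.file = file; }

    JavaFileObject getFile() { return file; }

    // Map a 0-based character position to a 1-based line number.
    int getLineNumber(int pos) {
        CharSequence cs = getContent();
        int line = 1;
        for (int i = 0; i < pos && i < cs.length(); i++)
            if (cs.charAt(i) == '\n')
                line++;
        return line;
    }

    private CharSequence getContent() {
        if (content == null) {
            try {
                content = file.getCharContent(true);
            } catch (IOException e) {
                content = "";
            }
        }
        return content;
    }
}
```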

(Log used to be one big do-everything class; it has slowly been getting better over the years, and watch out for the exciting new work that Maurizio is doing to improve the quality and presentation of diagnostics. Luckily, the refactorings I'm describing here will not interfere with that work too much.)

There are some other shared resources used by the Parser, most notably the compiler's name table, but access to these was easily handled by synchronizing a few strategic methods.
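The fix amounts to something like the following sketch (the real name table is more elaborate, but the shape is the same): the table hands out canonical Name objects, so only the lookup method needs to be synchronized.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: the name table maps strings to canonical Name objects, so
// parallel parser threads only need the lookup itself synchronized.
class NameTable {
    private final Map<String, Name> names = new HashMap<String, Name>();

    // One strategic synchronized method makes the shared table safe
    // for concurrent parser threads.
    synchronized Name fromString(String s) {
        Name n = names.get(s);
        if (n == null) {
            n = new Name(s);
            names.put(s, n);
        }
        return n;
    }
}

class Name {
    final String value;
    Name(String value) { this.value = value; }
}
```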

That, then, was sufficient for the first goal — to parse source files in parallel. :-)   Writing the class files concurrently was somewhat more interesting.
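Before moving on, here's roughly what the parallel parsing step might look like, putting the pieces together. This reuses the BufferedLog sketch from above; the parse method stands in for the real parser entry point, and none of these names are the actual javac code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.tools.JavaFileObject;

// Sketch of the parallel parse step: one task per file, each with
// its own BufferedLog, flushed in command-line order at the end so
// diagnostics stay grouped per file.
class ParallelParser {
    List<ParsedTree> parseAll(List<JavaFileObject> files, final BasicLog mainLog)
            throws InterruptedException, ExecutionException {
        ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        List<Future<ParsedTree>> futures = new ArrayList<Future<ParsedTree>>();
        final List<BufferedLog> logs = new ArrayList<BufferedLog>();

        for (final JavaFileObject file : files) {
            final BufferedLog log = new BufferedLog(mainLog);
            logs.add(log);
            futures.add(pool.submit(new Callable<ParsedTree>() {
                public ParsedTree call() {
                    return parse(file, log);   // hypothetical parser entry point
                }
            }));
        }

        List<ParsedTree> trees = new ArrayList<ParsedTree>();
        for (Future<ParsedTree> f : futures)
            trees.add(f.get());                // wait, preserving file order
        for (BufferedLog log : logs)
            log.flush();                       // report diagnostics, file by file
        pool.shutdown();
        return trees;
    }

    // Stand-in for the real per-file parse.
    ParsedTree parse(JavaFileObject file, BasicLog log) { return new ParsedTree(); }
}

class ParsedTree { }
```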

Background class generation and writing

Apart from the general refactoring of Log, the work to parse source files in parallel turned out to be very localized, almost to a single method in JavaCompiler, which is responsible for parsing all the source files on the command line. That one method can choose whether to parse the source files sequentially, as before, or in parallel. There is no such easy method for writing out class files, because the internal representation of a generated class file may be quite large, and the compiler pipeline is normally set up to write out class files as soon as possible and then reclaim the resources used. Because of the memory issues and the primarily serial nature of the upstream pipeline, the general goal was not to write all the class files in parallel, but merely to be able to do the file I/O in the background. Thus the design was to write classes using a single background thread fed by a limited-capacity blocking queue; improving the flexibility of the compiler pipeline would in turn improve the ability to write out class files in the background.

In particular, it was also desirable to fix an outstanding bug so that either all the classes in a source file are generated, or none are. The current behavior, generating classes for the contents of a source file until any errors are detected, does not fit well with simple build systems like make and Ant that use simple date stamps to determine whether the compiled contents of a class file are up to date with respect to the source file itself.
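A minimal sketch of that design goal, with illustrative names: a single writer thread draining a small blocking queue. The small capacity is the point: if the writer falls behind, put() blocks and throttles the upstream pipeline, bounding the memory held by unwritten class files.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of background class writing: the compiler thread enqueues
// generated class files; a single writer thread does the file I/O.
class BackgroundClassWriter implements Runnable {
    // The limited capacity bounds the memory held by unwritten
    // class files: put() blocks if the writer falls behind.
    private final BlockingQueue<GeneratedClass> queue =
        new ArrayBlockingQueue<GeneratedClass>(4);
    private static final GeneratedClass STOP = new GeneratedClass(null, null);

    void enqueue(GeneratedClass c) throws InterruptedException { queue.put(c); }

    // Called by the compiler when all classes have been generated.
    void finish() throws InterruptedException { queue.put(STOP); }

    public void run() {
        try {
            for (;;) {
                GeneratedClass c = queue.take();
                if (c == STOP)
                    break;
                writeClassFile(c);             // the actual file I/O
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void writeClassFile(GeneratedClass c) { /* write c.bytes under c.name */ }
}

class GeneratedClass {
    final String name;
    final byte[] bytes;
    GeneratedClass(String name, byte[] bytes) { this.name = name; this.bytes = bytes; }
}
```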

There were already some ideas for reorganizing the compiler pipeline within the main JavaCompiler class. Previously, a big "compile" method in JavaCompiler had been broken up into methods attribute, flow, desugar and generate, representing the different stages of processing for each class to be compiled. These methods could be composed in various ways depending on the compilation policy, an internal control within the compiler. The methods communicated via lists of work to be processed, and although the concept was good, it never paid off quite as well as anticipated, because of the memory required to build all of the items on a list before handing it to the next stage. The latest idea that had been developing was to use iterators or queues to connect the compilation phases, rather than lists.

Another refactoring later, it turned out that queues were the way to go (as in java.util.Queue), because they fit the abstraction required and caused less change elsewhere in the compiler.
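Here is a sketch of why queues fit so well (the real javac phase methods differ in detail): once each phase drains an input queue into an output queue, the compilation policy reduces to a choice of how the phases are composed.

```java
import java.util.LinkedList;
import java.util.Queue;

// Sketch of phases connected by queues; Env stands in for javac's
// per-class compilation state, and the phase bodies are elided.
class Pipeline {
    // One policy: run every class through a phase before the next
    // phase starts (the old list-like behavior).
    void compileAllAtOnce(Queue<Env> todo) {
        generate(desugar(flow(todo)));
    }

    // Another policy: advance one class at a time through all the
    // remaining phases, keeping fewer large structures alive.
    void compileOneByOne(Queue<Env> todo) {
        for (Env env; (env = todo.poll()) != null; ) {
            Queue<Env> one = new LinkedList<Env>();
            one.offer(env);
            generate(desugar(flow(one)));
        }
    }

    Queue<Env> flow(Queue<Env> in)    { return apply(in); }   // data-flow analysis
    Queue<Env> desugar(Queue<Env> in) { return apply(in); }   // lower language features
    void generate(Queue<Env> in)      { while (in.poll() != null) { /* emit class file */ } }

    // Drain the input queue into an output queue, processing each
    // item for the phase as it passes through.
    private Queue<Env> apply(Queue<Env> in) {
        Queue<Env> out = new LinkedList<Env>();
        for (Env e; (e = in.poll()) != null; )
            out.offer(e);
        return out;
    }
}

class Env { }
```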

In a related improvement, the main "to do" list was also updated. Previously, it was just a simple list of items to be processed, implemented with javac's ListBuffer. It was updated to implement Queue and, more importantly, to provide additional access to the contents grouped according to the original source file. This made it easier to process all the classes for a source file together, including any anonymous inner classes. Previously, anonymous inner classes were handled much later than their enclosing classes, because while top-level and nested classes are discovered and put on the "to do" list very early, anonymous inner classes are not discovered until much later.
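A sketch of the shape of that change, with illustrative names: the "to do" list remains a Queue, but also indexes its contents by source file, so late-arriving anonymous classes join their file's group.

```java
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.Map;
import java.util.Queue;
import javax.tools.JavaFileObject;

// Sketch of a "to do" list that is a Queue (via LinkedList) but
// also groups its contents by originating source file.
class ToDo extends LinkedList<Env> {
    private final Map<JavaFileObject, Queue<Env>> byFile =
        new LinkedHashMap<JavaFileObject, Queue<Env>>();

    @Override
    public boolean add(Env env) {          // offer() also funnels through add()
        Queue<Env> group = byFile.get(env.sourceFile);
        if (group == null) {
            group = new LinkedList<Env>();
            byFile.put(env.sourceFile, group);
        }
        group.offer(env);
        return super.add(env);
    }

    // All the classes known so far for one source file.
    Queue<Env> groupFor(JavaFileObject file) {
        return byFile.get(file);
    }
}

class Env {
    final JavaFileObject sourceFile;
    Env(JavaFileObject sourceFile) { this.sourceFile = sourceFile; }
}
```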

However, an earlier bug fix got in the way of being able to effectively complete processing the contents of a single source file all together.

Normally, the compiler uses a very lazy approach to the overall compilation strategy, advancing the processing of each class as needed, with a "to do" list to make sure that everything that needs to be done eventually really does get done. However, limitations in the pipeline precluded that approach in the desugaring phase. If the supertypes of a class are being compiled in the same compilation as the subtype, they need to be analyzed before the subtype gets desugared, because desugaring is somewhat destructive. The previous implementation could not process the supertypes on demand, so instead the work on the subtypes was deferred by putting them back on the "to do" list to be processed later, after any supertypes had been processed, thus defeating any attempt to process these files together. The new, better implementation is simply to advance the processing of the supertypes on demand.
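In sketch form (illustrative names again), the change is from re-queuing the subtype to advancing the supertypes on the spot:

```java
import java.util.Collections;
import java.util.List;

// Sketch of the new strategy: before desugaring a class, advance
// any supertypes in the same compilation through analysis on
// demand, instead of re-queuing the subtype for later.
class Desugarer {
    void desugar(Env env) {
        for (Env sup : supertypesInThisCompilation(env)) {
            if (!sup.analyzed)
                analyze(sup);        // advance the supertype on demand
        }
        // Safe now: desugaring is destructive, but the supertypes
        // have already been analyzed.
        desugarBody(env);
    }

    // Stand-ins for the real compiler machinery.
    List<Env> supertypesInThisCompilation(Env env) { return Collections.emptyList(); }
    void analyze(Env env)     { env.analyzed = true; }
    void desugarBody(Env env) { /* lower language features */ }
}

class Env {
    boolean analyzed;
}
```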

All this refactoring was somewhat easier to implement than to describe here, and again, per the Big Rules, the work was reasonably localized to the JavaCompiler and ToDo classes, with little or no change to the main body of the compiler. The net result is more flexibility in the compiler pipeline, with a better implementation of the code to generate class files source file by source file, rather than class by class. And, to bring the story back to the original goal, it makes it easier to adapt the final stage of the pipeline so that it can do its work serially, or hand off to a queue for writing class files in the background. :-)

And now ...

So where is this work now? Right now, it's here on my laptop with me in a plane somewhere between Iceland and Greenland, so let's hope for a safe journey the rest of the way back to California. The work needs some cleaning up, and more testing on more varied machines. I've been running all the compiler regression tests and building OpenJDK with this new compiler, and it looks good so far. Finally, it will need to be code-reviewed and pushed into the OpenJDK repositories, probably as a series of smaller changesets rather than one big one. So watch out for this work coming soon to a repository near you ...

Comments:

The tools.jar bundled with JDK 1.6 contains the APIs needed to support JSR 199. However, this jar file is ~12M and comes with a lot of baggage that's not needed for just a Compiler API implementation. The Glassfish team and others like Tomcat/Jetty might benefit if the Compiler API implementation classes were also distributed as a stand-alone jar file, which would be much smaller in size.

Can you please refer me to the right contact for this enhancement?

Posted by Raju Uppalapati on July 06, 2008 at 03:57 PM PDT #

The big question is: how much speed-up would a "multi-threaded" javac get when executed on a multi-core CPU? :)

Posted by Vladimir Sizikov on July 06, 2008 at 06:16 PM PDT #

Hi,

Good stuff, at least overcoming I/O latency!

I think that covers some of what I was hoping for nearly 10 years ago!

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4229449

Rgds

Damon

Posted by Damon Hart-Davis on July 06, 2008 at 06:32 PM PDT #

@Vladimir: I did the work on a multi-core laptop, but the results so far are somewhat inconclusive. I can build the compiler (twice) in 9 seconds anyway and can build JDK in 10 minutes, but it's hard to tell how much of that is from javac. It did (informally) seem that more than one core was being used for some amount of time. There's interest in this work from various members of the build group, and I'm planning to do some test builds of JDK on the JPRT build farm. We'll see what results we get there. I think the bottom line is that since we don't want to rework the entire compiler, we'll take what benefits we can get. The work I described is reasonably low risk and reasonably well contained in the main and util packages, and so worth the relatively low effort that has been expended so far. -- Jon

Posted by Jonathan Gibbons on July 06, 2008 at 11:47 PM PDT #

@Damon: Yes, I was aware of that bug report. Since that has been closed, the current work is being tracked under http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6629150
although we'll probably also create additional RFEs for the related refactorings. -- Jon

Posted by Jonathan Gibbons on July 06, 2008 at 11:54 PM PDT #

Wouldn't it make more sense to optimize the compiler before multi-threading it? Javac has two serious problems: it's slow and it uses too much memory. Other compilers (jikes - much faster, ecj - 1/3 the memory) have proven that it's possible to compile java code faster with a lot less memory. Multi-threading an inefficient implementation just shoves the dirt under a rug, no?

Joe

Posted by Joe on July 07, 2008 at 02:59 AM PDT #

The first thing one notices when hacking javac is how inherently single-threaded and "primitive" it appears to be.

What's the reason not to use a parser generator? It seems to me this would make it less painful to evolve.
I never really knew whether to place state context in member variables (since it's single-threaded) or push it onto the stack frames. I assumed member variables are used to mitigate the expense of pushing too much state around, since "by ref" is not possible in Java.

Posted by Casper Bang on July 07, 2008 at 03:45 AM PDT #

Enough rudery about javac!

Compared to (for example) the average C++ project, compiling Java is lightning fast for a number of reasons.

So I'm all in favour of grabbing the low-hanging fruit and avoiding breaking javac in subtle (and ultimately very expensive) ways.

Rgds

Damon

Posted by Damon Hart-Davis on July 07, 2008 at 05:44 PM PDT #
