Generating Java bytecode by building AST trees

There are several ways to generate bytecode to run on the JVM. You can write a .java file and compile it with javac, write assembly-like code directly and assemble it with a tool like JASML, or, with a library like BCEL, even generate your own classes at runtime. Aside from these, a quite interesting approach is to construct AST nodes representing the structure of the Java code and then generate bytecode from them. This is, in fact, what javac itself does.

When javac compiles .java files into .class files, a two-step process is involved.
First is parsing. Javac reads the source code, parses it, and builds a tree structure representing it.
Second is code generation. The code generator takes the tree, processes it, and produces the .class file.
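
To see the two steps as separate calls without touching the javac source, here is a minimal sketch that uses the public Compiler Tree API (javax.tools and com.sun.source.util.JavacTask) rather than the internal classes discussed in this post. The class names TwoStepDemo and StringSource are made up for the illustration, and it assumes you run it on a JDK, where a system compiler is available.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.JavacTask;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class TwoStepDemo {
    /** An in-memory source file, so nothing has to exist on disk before compiling. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Runs parsing and code generation as separate calls; returns the generated file names. */
    static List<String> compile(String className, String code, Path outDir) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null on a runtime without javac
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null,
                List.of("-d", outDir.toString()),
                null,
                List.of(new StringSource(className, code)));

        // Step 1: parsing produces one tree per source file.
        for (CompilationUnitTree unit : task.parse()) {
            System.out.println("parsed " + unit.getTypeDecls().size() + " type declaration(s)");
        }

        // Step 2: code generation writes the class files into outDir.
        List<String> names = new ArrayList<>();
        for (JavaFileObject out : task.generate()) {
            names.add(Path.of(out.toUri()).getFileName().toString());
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String code = "public class Test { public static void main(String[] args) {"
                + " System.out.println(\"Hello!\"); } }";
        Path outDir = Files.createTempDirectory("twostep");
        System.out.println(compile("Test", code, outDir));
    }
}
```

The public API only lets you observe the tree between the two calls; swapping in a tree of your own still needs the change to the javac source described in the rest of this post.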

These two steps are largely independent of each other, which makes it possible to replace either one without affecting the other.
So, to achieve our goal, we can create a tree ourselves, then hand it to the code generator.
This is actually not a difficult task: javac is cleanly implemented, with a very clear separation between the two steps.

The javac source lives in the OpenJDK langtools repository, which hosts a series of tools such as javadoc, javah, etc. Go to this link for more detail about langtools. If you have Mercurial installed, you can check out the code; if not, you can download an archived copy from this link as well. After you get the source, try to build and run it to make sure it works properly. Refer to this link for how to do this.

After getting the code, let's try to make a very simple javac tree for this file:

public class Test{
    public static void main(String[] args){
        System.out.println("Hello!");
    }
}

The code we are interested in is located at:

  • src/share/classes/com/sun/tools/javac/parser/, which contains the parser-related code.
  • src/share/classes/com/sun/tools/javac/main/, which contains the main controller that calls and integrates the two steps.
  • src/share/classes/com/sun/tools/javac/tree/, which contains all the tree-related classes.

There are four major types of node for our file:

  • The .java file, which may contain more than one class, that is parsed as
  • The class, that is parsed as
  • The method, that is parsed as
  • And the System.out.println("Hello!") method call, which is a combination of and

These classes all represent complicated data structures; to create instances of them you need an instance of TreeMaker.

For example, to create an instance of JCClassDecl, call this method:

    public JCClassDecl ClassDef(JCModifiers mods,
                                Name name,
                                List<JCTypeParameter> typarams,
                                JCTree extending,
                                List<JCExpression> implementing,
                                List<JCTree> defs)

The names of the parameters are quite self-explanatory, and most of them are not really needed here. Take the third one, the type parameters: we are not using generics, so just pass an empty list.

Two things worth noting here are:

First, the List here is not java.util.List. It's an instance of

Second, the parameter name is of a special class for storing identifier strings in the parser. Refer to the attached source file for how to construct it. For now, just think of it as a String representing the name of the class.

So, as you can see, it is pretty easy to build a tree for our simple class. I won't dwell on how to make the rest of the tree nodes; refer to the attached source for details, which creates the tree that matches the file above.
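
To give a feel for the shape of that code without pulling in javac's internals, here is a self-contained toy version. Everything in it (ToyTreeMaker, Block, MethodCall and their methods) is hypothetical and much simpler than the real javac classes; it only illustrates the idea of a factory with one method per node kind, and of building the tree bottom-up from the innermost statement to the class.

```java
import java.util.Arrays;
import java.util.List;

public class ToyTreeDemo {
    /** Every node can print itself back as source, standing in for a real code generator. */
    interface Node { String render(String indent); }

    /** A call with a single string argument, e.g. System.out.println("Hello!"). */
    static final class MethodCall implements Node {
        final String target; final String argument;
        MethodCall(String target, String argument) { this.target = target; this.argument = argument; }
        public String render(String indent) {
            return indent + target + "(\"" + argument + "\");\n";
        }
    }

    /** A header followed by nested members; used here for both classes and methods. */
    static final class Block implements Node {
        final String header; final List<Node> members;
        Block(String header, List<Node> members) { this.header = header; this.members = members; }
        public String render(String indent) {
            StringBuilder sb = new StringBuilder(indent).append(header).append(" {\n");
            for (Node m : members) sb.append(m.render(indent + "    "));
            return sb.append(indent).append("}\n").toString();
        }
    }

    /** Hypothetical factory: one method per node kind, in the spirit of ClassDef above. */
    static final class ToyTreeMaker {
        Node classDef(String modifiers, String name, List<Node> defs) {
            return new Block(modifiers + " class " + name, defs);
        }
        Node methodDef(String modifiers, String returnType, String name, String params, List<Node> body) {
            return new Block(modifiers + " " + returnType + " " + name + "(" + params + ")", body);
        }
        Node call(String target, String argument) { return new MethodCall(target, argument); }
    }

    /** Builds the tree for the Test class bottom-up: statement, then method, then class. */
    static Node buildTestClass() {
        ToyTreeMaker make = new ToyTreeMaker();
        Node println = make.call("System.out.println", "Hello!");
        Node main = make.methodDef("public static", "void", "main", "String[] args", Arrays.asList(println));
        return make.classDef("public", "Test", Arrays.asList(main));
    }

    public static void main(String[] args) {
        System.out.print(buildTestClass().render(""));
    }
}
```

The real tree carries far more information per node (modifiers as flags, names as interned identifiers, types as expression nodes), but the order of construction is the same.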

After building the tree, the next thing to do is to make the code generator generate code for us.

However, javac is not designed to let you do this, and there is no easy way to achieve it without some inelegant hacking.

What we want to do is add an -XDxtest=true option, so that when you invoke javac on any file, it still verifies that the file exists but does not try to parse it. Instead, it takes the tree we built, treats it as the product of the parser, hands it to the code generator, and writes the class file to disk.

All this can be done by switching two lines of code in


Find the method

protected JCCompilationUnit parse(JavaFileObject filename, CharSequence content)

Then look for the following two lines:

    Parser parser = parserFactory.newParser(content, keepComments(), genEndPos, lineDebugInfo);
    tree = parser.parseCompilationUnit();

That's the only place the compiler interacts with the parser. All we need to do is trick the compiler into taking our tree in the second line.
Replace the two lines with:

    if (Options.instance(context).get("xtest") != null) {
        // -XDxtest was given: skip the parser and use the hand-built tree.
        DummyTreeMaker maker = new DummyTreeMaker(parserFactory);
        tree = maker.getTree();
    } else {
        // Normal path: parse the source file as usual.
        Parser parser = parserFactory.newParser(content, keepComments(), genEndPos, lineDebugInfo);
        tree = parser.parseCompilationUnit();
    }
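
The logic of that switch can be tried out in isolation with the toy sketch below. The names FrontEndSwitchDemo, hiddenOptions and chooseTree are invented for the illustration, and the option map is only a simplified stand-in for javac's own option table; how this sketch stores a bare -XDkey with no value is an arbitrary choice of mine.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class FrontEndSwitchDemo {
    /** Collects arguments of the form -XDkey=value into a map; other arguments are ignored. */
    static Map<String, String> hiddenOptions(String... args) {
        Map<String, String> options = new HashMap<>();
        for (String arg : args) {
            if (!arg.startsWith("-XD")) continue;
            String body = arg.substring(3);
            int eq = body.indexOf('=');
            // In this sketch, a bare -XDkey is stored with the key as its own value.
            String key = eq < 0 ? body : body.substring(0, eq);
            String value = eq < 0 ? body : body.substring(eq + 1);
            options.put(key, value);
        }
        return options;
    }

    /** Mirrors the if/else above: any value for "xtest" selects the hand-built tree. */
    static <T> T chooseTree(Map<String, String> options, Supplier<T> parser, Supplier<T> handBuilt) {
        return options.get("xtest") != null ? handBuilt.get() : parser.get();
    }

    public static void main(String[] args) {
        Supplier<String> parser = () -> "tree from parser";
        Supplier<String> handBuilt = () -> "hand-built tree";
        System.out.println(chooseTree(hiddenOptions("-XDxtest=true"), parser, handBuilt));
        System.out.println(chooseTree(hiddenOptions("-verbose"), parser, handBuilt));
    }
}
```

Note that the check is only for presence: with a test of the form get("xtest") != null, passing -XDxtest=false would select the hand-built tree just the same.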

Then build the workspace, create a blank file, and try to compile it with the javac you just built, like

javac -XDxtest=true

Run the generated class file and you'll see "Hello!" printed on the screen.
As you can see, javac takes a blank Java file but uses our tree to generate code.

This is a pretty simple example, but it is pretty much how javac parses your source file, although the real process is a little more complicated because the javac parser also has to generate line information, process Javadoc, report errors, etc.

Now imagine we have a grammar, like this one in the Java Language Specification, and we embed Java code into the grammar that calls different methods in TreeMaker to build different AST nodes as the grammar recognizes different constructs. Then we have an automatically generated parser doing the same thing as javac's. This is how the Compiler Grammar project works: with the help of ANTLR, a parser generated automatically from a grammar builds the same kind of AST as javac does. Refer to my previous post on how to build and run the Compiler Grammar project. What's more interesting, once we have this grammar, it can be used for more than building trees: code formatting, code translation, etc. all become possible. Just use some imagination :)
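
As a rough illustration of "embedded actions", here is a hand-written recursive-descent parser for a two-rule toy grammar. It is not ANTLR and not the Compiler Grammar project's code; the node classes Ident, Select and Apply are hypothetical stand-ins, and the parser accepts no whitespace. Each method corresponds to one grammar rule, and the node construction inside it plays the role of the action a generated parser would run when that rule matches.

```java
public class ToyParserDemo {
    interface Node {}

    static final class Ident implements Node {
        final String name;
        Ident(String name) { this.name = name; }
        public String toString() { return "Ident(" + name + ")"; }
    }

    static final class Select implements Node {
        final Node selected; final String name;
        Select(Node selected, String name) { this.selected = selected; this.name = name; }
        public String toString() { return "Select(" + selected + ", " + name + ")"; }
    }

    static final class Apply implements Node {
        final Node method; final String argument;
        Apply(Node method, String argument) { this.method = method; this.argument = argument; }
        public String toString() { return "Apply(" + method + ", \"" + argument + "\")"; }
    }

    private final String input;
    private int pos = 0;

    ToyParserDemo(String input) { this.input = input; }

    // statement := access '(' STRING ')' ';'
    Node statement() {
        Node method = access();
        expect('(');
        String argument = string();
        expect(')');
        expect(';');
        return new Apply(method, argument);     // action: build the call node
    }

    // access := IDENT ('.' IDENT)*
    Node access() {
        Node tree = new Ident(ident());         // action: build the leftmost identifier
        while (peek() == '.') {
            pos++;
            tree = new Select(tree, ident());   // action: wrap what we have so far
        }
        return tree;
    }

    private String ident() {
        int start = pos;
        while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("identifier expected at " + pos);
        return input.substring(start, pos);
    }

    private String string() {
        expect('"');
        int start = pos;
        while (peek() != '"' && peek() != '\0') pos++;
        String value = input.substring(start, pos);
        expect('"');                            // fails here if the string is unterminated
        return value;
    }

    private char peek() { return pos < input.length() ? input.charAt(pos) : '\0'; }

    private void expect(char c) {
        if (peek() != c) throw new IllegalArgumentException("'" + c + "' expected at " + pos);
        pos++;
    }

    public static void main(String[] args) {
        System.out.println(new ToyParserDemo("System.out.println(\"Hello!\");").statement());
    }
}
```

In a grammar-driven setup the rule bodies would be generated from the grammar file, and only the node-building lines would be written by hand.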

Download the source files:

  • The first needs to be put under src/share/classes/com/sun/tools/javac/parser/
  • The second replaces the file with the same name located under src/com/sun/tools/javac/main. This file may not compile in the future, as the langtools source code changes very often. If so, just locate the file and make the changes as described above.


Now that you've been able to replace the parser with an ANTLR version, how about trying to replace the .class generation, too, using an ANTLR treewalker?

I'd love to see a comparison of the two approaches.

Posted by Andy Tripp on October 23, 2008 at 08:25 AM PDT #

Yeah, that would be an interesting experiment. And it should not be very hard to generate code that way (putting aside things like type checking, optimization, etc.).

But the problem is, the tree we built here is a javac tree (JCTree), which is specifically designed for javac. Using Antlr's tree walker would require a tree built by Antlr's tree rewrite grammar.

In fact, we have thought about using the tree rewrite grammar to generate a JCTree, but that is just too complicated so we switched to embedded actions.

Posted by Yang Jiang on October 23, 2008 at 10:51 AM PDT #

This is a very interesting approach.

I thought maybe I could add another twist to it without changing the original javac source code.

Since there is only one place where javac interacts with the parser, one can use AspectJ to inject code around the two lines

Parser parser = parserFactory.newParser(content, keepComments(), genEndPos, lineDebugInfo);
tree = parser.parseCompilationUnit();

and substitute any type of javac tree.

Posted by chester Chen on December 18, 2008 at 01:57 AM PST #

Hi guys, can anyone explain to me how to generate .class files from XML input by using ANTLR? If there are any threads regarding this, please post them.

Anand Ratan

Posted by Anand on January 21, 2009 at 05:59 PM PST #

Nice to see an example of the data structures of javac in But unfortunately the type is not available under Java 6 in tools.jar. ParserFactory causes problems too. It would be nice if you updated your example. Best regards, Chris

Posted by Christian on April 13, 2009 at 03:00 AM PDT #

How do you make Antlr generate the right javac tree? Can Antlr itself do that with some configuration? I've looked at the Compiler Grammar project but I couldn't figure out how that is done there.

Posted by Kamran on February 16, 2011 at 02:01 PM PST #


Yang Jiang

