Monday Oct 20, 2008

Generating java byte code by building AST trees

There are several ways to generate byte code to run on the JVM. You can write a .java file and compile it with javac, write assembly-like code directly and assemble it with a tool like JASML, or use a library like BCEL to generate your own classes at runtime. Aside from these, a quite interesting approach is to construct AST nodes representing the structure of the Java code and generate byte code from them. This is actually what javac itself does.

When javac compiles .java files into .class files, a two-step process is involved.
First is parsing. Javac reads the source code, parses it, and builds a tree structure representing it.
Second is code generation. The code generator takes the tree, acts upon it, and produces the .class file.

These two steps are quite independent of each other, which makes it possible to replace either one without affecting the other.
So, to achieve our goal, we can create a tree ourselves and hand it to the code generator.
This is actually not a difficult task. Javac is very decently implemented, with a very clear separation between the two steps.
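The separation between the two steps can also be observed from outside the compiler. The following is a minimal sketch, assuming a JDK 9 or later and using only the public javax.tools and com.sun.source APIs rather than the javac internals discussed in this post. The class name TwoStepCompile and the helper inMemory are made up for illustration. Note that the public API exposes an analysis step between parsing and generation, which the two-step picture above glosses over.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.JavacTask;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class TwoStepCompile {
    static final String SOURCE =
            "public class Test {\n"
          + "    public static void main(String[] args) {\n"
          + "        System.out.println(\"Hello!\");\n"
          + "    }\n"
          + "}\n";

    /** Wraps source text in memory so that no Test.java has to exist on disk. */
    static JavaFileObject inMemory(String className, String code) {
        URI uri = URI.create("string:///" + className + ".java");
        return new SimpleJavaFileObject(uri, JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    /** Runs parsing and code generation as separate calls and logs what each produced. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        try {
            Path outDir = Files.createTempDirectory("two-step");
            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            JavacTask task = (JavacTask) compiler.getTask(
                    null, null, null,
                    List.of("-d", outDir.toString()), null,
                    List.of(inMemory("Test", SOURCE)));

            // Step 1: parsing only. No class file exists yet.
            for (CompilationUnitTree unit : task.parse()) {
                log.add("parsed: " + unit.getTypeDecls().size() + " type declaration(s)");
            }
            // Attribution and flow analysis sit between the two steps.
            task.analyze();
            // Step 2: code generation from the analyzed trees.
            for (JavaFileObject classFile : task.generate()) {
                log.add("generated: " + classFile.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

This needs a full JDK at runtime, since ToolProvider.getSystemJavaCompiler() returns null on a plain JRE.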

The javac source lives in the OpenJDK langtools repository, which hosts a series of tools such as javadoc and javah. Go to this link for more detail about langtools. If you have Mercurial installed, check out the code from http://hg.openjdk.java.net/jdk7/jdk7/langtools; if not, you can download an archived copy from this link instead. After you get the source, try to build and run it to make sure it works properly. Refer to this link for how to do this.

After getting the code, let's try to make a very simple javac tree for this file [Test.java]:


public class Test {
    public static void main(String[] args) {
        System.out.println("Hello!");
    }
}


The code we are interested in is located at:

  • src/share/classes/com/sun/tools/javac/parser/, which contains the parser-related code.
  • src/share/classes/com/sun/tools/javac/main/, which contains the main controller that calls and integrates the two steps.
  • src/share/classes/com/sun/tools/javac/tree/, which contains all the tree-related classes.


There are four major types of node for our Test.java file:

  • The .java file, which may contain more than one class, is parsed as com.sun.tools.javac.tree.JCTree.JCCompilationUnit
  • A class is parsed as com.sun.tools.javac.tree.JCTree.JCClassDecl
  • A method is parsed as com.sun.tools.javac.tree.JCTree.JCMethodDecl
  • The System.out.println("Hello!") method call is a combination of com.sun.tools.javac.tree.JCTree.JCFieldAccess and com.sun.tools.javac.tree.JCTree.JCMethodInvocation.
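To see these node types show up for Test.java, here is a small sketch that parses the file and walks the resulting tree. It assumes a JDK 9 or later and uses the public com.sun.source interfaces instead of the JC* classes directly. In that public API, JCFieldAccess is seen as a MemberSelectTree and JCMethodInvocation as a MethodInvocationTree. The class name NodeKinds is made up for illustration.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MemberSelectTree;
import com.sun.source.tree.MethodInvocationTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class NodeKinds {
    static final String SOURCE =
            "public class Test {\n"
          + "    public static void main(String[] args) {\n"
          + "        System.out.println(\"Hello!\");\n"
          + "    }\n"
          + "}\n";

    /** Parses SOURCE and records the interesting nodes in the order they are visited. */
    static List<String> scan() {
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Test.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return SOURCE;
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(file));
        List<String> found = new ArrayList<>();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitCompilationUnit(CompilationUnitTree node, Void p) {
                        found.add("CompilationUnit");
                        return super.visitCompilationUnit(node, p);
                    }

                    @Override
                    public Void visitClass(ClassTree node, Void p) {
                        found.add("Class " + node.getSimpleName());
                        return super.visitClass(node, p);
                    }

                    @Override
                    public Void visitMethod(MethodTree node, Void p) {
                        found.add("Method " + node.getName());
                        return super.visitMethod(node, p);
                    }

                    @Override
                    public Void visitMethodInvocation(MethodInvocationTree node, Void p) {
                        found.add("MethodInvocation");
                        return super.visitMethodInvocation(node, p);
                    }

                    @Override
                    public Void visitMemberSelect(MemberSelectTree node, Void p) {
                        found.add("MemberSelect " + node.getIdentifier());
                        return super.visitMemberSelect(node, p);
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        scan().forEach(System.out::println);
    }
}
```

The member select appears twice because System.out.println is a field access on top of another field access (System.out).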


These classes all represent complicated data structures. To create instances of them you need an instance of
com.sun.tools.javac.tree.TreeMaker.

For example, to create an instance of JCClassDecl, call this method:


    public JCClassDecl ClassDef(JCModifiers mods,
                                Name name,
                                List<JCTypeParameter> typarams,
                                JCTree extending,
                                List<JCExpression> implementing,
                                List<JCTree> defs)


The names of the parameters are quite self-explanatory, and most of them are not really needed here. Take the third one, the type parameters: we are not using generics, so just pass an empty list.
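One way to get a feel for what each ClassDef parameter should hold is to look at what the real parser puts there for Test.java. The sketch below assumes a JDK 9 or later and reads the values through the public ClassTree interface. The pairing of each accessor with a ClassDef parameter is my reading of the names, and the class name ClassDeclParts is made up for illustration.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.JavacTask;

import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ClassDeclParts {
    static final String SOURCE =
            "public class Test {\n"
          + "    public static void main(String[] args) {\n"
          + "        System.out.println(\"Hello!\");\n"
          + "    }\n"
          + "}\n";

    /** Parses SOURCE and lists, per ClassDef parameter name, what the parser stored. */
    static Map<String, String> describe() {
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Test.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return SOURCE;
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(file));
        Map<String, String> parts = new LinkedHashMap<>();
        try {
            CompilationUnitTree unit = task.parse().iterator().next();
            ClassTree cls = (ClassTree) unit.getTypeDecls().get(0);
            parts.put("mods", cls.getModifiers().getFlags().toString());
            parts.put("name", cls.getSimpleName().toString());
            parts.put("typarams", "size " + cls.getTypeParameters().size());
            // No extends clause in the source, so this slot is simply null.
            parts.put("extending", String.valueOf(cls.getExtendsClause()));
            parts.put("implementing", "size " + cls.getImplementsClause().size());
            // Only main() is present: parsing alone does not add a default constructor.
            parts.put("defs", "size " + cls.getMembers().size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parts;
    }

    public static void main(String[] args) {
        describe().forEach((param, value) -> System.out.println(param + " = " + value));
    }
}
```

So for our simple class, only the modifiers, the name and the list of members carry any information. The rest are empty lists or null.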


Two things worth noting here are:

First, the List here is not java.util.List. It is an instance of com.sun.tools.javac.util.List.

Second, the name parameter is of type Name, a special class for storing identifier strings in the parser. Refer to the attached source file for how to construct one. For now, just think of it as a String representing the name of the class.

So, as you can see, it is pretty easy to build a tree for our simple class. I won't dwell on how to make the rest of the tree nodes; refer to DummyTreeMaker.java for details, which creates the tree that matches Test.java.

After building the tree, the next thing to do is to make the code generator generate code for us.

However, javac is not designed to let you do this, and there is no easy way to achieve it without some inelegant hacking.

What we want to do is add an -XDxtest=true option, so that when you invoke javac against any file, it still verifies that the file exists but does not try to parse it. Instead, it takes the tree we built, treats it as the product of the parser, hands it to the code generator, and then writes the class file to disk.

All this can be done by switching two lines of code in

src/share/classes/com/sun/tools/javac/main/JavaCompiler.java.

Find the method

protected JCCompilationUnit parse(JavaFileObject filename, CharSequence content)

Then look for the following two lines:


    Parser parser = parserFactory.newParser(content, keepComments(), genEndPos, lineDebugInfo);
    tree = parser.parseCompilationUnit();

That's the only place the compiler interacts with the parser. All we need to do is trick the compiler into taking our tree in the second line.
Replace the two lines with


    if (Options.instance(context).get("xtest") != null) {
          DummyTreeMaker maker = new DummyTreeMaker(parserFactory);
          tree = maker.getTree();
    } else {
          Parser parser = parserFactory.newParser(content, keepComments(), genEndPos, lineDebugInfo);
          tree = parser.parseCompilationUnit();
    }


Then build the workspace, create a blank file called Test.java, and try to compile it with the javac you just built, like

javac -XDxtest=true Test.java

Run the generated file and you'll see "Hello!" printed on the screen.
As you can see, javac takes a blank java file, but uses our tree to generate code.
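If you would rather check the generated class from code than by hand, something like the following sketch could be used. It assumes a JDK 11 or later, and for the sake of being self-contained it produces Test.class with the stock compiler from a real source file. With the patched javac you would point the class loader at the directory holding the Test.class it wrote instead. The class name RunGenerated is made up for illustration.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

class RunGenerated {
    static final String SOURCE =
            "public class Test {\n"
          + "    public static void main(String[] args) {\n"
          + "        System.out.println(\"Hello!\");\n"
          + "    }\n"
          + "}\n";

    /** Compiles Test.java into a temp directory, runs Test.main and returns what it printed. */
    static String compileAndRun() {
        try {
            Path dir = Files.createTempDirectory("hello");
            Path src = dir.resolve("Test.java");
            Files.writeString(src, SOURCE);

            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            int status = compiler.run(null, null, null, "-d", dir.toString(), src.toString());
            if (status != 0) {
                throw new IllegalStateException("compilation failed");
            }

            PrintStream original = System.out;
            ByteArrayOutputStream captured = new ByteArrayOutputStream();
            // The platform loader is used as parent so that an unrelated class named
            // Test on the application class path cannot shadow the generated one.
            try (URLClassLoader loader = new URLClassLoader(
                    new URL[] { dir.toUri().toURL() }, ClassLoader.getPlatformClassLoader())) {
                System.setOut(new PrintStream(captured, true));
                Method main = loader.loadClass("Test").getMethod("main", String[].class);
                main.invoke(null, (Object) new String[0]);
            } finally {
                System.setOut(original);
            }
            return captured.toString().trim();
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(compileAndRun());
    }
}
```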

This is a pretty simple example, but it is pretty much how javac handles your source file, although the real process is a little more complicated because the javac parser also has to generate line info, process javadoc, report errors, etc.

Now imagine we have a grammar, like the one in the Java Language Specification, and we embed Java code into the grammar that calls different methods in TreeMaker to build different AST nodes as the grammar recognizes different constructs. Then we have an automated parser doing the same thing as javac. This is how the Compiler Grammar project works: with the help of Antlr, a parser generated automatically from a grammar builds the same kind of AST trees as javac does. Refer to my previous post on how to build and run the Compiler Grammar project. What's more interesting, once we have this grammar, it can be used for more than building trees. Code formatting, code translation and so on all become possible, just use some imagination :)
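To make the idea of "grammar rules with embedded actions" a bit more concrete, here is a toy sketch. It is neither Antlr output nor javac code: the grammar is a tiny arithmetic one, the parser is written by hand, and Maker is a hypothetical stand-in for TreeMaker with one factory method per node type. Each grammar rule is a method, and the calls to make.Binary and make.Literal play the role of the embedded actions.

```java
class GrammarActions {
    interface Node {}

    static final class Num implements Node {
        final int value;
        Num(int value) { this.value = value; }
        @Override public String toString() { return Integer.toString(value); }
    }

    static final class BinOp implements Node {
        final char op;
        final Node left;
        final Node right;
        BinOp(char op, Node left, Node right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }
        @Override public String toString() { return "(" + left + " " + op + " " + right + ")"; }
    }

    /** Hypothetical stand-in for TreeMaker: one factory method per node type. */
    static final class Maker {
        Node Literal(int value) { return new Num(value); }
        Node Binary(char op, Node left, Node right) { return new BinOp(op, left, right); }
    }

    static final class Parser {
        private final String in;
        private int pos;
        private final Maker make = new Maker();

        Parser(String text) { this.in = text.replace(" ", ""); }

        // expr := term ('+' term)*
        Node expr() {
            Node left = term();
            while (pos < in.length() && in.charAt(pos) == '+') {
                pos++;
                left = make.Binary('+', left, term());   // action for the '+' rule
            }
            return left;
        }

        // term := number ('*' number)*
        Node term() {
            Node left = number();
            while (pos < in.length() && in.charAt(pos) == '*') {
                pos++;
                left = make.Binary('*', left, number()); // action for the '*' rule
            }
            return left;
        }

        // number := digit+
        Node number() {
            int start = pos;
            while (pos < in.length() && Character.isDigit(in.charAt(pos))) {
                pos++;
            }
            if (start == pos) {
                throw new IllegalArgumentException("digit expected at position " + pos);
            }
            return make.Literal(Integer.parseInt(in.substring(start, pos))); // action for literals
        }
    }

    /** Parses the whole input and returns the root of the tree built by the actions. */
    static Node parse(String text) {
        Parser parser = new Parser(text);
        Node root = parser.expr();
        if (parser.pos != parser.in.length()) {
            throw new IllegalArgumentException("unexpected input at position " + parser.pos);
        }
        return root;
    }

    public static void main(String[] args) {
        System.out.println(parse("1 + 2 * 3")); // prints (1 + (2 * 3))
    }
}
```

A parser generator automates the hand-written part: you keep the rules and the actions, and the tool produces the code that decides which rule applies.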



Download the source files:

DummyTreeMaker.java  This needs to be put at src/share/classes/com/sun/tools/javac/parser/DummyTreeMaker.java

JavaCompiler.java  Replace the file with the same name located under src/share/classes/com/sun/tools/javac/main with this one. This file may not compile in the future, as the langtools source code changes very often. If so, just locate the file and make the changes as described above.




















About

Yang Jiang
