O Tree, Where Art Thou

One of the more subtle aspects of javac syntax trees is that every tree node has position information associated with it. This information is used to identify the location of errors with the source text, and is used by IDEs when refactoring or reformatting code. Ensuring the information is accurate is tricky, and with a number of projects ongoing to update the Java language, and hence the trees used by the compiler, the time has come for some better test support in this area.

The new test is called TreePosTest, and can either be run by the jtreg test harness or as a standalone utility. It can read one or more files or directories given on the command line, looking for source files. It reads each file in turn, ignoring those with syntax errors (there's a lot of those in the javac test directories!) For each file, it scans the tree, checking invariants for the position information for every tree node. Any errors that are found are reported.

Each tree node has three nominal positions, identified as character offsets from the beginning of the file. (Older versions of javac used line-number/char-on-line coordinates, packed into a single integer.) The three positions are the start of the node within the source text, the end of the node within the source text, and a position used to identify that node -- typically the first non-white character in the source text that is unique to that node. The last of these is stored directly in every tree node. The start position is always available and is recursively computed by TreeInfo.getStartPos. The end position requires a table to maintained on the side; for performance reasons, this table is not normally enabled; it must be enabled if end positions are going to be required. When enabled, the end position for a node is recursively computed by TreeInfo.getEndPos, using the endPosTable. (Certain nodes also store an end position directly, when such a position may be required for an error message.)

Given these three positions, we can identify various invariants.

  • For any node: start <= pos <= end
  • Any node must be enclosed in its parent node: parent.start <= start && end <= parent.end
  • The position of a parent node should not be within any of its childen: parent.pos <= start || end <= parent.pos

The first surprise was that the test program found a number of faults within its own source text. Ooops. Running the test program over the source files in the langtools test/ directory, it found 6000 errors in over 2000 files. More ooops. Fortunately, many of those errors are repetitions, but what started as a proactive exercise to test new compiler code was turning out to have more payoff than expected.

I don't know about you, but I tend not to think in characters positions very easily, and error messages like the following leave a little to be desired:

test/tools/javac/treepostests/TreePosTest.java: encl.start <= start: encl:WILDCARD[start:9232,pos:9232,end:9246] this:TYPEBOUNDKIND[start:9224,pos:9224,end:9231]
? extends 
But, you know what they say: a picture is worth a thousand numbers, so the test program now has an optional GUI mode, in which it becomes clearer that the reported range for the parent wildcard node (in red) incorrectly omits the type bound kind (in green). In fact, the type bound kind and therefore the enclosing wildcard node both actually begin at the preceding '?'.

Here is another example. Here, it becomes clear that the position for the parent AND node is incorrectly within the expression on the right hand side of the &&, instead of at the && operator. In fact, this is an instance of a previously reported bug, 6654037.


Most of the issues that have arisen have been reasonably easy to fix, and bug fixes are already underway. However, there are some problem cases.
Enum declarations
These are desugared right in the parser into equivalent declarations of static final fields within the enclosing class. The question then becomes, what position information should be recorded for these desugared nodes. On the one hand, one might argue to use the "invalid position" constant, NOPOS, since these nodes do not directly correspond to source text, but on the other hand, it is important to record a position in case of errors. (See 6472751.)
Array declarations
Array declarations are complicated by support for legacy syntax, that allows constructions like:
int[] array[];
int[] f()[] { return null; }
A number of issues have been observed with the positions recorded for annotations, but which have not yet been fully investigated.
Currently, these issues are addressed by making allowances within the test program.


The test program can easily be applied to large code bases, such as JDK or JDK test suites. Despite some outstanding issues within javac, the test program has proved its worth in identifying errors within the existing javac code, and should prove useful in future to ensure that any support for new language features will also satisfy the expected invariants for tree positions. And even if the bar is not currently at 100%, at least we know where the bar actually is, by virtue of the specific allowances made in the test program.

Post a Comment:
Comments are closed for this entry.

Discussions about technical life in my part of the OpenJDK world.


« July 2014