Improving javac diagnostics

The javac diagnostic system at a glance

The entry point for the diagnostic system is the Log class. Log is used for reporting errors, warnings, notes, etc. throughout the whole compiler pipeline. When an error is reported to the Log class, the Log keeps track of it by creating a new diagnostic object, whose contents are then printed to the output stream by means of a DiagnosticFormatter object. A diagnostic object keeps track of several pieces of information about a given diagnostic, such as the message key (a locale-independent string), the position at which the error occurred, the source file containing the error, and a list of arguments that can be used to parameterize the contents of the diagnostic. The resource file compiler.properties defines the locale-dependent version of a given diagnostic message. For example, compiler.properties contains the following definition:

compiler.err.non-static.cant.be.ref=\
    non-static {0} {1} cannot be referenced from a static context

Those lines link a locale-dependent string to the message key compiler.err.non-static.cant.be.ref. Here, {0} and {1} are placeholders for the actual diagnostic arguments. This means that when the diagnostic is formatted, the string representation of each diagnostic argument is substituted into the localized message to form the actual error message, as in:

non-static method foo() cannot be referenced from a static context
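
The placeholder substitution can be reproduced in a few lines. This is only a minimal sketch: it applies java.text.MessageFormat directly to the pattern text quoted above, whereas javac looks the pattern up through its resource bundles. The class name PlaceholderDemo and the method render are made up for this illustration.

```java
import java.text.MessageFormat;

// Hypothetical demo class; not part of javac.
class PlaceholderDemo {
    // Pattern text copied from the compiler.properties entry quoted above.
    static final String PATTERN =
        "non-static {0} {1} cannot be referenced from a static context";

    // Substitutes the string form of each argument for {0}, {1}, ...
    static String render(Object... args) {
        return MessageFormat.format(PATTERN, args);
    }

    public static void main(String[] args) {
        System.out.println(render("method", "foo()"));
    }
}
```

Running main prints the error message shown above, with "method" and "foo()" taking the place of {0} and {1}.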

A diagnostic argument can in principle be any valid Java object; it could even be a diagnostic itself (useful in the case of nested diagnostics)! This mechanism is simple yet flexible enough to allow complex manipulation of diagnostic arguments before the diagnostic itself gets rendered by the Log. There are, however, some problems that must be tackled in order to provide better support for javac diagnostics:

  • It's difficult to render diagnostics in a completely different way. This is due to the fact that (i) the current diagnostic formatter is quite monolithic (difficult to subclass) and (ii) any new formatter should preserve the raw output currently generated by the compiler when the -XDrawDiagnostics option is set (many compiler regression tests rely on this format!).
  • Some diagnostic arguments are prematurely converted into strings. This means that the diagnostic object has e.g. a string representing a type as an argument instead of the type object itself. This makes it almost impossible to write a new formatter providing special handling for complex types such as wildcards, captured types, and so on.

The former problem has been tackled by writing a brand new hierarchy of diagnostic formatters (see below), while the latter has been addressed by a nightmarish cleanup effort that consisted of detecting (and properly handling) almost every early string conversion of diagnostic arguments in the compiler pipeline (for those interested in the details, please refer to bug 6717241)!
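
To see why early string conversion hurts, here is a rough sketch. WildcardSketch and ArgumentRendering are hypothetical stand-ins invented for this example; javac's real type classes and formatter code look different. The point is only that a formatter can give a type special treatment while it still receives an object, and can do nothing once it receives a string.

```java
// Hypothetical stand-in for a compiler type object; javac's real type classes differ.
final class WildcardSketch {
    final String kind;   // "extends" or "super"
    final String bound;  // qualified name of the bound
    WildcardSketch(String kind, String bound) { this.kind = kind; this.bound = bound; }
    @Override public String toString() { return "? " + kind + " " + bound; }
}

// Hypothetical formatter fragment, for illustration only.
class ArgumentRendering {
    // Special handling is possible only while the argument is still an object.
    static String render(Object arg) {
        if (arg instanceof WildcardSketch) {
            WildcardSketch w = (WildcardSketch) arg;
            String simple = w.bound.substring(w.bound.lastIndexOf('.') + 1);
            return "wildcard (" + w.kind + " " + simple + ")";
        }
        // A string argument is opaque: there is nothing left to inspect.
        return String.valueOf(arg);
    }

    public static void main(String[] args) {
        WildcardSketch w = new WildcardSketch("extends", "java.lang.Number");
        System.out.println(render(w));            // the type object reaches the formatter
        System.out.println(render(w.toString())); // the argument was converted too early
    }
}
```

In the second call the formatter only sees the text "? extends java.lang.Number" and has to print it unchanged.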

Refactoring diagnostic formatters

Before starting, a small historical note. Originally it was considered acceptable for Log to simply output a string. Then generics came along and diagnostics got more complicated; this problem was probably underestimated, and as a result some cleanup of the worst problems was done (the captured type diagnostic was improved a bit). The javac type system was then extended in order to support raw diagnostics, for better testing; after that, Log and other classes were updated to use DiagnosticObject as required by JSR199. Formatters were just the last brick, introduced in order to have more flexibility. This reformatting work can indeed be regarded as the next stage of a steady, ongoing evolution!

The goal of this work has been to extend the capabilities provided by the current general purpose DiagnosticFormatter class with a brand new hierarchy of diagnostic formatters. We now have several diagnostic formatters (and maybe there's still room for more to come!):

  • AbstractDiagnosticFormatter - this abstract class provides the basic formatting capabilities that are shared by all the formatters in the hierarchy. For instance, it provides a set of visitor-like methods that can be overridden by derived classes in order to customize some parts of the formatting logic.
  • RawDiagnosticFormatter - this formatter is in charge of generating the so-called raw format. As we said, this format is crucial for testing purposes; for this reason this class is final and therefore cannot be further extended and/or customized
  • BasicDiagnosticFormatter - this formatter is in charge of providing the same formatting capabilities that are currently delivered by javac. As such, it has localization support and all the other bells and whistles that you would expect from a standard javac diagnostic formatter.

What's the point of this refactoring? As you may notice, even with this new hierarchy in place, the diagnostic system works pretty much the same way it always did, which also means that some javac diagnostics are still as ugly as they used to be! That's true, but this refactoring overcomes some limitations of the old DiagnosticFormatter class in two ways. First, it's now easier to define a new diagnostic formatter (it suffices to define a new subclass of AbstractDiagnosticFormatter). Second, thanks to the RawDiagnosticFormatter class, new formatters no longer have to worry about preserving the raw format (failing to preserve it would break existing regression tests!).
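
The shape of such a hierarchy can be sketched as follows. Every class here (SketchDiagnostic, SketchAbstractFormatter, SketchRawFormatter, SketchBasicFormatter, BracketingFormatter) is a simplified, hypothetical stand-in written for this post, not the real javac class; in particular, the raw output below is only loosely modeled on the -XDrawDiagnostics format. What it tries to show is that the shared logic lives in the abstract base, the raw formatter is final, and a new formatter needs to override just one hook.

```java
import java.text.MessageFormat;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified diagnostic: a message key plus its arguments.
final class SketchDiagnostic {
    final String key;
    final Object[] args;
    SketchDiagnostic(String key, Object... args) { this.key = key; this.args = args; }
}

// Shared formatting logic; subclasses customize it by overriding the hooks.
abstract class SketchAbstractFormatter {
    String format(SketchDiagnostic d) {
        String[] rendered = new String[d.args.length];
        for (int i = 0; i < d.args.length; i++) {
            rendered[i] = formatArgument(d.args[i]);
        }
        return formatMessage(d.key, rendered);
    }

    // Hook: how a single argument is turned into text.
    protected String formatArgument(Object arg) { return String.valueOf(arg); }

    // Hook: how the key and the rendered arguments are combined.
    protected abstract String formatMessage(String key, String[] args);
}

// Raw output: message key followed by the arguments, no localization.
// Declared final so that the format tests rely on cannot be altered.
final class SketchRawFormatter extends SketchAbstractFormatter {
    @Override protected String formatMessage(String key, String[] args) {
        return key + ": " + String.join(", ", args);
    }
}

// Localized output: looks the key up in a (here hard-coded) message table.
class SketchBasicFormatter extends SketchAbstractFormatter {
    private final Map<String, String> messages = new HashMap<>();
    SketchBasicFormatter() {
        messages.put("compiler.err.non-static.cant.be.ref",
            "non-static {0} {1} cannot be referenced from a static context");
    }
    @Override protected String formatMessage(String key, String[] args) {
        return MessageFormat.format(messages.getOrDefault(key, key), (Object[]) args);
    }
}

// A new formatter only has to override the hook it cares about.
class BracketingFormatter extends SketchBasicFormatter {
    @Override protected String formatArgument(Object arg) {
        return "[" + super.formatArgument(arg) + "]";
    }
}
```

Formatting the same SketchDiagnostic with each of the three formatters gives the raw key-plus-arguments line, the localized message, and the localized message with bracketed arguments, respectively.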

Composing formatters: a rich diagnostic formatter

A brand new kind of diagnostic formatter (internally dubbed rich formatter) will be available soon. The goals of this new formatter are twofold:

  • As Jon observed in his entry on diagnostics, the use of qualified names in javac diagnostics makes the diagnostics less readable and too verbose. How many times have you failed to get the important bits of an error message because it was completely overwhelmed by two or three lines stuffed with qualified names?
  • As we said, the javac diagnostic system suffers from a severe lack of integration with all the recent work that has been done in the type-system area. The result is that javac sometimes emits some very cryptic error messages (esp. the ones regarding wildcard capture and intersection types) that are very difficult to read for almost everyone but Java generics gurus. Both Peter and Alex have previously blogged about the need to present the contents of a javac diagnostic in a more structured way, in order to provide additional information about the tricky parts (usually type-system related) of an error message. Adding info necessarily makes the diagnostic a bit longer, but I'm sure that this is a price worth paying, and the extra verbosity should definitely be regarded as the structured, good kind.

The rich formatter acts like a diagnostic filter: it processes the contents of a given diagnostic by adding some additional information or, perhaps, by removing some unnecessary qualified names, and then defers the task of producing the diagnostic output to a delegate formatter (either a BasicDiagnosticFormatter or a RawDiagnosticFormatter). The results are quite impressive; consider the following example:

class Foo<T extends String> {
  <T extends Integer> void foo(T t) {           
      test(t);
  }
  void test(T t) {}
}

This is the output currently generated by javac:

Test.java:6: test(T) in Foo<T> cannot be applied to (T)
      test(t);
      ^
1 error

While this is the output generated by our new rich formatter:

Test.java:3: method test in class Foo<T#0> cannot be applied to given types
    test(t);
    ^

required: T#0
found: T#1
where T#0,T#1 are type-variables:
 T#0 extends String
  (declared in class Foo)
 T#1 extends Integer
  (declared in method <T>foo(T))     
1 error

As you can see there's a lot of really helpful stuff in this message:

  • The important bits now come at the very beginning of this error message (this is because the localized error message in the resource file has been refactored)
  • The source line where the error occurred is now reported immediately after the first line of the error message. This gives the programmer immediate and valuable feedback
  • Qualified names (e.g. java.lang.Integer) have been replaced by simple names (e.g. Integer)
  • For each type-variable javac now generates additional info containing e.g. the declared bound of the type-variable, its declaration site, and so on. Note that this extra info is crucial here as it allows us to disambiguate between the two type-variables named T!
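
The filter-plus-delegate idea behind the rich formatter, together with the type-variable numbering seen in the output above, can be sketched like this. Everything here (TypeVar, DiagFormatter, PlainDelegate, RichSketch) is hypothetical and written only for this illustration. One simplifying assumption: the sketch numbers every type variable it meets, whether or not its name clashes with another one, and the layout of its where clause only approximates the real output.

```java
import java.text.MessageFormat;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a compiler type variable.
final class TypeVar {
    final String name;   // e.g. "T"
    final String bound;  // qualified name of the declared bound
    final String site;   // where the variable is declared
    TypeVar(String name, String bound, String site) {
        this.name = name; this.bound = bound; this.site = site;
    }
    @Override public String toString() { return name; }
}

// Hypothetical formatter interface shared by the filter and its delegate.
interface DiagFormatter {
    String format(String pattern, Object... args);
}

// Delegate: plain placeholder substitution, nothing else.
final class PlainDelegate implements DiagFormatter {
    @Override public String format(String pattern, Object... args) {
        return MessageFormat.format(pattern, args);
    }
}

// Filter: rewrites the arguments, delegates the rendering, then appends extra info.
final class RichSketch implements DiagFormatter {
    private final DiagFormatter delegate;
    RichSketch(DiagFormatter delegate) { this.delegate = delegate; }

    @Override public String format(String pattern, Object... args) {
        List<TypeVar> seen = new ArrayList<>();
        Object[] rewritten = new Object[args.length];
        for (int i = 0; i < args.length; i++) {
            if (args[i] instanceof TypeVar) {
                TypeVar tv = (TypeVar) args[i];
                if (!seen.contains(tv)) seen.add(tv);  // identity comparison: equals is not overridden
                rewritten[i] = tv.name + "#" + seen.indexOf(tv);
            } else {
                rewritten[i] = args[i];
            }
        }
        StringBuilder out = new StringBuilder(delegate.format(pattern, rewritten));
        // One where-clause line per type variable: bound (simple name) and declaration site.
        for (int i = 0; i < seen.size(); i++) {
            TypeVar tv = seen.get(i);
            out.append("\n ").append(tv.name).append('#').append(i)
               .append(" extends ").append(simpleName(tv.bound))
               .append(" (declared in ").append(tv.site).append(')');
        }
        return out.toString();
    }

    // Drops the package prefix of a qualified name.
    static String simpleName(String qualified) {
        return qualified.substring(qualified.lastIndexOf('.') + 1);
    }
}
```

Because RichSketch only depends on the DiagFormatter interface, the same filter could sit in front of a raw-style delegate just as well as a localized one, which is the design choice that keeps the raw format untouched.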

Don't worry: not all messages will become that verbose. First of all, non-generic error messages are left more or less unchanged (but qualified names are dropped in favour of simple names!). Moreover, we plan to add a compiler flag to control the diagnostic verbosity level, so that the user can increase the verbosity as needed, when further info about a given error message is required.

Future directions

There's really good stuff going on in the javac diagnostic system. What next? Well, first of all I have to work a bit more in order to make all such things stable and usable. There is still room for some minor improvements e.g. the layout of the where clauses, the layout of some specific error messages, etc. Any suggestion is really welcome here! The real challenge will be to see IDEs exploiting those new diagnostics - that's why I'll soon be working on a different kind of basic formatter capable of generating XML output instead of plain text. Stay tuned for the latest news on this work and don't forget to check the OpenJDK repositories now and then: your most-hated error messages could radically change sooner than you expect!

Thanks to Jon and Alex for their useful comments on this work

Comments:

This is really great news!

Improving javac's diagnostics is top of the list of things I think would improve programmers' experience with generics in particular.

One comment about 'cannot be applied to' by the way - I've noticed that particular terminology sometimes causes confusion since many Java programmers are more used to thinking in terms of methods being 'called' or 'invoked' with arguments of given types. So I wonder if a slightly more friendly version might begin with something like:

Test.java:3: method test in class Foo<T#0> cannot be invoked with given types
...

Another thing that may be worthwhile at some point would be to improve the consistency of the diagnostics. For example, if your class Foo extended Bar<T extends String> which also declared void test(T t), I think the message would currently be 'cannot find symbol' rather than 'cannot be applied'. Perhaps that's the kind of thing the community could pick up though, once your improvements have made it into OpenJDK ;)

Posted by Mark Mahieu on August 10, 2008 at 02:58 AM BST #

Thanks Mark, I'm glad that you like the new improvements! Some replies to your questions:

*) The 'cannot be applied to' message can definitely become even more user-friendly, following your suggestion and replacing 'applied' with 'invoked'/'called'. All these diagnostic-specific improvements will be addressed at a later point, as right now we are concerned with 'global' improvements, that is, fixing stuff that can affect several diagnostics at once!

*) The kind of example you mention ('cannot.found.symbol' instead of 'cannot.apply.symbol') will be fixed as soon as the rich diagnostic formatter is available in the repository. Fixing messages coming out of the resolution process has been one of the hot areas of this work on diagnostics.

*) There are some very diagnostic-specific issues that could be picked up by the community; for anyone interested in contributing, a good starting point is, again, http://bugs.sun.com/view_bug.do?bug_id=6492019 where I keep track of all the issues involving javac diagnostics.

Posted by Maurizio on August 11, 2008 at 07:16 AM BST #

You mentioned a resource file "compiler.properties". Where would I find a copy of that?

Posted by john baker on June 15, 2009 at 07:03 AM BST #

The simplest way to get a copy of compiler.properties is to get a copy of the langtools workspace (with Mercurial). The file is available at the following path:

<LANGTOOLS-ROOT>/src/share/classes/com/sun/tools/javac/resources/compiler.properties

where <LANGTOOLS-ROOT> points to the folder containing your local copy of the langtools workspace.

In order to get a copy of the langtools workspace, assuming that you have Mercurial up and running, simply type:

hg clone http://hg.openjdk.java.net/jdk7/tl/langtools

I hope this helps.

Posted by Maurizio on June 18, 2009 at 10:30 AM BST #

About

Maurizio Cimadamore is a member of the langtools team based in Santa Clara, CA. His efforts are mainly focused on the type-system area of the Java compiler.
