By rtenhove on May 15, 2007
So it wasn't surprising that, way back in 1989, I read with great interest an article in the ACM SIGPLAN Notices: the user manual for something called the Purdue Compiler Construction Tool Set, PCCTS 1.00B. It described a compiler generator that differed from tools like LEX/YACC in a very important way: instead of YACC's table-driven, bottom-up LALR(1) parsers, it generated recursive-descent parsers, just like a Real Human Being would write (and as I had been doing for years). This made the generated code very easy to understand, and made debugging the input grammar (which was converted into parser code in a straightforward way) much easier than with YACC.
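To show what "a recursive-descent parser, just like a Real Human Being would write" looks like, here is a small hand-written sketch. It is not PCCTS output. The toy arithmetic grammar and every name in it (`tokenize`, `Parser`, `expr`, `term`, `factor`) are hypothetical and exist only for illustration. The key property is that each grammar rule becomes one ordinary function, and a single token of lookahead (`peek`) decides what to do next. That one-to-one mapping is what makes this style of generated code easy to read and debug.

```python
import re

# Hypothetical toy grammar (LL(1)), one method per rule below:
#   expr   : term (('+' | '-') term)* ;
#   term   : factor (('*' | '/') factor)* ;
#   factor : NUMBER | '(' expr ')' ;


def tokenize(text):
    """Split input into integer literals and single-character symbols."""
    return re.findall(r"\d+|\S", text) + ["<EOF>"]


class Parser:
    """Recursive-descent evaluator: each grammar rule is one method."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        # The single token of lookahead used for every decision.
        return self.tokens[self.pos]

    def match(self, expected):
        # Consume the expected token or report a syntax error.
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def expr(self):
        # expr : term (('+' | '-') term)* ;   (left-associative)
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.match(op)
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # term : factor (('*' | '/') factor)* ;   ('/' is integer division here)
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.peek()
            self.match(op)
            rhs = self.factor()
            value = value * rhs if op == "*" else value // rhs
        return value

    def factor(self):
        # factor : NUMBER | '(' expr ')' ;
        tok = self.peek()
        if tok.isdigit():
            self.pos += 1
            return int(tok)
        if tok == "(":
            self.match("(")
            value = self.expr()
            self.match(")")
            return value
        raise SyntaxError(f"unexpected token {tok!r}")

    def parse(self):
        value = self.expr()
        self.match("<EOF>")
        return value


print(Parser("2 * (3 + 4) - 5").parse())  # prints 9
```

If you compare the comments against the methods, you can see that a mistake in the grammar shows up as a mistake in an easily located function. The table-driven LALR parsers that YACC generates don't offer that.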
In 1989 I used PCCTS to create a mini-language that was used to describe high-level behaviours in a GUI framework I was creating at the time. (The language was actually part of the UI resources, allowing the developer to change UI behaviours without touching his C++ code.) PCCTS turned out to be an invaluable tool, since it let me create my mini-language without having to hand-craft the parser, which is a tedious, error-prone activity. Instead, I maintained a simple grammar description. As we gained more experience with the UI framework (used to create a family of interactive graphic language editors), I was able to adapt and extend the language to add more high-level behaviours, allowing the UI developers to specify what they wanted in a declarative fashion. It was very cool, and very effective.
Since that first exposure I've been following PCCTS, which, when it migrated to a Java-based implementation, became known as ANTLR (ANother Tool for Language Recognition, pronounced "antler"). ANTLR 2.0 was a major leap forward in capability (LL(k) parsing, as opposed to the LL(1) parsing that PCCTS 1.0 supported). I've used ANTLR periodically over the years. One interesting use was to take a text-based memory dump (from the Forte 4GL run-time) and convert it into HTML pages, making it much, much easier to explore the memory (hyperlinks are a lot easier to use than grep!). Of course, roll-your-own mini-languages have been rather fun to create with the aid of ANTLR over the years, and have crept into my work when needed. (The Open ESB WSDL 2.0 API was generated from a high-level API description.)
Now, nearly 20 years after 1.00B was published, ANTLR 3 is nearly ready. It introduces LL(*) parsing, another leap forward, which allows arbitrary lookahead when choosing between alternatives while still keeping the generated parser efficient at run time. I've been beta-testing a new book describing ANTLR 3, and it is a very nice exploration of language theory, practical parsing (and parser-generator) issues, and the ins and outs of using ANTLR 3.
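The value of arbitrary lookahead is easiest to see with a rule whose alternatives share a prefix of unbounded length. The following is a hypothetical example and does not come from the ANTLR 3 documentation: a declaration that starts with any number of modifiers and only then reveals whether it is a `class` or an `interface`. With a fixed k tokens of lookahead, an input with k or more modifiers looks identical for both alternatives, so an LL(k) parser cannot choose without the grammar being left-factored by hand.

The sketch below hand-codes the idea. It scans ahead as far as needed without consuming input, then commits to an alternative. This is only an illustration of the concept. ANTLR 3 itself builds lookahead DFAs from the grammar when it generates the parser, and does not use a loop like this one. The names `MODIFIERS`, `predict_decl`, and `parse_decl` are made up for the example.

```python
# Hypothetical rule:
#   decl : modifier* 'class' ID
#        | modifier* 'interface' ID ;
MODIFIERS = {"public", "abstract", "final"}


def predict_decl(tokens, pos):
    """Pick an alternative for `decl` by looking past any number of
    modifiers (unbounded lookahead). Does not consume any input."""
    i = pos
    while i < len(tokens) and tokens[i] in MODIFIERS:
        i += 1
    if i < len(tokens) and tokens[i] == "class":
        return "class_decl"
    if i < len(tokens) and tokens[i] == "interface":
        return "interface_decl"
    raise SyntaxError(f"no viable alternative for decl at token {pos}")


def parse_decl(tokens):
    """Decide first (arbitrarily far ahead), then parse normally."""
    alt = predict_decl(tokens, 0)
    i = 0
    modifiers = []
    # Prediction guarantees a keyword follows the modifiers, so this
    # loop cannot run off the end of the token list.
    while tokens[i] in MODIFIERS:
        modifiers.append(tokens[i])
        i += 1
    i += 1  # skip 'class' / 'interface', already checked by prediction
    if i >= len(tokens):
        raise SyntaxError("expected identifier after keyword")
    return alt, modifiers, tokens[i]


print(parse_decl("public abstract final class Widget".split()))
# prints ('class_decl', ['public', 'abstract', 'final'], 'Widget')
```

Naive backtracking can cost a great deal of parse time. LL(*) instead does this prediction from a precomputed DFA, which is where the run-time efficiency comes from.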
So if you're like me, and have an interest in language parsing, check out antlr.org. Tools, grammars, full source code, you name it. Enjoy!