Book review: Crafting Interpreters

October 22, 2021 | 6 minute read

Get the lowdown on how to create true interpreters using abstract-syntax trees and fast interpreters that use bytecode


     Crafting Interpreters
     By Robert Nystrom
     640 pages
 

Few are the programmers who love their craft and have not dreamed once of creating their own language. The few who have set out on such a project, either in a comp-sci class in school or on a journey of personal exploration, soon find their enthusiasm foundering in the shoals of parsers, intermediate representation, execution, or code generation. Put another way, the journey is always far longer and more difficult than the initial enthusiasm will support.

A contributing factor to the innumerable permanently stalled language projects is that there is little guidance available beyond academic treatments that are so dry and exercise-oriented that they kill motivation even earlier in the process. What is needed is a book that goes through a language project, presenting only as much theory as is needed, and written not by an academic but rather by a practitioner.

Crafting Interpreters is just such a book. It’s written by Robert Nystrom, who is part of the team behind the Dart language, which is a client-optimized app language. Nystrom previously wrote a well-received book on game programming, so he admirably fits the role of nonacademic practitioner. He is also an excellent writer and talented illustrator, as I shall explain in a moment.

You will notice that the book’s title is in the plural, and sure enough Nystrom gives the lowdown on creating two different interpreters of the same language. The first is a true interpreter, in the sense that it creates an abstract-syntax tree (AST) of a program and then executes the program by walking this tree, executing each statement as it goes. (As Nystrom explains, this is the typical approach used in school projects, but it is too slow for most nonstudent interpreters.) It is written in Java. The second interpreter, written in C, compiles the program to bytecode, which it then executes in a virtual machine. This latter approach is similar to how the JVM executes bytecode.

Both interpreters work on the same language, called Lox, which is a Java-like language that will be immediately familiar to most developers. Lox is not a trivial toy idiom: It supports classes, inheritance, first-class functions, and other high-end features. As you’ll come to see, this is characteristic of Nystrom’s book: Where he could have chosen to make his work easier by presenting fewer features, he chooses the other path and presents complex topics in absorbing detail.

Another example of this diligence is the chapter on ASTs. Whereas academic volumes would load you with theory, Nystrom explains how Lox can be represented as a grammar and then how that grammar can be mapped to a tree-based representation. He then detours into a discussion of what you’ll need to know to work with trees: how to conceive of trees as stand-ins for language, how to use the Visitor pattern, and then how to pretty-print the trees. By the time you’re done with this introduction to trees, you’re truly ready to work with ASTs.
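To give a flavor of what that chapter builds toward, here is a minimal sketch of the Visitor pattern applied to a tiny expression tree, with a visitor that pretty-prints the tree in a parenthesized prefix form. The class names (`AstPrinterSketch`, `Binary`, `Grouping`, `Literal`) and the output format are my own assumptions for illustration; this is not the book's code.

```java
// Hypothetical, simplified expression tree; all names are illustrative only.
class AstPrinterSketch {
    // One visit method per node type; R is the result type of a traversal.
    interface Visitor<R> {
        R visitBinary(Binary expr);
        R visitGrouping(Grouping expr);
        R visitLiteral(Literal expr);
    }

    static abstract class Expr {
        // Each node dispatches to the visit method that matches its own type.
        abstract <R> R accept(Visitor<R> visitor);
    }

    static final class Binary extends Expr {
        final Expr left;
        final String operator;
        final Expr right;
        Binary(Expr left, String operator, Expr right) {
            this.left = left;
            this.operator = operator;
            this.right = right;
        }
        @Override <R> R accept(Visitor<R> visitor) { return visitor.visitBinary(this); }
    }

    static final class Grouping extends Expr {
        final Expr expression;
        Grouping(Expr expression) { this.expression = expression; }
        @Override <R> R accept(Visitor<R> visitor) { return visitor.visitGrouping(this); }
    }

    static final class Literal extends Expr {
        final Object value;
        Literal(Object value) { this.value = value; }
        @Override <R> R accept(Visitor<R> visitor) { return visitor.visitLiteral(this); }
    }

    // A visitor that turns the tree into a parenthesized prefix string.
    static final class Printer implements Visitor<String> {
        @Override public String visitBinary(Binary expr) {
            return "(" + expr.operator + " " + expr.left.accept(this)
                    + " " + expr.right.accept(this) + ")";
        }
        @Override public String visitGrouping(Grouping expr) {
            return "(group " + expr.expression.accept(this) + ")";
        }
        @Override public String visitLiteral(Literal expr) {
            return String.valueOf(expr.value);
        }
    }

    static String print(Expr expr) {
        return expr.accept(new Printer());
    }

    public static void main(String[] args) {
        // Tree for: 1 + (2 * 3)
        Expr expr = new Binary(
                new Literal(1), "+",
                new Grouping(new Binary(new Literal(2), "*", new Literal(3))));
        System.out.println(print(expr)); // prints "(+ 1 (group (* 2 3)))"
    }
}
```

The appeal of the pattern is that a new operation on the tree, such as an evaluator, becomes one more `Visitor` implementation and leaves the node classes untouched.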

Nystrom then pivots to the parser (he’s already shown the lexer) and discusses his choice of recursive descent. He delves into topics academic books often gloss over in parsing, such as how to handle syntactical errors from within the parser. Later, he does the same thing for runtime errors. Successive chapters cover variable declarations, evaluating expressions, control flow, and, of course, how to handle function calls. This all leads up to the necessary treatment of object-oriented programming: how to create and interpret classes and single inheritance. At this point, you’re 235 pages into the book, and you’ve finally completed your first interpreter, jlox (Java Lox). That’s the good news. The bad news is you’re not yet halfway through.
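For readers who have not met recursive descent before, the idea is that each grammar rule becomes one method, and the methods call each other in the same shape as the grammar. The sketch below is my own illustration under simplifying assumptions: it handles only arithmetic on non-negative integer literals, evaluates while it parses instead of building a tree, and stops at the first syntax error, reporting the position. The names `RecursiveDescentSketch` and `ParseError` are hypothetical.

```java
// Hypothetical recursive-descent evaluator for arithmetic; not the book's parser.
class RecursiveDescentSketch {
    // Raised on the first syntax error, carrying the offending position.
    static final class ParseError extends RuntimeException {
        final int position;
        ParseError(String message, int position) {
            super(message + " at position " + position);
            this.position = position;
        }
    }

    private final String source;
    private int current = 0;

    private RecursiveDescentSketch(String source) { this.source = source; }

    static double evaluate(String source) {
        RecursiveDescentSketch parser = new RecursiveDescentSketch(source);
        double value = parser.expression();
        parser.skipSpaces();
        if (parser.current < source.length()) {
            throw new ParseError("Unexpected '" + source.charAt(parser.current) + "'",
                    parser.current);
        }
        return value;
    }

    // expression -> term (("+" | "-") term)*
    private double expression() {
        double value = term();
        while (true) {
            if (match('+')) value += term();
            else if (match('-')) value -= term();
            else return value;
        }
    }

    // term -> unary (("*" | "/") unary)*
    private double term() {
        double value = unary();
        while (true) {
            if (match('*')) value *= unary();
            else if (match('/')) value /= unary();
            else return value;
        }
    }

    // unary -> "-" unary | primary
    private double unary() {
        return match('-') ? -unary() : primary();
    }

    // primary -> NUMBER | "(" expression ")"
    private double primary() {
        if (match('(')) {
            double value = expression();
            if (!match(')')) throw new ParseError("Expected ')'", current);
            return value;
        }
        skipSpaces();
        int start = current;
        while (current < source.length() && Character.isDigit(source.charAt(current))) {
            current++;
        }
        if (start == current) throw new ParseError("Expected a number", current);
        return Double.parseDouble(source.substring(start, current));
    }

    // Consumes the expected character (after skipping spaces) if it is next.
    private boolean match(char expected) {
        skipSpaces();
        if (current < source.length() && source.charAt(current) == expected) {
            current++;
            return true;
        }
        return false;
    }

    private void skipSpaces() {
        while (current < source.length() && source.charAt(current) == ' ') current++;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("1 + 2 * 3")); // prints "7.0"
        try {
            evaluate("(1 + 2");
        } catch (ParseError error) {
            System.out.println(error.getMessage()); // prints "Expected ')' at position 6"
        }
    }
}
```

Operator precedence falls out of the call structure: `expression` calls `term`, which calls `unary`, so multiplication binds tighter than addition without any precedence table.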

The rest of the book focuses on the bytecode interpreter, which requires construction of a compiler to generate the bytecode and a virtual machine to execute the latter. Having already explained lexing, parsing, and ASTs, Nystrom is free to get into the nitty-gritty of bytecode design and generation—and then execution. As before, he does not avoid the difficult problems, so he spends a lot of time working on garbage collection, among other topics. Also, because the raison d’être for the bytecode version is performance, Nystrom revisits topics and techniques from the first interpreter and redoes them to provide the speed he’s looking to achieve. By this means, he fills out the material he presented earlier.
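The core of a bytecode virtual machine is smaller than it sounds: a flat array of instructions, a table of constants, a value stack, and a loop that decodes one instruction at a time. Here is a minimal sketch of that loop. It is written in Java only to stay consistent with the other examples in this review (clox itself is in C), and the opcode names, their numbering, and the fixed stack size are assumptions of mine, not the book's design.

```java
// Hypothetical stack-based bytecode VM; opcodes and layout are illustrative only.
class BytecodeVmSketch {
    static final byte OP_CONSTANT = 0; // operand: index into the constant table
    static final byte OP_ADD = 1;
    static final byte OP_MULTIPLY = 2;
    static final byte OP_NEGATE = 3;
    static final byte OP_RETURN = 4;

    static double run(byte[] code, double[] constants) {
        double[] stack = new double[256]; // assumed fixed size, no overflow check
        int sp = 0; // index of the next free stack slot
        int ip = 0; // index of the next instruction to decode

        while (true) {
            byte instruction = code[ip++];
            switch (instruction) {
                case OP_CONSTANT:
                    // The byte after the opcode selects the constant to push.
                    stack[sp++] = constants[code[ip++]];
                    break;
                case OP_ADD: {
                    double b = stack[--sp];
                    double a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case OP_MULTIPLY: {
                    double b = stack[--sp];
                    double a = stack[--sp];
                    stack[sp++] = a * b;
                    break;
                }
                case OP_NEGATE:
                    stack[sp - 1] = -stack[sp - 1];
                    break;
                case OP_RETURN:
                    return stack[--sp];
                default:
                    throw new IllegalStateException("Unknown opcode " + instruction);
            }
        }
    }

    public static void main(String[] args) {
        // Bytecode for: -((1 + 2) * 3)
        double[] constants = {1.0, 2.0, 3.0};
        byte[] code = {
            OP_CONSTANT, 0,
            OP_CONSTANT, 1,
            OP_ADD,
            OP_CONSTANT, 2,
            OP_MULTIPLY,
            OP_NEGATE,
            OP_RETURN
        };
        System.out.println(run(code, constants)); // prints "-9.0"
    }
}
```

Compared with walking a tree of heap-allocated nodes, the instructions here sit contiguously in memory and are dispatched by a single `switch`, which is a large part of why this style of interpreter tends to run faster.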

In sum, by the time you get to the end of the C-based bytecode interpreter (named clox), you have acquired a deep, working understanding of interpreters.

The journey is long (the book weighs in at more than 600 pages) but greatly facilitated by Nystrom’s approachable and informal style. He also knows his audience well: In multiple places he says, “Let’s skip the design part for the moment and just jump into the code.” Then, as the chapter continues, it becomes clear that the original naïve implementation is not suited to the task, and he goes back and reworks the original code, this time with the design and theory fully present. It’s an engaging way to present material, and it certainly depicts the bad habits many developers indulge in on private projects.

Another aspect of note is the frequency of hand-drawn illustrations. Almost every topic uses engaging illustrations to depict the concepts Nystrom is trying to convey.

As you can surmise from my review so far, this is an excellent book. In fact, I would rank it as one of the two or three best programming books of the last 10 years. It’s a labor of love on Nystrom’s part and every corner of it radiates the care and thoughtfulness he put into it—and the deep concern that the reader be kept engaged throughout.

Before you rush out and get it, though, there are a few caveats. This is a book designed to be read sequentially; it’s not one in which you can just open to a topic of interest and start reading. Each topic builds on previous explanations and code. Because of this, you can’t really skim over much of the book (where you can, Nystrom tells you); if you do, you’ll come to places where you won’t understand the choices made. In other words, this is a book that strictly adheres to its mission of guiding you through the process of crafting interpreters, rather than simply reading about them.

Finally, you need to have a few years of programming experience and ideally some exposure to C, unless you intend to implement only jlox. If you’re unsure of your qualifications, the book is available free to read online at craftinginterpreters.com. (The PDF version and the printed volume require payment.)

My only gripe is the code. Nystrom was faced with a common problem: how to keep the code and explanations in sync as he added and deleted lines of code. Nystrom solved it by inserting numerous single-line comments before and after each snippet, so he could automate pulling them from the codebase into the book. The problem with this approach is that the code is then riddled with these directives, so it’s frustrating to read. This is a minor problem, however, because all the code is ultimately presented in the book and it’s likely that the codebase will be used principally for compilation and execution rather than for study.

It is rare to see a book so expertly written, typeset, and illustrated that so thoroughly explores a single topic. The explanations are perfectly tuned to the reader, they don’t avoid difficult topics, and they are pellucidly clear. If language internals, interpreters, and virtual machines interest you, you will find no better book than this one. It’s my vote for book of the year.


Andrew Binstock

Andrew Binstock (@platypusguy) is the lead developer on the Jacobin JVM project—a JVM written entirely in Go. He was formerly the editor in chief of Java Magazine, and before that he was the editor of Dr. Dobb’s Journal. Earlier, he cofounded the company behind the open source iText PDF library. He lives in Northern California with his wife, and when he’s not coding, he studies piano.

