The parent of JUnit and creator of TDD discusses programming and testing—and how his views on testing have evolved.
November 1, 2016
Kent Beck developed extreme programming (XP), from which many of today's Agile programming practices emerged. He popularized unit testing and TDD, and he was the original co-author of JUnit. He was one of the original signatories of the Agile Manifesto. Beck is interviewed here by Java Magazine editor Andrew Binstock.
Binstock: I understand you work at Facebook these days. What is it that you do there?
Beck: I am focused on engineer education. My official title is technical coach, and that means what I do most days is pair program and talk with engineers.
Binstock: Are these typically seasoned engineers or those just recently entering the field?
Beck: All sorts. What I find is that if I coach a bunch of engineers at a given level, I’ll start spotting patterns among whatever bottlenecks they’re hitting, and frankly, I get bored telling the same stories and addressing the same issues. So I’ll write a course that addresses those issues. We have an organization that’s very, very good at cranking lots of engineers through the course. So we have courses for new college graduates; we have a course for people making the transition to technical leadership; we have a course for technical leaders hired in from the outside, because Facebook culture is very, very different, and if you are used to leading by giving commands that other people obey, that’s not going to work.
Binstock: When you’re working in a place like Facebook, you’re probably seeing a different kind of scaling dimension than most developers encounter. So what changes there? If I were to ask how your view of programming was informed by the concerns of scaling, what would you say is different?
Beck: It’s a great question because it’s really hard to boil down, so I can give you some specifics. Logging is far more important. Performance, in some cases, is far more important. A tiny little performance regression can bring the entire site down. Because we’re trying to operate very efficiently in terms of capital and also in terms of CPUs and bandwidth and everything, there’s very little headroom sometimes. So for certain teams, on performance there’s a lot to lose and only a little bit to gain, and that’s, I think, unusual. Logging is all about being able to debug after something horrible goes wrong. In classic extreme programming style, you aren’t going to need it (YAGNI), so you don’t write it. Well, here you are going to need it, so you do write it. Even if you don’t end up ever needing it, you still need it.
Binstock: I see.
Beck: You need the option of being able to post-mortem a service, and that option’s worth a lot in a way that I just had never seen before.
Binstock: How about when you commit code to the main trunk? I would imagine that the amount of testing that’s applied to that code before that ever gets checked into the main build is probably significantly greater than at typical business sites. Is that true, too?
Beck: That is not true as a blanket statement. There’s a principle, which I learned from an economics professor, called reversibility. Say you have a complicated system that reacts unpredictably to stimuli. Henry Ford built these unprecedentedly large factories, complicated systems in which a tiny little change could have huge effects. So his response to that was to reduce the number of states the factory could be in by, for example, making all cars black. All cars weren’t black because Henry Ford was a controlling tightwad. It was simply so that the paint shop either had paint or it didn’t.
That made the whole thing easier to manage. Well, Facebook can’t reduce the number of states Facebook is in. We want to keep adding more and more states. That’s how we connect the world. So instead of reducing the number of states, we make decisions reversible. In Henry Ford’s factory, once you cut a piece of metal, you can’t uncut it. Well, we do the equivalent of that all the time at Facebook. If you make a decision reversible, then you don’t need to test it with the kind of rigor that you’re talking about. You need to pay attention when it rolls out and turn it off if it causes problems.
Binstock: That’s an interesting alternative approach.
Beck: Well, there’s a bunch of counterexamples. For example, code that handles money does go through extraordinary rigor, or the Linux kernel goes through extraordinary rigor because it’s going to be deployed on hundreds of thousands of machines.
But changes to the website, you get feedback lots of different ways. You get feedback by testing it manually; you get feedback by using it internally. You get feedback by rolling it to a small percentage of the servers and then watching the metrics, and if something goes haywire, then you just turn it off.
Binstock: So nonreversible decisions get the heavy rigor and perhaps extreme testing, and everything else rides much more lightly in the saddle because of the reversibility.
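The reversibility Beck describes might be sketched as a gate that can be dialed up gradually and turned off instantly, so a change is verified by watching it in production rather than by exhaustive testing up front. The names and mechanics here are illustrative assumptions, not Facebook's actual gating system:

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical reversible-rollout lever: a feature can be ramped up
 *  to a percentage of traffic and switched off the moment metrics go haywire. */
class FeatureGate {
    private volatile int rolloutPercent;          // 0 = off, 100 = fully on

    FeatureGate(int rolloutPercent) { this.rolloutPercent = rolloutPercent; }

    boolean isEnabled() {
        // Enable for roughly rolloutPercent of calls.
        return ThreadLocalRandom.current().nextInt(100) < rolloutPercent;
    }

    void setRolloutPercent(int percent) {         // the reversal lever
        this.rolloutPercent = percent;
    }
}

class Checkout {
    // The new code path ships dark and is chosen at runtime, not at release time.
    static String render(FeatureGate newFlow) {
        return newFlow.isEnabled() ? renderNewCheckout() : renderOldCheckout();
    }
    static String renderNewCheckout() { return "new"; }
    static String renderOldCheckout() { return "old"; }
}
```

Because the decision is reversible at runtime, the rigor shifts from pre-commit testing to post-rollout observation.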
Development of JUnit
Binstock: Let’s discuss the origins of JUnit. This has been documented a lot in various videos that you’ve made. So rather than go through it again, let me ask a few questions. How was the work initially divided between you and Erich Gamma?
Beck: We pair-programmed everything.
Binstock: So you guys were both involved throughout the entire project?
Beck: We literally did not touch the code unless we were both sitting together—for several years.
Binstock: Were you using a form of TDD at the time?
Beck: Yes, strictly. We never added a feature without a broken test case.
Binstock: OK. So how did you run the tests prior to JUnit being able to run tests?
Beck: By bootstrapping. It looked ugly at first. You might be working from the command line, and then very quickly, you get enough functionality that it becomes convenient to run the tests. Then every once in a while, you break things in a way that gives you a false positive result, and you say, “All the tests are passing, but we’re not running any tests because of whatever change we just made.” Then you have to go back to bootstrapping. People should try that exercise. It is an extremely informative exercise to bootstrap a testing framework test-first using itself.
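The bootstrapping exercise might look something like this toy runner, which, once it can run one test, is pointed at itself. This is an illustrative sketch, not JUnit's actual early code:

```java
import java.util.List;

/** Toy test runner, bootstrapped the way Beck describes: once it can run
 *  a test at all, it is used to test itself. */
class TinyRunner {
    interface Test { void run() throws Exception; }

    int passed = 0, failed = 0;

    void runAll(List<Test> tests) {
        for (Test t : tests) {
            try { t.run(); passed++; }
            catch (Throwable e) { failed++; }
        }
    }

    static void assertTrue(boolean condition) {
        if (!condition) throw new AssertionError("assertion failed");
    }

    /** Bootstrap step: one runner runs tests that exercise another runner. */
    static TinyRunner selfTest() {
        TinyRunner meta = new TinyRunner();
        meta.runAll(List.<Test>of(
            () -> {                       // does the runner count a pass?
                TinyRunner r = new TinyRunner();
                r.runAll(List.<Test>of(() -> {}));
                assertTrue(r.passed == 1 && r.failed == 0);
            },
            () -> {                       // does it count a failure?
                TinyRunner r = new TinyRunner();
                r.runAll(List.<Test>of(() -> { throw new AssertionError(); }));
                assertTrue(r.failed == 1 && r.passed == 0);
            }
        ));
        return meta;
    }
}
```

Checking that the total number of tests run matches expectations is what catches the false positive Beck mentions: everything green because nothing ran.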
Binstock: The only thing you had before that was SUnit [a JUnit precursor written by Beck for Smalltalk]. That didn’t seem like it was going to be very helpful in writing JUnit except on a conceptual level.
Beck: No, no, we started over from scratch.
Binstock: What role did you have in JUnit 5? As I understand it, you were not significantly involved in this release.
Beck: Yes, I think no involvement whatsoever is probably the closest. Actually, I think at one point, they were talking about how to make two different kinds of changes in one release, and I said, by making them two different releases. So one piece of parental advice, and that was it.
Binstock: Do you still work on strictly a test-first basis?
Beck: No. Sometimes, yes.
Binstock: OK. Tell me how your thoughts have evolved on that. When I look at your book Extreme Programming Explained, there seems to be very little wiggle room in terms of that. Has your view changed?
Beck: Sure. So there’s a variable that I didn’t know existed at that time, which is really important for the trade-off about when automated testing is valuable: the half-life of the line of code. If you’re in exploration mode and you’re just trying to figure out what a program might do, and most of your experiments are going to be failures and be deleted in a matter of hours or perhaps days, then most of the benefits of TDD don’t kick in, and it slows down the experimentation by adding latency between “I wonder” and “I see.” You want that time to be as short as possible. If tests help you make that time shorter, fine, but often, they make the latency longer, and if the latency matters and the half-life of the line of code is short, then you shouldn’t write tests.
Binstock: Indeed, when exploring, if I run into errors, I may backtrack and write some tests just to get the code going where I think it’s supposed to go.
Beck: I learned there are lots of forms of feedback. Tests are just one form of feedback, and there are some really good things about them, but depending on the situation you’re in, there can also be some very substantial costs. Then you have to decide, is this one of these cases where the trade-off tips one way or the other? People want the rule, the one absolute rule, but that’s just sloppy thinking as far as I’m concerned.
Binstock: Yes, I think perhaps one of the great benefits of more than two decades of programming experience is the great distrust in one overarching rule that’s unflinchingly and unbendingly applied.
Beck: Yes. The only rule is think. IBM had that right.
Binstock: I recall you saying that certain things, like getters and setters, really don’t have to be written test-first.
Beck: I think that’s more specific than what I said. It’s always been my policy that nothing has to be tested. Lots of people write lots of code without any tests, and they make a bunch of money, and they serve the world. So clearly, nothing has to be tested.
There are lots of forms of feedback, and one of the factors that goes into the trade-off equation for tests is: What’s the likelihood of a mistake? So if you have a getter, and it’s just a getter, it never changes. If you can mess that up, we have to have a different conversation. Tests are not going to fix that problem.
Binstock: When you originally formulated the rules for TDD, one of the cornerstones was that each iteration should have the smallest possible increment of functionality. Where did that view come from? What was important about the smallest possible increment?
Beck: If you have a big, beautiful Italian salami, and you want to know how long it’s going to take to eat the whole thing, an effective strategy is to cut off a slice and eat it and then do the arithmetic. So, 1 millimeter takes me 10 seconds, then 300 millimeters are going to take me 3,000 seconds—now maybe more, maybe less. There may be positive feedback loops, negative feedback loops, other things to change that amount of time, but at least you have some experience with it.
The difference between a journeyman programmer and a master, from my perspective, is that the master never tries to eat the whole salami at once. Number one, they take some big thing and cut it into slices. That’s the first skill—figuring out where you can slice it.
The second skill is being creative about the order in which you consume the slices, because you might think you have to go left to right, but you don’t. Somebody says, “Well, you have to write the input code before you can write the output code.” I say, “I respectfully disagree. I can build a data structure in memory and write the output code from that.” So I can do input and then output, or output and then input.
If I have n slices, I have n-factorial permutations of those slices, some of which don’t make any sense but many of which do. So the two skills of the master programmer are slicing thinner slices and considering more permutations of the slices as the order of implementation. Neither of those skills ever reaches any kind of asymptote. You can always make thinner slices, and you can always think of more orders in which to implement things to serve different purposes.
If you tell me I have a demo on Friday, I implement things in a different order than if you tell me I have to run a load test on Friday, same project. I’m going to slice it differently, and I’m going to implement the slices in a very different order depending on what my next goal is.
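The output-before-input ordering Beck mentions might look like this: a hand-built in-memory structure stands in for the not-yet-written input code, which is enough to finish and test the output slice first. The `Report` type and the CSV format are illustrative assumptions:

```java
import java.util.List;

/** A row of the report we want to emit. */
class Report {
    final String name;
    final int count;
    Report(String name, int count) { this.name = name; this.count = count; }
}

/** The output slice, written first, against data built by hand. */
class CsvWriter {
    static String write(List<Report> rows) {
        StringBuilder sb = new StringBuilder("name,count\n");
        for (Report r : rows) {
            sb.append(r.name).append(',').append(r.count).append('\n');
        }
        return sb.toString();
    }
}

// The input slice (parsing some source into List<Report>) can come later;
// until then, hand-built lists like List.of(new Report("widgets", 3))
// are enough to drive the output code to completion.
```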
Binstock: So the smallest possible increment is a general rule to apply when there are other factors that don’t suggest thicker slices?
Beck: I don’t believe in the smallest possible slice. I believe in figuring out how to make the slices smaller. As small as I think I’ve gotten the slices, I always find some place, some way to make them half that size, and then I kick myself: “Why didn’t I think of this before? That’s not one test case; that’s three test cases.” Then progress goes much more smoothly.
Binstock: Well, if it hews very closely to the single responsibility principle, it seems to me that you could have the same dynamic there, where methods do just very, very small operations, and then you have to string together thousands of little tiny BB-sized methods, and figure out how to put them together.
Beck: That would be bad because if you had to open 40 or 50 classes, I would argue that violated cohesion at some point, and that’s not the right way to factor it out.
Binstock: I think we’re heading the same way—that there is a limit at which cutting the slice even thinner doesn’t add value and starts to erode other things.
Beck: Well, today, and then I sleep on it, and I wake up in the morning and go, “Why didn’t I think of that? If I slice it sideways instead of up and down, clearly it works better.” Something about walking out the door on Friday afternoon and getting in my car to go home—that was the trigger for me.
The Coding Process
Binstock: Some years ago, I heard you recommend that when developers are coding, they should keep a pad of paper and a pen beside them and write down every decision that they make about the code as they’re doing it.
You suggested that we would all be startled by the number of entries we would make, and that is exactly what happened with me. The more I looked at those lists that I put together, the more I realized that when interruptions occur, the ability to reconstruct the entire world view that I had before the interruption occurred depends a lot on being able to remember all of these microdecisions that have been made. The longer the gap, the more difficult it is even to consult the list to get those microdecisions back.
I’m wondering, have your thoughts on recording those microdecisions evolved in any way in which you can make that list useful rather than just having it be an exercise in coding awareness?
Beck: No, no, it hasn’t. One of the things—and I’ve written about this—is that I’m having memory problems. So I have trouble holding big complicated, or even small, programs in my head. I can be a pair-programming partner just fine because I can rely on my partner’s memory, but me sitting down and trying to write a big complicated program is just not something I can do anymore.
I can still program, though, on the UNIX command line because I can see the whole thing. So as long as it’s a one-liner and I can build it, like, one command at a time, then I can accomplish programming tasks, but it’s not maintainable code. It’s all one-off code. I do a lot of data mining. So if you said, build a service to do X—different people age in different ways at different rates, and so on—that’s just not something that I can independently take off and do anymore, which is frustrating as hell.
Binstock: As I think about what you’re talking about and I think about my own efforts to record those microdecisions, there’s a certain part of me that has a new appreciation for things like Knuth’s literate programming where you can actually, in the comments, capture what it is you’re doing, what you’re trying to do, and what decisions you’ve made about it. Actually, I worked that way for a while after hearing your discussion of this particular discipline. In some ways, it was helpful. In other ways, it created a lot of clutter that ultimately I had to go back and pull out of the comments. So the only reason I brought it up was just to see if you had gone any further with that.
Beck: What I find with literate programs is they just don’t maintain very well because there’s so much coupling between the prose and diagrams and the code. I’ve never put it in those terms before, but that’s exactly right. If I make a change to the code, not only do I have to change the code and maybe the test, but I also have to change these four paragraphs and those two diagrams, and I have to regather this data and render it into a graph again. So it’s not efficiently maintainable. If you had code that was very stable and you wanted to explain it, then that wouldn’t come into play, and it would make sense again.
Binstock: I had a conversation with Ward Cunningham in which he talked about pair programming with you many years ago and how surprised he was by how frequently you guys would come to a decision point and the tool by which you moved forward was by asking, “What is the simplest possible thing we can do that will get us past this particular point?” If you always work from the simplest possible thing, do you not, at some point, have to go back and refactor things so that you have code that you can be proud of rather than code that short-circuits the problem? How do you balance those two things?
Beck: Sure. So I don’t get paid to be proud. Like in JUnit, we wrote code that we’d be proud of or we didn’t write the code. We could make that trade-off because we had no deadlines and no paying customers.
But on a regular basis, if I’m not proud of the code but my employer is happy with the results, yes. They call it work, and I get paid to do it. So there are other reasons to clean up.
The answer is sure, you’re going to make locally optimized decisions because you just don’t know stuff, and then you do know stuff and once you learn, then you’re going to realize the design should’ve been like this and this and this instead. Then you have to decide when, how, and whether to retrofit that insight into your current code. Sometimes you do and sometimes you don’t.
But I don’t know what the alternative is. People say, “Well, aren’t you going to have to go refactor?” Well, sure. So what’s the alternative?
I remember I gave a workshop in Denmark, and I gave a day-long impassioned speech about the beauties of iteration. At the end of the day, this guy had been sitting in the front row the entire day looking at me with an increasingly troubled expression—worse, and worse, and worse. He finally raised his hand just before the time was up, and he said, “Wouldn’t it be easier to do it right the first time?” I wanted to hug him. I said, “With all the compassion I have in me, yes, it would. I don’t have any response other than that.”
Binstock: Lovely question!
Beck: I sat next to Niklaus Wirth on an airplane once. I talked to the agent, told him we were colleagues, and asked whether he would please move me, so I’m like a stalker—I fanboy’ed him. I don’t mind. If you get a chance to sit next to Niklaus Wirth, you’re going to do it. So we got to talking, and I told him about TDD and incremental design, and his response was, “I suppose that’s all very well if you don’t know how to design software.”
Binstock: That sounds like the type of thing Wirth is known to say.
Beck: You have to say, “Well, yes, I don’t know how to—congratulations! You do know. I don’t. So what am I supposed to do? I can’t pretend I’m you.”
Binstock: Let’s discuss microservices. It seems to me that test-first on microservices would become complicated in the sense that some services, in order to function, will need the presence of a whole bunch of other services. Do you agree?
Beck: It seems like the same set of trade-offs about having one big class or lots of little classes.
Binstock: Right, except I guess, here you have to use an awful lot of mocks in order to be able to set up a system by which you can test a given service.
Beck: I disagree. If it is in an imperative style, you do have to use a lot of mocks. In a functional style where external dependencies are collected together high up in the call chain, then I don’t think that’s necessary. I think you can get a lot of coverage out of unit tests.
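The style Beck describes might be sketched like this, with hypothetical names: external calls are gathered into a thin imperative shell near the top of the call chain, leaving a pure core that unit tests can exercise without any mocks:

```java
import java.util.List;

/** A purchase amount; illustrative only. */
record Order(String id, double amount) {}

/** Pure core: no service calls, no clock, no database. Unit-testable as-is. */
class Pricing {
    static double totalWithDiscount(List<Order> orders, double discountRate) {
        double total = orders.stream().mapToDouble(Order::amount).sum();
        return total * (1.0 - discountRate);
    }
}

/** Imperative shell: the only place that touches other services.
 *  In a real system, fetchOrders() and fetchDiscount() would call out
 *  over the network; only this thin layer would need integration tests. */
class CheckoutService {
    double total(String userId) {
        List<Order> orders = fetchOrders(userId);   // external dependency
        double discount = fetchDiscount(userId);    // external dependency
        return Pricing.totalWithDiscount(orders, discount);
    }
    List<Order> fetchOrders(String userId) { throw new UnsupportedOperationException(); }
    double fetchDiscount(String userId) { throw new UnsupportedOperationException(); }
}
```

Because the core takes plain data rather than reaching for its dependencies, most of the coverage comes from ordinary unit tests of `Pricing`, and no mock objects are involved.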
Binstock: Today, the UI is so much more important than at any previous time. How did you unit-test UIs in the past and today? Were you using things like FitNesse and other frameworks, or were you just eyeballing the results of the tests?
Beck: I never had a satisfactory answer. Let me put it that way. I tried a bunch of stuff. I built integration testing frameworks, I used other people’s tools, I tried different ways of summarizing what a UI looked like in some test-stable way, and nothing worked.
Binstock: Today, you’re pretty much in the same position, aren’t you?
Beck: Yes, I haven’t seen anything that fundamentally changes. It’s all about false positives and false negatives. What’s the rate at which your tests say everything’s OK, and everything’s broken? That damages your trust in your tests. How often does the testing framework say something’s wrong, and everything’s fine? Very often, one pixel changes color very slightly, and then the tests all break, and you have to go through one at a time and go, “Oh, yeah, no, this is fine.” And you’re not going to do that very many times before you just blow off the tests.
Binstock: The cost is the lost time and the lost trust.
Binstock: What does your preferred programming environment look like today, whether that’s home or work?
Beck: The things I do that look like programming, I do entirely on either the UNIX command line or in Excel.
Binstock: In Excel?
Beck: Yes, because I can see everything.
Binstock: How do you mean?
Beck: So, like the transformations, I do the data transformations, like numbers to numbers on the UNIX command line, and then I render them into pictures using Excel.
Binstock: When you’re programming things that are not related to data mining, you mentioned earlier that you still use Smalltalk for exploratory things.
Beck: Yes, the advantage of Smalltalk for me is I memorized the API long enough ago that I still have access to all those details.
Binstock: Do you typically work with multiple screens?
Beck: Yes, the more pixels, the better. There’s a great Terry Pratchett quote: “People ask me why I have six screens attached to my Mac, and I tell them it’s because I can’t attach eight screens.” Oculus or some kind of virtual reality is just going to blow that out of the water, but nobody knows how.
Binstock: We’ll have to go through a number of iterations of things like that before virtual reality actually finds a role that’ll help with the coding.
Beck: Yes, I’m a big believer in getting rid of textual source code and operating directly on the abstract syntax trees. I did an experimental code editor called Prune with my friend Thiago Hirai. It looked like a text editor and it rendered as a text editor would render, but you could only do operations on the abstract syntax trees, and it was much more efficient, much less error-prone. It required far less cognitive effort. That convinced me that’s the wave of the future, and I don’t know if it’s going to be in 5 years or 25 years, but we’re all going to be operating on syntax trees sometime soon.
Binstock: Yes, of all the things that have changed and moved forward, the requirement that we still code at an ink-and-paper level hasn’t really moved forward very much.
Beck: No, we’re coding on punch cards. It’s rendered one on top of the other, but it’s the same darn stuff.
Binstock: The initial place of programmer activity hasn’t evolved very much at all. Despite having wonderful IDEs and things of that sort, the physical act is still very much the same. One last thing: I know you’re a musician. Do you listen to music when you code, and what kind of music do you find that you most enjoy coding to?
Beck: I use it to kind of regulate my energy level, so if I’m a little activated, then I’ll listen to something soothing. And my go-to for that is Thomas Tallis’ The Lamentations of Jeremiah, which is a very flowing vocal quartet kind of medieval music. If I’m a little low and I need picking up, then I listen to go-go music, which is an offshoot of funk native to Washington, DC.
Binstock: OK. I’ve never heard of that.
Beck: That’s my upping music.
Binstock: Wonderful! Thank you!