Shelling Tests

v. tr. To remove from a shell.
v. intr. To shed or become free of a shell.

jtreg provides support for various types of tests. API tests can use "@run main", simple compiler tests can use "@compile", and simple GUI or applet tests can use "@run applet". And then there's the ubiquitous "misc" or "other" category, for which "@run shell" is provided. However, there are a number of reasons not to use "shell tests", and recently we took another step toward reducing the use of shell tests in the langtools repository.
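For readers unfamiliar with jtreg, a test is typically a Java source file whose leading comment carries the test description tags. A minimal hypothetical example of an API test using "@run main" (the class name and summary are invented for illustration):

```java
/*
 * @test
 * @summary Hypothetical minimal API test, executed with "@run main"
 * @run main HelloTest
 */
public class HelloTest {
    public static void main(String... args) {
        // A jtreg "main" test passes if main returns normally,
        // and fails if it throws an exception.
        String s = "shell".replace("sh", "b");
        if (!s.equals("bell"))
            throw new Error("unexpected result: " + s);
    }
}
```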

[Graph: decline in the number of shell tests in the langtools repository]

Why shell tests?

jtreg provides a basic mechanism for executing a simple series of steps, to enable a class to be compiled and run. It is not, and never was, intended to be a full-featured scripting language. For anything more complex, support was provided for executing a shell script, as a way of executing an arbitrary sequence of commands. In the early days of JDK, this was a reasonable and expedient choice, and it served well for a number of years.

Why not shell tests?

Perhaps the most significant issue is that of cross-platform portability and, related to that, the question of which shell environment to use. Initially, on Windows, the requirement was to use MKS, but more recently Cygwin has become the standard. Both of these have subtle and not-so-subtle differences from shell environments on Unix-like platforms, which led to subtle rules about which commands to use and which to avoid, and to large cut-n-paste blocks of text at the head of a script to set up platform-specific variables such as FS (for the file separator) and PS (for the path separator). These blocks of text had to be updated when the Mac became a standard supported platform, and they are a potential source of issues when porting OpenJDK to other platforms.
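By contrast, a Java test needs no FS/PS boilerplate at all: the platform-specific separators are available from the standard library, and java.nio.file composes paths correctly on every platform. A small sketch:

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Separators {
    public static void main(String... args) {
        // The values a shell test had to compute per platform as FS and PS:
        System.out.println("FS = " + File.separator);      // "/" or "\"
        System.out.println("PS = " + File.pathSeparator);  // ":" or ";"

        // java.nio.file builds paths without hand-assembled separators:
        Path p = Paths.get("classes", "com", "example", "Hello.class");
        System.out.println(p);  // rendered with the right separator automatically
    }
}
```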

A corollary is that it is difficult to write and test a reliable cross-platform test as a shell script. In the early days of Java, it was common for JDK developers to have a collection of machines under their desk, so that tests could be run on all supported platforms. That was both expensive and inefficient. Even today, with virtualization, it is still problematic to run tests on all supported platforms, and so it is better to write each test so that it is more likely to work across all supported platforms.

Another issue is that it can be hard to write a shell script that behaves well in the face of potential failure of the commands it executes. It is easy to write a shell script that assumes everything works as expected; it is much harder to (remember to) program defensively against the possibility that some commands might fail. And when a command fails, you need to be able to verify that it failed for the right reason.

Even worse than the points in the preceding paragraphs, it is hard to write a good test in shell script, period. The inverse may be more obviously true: it is far too easy to write a bad shell test, one that will not behave as expected. Executing the command "java" will look for the command on your PATH, which is almost certainly not the version you should be using; that is probably ${TESTJAVA}/bin/java. Executing a command like "rm stuff" at the end of the script will likely succeed, and will make the test appear to succeed, even if preceding commands have failed. And on Windows, you need to be very careful to understand the differences in the way that Cygwin and Java handle filenames and paths.
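A Java test can avoid both pitfalls by construction: build an explicit path to the JDK under test, and turn any non-zero exit code into a test failure. A sketch (jtreg exposes the JDK under test to Java tests via the "test.jdk" system property; the fallback to java.home is for standalone runs):

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.InputStreamReader;

public class RunRightJava {
    public static void main(String... args) throws Exception {
        // Use the JDK under test, not whatever "java" is first on PATH.
        String jdk = System.getProperty("test.jdk", System.getProperty("java.home"));
        File java = new File(new File(jdk, "bin"), "java");

        Process p = new ProcessBuilder(java.getPath(), "-version")
                .redirectErrorStream(true)
                .start();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null)
                System.out.println(line);
        }
        int rc = p.waitFor();
        // Unlike a trailing "rm stuff", a failed step cannot be silently masked:
        if (rc != 0)
            throw new Error("java -version failed: exit code " + rc);
    }
}
```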

Finally, there is the issue of performance. The desire for higher quality has meant wanting to run more tests more often, which in turn has led to wanting to run the tests faster [1, 2]. The Java/shell boundary is an impenetrable performance barrier. Even if the shell script invokes Java commands, each one still runs in a fresh, cold JVM, which means you lose the potential benefit of any JVM optimizations. Now, sometimes it is indeed necessary for a test to start a new JVM, for example, to run a program in a JVM with a particular set of JVM options, but that is often not the case, especially in the langtools world, testing programs like javac, javadoc, and so on. Writing a test in Java gives you the choice of when to stay within the same JVM and when to start a new one; writing a test as a shell script, you have no such choice.
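As an illustration of staying within the same JVM, javac can be invoked through the standard javax.tools API rather than by forking a process, a minimal sketch:

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class InProcessJavac {
    public static void main(String... args) {
        // Run javac inside the current JVM: no process fork, no cold start,
        // and repeated runs benefit from JIT optimization of the compiler itself.
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null)
            throw new Error("no system compiler available: running on a JRE?");
        int rc = javac.run(null, null, null, "-version");
        if (rc != 0)
            throw new Error("javac -version failed: exit code " + rc);
    }
}
```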

Removing shell tests

As you can see illustrated in the graph above, there have been three significant reductions in the number of shell tests in the langtools repository.

July, 2008

The first major reduction followed an analysis of the reasons that shell scripts were being used. It turned out that a significant number of shell tests were running javac and then using diff to compare the console output from javac against a so-called golden file containing the expected output. Supposedly, jtreg directly supported such a feature, but in reality it did not work well enough to be useful, and so shell tests were being used to work around that limitation. Once jtreg was fixed, it became possible to replace a whole category of shell tests with tests that were equivalent, or better, using other jtreg action tags.

February, 2012

The second major reduction in the number of shell tests followed the removal of the apt tool early in the development cycle for JDK 8. The tests for the tool were relatively complex, and a legitimate use of shell tests; had the tool not been scheduled for removal, it would have been worth rewriting these tests in Java. As it was, it was easier to wait for the tool to be removed, and to remove the tests along with it.

February, 2013

While the first two big reductions in the number of shell tests were significant, they were both the result of special circumstances: working around bugs in jtreg, and no longer needing the tests at all. The most recent reduction is of more general interest, since it is based on a more general-purpose solution.

In the early days, Java was not a mature enough platform to consider writing the equivalent of shell scripts in Java. In addition, the default jtreg execution mode was "othervm", and it was much faster to run a shell script than it was to start up a new virtual machine just to sequence through the actions of a test that would in turn typically require even more virtual machines to be started.

Times change. The Java platform has matured, and it is now common to use jtreg to run tests in "samevm" or "agentvm" mode, which means that the cost of running a Java class to execute the steps of a test is much less than it was before. New language features (like try-with-resources) and new APIs (like java.lang.ProcessBuilder and java.nio.file.*) mean that it is easier than ever before to use Java to perform a series of steps for which one might previously have considered using a shell script.
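For instance, the file manipulation steps that shell tests did with mkdir, diff, and rm map directly onto java.nio.file calls, which fail loudly instead of quietly succeeding. A sketch:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class FileSteps {
    public static void main(String... args) throws IOException {
        // mkdir work  ->  Files.createDirectories / createTempDirectory
        Path work = Files.createTempDirectory("filesteps");

        // echo ... > out.txt  ->  Files.write
        Path out = work.resolve("out.txt");
        Files.write(out, Arrays.asList("hello", "world"));

        // diff out.txt golden.txt  ->  compare the lines in memory
        List<String> expected = Arrays.asList("hello", "world");
        List<String> actual = Files.readAllLines(out);
        if (!actual.equals(expected))
            throw new Error("golden file mismatch: " + actual);

        // rm out.txt  ->  Files.delete, which throws if the file is missing,
        // rather than masking an earlier failure the way a trailing "rm" can
        Files.delete(out);
        Files.delete(work);
    }
}
```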

Attitudes change too. With hindsight, the initial simplicity of writing a shell test is a short-term benefit compared to the long-term maintenance cost. An equivalent Java program may have a higher initial cost, such as making you consider checked exceptions, but this can provide benefits later on, such as better detection and reporting of faults when something eventually does go wrong, years later.

With all this in mind, a recent addition to the LangTools team, Vicente Romero, took on the task of analysing the remaining shell scripts in the langtools test suite, with the goal of writing a library of useful methods, such that it was relatively easy to translate each shell script into an equivalent Java program. The result is a class we called ToolBox, and it provides equivalents for the commands we were previously using in our shell scripts. Some methods fork processes as needed; other methods can be implemented directly in the library, leveraging standard Java API.
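To give a flavor of the idea (this is a hypothetical sketch, not the actual ToolBox API; the method names are invented), such a library wraps each former shell command in a small Java method, implemented in-process where the standard API allows:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.tools.ToolProvider;

/** Hypothetical sketch of a ToolBox-style helper class. */
public class MiniToolBox {
    /** Create a source file: implemented directly with java.nio.file. */
    static Path writeFile(String name, String body) throws IOException {
        Path p = Paths.get(name);
        Files.write(p, body.getBytes());
        return p;
    }

    /** Run javac in the current JVM; throw (failing the test) on any error. */
    static void javac(String... args) {
        int rc = ToolProvider.getSystemJavaCompiler().run(null, null, null, args);
        if (rc != 0)
            throw new Error("compilation failed: exit code " + rc);
    }
}
```

A test body then reads as a linear sequence of calls, e.g. `writeFile("T.java", "class T {}"); javac("T.java");`, with every failure surfacing as an exception instead of a silently ignored exit code.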

Lessons Learned

The work to develop the ToolBox library has been a success, and as a result we have converted a significant number of the remaining shell tests in the langtools test suite to Java. A few tests did not fit into the pattern of the rest, and will need to be re-examined before we can truly declare a shell-free test zone.

There is something of a hidden cost here, however: after all, nothing comes for free. The cost is that the tests all depend on a common shared library. In the early days, it was regarded as very important that tests should be as standalone as possible, so that they could be run and issues debugged outside the context of the test harness. That is perhaps less important these days, and we see the use of standard test frameworks, like TestNG, and custom test frameworks, like JavadocTester for javadoc tests, and now ToolBox for shell-like tests. The consequence is that we need to be careful (as with any library code) that if we change or evolve the API, we do not break existing clients.

Leveraging ToolBox

Can we use ToolBox in other test suites, such as the main jdk regression suite? Yes, and no. First and foremost, those tests are in a different repository, and sharing across a repository boundary would be problematic. More significantly however, the code in the langtools ToolBox is customized to the needs of the langtools tests. Although the concept of the ToolBox class should be applicable to the jdk regression test suite, it is likely that an analysis of the shell scripts in that test suite would lead to a different set of utility methods in a jdk version of ToolBox. However, I am confident that if someone were to do so, they would see the same advantages in that test suite that we have seen in the langtools test suite.

1. Speeding up tests: a langtools milestone
2. Speeding up tests again: another milestone

Thanks to Joe Darcy and Vicente Romero for reviewing this entry.
