Automated Code-Inclusions in DITA
By Eric Armstrong on Sep 05, 2007
Automated Inclusion of Fully Tested Code Snippets using DITA and a Code Processor
Summary: When I was writing in HTML, I had a system that could easily transform a body of code into a sequence of versions that let the user experiment with the program and see how it works at each stage of the process. As a further benefit, it also produced HTML showing added code in bold and deleted code in a strikethrough font. The big advantage was the ability to change code in early versions in ways that made later versions easier (or possible). I could then regenerate all of the intermediate versions before doing the final writeup. That system saved a lot of work, but I still had to cut and paste the code fragments. Using DITA, I won't even have to do that--I can simply transclude the code fragment right into the document.
Code samples are an important part of the tutorial-writing process. But they also present some significant problems for writers:
- When working on a program, what you learn late in the process can affect the code you wrote in your early examples.
- APIs can change from release to release, which can affect a code snippet that occurs in multiple places in a document.
- Code from a program has to be pasted into the document by hand.
- Changes in code samples need to be highlighted from one snippet to the next (with new code shown in bold, for example, and deleted code shown in a strikethrough font).
- The code needs to be tested to be sure it works, but that can only be done by running the program it came from--which may not agree with the code in the tutorial.
In short, the process is characterized by (a) code changes, (b) an excessively manual process required to portray code in a document, and (c) the need to manually execute the tutorial code to test it--unless you can guarantee that the code snippets in the tutorial are an exact duplicate of the code in the program(s).
Those problems can be solved using two technologies: XML processing instructions and DITA transclusions. This paper describes the two solutions--each of which solves a different aspect of the problem--and shows how they dovetail for an almost-complete, end-to-end solution to the code-inclusion problem. (The only aspect left is "smart condensation", but that will be a fairly easy and fun problem to solve.)
XML Processing Instructions
The foundation of the system is a tool I developed for the JAXP/XML tutorial that generates multiple versions of a program. The essential operation is fairly simple:
- Put the code in an "XML" file that starts and ends with a <pre> tag as the root element.
<?xml version='1.0' encoding='US-ASCII'?>
- As code is modified, insert processing instructions to add or delete code for the next version of the program:
<?version 3 add ?>
<?version 4 del ?>
<?version end del ?>
<?version 4 add ?>
<?version end add ?>
<?version end add ?>
Note that the processing instructions can be nested, and that each instruction applies to one or more lines of code. (For code examples, that implementation works pretty well. When a change is limited to a character or two, it's easy for the reader to miss the change. It's harder to miss when you remove the entire line and replace it with one that has the change.)
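To make the add/del semantics concrete, here is a minimal sketch of the version-generation step. It is not the original tool: it assumes each processing instruction sits on its own line, that an "add" block exists from version N onward, and that a "del" block exists only before version N. The function name, the sample Java lines, and the omission of the XML declaration and <pre> wrapper are all illustrative assumptions.

```python
import re

# Matches the instructions shown above, e.g. <?version 3 add ?> or
# <?version end del ?>. One instruction per line is an assumption of this sketch.
PI = re.compile(r"^\s*<\?version\s+(\d+|end)\s+(add|del)\s*\?>\s*$")


def generate_version(source: str, version: int) -> str:
    """Return the code lines of `source` that belong to `version`."""
    stack = []  # currently open (action, number) blocks, innermost last
    out = []
    for line in source.splitlines():
        m = PI.match(line)
        if m:
            number, action = m.groups()
            if number == "end":
                if not stack or stack[-1][0] != action:
                    raise ValueError(f"unbalanced 'end {action}' instruction")
                stack.pop()
            else:
                stack.append((action, int(number)))
            continue
        # Assumed semantics: added lines exist from version N on,
        # deleted lines exist only before version N. Every enclosing
        # block must agree, which is what makes nesting work.
        visible = all(
            version >= n if action == "add" else version < n
            for action, n in stack
        )
        if visible:
            out.append(line)
    return "\n".join(out)


# Hypothetical annotated source, nested the same way as the example above.
SAMPLE = """\
public class Echo {
<?version 3 add ?>
    static int indent = 0;
<?version 4 del ?>
    static String pad = " ";
<?version end del ?>
<?version 4 add ?>
    static String pad = "    ";
<?version end add ?>
<?version end add ?>
}"""
```

Calling generate_version(SAMPLE, 2) returns only the class declaration and its closing brace; version 3 adds the two fields, and version 4 swaps the one-space pad line for the four-space one.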
The processing program then has three functions:
- Generate version "X" of the program, so it can be run.
- Generate an xHTML file that includes the code in a <pre> tag, so it will display in a browser.
- Generate an xHTML file that highlights the differences between version X and the previous version (version X - 1).
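The third function, the highlighted diff, can be sketched by asking of every line whether it exists in version X - 1 and whether it exists in version X. This is an illustration under assumptions rather than the original processor: it assumes one processing instruction per line, uses <b> for added code and <del> for deleted code (the original markup for strikethrough is not shown here), and all names are hypothetical.

```python
import html
import re

PI = re.compile(r"^\s*<\?version\s+(\d+|end)\s+(add|del)\s*\?>\s*$")


def visible(stack, version):
    """True if a line inside the open blocks in `stack` exists in `version`."""
    return all(
        version >= n if action == "add" else version < n
        for action, n in stack
    )


def highlight_diff(source: str, version: int) -> str:
    """Render `version` inside <pre>, marking changes since version - 1."""
    stack, out = [], []
    for line in source.splitlines():
        m = PI.match(line)
        if m:
            number, action = m.groups()
            if number == "end":
                stack.pop()
            else:
                stack.append((action, int(number)))
            continue
        before = visible(stack, version - 1)
        now = visible(stack, version)
        text = html.escape(line, quote=False)  # escape <, > and & in the code
        if now and before:
            out.append(text)                   # unchanged line
        elif now:
            out.append(f"<b>{text}</b>")       # new in this version
        elif before:
            out.append(f"<del>{text}</del>")   # removed in this version
        # lines in neither version are simply skipped
    return "<pre>\n" + "\n".join(out) + "\n</pre>"


# Hypothetical annotated source used for illustration only.
SAMPLE = """\
public class Echo {
<?version 3 add ?>
    static int indent = 0;
<?version 4 del ?>
    static String pad = " ";
<?version end del ?>
<?version 4 add ?>
    static String pad = "    ";
<?version end add ?>
<?version end add ?>
}"""
```

With this sample, highlight_diff(SAMPLE, 4) wraps the old pad line in <del> and the new one in <b>, while the lines that were already present in version 3 come through unmarked.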
To take the logical next step, two modifications are needed:
- Generate DITA output, rather than xHTML
- Create the instruction to transclude the generated snippet of code, so it can be pasted into place. For example:
Note: The syntax may need a little work. But you get the idea.
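As a rough sketch of those two modifications, the generated code could be written into a small DITA topic as a <codeblock> with an ID, and the processor could print the matching conref line for pasting. The conref form used here (file name, then #, then topic ID, a slash, and the element ID) is standard DITA; everything else, including the function names, the default ID "code", and the file name echo_v3.dita, is an assumption made for illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def dita_topic(topic_id: str, title: str, code: str, block_id: str = "code") -> str:
    """Wrap generated code in a minimal DITA topic (structure is a sketch)."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">\n'
        f"<topic id={quoteattr(topic_id)}>\n"
        f"  <title>{escape(title)}</title>\n"
        "  <body>\n"
        # escape() protects <, > and & so the code survives as XML text
        f"    <codeblock id={quoteattr(block_id)}>{escape(code)}</codeblock>\n"
        "  </body>\n"
        "</topic>\n"
    )


def conref_line(filename: str, topic_id: str, block_id: str = "code") -> str:
    """Build the element to paste into the tutorial document."""
    target = f"{filename}#{topic_id}/{block_id}"
    return f"<codeblock conref={quoteattr(target)}/>"
```

For a topic saved as the hypothetical file echo_v3.dita with topic ID echo_v3, conref_line("echo_v3.dita", "echo_v3") produces <codeblock conref="echo_v3.dita#echo_v3/code"/>. Because the reference names the file and IDs rather than the content, regenerating the topic file is enough to update the document.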
Once the content reference (conref) is copied into the document, all future generated versions of the program will be automatically reflected in the document. So when you're done coding, you'll have a complete outline of the tutorial you want to write, with all code snippets in place. If you then modify the code later, some of the text may need to be rewritten, but all of the code segments will be correct.
With that system:
- The code referenced in the document is guaranteed to be an exact replica of the code in the program.
- No manual operations are needed to cut and paste code or to highlight differences between snippets.
- Unit testing and other testing strategies can be brought to bear to ensure the code stays accurate over time, and works as written.
- Changes to the code are automatically reflected in the document, ensuring that the code stays accurate (although text may need to be revised).
Remaining Problems to Solve
Two problems remain to be solved with this system:
- Code Condensation
In a tutorial, the only code you really need to see is the changes you're making. But you need enough of the surrounding context to find the code you want to change. So you need the nearest method declaration, for example, and a line or two above the changes to establish some context. If there is more than that between the start of the method and the changes, then the excess should be replaced by an ellipsis (...). When pasting the code by hand, you automatically take care of that kind of thing. But when you're automatically transcluding the code snippet, the production engine will need to be smart enough to condense the code the same way you would, if you were doing it by hand. (But writing that sort of semi-AI heuristic is a lot of fun.)
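One possible shape for that heuristic, offered as a sketch rather than a solution: keep each changed line, a configurable number of lines above it, and the nearest line above that looks like a method declaration, then put an ellipsis wherever lines were dropped in between. The test for "looks like a method declaration" is a crude guess aimed at Java-style code, and the function names and parameters are hypothetical.

```python
import re

# Lines that open a block but are control flow, not a method declaration.
CONTROL = re.compile(
    r"^\s*(\}\s*)?(if|else|for|while|do|switch|try|catch|finally|synchronized)\b"
)


def looks_like_method(line: str) -> bool:
    """Crude guess: has a parameter list, opens a block, is not control flow."""
    return "(" in line and line.rstrip().endswith("{") and not CONTROL.match(line)


def condense(lines, changed, context=2):
    """Keep changed lines plus context; mark every other gap with '...'.

    lines   -- list of code lines for one version
    changed -- set of indexes into `lines` that were added or modified
    context -- how many lines above each change to keep
    """
    keep = set()
    for i in changed:
        keep.update(range(max(0, i - context), i + 1))
        # Walk upward to the nearest line that looks like a method declaration.
        for j in range(i, -1, -1):
            if looks_like_method(lines[j]):
                keep.add(j)
                break
    out, prev = [], None
    for i in sorted(keep):
        if prev is not None and i > prev + 1:
            out.append("...")  # something was dropped between prev and i
        out.append(lines[i])
        prev = i
    return out
```

Given a seven-line method whose fifth line changed, condense(lines, {4}, context=1) would keep the declaration, an ellipsis, the line above the change, and the change itself. A real implementation would need to understand the language's syntax better than this pattern match does.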
- Multiple Snippets
When you've made many changes, you generally want to show only a few of them at a time, and explain what's going on before moving on to the next set. To do that, the processing engine needs to be told how to identify the snippets, perhaps by adding an ID or sequence number to the processing instruction. That value could then be inserted as an ID in the DITA output, with separate conref instructions generated for each instance. You would then paste a set of conrefs into the document, rather than just one.
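A sketch of how that might work, with the caveat that the id=... syntax inside the processing instruction is invented here purely for illustration: the processor collects the lines added in a given version under each ID, and emits one conref per ID. The sample source, file name, and function names are likewise hypothetical.

```python
import re

# Hypothetical extension: an optional id=name after the add/del keyword.
PI = re.compile(
    r"^\s*<\?version\s+(\d+|end)\s+(add|del)(?:\s+id=(\w+))?\s*\?>\s*$"
)


def snippets_for(source: str, version: int) -> dict:
    """Map snippet ID -> lines added in `version` under that ID."""
    stack, found = [], {}
    for line in source.splitlines():
        m = PI.match(line)
        if m:
            number, action, snippet_id = m.groups()
            if number == "end":
                stack.pop()
            else:
                stack.append((action, int(number), snippet_id))
            continue
        # Skip lines that do not exist in this version at all.
        if not all(version >= n if a == "add" else version < n
                   for a, n, _ in stack):
            continue
        # Credit the line to the nearest enclosing add block for this version.
        for action, n, snippet_id in reversed(stack):
            if action == "add" and n == version and snippet_id:
                found.setdefault(snippet_id, []).append(line)
                break
    return found


def conrefs(found: dict, filename: str, topic_id: str) -> list:
    """One conref per snippet, in the order the snippets appear in the source."""
    return [f'<codeblock conref="{filename}#{topic_id}/{sid}"/>' for sid in found]


SAMPLE = """\
class Echo {
<?version 2 add id=fields ?>
    int indent = 0;
<?version end add ?>
    void start() {
<?version 2 add id=start ?>
        indent++;
<?version end add ?>
    }
}"""
```

Here snippets_for(SAMPLE, 2) yields two snippets, "fields" and "start", and conrefs(...) turns them into two lines to paste into the tutorial, one before each explanation.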