Modular Docs Part 2: DITA vs. DocBook
By Eric Armstrong on Oct 06, 2008
This is the second in a two-part series. Part 1 describes the motivations for modular documentation. Part 2 zeros in on the reasons for choosing DITA.
When IBM decided to focus on topic-oriented documentation, it created the Darwin Information Typing Architecture (DITA), even though there was already a huge investment in DocBook. Moving to a new architecture was a decidedly non-trivial undertaking--both technically and politically--so it is worth asking why the move was made.
Perhaps, one day, we'll be treated to an insider's history of the decision-making process. In the meantime, here are the factors that (I imagine) played a prominent role in the decision:
- Simplicity
- Extensibility
- Editable Components
- Validatable References
DocBook had 800 elements. The typical installation had to remove 600 of them to get down to something practical. DITA, in contrast, has 120 elements, making it much easier to use "out of the box".
Simplicity is a major driver for adoption, and adoption is the key to community growth. To succeed, a standard needs a large and vibrant community, so DITA's relative simplicity was key to creating a community that would have vendors and open source projects competing with each other to provide "best of breed" solutions.
A myriad of special cases had combined to create the monolithic, 800-element standard that was DocBook. Reducing the number of elements to the bare essentials covered 80% of the use cases with a fraction of the elements, but that still left the other 20% that needed to be addressed. DITA's designers chose to enable solutions for that set (rather than building them in), by designing DITA to be extensible.
You extend DITA by specializing existing formats, giving things new names in that process, but retaining references to the original names. Production systems and editors can then default to the behaviors associated with the original types, unless special instructions are provided for customized processing.
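The fallback behavior described above can be sketched in a few lines. This is only an illustration, assuming that each element carries a DITA-style class value listing its ancestry from base type to specialization (as in "- topic/topic concept/concept "). The function names and the handler tables are hypothetical, not part of any DITA toolkit.

```python
# Sketch of specialization fallback. The class values follow DITA's
# "- module/element module/element " pattern; the handler tables are hypothetical.

def class_ancestry(class_attr):
    """Return the element types named in a class value, most specific first."""
    tokens = class_attr.split()
    # tokens[0] is the "-" or "+" prefix; the rest run from base type to specialization.
    return list(reversed(tokens[1:]))


def find_handler(class_attr, handlers):
    """Pick the most specific type that the processor knows how to handle."""
    for type_name in class_ancestry(class_attr):
        if type_name in handlers:
            return type_name
    raise LookupError("no handler for " + class_attr)


# A processor that only knows the base types.
BASE_HANDLERS = {"topic/topic": "render as generic topic", "topic/ph": "render as phrase"}

# A specialized concept topic still names its ancestor, so it falls back to topic/topic.
fallback = find_handler("- topic/topic concept/concept ", BASE_HANDLERS)

# Once custom processing is registered, the specialized type takes precedence.
CUSTOM_HANDLERS = dict(BASE_HANDLERS, **{"concept/concept": "render as concept"})
custom = find_handler("- topic/topic concept/concept ", CUSTOM_HANDLERS)
```

Because the original name travels with the specialized element, a tool that has never heard of the new type can still do something reasonable with it.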
As important as simplicity and extensibility were, however, it is probable that the most serious motivation for the move to DITA came from the need for topic-oriented authoring--and the difficulty of doing that with DocBook. Those difficulties stem from the nature of the mechanism available for component reuse in DocBook--entity references.
To be reused, a component first had to be taken out of its DocBook setting (like removing a wing from a model airplane). An entity reference could then be employed to pull it into a document.
But when the component was removed, it had to be placed into a Document Type Definition (DTD), a control file for XML documents that is not itself written in XML. An XML editor couldn't operate on it, which meant that components, once extracted, could no longer be edited using normal authoring tools.
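To make the mechanism concrete, here is a minimal sketch of entity-based reuse, assuming a tiny DocBook-like document with an internal entity declared in its DOCTYPE. The entity name `warning` and the sample text are made up for illustration, and real installations typically kept such declarations in separate DTD or entity files.

```python
import xml.etree.ElementTree as ET

# The reusable text lives in the DOCTYPE declaration, which uses DTD syntax
# rather than XML elements. The entity name "warning" is illustrative.
DOCBOOK_LIKE = """<!DOCTYPE book [
  <!ENTITY warning "Back up your data first.">
]>
<book>
  <para>&warning;</para>
  <para>&warning;</para>
</book>"""

# The parser expands each &warning; reference while reading the document.
root = ET.fromstring(DOCBOOK_LIKE)
paragraphs = [para.text for para in root.findall("para")]
```

Note where the shared text ended up: inside the declaration block, not in the element tree that an XML editor presents to the author.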
DITA, in contrast, creates discrete components at the outset, all of which are editable using standard XML editors.
But perhaps even worse than the inability to edit a component was the inability to validate either it or the document that referenced it. In the first place, a DTD wasn't an XML document, so individual components couldn't be validated. In the second place, an entity reference could occur anywhere, making it impossible to prevent a component from being inserted at an illegal location.
DITA solved the first problem by using element IDs. A reference points to the ID of an element in a normal topic, which means that components are stored in standard XML files. So components can be edited and validated using standard authoring and production tools. (And as an additional benefit, a topic can contain multiple components.)
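A rough sketch of ID-based lookup follows, assuming references of the form `file#topicid/elementid` (the shape DITA uses for content references). The in-memory `FILES` table, the file name, and the `resolve_reference` helper are hypothetical stand-ins for a real processor.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical in-memory "files"; a real system would read topics from disk.
FILES = {
    "shared.dita": (
        '<topic id="shared"><title>Shared text</title><body>'
        '<p id="backup">Back up your data first.</p>'
        '<p id="restart">Restart the server.</p>'
        "</body></topic>"
    ),
}


def resolve_reference(reference, files):
    """Return a copy of the element addressed by 'file#topicid/elementid'."""
    filename, _, fragment = reference.partition("#")
    topic_id, _, element_id = fragment.partition("/")
    topic = ET.fromstring(files[filename])  # the component file is ordinary XML
    if topic.get("id") != topic_id:
        raise LookupError("no topic with id " + topic_id)
    for element in topic.iter():
        if element.get("id") == element_id:
            return copy.deepcopy(element)
    raise LookupError("no element with id " + element_id)


component = resolve_reference("shared.dita#shared/backup", FILES)
```

The one topic file holds two reusable paragraphs, and it remains a normal XML document that any XML editor can open.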
DITA solved the second problem by using an attribute on an element to create a reference. An element can only be inserted where it is legal. (Otherwise, it won't pass validation.) The reference is only valid if it refers to the same element, or to an extension of that element--a restriction that is easily validated, or which can be constrained by the editor. The referenced component, meanwhile, can be validated independently.
That implementation created what is perhaps the most significant and at the same time the most underrated feature of the DITA standard:
If all of the components of a DITA document are separately valid, then the combined document is guaranteed to be valid, as well.
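The rule behind that guarantee (a reference is legal only if the target is the same element or an extension of it) can be sketched as a simple prefix check. This assumes DITA-style class values written out explicitly for illustration. The function names are hypothetical.

```python
def type_chain(class_attr):
    """Return the types in a class value, from base type to specialization."""
    # Drop the leading "-" or "+" marker and keep the module/element pairs.
    return class_attr.split()[1:]


def reference_is_legal(referencing_class, target_class):
    """True if the target is the same element type or a specialization of it."""
    wanted = type_chain(referencing_class)
    have = type_chain(target_class)
    # The target's ancestry must begin with the referencing element's ancestry.
    return bool(wanted) and have[: len(wanted)] == wanted


same_type = reference_is_legal("- topic/p ", "- topic/p ")
specialized_target = reference_is_legal("- topic/ph ", "+ topic/ph hi-d/b ")
wrong_type = reference_is_legal("- topic/p ", "- topic/ul ")
```

Since the referencing element already sits in a legal position, and the target is guaranteed to be that kind of element, swapping one in for the other cannot break the document that contains it.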
DITA was designed from the ground up for modular, component-based documentation. The process was informed by decades of experience with the DocBook standard. The issues and restrictions were carefully identified, and brilliant solutions were found.
Because DITA is extensible, "DITA is to documentation what Object-Oriented is to programming," as co-worker Sowmya Kannan likes to say. But because DITA is designed around discrete components that can be edited and validated, it is equally fair to say that "DITA is to documentation what integrated circuits were to electronics".
In short, DITA is a standard that lets you assemble new deliverables from existing components--adding only a minimum of additional circuitry--with full confidence that the base components "just work".