DITA Production Maps -- A Proposal
By Eric Armstrong on Mar 01, 2008
On the DITA Users mailing list, Dan Vint wrote, wrt DITA OT 1.4.1:
> I contend that the input file structure is not always what is wanted
> or what is best for the output (because the output directory copies
> the hierarchy of the input topics and maps)
That statement is especially true when you refactor topics, and most especially when files are stored in a directory hierarchy rather than in a database "cloud" or a "flat file" container.
In either of the latter two systems, everything is forced to be at the same level. That puts a premium on naming conventions and/or search functions to help you find things.
An alternative strategy, common on systems that provide hierarchical directory structures, is to put similar things in a directory by themselves, so they're easier to find. (That's a key feature of all systems that derive from a Unix heritage.)
For example, let's say that most of your topics reside in src/project1. But let's say you have multiple projects with shared content, so you set up src/common for boilerplate and reused topics.
Now you have a problem. Topic maps for "Project 1" documents can't live in src/project1, because if they do, they can't reference src/common. (And even were that problem fixed, the "common" topics would now be "above" src/project1, in the eyes of the OT.)
In consequence, all topic maps have to be at the src/ level, above both common/ and projectN/. But now, even with the no-copy-above option for /generatecopyouter (option 1), the output directory structure looks like this:
Dunno. That strikes me as weird. Especially since my "out" directory will most likely be named "project1". The result is a pretty unfortunate directory structure.
So here's what you get:
When this is what you want:
...all project1 topics...
Unfortunate as the output hierarchy may be, however, it is at least constant. That makes it possible to write post-processors to fix it up. The issue then, of course, is fixing links when you do--no small chore. (Hmmm. I wonder if it's possible to run a link-fixing engine like Dreamweaver in batch mode. If it could be done, that problem could be solved!)
The "easy" solutions to the output-hierarchy problem are:
- Use a CMS that has already solved these problems. (expensive)
- Build or buy a dynamic-production system that builds docs as they're accessed, rather than producing static pages. (more expensive)
- Use a database or flat file system, so generated outputs have no hierarchy. (You give up organization at the front end so you don't have to deal with it at the back end.)
If you're on a Unix system, giving up the advantages of directory hierarchies is something of a pain. On the other hand, directories are single-category structures, and sometimes that's a pain, too. (Interesting possibilities arise if DITA topic maps are considered as "directories" by a suitable browsing tool--but only if it also identifies and lists all unmapped topics.)
The only design I see that solves all problems is the addition of a "Production Map" to the OT. (The XDocs CMS may do something similar to this.)
The production map tells where output files are going to be when they're created. Such a map makes it possible to:
- Collapse the directory structure currently created by the OT, so that the files no longer go to subdirectories of out/. That makes it easy to generate a project1/ output directory that has the index page, content-specific topics, and common topics, all in the same place.
- Fix internal links, so that a reference to ../common/X becomes simply a link to X, if X is part of the current document.
- Create separate production namespaces. The common/X topic becomes project1/X or project2/X when generated, so naming conflicts are avoided. (This is an advantage of directory hierarchies. I have no idea how people do it in a flat file system. They must do a lot of renaming and link-fixing.)
- Adjust links to other documents, using the production map as a guide to where they will be.
That last feature is something that is not easily done with DITA and the OT as they stand today--either during production or while editing.
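As an aside, the first two items in that list (collapsing the output directories and fixing internal links) can be sketched in a few lines. This is only an illustration, not anything the OT does today: the function names, the `.dita` file names, and the idea of passing in the set of topics that belong to the current document are all assumptions made for the example.

```python
import posixpath


def check_flat_names(doc_topics):
    """Raise if two source topics would collide once directories are collapsed."""
    seen = {}
    for src in sorted(doc_topics):
        name = posixpath.basename(src)
        if name in seen:
            raise ValueError(f"{src} and {seen[name]} both flatten to {name}")
        seen[name] = src
    return seen


def flatten_href(href, topic_dir, doc_topics):
    """Rewrite a relative href to a bare file name if its target is in this document.

    href       -- link as written in the topic, e.g. "../common/X.dita"
    topic_dir  -- directory of the linking topic, relative to src/
    doc_topics -- set of src/-relative paths that make up the current document
    """
    target = posixpath.normpath(posixpath.join(topic_dir, href))
    if target in doc_topics:
        return posixpath.basename(target)
    return href  # not part of this document: leave the link alone


# Hypothetical document: one project topic plus one shared topic.
DOC_TOPICS = {"project1/intro.dita", "common/X.dita"}
check_flat_names(DOC_TOPICS)
print(flatten_href("../common/X.dita", "project1", DOC_TOPICS))  # X.dita
```

Links to topics outside the current document are returned unchanged, which is exactly the case the last item in the list is meant to handle.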
Let's say you're writing a document that compares project1 to project2--but you're using the same set of topics as the source for both of them, with suitable metadata. How does the project1 version of that document refer to the project2 document, or vice versa?
Even more importantly, suppose you're migrating to DITA, and that you have a large existing collection of documents for multiple products and/or multiple versions of those products. As you begin constructing DITA documents, you'll want to include references to existing documents. But if a topic is the source for multiple products and/or multiple versions, the desired target for that link is only known at production time.
To create a link in these circumstances, you need a mechanism that causes the right link to be generated. And ideally, it should be one that authoring systems can deal with.
A production map provides a possible solution. For DITA documents, it would need entries that look something like this:
# Map + Metadata --> Destination
# ------ -------- -------------
c.ditamap + val1 --> out/product_family/product1
c.ditamap + val2 --> out/product_family/product2
p3.ditamap + ... --> out/product3
(It might even specify names for the target docs. TBD.)
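To make the idea concrete, here is a minimal sketch of how a tool might read entries like those into a lookup table. The `+` and `-->` syntax is taken from the example above, but nothing about the file format is settled, so treat the parser and the `parse_production_map` name as assumptions. The p3.ditamap entry is left out because its metadata is not specified.

```python
MAP_TEXT = """\
# Map       + Metadata --> Destination
# ------      --------     -------------
c.ditamap   + val1     --> out/product_family/product1
c.ditamap   + val2     --> out/product_family/product2
"""


def parse_production_map(text):
    """Return {(map_name, metadata): destination} from the plain-text format."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment-style header rows
        source, destination = (part.strip() for part in line.split("-->", 1))
        name, metadata = (part.strip() for part in source.split("+", 1))
        entries[(name, metadata)] = destination
    return entries


production_map = parse_production_map(MAP_TEXT)
print(production_map[("c.ditamap", "val2")])  # out/product_family/product2
```

Keying the table on the (map, metadata) pair is the point: the same map produces a different destination for each values file.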
For documents of other types, the map could contain entries like this:
# Doc + Metadata --> Location
# ---- -------- -------------
html/* + val1 --> out/*
other/* + val1 --> http://__some URL__
other/* + val2 --> http://__some other URL__
This map has two different kinds of entries. The first says that any link to an HTML document that exists in the html/ directory should produce a relative link, because things in that directory will exist at the specified location. (The production system does not necessarily have to put them there. The map declares a "promise" as to where those documents will be found when the publication process is complete.)
The second and third entries say that xrefs to other/ documents will become absolute URLs, but the URLs will be different in different contexts. Those documents may not even exist locally. The important point is that you can control the link that gets generated at production time, and have some chance of validating the xref at authoring time, either by following the link specified in the map or by setting up a system of symlinks (on Unix systems) that makes the xref resolve to some document, somewhere.
With the first entry, the authoring system can insert xrefs, and the CMS can ensure they are valid. With the second entry, that kind of automation may not be possible (or may be more difficult).
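The two kinds of entries can be sketched as a small rule table. Several things here are assumptions made for the example: the example.com URLs stand in for the unspecified ones in the map, patterns are assumed to end in a single `*`, a `*` in the destination is assumed to stand for whatever the pattern's `*` matched, and a destination without a `*` is returned as-is.

```python
import posixpath

# (pattern, metadata, destination); the URLs are placeholders, not real targets.
RULES = [
    ("html/*", "val1", "out/*"),
    ("other/*", "val1", "http://example.com/docs-v1"),
    ("other/*", "val2", "http://example.com/docs-v2"),
]


def resolve_external(doc, metadata, from_dir, rules=RULES):
    """Return the link to emit for a non-DITA document, or None if no rule matches.

    doc      -- source-side path of the referenced document, e.g. "html/faq.html"
    metadata -- name of the values file in effect for this production run
    from_dir -- output directory of the page that contains the link
    """
    for pattern, rule_meta, destination in rules:
        prefix = pattern[:-1]  # drop the trailing "*"
        if rule_meta != metadata or not doc.startswith(prefix):
            continue
        rest = doc[len(prefix):]
        target = destination.replace("*", rest)
        if "://" in target:
            return target  # absolute URL: emit unchanged
        return posixpath.relpath(target, start=from_dir)  # local: make relative
    return None


print(resolve_external("html/faq.html", "val1", "out/product_family/product1"))
# ../../faq.html
```

A `None` result is the useful signal for a pre-check: the reference has no promise behind it in the current context.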
That brings us to the process for inserting xrefs in a DITA topic.
At the moment, there is no way to insert an xref to a product2 document that will work if the output location changes. The best you can do is put in an external xref that is relative--in which case you have no way to verify that the link is accurate, or use an absolute external xref--in which case the link will point to the wrong target somewhere down the road, when the topic is generated in a new context (different product or version, for example).
But if we add a target-metadata attribute, then we can insert an xref that can be managed at authoring time, but resolved at production time. The link would look something like this:
<xref product="p1" href="c.ditamap" target-metadata="__TBD__">
Since the xref is conditionalized by product1 metadata, it's only generated when processed using the p1 ditaval file (val1). It references itself in this case, and specifies the values or values file that produces the intended target.
The extra qualifier references one or more of:
- A ditaval file (for an easy way to specify a collection of metadata attributes).
- A metadata value like "product:p2", which may have advantages in some cases, or may be more trouble than it's worth. (TBD)
- A set of metadata values, so the link isn't tied to the location of the ditaval file. (May or may not make sense. TBD.)
The attribute name is too long for my tastes. I'd love something shorter that was equally accurate. Maybe "target-values"?
When inserting a link using the editor, it needs to notice that there are multiple entries in the map and give you a choice. You select the version you intend, and the editor adds the appropriate attribute.
For extra credit, an editor/CMS system would:
- Notice when you add a ditaval file to something that was previously unconditionalized, and add that attribute everywhere it's needed.
- Notice when you add an entry to the map that specifies a new values file, and give you the option of selectively changing all existing references that use old values.
(The system also needs the ability to point to Part C of the product2 document. But to do that, it needs to point not to the topic itself, but to the topicref entry in the topic map.)
Of course, not all references would be created with such a robust front end, so production time pre-checks would be needed to ensure that all links make sense:
- If a referenced map(/topic) can be produced with more than one set of metadata (has multiple entries in the production map), ensure that the reference is qualified.
- Ensure that all qualified references have an entry in the production map, and that all unqualified references are referenced by the current topic map.
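Those two pre-checks might look something like the sketch below. It assumes the references have already been pulled out of the topics as (href, qualifier) pairs, where the qualifier is the proposed target-metadata value or `None`. The function name and data shapes are made up for illustration.

```python
def check_references(references, production_map, current_map_members):
    """Return a list of problems found in (href, qualifier) references.

    production_map      -- {(name, metadata): destination}
    current_map_members -- set of hrefs referenced by the current topic map
    """
    errors = []
    for href, qualifier in references:
        variants = {meta for (name, meta) in production_map if name == href}
        if qualifier is None:
            if len(variants) > 1:
                # More than one way to produce the target: must be qualified.
                errors.append(f"{href}: needs a target-metadata qualifier")
            elif href not in current_map_members:
                errors.append(f"{href}: unqualified and not in the current map")
        elif (href, qualifier) not in production_map:
            errors.append(f"{href} + {qualifier}: no entry in the production map")
    return errors


production_map = {
    ("c.ditamap", "val1"): "out/product_family/product1",
    ("c.ditamap", "val2"): "out/product_family/product2",
}
references = [
    ("c.ditamap", "val2"),   # fine: qualified and present in the map
    ("c.ditamap", None),     # ambiguous: two entries, no qualifier
    ("intro.dita", None),    # fine: part of the current topic map
    ("c.ditamap", "val9"),   # qualified, but no such entry
]
for problem in check_references(references, production_map, {"intro.dita"}):
    print(problem)
```

Run against the sample data, the check reports the second and fourth references and passes the other two.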
The final step is to adjust links at production time. So if we're generating product1 from c.ditamap, and it contains a link to c.ditamap qualified with the product2 values file, then the processing tool figures out that:
- The generated page will be in out/product_family/product1
- The referenced page will be in out/product_family/product2
- Therefore the link should be ../product2/c.html
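That bit of reasoning is just a relative-path calculation between two entries in the production map. Here is a minimal sketch, assuming the map has been loaded into a dictionary and that the generated page takes its name from the map file (c.ditamap becomes c.html, as in the example above). The `resolve_link` name is hypothetical.

```python
import posixpath

PRODUCTION_MAP = {
    ("c.ditamap", "val1"): "out/product_family/product1",
    ("c.ditamap", "val2"): "out/product_family/product2",
}


def resolve_link(current, target, production_map=PRODUCTION_MAP):
    """Compute the relative link from the page being generated to the target.

    current, target -- (map_name, metadata) keys into the production map
    """
    from_dir = production_map[current]
    to_dir = production_map[target]
    # Assumption: the output page is named after the map, with an .html suffix.
    page = posixpath.splitext(target[0])[0] + ".html"
    return posixpath.relpath(posixpath.join(to_dir, page), start=from_dir)


# Generating product1 from c.ditamap, linking to the product2 version:
print(resolve_link(("c.ditamap", "val1"), ("c.ditamap", "val2")))
# ../product2/c.html
```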
Of course, the ramifications of this quasi-proposal go well beyond the OT:
- The DITA specification needs a new target-metadata attribute.
- Editors need to know how to insert target-metadata qualifiers.
- CMS link-management modules need additional functionality.
It's a complex proposal, I have to admit. And it needs work, even then. But it's the only way I can see to solve the problem created by the fact that, with suitably factored topics, the input structure is never an exact match for all of the desired output structures.
The good news is that, given the implementation difficulties and widespread implications, I'm pretty convinced that DITA OT 1.4.1 did the only thing it reasonably could do to address the issues, at this juncture.