Building a Bridge: DITA, DocBook, and ODF

Some folks here are taking a very strong look at DITA. I'm certainly one of them. But we also have a huge legacy of documents in Solbook format (Sun's subset of DocBook). There are tools for editing such documents and tools for processing them, and there are many people who are comfortable with those tools. So DITA isn't going to replace the world just yet.

But DITA makes extensive reuse possible. It's a format with a serious future, because "reuse" is a very big deal. It lets you single-source your information content so you have one place to make an edit. That sort of thing becomes important when you have multiple revisions of a product, and/or multiple variations. It becomes important when different tools and different products use the same information in different ways. It can drastically improve quality and ensure uniformity of presentation. Finally, structured formats like DITA and DocBook create the kind of consistently-tagged information that allows for useful automation.

So how do we bridge those two worlds? Fortunately, there are two sets of tools that make it possible:

  • DITA/DocBook/ODF Transforms
  • CMS Plugins

Of course, this sort of thing is a lot easier to say than to do. But saying it is a heck of a lot of fun--and sometimes it's half the battle. So here goes...

DITA/DocBook/ODF Transforms

Flatirons Solutions currently has a set of transforms (the Document Interoperability Framework) that convert DITA to DocBook, and vice versa. Those are available now. The company is also working on a set that will convert those formats to the OASIS Open Document Format (ODF).
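To make the idea of a format transform concrete, here is a minimal sketch in Python. It is not the Flatirons code, and real transforms (typically written in XSLT) handle far more than this. The element mapping below is a hypothetical, deliberately tiny subset chosen for illustration: a DITA topic becomes a DocBook section, a DITA p becomes a DocBook para, and the DITA body wrapper is dropped. Anything not in the mapping passes through unchanged.

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately tiny DITA -> DocBook element mapping.
# None means "drop this wrapper element but keep its children".
DITA_TO_DOCBOOK = {
    "topic": "section",
    "title": "title",
    "body": None,
    "p": "para",
}


def convert(elem):
    """Return a list of DocBook elements equivalent to one DITA element."""
    children = []
    for child in elem:
        children.extend(convert(child))

    # Tags missing from the mapping are passed through unchanged.
    target = DITA_TO_DOCBOOK.get(elem.tag, elem.tag)
    if target is None:
        return children  # lift the children up one level

    out = ET.Element(target)
    if "id" in elem.attrib:
        out.set("id", elem.attrib["id"])  # only the id attribute is carried over
    out.text = elem.text
    out.tail = elem.tail
    out.extend(children)
    return [out]


def dita_to_docbook(xml_text):
    """Convert a DITA topic (as a string) to a DocBook section (as a string)."""
    root = ET.fromstring(xml_text)
    return ET.tostring(convert(root)[0], encoding="unicode")
```

Calling dita_to_docbook on a small topic with a title and one paragraph yields a section with a title and one para. A reverse mapping would give the "vice versa" direction.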

The ODF transforms are pretty interesting. They would make it possible to edit DITA or DocBook documents in OpenOffice--an open source suite of tools that is available to everyone. That's a far cry from the kind of money you have to spend to get a really good editor these days. (Those editors will still be needed for handling content references, at the very least. But it will be interesting to see what can be done using OpenOffice.)

But it's the DITA/DocBook transforms that are of most interest for interchange with legacy systems and tools. (There is also the question of how they handle DITA content references and DocBook entity references. But that's one of the tricky details that a concept paper like this can skim over...)

CMS Plugins

The ability to convert one document format into another is good, but it creates a problem--dual-sourcing. If you extract a document from a repository and convert it to another format, you now have two copies. What happens when the original document changes? How do you find out? How do your edits make it back to the original? If there is no back-path for such changes, how do you ensure that copies are never modified?

The problems stem from dual-sourcing. The solution is to maintain single-sourcing. And that's where plugins come in.

I was reminded of that plugin capability when I saw it listed as a feature of the XDocs CMS. It was in an evaluation-table I put together quite a while ago. I didn't know what I could do with that feature at the time, but it seemed useful so I made a note of it.

The other morning, while pedaling to work, it came to me that the plugin feature could be used to create a bridge to an external repository.

XDocs is an XML repository that can store both DITA and DocBook, but let's say it's devoted to DITA work. And let's say that DocBook/Solbook documents are in a separate repository. A plugin can be written that accesses the external repository and applies the transforms to them, presenting them as DITA documents to the XDocs user.
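As a rough sketch of that bridge, assuming nothing about the real XDocs plugin API (which I have not shown here): the class and function names below are all hypothetical. The plugin is handed two callables, one that fetches a document from the external repository and one that transforms it, and it converts on every read instead of storing a converted copy. That is what keeps the external repository the single source.

```python
from typing import Callable


class BridgePlugin:
    """Hypothetical CMS plugin that presents external documents as native ones.

    `fetch` stands in for an external-repository client and `transform`
    stands in for a DocBook-to-DITA transform. Neither is a real XDocs API.
    """

    def __init__(self, fetch: Callable[[str], str],
                 transform: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._transform = transform

    def open_document(self, doc_id: str) -> str:
        # Convert on every read. No converted copy is kept, so there is
        # never a second source that can drift from the original.
        return self._transform(self._fetch(doc_id))


# Toy stand-ins so the sketch runs: an in-memory "repository" and a
# string-replacing "transform". Both are placeholders, not real tools.
external_repo = {"guide": "<section><title>Guide</title></section>"}


def toy_transform(docbook_text: str) -> str:
    return docbook_text.replace("section", "topic")


plugin = BridgePlugin(external_repo.__getitem__, toy_transform)
print(plugin.open_document("guide"))
```

Because nothing is cached, an edit made in the external repository shows up the next time the document is opened through the plugin.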

These thoughts imply that plugin capability is a critical feature for any CMS that may eventually need to allow for document interchange.

A Two-Way Bridge

As long as the external repository has APIs that can be utilized, the plugin and transformation are possible. But things become even more interesting if the external repository has the capacity for plugins, as well. In that case, the plugins could talk to each other.

That's an ideal scenario, because a single set of interchange-APIs can be defined. (RESTful APIs, of course, based on the HTTP protocol.) Those APIs can then be used across a variety of repositories. Instead of writing a new plugin for every repository, you write one and give it the address you want to connect to. You might need multiple copies of the plugin, one for each external repository, but you only have one set of code.
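Here is one way the "one set of code, many addresses" idea might look. No such interchange API has been defined, so the URL layout (/documents/<id>?format=<fmt>), the host names, and the class name are all assumptions made up for this sketch. The transport is injectable so the class can be exercised without a network.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def http_get(url: str) -> str:
    """Default transport: a plain HTTP GET, decoded as UTF-8."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


class InterchangeClient:
    """One client class, reused for every repository that speaks the API.

    The URL scheme used here is hypothetical, not an existing standard.
    """

    def __init__(self, base_url: str, get=http_get) -> None:
        self.base_url = base_url.rstrip("/")
        self._get = get

    def document_url(self, doc_id: str, fmt: str) -> str:
        # Percent-encode the id so spaces and slashes are safe in the path.
        query = urlencode({"format": fmt})
        return f"{self.base_url}/documents/{quote(doc_id, safe='')}?{query}"

    def fetch(self, doc_id: str, fmt: str = "dita") -> str:
        # Ask the remote side to deliver the document in the wanted format.
        return self._get(self.document_url(doc_id, fmt))


# Same code, two repositories: only the address differs.
# Both host names are placeholders.
solbook = InterchangeClient("http://solbook.example.com/api")
xdocs = InterchangeClient("http://xdocs.example.com/api/")
```

The format parameter is where the two-way part comes in: each side asks for documents in its own native format, and the plugin on the other end does the conversion.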

An ODF Bridge?

The only remaining problem is how to make content in those repositories available to OpenOffice. I'm not quite sure how to do that, but maybe there is something I don't know about OpenOffice.

I'm used to using it to access the file system. But maybe there is some way to get it to access a repository? Or maybe it would be possible to write a plugin that makes a repository appear as though it were part of the file system?

That would be ideal, because then we would be back to having one plugin talk to another, doing the conversion necessary to make the external repository appear as though it were in the tool's native format.



As far as I know, you register a new format with OpenOffice by registering an XSLT stylesheet. Rather, a pair of stylesheets: one going in, one going out. The files could exist as ODF only in OpenOffice's head, so to speak. Never ODF on disk.

As to the repository, download TortoiseSVN (Windows) or an svn client (*nix) and manage the working copy/repository relationship outside of OOo. (Or use cvs or mercurial or ...)

Office suites could definitely use a better versioning/collaboration system.

Posted by Bryce Nordgren on September 20, 2007 at 10:56 AM PDT #

You are looking at 2 things.

- A filter for DITA documents.
- Repository functionality.

There is an OOo plugin for SVN, although it seems not to work on Windows.
A better solution would be general SCM functionality within OOo with plugins for the various repositories, similar to the Team functionality in Eclipse (although probably not as extensive).

Posted by Jan Fluitsma on September 22, 2007 at 11:54 PM PDT #


This is really promising. MS Office has so far to go to catch up once these "reusable content" systems catch on.

The ability to access this content and merge it with other systems is just so fantastic. Offline editing via WebDAV, and Web applications can reuse parts. Real versioning and labelling concepts, when mapped on top of DITA, provide a seriously powerful toolset that makes a business very efficient and able to control its documentation in precise, predictable and scalable ways.



Posted by Ged on October 13, 2007 at 02:26 AM PDT #


Some thoughts on bridging:

OpenOffice(.org!) can read from nearly any source, whether that is WebDAV, FTP, ... Talk to the guys at the xml project regarding this topic [1].

XSLT-filters were mentioned already.

One hardly known source is ODMA, which has been used by document management systems for a very long time as a (sort of) standard. Nowadays it is vanishing [2], but some time ago I read an announcement on one of the OpenOffice project mailing lists that there is an implementation of ODMA available. I think that could be a start for making an opaque bridge to a CMS storing DITA docs.


Posted by Marc on June 01, 2009 at 08:15 PM PDT #


