Structured Document Formats, Part II

After writing "Are Structured Docs Really Necessary?", I was asked:
>
> Having written up that interesting discussion, what is your gut feeling at this time?
> If you had to make an authoring recommendation to a group to make life easier
> going forward, what would it be?
>
Well, I found myself going back and forth, as you can tell. That post reflected two weeks of thoughts that kept surfacing after a particularly stimulating discussion. This post is an attempt to come up with an answer.

My gut says that, in the ideal world:

  • We need wysiwyg editing like Open Office, where you never think about tags. We also need the capacity to edit in some form of Wiki text, to quickly make small changes online without having to fire up a large desktop application. (One day, when there is a seriously robust, cross-browser, online DITA editor, the Wiki option may not be necessary. But that day is quite a way off.)

  • It is still necessary to add semantic tags, though (e.g. platform=, platform_version=7, product_version=3), so Open Office needs extending to present an interface for that.

  • Given a wysiwyg editor that can do the right things, we need it to work on Wiki pages.

Note:
In the Open Office survey I took, they ask for feedback on which of 20 different things they should be working on. Two of the more interesting were "Open Office for blogging" and "Open Office for Wikis". The blogging tool already exists, and it works well. The only real problem for a Wiki tool is figuring out how to launch Open Office when the user clicks "Edit". (They may be able to use Java Web Start for that--but it will have to be a separate copy of Open Office so it doesn't ask "Is it ok to print?", "Is it ok to save this file?" every time the user wants to do those things--or else Java Web Start will have to gain the ability to be used as a deployment vehicle, without having to accept its security sandbox to do so.)
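Metadata attributes such as platform= and platform_version= earn their keep through conditional processing: when a document is built for one platform and version, content tagged for the others is dropped. Here is a minimal Python sketch of that filtering step. It assumes that any attribute named in a build profile acts as a condition; the sample topic, the profile dictionary, and the `matches` and `filter_tree` helpers are hypothetical. Real DITA toolchains drive this from a DITAVAL file rather than from code like this.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical sample topic. Attribute names follow the examples in the text;
# they are illustrative, not a fixed vocabulary.
TOPIC = """<topic id="install">
  <title>Installing</title>
  <body>
    <p>Download the installer.</p>
    <p platform="windows">Run setup.exe.</p>
    <p platform="linux" platform_version="7">Run install.sh as root.</p>
    <p platform="linux" platform_version="6">Use the legacy package.</p>
  </body>
</topic>"""


def matches(elem: ET.Element, profile: Dict[str, str]) -> bool:
    """True unless elem carries a condition that the profile rules out.

    An attribute may list several space-separated values (platform="linux mac").
    Attributes the profile does not mention are not treated as conditions.
    """
    for name, value in elem.attrib.items():
        if name in profile and profile[name] not in value.split():
            return False
    return True


def filter_tree(root: ET.Element, profile: Dict[str, str]) -> ET.Element:
    """Remove, in place, every element whose conditions do not match."""
    for parent in list(root.iter()):  # snapshot, since we mutate while walking
        for child in list(parent):
            if not matches(child, profile):
                parent.remove(child)
    return root


root = filter_tree(ET.fromstring(TOPIC), {"platform": "linux", "platform_version": "7"})
print([p.text for p in root.iter("p")])
# ['Download the installer.', 'Run install.sh as root.']
```

Untagged content always survives, so the same topic can be built once per platform/version profile.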
  • To allow reuse, Open Office needs extending for transclusion, and probably for conditional text, as well:
  • For transclusion and variable substitution, we'll need href microtags that say "transclude the referent", rather than "display a link". (Note: The standard name is "microformats". But that term fails to connote their semantic import. So I call them "microtags".)

  • For items tagged with conditional metadata, the user will need the ability to control colors, fonts, and text decorations. XMetaL is the exemplar for how it should be done.
  • At that point, we have an industrial-strength "Wiki" that allows all of the encoding necessary for serious reuse.

  • We should be able to automate the conversion to DITA topics, but in the process, the document may well be split into several subtopics. So we have to decide whether they should be put back together again when you're viewing the page and you click "Edit". (We could, but what if it is the map you really want to edit? On the other hand, if you get the map and separate topics, it's harder to get a cohesive view while you're editing, and harder to shift content between topics in the document.)

  • In any case, the conversion should probably be limited to generic topics, for the foreseeable future. Information typing to identify tasks and concepts helps to promote reuse, but it adds many schema restrictions without actually guaranteeing reuse. (See more in the Specialization Notes, below.)

  • In that sense, information typing is a lot like static type declarations. They make some things easier, but don't guarantee success. Just as testing is needed to assure success in a program, well-factored content is needed to assure reuse in a topic.

  • Enabling reuse, then, comes down to the ability to easily edit and refactor topics. So the editor/Wiki system needs to automate those operations as well. At a minimum, it needs to support topic renames, metadata renames, changes to metadata assignments (which affect all references, as well as the tagged entity), and topic splits. Whether other forms of refactoring can be supported (for example, topic joins) remains to be determined.

  • The ability to make global changes easily is also needed. So an interactive, list-driven, pattern-based global substitution tool is required. With such a tool, you have a list of things you're searching for. For each item, there are one or more possible replacements. When you find one, you choose which replacement to use, if any.
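The list-driven substitution tool described in the last bullet can be sketched in Python. Each rule pairs a regular expression with its candidate replacements, and a chooser callback decides, occurrence by occurrence, which replacement (if any) to apply. The `substitute` and `ask` functions and the rule format are assumptions for illustration, not an existing tool. `ask` is the interactive chooser; the scripted `always_first` chooser in the demo simply takes the first candidate.

```python
import re
from typing import Callable, List, Optional, Tuple

# A rule pairs a regular expression with the replacements to offer for it.
Rule = Tuple[str, List[str]]
# A chooser sees (matched text, nearby context, candidates) and returns the
# index of the replacement to use, or None to leave that occurrence alone.
Chooser = Callable[[str, str, List[str]], Optional[int]]


def substitute(text: str, rules: List[Rule], choose: Chooser) -> str:
    """Apply each rule in order, consulting `choose` for every match."""
    for pattern, candidates in rules:
        def replace(m: re.Match, current: str = text) -> str:
            context = current[max(0, m.start() - 30):m.end() + 30]
            choice = choose(m.group(0), context, candidates)
            if choice is None:
                return m.group(0)
            # expand() lets a candidate use backreferences such as \1
            return m.expand(candidates[choice])
        text = re.sub(pattern, replace, text)
    return text


def ask(matched: str, context: str, candidates: List[str]) -> Optional[int]:
    """Interactive chooser for use at a terminal."""
    print(f"...{context}...")
    for i, candidate in enumerate(candidates):
        print(f"  [{i}] {candidate}")
    answer = input(f"Replace {matched!r} with which? (Enter to skip) ").strip()
    if answer.isdigit() and int(answer) < len(candidates):
        return int(answer)
    return None


rules = [(r"\bOpen Office\b", ["OpenOffice.org", "Open Office"]),
         (r"\bwysiwyg\b", ["WYSIWYG"])]
always_first: Chooser = lambda matched, context, candidates: 0
print(substitute("Open Office is a wysiwyg editor.", rules, always_first))
# OpenOffice.org is a WYSIWYG editor.
```

Passing `ask` instead of `always_first` gives the interactive behavior: the writer sees each match in context and picks a replacement or skips it.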

Ok. That's where we need to be. So where are we now, and what can we use to get there? Here's what I see:

  • To start, we need a good platform to build on. From a minimal initial system to a full-featured final system, many features need to be implemented. In some cases, tools need to be built to enable operations. (For example, a dependency-managed system for document builds--and/or a build-to-order delivery system--that can incorporate functional and procedural transforms, as well as declarative XSL transforms.) Those requirements suggest the need for a strong underlying architecture and a flexible, easily extended programming environment like the one you get with Rake and (J)Ruby.

  • Even with a Wiki-style front end, we would benefit from interacting with a CMS, instead of a Wiki database. There are 7 open source CMSes that need investigating--three Ruby-based, and four Java-based. (The Java-based candidates may be more mature, and since JRuby can talk to them, they may provide a more feature-rich implementation to build on.) The major requirements are link management, WebDAV capabilities, and a flexible, pluggable architecture that will allow for access-proxies at some point.

  • It probably makes sense to come at the problem from both ends:
  1. Use DITA where it's needed to minimize duplication. But voluntarily restrict usage to things that will migrate well: Generic topics rather than deeper specializations, version-number variables (but no name variables), a restricted set of metadata tags that are unlikely to change.

  2. Use Wiki documents wherever we can, for maximum collaboration.

  3. Continue to define and foster a merger of the two.
We'll lose the advantages of information typing with this system, but gain the benefits of collaborative authoring, with the lowest possible barriers to entry. Will the trade-off be worth it? Only time will tell.
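The dependency-managed build system mentioned earlier can be illustrated with a tiny make/Rake-style graph. This Python sketch rebuilds a target when it is missing, when it is older than one of its inputs, or when one of its inputs was just rebuilt. The `Build` class and its `rule` method are hypothetical. Because an action is any callable, a procedural transform written in code and a wrapper around an XSLT processor can sit side by side in the same graph.

```python
import os
from typing import Callable, Dict, List, Set, Tuple

# An action receives (target, deps) and writes the target file.
Action = Callable[[str, List[str]], None]


class Build:
    """A tiny make/Rake-style dependency graph for document builds."""

    def __init__(self) -> None:
        self.rules: Dict[str, Tuple[List[str], Action]] = {}

    def rule(self, target: str, deps: List[str], action: Action) -> None:
        self.rules[target] = (deps, action)

    def build(self, target: str) -> List[str]:
        """Bring `target` up to date; return the targets rebuilt, in order."""
        rebuilt: List[str] = []
        self._build(target, set(), set(), rebuilt)
        return rebuilt

    def _build(self, target: str, visiting: Set[str], done: Set[str],
               rebuilt: List[str]) -> None:
        if target in done:
            return
        if target in visiting:
            raise ValueError(f"dependency cycle at {target!r}")
        if target not in self.rules:  # a plain source file
            if not os.path.exists(target):
                raise FileNotFoundError(target)
            done.add(target)
            return
        visiting.add(target)
        deps, action = self.rules[target]
        for dep in deps:  # bring inputs up to date first (depth-first)
            self._build(dep, visiting, done, rebuilt)
        visiting.remove(target)
        # Rebuild if missing, older than any input, or if an input was just
        # rebuilt (guards against coarse file timestamps).
        stale = (not os.path.exists(target)
                 or any(os.path.getmtime(d) > os.path.getmtime(target) for d in deps)
                 or any(d in rebuilt for d in deps))
        if stale:
            action(target, deps)
            rebuilt.append(target)
        done.add(target)
```

For example, `b.rule("topic.html", ["topic.dita"], run_xslt)` would register a step in which `run_xslt` (also hypothetical) invokes whatever XSLT processor you use, while `b.rule("topic.dita", ["topic.wiki"], wiki_to_dita)` could be plain procedural code.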


Specialization Notes

  • In general, it may be that DITA led people slightly astray when it included the information types for concept, task, and reference. Those are extremely useful, but may not be totally necessary for everyone. (They're good examples of specialization, but perhaps they shouldn't be regarded as "the" way to do DITA.)
  • Information typing does have benefits, though. Restricting task-oriented information to tasks does make them easier to read when you want to do something. Restricting concepts to conceptual info makes them easier to reuse in different settings. Having a standard format for reference information makes things easier to find. So information types do help to make content more usable, primarily by restricting focus. In effect, the schema becomes a way to help automate the review/edit cycle, to keep content concise.

  • But information typing also adds complexity. It requires more knowledge on the part of the writer and more powerful editors, both of which add up to "barriers to entry". Reducing those barriers means implementing a more user-friendly front end like a Wiki or desktop publishing system. But information created in those unfettered channels will not convert cleanly to the more restricted information types. So to the degree we want to enable collaboration, we probably need to eschew information typing--at least until better online editing systems emerge or until knowledge of the tagging structures becomes more widespread.
  • To use the DITA specialization for SCORM-compliant training modules, on the other hand, we would need to use the Task and Concept types. That sort of thing could not be handled by a system that restricts itself to generic types. (The same goes for troubleshooting topics and other specializations.) So for the moment at least, a project that wants the benefits of information-typing specializations will have to forgo the advantages of direct online editing--although other forms of collaboration could be enabled, such as through a WebDAV-enabled CMS.
  • Some forms of specialization still make sense, however. For example, if a "book" is an index map for something you might well want to print, a "library" would be an index map for a collection of books--something you might browse or submit to a batch processing job, but not something you would print as a single entity.

  • In particular, I note that there are two major reasons for creating specializations. One reason is to remove tags so writers don't have to see them--a problem that goes away with a front end that converts either Open Office documents or Wiki text. The other reason is to define semantic tags. Semantic tags let you have a part-number list with part-numbers and descriptions, for example, instead of a definition list. That lets you automate interactions with a database. And unlike microtags, the schema guarantees that all items in the list are in fact part numbers.

  • There are in fact two ways to add semantic tags--either with metadata (microtags), or with specializations. The former is more flexible; the latter lets you define schema constraints. The former lets you do conditional processing, but requires Open Office extensions or a Wiki-text interface that has been extended for microtags (which need to be defined). The latter can be done with conversion rules, but still requires some form of microtagging: "If the list has a metadata tag that says 'parts list', then convert all entries in the list to <part-number> instead of <dd>."

  • For an organization at the initial stages of defining a topic architecture, schema-constrained semantic specializations are a long way off. But metadata attributes are going to be needed pretty quickly, even if their use is constrained to relatively "invariant" environment characteristics, like version numbers. (Which suggests that Open Office will need the ability to specify and edit arbitrary attributes fairly quickly. Otherwise, a Wiki interface will be the only game in town.)
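A conversion rule of the kind quoted above ("if the list has a metadata tag that says 'parts list'...") can be sketched in Python. This assumes the microtag is carried as outputclass="parts-list" on a generic DITA `<dl>`; the target element names (parts-list, part, part-number, part-desc) and the part-number pattern are hypothetical. Since the part number naturally sits in the term (`<dt>`) of a definition list and the description in `<dd>`, the sketch maps both. The pattern check stands in for the guarantee that a schema-constrained specialization would provide.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical target vocabulary for a parts-list specialization.
RENAMES = {"dl": "parts-list", "dlentry": "part", "dt": "part-number", "dd": "part-desc"}


def convert_parts_lists(root: ET.Element, part_pattern: str = r"[A-Z]+-\d+") -> ET.Element:
    """Convert definition lists tagged outputclass="parts-list", in place.

    Every term must look like a part number (part_pattern); otherwise the list
    is rejected before anything is renamed. Untagged lists are left alone.
    """
    for dl in list(root.iter("dl")):
        if dl.get("outputclass") != "parts-list":
            continue
        entries = dl.findall("dlentry")
        # Validate first, so a bad list is not left half-converted.
        for entry in entries:
            for term in entry.findall("dt"):
                number = "".join(term.itertext()).strip()
                if not re.fullmatch(part_pattern, number):
                    raise ValueError(f"not a part number: {number!r}")
        for entry in entries:
            for child in entry:
                if child.tag in ("dt", "dd"):
                    child.tag = RENAMES[child.tag]
            entry.tag = RENAMES["dlentry"]
        dl.tag = RENAMES["dl"]
        del dl.attrib["outputclass"]  # the microtag has done its job
    return root


SOURCE = ('<body><dl outputclass="parts-list">'
          '<dlentry><dt>A-100</dt><dd>Hex bolt</dd></dlentry>'
          '<dlentry><dt>B-20</dt><dd>Washer</dd></dlentry></dl>'
          '<dl><dlentry><dt>Term</dt><dd>Ordinary definition</dd></dlentry></dl></body>')
print(ET.tostring(convert_parts_lists(ET.fromstring(SOURCE)), encoding="unicode"))
```

The second, untagged list passes through unchanged, which is the point of using a microtag to drive the rule.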

Comments:

I work with programmers, who really dislike writing but really like being recognised for good ideas and good implementations. They're subject matter experts whose time is expensive and who'd rather be "in the zone" than have to supply verbal content. Thinking aloud:

As an ex-coder, I know that laziness is a virtue. When I see the word "collaboration" I think "motivate the SMEs to do most of the writing, instead of doing that inefficient interview-and-write thing". Software is horrendously volatile anyways, so ya gotta go with the flow and faster flow is better flow.

But even if you had a good browser-based DITA editor, you'd still have people who don't want to use it simply because they don't want to think about XML and structure and semantic tags. They just want to do a semistructured brain dump at their own speed, or do a review by putting in lots of very specific comments, but at no time do they want to think _about_ their writing. That's _your_ job. They just want to work with paragraphs sprinkled with lists and headings and bold and italic and maybe some hierarchical section numbering if they're feeling particularly generous. You know, wiki style.

What you might need is a system where SMEs can crank out the raw wiki material and then a tech writer can add semantic tags and other new tags to it and process queues of review comments and email traffic and then write the page back out to the wiki in a non-lossy format -- that doesn't lose the semantic info.

This would let you maintain a DITA mirror image of the wiki and use nice tools like XQuery on it and crank it thru the DITA-OT publishing tools.

For a new topic, create a new wiki page (from the right template), set its information type, add it to your topic DB, and link it to that outline that is your work in progress. Get the SME's first attempt, clean it up, put in a few to-do's, and bounce it back to him (or her).

Make wiki edits part of a coder's task of closing out a change request. Get acceptance by keeping it easy.

Wiki edits will enter the DITA docbase as unsemantic presentation markup, so when you have time (or the diff you got in your inbox is scarily big), you (the tech writer) load the page, tag it, proofread it, clean it up, save it back to the wiki, and move on. Maybe you assign a review task, maybe to a non-technical SME.

Internally, the wiki is accessible and up to date. In your documentation pipeline for all your beta testers (also known as "customers") you've got the latest and the greatest. Tinker with the publishing tools so that they complain about wiki markup that hasn't been tagged yet.

A wiki that respectfully handles HTML tags with your own shop's attributes might be able to do this. It should also have a mechanism for wiki users to comment on specific sections of wiki content. The people who prefer the wiki interface will be none the wiser. "DITA? \*yawn\*"

In other words, a DITA CMS with a schema-aware power front end for tech writers and a non-lossy wiki connector.

Posted by F.Baube on February 12, 2008 at 01:54 AM PST #

F. Baube wrote:
>
> What you might need is a system where SMEs can crank
> out the raw wiki material and then a tech writer can
> (work with it)
>
Bingo!

PS
Nice to meet a fellow member of the developer/writer community. We're a pretty rare breed, standing as we do at the intersection of two pronounced skill sets with long apprenticeship times!

Posted by Eric Armstrong on February 12, 2008 at 02:30 AM PST #

Having been in and out of XML authoring, I agree the editor needs to be much more WYSIWYG than any of the current solutions seem to be.

The minute I showed my development team what our XML authoring environment looked like, they started sending me links to XML development tools for programmers. In their words what we were using was akin to editing in Notepad.

I've always been less concerned about getting raw content in writing from any SME. It can be helpful, but I'd rather spend 15 minutes talking to someone than watch them stare in sheer panic at their machine because someone has asked them to write a 'spec'. Or, hey, give me access to the code and the embedded comments.

Posted by meg miranda on February 12, 2008 at 03:08 AM PST #

I found a link to part 1 at dita(dot)xml(dot)org; read it, related posts, and all the comments on your blog - a whole lot of thought-provoking stuff! Motivated me to weigh in with another argument in favor of structured documentation.

One of the chief aims of DITA's topic specializations is completeness. DITA's framers started by looking at a lot of documentation and figuring out what things the good stuff had in common. The specializations' constraints are a codification of best practices. They make the elements of effective documentation mandatory. (The fact that the specializations are themselves extensible is a great bonus, and testifies to the brilliance of the designers.) The price for lowering the barriers to contribution by not requiring tagged input will be loss of consistency and quality. Check out the wikis of a few random SourceForge projects - is this what you're willing to settle for?

We don't yet have the AI and natural language processing to infer meaning and structure from freeform text. F.Baube has the right of it - human intervention will be needed between the input of unstructured text and the publication of effective documentation no matter what groovy features you build into your wiki/cms.

btw, a couple other insightful people on the subject are Brian Forte at redhatmagazine(dot)com and Andy Oram at radar(dot)oreilly(dot)com

Posted by Jesse Inskeep on February 12, 2008 at 05:41 AM PST #

Jesse Inskeep wrote:
>
> (good arguments in favor of document structure)
>
Cannot disagree. There are serious benefits to be derived. But there are also costs--notably in the area of collaborative editing. Can we get both? Possibly. But we \*must\* have interaction mechanisms that are so good, we can make the tags invisible.

Meg Miranda wrote:
>
> I've always been less concerned about getting raw content in
> writing from any SME. It can be helpful, but I'd rather spend
> 15 minutes talking to someone (and writing it up)
>
That's my preferred working style for initial content, as well. I neglected to say this in an earlier response, but one of my major concerns is minor corrections and small additions going forward.

In the time it takes to file a bug, such changes could be made. Many small fixes never get a bug report, because the reader has to switch context to some other tool. And when a bug is filed, it can take a long time before it's closed.

In the meantime, the person processing the bug has to go to one tool to look at it, go to another to find the page, locate the problem, fix it, and then close the bug. If you figure that the original reader could most likely have fixed the problem in the time it took them to file a bug, then everything after that point is so much wasted effort.

So I'm more interested in "low barrier to entry" for updates, corrections, and minor additions, rather than for initial content generation.

Posted by Eric Armstrong on February 12, 2008 at 06:01 AM PST #

PS
I agree that the DITA architecture is nothing less than absolutely brilliant. When I do a presentation on the subject, I have one slide entitled "The Genius Factor". It is nothing less than stellar. I think that if we could come up with an equally brilliant online editing system--one so robust that it was never necessary to think about the underlying tags--there would then be no need to trade off information typing for collaborative online authoring.

I look forward to that day. In the meantime, I'm thinking that generic types and a Wiki interface may provide us sufficient functionality to get started. Will it work well enough to be useful? To be determined. Will we get the kind of content we want? Also to be determined. Strong refactoring tools will help, but it may be a while before we get those, as well.

As mentioned in an earlier post, the Wiki path is one way to approach the problem. The other path is with a WebDAV-savvy CMS and the appropriate editors. That kind of system requires a bigger upfront investment, but things more or less "just work". It's a seriously viable strategy for getting docs out the door. But when I start thinking about extending the collaborative framework so developers can make small changes when they spot them, I find myself moving towards the Wiki world.

Posted by Eric Armstrong on February 12, 2008 at 06:14 AM PST #

Case in point: Grails people are discussing wiki & user guide:
http://www.nabble.com/Re%3A-Grails-User-Guide-Wiki-p15452610.html
http://www.nabble.com/Re%3A-Grails-User-Guide-Wiki-p15451167.html

Posted by F.Baube on February 12, 2008 at 06:45 PM PST #

Don Day, Sarah Maddox, and others have pointed me to your blog and I'm so glad they did. I've read both Part I and Part II, and I'm very appreciative of your blogging abilities and the fact that you're putting your train of thoughts out there for all of us to read, learn from, and digest, and perhaps even apply in our decisions going forward.

I'm extremely interested in a DITA-wiki hybrid and just this week at the Central Texas DITA User Group meeting we had a demonstration that showed DITA as source going into a wiki, where even the conrefs from the DITA source can become transcluded entities in the wikitext. I hope to write up my notes from that session soon, with links to the slides. The amount of information sharing going on around this fascinating topic of "reuse vs. collaboration" is plain awesome.

I especially like your comment here, and it prompted me to comment. "I'm more interested in "low barrier to entry" for updates, corrections, and minor additions, rather than for initial content generation." I agree, although with the caution that if a community member has the ability to generate that initial content, let them have at it! All are welcome for the collaborative efforts I've been involved in that use a wiki as a documentation editing and publishing tool. Andy Oram's Community Documentation and Free Documentation studies may represent the future of technical information authoring and dissemination.

You're doing a great job of sharing your back-and-forth arguments with yourself, and that's the greatest part of blogging. Thanks so much for writing this up.

Posted by Anne Gentle on February 22, 2008 at 07:03 AM PST #

I haven't seen any mention of DITA Storm here... It provides a simple Wiki front end with the DITA back end.

Posted by Seraphim Larsen on February 26, 2008 at 04:27 AM PST #

Seraphim wrote:
>
> DITA Storm provides a simple Wiki front end with the DITA
> back end.
>
Hi, Seraphim.

I left it out, because the last time I looked at it, it did not do well at handling files created by other editors. I'm sure that situation will change at some point, but that's too limiting a restriction for a production environment.

On the other hand, Bob Doyle found a perfect application at the Boston DITA User Group. They created a test bed using DITA Storm at the front end, and the DITA Open Toolkit at the back end. Since all DITA files are created with DITA Storm, that limitation never comes up.
http://dita.xml.org/march-meeting-boston-dita-users-group

Posted by Eric Armstrong on February 26, 2008 at 11:31 AM PST #
