The Value of Semantic Tags

So what's wrong with using <b>, <i>, and <tt>, anyway? What's so useful about identifying things as menu items, APIs, or filenames? Here's the list of reasons that surfaced at the recent 2008 DITA/CMS Conference. What are your thoughts?

At their session on DITA Code Reviews, IBM's Carolyn Inkster and Sharon Rouiller showed the results of using their "bad tag finder"--a CSS stylesheet that renders text marked with typographic tags like bold and italics in large, brash fonts and brilliant colors, so that it is easy to spot. The idea was to quickly identify text that would be better off with semantic tags like filename or menu item.
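
The tool shown in the session was a CSS stylesheet. As a rough sketch of the same idea in script form, the Python below walks a topic and reports every typographic element it finds. The set of flagged tags and the sample topic are assumptions made up for illustration; this is not the presenters' stylesheet.

```python
import xml.etree.ElementTree as ET

# Typographic elements to flag (an assumed set; adjust to your own rules).
TYPOGRAPHIC_TAGS = {"b", "i", "u", "tt"}

def find_typographic_tags(xml_text):
    """Return (tag, text) pairs for each typographic element in the topic."""
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.iter():
        if elem.tag in TYPOGRAPHIC_TAGS:
            # Collect the element's full text, including any nested elements.
            hits.append((elem.tag, "".join(elem.itertext()).strip()))
    return hits

# Hypothetical sample topic, invented for this example.
SAMPLE = """<topic id="t1"><title>Saving</title><body>
<p>Choose <b>Save As</b> and name the file <tt>report.txt</tt>.</p>
<p>Choose <uicontrol>Open</uicontrol> to load it again.</p>
</body></topic>"""

for tag, text in find_typographic_tags(SAMPLE):
    print(f"<{tag}> {text}")
```

A report like this only locates the candidates. Deciding whether a given <b> is really a menu item or just emphasis still takes a human reviewer.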

They were then asked, "What is the value of adding semantic tags?" (Especially after a conversion, when lingering typographic elements like <tt> and <b> already encode such information, what makes it desirable to convert them all to the appropriate semantic tags?)

Here are the reasons they gave:

  • Automated link insertion: When messages are marked with a semantic tag, all the messages in a document can be automatically linked to a troubleshooting guide that details each message's possible causes and ways to deal with them. Alternatively, the troubleshooting guide could be automatically populated with links to each area where a particular message is discussed. Semantic tags make that kind of automated document construction possible.

  • Information filtering: Users can filter information using metadata tags, so they can leave out information that pertains to products they're not interested in. (That's not quite the same principle as semantic tags, though...)

  • Different typographic conventions in different languages: There is no such thing as "bold" in Chinese, so for that locale it makes more sense to use a different color. But for other languages, bold works fine. Some languages might even have color-coding conventions (or acquire them over time), where orange is used for one thing and red is used for something else. Semantic tagging makes it easy to produce documents for any convention, in any locale.

  • Translation control: The existence of semantic tags makes it possible to identify terms that shouldn't be translated, both for translators and translation memory systems--for example, product names.
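
To make the link-insertion idea concrete, here is a minimal sketch that builds the raw material for a troubleshooting guide: a map from each message number to the topics that mention it. It assumes messages are tagged with an element named msgnum and that each topic carries an id attribute; the topic ids and message numbers are invented for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def index_messages(topics):
    """Map each message number to the ids of the topics that mention it."""
    index = defaultdict(list)
    for xml_text in topics:
        root = ET.fromstring(xml_text)
        topic_id = root.get("id")
        # Assumes messages are marked up as <msgnum>...</msgnum>.
        for msg in root.iter("msgnum"):
            number = (msg.text or "").strip()
            if number and topic_id not in index[number]:
                index[number].append(topic_id)
    return dict(index)

# Hypothetical topics, invented for this example.
TOPICS = [
    '<topic id="install"><body><p>If <msgnum>E100</msgnum> appears, retry.</p></body></topic>',
    '<topic id="upgrade"><body><p><msgnum>E100</msgnum> or <msgnum>E205</msgnum> may appear.</p></body></topic>',
]

for number, topic_ids in sorted(index_messages(TOPICS).items()):
    print(number, "->", ", ".join(topic_ids))
```

None of this works if the message numbers are merely wrapped in <tt>, because a script cannot tell them apart from filenames or commands.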

To that list, I would add the following:

  • Intelligent search and replace: You can change every occurrence of a term in menu items, for example, without touching the same term where it occurs in a filename.

  • Automated processing: If you have a part-number specialization, it becomes possible to populate your parts list automatically from a database. Alternatively, the editable document could become the gold standard, and the database could be populated from that.
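
As a small sketch of the search-and-replace point, the function below renames a term only inside elements with a given tag and leaves the same string alone everywhere else. The element names uicontrol and filepath and the sample sentence are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

def replace_in_tag(xml_text, tag, old, new):
    """Replace `old` with `new` only in the direct text of `tag` elements."""
    root = ET.fromstring(xml_text)
    for elem in root.iter(tag):
        if elem.text:
            elem.text = elem.text.replace(old, new)
    return ET.tostring(root, encoding="unicode")

# Hypothetical sample: the same word appears as a menu item and in a filename.
SAMPLE = (
    "<p>Choose <uicontrol>Export</uicontrol> to write "
    "<filepath>Export.log</filepath>.</p>"
)

print(replace_in_tag(SAMPLE, "uicontrol", "Export", "Publish"))
```

The sketch only touches text that sits directly inside the matching element, so text in nested child elements would need extra handling.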

Those are the reasons that have surfaced for semantic tagging, so far. What do you think? Are there other important reasons to add to the list? Or is it pretty complete? Most importantly, is the additional work justified by the potential returns, in your experience?



You've caught most of the ones I can think of, with one exception: semantic tagging separates the display from the markup. If variable names, file names, and emphasized text are all marked up with <i>, and you decide later on that you want just the file names rendered in some other manner, a person will need to look at every instance of <i> to find which ones are file names and which aren't.

It is possible to go overboard, but if you keep the distinctions sane, I think the effort is worth it, and once an author gets used to semantic markup, I don't think there's a significant decrease in productivity during authoring.

All that said, I wonder why it was necessary to have markup like <b> and <i> in DITA in the first place? Seems like it would be best to do what DocBook does and just leave out the representational tags.

Posted by Dick Hamilton on April 14, 2008 at 04:38 AM PDT

There are some good reasons for semantic tagging, for sure. Another is to use guilabel so that strings in the GUI can be automatically checked for updates and accuracy.

But you can also go overboard. I think there should be a reason for tagging, rather than just tagging everything you can think of. I mention this because I have experienced that tendency, and it makes it almost impossible for writers to use tags consistently. It also makes it a pain to edit the XML. That can be particularly annoying when there is no reason for the semantic tags, even as a future implementation.

Posted by Ruth on March 01, 2010 at 12:19 AM PST
