Wednesday Aug 22, 2007

Front-End Quiz Part 6, Separating Content, Behavior, Presentation

Where/how is content best separated from behavior and presentation?

  1. On the client-side, using semantic HTML plus external JS and CSS files.

    This is where I set up camp. Server-side methods exist for separating content, presentation and behavior, but they have different ideas of what "presentation" and "behavior" mean. In terms of separating out the visual presentation and client behavior layer, CSS and unobtrusive JavaScript should be the first line of defense.

  2. On the server-side, using features of your rendering framework/MVC architecture.

    If, by "presention" you simply mean HTML code, and by "content" you mean information stored in the application, and by "behavior" you mean controller logic, then yes. On the other hand, if you're coding font tags, layout tables, spacer images, onclick="" and onload="" attributes directly into your JSPs or page renderers, then I think that's a serious red flag. With CSS handling the visual details and unobtrusive JavaScript handling client interactivity, your web renderer is free to focus on saying "now a heading," "now a list," "now a submit element," etc.

Thursday Jan 18, 2007

HTML class attributes aren't just for CSS

HTML's class attribute was meant to be used in several ways, not just as a CSS tie-in. According to the HTML spec, class attributes are a general-purpose semantic hint. Here's an example of such a usage:

<div class="calendar">

There are at least three ways classes are useful outside the CSS context:

  1. They make HTML structurally self-documenting. HTML is fairly generic. Classes allow you to specify the meaning of various parts of your document, making it easier for yourself and others to unravel its meaning in the future.

  2. They serve as triggers for unobtrusive JavaScript. Now that hard-coded event attributes are dying off, class attributes allow JavaScript to dynamically select and operate on various parts of the document.

  3. Classes are a key part of microformats, which allow online tools to extract meaning from HTML code. Being plain old snippets of HTML, microformats are readable by humans using browsers. But because they're built in a standardized way, they can also be understood by software.

Tuesday Jan 02, 2007

How this Blog Design was Created

I decided to go the total web-standards-nerd route in creating this blog design. Semantic HTML, source-order, accessibility, and unobtrusive JavaScript were all factors from the outset.

The first thing I did was to customize the "Basic" theme. This gave me a starting point. I removed all presentational information, starting with the CSS code. Then I went through the HTML and removed presentational hints. For example, layout tables, <center> tags, &nbsp; characters, etc. Even things like class="left-bar" had to go. "Left" is too presentational, and I relaced it with "sidebar," since it might be argued that "sidebar" is a semantic hint. (Yes, that's how much of a nerd I am.) Fortunately there wasn't much presentation baked into the HTML to start with, because most Roller themes are pretty good about separating content and style.

Next to go was JavaScript. I had just removed the presentational layer, and now it was time to remove the behavioral layer. These layers would be added back in later via external files, once the HTML-only blog was proved working.

Wherever possible, I made the HTML as semantic as possible; selecting and organizing tags elements by meaning and structure. Source-order sensibilities were used, placing more important things higher in the source code. Headings/sections were arranged such that if you removed all but the headings (<h1> - <h6>) you'd get an outline of the page. <p> was used to contain all logical fragments of text. <div> was used to delimit sections. <ul> was used to build lists. It turns out, semantic HTML is really pretty brainless and straightforward, once you yank out presentational noise.

Once the blog was purified, it was boring-looking, but usable. No columns, just black Times New Roman text that strung all the way across the browser screen on a white background, with blue hyperlinks. It was nekkid! Time for some clothes. An empty CSS file was <link>ed to the blog, which I began populating, resulting in the current design.

As you may know, it's easy to do CSS design in Firefox, Opera and Safari, all of which understand and render CSS beautifully. But then you inevitably have to do battle with IE. Most designers approach the IE problem by peppering their CSS file with hacks. Hacks are oddly structured CSS commands that, due to IE bugs, either can't be understood by IE or can only be understood by IE. Instead of doing this, I used conditional comments to load corrective stylesheets into just IE. Conditional comments are an IE-only feature that look like regular HTML <!-- --> comments to real HTML parsers, but have special meaning to IE. This way, I was able to quarantine IE-specific bigfixes to a couple small, IE-specific files.

Thursday Dec 08, 2005

HTML as Self-Describing Data

Despite Sun's status as technology powerhouse and innovater of Java, there are a few software concepts that, culturally, we've failed to catch on to. Here's one of them. Over the past five years I've seen numerous attempts to build a system that represents web content structurally. For example, I once helped in converting documents to a custom XML schema for a web-based CMS. During the process I discovered that their XML schema basically duplicated HTML. The intent was apparently to build the one true web document format that avoided HTML's presentational pitfalls.

The problem was this: HTML already is a structural language, but it's been hacked and kludged by web designers for so long that programmers think of it as a presentational language. Rather than simply correcting this faulty notion in their own heads, they made everything more complicated by building a replacement for HTML and then building an XSL system to translate between them.

What they should have realized is that HTML isn't limited to desktop-browser-screen-based views into data; it can be a source of data. In other words, if written succinctly, HTML is a perfectly decent document schema. (Bracing self for a storm of WTFs from DocBook enthusiasts.) Two advantages of this are performance and simplicity. Documents need minimal processing/transformation before being delivered over the web.

Points of Contention

I can see all kinds of "yeah, but" reactions to this, so I'll debunk some of the main ones now.

Point of contention: HTML is SGML, not XML. We need an XML format.

Response: You can use XHTML, which is an exact reformulation of HTML as XML.

Point of contention: We already use XHTML, but it's corrupt as a document format because our pages don't validate.

Response: Whose fault is that? If you produce XHTML, as with any XML, you need to make sure it's well-formed and valid. It really isn't that hard. Tools like Tidy even exist that automatically fix well-formedness and validation errors in HTML documents.

Point of contention: We have no choice but to use HTML presentationally. For web documents to achieve real-world visual design standards, things like layout tables and spacer images are needed. HTML needs to be twisted and misused, therefore destroying its value as a structural schema.

Response: If you believe this, chances are you're not familiar with the advances in CSS over the last five years. CSS takes away the need to write HTML presentationally. As of 12/2005, the vast majority of browsers in use support CSS well enough to achieve real-world visual design standards. Even if CSS were not enough, XHTML documents can be transformed into any format you want via XSL.

Point of contention: HTML isn't rigorous enough and has all sorts of lame elements like <font>.

Response: Most of the lameness was weeded out and/or deprecated in HTML 4. For best results, ban deprecated elements and use one of these doctypes: HTML 4.01 strict, XHTML 1.0 strict, or XHTML1.1.

Point of contention: HTML still isn't semantic. <p>, <h1>, <ul>, etc. have no real meaning, and elements like <br> and <hr> are especially presentational.

Response: First of all this confuses semantic/non-semantic with specific/generic. HTML is an ultra-generic language by design, which is a good thing. It's what makes HTML flexible and useful in so many different contexts. This is hard for a traditional XML programmer to wrap his head around, because he's used to specific elements like <recipe> and <ingredient>. Second, many in the W3C contend that <br> and <hr> were mistakes. Others maintain that they're useful in a structural document, and even have semantic meaning. In the design of XHTML 2 (the next version of XHTML, still in draft) <hr> has been replaced by the more semantic <separator> element. <br> is thought by many to have semantic meaning and is included in the draft, but the <l> element has also been introduced to represent a line of poetry, for example, and thus alleviate the need for <br>. Similar debates also swirl around <b>, <i> and <u>. In any case you're free to ignore elements you don't like.

Point of contention: We can't use HTML because we need proprietary hooks in our document schema, or constructs such as XLink.

Response: That's what namespaces are for. You can import the HTML namespace into a document, our you can import your own proprietary schema into your HTML docs. With XSL, it's easy and efficient to operate on this type of document to produce pure HTML for web delivery. Furthermore, The latest version of XHTML (1.1) is "modularized" so that it's possible to import pertinent modules of HTML into your schema.

Point of contention: But it wouldn't make sense for the canonical version of a document to be the same document we send to browsers, would it? It can't be served without being transformed and massaged on the server side, can it?

Response: Why, because software engineers fear simplicity? Okay, maybe your documents need server-side treatment such as adding company headings, lists of global company links, etc., before they're sent to a customer's web browser. The point is that much of the complexity of dealing with and maintaining HTML can be factored out of existence by simply updating your understanding of the language and pushing the HTML schema further back into your data model.


My name is Greg Reimer and I'm a web technologist for the Sun.COM web design team.


« February 2017