Tuesday Jan 02, 2007

How this Blog Design was Created

I decided to go the total web-standards-nerd route in creating this blog design. Semantic HTML, source-order, accessibility, and unobtrusive JavaScript were all factors from the outset.

The first thing I did was to customize the "Basic" theme. This gave me a starting point. I removed all presentational information, starting with the CSS code. Then I went through the HTML and removed presentational hints. For example, layout tables, <center> tags, &nbsp; characters, etc. Even things like class="left-bar" had to go. "Left" is too presentational, and I relaced it with "sidebar," since it might be argued that "sidebar" is a semantic hint. (Yes, that's how much of a nerd I am.) Fortunately there wasn't much presentation baked into the HTML to start with, because most Roller themes are pretty good about separating content and style.

Next to go was JavaScript. I had just removed the presentational layer, and now it was time to remove the behavioral layer. These layers would be added back in later via external files, once the HTML-only blog was proved working.

Wherever possible, I made the HTML as semantic as possible; selecting and organizing tags elements by meaning and structure. Source-order sensibilities were used, placing more important things higher in the source code. Headings/sections were arranged such that if you removed all but the headings (<h1> - <h6>) you'd get an outline of the page. <p> was used to contain all logical fragments of text. <div> was used to delimit sections. <ul> was used to build lists. It turns out, semantic HTML is really pretty brainless and straightforward, once you yank out presentational noise.

Once the blog was purified, it was boring-looking, but usable. No columns, just black Times New Roman text that strung all the way across the browser screen on a white background, with blue hyperlinks. It was nekkid! Time for some clothes. An empty CSS file was <link>ed to the blog, which I began populating, resulting in the current design.

As you may know, it's easy to do CSS design in Firefox, Opera and Safari, all of which understand and render CSS beautifully. But then you inevitably have to do battle with IE. Most designers approach the IE problem by peppering their CSS file with hacks. Hacks are oddly structured CSS commands that, due to IE bugs, either can't be understood by IE or can only be understood by IE. Instead of doing this, I used conditional comments to load corrective stylesheets into just IE. Conditional comments are an IE-only feature that look like regular HTML <!-- --> comments to real HTML parsers, but have special meaning to IE. This way, I was able to quarantine IE-specific bigfixes to a couple small, IE-specific files.

Wednesday Dec 14, 2005

On the Importance of URL Canonicity

When designing a linking schema between their pages, web developers often fail to take into account the importance of URL canonicity. For example, when implementing a tabbed navigation design, it's tempting to switch the tab based on a querystring such as page.jsp?tab=1. Similarly, a CMS might use a URL like showArticle.jsp?article=123456. Also, most webservers accept two versions of a directory index URL, for example /foo/ and /foo/index.jsp. Finally, It's common for a website to exist at both example.com and www.example.com.

All of these URLs create canonical issues, or ambiguity, regarding where exactly a piece of content lives. You might ask why there should be ambiguity when a given URL always gets you the same piece of content? In other words, if both /foo/ and /foo/index.jsp return the same content, what's the big deal? Well for starters there's the extra performance hit on your webserver because the browser cached one version and not the other. In fact, any caching system is impacted by non-canonical URLs.

Another problem with non-canonical URLs is in web analytics. Two versions of a URL might really be the same page, but a web analytics package doesn't know this unless you hard code a special rule about it somewhere. The fact is that, canonical or not, the world treats URLs as canonical, and so broken behavior results when they are not.

But what about querystring driven URL schemes? Isn't showArticle.jsp?article=123456 always the same? Strictly speaking, yes, so why should querystrings be bad for canonicity? When there are multiple variables inside a querystring, order generally doesn't matter. foo.jsp?cid=123&uid=456 is the same as foo.jsp?uid=456&cid=123, so in this sense canonicity is broken. But even if you take pains to ensure querystring values are always ordered consistently, the world at large doesn't know this. Querystring driven URLs are treated as ambiguous. For example, Google isn't as quick to index a querystring driven URL as it would be to index a URL that looks canonical.

Fortunately, options exist to make URLs more canonical. Most webservers have URL rewriting capability that can redirect to a given example.com URL from the corresponding www.example.com URL, or vice versa. And most webservers can be similarly configured with regard to the /foo/ vs. /foo/index.html issue.

For web applications where pages are dynamically assembled, it's possible to use path info instead of querystrings. Consider the following URLs:


Here, "articles" serves the same role as "showArticle.jsp." Correspondingly, "?article=123456" and "/123456.jsp" serve as the pointer to the content piece. The implementation of this is beyond the scope of this post, suffice it to say that the capability is built into most web application environments.

Finally, here are a couple of related links:


My name is Greg Reimer and I'm a web technologist for the Sun.COM web design team.


« July 2016