A SAX Parser Based on JavaScript's String.replace() Method?

I've often wished browsers would offer native SAX implementations. SAX is lightweight and fast. Not only that, SAX is easy because it lets you ignore what's not interesting, unlike DOM, where you have to traverse the whole mess and keep it hanging around in memory. SAX also uses callback functions, which any JavaScript programmer should feel comfortable with.

A native SAX implementation in JavaScript would, for example, let you grab data from RSS feeds over Ajax without loading the entire RSS document into a DOM tree. Or, assuming your XHTML is well-formed, it would let you rapidly query the current document (although it wouldn't be able to return references to existing DOM nodes).

Reading "Search and Don't Replace" over at John Resig's blog got me wondering whether you could use that technique as the basis for a SAX parser in JavaScript. Of course there's nothing stopping you from building a SAX parser from scratch in JavaScript, but (methinks) the string tokenizer part of it would be a bit of a beast. However, by taking advantage of the optimization built into JavaScript's RegExp replacement engine, you might just be able to work a nice little souped-up tokenizing engine out of the deal.
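To make the idea concrete, here's a minimal sketch of the trick: call replace() with a global regex and a callback, throw away the string it returns, and let the callback collect the tokens. The names TOKEN and tokenize are made up for this illustration (they are not from sax.js), and the pattern assumes there's no '>' inside attribute values and no comments or CDATA sections.

```javascript
// Hypothetical names for illustration only; not taken from sax.js.
// The pattern matches either a whole tag (<...>) or a run of text between tags.
var TOKEN = /<[^>]*>|[^<]+/g;

function tokenize(xml) {
    var tokens = [];
    // The string returned by replace() is discarded. Only the callback's
    // side effect (collecting each match in order) matters here.
    xml.replace(TOKEN, function (match) {
        tokens.push(match);
        return match;
    });
    return tokens;
}

var demoTokens = tokenize('<a href="x">hi</a>');
// demoTokens is ['<a href="x">', 'hi', '</a>']
```

The regex engine does the scanning, so the JavaScript side only has to decide what each token means.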

So I thought I'd give it a try. What I came up with is nowhere near anything resembling a real-world, validating XML parser. All it knows how to deal with are elements, text nodes and character entities. Not all of the error messages are as helpful as a real-world implementation's should be, and I'm sure there are plenty of bugs, since I banged this out in less than an afternoon. But it ran like scalded cats on a 422 KB file. (YMMV depending on what browser you use.) Try it out on this simple XML fragment:

The demo on this page loads the parser from http://blogs.sun.com/greimer/resource/sax.js and defines two string helpers that the text callback relies on:

// Trim leading and trailing whitespace.
String.prototype.strip=function(){return this.replace(/^\s+|\s+$/g, "");};
// Trim, then collapse each run of whitespace into sp (a single space by default).
// Note: in ES2015+ engines this overrides the built-in Unicode String.prototype.normalize.
String.prototype.normalize=function(sp){
    sp=(!sp && sp!=='')?' ':sp;
    return this.strip().replace(/\s+/g,sp);
};

The SAX function looks like this:

doSax(stringToParse,doStartTag,doEndTag,doAttribute,doText);

The callback functions for this example are:

function doStartTag(name){alert("opening tag: "+name);}
function doEndTag(name){alert("closing tag: "+name);}
function doAttribute(name,val){alert("attribute: "+name+'="'+val+'"');}
function doText(str){
    str=str.normalize();
    if(!str){str='[whitespace]';}
    alert("encountered text node: "+str);
}
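For a feel of how a function with that signature might dispatch to the callbacks, here's a rough sketch. It is not the code in sax.js: sketchSax, TOKEN and ATTR are hypothetical names, and calling the attribute callback right after the start-tag callback is my own assumption. It also assumes well-formed input with no comments, CDATA, processing instructions, or '>' inside attribute values, and it passes entities through undecoded.

```javascript
// Hypothetical stand-in for illustration; NOT the sax.js implementation.
function sketchSax(xml, doStartTag, doEndTag, doAttribute, doText) {
    // Either a tag (optional leading "/", name, attribute text, optional
    // trailing "/") or a run of text between tags.
    var TOKEN = /<(\/?)([A-Za-z_:][\w:.\-]*)([^>]*?)(\/?)>|([^<]+)/g;
    // name="value" or name='value'
    var ATTR = /([A-Za-z_:][\w:.\-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;

    xml.replace(TOKEN, function (match, closing, name, rest, selfClosing, text) {
        if (text) {
            doText(text);
        } else if (closing) {
            doEndTag(name);
        } else {
            doStartTag(name);
            // Same trick again, this time over the inside of the tag.
            rest.replace(ATTR, function (m, attrName, dq, sq) {
                doAttribute(attrName, dq || sq || '');
                return m;
            });
            // Treat <br/> as a start tag immediately followed by an end tag.
            if (selfClosing) { doEndTag(name); }
        }
        return match;
    });
}

// Usage: record events in an array instead of calling alert().
var events = [];
sketchSax('<item id="1">Hello <b>world</b><br/></item>',
    function (name) { events.push('start:' + name); },
    function (name) { events.push('end:' + name); },
    function (name, val) { events.push('attr:' + name + '=' + val); },
    function (str) { events.push('text:' + str); });
```

Note that nothing here checks that tags are balanced or reports errors. A real parser would at least keep a stack of open element names.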

And here's the code: sax.js. I think that with a little work (i.e. the ability to handle namespaces, comments, and other declarations) this could potentially be usable, maybe not as a full-fledged SAX parser, but as a quick-and-dirty utility for reading XML via Ajax. Hmm, a tag-soup SAX-style parser might be nice to have too.

Comments:

Nice job!

Posted by Jose on April 06, 2008 at 11:55 PM MDT

About

My name is Greg Reimer and I'm a web technologist for the Sun.COM web design team.
