RSS: One stylesheet to rule them all?

Bloglines and RSS parsing

As you probably know, I've been exploring the Bloglines services as an easy way to build just another RSS reader. The fact is that I have found no services to handle channels. That is, you cannot programmatically add, remove or update channels. Too bad. :-(

So, unless I'm completely wrong, we'll have to parse RSS ourselves.

First approach: Rome

The obvious first candidate is, of course, Rome, Sun's open source project for RSS parsing.

But, to be honest, Rome is too big for my liking, too powerful for my modest needs. It depends on JDOM (which, luckily, seems to depend on nothing else). Rome is very good at parsing all sorts of RSS flavors in detail. All I want is a least common denominator of the different RSS flavors. RSS 0.91, RSS 0.92, RSS 1.0, RSS 2.0 and Atom 1.0 would be good enough, I think.

Furthermore, I'd like to do it myself so as to exercise the Java XML APIs. After all, I need to keep up with those things; it's a sort of hobby of mine ;-).

SAX Parsing: the cost of speed

So my first attempt went for speed, using SAX parsers. I've built a SAX handler that is chained with a different SAX handler for each flavor of RSS.

And, to be honest, it works well: SAX parsers consume little memory and run really fast.

But after the whole experiment, those SAX handlers looked to me quite similar to AWK scripts. That is, they're difficult to maintain. I don't think I'll be able to modify those RSS handlers myself, say, one month from now.
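
To give an idea of what that chaining looks like, here is a minimal sketch. It is not my actual code: the class names (FeedDispatcher, TitleHandler) are made up for this example, it only collects item titles, and it assumes a non-namespace-aware parser and the conventional "rdf" prefix for RSS 1.0. The first handler looks at the root element and hands every event over to a flavor-specific handler.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical dispatcher: picks a flavor-specific handler from the root element. */
class FeedDispatcher extends DefaultHandler {

    /** Collects the <title> of every item; the item element name depends on the flavor. */
    static class TitleHandler extends DefaultHandler {
        private final String itemElement;
        private final List<String> titles;
        private boolean inItem;
        private StringBuilder text; // non-null only while inside an item's <title>

        TitleHandler(String itemElement, List<String> titles) {
            this.itemElement = itemElement;
            this.titles = titles;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if (qName.equals(itemElement)) {
                inItem = true;
            } else if (inItem && qName.equals("title")) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // SAX may deliver text in several chunks, so accumulate them.
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if (qName.equals(itemElement)) {
                inItem = false;
            } else if (text != null && qName.equals("title")) {
                titles.add(text.toString().trim());
                text = null;
            }
        }
    }

    private final List<String> titles = new ArrayList<>();
    private DefaultHandler delegate;

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (delegate == null) {
            // The first element seen is the root; it tells us the flavor.
            switch (qName) {
                case "rss":      // RSS 0.91, 0.92, 2.0
                case "rdf:RDF":  // RSS 1.0 (assumes the usual "rdf" prefix)
                    delegate = new TitleHandler("item", titles);
                    break;
                case "feed":     // Atom 1.0
                    delegate = new TitleHandler("entry", titles);
                    break;
                default:
                    throw new SAXException("Unknown feed flavor: " + qName);
            }
        }
        delegate.startElement(uri, local, qName, atts);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        delegate.characters(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        delegate.endElement(uri, local, qName);
    }

    /** Parses a feed held in a string and returns its item titles. */
    static List<String> itemTitles(String xml) {
        FeedDispatcher dispatcher = new FeedDispatcher();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), dispatcher);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not a parseable feed", e);
        }
        return dispatcher.titles;
    }
}
```

Calling FeedDispatcher.itemTitles(xml) returns the titles in document order. Even in this toy version the state flags (inItem, text) hint at the maintenance problem: every new field to extract adds more of them, once per flavor.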

XSL to the rescue?

So I thought, why not use an XSLT stylesheet to extract information from those different RSS flavors (RSS 0.91, 0.92, 1.0, 2.0, Atom 1.0) and transform them into a single least-common-denominator XML file? Then with a single SAX parser I'd be able to parse all sorts of RSS flavors.

And, since the whole transformation is handled in an XML file (the XSLT stylesheet), it should be easier to maintain. Of course there's a cost for this ease of maintenance: we're losing speed.
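
As a rough illustration of the idea, here is a small sketch using the standard javax.xml.transform API. The embedded stylesheet is a made-up, cut-down one (it is not the stylesheet linked from this post): it only handles the rss/channel/item layout and Atom 1.0 entries, leaves RSS 1.0 out for brevity, and the output element names (items, item, title, link) as well as the class name FeedNormalizer are my own assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper: normalizes a feed into one simple XML vocabulary. */
class FeedNormalizer {

    // Cut-down, illustrative stylesheet: one template per input flavor,
    // all of them producing the same <item><title/><link/></item> output.
    static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + "    xmlns:atom='http://www.w3.org/2005/Atom'"
        + "    exclude-result-prefixes='atom'>"
        + "  <xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "  <xsl:template match='/'>"
        + "    <items>"
        + "      <xsl:apply-templates select='rss/channel/item | atom:feed/atom:entry'/>"
        + "    </items>"
        + "  </xsl:template>"
        + "  <xsl:template match='item'>"
        + "    <item>"
        + "      <title><xsl:value-of select='title'/></title>"
        + "      <link><xsl:value-of select='link'/></link>"
        + "    </item>"
        + "  </xsl:template>"
        + "  <xsl:template match='atom:entry'>"
        + "    <item>"
        + "      <title><xsl:value-of select='atom:title'/></title>"
        + "      <link><xsl:value-of select='atom:link/@href'/></link>"
        + "    </item>"
        + "  </xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to a feed held in a string and returns the normalized XML. */
    static String normalize(String feedXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(feedXml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Could not transform feed", e);
        }
    }
}
```

Whatever the input flavor, the result uses the same item, title and link elements, so a single, simple SAX handler is enough afterwards. This sketch builds the transformer on every call; TransformerFactory.newTemplates() would let the stylesheet be processed once and reused, which may recover part of the lost speed.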

I'm sort of tired now (it's difficult to write coherently, sorry about that), but I think a small diagram could explain things better. There we are:

I've been doing some tests and, well, there is some loss of speed. But I think it's worth it. Simpler, smaller, easier to maintain, no dependencies... I like it.

What do you think? Any suggestions? Any way to improve the XSLT stylesheet? (By the way, the XSLT stylesheet can be found here).

Cheers,
Antonio

Comments:

I like it. It follows the KISS principle. Is this the RSS Reader which is going to use the EventBus as we talked about previously?

Posted by codecraig on December 20, 2005 at 05:44 AM CET #

Yes!!

The fact is that December is a bad time to make progress on any project. Everybody wants to finish theirs and we're overloaded here!

So, as time and Santa permit, I'll try to advance the project. No hurry, though.

Posted by Antonio on December 21, 2005 at 03:12 AM CET #
