Syndication, Aggregation & Protocols
By arungupta on Dec 20, 2006
Let's begin with the English meaning of the terms "syndication" and "aggregator".
Syndication means: "The act of syndicating a news feature by publishing it in multiple newspapers etc simultaneously"
Aggregator means: "An online feed reader, generally used for RSS or Atom feeds to keep track of updates to blogs, news sources, and other websites"
Any content on the web that changes frequently, or at irregular intervals, needs a mechanism to inform its audience about updates. RSS and Atom are XML formats designed to generate "syndicated feeds" that publish such frequently updated content. Each feed contains a title, a short summary, a link to the detailed entry, and metadata. The content could be either the entire website or, more interestingly, just a specific section of the website targeted towards a particular audience. The audience uses a "feed aggregator" to fetch the feeds, organize the results, and read the contents. The feed icon is the standard way to identify syndicated content. The XML format defined by RSS and Atom is really simple, which has led to its exponential growth (also 1, 2, 3) in recent years.
A detailed history of how RSS evolved over multiple versions, in the past 7 years, is available here. A concise history, with a table of the differences between RSS versions, is available here. RSS 2.0 is the most feature-rich version, and in that version the name stands for "Really Simple Syndication". It defines an XML format to publish frequently updated content of your website. A non-technical introduction to RSS explains how an RSS feed is generated. For example, the RSS feed for my blog is given below. The raw XML of this feed cannot be viewed directly in the browser (Firefox 1.5.x+ or IE6), because both apply a stylesheet that displays it nicely formatted as HTML. The XML data (as shown below) behind the feed can be viewed using the "View Source" option on the page.
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="http://blogs.sun.com/roller-ui/styles/rss.xsl" media="screen"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Miles to go ...</title>
    <link>http://blogs.sun.com/arungupta/</link>
    <atom:link rel="self" type="application/rss+xml" href="$url.feed.entries.rss($model.categoryPath, $model.excerpts)" />
    <description>Arun Gupta's Weblog</description>
    <language>en-us</language>
    <copyright>Copyright 2006</copyright>
    <lastBuildDate>Tue, 19 Dec 2006 11:10:03 -0800</lastBuildDate>
    <generator>Apache Roller (incubating) 3.2-dev(20061208101134:ag92114)</generator>
    <item>
      <guid isPermaLink="true">http://blogs.sun.com/arungupta/entry/running_san_francisco_marathon_2007</guid>
      <title>Running San Francisco Marathon 2007</title>
      <dc:creator>Arun Gupta</dc:creator>
      <link>http://blogs.sun.com/arungupta/entry/running_san_francisco_marathon_2007</link>
      <pubDate>Tue, 19 Dec 2006 10:34:09 -0800</pubDate>
      <category>Running</category>
      <category>marathon</category>
      <category>running</category>
      <description>As if one marathon was not
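To make the structure of such a feed concrete, here is a minimal sketch of how an aggregator might read it, using only Python's standard library. The `RSS_SAMPLE` string is a shortened, well-formed stand-in for my feed (the description text is a placeholder, not the real entry), and the `parse_rss` helper is a hypothetical name chosen for illustration.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Namespace used by the <dc:creator> extension element.
DC = {"dc": "http://purl.org/dc/elements/1.1/"}

# Shortened stand-in for the feed; the description is a placeholder.
RSS_SAMPLE = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Miles to go ...</title>
    <link>http://blogs.sun.com/arungupta/</link>
    <description>Arun Gupta's Weblog</description>
    <item>
      <guid isPermaLink="true">http://blogs.sun.com/arungupta/entry/running_san_francisco_marathon_2007</guid>
      <title>Running San Francisco Marathon 2007</title>
      <dc:creator>Arun Gupta</dc:creator>
      <link>http://blogs.sun.com/arungupta/entry/running_san_francisco_marathon_2007</link>
      <pubDate>Tue, 19 Dec 2006 10:34:09 -0800</pubDate>
      <category>Running</category>
      <category>marathon</category>
      <description>Placeholder summary text.</description>
    </item>
  </channel>
</rss>
"""


def parse_rss(xml_text):
    """Return the channel title, link, and a list of items as plain dicts."""
    channel = ET.fromstring(xml_text).find("channel")
    items = []
    for item in channel.findall("item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "author": item.findtext("dc:creator", namespaces=DC),
            # RSS 2.0 dates use the RFC 822 style, which email.utils can parse.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
            "categories": [c.text for c in item.findall("category")],
        })
    return {
        "title": channel.findtext("title"),
        "link": channel.findtext("link"),
        "items": items,
    }


if __name__ == "__main__":
    feed = parse_rss(RSS_SAMPLE)
    for entry in feed["items"]:
        print(entry["published"].isoformat(), entry["title"], entry["link"])
```

Note that the core RSS elements live in no namespace, while extensions such as `dc:creator` need an explicit namespace mapping.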
Over the history of RSS there were multiple versions (0.90, 0.91, 0.92, 0.93, 1.0, and 2.0), all of which had shortcomings and multiple incompatibilities. To overcome the political difficulties (different camps owned these versions, each claiming to be correct) and the technical ones, the Atom syndication format was published as an IETF "proposed standard" (IETF terminology defined by RFC 2026) in RFC 4287. Like RSS, Atom defines an XML format to publish frequently updated content of your website. For example, the Atom feed for my blog (viewed using the "View Source" option) looks like:
<?xml version="1.0" encoding='utf-8'?>
<?xml-stylesheet type="text/xsl" href="http://blogs.sun.com/roller-ui/styles/atom.xsl" media="screen"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title type="html">Miles to go ...</title>
  <subtitle type="html">Arun Gupta's Weblog</subtitle>
  <id>http://blogs.sun.com/arungupta/feed/entries/atom</id>
  <link rel="self" type="application/atom+xml" href="$url.feed.entries.atom($model.categoryPath, $model.excerpts)" />
  <link rel="alternate" type="text/html" href="http://blogs.sun.com/arungupta/" />
  <updated>2006-12-19T11:10:03-08:00</updated>
  <generator uri="http://rollerweblogger.org" version="3.2-dev(20061208101134:ag92114)">Apache Roller (incubating)</generator>
  <entry>
    <id>http://blogs.sun.com/arungupta/entry/running_san_francisco_marathon_2007</id>
    <title type="html">Running San Francisco Marathon 2007</title>
    <author><name>Arun Gupta</name></author>
    <link rel="alternate" type="text/html" href="http://blogs.sun.com/arungupta/entry/running_san_francisco_marathon_2007"/>
    <published>2006-12-19T10:34:09-08:00</published>
    <updated>2006-12-19T10:34:09-08:00</updated>
    <category term="/Running" label="Running" />
    <category term="marathon" scheme="http://rollerweblogger.org/ns/tags/" />
    <category term="running" scheme="http://rollerweblogger.org/ns/tags/" />
    <content type="html">As if one marathon was not
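Reading an Atom feed looks much the same, except that every element lives in the Atom namespace and the content carries an explicit `type`. The following is a minimal sketch using Python's standard library. `ATOM_SAMPLE` is a shortened, well-formed stand-in for my feed with placeholder content, and `parse_atom` and `alternate_link` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Shortened stand-in for the feed; the content text is a placeholder.
ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title type="html">Miles to go ...</title>
  <id>http://blogs.sun.com/arungupta/feed/entries/atom</id>
  <link rel="alternate" type="text/html" href="http://blogs.sun.com/arungupta/" />
  <updated>2006-12-19T11:10:03-08:00</updated>
  <entry>
    <id>http://blogs.sun.com/arungupta/entry/running_san_francisco_marathon_2007</id>
    <title type="html">Running San Francisco Marathon 2007</title>
    <author><name>Arun Gupta</name></author>
    <link rel="alternate" type="text/html" href="http://blogs.sun.com/arungupta/entry/running_san_francisco_marathon_2007"/>
    <published>2006-12-19T10:34:09-08:00</published>
    <updated>2006-12-19T10:34:09-08:00</updated>
    <category term="running" scheme="http://rollerweblogger.org/ns/tags/" />
    <content type="html">Placeholder content.</content>
  </entry>
</feed>
"""


def alternate_link(element):
    """Return the href of the first rel="alternate" link (the default rel)."""
    for link in element.findall("atom:link", ATOM_NS):
        if link.get("rel", "alternate") == "alternate":
            return link.get("href")
    return None


def parse_atom(xml_text):
    """Return the feed title and a list of entries as plain dicts."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        content = entry.find("atom:content", ATOM_NS)
        entries.append({
            "id": entry.findtext("atom:id", namespaces=ATOM_NS),
            "title": entry.findtext("atom:title", namespaces=ATOM_NS),
            "author": entry.findtext("atom:author/atom:name", namespaces=ATOM_NS),
            "link": alternate_link(entry),
            # Numeric-offset timestamps parse with fromisoformat (Python 3.7+);
            # a trailing "Z" needs Python 3.11+ or manual handling.
            "updated": datetime.fromisoformat(
                entry.findtext("atom:updated", namespaces=ATOM_NS)),
            # The type attribute tells the reader how to interpret the payload.
            "content_type": content.get("type", "text") if content is not None else None,
        })
    return {"title": root.findtext("atom:title", namespaces=ATOM_NS), "entries": entries}


if __name__ == "__main__":
    for e in parse_atom(ATOM_SAMPLE)["entries"]:
        print(e["updated"].isoformat(), e["title"], e["content_type"])
```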
A comprehensive comparison of Atom 1.0 and RSS 2.0 highlights the differences between the two formats. The key differences are given below:
- The biggest complaint about RSS is that the format is "lossy" and does not preserve the type of the data. Atom maintains the type and therefore allows a wider variety of payloads.
- The RSS 2.0 specification is copyrighted and frozen. Atom 1.0 is published as a "proposed standard" by the IETF and is extensible.
- RSS 2.0 feeds cannot be auto-discovered in a standardized way. Atom feeds can be auto-discovered using the IANA-registered MIME type application/atom+xml.
- RSS 2.0 provides no schema. Atom 1.0 includes a RELAX NG schema that allows the validity of the data to be checked.
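The auto-discovery point can be sketched in a few lines: an aggregator scans a page's HTML for `<link rel="alternate">` elements whose `type` is the registered Atom MIME type. This is only an illustration using Python's standard `html.parser`. The `AtomLinkFinder` class name, the sample page, and the `example.org` URL are all assumptions made for the example.

```python
from html.parser import HTMLParser

ATOM_MIME = "application/atom+xml"  # IANA-registered MIME type for Atom


class AtomLinkFinder(HTMLParser):
    """Collect hrefs of <link rel="alternate" type="application/atom+xml">."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively.
        rels = (a.get("rel") or "").lower().split()
        mime = (a.get("type") or "").strip().lower()
        if "alternate" in rels and mime == ATOM_MIME and a.get("href"):
            self.feeds.append(a["href"])


# Hypothetical page; the URLs are placeholders.
PAGE = """<html><head>
<title>Some weblog</title>
<link rel="stylesheet" type="text/css" href="/style.css">
<link rel="alternate" type="application/atom+xml" title="Atom" href="http://example.org/feed/atom" />
</head><body></body></html>"""

if __name__ == "__main__":
    finder = AtomLinkFinder()
    finder.feed(PAGE)
    print(finder.feeds)
```

A real aggregator would also resolve relative `href` values against the page URL, which this sketch leaves out.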
As is evident, Atom has some significant advantages over RSS 2.0 and is now more commonly used. For example, weblogs.java.net (based on Movable Type) and blogs.sun.com (based on Roller) both offer Atom 1.0 feeds. Bloglines, the most popular web-based aggregator, supports all the RSS and Atom formats. The known lists of Atom 1.0 consumers and Atom 1.0 feeds show the growing adoption of Atom 1.0.
In my next blog entry, I'll talk about Rome and how it makes it easy to work with most syndication formats in Java.