Syndication, Aggregation & Protocols

In my previous Web 2.0-related blogs, I talked about What is Web 2.0?, What is AJAX? and AJAX: jMaki Framework. Switching gears, this blog will talk about another technology that enables the principles of Web 2.0, i.e. RSS/Atom.

Let's begin with the English meaning of the terms "syndication" and "aggregator".

Syndication means:  "The act of syndicating a news feature by publishing it in multiple newspapers etc simultaneously"

Aggregator means: "An online feed reader, generally used for RSS or Atom feeds to keep track of updates to blogs, news sources, and other websites"

Any content on the web that changes frequently or at irregular intervals needs a mechanism to inform its audience about the updates. RSS and Atom are XML formats designed to generate "syndicated feeds" that publish such frequently updated content. Each feed entry contains a title, a short summary, a link to the detailed entry, and metadata. The feed could cover the entire website or, more interestingly, just a specific section of the website targeted towards an audience. The audience of the content uses a "feed aggregator" to fetch the feeds, organize the results, and read the contents. The feed icon is the standard way to identify syndicated content. The XML format defined by RSS and Atom is really simple, which has led to its exponential growth (also 1, 2, 3) in recent years.
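To make the aggregator's job concrete, here is a minimal sketch of the "fetch, organize, read" steps in Python. It is an illustration under stated assumptions, not how any particular aggregator works: the two feeds (FEED_A, FEED_B), their example.com links, and the helper names read_items and aggregate are all made up for this example, and the feeds are inline strings instead of documents downloaded over HTTP.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two tiny, made-up RSS 2.0 feeds standing in for documents that a real
# aggregator would download over HTTP.
FEED_A = """<rss version="2.0"><channel>
  <title>Feed A</title>
  <item><title>A1</title><link>http://example.com/a1</link>
        <pubDate>Tue, 19 Dec 2006 10:34:09 -0800</pubDate></item>
  <item><title>A2</title><link>http://example.com/a2</link>
        <pubDate>Sun, 17 Dec 2006 08:00:00 -0800</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel>
  <title>Feed B</title>
  <item><title>B1</title><link>http://example.com/b1</link>
        <pubDate>Mon, 18 Dec 2006 09:15:00 -0800</pubDate></item>
</channel></rss>"""


def read_items(feed_xml):
    """Return one dict per <item>, tagged with the title of its feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    source = channel.findtext("title")
    return [
        {
            "source": source,
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # RSS dates use the RFC 822 style, which email.utils can parse.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        }
        for item in channel.findall("item")
    ]


def aggregate(feeds):
    """Merge the items of several feeds into one list, newest first."""
    items = [entry for feed in feeds for entry in read_items(feed)]
    return sorted(items, key=lambda e: e["published"], reverse=True)


timeline = aggregate([FEED_A, FEED_B])
for entry in timeline:
    print(entry["published"].date(), entry["source"], entry["title"])
```

A real aggregator would also remember which entries it has already shown, typically by keying on each item's guid or link.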

A detailed history of how RSS evolved over multiple versions in the past 7 years is available here. A concise history, with a table of differences between the RSS versions, is available here. RSS 2.0 is the most feature-rich version and stands for "Really Simple Syndication". It defines an XML format to publish frequently updated content of your website. A non-technical introduction to RSS explains how an RSS feed is generated. For example, an RSS feed for my blog is given below. The raw XML of this feed cannot be viewed directly in the browser (Firefox 1.5.x+ or IE6), because both apply a stylesheet that displays it as nicely formatted HTML. The XML data behind the feed (as shown below) can be viewed using the "View Source" option on the page.

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="http://blogs.sun.com/roller-ui/styles/rss.xsl" media="screen"?>
<rss version="2.0" 
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
 <title>Miles to go ...</title>
 <link>http://blogs.sun.com/arungupta/</link>
 <atom:link rel="self" type="application/rss+xml" href="$url.feed.entries.rss($model.categoryPath, $model.excerpts)" />
 <description>Arun Gupta&apos;s Weblog</description>
 <language>en-us</language>
 <copyright>Copyright 2006</copyright>
 <lastBuildDate>Tue, 19 Dec 2006 11:10:03 -0800</lastBuildDate>
 <generator>Apache Roller (incubating) 3.2-dev(20061208101134:ag92114)</generator>
 <item>
   <guid isPermaLink="true">http://blogs.sun.com/arungupta/entry/running_san_francisco_marathon_2007</guid>
   <title>Running San Francisco Marathon 2007</title>
   <dc:creator>Arun Gupta</dc:creator>
   <link>http://blogs.sun.com/arungupta/entry/running_san_francisco_marathon_2007</link>
   <pubDate>Tue, 19 Dec 2006 10:34:09 -0800</pubDate>
   <category>Running</category>
   <category>marathon</category>
   <category>running</category>
     <description>As if one marathon was not
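As a rough sketch of how a program might read such a feed, the Python snippet below parses a shortened, well-formed copy of the RSS sample above using only the standard library. The variable names (RSS_SAMPLE, NS, items) are mine, and the description element is left out because the sample above is cut off at that point.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the feed shown above; <description> is omitted because
# the original sample is truncated there.
RSS_SAMPLE = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>Miles to go ...</title>
 <link>http://blogs.sun.com/arungupta/</link>
 <item>
   <guid isPermaLink="true">http://blogs.sun.com/arungupta/entry/running_san_francisco_marathon_2007</guid>
   <title>Running San Francisco Marathon 2007</title>
   <dc:creator>Arun Gupta</dc:creator>
   <link>http://blogs.sun.com/arungupta/entry/running_san_francisco_marathon_2007</link>
   <pubDate>Tue, 19 Dec 2006 10:34:09 -0800</pubDate>
   <category>Running</category>
   <category>marathon</category>
 </item>
</channel>
</rss>"""

# Core RSS 2.0 elements live in no namespace; extension elements such as
# dc:creator need a prefix-to-URI mapping to be found.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

channel = ET.fromstring(RSS_SAMPLE).find("channel")
items = []
for item in channel.findall("item"):
    guid = item.find("guid")
    items.append({
        "title": item.findtext("title"),
        "author": item.findtext("dc:creator", namespaces=NS),
        "link": item.findtext("link"),
        "categories": [c.text for c in item.findall("category")],
        # isPermaLink="true" means the guid doubles as the entry URL.
        "guid_is_permalink": guid.get("isPermaLink") == "true",
    })

print(channel.findtext("title"))
for entry in items:
    print(entry["title"], "by", entry["author"], entry["categories"])
```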

During the history of RSS, there were multiple versions (0.90, 0.91, 0.92, 0.93, 1.0, and 2.0), all of which had shortcomings and multiple incompatibilities. To overcome the political difficulties (different camps owned these versions, each claiming to be correct) and the technical ones, the Atom syndication format was published as an IETF "proposed standard" (IETF terminology defined by RFC 2026) in RFC 4287. Like RSS, Atom also defines an XML format to publish frequently updated content of your website. For example, an Atom feed for my blog (viewed using the "View Source" option) looks like:

<?xml version="1.0" encoding='utf-8'?>
<?xml-stylesheet type="text/xsl" href="http://blogs.sun.com/roller-ui/styles/atom.xsl" media="screen"?><feed xmlns="http://www.w3.org/2005/Atom">
   <title type="html">Miles to go ...</title>
   <subtitle type="html">Arun Gupta&apos;s Weblog</subtitle>
   <id>http://blogs.sun.com/arungupta/feed/entries/atom</id>
   <link rel="self" type="application/atom+xml" href="$url.feed.entries.atom($model.categoryPath, $model.excerpts)" />
   <link rel="alternate" type="text/html" href="http://blogs.sun.com/arungupta/" />
   <updated>2006-12-19T11:10:03-08:00</updated>
   <generator uri="http://rollerweblogger.org" version="3.2-dev(20061208101134:ag92114)">Apache Roller (incubating)</generator>
    <entry>
       <id>http://blogs.sun.com/arungupta/entry/running_san_francisco_marathon_2007</id>
       <title type="html">Running San Francisco Marathon 2007</title>
       <author><name>Arun Gupta</name></author>
       <link rel="alternate" type="text/html" href="http://blogs.sun.com/arungupta/entry/running_san_francisco_marathon_2007"/>
       <published>2006-12-19T10:34:09-08:00</published>
       <updated>2006-12-19T10:34:09-08:00</updated>
       <category term="/Running" label="Running" />
       <category term="marathon" scheme="http://rollerweblogger.org/ns/tags/" />
       <category term="running" scheme="http://rollerweblogger.org/ns/tags/" />
       <content type="html">As if one marathon was not
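The same kind of sketch for Atom, again in Python with only the standard library, looks slightly different because every Atom element lives in the http://www.w3.org/2005/Atom namespace. The snippet uses a shortened, well-formed copy of the sample above (the truncated content element is left out); the names ATOM, ATOM_SAMPLE, and entries are my own.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Prefix mapping for lookups; the prefix "a" is arbitrary.
ATOM = {"a": "http://www.w3.org/2005/Atom"}

# Shortened copy of the feed shown above; <content> is omitted because the
# original sample is truncated there.
ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title type="html">Miles to go ...</title>
  <id>http://blogs.sun.com/arungupta/feed/entries/atom</id>
  <link rel="alternate" type="text/html" href="http://blogs.sun.com/arungupta/" />
  <updated>2006-12-19T11:10:03-08:00</updated>
  <entry>
    <id>http://blogs.sun.com/arungupta/entry/running_san_francisco_marathon_2007</id>
    <title type="html">Running San Francisco Marathon 2007</title>
    <author><name>Arun Gupta</name></author>
    <link rel="alternate" type="text/html" href="http://blogs.sun.com/arungupta/entry/running_san_francisco_marathon_2007"/>
    <published>2006-12-19T10:34:09-08:00</published>
    <updated>2006-12-19T10:34:09-08:00</updated>
    <category term="/Running" label="Running" />
    <category term="marathon" scheme="http://rollerweblogger.org/ns/tags/" />
  </entry>
</feed>"""

feed = ET.fromstring(ATOM_SAMPLE)
entries = []
for entry in feed.findall("a:entry", ATOM):
    title = entry.find("a:title", ATOM)
    # An entry may carry several links; rel="alternate" is the readable page.
    alternate = entry.find("a:link[@rel='alternate']", ATOM)
    entries.append({
        "title": title.text,
        # Atom states how the text is encoded; "text" is the default type.
        "title_type": title.get("type", "text"),
        "author": entry.findtext("a:author/a:name", namespaces=ATOM),
        "link": alternate.get("href"),
        # Atom timestamps follow RFC 3339, which fromisoformat accepts here.
        "published": datetime.fromisoformat(
            entry.findtext("a:published", namespaces=ATOM)),
        "terms": [c.get("term") for c in entry.findall("a:category", ATOM)],
    })

for e in entries:
    print(e["published"].isoformat(), e["title"], e["terms"])
```

Note how the type attribute and the typed link and category elements carry information that the corresponding RSS elements leave implicit.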

A comprehensive comparison of Atom 1.0 and RSS 2.0 highlights the differences between the two formats. The key differences are given below:

  • The biggest complaint about RSS is that the format is "lossy" and does not preserve the type of data. Atom records the type of its content and therefore allows a wider variety of payloads.
  • The RSS 2.0 specification is copyrighted and frozen. Atom 1.0 is published as a "proposed standard" by the IETF and is extensible.
  • Auto-discovery of RSS 2.0 feeds has never been standardized, and the commonly used MIME type application/rss+xml is not IANA-registered. Atom feeds can be auto-discovered using the IANA-registered MIME type application/atom+xml.
  • RSS 2.0 has no schema. Atom 1.0 includes a RELAX NG schema that allows checking the validity of data.
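To illustrate the auto-discovery point in the list above, here is a hedged Python sketch of how a client might look for feed links in a page's HTML by MIME type. The page markup, the example.com URLs, and the FeedLinkFinder class are invented for this example. The application/rss+xml type is included only because many sites use it by convention; unlike application/atom+xml, it is not IANA-registered.

```python
from html.parser import HTMLParser

# application/atom+xml is IANA-registered; application/rss+xml is accepted
# here only as a widespread convention (an assumption of this sketch).
FEED_TYPES = {"application/atom+xml", "application/rss+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect (mime type, href) pairs from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        if "alternate" in rels and mime in FEED_TYPES and attr.get("href"):
            self.feeds.append((mime, attr["href"]))


# Made-up page; a real client would download the HTML first.
PAGE = """<html><head>
<title>Miles to go ...</title>
<link rel="stylesheet" type="text/css" href="/style.css">
<link rel="alternate" type="application/atom+xml" title="Atom" href="http://example.com/feed/atom">
<link rel="alternate" type="application/rss+xml" title="RSS" href="http://example.com/feed/rss" />
</head><body></body></html>"""

finder = FeedLinkFinder()
finder.feed(PAGE)
for mime, href in finder.feeds:
    print(mime, href)
```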

As is evident, Atom has some significant advantages over RSS 2.0 and is now more commonly used. For example, weblogs.java.net (based on Movable Type) and blogs.sun.com (based on Roller) both offer Atom 1.0 feeds. Bloglines, the most popular web-based aggregator, supports all the RSS and Atom formats. A known list of Atom 1.0 consumers and Atom 1.0 feeds shows the growing adoption of Atom 1.0.

Blogging, news content syndication, and podcasting are the most common uses of syndication/aggregation.

In my next blog, I'll talk about Rome and how it makes it easy to work in Java with most syndication formats.

Technorati: Blogging Syndication Aggregation Feeds RSS Atom Web 2.0 Technology

Comments:

I thought that my php-generated ATOM and RSS feeds were perfect and then today the RSS feed failed to validate because I didn't have an atom:link self reference.

Where the heck did that come from? Who knows. And where was the documentation that would tell me what to do? Nowhere ...

until I stumbled on this blog with its clear example of including the atom namespace.

Thank you

Posted by George Fisher on October 15, 2007 at 08:24 AM PDT #

About

Arun Gupta is a technology enthusiast, a passionate runner, author, and a community guy who works for Oracle Corp.

