ODI - Tip of the day, processing RSS from the web

ODI can easily process RSS data from the filesystem or the web. The ODI XML technology has a driver whose file specification supports filesystem paths, URLs and FTP, so processing RSS feeds is very simple. The reverse engineer can also generate a DTD based on the XML, so you don't even need a DTD or XML schema for the input stream. Converting JSON to XML and reverse engineering the result is also a simple way to derive a schema for data such as JSON, which you can then enrich in the ODI model. In the URL below, which I defined for an XML data server, I specify a BBC news RSS feed; I can then easily reverse engineer the schema into ODI.

  • jdbc:snps:xml?f=http://feeds.bbci.co.uk/news/rss.xml&s=BBC&dtd=d:/bbcrss.dtd
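The URL is the driver prefix `jdbc:snps:xml?` followed by `&`-separated properties: `f` for the XML location, `s` for the schema name and `dtd` for the DTD file. As a small illustration (the `XmlDataServerUrl` helper is hypothetical and not part of ODI), this Java assembles the same string from its parts, which makes it easier to see what to change when pointing at a different feed or DTD.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class XmlDataServerUrl {

    // Joins driver properties into "jdbc:snps:xml?key=value&key=value...".
    // Values are appended verbatim (no URL encoding), like the hand-written URL in the post.
    public static String build(Map<String, String> props) {
        StringJoiner url = new StringJoiner("&", "jdbc:snps:xml?", "");
        for (Map.Entry<String, String> e : props.entrySet()) {
            url.add(e.getKey() + "=" + e.getValue());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the properties in insertion order.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("f", "http://feeds.bbci.co.uk/news/rss.xml"); // XML location: file path or URL
        props.put("s", "BBC");                                  // schema name for the data server
        props.put("dtd", "d:/bbcrss.dtd");                      // DTD file on the local filesystem
        System.out.println(build(props));
    }
}
```

Running `main` prints the same URL as the bullet above; swapping the `dtd` value is all it takes to point at a different DTD file.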

When you use the selective reverse engineer, you can see the available datastores in the generated model; it's very simple.
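As a rough mental model, and an assumption rather than ODI's actual algorithm, an XML-to-relational mapping of this kind turns elements that contain other elements or carry attributes into datastores, and turns text-only child elements and attributes into columns. The hypothetical `DatastoreSketch` class below applies that rule to a tiny, made-up RSS sample using the JDK's DOM parser. ODI also adds generated key columns to link parent and child datastores, which this sketch leaves out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DatastoreSketch {

    // Returns element name -> candidate column names, for every element treated as a datastore.
    public static Map<String, Set<String>> infer(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Map<String, Set<String>> datastores = new TreeMap<>();
            walk(root, datastores);
            return datastores;
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
    }

    // Assumed rule: an element is a datastore if it has attributes or child elements.
    private static boolean isDatastore(Element e) {
        if (e.hasAttributes()) {
            return true;
        }
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }

    // Records e as a datastore, collects its columns, and recurses into nested datastores.
    private static void walk(Element e, Map<String, Set<String>> datastores) {
        Set<String> columns = datastores.computeIfAbsent(e.getTagName(), k -> new LinkedHashSet<>());
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            columns.add(attrs.item(i).getNodeName()); // attributes become columns
        }
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip text and whitespace nodes
            }
            Element child = (Element) kids.item(i);
            if (isDatastore(child)) {
                walk(child, datastores);         // nested structure: its own datastore
            } else {
                columns.add(child.getTagName()); // text-only element: a column of the parent
            }
        }
    }

    public static void main(String[] args) {
        String rss = "<rss version=\"2.0\"><channel><title>News</title>"
                + "<item><title>A</title><link>http://example.com/a</link></item>"
                + "</channel></rss>";
        infer(rss).forEach((name, cols) -> System.out.println(name + " " + cols));
    }
}
```

On the sample, `rss`, `channel` and `item` come out as datastores, which lines up with the ITEM datastore discussed below.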

To access this URL from within ODI Studio, I had to define my web proxy. In the ODI Studio UI you do this on the Tools->Preferences Web Browser and Proxy tab, where I defined the HTTP proxy and port number to use.
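The Studio preference covers connections made from the Studio UI. If the same data server is used from another JVM, such as an agent, one assumption worth verifying is that the XML driver fetches the URL through standard `java.net` networking; in that case the JVM's standard proxy properties apply. A minimal sketch with a hypothetical `FeedProxy` class and a placeholder proxy host (HTTPS feeds use the parallel `https.proxyHost` and `https.proxyPort` properties):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URLConnection;

public class FeedProxy {

    // JVM-wide proxy settings; equivalent to passing
    // -Dhttp.proxyHost=<host> -Dhttp.proxyPort=<port> on the java command line.
    public static void useHttpProxy(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", Integer.toString(port));
    }

    // Per-connection alternative that leaves global settings untouched. The proxy
    // address is resolved when the connection is made; openConnection itself sends nothing.
    public static URLConnection openVia(String feedUrl, String host, int port) throws IOException {
        Proxy proxy = new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
        return URI.create(feedUrl).toURL().openConnection(proxy);
    }

    public static void main(String[] args) {
        useHttpProxy("proxy.example.com", 8080); // hypothetical proxy host and port
        System.out.println(System.getProperty("http.proxyHost") + ":" + System.getProperty("http.proxyPort"));
    }
}
```

System properties affect every connection in the JVM, which matches how a single Studio preference behaves; the `Proxy` overload is the narrower option when only the feed should go through the proxy.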

Once reverse engineered, you can view the model in an ODI diagram and see the relationships and objects that were reversed. You can also get to work building interfaces and viewing the data from the feed. For example, the ITEM datastore holds the meat of the RSS news feed; here is a snippet of the content right now:

Unlike file-based XML data servers, the HTTP-based ones are read-only, but they let you integrate XML-oriented data streams such as RSS very easily.
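Outside ODI, a quick way to see what the ITEM datastore represents is to pull the same fields out of the feed yourself. The sketch below is hypothetical and uses only the JDK's DOM parser: each `<item>` element becomes one row holding three of the standard RSS 2.0 item elements, `title`, `link` and `pubDate`. The sample XML is made up; pointing it at a saved copy of a real feed lets you compare against what ODI shows.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class RssItems {

    // One row of the ITEM datastore, limited to three common RSS item fields.
    public record Item(String title, String link, String pubDate) {}

    public static List<Item> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("item");
            List<Item> rows = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element item = (Element) nodes.item(i);
                rows.add(new Item(text(item, "title"), text(item, "link"), text(item, "pubDate")));
            }
            return rows;
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("Could not parse RSS", ex);
        }
    }

    // Trimmed text of the first element with this name under the item, or null when
    // the element is missing (much like a NULL column value).
    private static String text(Element item, String name) {
        NodeList matches = item.getElementsByTagName(name);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String rss = "<rss version=\"2.0\"><channel><title>News</title>"
                + "<item><title>Headline one</title><link>http://example.com/1</link>"
                + "<pubDate>Fri, 15 Mar 2013 09:00:00 GMT</pubDate></item>"
                + "<item><title>Headline two</title><link>http://example.com/2</link></item>"
                + "</channel></rss>";
        parse(rss).forEach(System.out::println);
    }
}
```

Because the lookup is scoped to each `<item>`, the channel-level `<title>` is not mistaken for an item title.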

Comments:

For the RSS XML example in the post, use the DTD below rather than relying on ODI to generate the DTD.

Save the DTD at the URL below to your filesystem and reference it from your JDBC URL:
http://emacspeak.googlecode.com/svn/trunk/html/rss-0.91.dtd

Your JDBC URL should then reference the DTD, something like:
jdbc:snps:xml?f=http://feeds.bbci.co.uk/news/rss.xml&s=BBCT&dtd=d:/rss-0.91.dtd

Cheers
David

Posted by David on March 15, 2013 at 09:33 AM PDT

This treats RSS like any normal XML source, doesn't it? Is it possible to stream continuously or incrementally? Shouldn't RSS for ETL/ELT follow a pub/sub journalizing approach based on time?

Posted by guest on March 19, 2013 at 12:26 AM PDT

Reading from an HTTP stream is a little-known feature that I thought worth sharing, since a user had asked about it on the forum. RSS can be consumed in batch this way; it's similar to reading from an Oracle AQ table in batch rather than using a real-time consumer. OWB provides both approaches for AQ.

Cheers
David

Posted by David on March 19, 2013 at 08:31 AM PDT
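On the question of time-based, incremental consumption with the batch approach described in the reply: one common pattern (an assumption here, not something the post shows ODI doing) is to keep a high-water mark of the latest `pubDate` loaded and, on each run, keep only items newer than it. A minimal Java sketch with hypothetical `RssWatermark`, `FeedItem` and `Batch` types:

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class RssWatermark {

    public record FeedItem(String title, String pubDate) {}

    // The rows to load in this run, plus the watermark to save for the next run.
    public record Batch(List<FeedItem> rows, Instant watermark) {}

    // Keeps items published strictly after the previous watermark and moves the
    // watermark forward to the newest pubDate among them.
    public static Batch selectNew(List<FeedItem> feed, Instant lastWatermark) {
        List<FeedItem> rows = new ArrayList<>();
        Instant newest = lastWatermark;
        for (FeedItem item : feed) {
            // RSS pubDate uses RFC 822 style dates, e.g. "Fri, 15 Mar 2013 09:00:00 GMT".
            Instant published = ZonedDateTime
                    .parse(item.pubDate(), DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant();
            if (published.isAfter(lastWatermark)) {
                rows.add(item);
                if (published.isAfter(newest)) {
                    newest = published;
                }
            }
        }
        return new Batch(rows, newest);
    }

    public static void main(String[] args) {
        List<FeedItem> feed = List.of(
                new FeedItem("Older story", "Thu, 14 Mar 2013 18:00:00 GMT"),
                new FeedItem("Newer story", "Fri, 15 Mar 2013 09:00:00 GMT"));
        Instant lastRun = Instant.parse("2013-03-15T00:00:00Z"); // saved by the previous batch
        Batch batch = selectNew(feed, lastRun);
        System.out.println(batch.rows() + " -> next watermark " + batch.watermark());
    }
}
```

The strict comparison skips any item that shares the watermark's exact timestamp, so a sturdier version might also track item `guid` values. Java's `RFC_1123_DATE_TIME` accepts `GMT` and numeric offsets but not other zone abbreviations that some feeds use.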

About

Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products, including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality.
