RDFa parser for Sesame

RDFa is the microformat-inspired standard for embedding semantic web relations directly into (X)HTML. It is being used more and more widely, and we are starting to have foaf+ssl annotated web pages, such as Alexandre Passant's home page. This is forcing me to update my foaf+ssl Identity Provider to support RDFa.

The problem was that I have been using Sesame as my semweb toolkit, and there was no RDFa parser for it. Luckily I found out that Damian Steer (aka Shellac) had written a SAX-based RDFa parser for the HP Jena toolkit, which he had put up on GitHub as the java-rdfa project. With a bit of help from Damian and the Sesame team, I adapted the code to Sesame, created a git fork of the initial project, and uploaded the changes to the bblfish java-rdfa git clone. Currently all but three of the 106 tests pass without problem.

To try this out, get git, Linus Torvalds' distributed version control system (read the book), and on a Unix system run:

$ git clone  git://github.com/bblfish/java-rdfa.git

This will download the whole history of changes of this project, so you will be able to see how I moved from Shellac's code to the Sesame rdfa parser. You can then parse Alex's home page, by running the following on the command line (thanks a lot to Sands Fish for the Maven tip in his comment to this blog):

$ mvn  exec:java -Dexec.mainClass="rdfa.parse" -Dexec.args="http://apassant.net/"

[snip output of sesame-java-rdfa compilation]

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix geo: <http://www.geonames.org/ontology/> .
@prefix rel: <http://purl.org/vocab/relationship/> .
@prefix cert: <http://www.w3.org/ns/auth/cert#> .
@prefix rsa: <http://www.w3.org/ns/auth/rsa#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .


<http://apassant.net/> <http://www.w3.org/1999/xhtml/vocab#icon> <http://apassant.net/misc/favicon.ico> ;
        <http://www.w3.org/1999/xhtml/vocab#stylesheet> <http://apassant.net/sites/apassant.net/files/css/css_84042a598208a6aade8783e8c2937a8c.css> , 
                     <http://apassant.net/sites/apassant.net/files/css/css_ba2732162a421c6422a6f5a68742254e.css> .

<http://apassant.net/#id> rdfs:label "About"@en .

<http://apassant.net/alex> a foaf:Person ;
        foaf:name "Alexandre Passant"@en ;
        foaf:workplaceHomepage <http://deri.ie> , 
                               <http://nuigalway.ie> ;
        foaf:schoolHomepage <http://paris-sorbonne.fr> , 
                            <http://dauphine.fr> ;
        foaf:topic_interest <http://dbpedia.org/page/Social_software_%28computer_software%29> ,
                            <http://dbpedia.org/resource/Semantic_Web> ;
        foaf:currentProject <http://www.w3.org/2009/sparql/wiki/> , 
                <http://www.w3.org/2005/Incubator/socialweb/> ;
        <http://purl.org/vocab/bio/0.1/olb> """
\\nDr. Alexandre Passant is a postdoctoral researcher at the Digital Enterprise Research Institute, National University
of Ireland, Galway. His research activities focus around the Semantic Web and Social Software: in particular, how these
fields can interact with and benefit from each other in order to provide a socially-enabled machine-readable Web,
leading to new services and paradigms for end-users. Prior to joining DERI, he was a PhD student at Université 
Paris-Sorbonne and carried out applied research work on \\"Semantic Web technologies for Enterprise 2.0\\" at
Electricité De France. He is the co-author of SIOC, a model to represent the activities of online communities on the
Semantic Web, the author of MOAT, a framework to let people tag their content using Semantic Web technologies, and
is also involved in various related applications as well as standardization activities.\\n"""@en ;
        foaf:based_near <http://dbpedia.org/resource/Galway> ;
        geo:locatedIn <http://dbpedia.org/resource/Galway> ;
        rel:spouseOf <http://julie.letierce.net/#id> ;
        foaf:holdsAccount <http://www.flickr.com/people/terraces/> ,
                          <http://www.linkedin.com/pub/alexandre-passant/1/797/1ab> ,
                          <http://last.fm/user/terraces> , 
                          <http://slideshare.net/terraces> , 
                          <http://twitter.com/terraces> .

<http://apassant.net/#cert> a rsa:RSAPublicKey ;
        cert:identity <http://apassant.net/alex> .

_:node14efunnjjx1 cert:decimal "65537"@en .

<http://apassant.net/#cert> rsa:public_exponent _:node14efunnjjx1 .

_:node14efunnjjx2 cert:hex "8af4cb6d6ec004bd28c08d37f63301a3e63ddfb812475c679cf073c4dc7328bd20dadb9654d4fa588f155ca05e7ca61a6898fbace156edb650d2109ecee65e7f93a2a26b3928d3b97feeb7aa062e3767f4fadfcf169a223f4a621583a7f6fd8992f65ef1d17bc42392f2d6831993c49187e8bdba42e5e9a018328de026813a9f"@en .

<http://apassant.net/#cert> rsa:modulus _:node14efunnjjx2 .

[snip]

This graph can then be queried with SPARQL and merged with other graphs. Just as it links to other resources, those can in turn link back to it, and to elements defined therein. As a result Alexandre Passant can use this in combination with an appropriate X509 certificate to log into foaf+ssl enabled web sites in one click, without needing to remember either a password or a URL.
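
To make the last step a little more concrete, here is a minimal sketch, using only the Java standard library, of how a foaf+ssl enabled site might turn the cert:hex modulus and cert:decimal exponent shown above into a Java RSA key and compare it with the key found in the client's X509 certificate. The class name WebIdKeySketch and its methods are hypothetical and are not part of java-rdfa or Sesame. Extracting the two literals from the graph (for example with a SPARQL query) and the rest of the foaf+ssl checks are left out.

```java
import java.math.BigInteger;
import java.security.KeyFactory;
import java.security.interfaces.RSAPublicKey;
import java.security.spec.RSAPublicKeySpec;

// Hypothetical helper, for illustration only.
final class WebIdKeySketch {

    // The cert:hex literal from the parser output above.
    public static final String EXAMPLE_MODULUS_HEX =
        "8af4cb6d6ec004bd28c08d37f63301a3e63ddfb812475c679cf073c4dc7328bd"
      + "20dadb9654d4fa588f155ca05e7ca61a6898fbace156edb650d2109ecee65e7f"
      + "93a2a26b3928d3b97feeb7aa062e3767f4fadfcf169a223f4a621583a7f6fd89"
      + "92f65ef1d17bc42392f2d6831993c49187e8bdba42e5e9a018328de026813a9f";

    /** Builds an RSA public key from the two literals found in the graph. */
    public static RSAPublicKey fromGraphLiterals(String hexModulus, String decimalExponent)
            throws Exception {
        // Assumption: pages may wrap or pad long hex strings, so drop whitespace.
        BigInteger modulus = new BigInteger(hexModulus.replaceAll("\\s+", ""), 16);
        BigInteger exponent = new BigInteger(decimalExponent.trim());
        KeyFactory factory = KeyFactory.getInstance("RSA");
        return (RSAPublicKey) factory.generatePublic(new RSAPublicKeySpec(modulus, exponent));
    }

    /** True if both keys have the same modulus and public exponent. */
    public static boolean sameKey(RSAPublicKey a, RSAPublicKey b) {
        return a.getModulus().equals(b.getModulus())
            && a.getPublicExponent().equals(b.getPublicExponent());
    }

    public static void main(String[] args) throws Exception {
        RSAPublicKey published = fromGraphLiterals(EXAMPLE_MODULUS_HEX, "65537");
        // In a real login, the second key would come from the client certificate,
        // e.g. (RSAPublicKey) x509Certificate.getPublicKey(). Here the published
        // key is simply rebuilt to show the comparison.
        RSAPublicKey presented = fromGraphLiterals(EXAMPLE_MODULUS_HEX, "65537");
        System.out.println("keys match: " + sameKey(published, presented));
    }
}
```

Comparing modulus and exponent as numbers, rather than comparing strings, sidesteps differences in case, whitespace or leading zeros between what the page publishes and what the certificate contains.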

Comments:

Just registered at Alexandre's site but given that that doesn't let me comment immediately I thought I'd mention the point I was going to make here instead as it's nearly as relevant.

Actually, his site doesn't 303 redirect his "WebId" to his home page as one might expect; it 301 Moved Permanently redirects. This seems to me to say Alexandre http://apassant.net/alex owl:sameAs his homepage http://apassant.net/ . He might not be a number but I don't think he wants to be a bunch of tag soup either. Does this mess things up in any way?

Posted by Ed Davies on September 09, 2009 at 03:15 PM CEST #

Hi Ed,

Thanks for the comment. There was indeed something broken in my Apache config file; it's now fixed to a proper 303.
The previous one indeed messed things up ...

Posted by Alex. on September 09, 2009 at 04:17 PM CEST #

As Alex says, his site is now fixed to 303 rather than 301.

My question still stands in general, though. Does this distinction actually make any difference to any known RDF software and Sesame in particular?

Posted by Ed Davies on September 09, 2009 at 05:57 PM CEST #

Great news for RDFa, and great news for me, because I just started to look at Sesame.

One minor nit-pick though -- RDFa was not inspired by Microformats. The first draft of RDF/XHTML was in 2003, and the first public presentation on it was 2004.

I don't mind the association -- I think Microformats is a great idea. :)

But RDFa always had as its goal support for *all* RDF features in HTML pages, which is very different to the goals of Microformats.

All the best,

Mark

Posted by Mark Birbeck on September 10, 2009 at 04:40 AM CEST #

What exactly do you want to do with Maven here?

Posted by Sands Fish on September 10, 2009 at 11:14 AM CEST #

Hi Sands,

I'd like to have a maven task that runs the rdfa.parse main method, and that allows the user to set command line arguments. Then someone could just download the code, and run something like

mvn parse http://apassant.net/

and get the triples from it. That would make it immediately useful. Perhaps this would be better done with an ant task that calls maven first to build the project?

Another thing I'd like to do is publish the jars on a maven repository. I was thinking of publishing it on

https://maven-repository.dev.java.net/

But to be able to publish it there I think it has to also be on the dev.java.net repository. I could create a space on https://sommer.dev.java.net I suppose for it. Perhaps there is a better place to publish it though. I suppose I would have to rename the packages to avoid collisions with Shellac's rdfa library for Jena.

Posted by Henry Story on September 10, 2009 at 11:23 AM CEST #

Hi Henry,

Running this quickly from mvn at the command-line should be pretty easy.

Something to the effect of:

mvn -e exec:java -Dexec.mainClass="rdfa.parse" -Dexec.args="http://my/rdfa/url"

(The -e is optional and turns on the verbose reporting of errors.)

As far as publishing Maven artifacts, I've never used the java.net system, so I'm not sure what challenges you'd have there, but you will need unique Maven coordinates. Currently, yours look like:

<groupId>net.rootdev</groupId>
<artifactId>java-rdfa</artifactId>
<packaging>jar</packaging>
<version>0.3-SNAPSHOT</version>

If http://rootdev.net is yours, you should be all set, as you typically publish with whatever groupId makes sense to identify your work using a reverse domain.

Good luck!

Posted by Sands Fish on September 10, 2009 at 11:46 AM CEST #
