Wednesday Sep 09, 2009

RDFa parser for Sesame

RDFa is the microformat-inspired standard for embedding semantic web relations directly into (X)HTML. It is being used more and more widely, and we are starting to have foaf+ssl annotated web pages, such as Alexandre Passant's home page. This is forcing me to update my foaf+ssl Identity Provider to support RDFa.

The problem was that I have been using Sesame as my semweb toolkit, and there was no RDFa parser for it. Luckily I found out that Damian Steer (aka. Shellac) had written a SAX-based RDFa parser for the HP Jena toolkit, which he had put up on the java-rdfa github server. With a bit of help from Damian and the Sesame team, I adapted the code to Sesame, created a git fork of the initial project, and uploaded the changes to the bblfish java-rdfa git clone. Currently all but three of the 106 tests pass without problem.

To try this out, get git, Linus Torvalds' distributed version control system (read the book), and on a unix system run:

$ git clone  git://github.com/bblfish/java-rdfa.git

This will download the whole history of changes of this project, so you will be able to see how I moved from Shellac's code to the Sesame RDFa parser. You can then parse Alex's home page by running the following on the command line (thanks a lot to Sands Fish for the Maven tip in his comment to this blog):

$ mvn  exec:java -Dexec.mainClass="rdfa.parse" -Dexec.args="http://apassant.net/"

[snip output of sesame-java-rdfa compilation]

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix geo: <http://www.geonames.org/ontology/> .
@prefix rel: <http://purl.org/vocab/relationship/> .
@prefix cert: <http://www.w3.org/ns/auth/cert#> .
@prefix rsa: <http://www.w3.org/ns/auth/rsa#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .


<http://apassant.net/> <http://www.w3.org/1999/xhtml/vocab#icon> <http://apassant.net/misc/favicon.ico> ;
        <http://www.w3.org/1999/xhtml/vocab#stylesheet> <http://apassant.net/sites/apassant.net/files/css/css_84042a598208a6aade8783e8c2937a8c.css> , 
                     <http://apassant.net/sites/apassant.net/files/css/css_ba2732162a421c6422a6f5a68742254e.css> .

<http://apassant.net/#id> rdfs:label "About"@en .

<http://apassant.net/alex> a foaf:Person ;
        foaf:name "Alexandre Passant"@en ;
        foaf:workplaceHomepage <http://deri.ie> , 
                               <http://nuigalway.ie> ;
        foaf:schoolHomepage <http://paris-sorbonne.fr> , 
                            <http://dauphine.fr> ;
        foaf:topic_interest <http://dbpedia.org/page/Social_software_%28computer_software%29> ,
                            <http://dbpedia.org/resource/Semantic_Web> ;
        foaf:currentProject <http://www.w3.org/2009/sparql/wiki/> , 
                <http://www.w3.org/2005/Incubator/socialweb/> ;
        <http://purl.org/vocab/bio/0.1/olb> """
\\nDr. Alexandre Passant is a postdoctoral researcher at the Digital Enterprise Research Institute, National University
of Ireland, Galway. His research activities focus around the Semantic Web and Social Software: in particular, how these
fields can interact with and benefit from each other in order to provide a socially-enabled machine-readable Web,
leading to new services and paradigms for end-users. Prior to joining DERI, he was a PhD student at Université 
Paris-Sorbonne and carried out applied research work on \\"Semantic Web technologies for Enterprise 2.0\\" at
Electricité De France. He is the co-author of SIOC, a model to represent the activities of online communities on the
Semantic Web, the author of MOAT, a framework to let people tag their content using Semantic Web technologies, and
is also involved in various related applications as well as standardization activities.\\n"""@en ;
        foaf:based_near <http://dbpedia.org/resource/Galway> ;
        geo:locatedIn <http://dbpedia.org/resource/Galway> ;
        rel:spouseOf <http://julie.letierce.net/#id> ;
        foaf:holdsAccount <http://www.flickr.com/people/terraces/> ,
                          <http://www.linkedin.com/pub/alexandre-passant/1/797/1ab> ,
                          <http://last.fm/user/terraces> , 
                          <http://slideshare.net/terraces> , 
                          <http://twitter.com/terraces> .

<http://apassant.net/#cert> a rsa:RSAPublicKey ;
        cert:identity <http://apassant.net/alex> .

_:node14efunnjjx1 cert:decimal "65537"@en .

<http://apassant.net/#cert> rsa:public_exponent _:node14efunnjjx1 .

_:node14efunnjjx2 cert:hex "8af4cb6d6ec004bd28c08d37f63301a3e63ddfb812475c679cf073c4dc7328bd20dadb9654d4fa588f155ca05e7ca61a6898fbace156edb650d2109ecee65e7f93a2a26b3928d3b97feeb7aa062e3767f4fadfcf169a223f4a621583a7f6fd8992f65ef1d17bc42392f2d6831993c49187e8bdba42e5e9a018328de026813a9f"@en .

<http://apassant.net/#cert> rsa:modulus _:node14efunnjjx2 .

[snip]

This graph can then be queried with SPARQL, merged with other graphs, and just as it links to other resources, those can in turn link back to it, and to elements defined therein. As a result Alexandre Passant can then use this in combination with an appropriate X509 certificate to log into foaf+ssl enabled web sites in one click, without needing to either remember a password or a URL.
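To give an idea of how one might use the parser programmatically, here is a minimal sketch against Sesame's standard Repository API. The Sesame classes are real; how the java-rdfa port registers itself with Sesame's Rio parser registry is my assumption, so treat the format lookup as illustrative:

import java.net.URL;

import org.openrdf.query.QueryLanguage;
import org.openrdf.query.TupleQueryResult;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.rio.RDFFormat;
import org.openrdf.sail.memory.MemoryStore;

public class RdfaDemo {
    public static void main(String[] args) throws Exception {
        Repository repo = new SailRepository(new MemoryStore());
        repo.initialize();
        RepositoryConnection con = repo.getConnection();

        // assumption: the java-rdfa port has registered a Rio parser for XHTML
        URL page = new URL("http://apassant.net/");
        con.add(page, page.toString(), RDFFormat.forMIMEType("application/xhtml+xml"));

        // ask the graph for the names of all the foaf:Persons it describes
        TupleQueryResult result = con.prepareTupleQuery(QueryLanguage.SPARQL,
                "PREFIX foaf: <http://xmlns.com/foaf/0.1/> " +
                "SELECT ?name WHERE { ?p a foaf:Person; foaf:name ?name }").evaluate();
        while (result.hasNext()) {
            System.out.println(result.next().getValue("name"));
        }
        con.close();
    }
}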

Friday Jul 24, 2009

How to write a simple foaf+ssl authentication servlet

After having set up a web server so that it listens to an https socket that accepts certificates signed by any Certification Authority (CA) (see the Tomcat post), we can write a servlet that uses these retrieved certificates to authenticate the user. I will detail one simple way of doing this here.

Retrieving the certificate from the servlet

In Tomcat-compatible servlets it is possible to retrieve the certificates used in a connection with the following code:

import java.io.IOException;
import java.security.cert.X509Certificate;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

protected void doGet(HttpServletRequest request, HttpServletResponse response)
             throws ServletException, IOException {
       //...
       // the certificate chain sent by the client, or null if none was sent
       X509Certificate[] certificates = (X509Certificate[]) request
                       .getAttribute("javax.servlet.request.X509Certificate");
       //...
 }

Verifying the WebId

This can be done very easily by using a class such as DereferencingFoafSslVerifier (see source), available as a maven project from the so(m)mer repository (in the foafssl/ directory).

Use it like this:

  Collection<? extends FoafSslPrincipal> verifiedWebIDs = null;

  try {
     FoafSslVerifier FOAF_SSL_VERIFIER = new DereferencingFoafSslVerifier();
     // foafSslCertificate is the client's certificate, extracted as shown above
     verifiedWebIDs = FOAF_SSL_VERIFIER.verifyFoafSslCertificate(foafSslCertificate);
  } catch (Exception e) {
     redirect(response,...); //redirect appropriately
     return;
  }

If the certificate is authenticated by the WebId, you will then end up with a collection of FoafSslPrincipals, which can be used as an identifier for the user who just logged in. Otherwise you should redirect the user to a page enabling him to log in with either OpenId or the usual username/password pair, or point him to a page such as this one where he can get a foaf+ssl certificate.
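Putting the pieces together, a complete doGet method might look something like the following sketch. The /login page is a hypothetical fallback, and I am assuming the verifier can be handed the first certificate of the chain directly - check the linked source for the exact signature:

protected void doGet(HttpServletRequest request, HttpServletResponse response)
             throws ServletException, IOException {
       X509Certificate[] certificates = (X509Certificate[]) request
                       .getAttribute("javax.servlet.request.X509Certificate");
       if (certificates == null || certificates.length == 0) {
           // no client certificate: offer OpenId, username/password,
           // or a page where the user can create a foaf+ssl certificate
           response.sendRedirect("/login"); // hypothetical fallback page
           return;
       }
       try {
           FoafSslVerifier verifier = new DereferencingFoafSslVerifier();
           // the client's own certificate comes first in the chain
           Collection<? extends FoafSslPrincipal> webIds =
                   verifier.verifyFoafSslCertificate(certificates[0]);
           // the verified WebIds now identify the logged-in user
           request.getSession().setAttribute("webids", webIds);
       } catch (Exception e) {
           response.sendRedirect("/login"); // verification failed
       }
}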

For a complete example application that uses this code, have a look at the Identity Provider Servlet, which is running at https://foafssl.org/srv/idp (note this servlet was trying to create a workaround for an iPhone bug. Ignore that code for the moment).

Todo

The current library is too simple and has a few gaping usability holes. Some of the most evident are:

  • No support for RDFa or Turtle formats.
  • The Sesame RDF framework/database should be run as a service, so that it can be queried directly by the servlet. Currently the data gathered from the foaf file is lost as soon as the FOAF_SSL_VERIFIER.verifyFoafSslCertificate(foafSslCertificate); method returns. This is ok for an Identity Provider Servlet, but not for most other servers. A Java/RDF mapper such as the So(m)mer mapper would then make it easy for Java programmers to use the information in the database to personalize the site with the information given in the foaf file.
  • Develop an access control library that makes it easy to specify, declaratively, which resources can be accessed by which groups of users. It would be useful for example to be able to specify that a number of resources can be accessed by friends of someone, or friends of friends of someone, or family members, ....

But this is good enough to get going. If you have suggestions on the best way to architect some of these improvements so that we have a more flexible and powerful library, please contact me. I welcome all contributions. :-)

Thursday Jul 23, 2009

How to setup Tomcat as a foaf+ssl server

foaf+ssl is a standards based protocol enabling one click identification/authentication to web sites, without requiring the user to enter either a username or a password. It can be used as a global distributed access control mechanism. It works with current browsers. It is RESTful, working with Linked Data and especially linked foaf files, thereby enabling distributed social networks.

I will show here what is needed to get foaf+ssl working for Tomcat 6x. The general principles are documented on the Tomcat ssl howto page, which should be used for detailed reference. Here I will document the precise setup needed for foaf+ssl. If you want to play with this protocol quickly without bothering with this procedure I recommend using the foaf+ssl Identity Provider service which you can point to on your web pages, and which will then redirect your users to the service of your choosing with the URLEncoded WebId of your visitor.

foaf+ssl works by having the server request a client certificate on an https connection. The server therefore needs an https end point which can be specified in Tomcat by adding the following connector to the conf/server.xml file:

<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
           maxThreads="50" scheme="https" secure="true"
           sslProtocol="TLS"/>
Note: the default https port is 443, but binding to a port below 1024 requires root privileges, which is why the example above uses 8443.
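One common workaround, on Linux at least, is to keep Tomcat on 8443 and redirect the privileged port to it at the firewall level. A sketch (the exact firewall setup will vary with your system):

# redirect incoming traffic for port 443 to Tomcat's 8443
$ sudo iptables -t nat -A PREROUTING -p tcp --dport 443 -j REDIRECT --to-port 8443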

Servers authenticate themselves by sending the client a certificate signed by a well known Certificate Authority (CA) whose public key is shipped in all browsers. Browsers use the public key to verify the signature sent by the server. If the server sends a certificate that is not signed by one of these CAs (perhaps it is self signed) then the web browser will usually display some pretty ugly error message, warning the user to stay clear of that site, with some complex way of bypassing the warning, which, if the user is courageous and knowledgeable enough, will allow him to add the certificate to a list of trusted certs. This warning will put most people off. It is best therefore to buy a CA certified cert. (I found one for €15 at trustico.) Usually the CAs will have very detailed instructions for installing the cert for a wide range of servers. In the case of Tomcat you will end up with the following additional property values:

<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
           maxThreads="50" scheme="https" secure="true"
           keystoreFile="conf/yourServerCert.kdb" 
               keystoreType="JKS" keystorePass="changeme" 
           sslProtocol="TLS"/>

And of course this requires placing the server cert file at the keystoreFile path.
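If your CA does not provide Tomcat-specific instructions, the JDK's keytool can create the keypair and import the signed certificate into a JKS keystore. The alias and file names below are illustrative (on older JDKs -genkeypair is spelled -genkey):

# generate the server keypair in the keystore
$ keytool -genkeypair -alias tomcat -keyalg RSA \
      -keystore conf/yourServerCert.kdb -storepass changeme

# import the certificate reply signed by the CA under the same alias
$ keytool -importcert -alias tomcat -file yourServerCert.crt \
      -keystore conf/yourServerCert.kdb -storepass changeme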

There are usually two ways for the server to respond to the client not sending a (valid) certificate. Either it can simply fail, or it can allow the server app to decide what to do. Automatic failure is not a good option, especially for a login service, as the user will then be confronted with a blank page. Much better is to allow the server to redirect the user to another page explaining how to get a certificate and giving him the option of authentication using OpenId or simply the well known username/password pattern. To enable Tomcat to respond this way you need to add the clientAuth="want" attribute value pair:

<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
           maxThreads="50" scheme="https" secure="true"
           keystoreFile="conf/yourServerCert.kdb" 
               keystoreType="JKS" keystorePass="changeme" 
           sslProtocol="TLS" clientAuth="want" />

Most Java web servers, on receiving a client certificate, attempt to validate it automatically by verifying that it is correctly signed by one of the CAs shipped with the Java Runtime Environment (JRE), that the cert is still valid, etc. As the SSL library that ships with the JRE does not implement foaf+ssl, we will need to do the authentication at the application layer, and we therefore need to bypass the default SSL implementation. To do this Bruno Harbulot put together the JSSLUtils library, available on Google Code. As mentioned on the JSSLUtils Tomcat documentation page, this will require you to place two jars in the Tomcat lib directory: jsslutils-0.5.1.jar and jsslutils-extra-apachetomcat6-0.5.2.jar (the version numbers may differ as the library evolves). You will also need to specify the SSLImplementation in the conf file as follows:

<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
           maxThreads="50" scheme="https" secure="true"
           keystoreFile="conf/yourServerCert.kdb" 
               keystoreType="JKS" keystorePass="changeme" 
           SSLImplementation="org.jsslutils.extra.apachetomcat6.JSSLutilsImplementation" 
           sslProtocol="TLS" clientAuth="want" />

Usually servers send the client a list of Distinguished Names of the certificate authorities (CAs) they trust, so that the client can filter from the certificates available in the browser those that match. Getting client certificates signed by CAs is a complex and expensive procedure, which in part explains why requesting client certificates is very rarely used: very few people have certificates signed by well known CAs. Instead, those services that rely on client certificates tend to sign those certificates themselves, becoming their own CA. This means that certificates end up being valid for only one domain. foaf+ssl bypasses this problem by accepting certificates signed by any CA, going so far as to allow even self signed certs. The server must therefore send an empty list of CAs, meaning that the browser can send any certificate (TLS 1.1). With the JSSLutils library available to Tomcat, this is specified in the conf/server.xml file with the acceptAnyCert="true" attribute:

<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
           maxThreads="50" scheme="https" secure="true"
           keystoreFile="conf/yourServerCert.kdb" 
               keystoreType="JKS" keystorePass="changeme" 
           SSLImplementation="org.jsslutils.extra.apachetomcat6.JSSLutilsImplementation"
           acceptAnyCert="true" sslProtocol="TLS" clientAuth="want" />

At this point you have set up your Tomcat server correctly. A user that arrives at your SSL endpoint and that has a couple of certificates will be asked to choose between them. Your servlet code can then extract the certificate with the following code:

       X509Certificate[] certificates = (X509Certificate[]) request
                       .getAttribute("javax.servlet.request.X509Certificate");

You can then use these certificates to extract and verify the WebId. I will write more about how to do this in my next blog post.

Wednesday Apr 29, 2009

Adding twitter to my blog using Scala

Having added javascript widgets to my blog a few months ago, I found that this slowed the page downloads a lot. Here is a way to speed this up again, by pre-processing the work with a Scala script, and using iFrames to include the result.

Here are the short steps to do this:

  1. I wrote a Scala Program (see source) to take the Twitter Atom feed, and generate xhtml from it (a rough sketch of what such a program might look like follows this list).
  2. I wrote a shell script to run the compiled scala jar
    #!/bin/bash
    
    export CP=$HOME/java/scala/lib/scala-library.jar:$HOME/java/scala/lib/learning.jar
    
    /usr/bin/java -cp $CP learning.BlogIFrame $*
    
  3. Then I just started a cron job on my unix server to process the script every half an hour
    $ crontab -l
    5,36 * * * * $HOME/bin/twitter.sh $HOME/htdocs/tmp/blogs.sun.com/tweets.html
    
  4. Finally I added the iFrame to my blog here pointing to the produced html <IFRAME src="http://bblfish.net/tmp/blogs.sun.com/tweets.html" height="300" frameborder="0"></IFRAME>
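Here, for the curious, is a rough sketch of what such a Scala program might look like. It is not the real source linked above: the feed URL, the object name, and the ten-entry cutoff are all illustrative.

import java.net.URL
import scala.xml.XML

object BlogIFrame {
  def main(args: Array[String]) {
    // fetch the Atom feed (illustrative URL) and keep the latest entries
    val feed = XML.load(new URL("http://twitter.com/statuses/user_timeline/bblfish.atom").openStream())
    val entries = (feed \ "entry").take(10)
    // turn each Atom entry's title into a list item
    val html =
      <html>
        <body>
          <ul>{ entries.map(e => <li>{ (e \ "title").text }</li>) }</ul>
        </body>
      </html>
    // write the xhtml to the file given as first argument, as the cron job expects
    XML.save(args(0), html)
  }
}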

As a result there is a lot less load on the twitter server - it only has to serve one atom feed every half an hour instead of 1000 or so a day - and my html blog page does not stall if the twitter site itself is overloaded.

Also I learnt a lot about Scala by doing this little exercise.

Friday Feb 13, 2009

I ♡ NetBeans 6.7 !

picture of NetBeans 7 Daily build 200902061401

As I was developing my recently released foaf+ssl server to demonstrate how to create distributed secure yet open social networks, I stumbled across a daily build of NetBeans 7 (build 200902061401), that is stable, beautiful and that has really helped me get my work done. NetBeans 7 is really going to rock!

Update: What I called NetBeans 7 is now called NetBeans 6.7.

Here is a list of some of the functionality that I really appreciated.

Maven Integration

Hallelujah for Maven Integration! I got going on my project by setting up a little Wicket project, which I easily adapted to include the Sesame semantic web libraries and more (view pom).

The nicest part of this is that it then becomes extremely easy to link the source and the javadoc together. Two commands, which should be integrated as menu options, have finally made it possible for me to work with NetBeans.


# get the javadoc
$ mvn dependency:resolve -Dclassifier=javadoc
# get sources
$ mvn dependency:sources

This simple thing just used to be a nightmare to do, especially as the number of jars one's project depended on increased. The Sesame group have split a lot of their jars up nicely, so that one could use a subset of them, but the way NetBeans was set up it became a real huge amazing astounding pain to link those to the source. And what is an open source IDE worth if it can't help you browse the source code and see its documentation easily?

Now I don't think Maven is in any way the final word in project deployment. My criticism, in short, is that it is not RESTful in a few ways, not least of which is that it fails to use URLs to name things and it makes the cache the central element. It is as if they had turned web architecture upside down, naming things by trying to identify the caches in which they were located rather than by their Universal Locator. My guess is that as a result things are a lot less flexible than they could be. As Roy Fielding pointed out recently, REST APIs must be hypertext driven. Software is located in a global information space, so there is no good reason in my opinion not to follow this precept.

Clearly though this is a huge, huge improvement!

A better file explorer

I have sworn a few times at the previous versions of the NB file manager! Even more so when I had to use it to tie the javadoc to the source code - at that point it became a scream. Finally we have a command line File Explorer with tab completion. This is so beautiful I have to take a picture of it: file explorer

We use the keyboard all the time, and one can get many things done much faster that way. Navigating the File System with a keyboard is just much nicer. So why oh why is it still impossible to use up and down arrow keys in the classic view when some files are greyed out? ( Writing this I noticed there seems to be no way to get back from the classic view to the new command line view - please make it possible to get back! )

GlassFish 3 Integration

Well it is a real pleasure to work with a web server that loads a war in half a second. I use hardly any of the J2EE features so it's a good thing those don't get loaded.

I tried the HTTP Server Monitor and that could be useful if it were more informative. In RESTful development it is really important to know the response codes - 303, etc. - so that one can follow the conversations between the client and the server. Currently that piece is trying to tie things up too much into baby steps: just as with the File Explorer there should be an easy UI into a feature and an advanced mode. I'd like to see the full, pure, unadulterated content going over the wire, highlighted perhaps to make it easy to find things. (It turns out this has been filed as feature request 36706.)

GlassFish integration really helped me get my develop and deploy my foaf+ssl service.

User Interface

As you can see from the main picture, the NetBeans UI seems to be going through a big transformation. Gone are some of the huge fat aqua buttons. The pieces are laid out in similar ways to NB 6.5, but this is a lot more elegant. A welcome change.

There is a very useful search bar at the top right of NB 7 now, which proved very helpful at finding documentation, maven repositories, and many other things; it came in handy a couple of times in my project.

One simple thing I would like would be to have a menu on each of the windows to open a file in its default OS viewer. So when I edit HTML which is a pleasure to do in NB, I would like to be able to quickly view that code in Firefox, Safari or Opera. Other XML files may have their default viewers, and so I think this is quite generalisable. In any case it should be easy to copy the file path of an open window, as one often has to do external processing of it. For files that are located on the internet, it would be great to be able to get their URL. This would help when chatting to people about source code one is working on for example.

Other

  • There are IntelliJ key bindings now. I really needed this a year or so ago, as I was switching between the IDEs. I have forgotten them now so it's less of a problem for me, but it will be very important for people switching between the IDEs.
  • I think this was part of NB6, but being able to browse the local history of source code is a really great feature. (I noticed that this does not diff html or xml for the moment)
  • Geertjan's Wicket integration Module partly works on this daily build. You may need to start off with NB 7 milestone 1 to get going, as it still seemed to be fully functional there.
  • I find this daily build needs restarting every day, as it seems to slow down after a while, perhaps using up a lot of memory.

Where is this going

Well those are the features that really stood out for me. And I am very happy to work with NB now.

I still think that the next big step, for NB 8 perhaps, should be the webification of the IDE. I think there is a huge amount to gain by applying Web Architecture principles to an IDE, and then the Net in NetBeans would fully reveal its meaning.

Thursday Feb 12, 2009

creating a foaf+ssl cert in a few clicks

In a previous blog I showed how to create a foaf+ssl cert manually. I have now put up a simple test server where you can do the same in a few clicks.

The test.foaf-ssl.org/cert service will add a certificate to your browser securely, and create a local account to which your certificate points. This account will itself point to your Web Id. An account in this scheme is nothing more than an RDF file, such as the ones listed in the certs directory. You can then log in, in one click, to other foaf+ssl services on the web, without needing to remember a URL. There is a small but growing number of prototype implementations listed on the foaf+ssl wiki.

All that remains to be done now is to create more interesting and valuable services using the distributed social networks and foaf+ssl. For some ideas on things to do consult the foaf+ssl Use Cases. The foaf protocols mailing list is a great place to get help on implementations and discuss ideas.

The server is written using Wicket and deployed on a GlassFish Application Server. The code is open source under a BSD licence. Hackers welcome!

Thursday Dec 04, 2008

JavaOne 2009 call for papers

Picture of JavaOne2008 keynote conference room

The JavaOne 2009 call for papers is now open (direct link to form). The deadline for paper submissions is December 19th.

Last year we had three Semantic Web related talks: one panel presentation, an introduction by Dean Allemang, and a small Birds of A Feather session. The talks went very well and were very well attended, surprisingly so given that they were somewhat in the wrong logical order, starting with the panel discussion, and ending with theory. Dean Allemang had over 300 attendees at his talk (slides). JavaOne is, compared to most developer conferences, huge. There are usually over 15 thousand attendees, so it is an excellent venue for speaking to and converting a very large crowd to something new in one go.

I don't expect us to grow at the same rate as we did last year (we had a 200% increase in the number of talks). But I think we really should fit in some presentations on Java Semantic Web Frameworks, such as Sesame, Mulgara, Jena, or something that gives an overview of all of them. But I am not here to decide what goes into these talks. The track to look at is probably the services track, which covers a huge swath from cloud computing to web 2.0, SOA and more.

Remember that JavaOne attendees are practical people most of all. There is also a very large space for businesses to introduce attendees to their products. So we are here at the point where research meets business.

I know this clashes with the 6th European Semantic Web Conference in Greece, so I myself may have to do the impossible task of being at both simultaneously. On the other hand it is only one week before the Semantic Technology Conference in San Jose, so it can be a good time to visit the Bay Area, and meet the companies here, or vacation in the sun. :-)

See: JavaOne2008 or JavaOne tagged photos on flickr.

Wednesday Sep 24, 2008

Serialising Java Objects to RDF with Jersey

Jersey is the reference implementation of JSR311 (JAX-RS) the Java API for RESTful Web Services. In short JSR311 makes it easy to publish graphs of Java Objects to the web, and implement update and POST semantics - all this using simple java annotations. It makes it easy for Java developers to do the right thing when writing data to the web.

JAX-RS deals with the mapping of java objects to a representation format - any format. Writing a good format, though, is at least as tricky as building RESTful services. So JAX-RS solves only half the problem. What is needed is to make it easy to serialize any object in a manner that scales to the web. For this we have RDF, the Resource Description Framework. By combining JAX-RS and So(m)mer's @rdf annotation one can remove the hurdle of having to create yet another format, and do this in a way that should be really easy to understand.

I have been wanting to demonstrate how this could be done, since the JavaOne 2007 presentation on Jersey. Last week I finally got down to writing up some initial code with the help of Paul Sandoz whilst in Grenoble. It turned out to be really easy to do. Here is a description of where this is heading.

Howto

The code to do this is available from the so(m)mer subversion repository, in the misc/Jersey directory. I will refer and link to the online code in my explanations here.

Annotate one's classes

First one needs to annotate one's model classes with @rdf annotations on the classes and fields. This is a way to give them global identifiers - URIs. After all if you are publishing to the web, you are publishing to a global context so global names are the only way to remove ambiguity. So for example our Person class can be written out like this:

@rdf(foaf+"Person")
public class Person extends Agent {
    static final String foaf = "http://xmlns.com/foaf/0.1/";

    @rdf(foaf+"surname") private Collection<String> surnames = new HashSet<String>();
    @rdf(foaf+"family_name") private Collection<String> familynames = new HashSet<String>();
    @rdf(foaf+"plan") private Collection<String> plans = new HashSet<String>();
    @rdf(foaf+"img") private Collection<URL> images = new HashSet<URL>();
    @rdf(foaf+"myersBriggs") private Collection<String> myersBriggss = new HashSet<String>();
    @rdf(foaf+"workplaceHomepage") private Collection<FoafDocument> workplaceHomePages = new HashSet<FoafDocument>();
    ....
}

This just requires one to find existing ontologies for the classes one has, or to publish new ones. (Perhaps this framework could be extended so as to automate the publishing of ontologies as deduced somehow from Java classes? Probably a framework that could be built on top of this.)

Map the web resources to the model

Next one has to find a mapping for web resources to objects. This is done by subclassing the RdfResource<T> template class, as we do three times in the Main class. Here is a sample:

 @Path("/person/{id}")
   public static class PersonResource extends RdfResource<Employee> {
      public PersonResource(@PathParam("id") String id) {
          t = DB.getPersonWithId(id);
      }
   }

This just tells Jersey to publish any Employee object on the server at the local /person/{id} url. When a request for some resource say /person/155492 is made, a PersonResource object will be created whose model object can be found by querying the DB for the person with id 155492. For this of course one has to somehow link the model objects ( Person, Office,... in our example ) to some database. This could be done by loading flat files, querying an ldap server, or an SQL server, or whatever... In our example we just created a simple hard coded java class that acts as a DB.

Map the Model to the resource

An object can contain pointers to other objects. In order for the serialiser to know what the URLs of objects are, one has to map model objects to web resources. This is done simply with the static code in the same Main class [looking for improvements here too]:

   static {
      RdfResource.register(Employee.class, PersonResource.class);
      RdfResource.register(Room.class, OfficeResource.class);
      RdfResource.register(Building.class, BuildingResource.class);              
   }

Given an object the serialiser (RdfMessageWriter) can then look up the resource URL pattern, and so determine the object's URL. So to take an example, consider an instance of the Room class. From the above map, the serialiser can find that it is linked to the OfficeResource class from which it can find the /building/{buildingName}/{roomId} URI pattern. Using that it can then call the two getters on that Room object, namely getBuildingName() and getRoomId() to build the URL referring to that object. Knowing the URL of an object means that the serialiser can stop its serialisation at that point if the object is not the primary topic of the representation. So when serialising /person/155492 the serialiser does not need to walk through the properties of /building/SCA22/3181. The client may already have that information and if not, the info is just a further GET request away.
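The snippet above does not show the lookup itself, but the standard JAX-RS UriBuilder gives an idea of how the URL construction might work. UriBuilder.fromResource is real JAX-RS API; the lookup method and the DB call are hypothetical stand-ins for the register shown above:

import java.net.URI;
import javax.ws.rs.core.UriBuilder;

// hypothetical sketch of the serialiser's URL construction
Room room = DB.getRoomWithId("SCA22", "3181");        // fetch a model object
Class<?> resource = RdfResource.lookup(Room.class);   // -> OfficeResource.class (hypothetical lookup)
// expand the @Path("/building/{buildingName}/{roomId}") template
// with the values returned by the object's getters, in order
URI url = UriBuilder.fromResource(resource)
                    .build(room.getBuildingName(), room.getRoomId());
// url is now /building/SCA22/3181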

Running it on the command line

If you have downloaded the whole repository you can just run

$ ant run
from the command line. This will build the classes, recompile the @rdf annotated classes, and start the simple web server. You can then just curl for a few of the published resources like this:
hjs@bblfish:0$ curl -i http://localhost:9998/person/155492
HTTP/1.1 200 OK
Date: Wed, 24 Sep 2008 14:37:38 GMT
Content-type: text/rdf+n3
Transfer-encoding: chunked

<> <http://xmlns.com/foaf/0.1/primaryTopic> </person/155492#HS> .
</person/155492#HS> <http://xmlns.com/foaf/0.1/knows> <http://www.w3.org/People/Berners-Lee/card#i> .
</person/155492#HS> <http://xmlns.com/foaf/0.1/knows> </person/528#JG> .
</person/155492#HS> <http://xmlns.com/foaf/0.1/birthday> "29_07" .
</person/155492#HS> <http://xmlns.com/foaf/0.1/name> "Henry Story" .

The representation returned is a not very elegant serialisation of the Turtle subset of N3. It makes the triple structure of RDF clear - subject, relation, object - and it uses relative URLs to refer to local resources. Other serialisers could be added, such as one for rdf/xml. See the todo list at the end of this article.

The representation says simply that this resource <> has as its primary topic the entity named by #HS in the document. That entity's name is "Henry Story" and it knows a few people, one of whom is referred to via the global URL http://www.w3.org/People/Berners-Lee/card#i, and another via the local URL /person/528#JG.

We can find out more about the /person/528#JG thing by making the following request:

hjs@bblfish:0$ curl -i http://localhost:9998/person/528#JG
HTTP/1.1 200 OK
Date: Wed, 24 Sep 2008 14:38:10 GMT
Content-type: text/rdf+n3
Transfer-encoding: chunked

<> <http://xmlns.com/foaf/0.1/primaryTopic> </person/528#JG> .
</person/528#JG> <http://xmlns.com/foaf/0.1/knows> </person/155492#HS> .
</person/528#JG> <http://xmlns.com/foaf/0.1/knows> </person/29604#BT> .
</person/528#JG> <http://xmlns.com/foaf/0.1/knows> <http://www.w3.org/People/Berners-Lee/card#i> .
</person/528#JG> <http://www.w3.org/2000/10/swap/pim/contact#office> </building/SCA22/3181#it> .
</person/528#JG> <http://xmlns.com/foaf/0.1/birthday> "19-05" .
</person/528#JG> <http://xmlns.com/foaf/0.1/name> "James Gosling" .

... where we find out that the resource named by that URL is James Gosling. We find that James has an office named by a further URL, which we can discover more about with yet another request


hjs@bblfish:0$ curl -i http://localhost:9998/building/SCA22/3181#it
HTTP/1.1 200 OK
Date: Wed, 24 Sep 2008 14:38:38 GMT
Content-type: text/rdf+n3
Transfer-encoding: chunked

<> <http://xmlns.com/foaf/0.1/primaryTopic> </building/SCA22/3181#it> .
</building/SCA22/3181#it> <http://www.w3.org/2000/10/swap/pim/contact#address> _:2828781 .
_:2828781 a <http://www.w3.org/2000/10/swap/pim/contact#Address> .
_:2828781 <http://www.w3.org/2000/10/swap/pim/contact#officeName> "3181" .
_:2828781 <http://www.w3.org/2000/10/swap/pim/contact#street> "4220 Network Circle" .
_:2828781 <http://www.w3.org/2000/10/swap/pim/contact#stateOrProvince> "CA" .
_:2828781 <http://www.w3.org/2000/10/swap/pim/contact#city> "Santa Clara" .
_:2828781 <http://www.w3.org/2000/10/swap/pim/contact#country> "USA" .
_:2828781 <http://www.w3.org/2000/10/swap/pim/contact#postalCode> "95054" .

Here we have a Location that has an Address. The address does not have a global name, so we give it a document-local name, _:2828781, and serialise it in the same representation, as shown above.

Because every resource has a clear hyperlinked representation we don't need to serialise the whole virtual machine in one go. We just publish something close to the Concise Bounded Description of the graph of objects.

Browsing the results

Viewing the data through a command line interface is nice, but it's not as fun as when viewing it through a web interface. For that it is best currently to install the Tabulator Firefox plugin. Once you have that you can simply click on our first URL http://localhost:9998/person/155492. This will show up something like this:

picture of tabualator on loading /person/155492

If you then click on JG you will see something like this:

picture tabulator showing James Gosling

This it turns out is a resource naming James Gosling. James knows a few people including a BT. The button next to BT is in blue, because that resource has not yet been loaded, whereas the resource for "Henry Story" has. Load BT by clicking on it, and you get

picture of tabulator showing Henry Story knowing Bernard Traversat

This reveals the information about Bernard Traversat we placed in our little Database class. Click now on the i and we get

Tabulator with info about Tim Berners Lee

Now we suddenly have a whole bunch of information about Tim Berners-Lee, including his picture, some of the people he has listed as knowing, where he works, his home page, etc... This is information we did not put in our Database! It's on the web of data.

One of the people Tim Berners-Lee knows is our very own Tim Bray.

tabulator showing info from dbpedia on Tim Bray

And you can go on exploring this data for an eternity. All you did was put a little bit of data on a web server using Jersey, and you can participate in the global web of data.

Todo

There are of course a lot of things that can be done to improve this Jersey/so(m)mer mapper. Here are just a few I can think of now:

  • Improve the N3 output. The code works with the examples but it does not deal well with all the literal types, nor does it yet deal with relations to collections. The output could also be more human readable by avoiding repetitions.
  • Refine the linking between model and resources. The use of getters sounds right, but it could be a bit fragile if methods are renamed....
  • Build serialisers for rdf/xml and other RDF formats.
  • Deal with publication of non information resources, such as http://xmlns.com/foaf/0.1/Person which names the class of Persons. When you GET it, it redirects you to an information resource:
    hjs@bblfish:1$ curl -i http://xmlns.com/foaf/0.1/Person
    HTTP/1.1 303 See Other
    Date: Wed, 24 Sep 2008 15:30:14 GMT
    Server: Apache/2.0.61 (Unix) PHP/4.4.7 mod_ssl/2.0.61 OpenSSL/0.9.7e mod_fastcgi/2.4.2 Phusion_Passenger/2.0.2 DAV/2 SVN/1.4.2
    Location: http://xmlns.com/foaf/spec/
    Vary: Accept-Encoding
    Content-Length: 234
    Content-Type: text/html; charset=iso-8859-1
    
    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <html><head>
    <title>303 See Other</title>
    </head><body>
    <h1>See Other</h1>
    <p>The answer to your request is located <a href="http://xmlns.com/foaf/spec/">here</a>.</p>
    </body></html>
    
    This should also be made easy and foolproof for Java developers.
  • make it industrial strength...
  • my current implementation is tied much too strongly to so(m)mer. The @rdf annotated classes get rewritten to create getters and setters for every field, which link to the sommer mappers. This is not needed here. All we need is to automatically add the RdfSerialiser interface to make it easy to access the private fields.
  • One may want to add support for serialising @rdf annotated getters
  • Add some basic functionality for POSTing to collections or PUTing resources. This will require some thought.


Wednesday Sep 17, 2008

Are OO languages Autistic?

illustration of a simple play

One important criterion of Autism is the failure to develop a proper theory of mind.

A standard test to demonstrate mentalizing ability requires the child to track a character's false belief. This test can be done using stories, cartoons, people, or, as illustrated in the figure, a puppet play, which the child watches. In this play, one puppet, called Sally, leaves her ball in her basket, then goes out to play. While she is out, naughty Anne moves the ball to her own box. Sally returns and wants to play with her ball. The child watching the puppet play is asked where Sally will look for her ball (where does Sally think it is?). Young children aged around 4 and above recognize that Sally will look in the basket, where she (wrongly) thinks the ball is.
Children with autism will tend to answer that Sally will look for the ball in the box.

Here are two further descriptions of autism from today's version of the Wikipedia article:

The main characteristics of autism are impairments in social interaction, impairments in communication, restricted interests and repetitive behavior.
Sample symptoms include lack of social or emotional reciprocity, stereotyped and repetitive use of language or idiosyncratic language, and persistent preoccupation with parts of objects.

In order to be able to have a mental theory one needs to be able to understand that other people may have a different view of the world. On a narrow three dimensional understanding of 'view', this reveals itself in that people at different locations in a room will see different things. One person may be able to see a cat behind a tree that will be hidden to another. In some sense though these two views can easily be merged into a coherent description. They are not contradictory. But we can do the same in higher dimensions. We can think of people as believing themselves to be in one of a number of possible worlds. Sally believes she is in a world where the ball is in the basket, whereas Ann believes she is in a world where the ball is in the box. Here the worlds are contradictory. They cannot both be true of the actual world.

To be able to make this type of statement one has to be able to do at least the following things:

  • Speak of ways the world could be
  • Refer to objects across these worlds
  • Compare these worlds
The ability to do this is present in none of the well known Object Oriented (OO) languages by default. One can add it, just as one can add garbage collection to C, but it requires a lot of discipline and care. It does not come naturally. Perhaps a bit like a person with Asperger's syndrome can learn to interact socially with others, but in a reflective awkward way.

Let us illustrate this with a simple example. Let us see how one could naively program the puppet play in Java. Let us first create the objects we will need:

Person sally = new Person("Sally");
Person ann = new Person("Ann");
Container basket = new Container("Basket");
Container box = new Container("Box");
Ball ball = new Ball("b1");
Container room = new Container("Room");
So far so good. We have all the objects. We can easily imagine code like the following to add the ball into the basket, and the basket into the room.
basket.add(ball);
room.add(basket);

Perhaps we have methods whereby the objects can ask what their container is. This would be useful for writing code to make sure that a thing could not be in two different places at once - in the basket and in the box, unless the basket was in the box.

Container c = ball.getImmediateContainer();
Assert.true(c == basket);
try {
      box.add(ball);
      Assert.fail();
} catch (InTwoPlacesException e) {
}

All that is going to be tedious coding, full of complicated issues of their own, but it's the usual stuff. Now what about the beliefs of Sally and Ann? How do we specify those? Perhaps we can think of sally and ann as being small databases of objects they are conscious of. Then one could just add them like this:

sally.consciousOf(basket,box,ball);
ann.consciousOf(basket,box,ball);

But the problem should be obvious now. If we move the ball from the basket to the box, the state of the objects in sally and ann's databases will be exactly the same! After all, they are the same objects!

basket.remove(ball);
box.add(ball);
Ball sb = sally.get(Ball.class,"b1");
Assert.true(box.contains(sb));
//that is because
Ball ab = ann.get(Ball.class,"b1");
Assert.true(ab==sb);

There is really no way to change the state of the ball for one person and not for the other... unless perhaps we give both people different objects. This means that for each person we would have to make a copy of all the objects that they could think of. But then we would have a completely different problem: namely deciding when these two objects were the same. For it is usually understood that the equality of two objects depends on their state. So one usually would not think that a physical object could be the same if it was in two different physical places. Certainly if we had a ball b1 in a box, and another ball b2 in a basket, then what on earth would allow us to say we were speaking of the same ball? Perhaps their name, if we could guarantee that we had unique names for things. But we would still have some pretty odd things going on then: we would have objects that would somehow be equal, but would be in completely different states! And this is just the beginning of our problems. Just think of the dangers involved here in taking an object from ann's belief database, and how easy it would be to allow it, by mistake, to be added to sally's belief store.

These are not minor problems. These are problems that have dogged logicians for the last century or more. To solve them properly one should look for languages that were inspired by the work of such logicians. The most serious such project is now known as the Semantic Web.

Using N3 notation we can write the state of affairs described by our puppet show, and illustrated by the above graph, out like this:

@prefix : <http://test.org/> .

:Ann :believes { :ball :in :box . } .
:Sally :believes { :ball :in :basket . } .

N3 comes with a special notation for grouping statements by placing them inside of { }. We could then easily ask who believes the ball is in the basket using SPARQL:

PREFIX : <http://test.org/>
SELECT ?who
WHERE {
     GRAPH ?g1 { :ball :in :basket }
     ?who :believes ?g1 .
}

The answer would bind ?who to :Sally, but not to :Ann.

RDF therefore gives us the basic tools to escape from the autism of simpler languages:

  • One can easily refer to the same objects across contexts, as URIs are the basic building block of RDF
  • The basic units of meaning are sets of relations - graphs - and these are formally described.
The above allows one to query for objects across contexts, and so to compare, merge and work with contexts.

It is quite surprising, once one realizes this, to think how many languages claim to be web languages and yet fail to have any default space for the basic building blocks of the web: URIs and the notion of different points of view. When one fetches information from a remote server one just has to take into account the fact that the server's view of the world may be different and incompatible in some respects with one's own. One cannot in an open world just assume that everybody agrees with everything. One is forced to develop languages that enable a theory of mind. A lot of failures in distributed programming can probably be traced down to working with tools that don't.

Of course tools can be written in OO languages to work with RDF. Very good ones have been written in Java, such as Sesame, making it possible to query repositories for beliefs across contexts (see this example). But they bring to bear concepts that don't sit naturally with Java, and one should be aware of this. OO languages are good for building objects such as browsers, editors, simple web servers, transformation tools, etc... But they don't make it easy to develop tools that require just the most basic elements of a theory of mind, and so most things to do with communication. For that one will have to use the work done in the semantic web space and familiarize oneself with the languages and tools developed for working with them.
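As a small illustration, here is a sketch of how beliefs can be kept in separate contexts (named graphs) with Sesame's Repository API. The API calls are Sesame 2's; the URIs are made up for the example:

import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.memory.MemoryStore;

public class Beliefs {
    public static void main(String[] args) throws Exception {
        Repository repo = new SailRepository(new MemoryStore());
        repo.initialize();
        RepositoryConnection con = repo.getConnection();
        ValueFactory vf = con.getValueFactory();

        URI ball = vf.createURI("http://test.org/ball");
        URI in = vf.createURI("http://test.org/in");
        URI box = vf.createURI("http://test.org/box");
        URI basket = vf.createURI("http://test.org/basket");
        // one named graph per person's beliefs
        URI sally = vf.createURI("http://test.org/SallyBeliefs");
        URI ann = vf.createURI("http://test.org/AnnBeliefs");

        con.add(ball, in, basket, sally); // Sally believes the ball is in the basket
        con.add(ball, in, box, ann);      // Ann believes the ball is in the box

        // the same :ball URI names the same thing in both contexts, yet each
        // context can hold a different - even contradictory - state of affairs
        System.out.println(con.hasStatement(ball, in, basket, false, sally)); // true
        System.out.println(con.hasStatement(ball, in, basket, false, ann));   // false
        con.close();
    }
}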

Finally, the semantic web also has its OO style with the Web Ontology Language (OWL). This is just a set of relations to describe classes and relations. Notice though that it is designed for intra-context inference, ie all the inferences that you can make within a world. So in that sense, even at the Semantic Web layer, thinking in OO terms does not touch on thinking across contexts, or on mentalizing. Mind you, since people deal with objects, it is also important to think about objects to understand people. But it is just one part of the problem.


Thursday Sep 04, 2008

Building Secure, Open and Distributed Social Network Applications

Current Social Networks don't allow you to have friends outside their network. When on Facebook, you can't point to your friend on LinkedIn. They are data silos. This audio enhanced slide show explains how a distributed decentralized social network is being built, how it works, and how to make it secure using the foaf+ssl protocol (a list of pointers on the esw wiki).

It is licenced under a CC Attribution ShareAlike Licence.
My voice is a bit odd on the first slide, but it gets better I think as I go along.

Building Secure Open & Distributed Social Networks( Viewing this slide show requires a flash plugin. Sorry I only remembered this limitation after having put it online. If you know of a good Java substitute let me know. The other solution would have been to use Slidy. PDF and Annotated Open Document Format versions of this presentation are available below. (why is this text visible in Firefox even when the plugin works?) )

This is the presentation I gave at JavaOne 2008 and at numerous other venues in the past four months.

The slidecast works a lot better as a presentation format than my previous semantic web video RDF: Connecting Software and People, which I published as an h.264 video a couple of years ago, and which takes close to 64MB of disk space. The problem with that format is that it is not easy to skip through the slides to the ones that interest you, or to go back and listen to a passage carefully again. Or at least it feels very clunky. My mp3 sound file only takes 17MB of space in comparison, and the graphics are much better quality in this slide show.

It is hosted by the excellent slideshare service, which translated my OpenOffice odp document (once it was cleaned up a little: I had to make sure it had no pointers to local files remaining accessible from the Edit>Links menu, which otherwise choked their service). I used the Audacity sound editor to create the mp3 file, which I then placed on my bblfish.net server. Syncing the sound and the slides was then very easy using SlideShare's SlideCast application. I found that the quality of the slides was a lot better once I had created an account on their servers. The only thing missing would be a button, in addition to the forward and backward buttons, that would show the text of the audio, for people with hearing problems - something equivalent to the Notes view in Open Office.

You can download the OpenOffice Presentation which contains my notes for each slide and the PDF created from it too. These are all published under a Creative Commons Attribution, Share Alike license. If you would like some of the base material for the slides, please contact me. If you would like to present them in my absence feel free to.

Thursday Aug 28, 2008

picture of my blog

picture from wordle.net

Wordle is a fun little Java applet that analyses your blog and builds any number of beautiful word clouds from the content. It has started a movement I think.

Just a pity that one cannot embed the applet in one's own web page. That would make it dynamic and a lot more interesting. Perhaps that will come. At present it seems the author is not sure what the IBM lawyers have decided.

This is somewhat similar to the graphic app I mentioned previously, though clearly with more content. Long term readers of my blog may find the picture of my blog not to be such a good reflection of my interests though. Statistical analysis of words does not get you that far.

Tuesday Aug 26, 2008

Sun Intranet Foaf Experiment

image of Address Book displaying internal sun foaf

Building a foaf server from an ldap directory is pretty easy. Rinaldo Di Giorgio put a prototype server together for Sun in less than a week. As a result everyone in Sun now has an experimental temporary foaf id, which we can use to try out some things.

So what can one do with foaf that one could not so easily do with ldap? Well, the semantic web is all about linking and meshing information. So one really simple thing to do is to link an external foaf file with the internal one. I did this by adding an owl:sameAs statement to my public foaf file that links my public and my Sun id. (It would be better to link the internal foaf file to the external one, but that would have required a bit more work internally.) As a result, by dragging and dropping my foaf file onto today's release of the AddressBook, someone who is inside the Sun firewall can follow both my internal and my external connections. Someone outside the firewall will not be able to follow the internal link.
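In N3 the added link is a single statement, something like the following, where both WebIds are illustrative stand-ins for my real public and internal ids:

@prefix owl: <http://www.w3.org/2002/07/owl#> .

<http://bblfish.net/people/henry/card#me>
    owl:sameAs <http://foaf.sun.com/people/bblfish#me> .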

By extending the internal foaf server a little more one could easily give people inside of Sun a place to link to their external business connections, wherever they might be in the world. To allow other companies to do this too it would of course help if everyone in Sun had a minimal public foaf ID, which would return only minimal information, or whatever the employee was comfortable revealing about themselves. This would allow Sun to present a yet more human face to the world.

Well that's just a thought, and this is just an experiment. Hopefully it will make the semantic web more real for us here, and allow people to dream up some great ways of bringing all the open source world together, ever closer.

PS. For people inside of Sun it may be easier to just drag my internal foaf file directly onto the AddressBook (started via jnlp). Otherwise, to get the internal foaf file to download, you need to click the "fetch" button next to the "same As" combo box when viewing my info. Then you need to switch to "Last Imported" and back to allow "Bernard Traversat" to appear in the second column. He appears as someone I foaf:know after the merger of the internal and the external foaf. I know this is clumsy, and I'll try thinking up a way to make this more user friendly very soon. You are welcome to participate on the Address Book Project.

PPS. Sun internal users can get more info on the project home page.

PPPS. We of course use the Firefox Tabulator plugin too for tests. It gives a different interface to my AddressBook: more flexible, but less specialised... The Tabulator web application does not work currently because we only produce Turtle output. This is to avoid developers trying to use DOM tools to process these pages, as we don't want to put work into an RDF crystalisation. (Note: if at some later time you find that the plugin is not compatible with the latest version of Firefox, you can manually disable compatibility checks.)

Sunday Aug 10, 2008

A Simple Sommer Project

Are you dreaming of the coasts of Java? Need some real sunshine? Have you had enough of hibernation? Here is a simple project to help you wake up and open up some new horizons :-)

A couple of weeks ago, Stephanie Stroka asked me how to make use of the so(m)mer library. So(m)mer is designed to play a similar role for the semantic web to the one Hibernate plays for relational databases. So when relational databases feel too constraining, and you need an open database that spans the world, the semantic web is for you. And if you are a Java programmer, then so(m)mer is a good open source project you can participate in.
Anyway, I had not had time to write out a HOWTO, so I directed Stephanie over instant messenger on how to get going. It turns out this was a little more complicated than I had imagined for someone completely new to the project. So Stephanie wrote out a really simple project - indeed the simplest possible - documented it, and added it to the so(m)mer repository.

As a result there is now a really simple example to follow to get out of hibernation mode, and into the blooming summer. Just follow the instructions.

Saturday May 17, 2008

3 weeks of conferences and workshops in the Bay Area

I am in the Bay Area about to start my third week of conferences/workshops with the combined themes of Java, identity, semantic web, and data portability.

The first week at JavaOne went very well. The Semantic Web Panel attracted way over 500 people by my guesstimate (no official figure yet), and Dean Allemang's talk "Semantic Web for the working Ontologist", which took place on the last day, attracted well over three hundred attendees. My BOF happened late at night, at the same time as a big party, and only attracted 30 or so attendees. But on the whole JavaOne proved a great success.

Speaking to members of the liberty group at Sun, I discovered the existence of the Internet Identity Workshop in Mountain View, and decided this would be a good opportunity to learn more about this space. This was a very good use of my time, as it helped me get more familiar with many of the problems and technologies in this space. I put forward some of the ideas I had been discussing here relating the semantic web and distributed web of trust ideas using OpenId and foaf+ssl, which seemed to hold up quite well under the close scrutiny of the community. A few fun conversations with Eve Maler (aka xmlgrrl) on the relations between the semantic web and XML nicely spiced up the evenings :-)

That workshop was closely followed by a one day Data Sharing Summit, addressing issues raised by the Data Portability group, which I have been following relatively closely. This one day session was very helpful for my understanding of the types of problems that need solving. An ontology for what can be done with information in a foaf file would indeed be very helpful. This would have to allow one to specify in simple terms what relations could be republished or which ones should not be.

So next on the list is the Semantic Technology Conference in San Jose, which will bring all these threads together. For more on that see my post on the Semantic Tech highlights.

Update

The presentation I was giving is now available online with audio as Building Secure, Open and Distributed Social Network Applications

Social Networks and Data Portability at Semantic Tech conference in San Jose

The upcoming Semantic Technology Conference in San Jose gets going tomorrow, with an excellent list of speakers and subjects. Here are some highlights of the sessions relating to topics on which I blog regularly.

Many more interesting talks will make sure I spend another packed week. The full program is available online.

Update

My presentation is now available online with audio as part of the longer Building Secure, Open and Distributed Social Network Applications

Thursday Apr 17, 2008

KiWi: Knowledge in a Wiki

[image: KiWi logo]

Last month I attended the European Union KiWi project startup meeting in Salzburg, to which Sun Microsystems Prague is contributing some key use cases.

KiWi is a project to build an Open Source Semantic Wiki. It is based on the IkeWiki [don't follow this link if you have Safari 3.1] Java wiki, which uses the Jena Semantic Web framework, the Dojo toolkit for the Web 2.0 functionality, and any one of the databases Jena can connect to, such as PostgreSQL. KiWi is in many ways similar to Freebase in its hefty use of JavaScript and its emphasis on structured data. But instead of being a closed-source platform, KiWi is open source, and builds upon the Semantic Web standards. In my opinion it currently overuses JavaScript, to the extent that every click leads to a dynamic page rewrite that does not change the URL of the browser page. This feels unRESTful to me, and the permalink link in the socialise toolbar to the right does not completely remove my qualms. Hopefully this can be fixed in this project. It would be great also if KiWi could participate fully in the Linked Data movement.

The meeting was very well organized by Sebastian Schaffert and his team. It was 4 long days of meetings that made sure everyone was on the same page, understood the rules of the EU game, and most of all got to know each other (see the kiwiknows-tagged pictures on flickr). Many thanks also to Peter Reiser for moving and shaking the various Sun decision makers to sign the appropriate papers and dedicate the resources for us to be part of this project.

You can follow the evolution of the project on the Planet Kiwi page.

Anyway, here is a video that shows the resourceful kiwi mascot in action:

Thursday Mar 20, 2008

how binary relations beat tuples

Last week I was handed a puzzle by Francois Bry: "Why does RDF limit itself to binary relations? Why this deliberate lack of expressivity?".

Logical Equivalence Reply

My initial answer was that all tuples could be reduced to binary relations. So take a simple table like this:

User ID | name        | address                                                     | birthday | course     | homepage
--------|-------------|-------------------------------------------------------------|----------|------------|----------------------
1234    | Henry Story | 21 rue Saint Honoré, Fontainebleau, France                  | 29 July  | philosophy | http://bblfish.net/
1235    | Danny Ayers | Loc. Mozzanella, 7, Castiglione di Garfagnana, Lucca, Italy | 14 Jan   | semweb     | http://dannyayers.com

The first row in the above table can be expressed as a set of binary relations, as shown in this graph:

The same can clearly be done for the second row.
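
To make the decomposition concrete, here is a minimal sketch using the Sesame API; the vocabulary namespace and the subject URI minted for the row are invented for the example:

import org.openrdf.model.Statement;
import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.model.impl.ValueFactoryImpl;

public class RowAsTriples {
    public static void main(String[] args) {
        ValueFactory vf = new ValueFactoryImpl();
        String ex = "http://example.org/vocab#";                 // invented vocabulary namespace
        URI row = vf.createURI("http://example.org/user/1234");  // subject minted from the User ID
        // every remaining cell of the row becomes one binary relation off that subject
        Statement[] triples = {
            vf.createStatement(row, vf.createURI(ex, "name"), vf.createLiteral("Henry Story")),
            vf.createStatement(row, vf.createURI(ex, "address"),
                    vf.createLiteral("21 rue Saint Honoré, Fontainebleau, France")),
            vf.createStatement(row, vf.createURI(ex, "birthday"), vf.createLiteral("29 July")),
            vf.createStatement(row, vf.createURI(ex, "course"), vf.createLiteral("philosophy")),
            vf.createStatement(row, vf.createURI(ex, "homepage"), vf.createURI("http://bblfish.net/")),
        };
        for (Statement s : triples) System.out.println(s);
    }
}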

Since the two models express equivalent information, I would opt aesthetically for the graph over the tuples, since it requires fewer primitives, which tends to make things simpler and clearer. Perhaps that can already be seen in the way the above table is screaming out for refactoring: a person may easily have more than one homepage. Adding a new homepage relation is easy; doing the same in a table is a lot harder.

But this line of argument will not convince a battle-worn database administrator. Both systems do the same thing. One is widely deployed, the other not. So that is the end of the conversation. Furthermore it seems clear that retrieving a row in a table is quick and easy. If you need chunks of information kept together, that beats the join the graph version above seems to require. Pragmatics beats aesthetics hands down, it seems.

Global Distributed Open Data

The database engineer might have won the battle, but he will not win the war [1]. Wars are fought at a much higher level, on a global scale. The problem the Semantic Web is attacking is global data, not local data. On the Semantic Web, the web is the database, and data is distributed and linked together. In the Semantic Web use case the data won't all be managed in one database by a few resource-constrained superusers, but will be distributed in different places and managed by the stakeholders of that information. In our example we can imagine three stakeholders for different pieces of information: Danny Ayers for his personal information, me for mine, and the university for its course information. This information will then be available as resources on the web, returning different representations, which in one way or another may encode graphs such as the ones below. Note that duplication of information is a good thing in a distributed network.

By working with the most simple binary relations, it is easy to cut information down to its most atomic units, publish the pieces anywhere on the web, and distribute the responsibility to different owners. This atomic nature of relations also makes it easy to merge information again. Doing this with tuples would be unnecessarily complex. Binary relations are a consequence of taking the open world assumption seriously in a global space. By using Universal Resource Identifiers (URIs), it is possible for different documents to co-refer to the same entities, and to link entities together in a global manner.
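
As a tiny illustration of such a merge, assuming invented URIs throughout, here is how statements published in two different documents collapse into one graph; merging really is just set union of triples:

import org.openrdf.model.Graph;
import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.model.impl.GraphImpl;

public class MergeDemo {
    public static void main(String[] args) {
        Graph merged = new GraphImpl();
        ValueFactory vf = merged.getValueFactory();
        URI danny = vf.createURI("http://dannyayers.com/#me");   // invented URI for Danny
        // a statement Danny publishes on his own site
        merged.add(danny, vf.createURI("http://example.org/vocab#name"),
                   vf.createLiteral("Danny Ayers"));
        // a statement the university publishes on its site, co-referring via the same URI
        merged.add(danny, vf.createURI("http://example.org/vocab#course"),
                   vf.createLiteral("semweb"));
        System.out.println(merged.size() + " triples about " + danny);
    }
}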

The Verbosity critique

Another line of attack, similar to the first, could be that RDF is just too verbose. Imagine a relation children, which would relate a person to the list of their children. If one sticks to binary relations alone, this is going to be very awkward to write out. In a graph it would look like this:

[image: a simple list as a graph]

Which in Turtle would give something like this:

:Adam :children 
     [ a rdf:List;
       rdf:first :joe;
       rdf:rest [ a rdf:List;
            rdf:first :jane;
            rdf:rest rdf:nil ];
     ] .

which is clearly a bit unnecessarily verbose. But that is not really a problem: one can develop a notation for writing out lists, and Turtle has done exactly that, so that one can write much more simply:

:Adam :children ( :joe :jane ) .

This is clearly much easier to read and write than the previous way (not to speak of the equivalent in RDF/XML). RDF is a structure developed at the semantic level. Different notations can be developed to express the same content. The reason it works is that it uses URIs to name things.

Efficiency Considerations

So what about the implementation question: with tables, frequently accessed data is stored close together. This, it seems to me, is an implementation issue. One can easily imagine RDF databases that optimize the layout of their data in memory at run time, in a just-in-time manner, depending on the queries received - just as the Java JIT ends up, in an overwhelming number of cases, being faster than hand-crafted C, because it can take advantage of local factors, such as the memory available on the machine or the type of cpu, which a statically compiled C binary cannot. So in the case of the list structure shown above, there is no reason the database could not simply place :joe and :jane in an array of pointers.

In any case, if one wants distributed, decentralised data, there is no other way to do it. Pragmatism does have the last word.

Notes

  1. Don't take the battle/war analogy too far please. Both DB technologies and Semantic Web ones can easily work together as demonstrated by tools such as D2RQ.

Wednesday Mar 19, 2008

Semantic Web for the Working Ontologist

I am really excited to see that Dean Allemang and Jim Hendler's book "Semantic Web for the Working Ontologist" is now available for pre-order on Amazon's web site. When I met Dean at Jazoon 2007 he let me have a peek at an early copy of this book[1]: it was exactly what I had been waiting a long time for. A very easy introduction to the Semantic Web and reasoning that does not start with the unnecessarily complex RDF/XML [2] but with the one-cannot-be-simpler triple structure of RDF, and that, through a series of practical examples, brings the reader step by step to a full view of all the tools in the Semantic Web stack, without a hitch, without a problem, fluidly. I was really impressed. Getting going with the Semantic Web is going to be a lot easier when this book is out. It should remove the serious problem current students face of having to find their way through a huge number of excellent but detailed specs, some of which are no longer relevant. One does not learn Java by reading the Java Virtual Machine Specification or even the Java Language Specification. Those are excellent tools to use once one has read one of the excellent introductory books, such as the unavoidable Java Tutorial or Bruce Eckel's Thinking in Java. Dean Allemang and Jim Hendler's book is going to play the same role for the Semantic Web: helping to introduce millions of people to what has to be the most revolutionary development in computer science since the development of the web itself. Go and pre-order it. I am going to do that right now.

Notes

  1. the draft I looked at 9 months ago had introductions to ntriples, turtle, OWL explained via rules, SPARQL, some simple well known ontologies such as skos and foaf, and a lot more.
  2. The W3C has recently published a new RDF Primer in Turtle in recognition of the difficulty of getting going when the first step requires understanding RDF/XML.
  3. Dean gave a talk at JavaOne that is now available online, which goes over the first chapters of the book. While you are waiting for the book, you can learn a lot by following this.

Wednesday Mar 05, 2008

Opening Sesame with Networked Graphs

Simon Schenk just recently gave me an update to his Networked Graphs library for the Sesame RDF Framework. Even though it is in an early alpha state, the jars have already worked wonders on my Beatnik Address Book. With four simple SPARQL rules I have been able to tie together most of the loose ends that appear between foaf files, as each one often uses different ways to refer to the same individual.

Why inferencing is needed

So for example in my foaf file I link to Simon Phipps - Sun's very popular Open Source Officer - with the following N3:

 :me foaf:knows   [ a foaf:Person;
                    foaf:mbox_sha1sum "4e377376e6977b765c1e78b2d0157a933ba11167";
                    foaf:name "Simon Phipps";
                    foaf:homepage <http://www.webmink.net/>;
                    rdfs:seeAlso <http://www.webmink.net/foaf.rdf>;
                  ] .
For those who still don't know N3 (where have you been hiding?), this says that I know a foaf:Person named "Simon Phipps" whose homepage is specified, and for whom more information can be found at the http://www.webmink.net/foaf.rdf rdf file. Now the problem is that the person in question is identified by a '[', which represents a blank node. I.e. we don't have a name (a URI) for Simon. So when the Beatnik Address Book gets Simon's foaf file, by following the rdfs:seeAlso relation, it gets, among other things, something like:
[] a foaf:Person;
   foaf:name "Simon Phipps";
   foaf:nick "webmink";
   foaf:homepage </>;
   foaf:knows [ a foaf:Person;
                foaf:homepage <http://www.buzzword-compliant.com/>;
                rdfs:seeAlso <http://www.buzzword-compliant.com/foaf.rdf>;
             ] .
This file then describes at least two people. Which one is the same as the person described in my file? Well, a human being would guess that the person named "Simon Phipps" is the same in both cases. Networked Graphs helps Beatnik make a similar guess, by noting that the foaf:homepage relation is an owl:InverseFunctionalProperty.

Some simple rules

After downloading Simon Phipps's foaf file and mine, and placing the relations found in each in its own Named Graph, we can in Sesame 2.0 create a merged view of both these graphs just by creating a graph that is the union of their triples.
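
In code, the loading step looks roughly like this with the plain Sesame 2 API; the foaf document URLs are assumptions for illustration, and the Networked Graphs wiring on top is omitted, so treat this as a sketch:

import java.net.URL;
import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.rio.RDFFormat;
import org.openrdf.sail.memory.MemoryStore;

public class FoafMerge {
    public static void main(String[] args) throws Exception {
        Repository repo = new SailRepository(new MemoryStore());
        repo.initialize();
        RepositoryConnection con = repo.getConnection();
        ValueFactory vf = repo.getValueFactory();
        // each foaf file lands in its own named graph, named by the URL it came from
        String[] docs = { "http://bblfish.net/people/henry/card",   // URLs assumed
                          "http://www.webmink.net/foaf.rdf" };      // for illustration
        for (String doc : docs) {
            URI graph = vf.createURI(doc);
            con.add(new URL(doc), doc, RDFFormat.RDFXML, graph);
        }
        // queries on the connection that don't name a graph see the
        // union of all the named graphs: the merged view
        con.close();
    }
}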

The Networked Graph layer can then do some interesting inferencing by defining a graph with the following SPARQL rules:

#foaf:homepage is inverse functional
grph: ng:definedBy """
  CONSTRUCT { ?a <http://www.w3.org/2002/07/owl#sameAs> ?b .  } 
  WHERE { 
       ?a <http://xmlns.com/foaf/0.1/homepage> ?pg .
       ?b <http://xmlns.com/foaf/0.1/homepage> ?pg .
      FILTER ( ! SAMETERM (?a , ?b))   
 } """\^\^ng:Query .
This is simply saying that if two names for things have the same homepage, then these two names refer to the same thing. I could be more general by writing rules at the owl level, but those would be a bit more complicated, and I just wanted to test out the Networked Graphs sail to start with. So the above will add a bunch of owl:sameAs relations to our NetworkedGraph view of the Sesame database.
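
To inspect what the rule has added, one can list the owl:sameAs statements with an ordinary SPARQL query through the Sesame 2 API - a sketch, independent of the Networked Graphs machinery, assuming an open connection on the store:

import org.openrdf.query.BindingSet;
import org.openrdf.query.QueryLanguage;
import org.openrdf.query.TupleQueryResult;
import org.openrdf.repository.RepositoryConnection;

public class SameAsCheck {
    // print every owl:sameAs pair that inferencing has added to the store
    static void listSameAs(RepositoryConnection con) throws Exception {
        TupleQueryResult result = con.prepareTupleQuery(QueryLanguage.SPARQL,
                "SELECT ?a ?b WHERE { ?a <http://www.w3.org/2002/07/owl#sameAs> ?b }").evaluate();
        try {
            while (result.hasNext()) {
                BindingSet row = result.next();
                System.out.println(row.getValue("a") + " owl:sameAs " + row.getValue("b"));
            }
        } finally {
            result.close();
        }
    }
}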

The following two rules then just complete the information.

# owl:sameAs is symmetric
#if a = b then b = a 
grph: ng:definedBy """
  CONSTRUCT { ?b <http://www.w3.org/2002/07/owl#sameAs> ?a . } 
  WHERE { 
     ?a <http://www.w3.org/2002/07/owl#sameAs> ?b . 
     FILTER ( ! SAMETERM(?a , ?b) )   
  } """\^\^ng:Query .

# indiscernability of identicals
#two identical things have all the same properties
grph: ng:definedBy """
  CONSTRUCT { ?b ?rel ?c . } 
  WHERE { ?a <http://www.w3.org/2002/07/owl#sameAs> ?b .
          ?a ?rel ?c . 
     FILTER ( ! SAMETERM(?rel , <http://www.w3.org/2002/07/owl#sameAs>) )   
  } """\^\^ng:Query .
They make sure that when two things are found to be the same, they have the same properties. I think these two rules should probably be hard coded in the database itself, as they seem so fundamental to reasoning that there must be some very serious optimizations available.

Advanced rules

Anyway the above illustrates just how simple it is to write some very clear inferencing rules. Those are just the simplest that I have bothered to write at present. Networked Graphs allows one to write much more interesting rules, which should help me solve the problems I explained in "Beatnik: change your mind" where I argued that even a simple client application like an address book needs to be able to make judgements on the quality of information. Networked Graphs would allow one to write rules that would amount to "only believe consequences of statements written by people you trust a lot". Perhaps this could be expressed in SPARQL as

CONSTRUCT { ?subject  ?relation ?object . }
WHERE {
    ?g tr:trustlevel ?tl .
    GRAPH ?g { ?subject ?relation ?object . }
    FILTER ( ?tl > 0.5 )
}
Going on from the above, it is easy to start imagining very interesting uses of Networked Graphs rules. For example, we may want to classify some ontologies as trusted and only do reasoning over relations from those ontologies. The inverse functional rule could then be generalized to:
  PREFIX owl: <http://www.w3.org/2002/07/owl#>
  PREFIX : <https://sommer.dev.java.net/ontologies/beatnik#>

  CONSTRUCT { ?a owl:sameAs ?b .  } 
  WHERE { 
      GRAPH ?g { ?inverseFunc a owl:InverseFunctionalProperty . }
      ?g a :TrustedOntology .

       ?a ?inverseFunc ?pg .
       ?b ?inverseFunc ?pg .
      FILTER ( ! SAMETERM (?a , ?b))   
 }

Building the So(m)mer Address Book

I will be trying these out later. But for the moment you can already see the difference inferencing brings to an application by downloading the Address Book from subversion at sommer.dev.java.net and running the following commands (leave the password for the svn checkout blank):


> svn checkout https://sommer.dev.java.net/svn/sommer/trunk sommer --username guest
> cd sommer
> ant jar
> cd misc/AddressBook/
> ant run
Then you can just drag and drop the foaf file on this page into the address book, and follow the distributed social network by pressing the space bar to fetch foaf files. To enable inferencing you currently need to toggle it via the File > Toggle Rules menu. You will see things suddenly coming together when inferencing is on.

There are still a lot of bugs in this software. But you are welcome to post bug reports, or help out in any way you can.

Where this is leading

Going further, it seems clear to me that Networked Graphs is starting to realise what Guha, one of the pioneers of the semantic web, wrote about in his thesis "Contexts: A Formalization and Some Applications", on which I wrote a short note, Keeping track of Context in Life and on the Web, a couple of years ago. That really helped me get a better understanding of the possibilities of the semantic web.

Wednesday Feb 06, 2008

replacing ant with rdf

Tim Boudreau just recently asked "What if we built Java code with...Java?". Why not replace Ant or Maven XML build documents with Java (Groovy/JRuby/Jython/...) scripts? For Java programmers it could be a lot easier to program, and much easier to understand too. Why go through XML when things could be done more simply in a universal language like Java? Good question. But I think it depends on what types of problem one wants to solve. Moving to Java makes the procedural aspect of a build easier to program for a certain category of people. But is that a big enough advantage to warrant a change? Probably not. If we are looking for an improvement, why not explore something really new, something that might resolve some as yet completely unresolved problems at a much higher level? Why not explore what a hyperdata build system could bring us? Let me start to sketch out some ideas here, very quickly, because I am late on a few other projects I am meant to be working on.

The answer to software becoming more complicated has been to create clear interfaces between the various pieces, and to have people specialise in building components to those interfaces. It's the "small is beautiful" philosophy of Unix. As a result, though, as software complexity builds up, every piece of software requires more and more pieces of other software, leading us from a system of independent software pieces to networked software. Let me be clear. The software industry has been speaking a lot about software containing networked components and being deployed on the network. That is not what I am pointing to here. I want to emphasise that the software itself is built of components on the network. I.e. we increasingly need a networked build system. This should be a big clue as to why hyperdata can bring something to the table that other systems cannot: because RDF is a language whose pointer system is built on the Universal Resource Identifier (URI), it eats networked components for breakfast, lunch and dinner. (see my Jazoon presentation).

Currently my subversion repository contains a lot of lib subdirectories full of jar files taken from other projects. Would it not be better if I referred to these libraries by URL instead - the URL they can be fetched from over HTTP, of course? Here are a few advantages:

  • it would use up less space in my Subversion repository. A pointer just takes up less space than an executable in most cases.
  • it would use up less space on the hard drives of people downloading my code. Why? Because I refer to the jar via a universal name, a clever IDE will be able to reuse the locally cached version already downloaded for another tool.
  • it would make setting up IDEs a lot easier. Again, because each component now has a universal name, it will be possible to link jars up to their source code once only.
  • the build process, describing as it does how the code relates to the source, can be used by IDEs to jump to the source (also identified via URLs) when debugging a library on the network. (see some work I started on a bug ontology called Baetle)
  • DOAP files can then be used to tie all these pieces together, allowing people to just drag and drop projects from a web site onto their IDE, as I demonstrated with NetBeans.
  • as IDEs gain knowledge, from such DOAP files, of which components are successors to which others, it is easy to imagine them developing RSS-like functionality: scanning the web for updates to your software components, and alerting you to those updates, which you can then quickly test out yourself.
  • the system can be completely decentralised, making it a Web 3.0 system rather than a Web 2.0 one. It should be as easy as placing your components and your RDF file on a web server, served up with the correct mime types.
  • it will be easy to link jars or source code (referred to, as usual, by URLs) to bugs (described via something like Baetle), making it easy to describe how bugs in one project depend on bugs in other projects.

So here are just a few of the advantages that a hyperdata-based build system could bring. They seem important enough, in my opinion, to justify exploring this in more detail. Ok. Well, let me try something here. When compiling files one needs the following: a classpath and a number of source files.

@prefix java: <http://rdf.sun.com/java/> .
# the default prefix below is assumed for this sketch; the post left it undeclared
@prefix : <http://rdf.sun.com/java/build#> .

_:cp a java:ClassPath;
       java:contains ( <http://apache.multidist.com/cocoon/2.1.11> <http://openrdf.org/sesame/2.0/> ) .

_:outputJar a java:Jar;
       java:buildFrom <src>;
       java:classpath _:cp .

_:outputJar 
        :pathtemplate "dist/${date}/myprog.jar";
        :fullList <outputjars.rdf> .
If the publication mechanism is done correctly, the relative URLs should work on the file system just as well as they do in the http view of the repository. Making a jar would then be a matter of some program following the URLs, downloading all the pieces (if needed), putting them in place, and using them to build the code. Clearly this is just a sketch. Perhaps someone else has already had thoughts on this?
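
In the meantime, to make the following-and-fetching step concrete, here is a hypothetical resolver in plain Java: it fetches each classpath URL into a local cache (skipping anything already downloaded, which is what gives the space savings mentioned above) and hands the result to javac. All names and URLs here are invented for the sketch, not a spec:

import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import javax.tools.ToolProvider;

public class HyperdataBuild {
    // download a jar into the cache unless a copy with the same universal name is already there
    static File fetch(URL jar, File cache) throws Exception {
        File local = new File(cache, jar.getHost() + jar.getPath().replace('/', '_'));
        if (!local.exists()) {
            InputStream in = jar.openStream();
            OutputStream out = new FileOutputStream(local);
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) out.write(buf, 0, n);
            out.close();
            in.close();
        }
        return local;
    }

    public static void main(String[] args) throws Exception {
        File cache = new File(System.getProperty("user.home"), ".hyperbuild");
        cache.mkdirs();
        // in a real system these URLs would be read from the java:ClassPath list above
        String[] classpathUrls = { "http://example.org/libs/somelib-1.0.jar" };
        StringBuilder cp = new StringBuilder();
        for (String u : classpathUrls) {
            if (cp.length() > 0) cp.append(File.pathSeparator);
            cp.append(fetch(new URL(u), cache).getPath());
        }
        // compile the sources named by java:buildFrom against the resolved classpath
        ToolProvider.getSystemJavaCompiler().run(null, null, null,
                "-cp", cp.toString(), "src/Main.java");
    }
}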
