Wednesday Sep 09, 2009

RDFa parser for Sesame

RDFa is the microformat-inspired standard for embedding semantic web relations directly into (X)HTML. It is being used more and more widely, and we are starting to see foaf+ssl-annotated web pages, such as Alexandre Passant's home page. This is forcing me to update my foaf+ssl Identity Provider to support RDFa.

The problem was that I have been using Sesame as my semweb toolkit, and there was no RDFa parser for it. Luckily I found out that Damian Steer (aka Shellac) had written a SAX-based RDFa parser for the HP Jena toolkit, which he had put up on the java-rdfa GitHub repository. With a bit of help from Damian and the Sesame team, I adapted the code to Sesame, created a git fork of the initial project, and uploaded the changes to the bblfish java-rdfa git clone. Currently all but three of the 106 tests pass without problem.
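To give a rough idea of what a SAX-based RDFa parser does, here is a toy Java sketch. It streams through XHTML and emits triples for a tiny subset of RDFa: @about, @property with @content, and @rel with @href or @resource, using prefixed CURIEs declared with xmlns:. It is not Damian's java-rdfa code or the Sesame parser, and the class name ToyRdfaExtractor is hypothetical. It also ignores most of the RDFa processing rules, such as @typeof, @datatype, text-content literals, reserved rel values like stylesheet, and multi-valued attributes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Toy SAX-based extractor for a tiny RDFa subset (illustration only, not java-rdfa). */
public class ToyRdfaExtractor extends DefaultHandler {
    private final URI base;
    private final List<String> triples = new ArrayList<>();
    private final Deque<String> subjects = new ArrayDeque<>();              // subject in force per open element
    private final Deque<Map<String, String>> prefixes = new ArrayDeque<>(); // xmlns:* mappings per open element

    private ToyRdfaExtractor(String base) {
        this.base = URI.create(base);
        subjects.push(base);                      // the document itself is the default subject
        prefixes.push(new HashMap<String, String>());
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Inherit the parent's prefixes, then add this element's xmlns:* declarations.
        Map<String, String> scope = new HashMap<>(prefixes.peek());
        for (int i = 0; i < atts.getLength(); i++) {
            String name = atts.getQName(i);
            if (name.startsWith("xmlns:")) scope.put(name.substring(6), atts.getValue(i));
        }
        prefixes.push(scope);

        // @about sets the subject for this element and its descendants.
        String about = atts.getValue("about");
        String subject = about != null ? base.resolve(about).toString() : subjects.peek();
        subjects.push(subject);

        String property = expand(atts.getValue("property"), scope);
        String content = atts.getValue("content");
        if (property != null && content != null) {
            String literal = content.replace("\\", "\\\\").replace("\"", "\\\"");
            triples.add("<" + subject + "> <" + property + "> \"" + literal + "\" .");
        }

        String rel = expand(atts.getValue("rel"), scope);
        String target = atts.getValue("resource") != null ? atts.getValue("resource") : atts.getValue("href");
        if (rel != null && target != null) {
            triples.add("<" + subject + "> <" + rel + "> <" + base.resolve(target) + "> .");
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        subjects.pop();   // leaving the element ends its @about and xmlns: scope
        prefixes.pop();
    }

    /** Expands a prefixed CURIE such as foaf:name; returns null if it has no known prefix. */
    private static String expand(String curie, Map<String, String> scope) {
        if (curie == null) return null;
        int colon = curie.indexOf(':');
        if (colon < 0) return null;
        String ns = scope.get(curie.substring(0, colon));
        return ns == null ? null : ns + curie.substring(colon + 1);
    }

    /** Parses well-formed XHTML (no DOCTYPE, so no DTD fetch) into N-Triples-style lines. */
    public static List<String> parse(String xhtml, String base) {
        ToyRdfaExtractor handler = new ToyRdfaExtractor(base);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XHTML", e);
        }
        return handler.triples;
    }
}
```

The design point is that SAX never builds a tree. Instead, the handler keeps stacks for the current subject and the prefix mappings. It pushes onto them on each start tag and pops on each end tag, and that is how a streaming parser can still honour the inheritance of @about and xmlns: declarations.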

To try this out, get git, Linus Torvalds' distributed version control system (read the book), and on a Unix system run:

$ git clone  git://github.com/bblfish/java-rdfa.git

This will download the whole history of changes to this project, so you will be able to see how I moved from Shellac's code to the Sesame RDFa parser. You can then parse Alex's home page by running the following on the command line (thanks a lot to Sands Fish for the Maven tip in his comment on this blog):

$ mvn  exec:java -Dexec.mainClass="rdfa.parse" -Dexec.args="http://apassant.net/"

[snip output of sesame-java-rdfa compilation]

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix geo: <http://www.geonames.org/ontology/> .
@prefix rel: <http://purl.org/vocab/relationship/> .
@prefix cert: <http://www.w3.org/ns/auth/cert#> .
@prefix rsa: <http://www.w3.org/ns/auth/rsa#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .


<http://apassant.net/> <http://www.w3.org/1999/xhtml/vocab#icon> <http://apassant.net/misc/favicon.ico> ;
        <http://www.w3.org/1999/xhtml/vocab#stylesheet> <http://apassant.net/sites/apassant.net/files/css/css_84042a598208a6aade8783e8c2937a8c.css> , 
                     <http://apassant.net/sites/apassant.net/files/css/css_ba2732162a421c6422a6f5a68742254e.css> .

<http://apassant.net/#id> rdfs:label "About"@en .

<http://apassant.net/alex> a foaf:Person ;
        foaf:name "Alexandre Passant"@en ;
        foaf:workplaceHomepage <http://deri.ie> , 
                               <http://nuigalway.ie> ;
        foaf:schoolHomepage <http://paris-sorbonne.fr> , 
                            <http://dauphine.fr> ;
        foaf:topic_interest <http://dbpedia.org/page/Social_software_%28computer_software%29> ,
                            <http://dbpedia.org/resource/Semantic_Web> ;
        foaf:currentProject <http://www.w3.org/2009/sparql/wiki/> , 
                <http://www.w3.org/2005/Incubator/socialweb/> ;
        <http://purl.org/vocab/bio/0.1/olb> """
\nDr. Alexandre Passant is a postdoctoral researcher at the Digital Enterprise Research Institute, National University
of Ireland, Galway. His research activities focus around the Semantic Web and Social Software: in particular, how these
fields can interact with and benefit from each other in order to provide a socially-enabled machine-readable Web,
leading to new services and paradigms for end-users. Prior to joining DERI, he was a PhD student at Université 
Paris-Sorbonne and carried out applied research work on \"Semantic Web technologies for Enterprise 2.0\" at
Electricité De France. He is the co-author of SIOC, a model to represent the activities of online communities on the
Semantic Web, the author of MOAT, a framework to let people tag their content using Semantic Web technologies, and
is also involved in various related applications as well as standardization activities.\n"""@en ;
        foaf:based_near <http://dbpedia.org/resource/Galway> ;
        geo:locatedIn <http://dbpedia.org/resource/Galway> ;
        rel:spouseOf <http://julie.letierce.net/#id> ;
        foaf:holdsAccount <http://www.flickr.com/people/terraces/> ,
                          <http://www.linkedin.com/pub/alexandre-passant/1/797/1ab> ,
                          <http://last.fm/user/terraces> , 
                          <http://slideshare.net/terraces> , 
                          <http://twitter.com/terraces> .

<http://apassant.net/#cert> a rsa:RSAPublicKey ;
        cert:identity <http://apassant.net/alex> .

_:node14efunnjjx1 cert:decimal "65537"@en .

<http://apassant.net/#cert> rsa:public_exponent _:node14efunnjjx1 .

_:node14efunnjjx2 cert:hex "8af4cb6d6ec004bd28c08d37f63301a3e63ddfb812475c679cf073c4dc7328bd20dadb9654d4fa588f155ca05e7ca61a6898fbace156edb650d2109ecee65e7f93a2a26b3928d3b97feeb7aa062e3767f4fadfcf169a223f4a621583a7f6fd8992f65ef1d17bc42392f2d6831993c49187e8bdba42e5e9a018328de026813a9f"@en .

<http://apassant.net/#cert> rsa:modulus _:node14efunnjjx2 .

[snip]

This graph can then be queried with SPARQL and merged with other graphs. Just as it links to other resources, those can in turn link back to it and to the elements defined in it. As a result, Alexandre Passant can use this, in combination with an appropriate X509 certificate, to log into foaf+ssl-enabled web sites in one click, without needing to remember either a password or a URL.

Thursday Jul 23, 2009

How to setup Tomcat as a foaf+ssl server

foaf+ssl is a standards-based protocol enabling one-click identification/authentication to web sites, without requiring the user to enter either a username or a password. It can be used as a global distributed access control mechanism, and it works with current browsers. Because it is RESTful, it works with Linked Data, and especially with linked foaf files, thereby enabling distributed social networks.

I will show here what is needed to get foaf+ssl working with Tomcat 6.x. The general principles are documented on the Tomcat SSL how-to page, which should be used for detailed reference; here I will document the precise setup needed for foaf+ssl. If you want to play with this protocol quickly without going through this procedure, I recommend using the foaf+ssl Identity Provider service. You can point to it from your web pages, and it will then redirect your users to the service of your choosing with the URL-encoded WebId of your visitor.
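The redirect step amounts to appending the URL-encoded WebId to the address of the chosen service. Here is a minimal sketch. It assumes a hypothetical query parameter named webid, and the real Identity Provider may well use different parameters. A relying service would also need some way to check that the redirect really came from the IdP, and that check is not shown.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper: append a URL-encoded WebId to a relying service's URL. */
public class WebIdRedirect {

    public static String redirectUrl(String serviceUrl, String webId) {
        try {
            // Add to an existing query string if there is one.
            String sep = serviceUrl.contains("?") ? "&" : "?";
            // "webid" is an assumed parameter name, not necessarily the IdP's.
            return serviceUrl + sep + "webid=" + URLEncoder.encode(webId, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }
}
```

URL-encoding is needed because a WebId usually contains characters such as :, / and # that would otherwise be read as part of the service URL. The # in particular would start a fragment, and the WebId would then never reach the server.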

foaf+ssl works by having the server request a client certificate on an https connection. The server therefore needs an https endpoint, which can be specified in Tomcat by adding the following connector to the conf/server.xml file:

<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
           maxThreads="50" scheme="https" secure="true"
           sslProtocol="TLS"/>
Note: the default https port is 443, but on Unix binding to it requires root privileges.

Servers authenticate themselves by sending the client a certificate signed by a well-known Certificate Authority (CA) whose public key is shipped in all browsers. Browsers use that public key to verify the signature on the certificate sent by the server. If the server sends a certificate that is not signed by one of these CAs (perhaps it is self-signed), the web browser will usually display a pretty ugly error message warning the user to stay clear of that site. There is some complex way of bypassing the warning, which lets a courageous and knowledgeable user add the certificate to a list of trusted certs. This warning will put most people off, so it is best to buy a CA-certified cert. (I found one for €15 at trustico.) The CAs usually have very detailed instructions for installing the cert on a wide range of servers. In the case of Tomcat you will end up with the following additional attribute values:

<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
           maxThreads="50" scheme="https" secure="true"
           keystoreFile="conf/yourServerCert.kdb" 
           keystoreType="JKS" keystorePass="changeme" 
           sslProtocol="TLS"/>

This of course requires placing the keystore containing your server certificate at the keystoreFile path.

There are usually two ways for the server to respond when the client does not send a (valid) certificate: either the connection simply fails, or the server lets the application decide what to do. Automatic failure is not a good option, especially for a login service, as the user will then be confronted with a blank page. It is much better to let the server redirect the user to another page that explains how to get a certificate and gives him the option of authenticating with OpenId or with the well-known username/password pattern. To enable Tomcat to respond this way you need to add the clientAuth="want" attribute:

<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
           maxThreads="50" scheme="https" secure="true"
           keystoreFile="conf/yourServerCert.kdb" 
           keystoreType="JKS" keystorePass="changeme" 
           sslProtocol="TLS" clientAuth="want" />

On receiving a client certificate, most Java web servers attempt to validate it automatically. They verify that it is correctly signed by one of the CAs shipped with the Java Runtime Environment (JRE), that it is still valid, and so on. As the SSL library that ships with the JRE does not implement foaf+ssl, we need to do the authentication at the application layer, and therefore need to bypass this part of the SSL implementation. To do this Bruno Harbulot put together the JSSLUtils library, available on Google Code. As mentioned on the JSSLUtils Tomcat documentation page, this requires you to place two jars in the Tomcat lib directory: jsslutils-0.5.1.jar and jsslutils-extra-apachetomcat6-0.5.2.jar (the version numbers may differ as the library evolves). You will also need to specify the SSLImplementation in the conf file as follows:

<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
           maxThreads="50" scheme="https" secure="true"
           keystoreFile="conf/yourServerCert.kdb" 
           keystoreType="JKS" keystorePass="changeme" 
           SSLImplementation="org.jsslutils.extra.apachetomcat6.JSSLutilsImplementation" 
           sslProtocol="TLS" clientAuth="want" />

Usually, when requesting a client certificate, servers send the client a list of the Distinguished Names of the certificate authorities (CAs) they trust, so that the client can filter the certificates available in the browser down to those that match. Getting client certificates signed by CAs is a complex and expensive procedure, which partly explains why client certificates are very rarely requested: very few people have certificates signed by well-known CAs. Instead, services that rely on client certificates tend to sign those certificates themselves, becoming their own CA. This means that certificates end up being valid for only one domain. foaf+ssl bypasses this problem by accepting certificates signed by any CA, going so far as to allow even self-signed certs. The server must therefore send an empty list of CAs, which (since TLS 1.1) means that the browser may send any certificate. With the JSSLutils library available to Tomcat, this is specified in the conf/server.xml file with the acceptAnyCert="true" attribute:

<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
           maxThreads="50" scheme="https" secure="true"
           keystoreFile="conf/yourServerCert.kdb" 
           keystoreType="JKS" keystorePass="changeme" 
           SSLImplementation="org.jsslutils.extra.apachetomcat6.JSSLutilsImplementation"
           acceptAnyCert="true" sslProtocol="TLS" clientAuth="want" />
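For readers curious about what acceptAnyCert amounts to at the JSSE level, here is a minimal sketch, assuming the standard javax.net.ssl API. It shows a trust manager that advertises an empty CA list and accepts any client chain. It is not JSSLUtils' implementation, and the class name is hypothetical. The point is that the real check moves to the application, which must compare the certificate's public key with the one published at the WebId.

```java
import java.security.GeneralSecurityException;
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import javax.net.ssl.KeyManager;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

/**
 * Sketch of a server-side trust manager for foaf+ssl (not JSSLUtils' code).
 * It advertises no trusted CAs and accepts any client chain, leaving the
 * real check (does the key match the foaf profile?) to the application.
 */
public class AcceptAnyClientCertTrustManager implements X509TrustManager {

    @Override
    public void checkClientTrusted(X509Certificate[] chain, String authType)
            throws CertificateException {
        // Any CA, or a self-signed cert, is fine here; only reject an empty chain.
        if (chain == null || chain.length == 0) {
            throw new CertificateException("empty client certificate chain");
        }
    }

    @Override
    public void checkServerTrusted(X509Certificate[] chain, String authType)
            throws CertificateException {
        // This trust manager is only meant for the server side of a connection.
        throw new CertificateException("not intended for validating servers");
    }

    @Override
    public X509Certificate[] getAcceptedIssuers() {
        // The CA names sent in the certificate request are taken from this array;
        // an empty array tells the browser it may offer any certificate.
        return new X509Certificate[0];
    }

    /** TLS context using this trust manager; keyManagers holds the server key (null = JVM default). */
    public static SSLContext serverContext(KeyManager[] keyManagers) {
        try {
            SSLContext ctx = SSLContext.getInstance("TLS");
            ctx.init(keyManagers, new TrustManager[] { new AcceptAnyClientCertTrustManager() }, null);
            return ctx;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("could not initialise TLS", e);
        }
    }
}
```

With a plain SSLServerSocket created from such a context, calling setWantClientAuth(true) plays the role of clientAuth="want".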

At this point you have set up your Apache Tomcat server correctly. A user who arrives at your SSL endpoint and has a couple of certificates will be asked to choose between them. Your server-side code can then extract the certificates in a servlet with the following code:

       X509Certificate[] certificates = (X509Certificate[]) request
                       .getAttribute("javax.servlet.request.X509Certificate");

You can then use these certificates to extract the WebId and to verify them. I will write more about how to do this in my next blog post.
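Here is a minimal sketch of the first step. It assumes the WebId is carried as a URI entry in the certificate's Subject Alternative Name, as in the certificates generated for foaf+ssl. It uses only java.security.cert.X509Certificate.getSubjectAlternativeNames(), where each entry is a list whose first element is the GeneralName type (6 for URIs). WebIdExtractor is a hypothetical helper name, and verifying the key against the foaf file is a separate step not shown here.

```java
import java.security.cert.CertificateParsingException;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Sketch: pull candidate WebIds out of the client certificate (hypothetical helper). */
public class WebIdExtractor {
    /** GeneralName tag for a uniformResourceIdentifier (RFC 5280). */
    public static final int URI_NAME = 6;

    /**
     * Takes the array read from the "javax.servlet.request.X509Certificate"
     * request attribute. An empty result means no usable certificate.
     */
    public static List<String> webIds(X509Certificate[] certificates) {
        if (certificates == null || certificates.length == 0) {
            return Collections.emptyList(); // client sent no certificate (clientAuth="want")
        }
        try {
            // The client's own certificate comes first in the chain.
            return uriNames(certificates[0].getSubjectAlternativeNames());
        } catch (CertificateParsingException e) {
            return Collections.emptyList();
        }
    }

    /** Keeps the URI entries of a collection shaped like getSubjectAlternativeNames() output. */
    public static List<String> uriNames(Collection<List<?>> sans) {
        List<String> uris = new ArrayList<>();
        if (sans == null) {
            return uris; // certificate has no Subject Alternative Name extension
        }
        for (List<?> entry : sans) {
            // Each entry is [Integer type, value]; URI values are Strings.
            if (entry.size() >= 2 && Integer.valueOf(URI_NAME).equals(entry.get(0))) {
                uris.add((String) entry.get(1));
            }
        }
        return uris;
    }
}
```

An empty result covers both the no-certificate case allowed by clientAuth="want" and a certificate without a URI. In either case, the calling servlet can redirect the user to an alternative login page.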

Wednesday Apr 29, 2009

Adding twitter to my blog using Scala

A few months ago I added JavaScript widgets to my blog, and found that this slowed page downloads a lot. Here is a way to speed things up again: pre-process the work with a Scala program, and include the result using an iframe.

Here are the short steps to do this:

  1. I wrote a Scala program (see source) that takes the Twitter Atom feed and generates XHTML from it.
  2. I wrote a shell script to run the compiled Scala jar:
    #!/bin/bash
    
    export CP=$HOME/java/scala/lib/scala-library.jar:$HOME/java/scala/lib/learning.jar
    
    /usr/bin/java -cp "$CP" learning.BlogIFrame "$@"
    
  3. Then I set up a cron job on my Unix server to run the script every half hour:
    $ crontab -l
    5,36 * * * * $HOME/bin/twitter.sh $HOME/htdocs/tmp/blogs.sun.com/tweets.html
    
  4. Finally, I added an iframe to my blog pointing to the generated HTML: <IFRAME src="http://bblfish.net/tmp/blogs.sun.com/tweets.html" height="300" frameborder="0"></IFRAME>
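The Scala source is not reproduced here, but the core of step 1 can be sketched in Java with the JDK's DOM parser. This is an illustration under assumptions rather than the actual program. The class name AtomToXhtml, the choice to show only entry titles, and the max parameter are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Java sketch of the Atom-to-XHTML step (the original program was in Scala). */
public class AtomToXhtml {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    /** Renders the titles of the first max entries of an Atom feed as a small XHTML page. */
    public static String render(String atomXml, int max) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(atomXml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not a parsable Atom feed", e);
        }
        StringBuilder out = new StringBuilder();
        out.append("<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><ul>\n");
        NodeList entries = doc.getElementsByTagNameNS(ATOM_NS, "entry");
        for (int i = 0; i < entries.getLength() && i < max; i++) {
            Element entry = (Element) entries.item(i);
            out.append("<li>").append(escape(childText(entry, "title"))).append("</li>\n");
        }
        return out.append("</ul></body></html>\n").toString();
    }

    /** Text of the first Atom element with this name under parent, or "" if absent. */
    private static String childText(Element parent, String name) {
        NodeList nodes = parent.getElementsByTagNameNS(ATOM_NS, name);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : "";
    }

    /** Minimal XML escaping so tweet text cannot break the generated markup. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Escaping the tweet text matters because the output is served as markup in the iframe. A message containing & or < would otherwise produce broken XHTML.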

As a result there is a lot less load on the twitter server - it only has to serve one atom feed every half an hour instead of 1000 or so a day - and my html blog page does not stall if the twitter site itself is overloaded.

Also I learnt a lot about Scala by doing this little exercise.

Friday Feb 13, 2009

I ♡ NetBeans 6.7 !

picture of NetBeans 7 Daily build 200902061401

As I was developing my recently released foaf+ssl server to demonstrate how to create distributed secure yet open social networks, I stumbled across a daily build of NetBeans 7 (build 200902061401), that is stable, beautiful and that has really helped me get my work done. NetBeans 7 is really going to rock!

Update: What I called NetBeans 7 is now called NetBeans 6.7.

Here is a list of some of the functionality that I really appreciated.

Maven Integration

Hallelujah for Maven integration! I got going on my project by setting up a little Wicket project, which I easily adapted to include the Sesame semantic web libraries and more (view pom).

The nicest part of this is that it then becomes extremely easy to link the source and the javadoc together. Two commands, which should be integrated as menu options, have finally made it possible for me to work with NetBeans:


# get the javadoc
$ mvn dependency:resolve -Dclassifier=javadoc
# get sources
$ mvn dependency:sources

This simple thing used to be a nightmare, especially as the number of jars one's project depended on increased. The Sesame group have split a lot of their jars up nicely, so that one can use a subset of them, but the way NetBeans was set up it became a real huge amazing astounding pain to link those to the source. And what is an open source IDE worth if it can't help you browse the source code and see its documentation easily?

Now I don't think Maven is in any way the final word in project deployment. My criticism, in short, is that it is not RESTful in a few ways, not least of which is that it fails to use URLs to name things and makes the cache the central element. It is as if web architecture had been turned upside down, with people naming things by trying to identify the caches in which they were located rather than by their URLs. My guess is that as a result things are a lot less flexible than they could be. As Roy Fielding pointed out recently, REST APIs must be hypertext driven. Software is located in a global information space, so in my opinion there is no good reason not to follow this precept.

Clearly though, this is a huge, huge improvement!

A better file explorer

I have sworn a few times at the previous versions of the NB file manager! Even more so when I had to use it to tie the javadoc to the source code - at that point it became a scream. Finally we have a command line File Explorer with tab completion. This is so beautiful I have to take a picture of it: file explorer

We use the keyboard all the time, and one can get many things done much faster that way. Navigating the File System with a keyboard is just much nicer. So why oh why is it still impossible to use up and down arrow keys in the classic view when some files are greyed out? ( Writing this I noticed there seems to be no way to get back from the classic view to the new command line view - please make it possible to get back! )

GlassFish 3 Integration

Well it is a real pleasure to work with a web server that loads a war in half a second. I use hardly any of the J2EE features so it's a good thing those don't get loaded.

I tried the HTTP Server Monitor, and that could be useful if it were more informative. In RESTful development it is really important to see the response codes (303, etc.) so that one can follow the conversation between the client and the server. Currently that piece tries to break things down too much into baby steps: just as with the File Explorer, there should be an easy UI into a feature and an advanced mode. I'd like to see the full, pure, unadulterated content going over the wire, perhaps highlighted to make it easy to find things. (It turns out this has been filed as feature request 36706.)

GlassFish integration really helped me develop and deploy my foaf+ssl service.

User Interface

As you can see from the main picture, the NetBeans UI seems to be going through a big transformation. Gone are some of the huge fat aqua buttons. The pieces are laid out in similar ways as in NB 6.5, but much more elegantly. A welcome change.

There is now a very useful search bar at the top right of NB 7. It proved very helpful for finding documentation, Maven repositories, and many other things, and came in handy a couple of times in my project.

One simple thing I would like would be to have a menu on each of the windows to open a file in its default OS viewer. So when I edit HTML which is a pleasure to do in NB, I would like to be able to quickly view that code in Firefox, Safari or Opera. Other XML files may have their default viewers, and so I think this is quite generalisable. In any case it should be easy to copy the file path of an open window, as one often has to do external processing of it. For files that are located on the internet, it would be great to be able to get their URL. This would help when chatting to people about source code one is working on for example.

Other

  • There are IntelliJ key bindings now. I really needed this a year or so ago, as I was switching between the IDEs. I have forgotten them now so it's less of a problem for me, but it will be very important for people switching between the IDEs.
  • I think this was part of NB6, but being able to browse the local history of source code is a really great feature. (I noticed that this does not diff html or xml for the moment)
  • Geertjan's Wicket integration module partly works on this daily build. You may need to start off with NB7 milestone 1 to get going, as it still seemed to be fully functional there.
  • I find this daily build needs restarting every day, as it seems to slow down after a while, perhaps using up a lot of memory.

Where is this going

Well those are the features that really stood out for me. And I am very happy to work with NB now.

I still think that the next big step, for NB 8 perhaps, should be the webification of the IDE. I think there is a huge amount to gain by applying Web Architecture principles to an IDE, and then the Net in NetBeans would fully reveal its meaning.

Friday Dec 12, 2008

ruby script to set skype and adium mood message with twitter on osx

Twitter is a great way to learn many little web2.0ish things. I wanted to set the status message on my Skype and Adium clients using my last Twitter message. So I found a howto document by Michael Tyson. I adapted it a bit to add Skype support, and to post only tweets that were not replies to someone else, because I decided there was just too much loss of context for those to make sense.

#!/usr/bin/env ruby
#
# Update iChat/Adium/Skype status from Twitter
#
# Michael Tyson 
# http://michael.tyson.id.au
# Contributor: Henry Story

# Set Twitter username here
Username = 'bblfish'

require 'net/http'
require 'rexml/document'
include REXML

# Download timeline XML and extract latest entry
url = "http://twitter.com/statuses/user_timeline/" + Username + ".atom"
xml_data = Net::HTTP.get_response(URI.parse(url)).body
doc    = REXML::Document.new(xml_data)

latest = XPath.match(doc, "//content").detect { |c| not /@/.match(c.text) }
exit unless latest && latest.text
# Strip the leading "username: " that Twitter puts in front of each entry
message = latest.text.sub(/\A[^:]+:\s*/, '')

# Apply to status (a trailing + keeps Ruby from ending the statement at the newline)
script = 'set message to "' + message.gsub(/"/, '\\\\"') + "\"\n" +
         'tell application "System Events"' + "\n" +
         'if exists process "iChat" then tell application "iChat" to set the status message to message' + "\n" +
         'if exists process "Adium" then tell application "Adium" to set status message of every account to message' + "\n" +
         'if exists process "Skype" then tell application "Skype" to send command "set profile mood_text " & message script name "twitter"' + "\n" +
         'end tell' + "\n"

IO.popen("osascript", "w") { |f| f.puts(script) }

This can then be added to the unix crontab as explained in Michael's article, and all is good.

What can one learn with this little exercise? Quite a lot:

  • Ruby - this is my first Ruby hack
  • Atom - twitter uses an atom xml feed to publish its posts
  • unix crontab
  • AppleScript to send messages to all these silly OSX apps
  • vi to edit all of this, but that's not obligatory, you can use less viral ones
  • the value of reusing data across applications
So that's a good way to spend a little time when one has had a little bit too much to drink the night before. Hmm, is this what one calls procrastination (video)?

Thursday Nov 20, 2008

foaf+ssl: a first implementation

The first very simple implementations of the foaf+ssl protocol are now out: the first step in adding simple distributed security to the global, open, decentralized social network that is emerging.

Update Feb 2009: I put up a service to create a foaf+ssl service in a few clicks. If you are short on time, try that out first.

The foaf+ssl protocol has been discussed in detail in a previous blog: "FOAF & SSL: creating a global decentralised authentication protocol", which goes over the theory of what we have implemented here. For those of you who have more time I also recommend my JavaOne 2008 presentation Building Secure, Open and Distributed Social Network Applications, which explains the need for a protocol such as this, gives some background understanding of the semantic web, and covers the working of this protocol in detail, all in a nice to listen to slideshow with audio.

In this article we are going to be rather more practical and less theoretical, but still too technical for many. I could spend a lot of time building a nice user interface to make this blog a point-and-click experience. But we are not looking for point-and-click users now. We want people who feel at home looking at some code and working with abstract security concepts, who can be critical and find solutions to problems too, and who are willing to learn some new things. So I have simplified things as much as needed for people who fall into that category (and made it easy enough for technical managers to follow too, I hope).

To try this out yourself, you just need to download the source code from the So(m)mer repository. This can be done with the following command line:


$ svn checkout https://sommer.dev.java.net/svn/sommer/trunk sommer --username guest
(leave the password blank)

This downloads a lot more code than is needed, by the way. But I don't have time to spend on isolating all the dependencies, bandwidth is cheap, and the rest of the code in there is pretty interesting too, I am sure you will agree. Depending on your connection speed, this will take some time to download, so we can do something else in the meantime, such as have a quick look at the UML diagram of the foaf+ssl protocol:

foaf+ssl uml sequence diagram

Let us make clear who is playing what role. You are Romeo. You want your client - a simple web browser such as Firefox or Safari will do - to identify yourself to Juliette's Web server. Juliette as it happens is a semantic web expert and she trusts that if you are able to read through this blog, understand it, create your X509 certificate and set up your foaf file so that it publishes your public key information correctly then you are human, intelligent, avant-garde, and you have enough money to own a web server which is all to your advantage. As a result her semantically enabled server will give you the secret information you were looking for.

Juliette knows of course that later on things won't be that simple anymore. Distributed social networks will be big enough, and the proportion of fools large enough, for predators to take an interest in this technology, and the tools for putting up a certificate will come packaged with everyone's operating system, embedded in every tool, etc. At that point things will have moved on, and Juliette will have added more criteria for giving access to her secret file. Not only will your certificate have to match the information in your foaf file as it does now. She will also know your URL and what you have published there of your social graph. Her server will be able to use that, and your position in the social graph of her friends, to decide how to treat you.

Creating a certificate and a foaf file

So the first thing to do is for you to create yourself a certificate and a foaf file. This is quite easy. You just need to do the following in a shell.


$ cd sommer/misc/FoafServer/

$ java -version
java version "1.5.0_16"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_16-b06-284)
Java HotSpot(TM) Client VM (build 1.5.0_16-133, mixed mode, sharing)

$ ant jar

Currently one needs at least Java 5 to run this.

Before you create your certificate, you need to know what your foaf URL is going to be. If you already have a foaf file, then that is easy, and the following will get you going:


$ java -cp dist/FoafServer.jar net.java.dev.sommer.foafserver.utils.GenerateKey  -shortfoaf

Enter full URL of the person to identify (no relative urls allowed): 
for example: http://bblfish.net/people/henry/card#me
http://bblfish.net/people/henry/card#me

Enter password for new keystore :enterAnewPasswordForNewStore     
publish the triples expressed by this n3
# you can use use cwm to merge it into an rdf file
# or a web service such as http://www.rdfabout.com/demo/validator/ to convert it to rdf/xml
# Generated by sommer.dev.java.net
@prefix cert: <http://www.w3.org/ns/auth/cert#> .
@prefix rsa: <http://www.w3.org/ns/auth/rsa#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://bblfish.net/people/henry/card#me> a foaf:Person; 
    is cert:identity of [ 
          a rsa:RSAPublicKey;
          rsa:public_exponent "65537"^cert:decimal ;
          rsa:modulus """b6bd6ce1a5ef51aaa69752c6af2e71948ab6da
9e5a5f086dba7548d8b80150d392117d90138948062eec6ecb5745a45491eea03a46b0a1c2e6324d
54144f42cdaa05ca39939eb973086cfedc8e31641cf7f29abc58310dcb8e56d9e6dae2233a317167
74d1eb32ced152084cfb860fb8cb5298a3c0270145c5d878f07f6417af"""^cert:hex ;
          ] .

the public and private keys are in the stored in cert.p12
you can list the contents by running the command
$ openssl pkcs12 -clcerts -nokeys -in cert.p12 | openssl x509 -noout -text

If you then run the openssl command, you will find that the public key components match the RDF above.


$  openssl pkcs12 -clcerts -nokeys -in cert.p12 | openssl x509 -noout -text
Enter Import Password:
MAC verified OK
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 1 (0x1)
        Signature Algorithm: sha1WithRSAEncryption
        Issuer: CN=http://bblfish.net/people/henry/card#me
        Validity
            Not Before: Nov 19 10:58:50 2008 GMT
            Not After : Nov 10 10:58:50 2009 GMT
        Subject: CN=http://bblfish.net/people/henry/card#me
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
            RSA Public Key: (1024 bit)
                Modulus (1024 bit):
                    00:b6:bd:6c:e1:a5:ef:51:aa:a6:97:52:c6:af:2e:
                    71:94:8a:b6:da:9e:5a:5f:08:6d:ba:75:48:d8:b8:
                    01:50:d3:92:11:7d:90:13:89:48:06:2e:ec:6e:cb:
                    57:45:a4:54:91:ee:a0:3a:46:b0:a1:c2:e6:32:4d:
                    54:14:4f:42:cd:aa:05:ca:39:93:9e:b9:73:08:6c:
                    fe:dc:8e:31:64:1c:f7:f2:9a:bc:58:31:0d:cb:8e:
                    56:d9:e6:da:e2:23:3a:31:71:67:74:d1:eb:32:ce:
                    d1:52:08:4c:fb:86:0f:b8:cb:52:98:a3:c0:27:01:
                    45:c5:d8:78:f0:7f:64:17:af
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Basic Constraints: critical
                CA:TRUE
            X509v3 Key Usage: critical
                Digital Signature, Non Repudiation, Key Encipherment, Key Agreement, Certificate Sign
            Netscape Cert Type: 
                SSL Client, S/MIME
            X509v3 Subject Key Identifier: 
                85:CD:66:A3:F7:23:DA:42:4B:F6:44:A1:90:A8:FE:27:9E:55:64:FE
            X509v3 Authority Key Identifier: 
                keyid:85:CD:66:A3:F7:23:DA:42:4B:F6:44:A1:90:A8:FE:27:9E:55:64:FE
            X509v3 Subject Alternative Name: 
                URI:http://bblfish.net/people/henry/card#me
    Signature Algorithm: sha1WithRSAEncryption
        a6:e0:3f:7c:cb:78:9b:f1:75:7f:62:ca:20:9e:a3:bb:87:61:
        29:59:3f:b9:bb:70:c5:06:bd:9a:62:fc:98:32:b7:f4:8b:53:
        ca:69:fc:5e:01:6a:4c:d8:85:5c:b3:a1:84:ec:1c:d2:6f:a8:
        0f:dd:c0:ff:9f:88:d2:84:8f:77:48:2e:f0:91:fb:2c:2a:22:
        96:07:be:ce:b2:98:87:ee:40:bd:16:32:fa:11:55:fb:0f:96:
        fb:c4:f8:be:66:3f:98:fa:62:61:0b:2f:b5:02:98:97:53:35:
        b5:46:32:c4:38:01:4c:97:66:aa:79:40:1a:67:45:bd:a0:e1:
        97:72

Notice also that the X509v3 Subject Alternative Name is your foaf URL. The Issuer Distinguished Name (starting with CN= here) could be anything.
This by the way, is the certificate that you will be adding to your browser in the next section.
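The check Juliette's server performs boils down to comparing the key in this certificate with the one published in the foaf file. Here is a minimal Java sketch of that comparison. It assumes the foaf file has already been fetched and parsed, so that the hex modulus and decimal exponent are available as strings. FoafKeyMatcher and its methods are hypothetical names, not part of the So(m)mer code. It accepts both the cert:hex form in the N3 above and the colon-separated form printed by openssl.

```java
import java.math.BigInteger;
import java.security.interfaces.RSAPublicKey;

/** Sketch of the key comparison a foaf+ssl server performs (hypothetical helper). */
public class FoafKeyMatcher {

    /** Parses a hex modulus as published with cert:hex or printed by openssl (whitespace and colons ignored). */
    public static BigInteger parseHex(String hex) {
        return new BigInteger(hex.replaceAll("[\\s:]", ""), 16);
    }

    /** True if the certificate's modulus and exponent equal the values published in the foaf file. */
    public static boolean matches(BigInteger certModulus, BigInteger certExponent,
                                  String foafHexModulus, String foafDecimalExponent) {
        return certModulus.equals(parseHex(foafHexModulus))
                && certExponent.equals(new BigInteger(foafDecimalExponent.trim()));
    }

    /** Convenience overload for the RSA key taken from the client's certificate. */
    public static boolean matches(RSAPublicKey key, String foafHexModulus, String foafDecimalExponent) {
        return matches(key.getModulus(), key.getPublicExponent(), foafHexModulus, foafDecimalExponent);
    }

    /** The document to fetch for a WebId: everything before the fragment (#me). */
    public static String documentUrl(String webId) {
        int hash = webId.indexOf('#');
        return hash < 0 ? webId : webId.substring(0, hash);
    }
}
```

For a real client certificate, the key comes from (RSAPublicKey) certificate.getPublicKey(). Comparing BigInteger values rather than strings avoids false mismatches caused by line breaks, colons, letter case, or the leading 00 byte that openssl prints.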

If you don't have a foaf file, then the simplest way to do this is to:

  1. decide where you are going to place the file on your web server
  2. decide what the name of it is
  3. Put a fake file there named cert.rdf
  4. get that file with a browser by typing in the full url there
  5. your foaf URL will then be http://yourhost.com/path/cert.rdf#me

Then you can use the following command to create your foaf file:


$ java -cp dist/FoafServer.jar net.java.dev.sommer.foafserver.utils.GenerateKey

That is the same as the first one but without the -shortfoaf argument. You will be asked for some information to fill up your foaf file, so as to make it a little more realistic -- you might as well get something useful out of this. You can then use either cwm or a web service to convert that N3 into rdf/xml, which you can then publish at the correct location. Now entering your url into a web browser should get your foaf file.

Adding the certificate to the browser

The previous procedure will have created a certificate, cert.p12, which you now need to import into your browser. The software that creates the certificate could, I guess, place it in your browser too, but that would require some work to make it cross-platform. Something to do for sure, but not now. On OS X, adding certs programmatically to the Keychain application is quite easy.

So to add the certificate to your browser's store, open up Firefox's preferences and go to the Advanced->Encryption tab as shown here:

Firefox 3.03 Advanced Preferences dialog

Click the "View Certificates" button, and you will get the Certificate Manager window pictured here.

Firefox 3.03 Certificate manager

Click the import button, and import the certificate we created in the previous section. That's it.

Starting Juliette's server

In a few days' time Ian Jacobi will have a python based server working with the new updated certificate ontology. I will point to that as soon as he has it working. In the meantime you can run Juliette's test server locally like this:


$ ant run

This will start her server on your computer, on localhost, listening on a secure socket on port 8843.

Connecting your client to Juliette's server

So now you can just go to https://localhost:8843/servlet/CheckClient in your favorite browser. This is Juliette's protected resource by the way, so we have moved straight to step 2 in the above UML diagram.

Now, because this is a server running locally, with a secure port that presents a certificate not signed by a well-established certificate authority, things get more complicated than they usually need to be. The following steps appear only because of this, and to make it clear that they are just a result of this experiment, I have placed the following paragraph in a blue background. You will only need to do this the first time you connect in this experiment, so be wary of the blues.

Firefox gave me the following warning the first time I tried it.

Firefox 3.03 certificate error dialog

This is problematic because it just warns that the server's certificate is not trusted, but does not allow you to specify that you trust it (after all, perhaps someone just mailed you the public key in the certificate, and you could use that information to decide that you trust the server).

On trying again (with a shift-reload perhaps, I am not sure), I finally got Firefox to present me with the following secure connection failed page:

Firefox 3.03 secure connection failed page

Safari had done the right thing from the start. Since we trust localhost:8843 (having just started it and even inspected some of the code), we just need to click the "Or you can add an exception ..." link, which brings up the dialog below:

Firefox 3.03 secure add exception page

They are trying to frighten users here, of course. And so they should. Ah, if only we had a certificate for localhost signed by a trusted CA, I would not have to write this whole part of the blog!

So of course you go there and click "Add Exception...", and this brings up the following dialog.

Firefox 3.03 get Certificate dialog

So click "Get Certificate" to fetch the server certificate. When done, you can view the certificate:

Firefox 3.03 get Certificate dialog

And confirm the security Exception.

Again all of this need not happen. But since it also makes clear what is going on, it can be helpful to show it.

Choose your certificate

Once you have accepted the server's certificate, the server will ask you for yours, and Firefox opens up the following dialog.

Firefox 3.03 Server requesting Client Certificate

Since you only have one client certificate, this is an easy choice. If you had a number of them, you could choose which persona to present to the site. When you click OK, the certificate will be sent back to the server. This is the end of stage 2 in the UML diagram above. At that point Juliette's server (on localhost) will go and get your foaf file (step 3), and compare the information about your public key to the one in the certificate you just presented (step 4) by making the following query on your foaf file, as shown in the CheckClient class:

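      // rep: a Sesame RepositoryConnection holding the fetched foaf file; vf: its ValueFactory
      TupleQuery query = rep.prepareTupleQuery(QueryLanguage.SPARQL,
                                "PREFIX cert: <http://www.w3.org/ns/auth/cert#> " +
                                "PREFIX rsa: <http://www.w3.org/ns/auth/rsa#> " +
                                "SELECT ?mod ?exp " +
                                "WHERE {" +
                                "   ?sig cert:identity ?person ." +
                                "   ?sig a rsa:RSAPublicKey;" +
                                "        rsa:modulus [ cert:hex ?mod ] ;" +
                                "        rsa:public_exponent [ cert:decimal ?exp ] ." +
                                "}");
       // bind ?person to the foaf URL taken from the client certificate
       query.setBinding("person", vf.createURI(uri.toString()));
       TupleQueryResult answer = query.evaluate();
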
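The query returns the modulus and exponent that the foaf file publishes. Step 4 then only needs to compare those numbers with the key in the certificate the browser presented. Here is a minimal sketch of that comparison, assuming the modulus is published in hexadecimal (possibly grouped with colons or spaces) and the exponent in decimal; KeyMatch is a hypothetical name, not a class from the CheckClient code:

```java
import java.math.BigInteger;
import java.security.interfaces.RSAPublicKey;

/** Hypothetical step-4 check: does the foaf-published key equal the certificate's key? */
class KeyMatch {

    /**
     * @param certKey RSA key taken from the client certificate
     * @param modHex  string bound to ?mod (hexadecimal)
     * @param expDec  string bound to ?exp (decimal)
     */
    static boolean matches(RSAPublicKey certKey, String modHex, String expDec) {
        try {
            // Drop ':' and whitespace in case the published hex is grouped (an assumption).
            BigInteger modulus = new BigInteger(modHex.replaceAll("[\\s:]", ""), 16);
            BigInteger exponent = new BigInteger(expDec.trim());
            // Numeric comparison ignores letter case and leading zeros in the hex string.
            return modulus.equals(certKey.getModulus())
                && exponent.equals(certKey.getPublicExponent());
        } catch (NumberFormatException e) {
            return false; // a malformed foaf value can never match
        }
    }
}
```

On the server, certKey would come from something like (RSAPublicKey) clientCert.getPublicKey(), and each row of answer would supply the strings for ?mod and ?exp. Comparing BigInteger values rather than raw strings avoids spurious mismatches caused by formatting.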
If the information in the certificate and the foaf file correspond, then the server will send you Juliette's secret information. In a Tabulator-enabled browser this comes out like this:

Firefox 3.03 displaying Juliette's secret info

The source code for all that is not far, and you will see that the algorithms used are very simple. This proves that the minimal piece, which is equivalent to what OpenID does, works. Next we will need to build up the server so that it can make decisions based on a web of trust. But by then you will have your foaf file, and will have filled out your social network a little, as that will be needed for this to work.

Further Work

Discussions on this and on a number of other protocols in the same space are currently happening on the foaf protocols mailing list. You are welcome to join the sommer project to work on the code and debug it. As I mentioned, Ian Jacobi has a public server running, which he should be updating soon with the new certificate ontology that we have been using here.

Clearly it would be really good to have a number of more advanced servers running this in order to experiment with access controls that add social proximity requirements.

Things to look at:

  • What other browsers does this work with?
  • Can anyone get this to work with Aladdin USB e-Token keys or similar tools?
  • Work on access controls that take social proximity into account
  • Does this remove the need for cookie identifiers on web sites?

I hope to be able to present this at the W3C Workshop on the Future of Social Networking in January 2009.

Wednesday Sep 24, 2008

Serialising Java Objects to RDF with Jersey

Jersey is the reference implementation of JSR311 (JAX-RS), the Java API for RESTful Web Services. In short, JSR311 makes it easy to publish graphs of Java objects to the web, and to implement update and POST semantics, all using simple Java annotations. It makes it easy for Java developers to do the right thing when writing data to the web.

JAX-RS deals with the mapping of Java objects to a representation format, any format. Writing a good format, though, is at least as tricky as building RESTful services, so JAX-RS solves only half the problem. What is needed is a way to serialize any object in a manner that scales to the web. For this we have RDF, the Resource Description Framework. By combining JAX-RS and So(m)mer's @rdf annotation one can remove the hurdle of having to create yet another format, and do this in a way that should be really easy to understand.

I have been wanting to demonstrate how this could be done, since the JavaOne 2007 presentation on Jersey. Last week I finally got down to writing up some initial code with the help of Paul Sandoz whilst in Grenoble. It turned out to be really easy to do. Here is a description of where this is heading.

Howto

The code to do this is available from the so(m)mer subversion repository, in the misc/Jersey directory. I will refer and link to the online code in my explanations here.

Annotate one's classes

First one needs to annotate one's model classes with @rdf annotations on the classes and fields. This is a way to give them global identifiers - URIs. After all if you are publishing to the web, you are publishing to a global context so global names are the only way to remove ambiguity. So for example our Person class can be written out like this:

@rdf(foaf+"Person")
public class Person extends Agent {
    static final String foaf = "http://xmlns.com/foaf/0.1/";

    @rdf(foaf+"surname") private Collection<String> surnames = new HashSet<String>();
    @rdf(foaf+"family_name") private Collection<String> familynames = new HashSet<String>();
    @rdf(foaf+"plan") private Collection<String> plans = new HashSet<String>();
    @rdf(foaf+"img") private Collection<URL> images = new HashSet<URL>();
    @rdf(foaf+"myersBriggs") private Collection<String> myersBriggss = new HashSet<String>();
    @rdf(foaf+"workplaceHomepage") private Collection<FoafDocument> workplaceHomePages = new HashSet<FoafDocument>();
    ....
}

This just requires one to find existing ontologies for the classes one has, or to publish new ones. (Perhaps this framework could be extended so as to automate the publishing of ontologies, deduced somehow from Java classes? Probably a framework that could be built on top of this.)
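To show why runtime-visible annotations are enough for a serialiser to find the global names of a class and its fields, here is a minimal sketch using plain reflection. The @Rdf annotation, SimplePerson and RdfFields are hypothetical stand-ins, not so(m)mer's API; so(m)mer itself rewrites the annotated classes at build time rather than reflecting on them:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

/** Hypothetical stand-in for so(m)mer's @rdf: names a class or field with a URI. */
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.FIELD})
@interface Rdf {
    String value();
}

/** Toy model class in the style of Person above. */
@Rdf(SimplePerson.FOAF + "Person")
class SimplePerson {
    static final String FOAF = "http://xmlns.com/foaf/0.1/";
    @Rdf(FOAF + "name") private String name = "Henry Story";
    @Rdf(FOAF + "nick") private Collection<String> nicks = new HashSet<>(Arrays.asList("bblfish"));
    private int cacheHits = 0; // no @Rdf, so it is never published
}

class RdfFields {
    /** The class URI from the type-level annotation, or null if there is none. */
    static String typeOf(Object o) {
        Rdf rdf = o.getClass().getAnnotation(Rdf.class);
        return rdf == null ? null : rdf.value();
    }

    /** One {propertyURI, value} pair per value of each @Rdf field; collections are expanded. */
    static List<String[]> properties(Object o) {
        List<String[]> out = new ArrayList<>();
        for (Field f : o.getClass().getDeclaredFields()) {
            Rdf rdf = f.getAnnotation(Rdf.class);
            if (rdf == null) continue;        // unannotated fields stay private
            f.setAccessible(true);            // the annotated fields are private
            Object value;
            try {
                value = f.get(o);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            if (value instanceof Collection) {
                for (Object v : (Collection<?>) value) out.add(new String[] {rdf.value(), String.valueOf(v)});
            } else if (value != null) {
                out.add(new String[] {rdf.value(), String.valueOf(value)});
            }
        }
        return out;
    }
}
```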

Map the web resources to the model

Next one has to find a mapping for web resources to objects. This is done by subclassing the RdfResource<T> template class, as we do three times in the Main class. Here is a sample:

 @Path("/person/{id}")
 public static class PersonResource extends RdfResource<Employee> {
     public PersonResource(@PathParam("id") String id) {
         // t: the model object this resource publishes (presumably declared in RdfResource<T>)
         t = DB.getPersonWithId(id);
     }
 }

This just tells Jersey to publish any Employee object on the server at the local /person/{id} URL. When a request for some resource, say /person/155492, is made, a PersonResource object will be created whose model object can be found by querying the DB for the person with id 155492. For this, of course, one has to somehow link the model objects (Person, Office, ... in our example) to some database. This could be done by loading flat files, querying an LDAP server or an SQL server, or whatever... In our example we just created a simple hard-coded Java class that acts as a DB.
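Jersey itself matches templates such as /person/{id} against incoming paths. Purely to illustrate the idea, here is a small sketch of such a matcher together with a toy hard-coded DB; PathTemplate and ToyDb are hypothetical, and real JAX-RS matching has more rules (regular-expression variables, precedence between templates) than this:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical matcher for JAX-RS style templates such as "/person/{id}". */
class PathTemplate {
    private final Pattern regex;
    private final List<String> names = new ArrayList<>();

    PathTemplate(String template) {
        Matcher m = Pattern.compile("\\{(\\w+)\\}").matcher(template);
        StringBuilder re = new StringBuilder();
        int last = 0;
        while (m.find()) {
            re.append(Pattern.quote(template.substring(last, m.start()))); // literal text
            re.append("([^/]+)");            // a variable matches exactly one path segment
            names.add(m.group(1));
            last = m.end();
        }
        re.append(Pattern.quote(template.substring(last)));
        regex = Pattern.compile(re.toString());
    }

    /** Returns the variable bindings, or null if the path does not match. */
    Map<String, String> match(String path) {
        Matcher m = regex.matcher(path);
        if (!m.matches()) return null;
        Map<String, String> vars = new HashMap<>();
        for (int i = 0; i < names.size(); i++) vars.put(names.get(i), m.group(i + 1));
        return vars;
    }
}

/** Toy stand-in for the example's hard-coded DB (names taken from the sample output). */
class ToyDb {
    private static final Map<String, String> PEOPLE = new HashMap<>();
    static {
        PEOPLE.put("155492", "Henry Story");
        PEOPLE.put("528", "James Gosling");
    }

    static String getPersonWithId(String id) { return PEOPLE.get(id); }
}
```

So new PathTemplate("/person/{id}").match("/person/155492") binds id to "155492", which is then looked up in the DB.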

Map the Model to the resource

An object can contain pointers to other objects. In order for the serialiser to know what the URLs of objects are, one has to map model objects to web resources. This is done simply with the static code in the same Main class [looking for improvements here too]:

   static {
      RdfResource.register(Employee.class, PersonResource.class);
      RdfResource.register(Room.class, OfficeResource.class);
      RdfResource.register(Building.class, BuildingResource.class);              
   }

Given an object the serialiser (RdfMessageWriter) can then look up the resource URL pattern, and so determine the object's URL. So to take an example, consider an instance of the Room class. From the above map, the serialiser can find that it is linked to the OfficeResource class from which it can find the /building/{buildingName}/{roomId} URI pattern. Using that it can then call the two getters on that Room object, namely getBuildingName() and getRoomId() to build the URL referring to that object. Knowing the URL of an object means that the serialiser can stop its serialisation at that point if the object is not the primary topic of the representation. So when serialising /person/155492 the serialiser does not need to walk through the properties of /building/SCA22/3181. The client may already have that information and if not, the info is just a further GET request away.
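The getter-based URL construction can be sketched with a little reflection. This assumes, as the text describes, that each {variable} in the template has a matching public getter; UrlBuilder and the minimal Room class are hypothetical, and for brevity they map a model class straight to its template instead of going through a resource class:

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of how a serialiser could turn a model object into its URL. */
class UrlBuilder {
    private static final Map<Class<?>, String> TEMPLATES = new HashMap<>();

    /** Plays the role of RdfResource.register(...), but maps directly to a template. */
    static void register(Class<?> model, String template) { TEMPLATES.put(model, template); }

    /** Fills each {var} by calling getVar() on the object; null if the class is unregistered. */
    static String urlOf(Object o) {
        String template = TEMPLATES.get(o.getClass());
        if (template == null) return null;
        Matcher m = Pattern.compile("\\{(\\w+)\\}").matcher(template);
        StringBuffer url = new StringBuffer();
        while (m.find()) {
            String var = m.group(1);
            String getter = "get" + Character.toUpperCase(var.charAt(0)) + var.substring(1);
            try {
                Method g = o.getClass().getMethod(getter);
                m.appendReplacement(url, Matcher.quoteReplacement(String.valueOf(g.invoke(o))));
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("no usable " + getter + "() on " + o.getClass(), e);
            }
        }
        m.appendTail(url);
        return url.toString();
    }
}

/** Minimal Room model with the two getters mentioned in the text. */
class Room {
    private final String buildingName;
    private final String roomId;

    Room(String buildingName, String roomId) {
        this.buildingName = buildingName;
        this.roomId = roomId;
    }

    public String getBuildingName() { return buildingName; }
    public String getRoomId() { return roomId; }
}
```

After UrlBuilder.register(Room.class, "/building/{buildingName}/{roomId}"), a Room("SCA22", "3181") yields /building/SCA22/3181. The sample output appends a fragment such as #it to this URL; how that fragment is chosen is not covered by the sketch.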

Running it on the command line

If you have downloaded the whole repository you can just run

$ ant run
from the command line. This will build the classes, recompile the @rdf annotated classes, and start the simple web server. You can then just curl for a few of the published resources like this:

hjs@bblfish:0$ curl -i http://localhost:9998/person/155492
HTTP/1.1 200 OK
Date: Wed, 24 Sep 2008 14:37:38 GMT
Content-type: text/rdf+n3
Transfer-encoding: chunked

<> <http://xmlns.com/foaf/0.1/primaryTopic> </person/155492#HS> .
</person/155492#HS> <http://xmlns.com/foaf/0.1/knows> <http://www.w3.org/People/Berners-Lee/card#i> .
</person/155492#HS> <http://xmlns.com/foaf/0.1/knows> </person/528#JG> .
</person/155492#HS> <http://xmlns.com/foaf/0.1/birthday> "29_07" .
</person/155492#HS> <http://xmlns.com/foaf/0.1/name> "Henry Story" .

The representation returned is a rather inelegant serialisation in the Turtle subset of N3. It makes the triple structure of RDF clear (subject, relation, object), and it uses relative URLs to refer to local resources. Other serialisers could be added, such as one for rdf/xml. See the todo list at the end of this article.

The representation simply says that this resource <> has as primary topic the entity named by #HS in the document. That entity's name is "Henry Story", and it knows a few people, one of whom is referred to via a global URL http://www.w3.org/People/Berners-Lee/card#i, and the other via a local URL /person/528#JG.
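The one-triple-per-line format is simple enough to write by hand. Here is a minimal sketch, assuming subjects and predicates are always URLs and that relative URLs are written out as they are (the empty one giving <>); FlatN3Writer is a hypothetical name, not the RdfMessageWriter code:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal writer for the flat, one-triple-per-line N3 shown above. */
class FlatN3Writer {
    private final List<String> lines = new ArrayList<>();

    /** A relation to another resource: the object is written as <url>, relative or not. */
    FlatN3Writer rel(String subject, String predicate, String object) {
        lines.add("<" + subject + "> <" + predicate + "> <" + object + "> .");
        return this;
    }

    /** A plain literal: escape backslashes, quotes and newlines, then quote the value. */
    FlatN3Writer lit(String subject, String predicate, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
        lines.add("<" + subject + "> <" + predicate + "> \"" + escaped + "\" .");
        return this;
    }

    @Override
    public String toString() { return String.join("\n", lines) + "\n"; }
}
```

For instance, rel("", "http://xmlns.com/foaf/0.1/primaryTopic", "/person/155492#HS") produces exactly the first line of the /person/155492 response above.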

We can find out more about the /person/528#JG thing by making the following request:

hjs@bblfish:0$ curl -i http://localhost:9998/person/528#JG
HTTP/1.1 200 OK
Date: Wed, 24 Sep 2008 14:38:10 GMT
Content-type: text/rdf+n3
Transfer-encoding: chunked

<> <http://xmlns.com/foaf/0.1/primaryTopic> </person/528#JG> .
</person/528#JG> <http://xmlns.com/foaf/0.1/knows> </person/155492#HS> .
</person/528#JG> <http://xmlns.com/foaf/0.1/knows> </person/29604#BT> .
</person/528#JG> <http://xmlns.com/foaf/0.1/knows> <http://www.w3.org/People/Berners-Lee/card#i> .
</person/528#JG> <http://www.w3.org/2000/10/swap/pim/contact#office> </building/SCA22/3181#it> .
</person/528#JG> <http://xmlns.com/foaf/0.1/birthday> "19-05" .
</person/528#JG> <http://xmlns.com/foaf/0.1/name> "James Gosling" .

... where we find out that the resource named by that URL is James Gosling. We also find that James has an office named by a further URL, which we can discover more about with yet another request:


hjs@bblfish:0$ curl -i http://localhost:9998/building/SCA22/3181#it
HTTP/1.1 200 OK
Date: Wed, 24 Sep 2008 14:38:38 GMT
Content-type: text/rdf+n3
Transfer-encoding: chunked

<> <http://xmlns.com/foaf/0.1/primaryTopic> </building/SCA22/3181#it> .
</building/SCA22/3181#it> <http://www.w3.org/2000/10/swap/pim/contact#address> _:2828781 .
_:2828781 a <http://www.w3.org/2000/10/swap/pim/contact#Address> .
_:2828781 <http://www.w3.org/2000/10/swap/pim/contact#officeName> "3181" .
_:2828781 <http://www.w3.org/2000/10/swap/pim/contact#street> "4220 Network Circle" .
_:2828781 <http://www.w3.org/2000/10/swap/pim/contact#stateOrProvince> "CA" .
_:2828781 <http://www.w3.org/2000/10/swap/pim/contact#city> "Santa Clara" .
_:2828781 <http://www.w3.org/2000/10/swap/pim/contact#country> "USA" .
_:2828781 <http://www.w3.org/2000/10/swap/pim/contact#postalCode> "95054" .

Here we have a Location that has an Address. The address does not have a global name, so we give it a document local name, _:2828781 and serialise it in the same representation, as shown above.

Because every resource has a clear hyperlinked representation we don't need to serialise the whole virtual machine in one go. We just publish something close to the Concise Bounded Description of the graph of objects.
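Stopping at named resources while inlining blank nodes is roughly what a Concise Bounded Description does. Here is a minimal sketch over an in-memory list of triples, assuming blank nodes are written with a _: prefix as in the output above; Cbd is a hypothetical class, and the full CBD definition also covers reifications, which this ignores:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a CBD-style walk over a list of triples. */
class Cbd {
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    static boolean isBlank(String node) { return node.startsWith("_:"); }

    /** All triples about the topic, plus, recursively, those about blank nodes it reaches. */
    static List<Triple> describe(String topic, List<Triple> graph) {
        List<Triple> out = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(topic);
        while (!todo.isEmpty()) {
            String node = todo.pop();
            if (!seen.add(node)) continue;     // guard against blank-node cycles
            for (Triple t : graph) {
                if (!t.s.equals(node)) continue;
                out.add(t);
                // Named objects are only referenced: their description is another GET away.
                if (isBlank(t.o)) todo.push(t.o);
            }
        }
        return out;
    }
}
```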

Browsing the results

Viewing the data through a command line interface is nice, but it's not as fun as when viewing it through a web interface. For that it is best currently to install the Tabulator Firefox plugin. Once you have that you can simply click on our first URL http://localhost:9998/person/155492. This will show up something like this:

picture of tabulator on loading /person/155492

If you then click on JG you will see something like this:

picture tabulator showing James Gosling

This it turns out is a resource naming James Gosling. James knows a few people including a BT. The button next to BT is in blue, because that resource has not yet been loaded, whereas the resource for "Henry Story" has. Load BT by clicking on it, and you get

picture of tabulator showing Henry Story knowing Bernard Traversat

This reveals the information about Bernard Traversat we placed in our little Database class. Click now on the i and we get

Tabulator with info about Tim Berners Lee

Now we suddenly have a whole bunch of information about Tim Berners Lee, including his picture, some of the people he has listed as knowing, where he works, his home page, etc... This is information we did not put in our Database! It's on the web of data.

One of the people Tim Berners-Lee knows is our very own Tim Bray.

tabulator showing info from dbpedia on Tim Bray

And you can go on exploring this data for an eternity. All you did was put a little bit of data on a web server using Jersey, and you can participate in the global web of data.

Todo

There are of course a lot of things that can be done to improve this Jersey/so(m)mer mapper. Here are just a few I can think of now:

  • Improve the N3 output. The code works with the examples, but it does not deal well with all the literal types, nor does it yet deal with relations to collections. The output could also be more human readable by avoiding repetitions.
  • Refine the linking between model and resources. The use of getters sounds right, but it could be a bit fragile if methods are renamed....
  • Build serialisers for rdf/xml and other RDF formats.
  • Deal with the publication of non-information resources, such as http://xmlns.com/foaf/0.1/Person, which names the class of Persons. When you GET it, it redirects you to an information resource:
    hjs@bblfish:1$ curl -i http://xmlns.com/foaf/0.1/Person
    HTTP/1.1 303 See Other
    Date: Wed, 24 Sep 2008 15:30:14 GMT
    Server: Apache/2.0.61 (Unix) PHP/4.4.7 mod_ssl/2.0.61 OpenSSL/0.9.7e mod_fastcgi/2.4.2 Phusion_Passenger/2.0.2 DAV/2 SVN/1.4.2
    Location: http://xmlns.com/foaf/spec/
    Vary: Accept-Encoding
    Content-Length: 234
    Content-Type: text/html; charset=iso-8859-1
    
    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <html><head>
    <title>303 See Other</title>
    </head><body>
    <h1>See Other</h1>
    <p>The answer to your request is located <a href="http://xmlns.com/foaf/spec/">here</a>.</p>
    </body></html>
    
    This should also be made easy and foolproof for Java developers.
  • make it industrial strength...
  • My current implementation is tied much too strongly to so(m)mer. The @rdf annotated classes get rewritten to create getters and setters for every field, linking them to the sommer mappers. This is not needed here. All we need is to automatically add the RdfSerialiser interface to make it easy to access the private fields.
  • One may want to add support for serialising @rdf annotated getters
  • Add some basic functionality for POSTing to collections or PUTing resources. This will require some thought.
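The 303 behaviour described in the todo list above needs very little server code. As a minimal sketch, here it is using the JDK's built-in com.sun.net.httpserver package rather than Jersey; the path and target document URL are whatever you pass in, and SeeOtherServer is a hypothetical name:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: answer GETs on a non-information resource with 303 See Other. */
class SeeOtherServer {
    static HttpServer start(int port, String thingPath, String documentUrl) throws IOException {
        HttpServer server = HttpServer.create(
            new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 0);
        server.createContext(thingPath, exchange -> {
            byte[] body = ("See " + documentUrl + "\n").getBytes(StandardCharsets.UTF_8);
            // The thing itself cannot be sent over the wire: point to a document describing it.
            exchange.getResponseHeaders().set("Location", documentUrl);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(303, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

Note that HttpServer contexts match by path prefix, so a real deployment would also want an exact-path check.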


Wednesday Sep 17, 2008

Are OO languages Autistic?

illustration of a simple play

One important criterion of Autism is the failure to develop a proper theory of mind.

A standard test to demonstrate mentalizing ability requires the child to track a character's false belief. This test can be done using stories, cartoons, people, or, as illustrated in the figure, a puppet play, which the child watches. In this play, one puppet, called, Sally, leaves her ball in her basket, then goes out to play. While she is out, naughty Anne moves the ball to her own box. Sally returns and wants to play with her ball. The child watching the puppet play is asked where Sally will look for her ball (where does Sally think it is?). Young children aged around 4 and above recognize that Sally will look in the basket, where she (wrongly) thinks the ball is.
Children with autism will tend to answer that Sally will look for the ball in the box.

Here are two further descriptions of autism from today's version of the Wikipedia article:

The main characteristics of autism are impairments in social interaction, impairments in communication, restricted interests and repetitive behavior.
Sample symptoms include lack of social or emotional reciprocity, stereotyped and repetitive use of language or idiosyncratic language, and persistent preoccupation with parts of objects.

In order to have a theory of mind one needs to be able to understand that other people may have a different view of the world. On a narrow, three-dimensional understanding of 'view', this reveals itself in that people at different locations in a room will see different things. One person may be able to see a cat behind a tree that will be hidden from another. In some sense, though, these two views can easily be merged into a coherent description. They are not contradictory. But we can do the same in higher dimensions. We can think of people as believing themselves to be in one of a number of possible worlds. Sally believes she is in a world where the ball is in the basket, whereas Ann believes she is in a world where the ball is in the box. Here the worlds are contradictory. They cannot both be true of the actual world.

To be able to make this type of statement one has to be able to do at least the following things:

  • Speak of ways the world could be
  • Refer to objects across these worlds
  • Compare these worlds

The ability to do this is present in none of the well-known Object Oriented (OO) languages by default. One can add it, just as one can add garbage collection to C, but it requires a lot of discipline and care. It does not come naturally. Perhaps it is a bit like the way a person with Asperger's syndrome can learn to interact socially with others, but in a reflective, awkward way.

Let us illustrate this with a simple example. Let us see how one could naively program the puppet play in Java. Let us first create the objects we will need:

Person sally = new Person("Sally");
Person ann = new Person("Ann");
Container basket = new Container("Basket");
Container box = new Container("Box");
Ball ball = new Ball("b1");
Container room = new Container("Room");
So far so good. We have all the objects. We can easily imagine code like the following to add the ball into the basket, and the basket into the room.
basket.add(ball);
room.add(basket);
Perhaps we have methods whereby the objects can ask what their container is. This would be useful for writing code to make sure that a thing could not be in two different places at once - in the basket and in the box, unless the basket was in the box.
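Container c = ball.getImmediateContainer();
Assert.assertTrue(c == basket);
try {
      box.add(ball);      // the ball is already in the basket...
      Assert.fail();      // ...so this line should never be reached
} catch (InTwoPlacesException e) {
      // expected: a thing cannot be in two places at once
}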
All that is going to be tedious coding, full of complicated issues of their own, but it's the usual stuff. Now what about the beliefs of Sally and Ann? How do we specify those? Perhaps we can think of sally and ann as being small databases of objects they are conscious of. Then one could just add them like this:
sally.consciousOf(basket,box,ball);
ann.consciousOf(basket,box,ball);
But the problem should be obvious now. If we move the ball from the basket to the box, the state of the objects in Sally's and Ann's databases will be exactly the same! After all, they are the same objects!
basket.remove(ball);
box.add(ball);
Ball sb = sally.get(Ball.class,"b1");
Assert.assertTrue(box.contains(sb));
// that is because both "databases" hold the very same object
Ball ab = ann.get(Ball.class,"b1");
Assert.assertTrue(ab == sb);
There is really no way to change the state of the ball for one person and not for the other... unless perhaps we give the two people different objects. This means that for each person we would have to make a copy of all the objects that they could think of. But then we would have a completely different problem: namely deciding when these two objects were the same. For it is usually understood that the equality of two objects depends on their state. So one usually would not think that a physical object could be the same if it was in two different physical places. Certainly if we had a ball b1 in a box, and another ball b2 in a basket, then what on earth would allow us to say we were speaking of the same ball? Perhaps their name, if we could guarantee that we had unique names for things. But we would still have some pretty odd things going on then: we would have objects that would somehow be equal, but would be in completely different states! And this is just the beginning of our problems. Just think of the dangers involved in taking an object from Ann's belief database, and how easy it would be to allow it, by mistake, to be added to Sally's belief store.

These are not minor problems. These are problems that have dogged logicians for the last century or more. To solve them properly, then, one should look for languages that were inspired by the work of such logicians. The most serious such project is now known as the Semantic Web.

Using N3 notation we can write the state of affairs described by our puppet show, and illustrated by the above graph, out like this:

@prefix : <http://test.org/> .

:Ann :believes { :ball :in :box . } .
:Sally :believes { :ball :in :basket . } .

N3 comes with a special notation for grouping statements by placing them inside { }. We could then easily ask who believes the ball is in the basket using SPARQL:

PREFIX : <http://test.org/>
SELECT ?who
WHERE {
     GRAPH ?g1 { :ball :in :basket }
     ?who :believes ?g1 .
}

The answer would bind ?who to :Sally, but not to :Ann.
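Outside N3 and SPARQL, the same idea can be mimicked with quads: triples tagged with the name of the graph they belong to. Here is a minimal plain-Java sketch of the query above, not Sesame's API; BeliefStore is hypothetical, the empty string stands for the default graph, and the graph names _:g1 and _:g2 are arbitrary:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: beliefs as named graphs, queried like the SPARQL above. */
class BeliefStore {
    static final String DEFAULT = ""; // name used here for the default graph

    static final class Quad { // a triple plus the name of the graph it belongs to
        final String s, p, o, graph;
        Quad(String s, String p, String o, String graph) {
            this.s = s; this.p = p; this.o = o; this.graph = graph;
        }
    }

    private final List<Quad> quads = new ArrayList<>();

    void add(String s, String p, String o, String graph) { quads.add(new Quad(s, p, o, graph)); }

    /** Plays the part of: SELECT ?who WHERE { GRAPH ?g { s p o } ?who :believes ?g } */
    List<String> whoBelieves(String s, String p, String o) {
        List<String> who = new ArrayList<>();
        for (Quad inner : quads) {
            // the statement must hold inside some named graph, i.e. some context
            if (inner.graph.equals(DEFAULT) || !inner.s.equals(s) || !inner.p.equals(p) || !inner.o.equals(o)) continue;
            for (Quad outer : quads) {
                // ...and someone, in the default graph, must believe that context
                if (outer.graph.equals(DEFAULT) && outer.p.equals(":believes") && outer.o.equals(inner.graph)) {
                    who.add(outer.s);
                }
            }
        }
        return who;
    }

    public static void main(String[] args) {
        BeliefStore store = new BeliefStore();
        store.add(":ball", ":in", ":box", "_:g1");        // Ann's world
        store.add(":Ann", ":believes", "_:g1", DEFAULT);
        store.add(":ball", ":in", ":basket", "_:g2");     // Sally's world
        store.add(":Sally", ":believes", "_:g2", DEFAULT);
        System.out.println(store.whoBelieves(":ball", ":in", ":basket")); // prints [:Sally]
    }
}
```

Both contexts use the same name :ball without contradicting each other, which is exactly what the object-based Java version above could not do.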

RDF therefore gives us the basic tools to escape from the autism of simpler languages:

  • One can easily refer to the same objects across contexts, as URIs are the basic building block of RDF
  • The basic units of meaning are sets of relations (graphs), and these are formally described.

This allows one to query for objects across contexts, and so to compare, merge and work with contexts.

It is quite surprising, once one realizes this, how many languages claim to be web languages, and yet fail to have any default space for the basic building blocks of the web: URIs and the notion of different points of view. When one fetches information from a remote server, one just has to take into account the fact that the server's view of the world may be different from, and in some respects incompatible with, one's own. One cannot, in an open world, just assume that everybody agrees with everything. One is forced to develop languages that enable a theory of mind. A lot of failures in distributed programming can probably be traced back to working with tools that don't.

Of course tools can be written in OO languages to work with RDF. Very good ones have been written in Java, such as Sesame, making it possible to query repositories for beliefs across contexts (see this example). But they bring to bear concepts that don't sit naturally with Java, and one should be aware of this. OO languages are good for building objects such as browsers, editors, simple web servers, transformation tools, etc... But they don't make it easy to develop tools that require just the most basic elements of a theory of mind, and so most things to do with communication. For that one will have to use the work done in the semantic web space and familiarize oneself with the languages and tools developed for working with them.

Finally, the semantic web also has its OO style, with the Web Ontology Language (OWL). This is just a set of relations to describe classes and relations. Notice, though, that it is designed for intra-context inference, i.e. all the inferences that you can make within a world. So in that sense, even at the Semantic Web layer, thinking in OO terms does not seem to touch on thinking across contexts, or mentally. Mind you, since people deal with objects, it is also important to think about objects to understand people. But it is just one part of the problem.


Tuesday Sep 02, 2008

exploring a web 2.0 app

I was trying today to understand what communication was going on in the social site project, which contains an implementation of the Open Social protocol. As the application uses a lot of asynchronous JavaScript, and as it is difficult to find the URLs for the resources pointed to (they get constructed at run time), one has to listen in on the wire to find the structure of the project. But that is not nearly as easy as one could hope. Here is how I got going:

  • Usually I use the command line tcpflow that just captures all the communication on the wire. The problem is that this does not decrypt the ssl connections, and social site has a lot of requests going over https.
  • There was a Firefox plugin called Slogger that would keep all the communication going back and forth and store it to the hard disk. But it is no longer maintained, and no longer seems to work with Firefox 3.0.1 (I even tried the workaround suggested on the plugin page).
  • Firebug is another, and very useful, such Firefox plugin. It works and was good enough for what I needed to do. The only problem is that it seems to have to make the requests again to show you the content, which is a problem for POST requests as these could have side effects. Here is a picture: Firebug HTTP console
    (one can immediately see some oddities here: the last visible request asks for xml content types, but the response is json).
  • Francois Le Droff pointed me to Charles Web Debugging Proxy (written in Java), a proprietary proxy that seems to be able to do it all. I have not tried it yet, as Firebug did the job for me, but I may need it later, when state on the server starts to matter.

Getting this type of information is really important when debugging an application. Hopefully bug 430155 will be fixed soon, allowing Firebug to do the right thing. I wonder if Google's Chrome comes with this functionality built in?

Sunday Aug 10, 2008

A Simple Sommer Project

Are you dreaming of the coasts of Java? Need some real sunshine? Have you had enough of hibernation? Here is a simple project to help you wake up and open up some new horizons :-)

A couple of weeks ago, Stephanie Stroka asked me how to make use of the so(m)mer library. So(m)mer is designed to play a similar role for the semantic web that Hibernate plays for relational databases. So when relational databases feel too constraining, and you need an open database that spans the world, the semantic web is for you. And if you are a Java programmer, then so(m)mer is a good open source project you can participate in.
Anyway, I had not had time to write a HOWTO, so I guided Stephanie over instant messenger on how to get going. It turns out this was a little more complicated than I had imagined, for someone completely new to the project. So Stephanie wrote out a really simple, indeed the simplest possible, project, documented it, and added it to the so(m)mer repository.

As a result there is now a really simple example to follow to get out of hibernation mode, and into the blooming summer. Just follow the instructions.

Wednesday Jun 25, 2008

NetBeans and Semantic Wikis

The Kiwi team is meeting at the Prague Sun offices for the next few days to discuss the roadmap of this cutest of all semantic wikis. I completely empathize with Jana Herwig, when she writes:

An IT project is like herding cats, they say - in our case, we’ll be herding kiwis, and if we can enjoy it only half as much as these guys, I’ll be fine:-)

And illustrates it with this video:



I like to think of us developers as being the cats, and the kiwis as the things we want to herd. Tasty :-) That would indeed also explain why I am still in France - herding cats ain't easy.

Why am I still in France and not tasting free beer in Prague? Well, last month's conferences in California have given me a conference overdose, from which I am still recovering. Also I have been speaking so much about the Address Book that I really need to sit down, roll up my sleeves, and just work on it. I did try to write something on topic yesterday, relating the semantic web and NetBeans, since Prague is the center of NetBeans development. I hope that excuses me somewhat.

A list of on-the-spot updates from this meeting can be found on kiwi planet.

Tuesday Jun 24, 2008

Webifying Integrated Development Environments

IDEs should be browsers of code on a Read Write Web. A whole revolution in how to build code editors is I believe hidden in those words. So let's imagine it. Fiction anticipates reality.

Imagine your favorite IDE, perhaps a future version of NetBeans or IntelliJ, which would make downloading a new project as easy as dragging and dropping the project's URL onto the IDE. The project home page would point to a description of the location of the code and of the project's dependencies on other projects, themselves described via URL references and set up in the same way. Let's imagine further: instead of downloading all the code from CVS, think of every source code document as having a URL on the web. (Subversion is in fact designed like this, so this is not far-fetched at all.) And let's imagine that NetBeans thinks about each software component primarily via this URL.
Since every piece of code and every library has a URL, the IDE would be able to use the RESTful architectural principles of the web. A few key advantages of this are:

  • Caching: a key feature of web architecture is the ability to cache information on the network or locally without ambiguity. This is how your web browser works (though it could work better). To illustrate: once a day Google changes its banner image. Your browser, and every browser on earth, only fetches that picture once a day, even if you do 100 searches. Does Google serve the image to each browser? No: numerous caches (company, country, or other) keep a copy of that picture and send it to the browser without forwarding the request all the way to the search engine, reducing the load on Google's servers very significantly.
  • Universal names: since every resource has a URL, any resource can relate in one way or another to any other resource wherever it is located. This is what enables hypertext and what is enabling hyperdata.
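To make the caching point concrete, here is a minimal Java sketch of a cache keyed by URL that revalidates its copy with an entity tag, the way an HTTP conditional GET (If-None-Match, answered by 304 Not Modified) works. The UrlCache class and the example.org URL are made up for illustration, and the origin function only simulates a server. A real cache would also honour expiry headers, so that a fresh copy needs no request at all.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical sketch: a URL-keyed cache that revalidates with an ETag,
// in the spirit of an HTTP conditional GET (If-None-Match / 304 Not Modified).
class UrlCache {

    /** What the simulated origin server returns. A null body means "304 Not Modified". */
    static final class Response {
        final String etag;
        final String body;
        Response(String etag, String body) { this.etag = etag; this.body = body; }
    }

    private static final class Entry {
        final String etag;
        final String body;
        Entry(String etag, String body) { this.etag = etag; this.body = body; }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    // origin.apply(url, etagWeHold) stands in for a GET carrying If-None-Match.
    private final BiFunction<String, String, Response> origin;
    int bodiesTransferred = 0;   // how many full bodies crossed the "network"

    UrlCache(BiFunction<String, String, Response> origin) { this.origin = origin; }

    String get(String url) {
        Entry cached = entries.get(url);
        Response r = origin.apply(url, cached == null ? null : cached.etag);
        if (r.body == null && cached != null) {
            return cached.body;                  // 304: reuse the local copy
        }
        bodiesTransferred++;                     // 200: a full body was sent
        entries.put(url, new Entry(r.etag, r.body));
        return r.body;
    }

    public static void main(String[] args) {
        Map<String, Response> server = new HashMap<>();
        server.put("http://example.org/banner.png", new Response("v1", "banner-v1"));
        UrlCache cache = new UrlCache((url, etag) -> {
            Response current = server.get(url);
            return current.etag.equals(etag) ? new Response(etag, null) : current;
        });
        for (int i = 0; i < 100; i++) cache.get("http://example.org/banner.png");
        System.out.println(cache.bodiesTransferred);   // 1
    }
}
```

A hundred lookups of the same banner cost one full transfer; every later request is answered from the local copy.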

Back to the IDE. So now that all code and all libraries can be served up RESTfully in a Resource Oriented Architecture, what does this mean for the IDE? Well, a lot. Each advantage may seem small, but together they pack a huge punch:
  • No need to download libraries twice: if you have been working on open source projects at all frequently you must have noticed how often the same libraries are found in each of the projects you have downloaded. Apache logging is a good example.
  • No need to download source code: it's on the web! You don't therefore need a local cache of code you have never looked at. Download what you need when you need it (and then cache it!): the Just in Time principle.
  • Describe things globally: Since you have universal identifiers you can now describe how source code relates to documentation, to people working on the code, or anything else in a global way, that will be valid for all. Just describe the resources. There's a framework around just for that, that is very easy to use with the right introduction.

The above advantages may seem rather insignificant. After all, real developers are tough. They use vi. (And I do.) So why should they change? Well, notice that they also use Adobe Air or Microsoft Silverlight. So productivity considerations do in fact play a very important role in the software ecosystem.
Don't normal developers just work on a few pieces of code? Well, speaking for myself, I have 62 different projects in my /Users/hjs/Programming directory, and in each of these I often have a handful of project branches. As more and more code is open source, and owned and tested by different organizations, the number of projects available on the web will continue to explode, and due to the laziness principle the number of projects using code from other projects will grow further. Already whole operating systems consisting of many tens of thousands of different modules can be downloaded and compiled. The projects I have downloaded are just the ones I have had the patience to get. Usually this means jumping through a lot of hoops:

  1. I have to find the web site of the code, and I may only have a jar name to go by. So Google helps. But that is a whole procedure in itself that should be unnecessary. If you have an image in your browser, you can find out where it is located by right-clicking on it and copying its URL. Why not so with code?
  2. Then I have to browse a web page, which may not be written in my language, and find the repository of the source code.
  3. Then I have to find the command line to download the source code, or the command in the IDE, and also somehow guess which version number produced the jar I am using.
  4. Once downloaded, and this can take some time, I may have to find the build procedure. There are a few out there. Luckily ant and maven are catching on. But some of these files can be very complicated to understand.
  5. Then I have to link the source code on my local file system to the jar on my local file system that my project is using. In NetBeans this is exceedingly tedious; sometimes I have found it close to impossible. IntelliJ has a few little tricks to automate some of this, but it can be pretty nasty too, requiring jumping around different forms, especially if a project has created a large number of little jar files.
  6. And then all that work is only valid for me. Because all references are to files on my local file system, they cannot be published. NetBeans is a huge pain here in that it often creates absolute file URLs in its properties files. By replacing them with relative URLs one can publish some of the results, but at the cost of copying every dependency into the local repository. And working out what is local and what is remote can take up a lot of time. It will work on my system, but not on someone else's.
  7. Once that project is downloaded, one may discover that it depends on yet another project, and so we have to go back to step 1.

So doing the above is currently causing me huge headaches even for very simple projects. As a result I do it a lot less often than I could, missing valuable opportunities. Each time I download a project in order to access its sources, to walk through my code and find a bug, or to test out a new component, I have to do all the download rigmarole described above. If you have a deadline, this can be a killer.
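Step 7 is exactly the kind of loop a machine should be running. Assuming each project is named by a URL and its description lists the URLs of the projects it depends on, the whole procedure collapses into a depth-first walk that also yields a build order. The sketch below is hypothetical: the Map stands in for fetching each description from the web, and the example.org URLs are invented.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: resolve the transitive dependencies of a project
// identified by URL and return an order in which they can be built.
// The Map stands in for fetching each project's description from the web.
class DependencyResolver {

    static List<String> buildOrder(String root, Map<String, List<String>> dependsOn) {
        List<String> order = new ArrayList<>();
        visit(root, dependsOn, new HashSet<>(), new HashSet<>(), order);
        return order;
    }

    private static void visit(String url, Map<String, List<String>> dependsOn,
                              Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(url)) return;                  // already resolved: no second download
        if (!inProgress.add(url)) {
            throw new IllegalStateException("dependency cycle through " + url);
        }
        for (String dep : dependsOn.getOrDefault(url, List.of())) {
            visit(dep, dependsOn, done, inProgress, order);   // "go back to step 1"
        }
        inProgress.remove(url);
        done.add(url);
        order.add(url);                                  // post-order: dependencies come first
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = Map.of(
                "http://example.org/app", List.of("http://example.org/libA", "http://example.org/logging"),
                "http://example.org/libA", List.of("http://example.org/logging"));
        System.out.println(buildOrder("http://example.org/app", deps));
        // prints [http://example.org/logging, http://example.org/libA, http://example.org/app]
    }
}
```

The shared logging library is resolved once, however many projects use it.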

So why do we have to tie together all the components on our local file system? Because the IDEs do not refer to resources with global identifiers. The owner of the junit project should say somewhere, perhaps in his DOAP file, that:

 
   @prefix java: <http://java.net/ont/java#> . #made this up
   @prefix code: <http://todo.eg/#> .

   <http://project.eg/svn/lib/junit-4.0.jar> a java:Jar;
         code:builtFrom <http://junit.sourceforge.net/> .

   #what would be needed here needs to be worked out more carefully. The point is that we don't
   #at any point refer to any local file.

This future IDE we are imagining together will know where it stored its local copy of the jar on the local file system, and where it placed the local copy of the source code, so it will know how the cached jar relates to the cached source code, as illustrated in the diagram above. Just as when you click on a link in your web browser you don't have to do any maintenance to find out where the images and HTML files are cached on your hard drive, or how one resource (your local copy of an image) relates to the web page, so we should not have to do any of this type of work in our development environment either.
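Here is a rough Java sketch of that bookkeeping, assuming the IDE keeps a cache directory laid out by host and path, and that it has read the publisher's statements as plain {subject, predicate, object} string triples. WebIde, localCopy and localSourceFor are hypothetical names; the builtFrom term is the made-up code:builtFrom from the N3 above.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: everything is named by URL, and only the IDE knows
// where its cached copies live. Linking a jar to its source is a lookup on
// published statements, never a hand-edited local path.
class WebIde {
    // Made-up vocabulary term, matching code:builtFrom in the N3 example.
    static final String BUILT_FROM = "http://todo.eg/#builtFrom";

    private final Path cacheRoot;
    private final List<String[]> triples;   // each entry is {subject, predicate, object}

    WebIde(Path cacheRoot, List<String[]> triples) {
        this.cacheRoot = cacheRoot;
        this.triples = triples;
    }

    /** Deterministic local location of the cached copy of a web resource. */
    Path localCopy(String url) {
        URI u = URI.create(url);
        String path = (u.getPath() == null || u.getPath().isEmpty()) ? "/" : u.getPath();
        return cacheRoot.resolve(u.getHost()).resolve(path.substring(1));
    }

    /** The source a jar was built from, as its publisher described it. */
    Optional<String> sourceOf(String jarUrl) {
        return triples.stream()
                .filter(t -> t[0].equals(jarUrl) && t[1].equals(BUILT_FROM))
                .map(t -> t[2])
                .findFirst();
    }

    /** Cached source for a cached jar: derived from global names, not configured. */
    Optional<Path> localSourceFor(String jarUrl) {
        return sourceOf(jarUrl).map(this::localCopy);
    }

    public static void main(String[] args) {
        WebIde ide = new WebIde(Paths.get("cache"), List.<String[]>of(new String[] {
                "http://project.eg/svn/lib/junit-4.0.jar", BUILT_FROM, "http://junit.sourceforge.net/"}));
        System.out.println(ide.localSourceFor("http://project.eg/svn/lib/junit-4.0.jar"));
    }
}
```

Nothing in the published statements mentions a local file; the local paths are computed by each user's IDE and stay private to it.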

From here many other things follow. A couple of years ago I showed how this could be used to link source code to bugs, to create a distributed bug database. Recently I showed how one could use this to improve build scripts. Why even download a whole project if you are stepping through code? Why not just fetch the code that you need, when you need it, from the web? One HTTP GET at a time. The list of functional improvements is endless. I welcome you to list some that you come up with in the comments section below.

If you want to make a big impact in the IDE space, this is the way to go.

Wednesday May 07, 2008

Three Semantic Web talks at JavaOne 2008

Following on last year's success, JavaOne 2008 has lined up three talks on the Semantic Web, a 200% increase. The program should give Java enthusiasts an excellent feel for how the Semantic Web is being used in real applications that make money for real start-ups, how to develop such apps in Java, and how to build open social networks that bridge the social networking data silos; with the help of Dean Allemang, it will also cover some theoretical ground from a practical perspective.

Here is the timetable of the sessions at JavaOne. Highlighted in green are the three semantic web sessions. Highlighted in gray are 4 of the 5 sessions on Google's Open Social API, which shows how important social networks are becoming in development. I don't think, though, that that API solves the real problem of current social networks: the data silo problem. Only Semantic Web technologies can do that.

Update Sept 2008: Two of the talks are now available online:

JavaOne Semantic Sessions Time Table

Below are the details of the sessions in tabular format. I believe they should complement each other very well.

Session Title: Developing Semantic Web Applications on the Java™ Platform
Session Time: Thursday - 05/08/2008 - 1:30 PM-2:30 PM
Session ID: PAN-5542
Session Description: The semantic web is nearing the point of widespread practical adoption:

• The core specifications have stabilized.
• Tools and frameworks implementing key features have been through several development cycles (for a listing see http://esw.w3.org/topic/SemanticWebTools).
• An increasing number of major software companies have developed semantically enabled products or are actively researching the space.

As companies start to translate theory into real Java™ technology-based applications, they are confronted with a host of practical software engineering issues:

• What is the standard or recommended functional architecture of a semantic application?
• How does that architecture relate to the semantic web standards?
• Which of those standards are stable, and which can be expected to evolve in ways that would significantly affect prior applications?
• What types of tools/frameworks exist that can be leveraged to help implement semantic applications on the Java platform?
• How mature are the various categories of semantic web tools/frameworks?
• Can API standardization be expected for certain tool/framework categories?
• What best practices exist for the design and implementation of Java technology-based semantic applications?
• What best practices exist for the deployment of Java technology-based semantic applications?
• What future trends in Java platform support for semantic application development can be expected?

This panel session gathers together semantics experts from the software industry to address these and other practical issues relating to the development of semantic applications on the Java platform.

Track: Next Generation Web
Session Type: Panel Session
Duration: 60 minutes
Speaker(s): Jans Aasman, Franz Inc; Dean Allemang, TopQuadrant Inc.; Brian Sletten, Zepheira, LLC; Henry Story, Sun Microsystems, Inc.; Lew Tucker, Radar Networks

Session Title: Beatnik: Building an Open Social Network Browser
Session Time: Thursday - 05/08/2008 - 7:30 PM-8:20 PM
Session ID: BOF-5911
Session Description: The recent growth of social networking sites is revealing the limits of the current ad hoc data architecture used by Web 2.0 sites. A typical example is that you cannot link to a person in a Facebook account from a LinkedIn account. What is needed to solve these problems is hyperdata, the ability to link data universally.

Hyperdata is to data what hypertext is to text. Where hypertext enables text to link up to other text, hyperdata enables data to link up to other data globally. Where HTML enables open, distributed hypertext, the semantic web enables open, distributed hyperdata. Anybody can publish data that then becomes reachable by any tool crawling the web of relations.

To illustrate the power of hyperdata, this session presents Beatnik, a social network browser and editor written entirely in the Java™ programming language that consumes any of the millions of available friend-of-a-friend (FOAF) files already published on the web and enables users to publish information about themselves and their own social network. It shows how you can drag and drop a FOAF URL onto Beatnik and start exploring a web of relations and find up-to-date information about where your friends live, who their friends are, and where people are currently located. With a click of a button, Beatnik will publish all your own relations to your web server in a nonintrusive way to make you part of the first globally available open social network.

After a quick overview of the semantic web and FOAF, the presentation takes a detailed look at how the Beatnik client is built. This involves digging into one of the many Java technology-based semantic web frameworks, such as Sesame, and its APIs; a Java-platform-to-RDF mapper, such as so(m)mer or Elmo; and how this enables inferencing on the Java platform.

On the server side, the presentation looks at how you can easily publish the contents of an LDAP database into any of the numerous RDF formats using JSR 311, the Java API for RESTful Web Services. It also covers the use of the Atom Publishing Protocol as a publication mechanism and discusses various security techniques for limiting the view of a personal graph of information by using OpenID and distributed-web-of-trust techniques.

Track: Cool Stuff, Cool Desktop; Cool Stuff, Cool Next Gen Web; Open Source, Open Source Next Gen Web; Cool Stuff; Desktop; Next Generation Web; Open Source
Session Type: Birds-of-a-Feather Session (BOF)
Duration: 50 minutes
Speaker(s): Tim Boudreau, Sun Microsystems, Inc.; Henry Story, Sun Microsystems, Inc.

Session Title: Semantic Web for the Working Ontologist
Session Time: Friday 05/09/2008 - 1:30 PM-2:30 PM
Session ID: TS-5555
Session Description: This session presents the basics of practical semantic web deployment using standards-based tools on the Java™ platform. It covers the Resource Description Framework (RDF) as the fundamental mashup language of the web; SPARQL, the query language for RDF; and RDFS and OWL, which provide simple inferencing capabilities.

In the distributed world of the web, information is moving from a hypertext paradigm to a hyperdata paradigm: the web today is not just a web of documents but also a web of data. But that data is available on the web and in the enterprise in a wide variety of forms: HTML, XML, RSS, spreadsheets, databases, and so on. RDF provides a uniform way to identify information in a distributed setting to form a web of data.

The session demonstrates a Java technology-based platform (built on Eclipse) that uses RDF as an interlingua for merging information from multiple web sources. Java technology plays a key role in the success of the system in several ways. First, it uses the large variety of public domain semantic web software available on the Java platform as the basis of interoperability at the API level. Second, it uses the Eclipse framework as a visual editing environment for the ontologies. Finally, it uses the modularity of the Eclipse plug-in environment to enable a sort of plug-and-play architecture among semantic components.

One of the basic ideas of the semantic web is that semantic models, or “ontologies,” can be used to describe how data fits together. In the context of the web of hyperdata, an ontology can describe how data in one source relates to data from another, or even which sources of data should be merged to answer a particular question or support a particular application. The idea is that, armed with these tools, a working ontologist can describe hyperdata applications without resorting to a general-purpose programming language.

TopQuadrant has used these standards to construct a workbench for building semantic applications. Semantic mashups can be built by use of RDFS and OWL. TopQuadrant has also developed a visual flow editor for describing how distributed data can be merged in novel ways; it calls this editor SPARQLMotion, because it extends the standard query language SPARQL with intuitive information flow diagrams modeled in OWL. SPARQLMotion modules can be connected with a simple point-and-click interface to create novel arrangements.

Track: Next Generation Web
Session Type: Technical Session
Duration: 60 minutes
Speaker(s): Dean Allemang, TopQuadrant Inc.

Tuesday May 06, 2008

BOF-5911: Building a Web 3.0 Address Book

To give everyone a chance to try out the So(m)mer Address Book, I have made it available via Java Web Start: just click on the picture to the right, and try it out.

The Address Book is currently demoware: it shows how one can virally build an open, distributed social network client that solves the social network data silo problem (video). There is no need to have an account on every social networking site on which you have friends, and so to maintain your data on each one. You can simply belong to one network and link to all your friends wherever they are. With one click of a button you can publish your social network to your own web server, using ftp, scp, WebDAV, or even Atom. You can then link to other people, whether or not they have a foaf file. When you select a friend and press the space bar, the Address Book will then GET their file. So you can browse your social network.
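Browsing with the space bar amounts to following links from one foaf file to the next. A minimal sketch of that walk, breadth first with a limit on the number of hops, might look like this. The FoafCrawler class is hypothetical, and the fetchSeeAlso function stands in for GETting a foaf file and pulling out its rdfs:seeAlso links; it is not the Address Book's actual code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: walk a distributed social network by following
// rdfs:seeAlso links from one foaf file to the next, breadth first.
// fetchSeeAlso stands in for an HTTP GET plus link extraction.
class FoafCrawler {

    /** Returns the foaf files reached within maxHops links of start, in visiting order. */
    static List<String> crawl(String start, Function<String, List<String>> fetchSeeAlso, int maxHops) {
        Map<String, Integer> hops = new HashMap<>();   // also serves as the "already seen" set
        Deque<String> frontier = new ArrayDeque<>();
        List<String> visited = new ArrayList<>();
        hops.put(start, 0);
        frontier.add(start);
        while (!frontier.isEmpty()) {
            String url = frontier.poll();
            visited.add(url);
            int h = hops.get(url);
            if (h == maxHops) continue;                 // don't wander too far from home
            for (String next : fetchSeeAlso.apply(url)) {
                if (!hops.containsKey(next)) {          // each file is fetched at most once
                    hops.put(next, h + 1);
                    frontier.add(next);
                }
            }
        }
        return visited;
    }
}
```

The seen-map matters: friends usually link back to each other, so without it the walk would loop forever.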

To get going you can explore my social network by dragging my foaf file icon onto the first pane of the application.

In BOF-5911, which I will be presenting on Thursday at 7:30pm, I will describe the social networking problem, demonstrate how the So(m)mer Address Book solves it, and show in detail how it is built, what the problems are, and what work remains. I will also discuss how this can be used to create global single sign-on based on a network of trust.

Update

An improved version of the presentation I gave is now available online, with audio, as Building Secure, Open and Distributed Social Network Applications.

Wednesday Mar 05, 2008

Opening Sesame with Networked Graphs

Simon Schenk just recently gave me an update to his Networked Graphs library for the Sesame RDF Framework. Even though it is in an early alpha state, the jars have already worked wonders on my Beatnik Address Book. With four simple SPARQL rules I have been able to tie together most of the loose ends that appear between foaf files, as each one often uses different ways to refer to the same individual.

Why inferencing is needed

So, for example, in my foaf file I link to Simon Phipps, Sun's very popular Open Source Officer, with the following N3:

 :me foaf:knows   [ a foaf:Person;
                    foaf:mbox_sha1sum "4e377376e6977b765c1e78b2d0157a933ba11167";
                    foaf:name "Simon Phipps";
                    foaf:homepage <http://www.webmink.net/>;
                    rdfs:seeAlso <http://www.webmink.net/foaf.rdf>;
                  ] .
For those who still don't know N3 (where have you been hiding?), this says that I know a foaf:Person named "Simon Phipps" whose homepage is given, and for whom more information can be found in the http://www.webmink.net/foaf.rdf RDF file. Now the problem is that the person in question is identified by a blank node, written with square brackets '[ ... ]'. That is, we don't have a name (URI) for Simon. So when the Beatnik Address Book gets Simon's foaf file, by following the rdfs:seeAlso relation, it gets, among others, something like:
[] a foaf:Person;
   foaf:name "Simon Phipps";
   foaf:nick "webmink";
   foaf:homepage </>;
   foaf:knows [ a foaf:Person;
                foaf:homepage <http://www.buzzword-compliant.com/>;
                rdfs:seeAlso <http://www.buzzword-compliant.com/foaf.rdf>;
             ] .
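This file then contains at least two people. Which of them is the Simon Phipps in my file? A human being would guess that the person named "Simon Phipps" is the same in both cases. Networked Graphs helps Beatnik make a similar guess by noting that the foaf:homepage relation is an owl:InverseFunctionalProperty: the relative URL </> in Simon's file resolves to http://www.webmink.net/, the same homepage I gave for him, so both blank nodes must refer to the same person.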

Some simple rules

After downloading Simon Phipps's foaf file and mine, and placing the relations found in each in its own Named Graph, we can, in Sesame 2.0, create a merged view of both graphs simply by creating a graph that is the union of the triples of each.

The Networked Graph layer can then do some interesting inferencing by defining a graph with the following SPARQL rules:

#foaf:homepage is inverse functional
grph: ng:definedBy """
  CONSTRUCT { ?a <http://www.w3.org/2002/07/owl#sameAs> ?b .  } 
  WHERE { 
       ?a <http://xmlns.com/foaf/0.1/homepage> ?pg .
       ?b <http://xmlns.com/foaf/0.1/homepage> ?pg .
      FILTER ( ! SAMETERM (?a , ?b))   
 } """^^ng:Query .
This simply says that if two names for things have the same homepage, then these two names refer to the same thing. I could be more general by writing rules at the OWL level, but those would be more complicated, and I just wanted to test out the Networked Graph sail to start with. So the above will add a bunch of owl:sameAs relations to our NetworkedGraph view on the Sesame database.

The following two rules then just complete the information.

# owl:sameAs is symmetric
#if a = b then b = a 
grph: ng:definedBy """
  CONSTRUCT { ?b <http://www.w3.org/2002/07/owl#sameAs> ?a . } 
  WHERE { 
     ?a <http://www.w3.org/2002/07/owl#sameAs> ?b . 
     FILTER ( ! SAMETERM(?a , ?b) )   
  } """^^ng:Query .

# indiscernibility of identicals
#two identical things have all the same properties
grph: ng:definedBy """
  CONSTRUCT { ?b ?rel ?c . } 
  WHERE { ?a <http://www.w3.org/2002/07/owl#sameAs> ?b .
          ?a ?rel ?c . 
     FILTER ( ! SAMETERM(?rel , <http://www.w3.org/2002/07/owl#sameAs>) )   
  } """^^ng:Query .
They make sure that when two things are found to be the same, they have the same properties. I think these two rules should probably be hard coded in the database itself, as they seem so fundamental to reasoning that there must be some very serious optimizations available.
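To see what these three rules derive, here is a naive Java sketch that applies them to an in-memory set of triples until nothing new appears. It is not how Networked Graphs evaluates SPARQL rules; it is only an illustration, assuming literals and blank nodes can be written as plain strings. The labels _:simon1 and _:simon2 are made up for the two blank nodes, one from my file and one from Simon's.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch: naive forward chaining of the three rules above
// (homepage is inverse functional, sameAs is symmetric, identicals share
// properties), repeated until a fixpoint is reached.
class SameAsRules {
    static final String SAME_AS = "http://www.w3.org/2002/07/owl#sameAs";
    static final String HOMEPAGE = "http://xmlns.com/foaf/0.1/homepage";

    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        @Override public boolean equals(Object x) {
            if (!(x instanceof Triple)) return false;
            Triple t = (Triple) x;
            return s.equals(t.s) && p.equals(t.p) && o.equals(t.o);
        }
        @Override public int hashCode() { return Objects.hash(s, p, o); }
    }

    static Set<Triple> closure(Set<Triple> input) {
        Set<Triple> g = new HashSet<>(input);
        boolean changed = true;
        while (changed) {
            Set<Triple> inferred = new HashSet<>();
            for (Triple t1 : g) {
                // rule 1: two subjects with the same foaf:homepage are the same
                if (t1.p.equals(HOMEPAGE)) {
                    for (Triple t2 : g) {
                        if (t2.p.equals(HOMEPAGE) && t2.o.equals(t1.o) && !t1.s.equals(t2.s)) {
                            inferred.add(new Triple(t1.s, SAME_AS, t2.s));
                        }
                    }
                }
                if (t1.p.equals(SAME_AS) && !t1.s.equals(t1.o)) {
                    // rule 2: owl:sameAs is symmetric
                    inferred.add(new Triple(t1.o, SAME_AS, t1.s));
                    // rule 3: the other name gets every non-sameAs property
                    for (Triple t2 : g) {
                        if (t2.s.equals(t1.s) && !t2.p.equals(SAME_AS)) {
                            inferred.add(new Triple(t1.o, t2.p, t2.o));
                        }
                    }
                }
            }
            changed = g.addAll(inferred);   // stop once a pass adds nothing new
        }
        return g;
    }

    public static void main(String[] args) {
        String name = "http://xmlns.com/foaf/0.1/name", nick = "http://xmlns.com/foaf/0.1/nick";
        Set<Triple> g = new HashSet<>();
        g.add(new Triple("_:simon1", name, "Simon Phipps"));                 // from my foaf file
        g.add(new Triple("_:simon1", HOMEPAGE, "http://www.webmink.net/"));
        g.add(new Triple("_:simon2", nick, "webmink"));                      // from Simon's file
        g.add(new Triple("_:simon2", HOMEPAGE, "http://www.webmink.net/"));
        System.out.println(closure(g).contains(new Triple("_:simon1", nick, "webmink")));   // true
    }
}
```

Run on the two descriptions of Simon Phipps, the closure gives _:simon1 the foaf:nick "webmink" and _:simon2 the foaf:name "Simon Phipps", which is the effect visible in the Address Book when rules are on.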

Advanced rules

Anyway the above illustrates just how simple it is to write some very clear inferencing rules. Those are just the simplest that I have bothered to write at present. Networked Graphs allows one to write much more interesting rules, which should help me solve the problems I explained in "Beatnik: change your mind" where I argued that even a simple client application like an address book needs to be able to make judgements on the quality of information. Networked Graphs would allow one to write rules that would amount to "only believe consequences of statements written by people you trust a lot". Perhaps this could be expressed in SPARQL as

CONSTRUCT { ?subject  ?relation ?object . }
WHERE {
    ?g tr:trustlevel ?tl .
    GRAPH ?g { ?subject ?relation ?object . }
    FILTER ( ?tl > 0.5 )
}
Going from the above it is easy to start imagining very interesting uses of Networked Graph rules. For example we may want to classify some ontologies as trusted and only do reasoning on relations over those ontologies. The inverse functional rule could then be generalized to
  PREFIX owl: <http://www.w3.org/2002/07/owl#>
  PREFIX : <https://sommer.dev.java.net/ontologies/beatnik#>

  CONSTRUCT { ?a owl:sameAs ?b .  } 
  WHERE { 
      GRAPH ?g { ?inverseFunc a owl:InverseFunctionalProperty . }
      ?g a :TrustedOntology .

       ?a ?inverseFunc ?pg .
       ?b ?inverseFunc ?pg .
      FILTER ( ! SAMETERM (?a , ?b))   
 }
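Here is a sketch of both ideas with graphs held in memory: first keep only the named graphs whose trust level is above a threshold, then derive owl:sameAs only from properties that a trusted ontology declares inverse functional. The class, the data layout, and the threshold semantics (strictly greater, and graphs with no recorded trust level are not believed) are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch: believe only sufficiently trusted graphs, and only
// reason with inverse functional properties declared in trusted ontologies.
class TrustedReasoning {
    static final String TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String IFP = "http://www.w3.org/2002/07/owl#InverseFunctionalProperty";
    static final String SAME_AS = "http://www.w3.org/2002/07/owl#sameAs";

    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        @Override public boolean equals(Object x) {
            if (!(x instanceof Triple)) return false;
            Triple t = (Triple) x;
            return s.equals(t.s) && p.equals(t.p) && o.equals(t.o);
        }
        @Override public int hashCode() { return Objects.hash(s, p, o); }
    }

    /** Union of the named graphs whose trust level is strictly above the threshold.
     *  Graphs with no recorded trust level are treated as untrusted (an assumption). */
    static Set<Triple> believed(Map<String, Set<Triple>> graphs, Map<String, Double> trust,
                                double threshold) {
        Set<Triple> out = new HashSet<>();
        graphs.forEach((name, triples) -> {
            if (trust.getOrDefault(name, 0.0) > threshold) out.addAll(triples);
        });
        return out;
    }

    /** owl:sameAs links derived only from properties that a trusted ontology
     *  declares to be an owl:InverseFunctionalProperty. */
    static Set<Triple> sameAs(Set<Triple> data, Map<String, Set<Triple>> ontologies,
                              Set<String> trustedOntologies) {
        Set<String> ifps = new HashSet<>();
        for (String name : trustedOntologies) {
            for (Triple t : ontologies.getOrDefault(name, Set.of())) {
                if (t.p.equals(TYPE) && t.o.equals(IFP)) ifps.add(t.s);
            }
        }
        Set<Triple> out = new HashSet<>();
        for (Triple a : data) {
            if (!ifps.contains(a.p)) continue;
            for (Triple b : data) {
                if (b.p.equals(a.p) && b.o.equals(a.o) && !a.s.equals(b.s)) {
                    out.add(new Triple(a.s, SAME_AS, b.s));
                }
            }
        }
        return out;
    }
}
```

Keeping the two steps separate means the trust policy can change without touching the reasoning, which is roughly what the SPARQL rules above achieve by naming graphs.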

Building the So(m)mer Address Book

I will be trying these out later. But for the moment you can already see the difference inferencing brings to an application by downloading the Address Book from subversion at sommer.dev.java.net and running the following commands (leave the password to the svn checkout blank)


> svn checkout https://sommer.dev.java.net/svn/sommer/trunk sommer --username guest
> cd sommer
> ant jar
> cd misc/AddressBook/
> ant run
Then you can just drag and drop the foaf file on this page into the address book, and follow the distributed social network by pressing the space bar to get foaf files. To enable inferencing you currently need to set it in the File>Toggle Rules menu. You will see things coming together suddenly when inferencing is on.

There are still a lot of bugs in this software. But you are welcome to post bug reports, or help out in any way you can.

Where this is leading

Going further, it seems clear to me that Networked Graphs is starting to realise what Guha, one of the pioneers of the semantic web, wrote about in his thesis "Contexts: A Formalization and Some Applications", on which I wrote a short note, "Keeping track of Context in Life and on the Web", a couple of years ago. That really helped me get a better understanding of the possibilities of the semantic web.

Thursday Feb 28, 2008

sparqling international calling codes

The other day I was looking for a list of international calling codes. Since most of them are listed in Wikipedia, it occurred to me that it would be easy to get all that information in a nice, easy-to-use format by querying DBPedia with SPARQL. So I wrote a very lightweight SPARQL client (source code available here). Download the jar and you can then run the following query:

hjs@bblfish:0$ java -jar Sparql.jar > results.n3

PREFIX dbp: <http://dbpedia.org/property/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 

CONSTRUCT {  ?cntry dbp:callingCode ?code ;
                    rdfs:label ?name . 
} WHERE {
        ?cntry dbp:callingCode ?code .
        OPTIONAL { ?cntry rdfs:label ?name . }
}


^D
That is, after typing the command line java -jar Sparql.jar > results.n3, I pasted the SPARQL query above and ended the input with control-D, which on Unix is the end-of-file character. This sent the query to DBPedia, which returned a long list of answers that were placed in results.n3, of which the first set is:
<http://dbpedia.org/resource/Abu_Dhabi_%28emirate%29> <http://www.w3.org/2000/01/rdf-schema#label> "\u963F\u5E03\u624E\u6BD4\u914B\u957F\u56FD\""@zh ,
		"Abu Dhabi (emirate)"@en ,
		"Abu Dhabi (emirato)"@it ,
		"\u0410\u0431\u0443-\u0414\u0430\u0431\u0438 (\u044D\u043C\u0438\u0440\u0430\u0442)\""@ru ;
	<http://dbpedia.org/property/callingCode> "971-2"@en .

In the above case the calling code should probably not be tagged with @en, so the data still needs a little cleaning up. It would be nice to be able to fix the data quickly when one notices something like this. Most of the other results are in xsd:integer format, which I think is also not quite right: a code such as "971-2" is not an integer, so a plain string literal is a better representation of a calling code.
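The clean-up itself is mechanical. As a minimal sketch, assuming the literals come as N-Triples style strings, this Java method keeps the lexical form and drops any language tag or datatype, so "971-2"@en becomes "971-2". CallingCodeCleaner is a hypothetical name, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: rewrite a calling-code literal, written N-Triples style,
// as a plain string literal by dropping any language tag or datatype.
class CallingCodeCleaner {
    // A quoted lexical form, optionally followed by @lang or ^^<datatype>.
    private static final Pattern LITERAL =
            Pattern.compile("^\"(.*)\"(?:@[A-Za-z0-9-]+|\\^\\^<[^>]*>)?$");

    static String asPlainString(String literal) {
        Matcher m = LITERAL.matcher(literal.trim());
        if (!m.matches()) throw new IllegalArgumentException("not a literal: " + literal);
        return "\"" + m.group(1) + "\"";
    }

    public static void main(String[] args) {
        System.out.println(asPlainString("\"971-2\"@en"));   // "971-2"
    }
}
```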

Anyway the data is easy to clean up. And we have an example of a very simple but useful query.
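The client really does very little. Here is a sketch of an equivalent in Java 11 or later, not the source of Sparql.jar itself: it reads the query from standard input until end-of-file and sends it with an HTTP GET, as the SPARQL protocol permits. The endpoint URL http://dbpedia.org/sparql and the text/turtle Accept header are assumptions about what DBPedia accepts.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a very lightweight SPARQL client: read a query from
// standard input (until end-of-file, ^D on Unix), send it to an endpoint with
// an HTTP GET, and print whatever comes back.
class TinySparql {
    static final String ENDPOINT = "http://dbpedia.org/sparql";   // assumed endpoint

    /** Builds the GET URI, with the query form-encoded in the "query" parameter. */
    static URI queryUri(String endpoint, String query) {
        return URI.create(endpoint + "?query=" + URLEncoder.encode(query, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String query = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        HttpRequest request = HttpRequest.newBuilder(queryUri(ENDPOINT, query))
                .header("Accept", "text/turtle")   // ask for CONSTRUCT results as Turtle
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.print(response.body());
    }
}
```

It is used the same way as above: run java TinySparql.java > results.n3, paste the query, then end the input with ^D.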

Wednesday Feb 06, 2008

replacing ant with rdf

Tim Boudreau just recently asked "What if we built Java code with...Java?". Why not replace Ant or Maven XML build documents with Java (Groovy/JRuby/Jython/...) scripts? It could be a lot easier to program for Java programmers, and much easier for them to understand too. Why go through XML, when things could be done more simply in a universal language like Java? Good question. But I think it depends on what types of problem one wants to solve. Moving to Java makes the procedural aspect of a build easier to program for a certain category of people. But is that a big enough advantage to warrant a change? Probably not. If we are looking for an improvement, why not explore something really new, something that might resolve some as yet completely unresolved problems at a much higher level? Why not explore what a hyperdata build system could bring to us? Let me start to sketch out some ideas here, very quickly, because I am late on a few other projects I am meant to be working on.

The answer to software becoming more complicated has been to create clear interfaces between the various pieces, and have people specialise in building components to the interfaces. It's the "small is beautiful" philosophy of Unix. As a result though, as software complexity builds up, every piece of software requires more and more pieces of other software, leading us from a system of independent software pieces to networked software. Let me be clear. The software industry has been speaking a lot about software containing networked components and being deployed on the network. This is not what I am pointing to here. No, I want to emphasise that the software itself is built of components on the network. That is, we increasingly need a networked build system. This should be a big clue as to why hyperdata can bring something to the table that other systems cannot. Because RDF is a language whose pointer system is built on the Uniform Resource Identifier (URI), it eats networked components for breakfast, lunch and dinner. (See my Jazoon presentation.)

Currently my Subversion repository consists of a lot of lib subdirectories full of jar files taken from other projects. Would it not be better if I referred to these libraries by URL instead? The URL from which they can be fetched with an HTTP GET, of course. Here are a few advantages:

  • it would use up less space in my Subversion repository. A pointer just takes up less space than an executable in most cases.
  • it would use up less space on the hard drive of people downloading my code. Why? Because I am referring to the jar via a universal name, a clever IDE will be able to use the local cached version already downloaded for another tool.
  • it would make setting up IDEs a lot easier. Again, because each component now has a universal name, it will be possible to link up jars to their source code once only.
  • the build process, describing as it does how the code relates to the source, can be used by IDEs to jump to the source (also identified via URLs) when debugging a library on the network. (see some work I started on a bug ontology called Baetle)
  • DOAP files can then be used to tie all these pieces together, allowing people to just drag and drop projects from a web site onto their IDE, as I demonstrated with NetBeans.
  • as IDEs gain knowledge from such DOAP files of which components are successors to which other components, it is easy to imagine them developing RSS-like functionality, where they scan the web for updates to your software components and alert you to those updates, which you can then test out quickly yourself.
  • The system can be completely decentralised, making it a Web 3.0 system rather than a Web 2.0 system. It should be as easy as placing your components and your RDF file on a web server that serves them up with the correct MIME types.
  • It will be easy to link up jars or source code (referred to as usual by URLs) to bugs (described via something like Baetle), making it easy to describe how bugs in one project depend on bugs in other projects.

So here are just a few of the advantages that a hyperdata based build system could bring. They seem important enough in my opinion to justify exploring this in more detail. Ok. Well, let me try something here. When compiling files one needs the following: a classpath and a number of source files.

@prefix java: <http://rdf.sun.com/java/> .

_:cp a java:ClassPath;
       java:contains ( <http://apache.multidist.com/cocoon/2.1.11> <http://openrdf.org/sesame/2.0/> ) .

_:outputJar a java:Jar;
       java:buildFrom <src>;
       java:classpath _:cp .

_:outputJar 
        :pathtemplate "dist/${date}/myprog.jar";
        :fullList <outputjars.rdf> .
If the publication mechanism is done correctly the relative URLs should work on the file system just as well as they do on the http view of the repository. Making a jar would then be a matter of some program following the URLs to download all the pieces (if needed), put them in place and use that to build the code. Clearly this is just a sketch. Perhaps someone else has already had thoughts on this?
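To show how little such a program might need, here is a hypothetical Java sketch of two of its pieces: filling in the ${date} variable of the :pathtemplate, and turning the ordered java:ClassPath list into a classpath string. The localCopy function is an assumption; it stands for whatever downloads a URL or finds it in the local cache, and HyperdataBuild is an invented name.

```java
import java.io.File;
import java.nio.file.Path;
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch of a tool reading the build description above: expand
// the :pathtemplate and turn the java:ClassPath list of URLs into a classpath.
class HyperdataBuild {
    private static final Pattern VAR = Pattern.compile("\\$\\{(\\w+)\\}");

    /** Replaces each ${name} in the template with its value; unbound names are an error. */
    static String expand(String template, Map<String, String> vars) {
        Matcher m = VAR.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = vars.get(m.group(1));
            if (value == null) throw new IllegalArgumentException("unbound ${" + m.group(1) + "}");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Classpath in the order of the RDF list; localCopy fetches or finds each URL locally. */
    static String classpath(List<String> urls, Function<String, Path> localCopy) {
        return urls.stream()
                .map(localCopy)
                .map(Path::toString)
                .collect(Collectors.joining(File.pathSeparator));
    }

    public static void main(String[] args) {
        System.out.println(expand("dist/${date}/myprog.jar", Map.of("date", LocalDate.now().toString())));
    }
}
```

Because the description only contains URLs and a template, the same RDF file works for everyone; only the localCopy function differs from machine to machine.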

Friday Feb 01, 2008

3 semantic web talks for JavaOne 2008

At least 3 semantic web talks were accepted for JavaOne 2008, which takes place on May 6-9 in San Francisco. There may be more, but I am sure of the following:

  • A talk by Dean Allemang on practical ontology writing, based on his soon-to-be-published book "The Working Ontologist". I am really looking forward to it coming out, as it should dramatically cut down the learning curve.
  • Über-programmer Tim Boudreau and I will be presenting "Beatnik: Building an Open Social Network Browser" at a Birds of a Feather session. We will look at both the client- and server-side components, and at how the theory described by Dean can be turned into a practical product that solves a real problem: the data silo effect of current social networking sites.
  • Finally, some key players will be joining the "Developing Semantic Web Applications on the Java™ Platform" panel, where we will hopefully start a discussion and get feedback on what can be done to bring many, many more of the 5 million Java developers on board the semantic web. This panel discussion (the list of panelists is not complete yet) will be hosted by Rob Frost of BEA and me.

Hopefully this will give the 20,000 or so attendees joining us at JavaOne a good overview of the practical developments in this area. And if they like what they see, the Semantic Technology Conference in San Jose takes place a week later, from the 18th to the 22nd of May, where they will be able to meet many of the leading companies and researchers in this area.

For detailed session information see my later post.

Thursday Jan 03, 2008

Scoble gets thrown off Facebook

picture of current version of Beatnik

Scoble, who became famous for getting blogging started at Microsoft, got ejected from Facebook for crawling his network of friends. This is the problem with closed social networks, and with data silos in general. He seems to think the solution is data portability. It is more than that: the solution is Open Social Networks. You should be able to use a simple web server and just link to your friends' friend of a friend (foaf) files, whichever service they are using (their own machine in their basement, a service provider, a government-owned machine, ...), just as I can link from this blog to any other blog. This would allow people to own their piece of the network, just as they can own their blogs.

Beatnik, a friend of a friend browser which I described in this email to the social network portability group, will make this easy for anyone to do.

Everyone is welcome to help with this open source project: artists, documenters, Swing experts, testers, RESTafarians, ...

Wednesday Dec 19, 2007

Hyperdata in Sao Paulo

In the past week I gave a couple of presentations on hyperdata, illustrating the concept with demos of the Tabulator and of Beatnik, the Hyper Address Book I am currently working on.

I gave the first talk at the University of Sao Paulo; it was arranged at the last minute by Professor Imre Simon, who had led the Yochai Benkler talk the week before. There was a nice turnout of over 20 people. I spoke at a more theoretical level about the semantic web and how it relates to Metcalfe's law, as explained in more detail in a recently published paper by Prof. James Hendler, and about how an application like Beatnik could give a powerful social meaning to all of this. I also looked at some of the interesting problems of trust and belief revision that come up even in a simple application like Beatnik, which struck a chord with Renata Wassermann, who has written extensively on that subject.
Many thanks to Prof. Simon for giving me the opportunity to speak. For a view from the audience see Rafael Ferreira's blog (in English) and Professor Ewout's blog (in Portuguese).

Yesterday I gave a more Java-oriented technical talk at GlobalCode, an evening learning center in Sao Paulo that hosts a J2EE project on dev.java.net. I touched on how one might use OpenID and foaf to build a secure yet open social network.
About 25 people attended, which is a really good turnout so close to Christmas, when everyone is looking forward to the surfboard from Santa Claus, getting into their swimming trunks and paddling off to catch the next big wave. Well, the really big wave that everyone in the know will be preparing for is the hyperdata wave, and to catch it one needs to practice one's skills. A good way to do this is to help out with a simple application like Beatnik.
Thanks to Vinicius and Yara Senger for organising this.

Update

The talk I gave is now available online with audio as "Building Secure, Open and Distributed Social Network Applications".

About

bblfish
