Sunday Aug 10, 2008

A Simple Sommer Project

Are you dreaming of the coasts of Java? Need some real sunshine? Have you had enough of hibernation? Here is a simple project to help you wake up and open up some new horizons :-)

A couple of weeks ago, Stephanie Stroka asked me how to make use of the so(m)mer library. So(m)mer is designed to play a similar role for the semantic web to the one Hibernate plays for relational databases. So when relational databases feel too constraining, and you need an open database that spans the world, the semantic web is for you. And if you are a Java programmer, then so(m)mer is a good open source project you can participate in.
Anyway, I had not had time to write out a HOWTO, so I directed Stephanie over instant messenger on how to get going. It turns out this was a little more complicated than I had imagined, for someone completely new to the project. So Stephanie wrote out a really simple, indeed the simplest possible project, documented it, and added it to the so(m)mer repository.

As a result there is now a really simple example to follow to get out of hibernation mode, and into the blooming summer. Just follow the instructions.

Tuesday Sep 18, 2007

M2N: building Swing apps with N3

At the Triple-I conference in Graz, I came across one very interesting demo by the M2N Intelligence Management company, where they showed a development environment powered by N3, the powerful, easy-to-read notation for the Semantic Web. Using a visual editor that mapped UML diagrams to N3, instead of the usual limited and difficult-to-understand OMG MOF family of standards, we could see how one could build a complete user interface application, including its logic, in a visual way. The same could be done manually in vi by editing the N3 directly, for those proficient enough. I think they describe parts of this very generally on their solutions page.

It is a pity that M2N does not open source this library, as that would allow one to get a better idea as to the advantages of doing things this way. Sebastian Schaffert - who works at a research company in Salzburg, and was looking at their demo with me - was quite enthusiastic about the idea. There was a lot one could do with such a tool, he thought, such as being able to SPARQL query one's user interface, test it for constraints, etc...

It would be nice to have some feedback from people who had used this on the pros and the cons of their implementation, or of the general idea.

Microsoft Media Manager

David Seth reports today on Microsoft's Interactive Media Manager, based on RDF and OWL, two key semantic web technologies.

This RDF model allows companies to add nuance and intelligence to media management beyond what is possible with traditional metadata.

While I am on media, I might as well mention, for those who don't already know, that Joost, which I believe is somehow related to Skype and is working on peer-to-peer video, is also using RDF. I am not sure how, but since Dan Brickley - one of the brains behind foaf - is working there, it is quite likely going to be very interesting.

Friday Jul 13, 2007

The limitations of JSON

A thread on REST-discuss recently turned into a JSON vs XML fight. I had not thought too deeply about JSON before this, but now that I have, I thought I should summarize what I have learnt.

JSON has a clear and simple syntax, described on json.org. As far as I could see there is no semantics associated with it directly, just as with XML. The syntax does make space for special tokens such as numbers, booleans, etc., which of course one automatically presumes will be mapped to the equivalent types: i.e. things that one can add or compare with boolean operators. Behind the scenes of course a semantics is clearly defined by the fact that it is meant to be used by JavaScript for evaluation purposes. In this it differs from XML, which only assumes it will be parsed by XML-aware tools.

On the list there was quite a lot of confusion about syntax and semantics. The picture accompanying this post shows how logicians understand the distinction. Syntax starts by defining tokens and how they can be combined into well formed structures. Semantics defines how these tokens relate to things in the world, and so how one can evaluate, among other things, the truth of the well formed syntactic structure. In the picture we are using the NTriples syntax, which is very simply defined: a statement is three URIs, or two URIs and a string, followed by a full stop. URIs are universal names, so their role is to refer to things. In the case of the formula

<http://richard.cyganiak.de/foaf.rdf#cygri> <http://xmlns.com/foaf/0.1/knows> <http://www.anjeve.de/foaf.rdf#AnjaJentzsch> .
the first URI refers to Richard Cyganiak on the left in the picture, the second URI refers to a special knows relation defined at http://xmlns.com/foaf/0.1/, and depicted by the red arrow in the center of the picture, and the third URI refers to Anja Jentzsch, who is sitting on the right of the picture. You have to imagine the red arrow as being real - that makes things much easier to understand. So the sentence above is saying that the relation depicted is real. And it is: I took the photo this February in Berlin, during the Semantic Desktop workshop.

I also noticed some confusion as to the semantics of XML. It seems that many people believe it is the same as the DOM or the Infoset. Those are in fact just objectivisations of the syntax. It would be like saying that the example above just consisted of three URIs followed by a dot. One could speak of which URI followed which one, which one was before the dot. And that would be it. One may even speak about the number of letters that appear in a URI. But that is very different from what that sentence is saying about the world, which is what really interests us in day to day life. I care that Richard knows Anja, not how many vowels appear in Richard's name.

At one point the debate between XML and JSON focused on which had the simplest syntax. I suppose XML, with its entity encoding and DTD definitions, is more complicated, but that is not really a clinching point. Because if syntactic simplicity were an overarching value, then NTriples and Lisp would have to be declared winners. NTriples is so simple I think one could use the well-known, very lightweight grep command-line tool to parse it. Try that with JSON! But that is of course not what is attractive about JSON to the people that use it, namely usually JavaScript developers. What is nice for them is that they can immediately turn the document into a JavaScript structure. They can do that because they assume the JSON document has the JavaScript semantics. [1]
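
As a sketch of that claim about NTriples, a single (deliberately naive) regular expression can pull a triple apart. A real parser would also have to handle escapes, datatypes and language tags, but the point stands:

```python
import re

# Naive NTriples pattern: subject and predicate are URIs in angle
# brackets; the object is either a URI or a plain double-quoted literal.
TRIPLE = re.compile(r'<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.')

line = ('<http://richard.cyganiak.de/foaf.rdf#cygri> '
        '<http://xmlns.com/foaf/0.1/knows> '
        '<http://www.anjeve.de/foaf.rdf#AnjaJentzsch> .')

m = TRIPLE.match(line)
subject, predicate = m.group(1), m.group(2)
obj = m.group(3) or m.group(4)  # URI or literal, whichever matched
```
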

But this is where JSON shows its greatest weakness. Yes, the little semantics JSON data structures have makes them easy to work with. One knows how to interpret an array, how to interpret a number and how to interpret a boolean. But this is very minimal semantics. It is very much pre-web semantics. It works as long as the client and the server, the publisher of the data and the consumer of the data, are closely tied together. Why so? Because there is no use of URIs, Universal Names, in JSON. JSON has a provincial semantics. Compare that to XML, which gives a place to the concept of a namespace, specified in terms of a URI. To make this clearer let me look at the JSON example from the wikipedia page (as I found it today):

{
    "firstName": "John",
    "lastName": "Smith",
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": 10021
    },
    "phoneNumbers": [
        "212 732-1234",
        "646 123-4567"
    ]
}

We know there is a map between something related to the string "firstName" and something related to the string "John". [2] But what exactly is this saying? That there is a mapping from the string firstName to the string John? And what is that to tell us? What if I find somewhere on the web another string "prenom" written by a French person. How could I say that the "firstName" string refers to the same thing the "prenom" name refers to? This does not fall out nicely.
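
The point is easy to see with Python's standard json module (the two documents below are made up for illustration): after parsing, the keys are nothing but local strings, and JSON itself offers no way to state that two of them name the same relation.

```python
import json

english = json.loads('{"firstName": "John"}')
french = json.loads('{"prenom": "John"}')

# Both documents parse fine, and plausibly describe the same person,
# but the keys are opaque local strings: nothing within JSON lets us
# assert that "firstName" and "prenom" refer to the same relation.
shared_keys = english.keys() & french.keys()  # empty: no common vocabulary
```
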

The provincialism is similar to that which led the xmlrpc specification to forget to put time stamps on their dates, among other things, as I pointed out in "The Limitations of the MetaWeblog API". To assume that sending dates around on the internet without specifying a time zone makes sense, is to assume that everyone in the world lives in the same time zone as you.
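
Python's standard datetime module makes the date problem concrete: a timestamp written without a zone parses as a "naive" value, and which instant it names depends on who reads it.

```python
from datetime import datetime

# The same clock time, written without and with a time zone offset.
naive = datetime.fromisoformat("2007-07-13T12:00:00")        # zone unspecified
aware = datetime.fromisoformat("2007-07-13T12:00:00+00:00")  # zone explicit

# The naive value carries no offset at all: a reader in New York and a
# reader in Tokyo will map it to different absolute instants.
```
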
The web allows us to connect things just by creating hyperlinks. So to tie the meaning of data to a particular script in a particular page is not to take on the full thrust of the web. It is a bit like the example above, which writes out phone numbers but forgets the country prefix. Is this data only going to get used by people in the US? And what about the provincialism of using a number to represent a postal code? In the UK postal codes are written out mostly with letters. Now those two elements are just modelling mistakes. But if one is going to be serious about creating a data modelling language, then one should avoid making mistakes that are attributable to the idea that strings have universal meaning, as if the whole world spoke English, and as if English were not ambiguous. Yes, natural language can be disambiguated when one is aware of the exact location, time and context of the speaker. But on a web where everything should link up to everything else, that is not and cannot be the case.
That JSON is so much tied to a web page should not come as a surprise if one looks at its origin, as a serialisation of JavaScript objects. JavaScript is a scripting language designed to live inside a web page, with a few hooks to go outwards. It was certainly not designed as a universal data format.

Compare the above with the following Turtle subset of N3 which presumably expresses the same thing:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix : <http://www.w3.org/2000/10/swap/pim/contact#> .

<http://eg.com/joe#p>
   a foaf:Person;
   foaf:firstName "John";
   foaf:family_name "Smith";
   :home [
         :address [
              :city "New York";
              :stateOrProvince "NY";
              :postalCode "10021";
              :street "21 2nd Street";
         ]
   ];
   foaf:phone <tel:+1-212-732-1234>, <tel:+1-646-123-4567>;
.

Now this may require a little learning curve - but frankly not that much - to understand. In fact to make it even simpler I have drawn out the relations specified above in the following graph:

(I have added some of the inferred types)

The RDF version has the following advantages:

  • you can know what any of the terms mean by clicking on them (append the name to the prefix) and doing an HTTP GET
  • you can make statements of equality between relations and things, such as
    foaf:firstName = frenchfoaf:prenom .
  • you can infer things from the above, such as that
    <http://eg.com/joe#p> a foaf:Agent .
  • you can mix vocabularies from different namespaces as above, just as in Java you can mix classes developed by different organisations. There does not even seem to be the notion of a namespace in JSON, so how would you reuse the work of others?
  • you can split the data about something in pieces. So you can put your information about <http://eg.com/joe#p> at the "http://eg.com/joe" URL, in a RESTful way, and other people can talk about him by using that URL. I could for example add the following to my foaf file:
    <http://bblfish.net/people/henry/card#me> foaf:knows <http://eg.com/joe#p> .
    You can't do that in a standard way in JSON because it does not have a URI as a base type (weird for a language that wants to be a web language, to miss the core element of the web, and yet put so much energy into all these other features such as booleans and numbers!)

Now that does not mean JSON can't be made to work this way, as the SPARQL JSON result set serialisation does. But it does not do the right thing by default. A bit like languages before Java that did not have Unicode support by default. The few who were aware of the problems would do the right things; all the rest would just discover the reality of their mistakes by painful experience.

This does not take away from the major advantage that JSON has of being much easier to integrate with JavaScript, which is a real benefit to web developers. It should be possible to get the same effect with a few good libraries. The Tabulator project provides a JavaScript library to parse RDF, but it would probably require something like a so(m)mer mapping from relations to JavaScript objects for it to be as transparent to those developers as JSON is.

Notes

[1]
Now procedural languages such as JavaScript don't have the same notion of semantics as the one I spoke of previously. The notion of semantics defined there is a procedural one: namely two documents can be said to have the same semantics if they behave the same way.
[2]
The spec says that an "object is an unordered set of name-value pairs", which would mean that person could have another "firstName" I presume. But I also heard other people speak about those being hash maps, which only allow unique keys. Not sure which is the correct interpretation...
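
For what it is worth, Python's standard json module takes the hash-map reading: parsing a document with a repeated name raises no error, and the later binding silently wins (the duplicate-key document here is contrived):

```python
import json

doc = '{"firstName": "John", "firstName": "Jane"}'
person = json.loads(doc)

# No complaint from the parser: the second "firstName" overwrote the first.
```
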


Tuesday Jul 03, 2007

Restful semantic web services

Here is my first stab at an outline of what a restful semantic web service would look like.

Let me start with the obvious. Imagine we have an example shopping service, at http://shop.eg/, which sells books. Clearly we would want URLs for every book that we wish to buy, with RDF representations at the given URL. As I find RDF/XML hard to read and write, I'll show the N3 representations. So to take a concrete example, let us imagine our example shopping service selling the book "RESTful Web Services" at the URL http://shop.eg/books/isbn/0596529260 . If we do an HTTP GET on that URL we could receive the following representation:


@prefix : <http://books.eg/ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix shop: <http://shopping.eg/ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix currency: <http://bank.eg/currencies#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .


<#theBook> a shop:Book, shop:Product;
   dc:title "Restful Web Services"@en ;
   dc:creator [ a foaf:Person; foaf:name "Sam Ruby"],
             [ a foaf:Person; foaf:name "Leonard Richardson"] ;
   dc:contributor [ a foaf:Person; foaf:name "David Heinemeier Hansson" ];
   dc:publisher <http://www.oreilly.com/>;
   dct:created "2007-07-08T00:00:00"^^xsd:dateTime;
   dc:description """This is the first book that applies the REST design philosophy to real web services. It sets 
 down the best practices you need to  make your design a success, and the techniques you need to turn your 
 design into working code. You can harness the power of the Web for programmable applications: you just 
 have to work with the Web instead of against it. This book shows you how."""@en;
   shop:price "26.39"^^currency:dollars;
   dc:subject </category/computing>, </category/computing/web>, </category/computing/web/rest>, 
                   </category/computing/architecture>,</category/computing/architecture/REST>, </category/computing/howto> .


So we can easily imagine a page like this for every product. These pages can be reached by browsing the category pages, by querying a SPARQL endpoint, or in many other ways. It should be very easy to generate such representations for a web site. All it requires is to build up an ontology of products - which the shop already has available, if only for the purposes of building inventories - and tie these to the database using a tool such as D2RQ, or a combination of JSR311 and @rdf annotations (see so(m)mer).

Now what is missing is a way to let the browser know what it can do with this product. The simplest possible way of doing this would be to create a specialized relation for that web service to POST some information to a Cart resource, describing the item to be added to the cart. Perhaps something like:

<#theBook> shop:addToCart <http://shop.eg/cart/> .

This relation would just mean that one has to POST the url to the cart, to have it added there. The cart itself may then have a shop:buy relation to some resource, which by convention the user agent would need to send a credit card, expiration date, and other information to.
This means that one would have to define a number of RDF relationships for every type of action one could do in a shop (and later on the web), and explain the types of messages to be sent to the endpoint, and what their consequences are. This is simple, but it does not seem very extensible. What if one wants to buy the hard copy version of the book, or 10 copies of it? The hard copy version could of course have its own URL, so it may be as simple as placing the buy relation on that page. But is this going to work with PCs, where one can add and remove a huge number of parts? I remember Steve Jobs being proud of the huge number of different configurations one could buy his desktop systems in: well over a hundred thousand. This could make it quite difficult to navigate a store, if one is not careful.

On the current web this is dealt with by using html forms, which can allow the user to choose between a large number of variables, by selecting check boxes, combo boxes, drop down menus and more, and then POST a representation to a collection, thereby creating a new action, such as adding the product to the cart, or buying it. The person browsing the site knows what the action does, because it is usually written out in a natural language, in a way that makes it quite obvious to a human being. The person then performs the action because he wishes his desires to be fulfilled. Now this may seem very simple, but just consider the innumerable types of actions that we can fulfill using the very simple tool of html forms: we can add things to a virtual cart, buy things, comment on things, search for things, organise a meeting, etc, etc.... So forms can be seen both as shortcuts to navigate to a large number of different resources, and as ways to create new resources (usually best done with POST).
If we want software agents to do such tasks for us, we need something like a machine understandable form, and some way of specifying what effect POSTing the form will have on the world. So we need to find a way to do what the web does in a more clearly specified way, so that even machines, or simple user agents, can understand it. Let's look at each element:

  • Forms are ways of asking the user to bind values to variables
  • the variables can then be used to build something, such as a URL, or a message
  • the form then specifies the type of action to do with the constructed message, such as a GET, POST, PUT, etc...
  • the human readable text explains what the result of the action is, and what the meaning of each of the fields is.

Now what semantic technology binds variables to values? Which one asks questions? SPARQL comes immediately to mind. Seeing this, and remembering a well known motto of sales people, "Satisfy the customer's every desire", a very general but conceptually simple solution to this problem occurred to me. It may seem a little weird at first (and perhaps it will continue to seem weird) but I thought it elegant enough to be used as a starting point. The idea is really simple: the representation returned by the book resource will specify a collection end point to POST RDF to, and it will specify what to POST back by sending a SPARQL query in the representation. It will then be up to the software agent reading the representation to answer the query if he wishes a certain type of action to occur. If he understands the query he will be able to answer; if he does not, there will be no results. He need not do anything with the query at all.

The following is the first thing that occurred to me. The details are less important than the principle of thinking of forms as asking the client a question.

PREFIX shop: <http://shopping.eg/ns#>
PREFIX bdi: <http://intentionality.eg/ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
CONSTRUCT {
     ?mycart a shop:Cart ;
             shop:contains [ a shop:LineItem;
                               shop:SKU <http://shop.eg/books/isbn/0596529260#theBook> ;
                               shop:quantity ?q ;
                             ]  .
}  WHERE {
       ?mycart a shop:Cart ;
               shop:for ?me ;
               shop:ownedBy <http://shop.eg/>.
       GRAPH ?desire {
               ?mycart shop:contains
                            [ a shop:LineItem;
                               shop:SKU <http://shop.eg/books/isbn/0596529260#theBook> ;
                               shop:quantity ?q ;
                            ]  .

       }
       ?desire bdi:of ?me .
       ?desire bdi:fulfillby "2007-07-30T..."^^xsd:dateTime .
}

So this is saying quite simply: find out if you want to have your shopping cart filled up with a number of copies of this book. The user agent (the equivalent of the web browser) asks its data store the given SPARQL query. It asks itself whether it desires to add a number of books to its shopping cart, and whether it wishes that desire to be fulfilled by a certain time. If the agent does not understand the relations in the query, then the CONSTRUCT clause will return an empty graph. If it does understand them, and the query returns a result, then it is because it wished the action to take place. The constructed graph may be something like:

@prefix shop: <http://shopping.eg/ns#> .
 <http://shop.eg/cart/bblfish/> a shop:Cart ;
             shop:contains [ a shop:LineItem;
                               shop:SKU <http://shop.eg/books/isbn/0596529260#theBook> ;
                               shop:quantity 2 ;
                             ]  .

This can then be POSTed to the collection end point http://shop.eg/cart/, with the result of adding two instances of the book to the cart. Presumably the cart would return a graph with the above relations in it plus another SPARQL query explaining how to buy the items in the cart.
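
To make the client's side of this exchange concrete, here is a hypothetical Python helper, not part of any real API, that builds such a graph for a given cart, SKU and quantity (the URLs and shop: terms are the example's own):

```python
def cart_graph(cart_url: str, sku: str, quantity: int) -> str:
    """Build the Turtle graph to POST to the cart collection in order
    to add `quantity` copies of the product identified by `sku`."""
    return (
        "@prefix shop: <http://shopping.eg/ns#> .\n"
        f"<{cart_url}> a shop:Cart ;\n"
        "    shop:contains [ a shop:LineItem ;\n"
        f"        shop:SKU <{sku}> ;\n"
        f"        shop:quantity {quantity} ;\n"
        "    ] .\n"
    )

graph = cart_graph("http://shop.eg/cart/bblfish/",
                   "http://shop.eg/books/isbn/0596529260#theBook", 2)
```

The string built for the values above is equivalent to the graph shown earlier; an agent would then POST it to the cart collection with an RDF media type.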

So the full RDF for the book page would look something like this:

@prefix : <http://books.eg/ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix shop: <http://shopping.eg/ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix currency: <http://bank.eg/currencies#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .


<#theBook> a shop:Book, shop:Product;
   dc:title "Restful Web Services"@en ;
   dc:creator [ a foaf:Person; foaf:name "Sam Ruby"],
             [ a foaf:Person; foaf:name "Leonard Richardson"] ;
   dc:contributor [ a foaf:Person; foaf:name "David Heinemeier Hansson" ];
   dc:publisher <http://www.oreilly.com/>;
   dct:created "2007-07-08T00:00:00"^^xsd:dateTime;
   dc:description """This is the first book that applies the REST design philosophy to real web services. It sets 
 down the best practices you need to make your design a success, and the techniques you need to turn your
  design into working code. You can harness the power of the Web for  programmable applications: you just 
 have to work with the Web instead of against it. This book shows you how."""@en;
   shop:price "26.39"^^currency:dollars;
   dc:subject </category/computing>, </category/computing/web>, </category/computing/web/rest>, 
                      </category/computing/architecture>,</category/computing/architecture/REST>,</category/computing/howto>;
   shop:addToCart [ a shop:Post;
                    shop:collection <http://shop.eg/cart/>;
                    shop:query """
PREFIX shop: <http://shopping.eg/ns#>
PREFIX bdi: <http://intentionality.eg/ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
CONSTRUCT {
     ?mycart a shop:Cart ;
             shop:contains [ a shop:LineItem;
                               shop:SKU <http://shop.eg/books/isbn/0596529260#theBook> ;
                               shop:quantity ?q ;
                             ]  .
}  WHERE {
       ?mycart a shop:Cart ;
               shop:for ?me ;
               shop:ownedBy <http://shop.eg/>.
       GRAPH ?desire {
               ?mycart shop:contains
                            [ a shop:LineItem;
                               shop:SKU <http://shop.eg/books/isbn/0596529260#theBook> ;
                               shop:quantity ?q ;
                            ]  .

       }
       ?desire bdi:of ?me .
       ?desire bdi:fulfillby "2007-07-30T..."^^xsd:dateTime .
}"""
                  ] .

So there are quite a few things to tie up, but it seems we have the key elements:

  • RESTful web services: we use GET and POST the way they are meant to be used,
  • Resource Oriented Architecture: each shoppable item has its resource, that can return a representation
  • the well known motto "hypermedia as the engine of application state": each URL is dereferenceable to further representations. Each page containing a buyable item describes how one can proceed to the next step to buy the product. In this case the SPARQL query returns a graph to be POSTed to a given url.
  • with the clarity of the semantic framework thrown in too. I.e. we can prove certain things about the statements made, which is very helpful in bringing clarity to a vocabulary. Understanding the consequences of what is said is part and parcel of understanding itself.
Have I reached buzzword compliance yet? :-)

Notes

From discussions around the net (on #swig for example) I was made aware of certain problems.

  • SPARQL is a little too powerful, and it may seem to give too much leverage to the service, which could ask all kinds of questions of the user agent, such as the SPARQL equivalent of "What is your bank account number?". Possible answers may be:
    • Of course a user agent that does shopping automatically on the web is going to have to be ready for all kinds of misuse, so whatever is done, this type of problem is going to crop up. Servers also need to protect themselves from probing questions by user agents. So this is something that both sides will need to look at.
    • Forms are pretty powerful too. Are forms really so different from queries? They can ask you for a credit card number, your date of birth, the names of your friends, your sexual inclinations, ... What can web forms not ask you?
  • SPARQL has a couple of things going for it: it has a way of binding variables to a message, and it builds on the solid semantic web structure. But there may be other ways of doing the same thing. OWL-S also uses RDF to describe actions, creates a way to bind answers to messages, and describes the preconditions and postconditions of actions. It even uses the proposed standard Semantic Web Rule Language (SWRL). As there seems to be a strong relation between SPARQL and a rule language (one can think of a SPARQL query as a rule), it may be that part of the interest in this solution is simply the same reason SWRL emerged in OWL-S. OWL-S has a binding to SOAP and none to a RESTful web service. As I have a poor grasp of SOAP I find that difficult to understand. Perhaps a binding to a more restful web service, such as the one proposed here, would make it more amenable to a wider public.

Monday Jul 02, 2007

refactoring xml

Refactoring is defined as "Improving a computer program by reorganising its internal structure without altering its external behaviour". This is incredibly useful in OO programming; it is what has led to the growth of IDEs such as NetBeans, IntelliJ and Eclipse, and is behind very powerful software development movements such as Agile and Extreme Programming. It is what helps every OO programmer get over the insidious writer's block. Don't worry too much about the model or field names now; it will be easy to refactor those later!

If maintaining behavior is what defines refactoring of OO programs - change the code, but maintain the behavior - what would the equivalent be for XML? If XML is considered a syntax for declarative languages, then refactoring XML would be changing the XML whilst maintaining its meaning. So this brings us right to the question of meaning. Meaning in a procedural language is easy to define. It is closely related to behavior, and behavior is what programming languages do their best to specify very precisely. Java pushes that very far, creating very complex and detailed tests for every aspect of the language. Nothing can be called Java if it does not pass the compatibility tests, if it does not act the way specified.
So again, what is the meaning of an XML document? XML does not define behavior. It does not even define an abstract semantics, a way the symbols refer to the world. XML is purely specified at the syntactic level: how one can combine strings to form valid XML documents, or valid subsets of XML documents. If there is no general mapping of XML to one thing, then there is nothing that can be maintained to retain its meaning. There is nothing in general that can be said to be preserved by transforming one XML document into another.
So it is not really possible to define the meaning of an XML document in the abstract. One has to look at subsets of it, such as the Atom syndication format. These subsets are given more or less formal semantics. The Atom syndication format, for example, is given an English-readable one. Other XML formats in the wild may have none at all, other than what an English reader will be able to deduce by looking at them. Now it is not always necessary to formally describe the semantics of a language for it to gain one. Natural languages, for example, do not have formal semantics; they evolved one. The problem with artificial languages that don't have a formal semantics is that in order to reconstruct it one has to look at how they are used, and so one has to make very subtle distinctions between appropriate and inappropriate uses. This inevitably ends up being time consuming and controversial. Nothing that is going to make it easy to build automatic refactoring tools.

This is where frameworks such as RDF come in very handy. The semantics of RDF are very well defined using model theory. This defines clearly what every element of an RDF document means, what it refers to. To refactor RDF is then simply any change that preserves the meaning of the document. If two RDF names refer to the same resource, then one can replace one name with the other and the meaning will remain the same, or at least the facts described by the one will be the same as those described by the other, which may be exactly what the person doing the refactoring wishes to preserve.
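
For example, using made-up URIs: given the equality stated in the first triple below, a refactoring tool could mechanically rewrite the second statement into the third without changing the facts described.

```turtle
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# the two names denote the same person
<http://a.eg/people#joe> owl:sameAs <http://b.eg/staff#jsmith> .

# this statement...
<http://a.eg/people#joe> foaf:name "Joe Smith" .

# ...can be refactored into this one: the meaning is preserved
<http://b.eg/staff#jsmith> foaf:name "Joe Smith" .
```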

In conclusion: to refactor a document is to change it at the syntactic level whilst preserving its meaning. One cannot refactor XML in general, and in particular instances it will be much easier to build refactoring tools for documents with clear semantics. XML documents that have clear RDF interpretations will be very easy to refactor mechanically. So if you are ever asking yourself what XML format you want to use: think how useful it is to be able to refactor your Java programs. And consider that by using a format with clear semantics you will be able to make use of similar tools for your data.

Thursday Jun 28, 2007

Jazoon: Web 3.0

Well over 100 people attended my Web 3.0 talk at the Jazoon conference in Zurich today, which covered the same topics as my JavaOne BOF (slides here). You can count them here in the picture which I took at the end of the talk.

The Jazoon conference is 1/17th the size of JavaOne, so the attendance numbers were tremendous. As a comparison, 250 people attended my JavaOne BOF. Had the same percentage of JavaOne conference attendees come to my talk there, I would have had an audience of 110*17 = 1870!

I had just about time to cover the slides in the 40 minutes allocated to me, so it was really great to have a follow on question and answer session which at least a third of the people remained for. Dean Allemang (blog) shared the space with me on the Q&A, and was able to bring his vast experience to bear. The attendees were thus able to get a quick overview of TopBraid Composer which Dean presented quickly in response to a question on tooling. Questions on security popped up, which allowed me to speak a little more about RDF graphs and quads, essential pieces of the Semantic Web story.

Tuesday Jun 26, 2007

Jazoon

Roy Fielding gave his very well attended keynote presentation today (Tuesday 26) at Jazoon, the new Java developers conference taking place for the first time in Zurich this week. Coming here just to hear Roy talk was worth the whole trip in itself.

This is the first year of Jazoon, and yet the venue was able to attract over 800 developers (I am not sure of the exact number), which bodes well for its future. So to have close to 10% of the attendees (photo) come to Dean Allemang's talk "Semantic Mashups using RDF, RSS and microformats" was a very pleasant surprise. Dean, who works for TopQuadrant, producers of the Eclipse-based TopBraid Composer, is not just a very good presenter, but also a very knowledgeable Semantic Web evangelist. He gave Harold Carr (blog) and others a demo (photo) of TopBraid Composer that started outside the conference room, moved down into the bar at the entrance (photo), and kept being interrupted by great side tracks into Philosophy, Jungian psychology (Jung of course worked in Zurich), Semantic Web company adoption, Literature, Mathematics, Religion, sexual politics, and so much more, so that the demo only came to a tentative conclusion around 1am in a bar in the center of Zurich, discussing the relations between REST and RDF and how these differ from SOAP. (For Dean's impressions of Jazoon, see his "Swiss Java" blog post.)

My talk, "Web 3.0: This is the Semantic Web", will be taking place on Thursday at 11am. I will be going into more technical detail, looking at the foundations of the Semantic Web step by step. As a surprise I may even be able to get a slot for Dean to present his TopBraid Composer, which is not just an ontology editor, but also a complete mashup environment.

Time for me to go to sleep!

Friday Jun 15, 2007

Semantics of invalid passports

For those travelers out there who are into XML and semantics, the question is: how does one specify, semantically, that a valid passport has a passport number in it?

In OWL one can specify that a relation has a cardinality. So for example the mortal:parent relation, with domain and range of mortal:Human, would usually be defined as having cardinality 2. So whenever we have something that is a human, we can deduce that it has two parents, even if we don't know who they are.

Note that this is not expressible in a DTD or even in RelaxNG. The best one could say is that a <Person> element could have 0 to 2 <parent> elements. So one could have something like this:

<Person>
  <name>Henry Story</name>
  <parent><Person>...</Person></parent>
  <parent><Person>...</Person></parent>
</Person>
One cannot say that it must have 2 parent elements without thereby specifying documents that are necessarily infinitely long, since each parent is a Person, and so would itself have to have a parent element, and so on. With XML we are stuck at the level of syntax.
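The OWL deduction described above can be illustrated with a toy rule in Python. This is not a real OWL reasoner, just the cardinality-2 idea applied to an invented ex: vocabulary: anything typed as a Human gets blank-node parents added until it has two, which is how we can conclude "it has two parents, even if we don't know who they are".

```python
import itertools

# A toy illustration of the cardinality-2 deduction (not a real OWL reasoner):
# anything typed as a Human gets blank-node parents added until it has two.
# The ex: vocabulary is invented for this sketch.
EX = "http://example.org/mortal#"
_blank = itertools.count()

def apply_parent_cardinality(graph):
    """For every Human with fewer than two known parents, add blank-node parents."""
    enriched = set(graph)
    humans = {s for (s, p, o) in graph
              if p == "rdf:type" and o == EX + "Human"}
    for h in humans:
        known = [t for t in graph if t[0] == h and t[1] == EX + "parent"]
        for _ in range(2 - len(known)):
            enriched.add((h, EX + "parent", "_:b%d" % next(_blank)))
    return enriched

g = apply_parent_cardinality({(EX + "socrates", "rdf:type", EX + "Human")})
```

The blank nodes play the role of the unknown parents: the conclusion holds without the document ever naming them, which is exactly what the XML schema languages cannot express.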

Working at the level of syntax does have some advantages: it is obvious how to query for the existence of information. One just searches the document using XQuery. So say I have an XML passport, and I want to find out if it contains a number: it would be easy to find out by searching for the passportNumber attribute, for example. The disadvantage is that the query will only succeed on certain types of XML documents, namely those that put the information in that spot. It won't work with information written in other XML passport formats, or real paper passports.
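As a sketch of such a syntactic query, here is the attribute search in Python. The XML passport format and its attribute name are invented for this example; the fragility described above is visible in the code, which only works for documents shaped exactly this way.

```python
import xml.etree.ElementTree as ET

# A purely syntactic query: search the document tree for the attribute directly.
# This passport format and its attribute name are invented for the sketch.
doc = """<passport passportNumber="X1234567">
  <holder>Henry Story</holder>
</passport>"""

root = ET.fromstring(doc)
number = root.get("passportNumber")  # only succeeds for documents shaped this way
```

A passport serialized in any other vocabulary, or with the number in a child element rather than an attribute, would make `number` come back as `None`, even though the information is there.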

Now how does one specify that a passport has a number printed in it? We don't want to say that a doc:Passport has a relation doc:passportNumber with cardinality of 1. Though that seems correct, it would fail to help us find invalid passports that did not have a number printed on them, since

  • an OWL reasoner would add the relation to a blank node anyway, by following the suggested OWL cardinality rule.
  • there could be a statement as to the passport number written down somewhere else completely, which might have been meshed with the information about the passport. A passport with a passport number written on a separate piece of paper won't help you cross the border...
  • The passport might have had a passport number in it until I cut that information out of the passport, or it got erased in some way by a mischievous border guard. The government databases would still attribute a passport number to my passport. So as soon as I asked them what it is, I would end up having correct knowledge of my passport's number, yet my passport would still be invalid.

Here is the solution presented by Tim Berners Lee:

OWL cardinality might say that a person must have at least one passport number, but it can NOT say that a document about a person contains information about at least one passport number.

N3 rules can, with log:semantics (which relates a document to the graph you get from parsing it), log:includes, and nested graphs:

@forAll x, p, g1.
{   x a Person; passport p.
    p log:semantics g1.
    g1 log:notIncludes { x passportNumber [] }
}
    =>
{   p a InvalidPassport }.

On the semantic web, as anyone can in principle say anything about anything, you can never make statements about how many passportNumber statements there are without specifying the document in question.
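The rule can be paraphrased in plain Python, modeling log:semantics as a map from a passport document to its parsed graph, and log:notIncludes as the absence of a matching triple in that graph. All the names here are illustrative, but the structure of the check is the same: validity is decided against the graph of *that document*, not against everything known.

```python
# A plain-Python paraphrase of Tim Berners Lee's rule (names are illustrative):
# log:semantics becomes a map from a document to its parsed graph, and
# log:notIncludes becomes the absence of a matching triple in that graph.

def invalid_passports(holders, doc_graphs):
    """Flag a passport invalid when the graph parsed from that document
    contains no passportNumber statement about its holder."""
    invalid = set()
    for person, passport in holders:
        graph = doc_graphs.get(passport, set())
        has_number = any(s == person and p == "passportNumber"
                         for (s, p, o) in graph)
        if not has_number:
            invalid.add(passport)
    return invalid

holders = [("henry", "passport1"), ("anna", "passport2")]
doc_graphs = {
    "passport1": {("henry", "passportNumber", "X1234567")},
    # anna's number may well be stated elsewhere, but not in her passport document:
    "passport2": {("anna", "name", "Anna")},
}
```

Note that merging anna's number into some global knowledge base would not rescue passport2: the query is deliberately scoped to the document's own graph, which is the whole point of the rule.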

A passport is quite clearly both a document and an object. As an object it can have properties such as being in your pocket. As a document it tells us something about the world, among other things information about the owner of the passport. There are many sources of information in the world. If one wants to find out what possible worlds a particular source of information describes, one has to limit one's query to that source of information.

Note: Semantics of Graphs

Following David Lewis I like to think of a graph as a set of possible worlds which satisfy the patterns of the graph. Tim Berners Lee's formula is saying: find me the set of possible worlds that correctly interpret the passport I have. If this set includes worlds where I don't have a passport number, then my passport is invalid. That is because I can only have such worlds in my interpretation of my passport if I don't have the number on my passport.

This interpretation of graphs may be a little too strong semantically, as it leads to the following problem:

How does one query for documents about mathematical truths? If a document says that "2+2=4", it will be true in all possible worlds, just as a document that says "1+1=2" will be; so a query for the one should, on this interpretation, also match the other. Perhaps here one has to query literally, namely for the string or regex "2+2=4".

Wednesday Jun 06, 2007

RESTful Web Services: the book

RESTful Web Services is a newly published book that should be a great help in giving people an overview of how to build web services that work with the architecture of the Web. The authors of the book are, I believe, serious RESTafarians. They hang out (virtually) on the Yahoo REST-discuss group. So I know ahead of time that they will most likely never fail on the REST side of things. Such a book should therefore be a great help for people wanting to develop web services.

As an aside, I am currently reading it online via Safari Books, which is a really useful service, especially for people like me who are always traveling and don't have space to carry wads of paper around the world. As I have been intimately involved in this area for a while - I read Roy Fielding's thesis in 2004, and it immediately made sense of my intuitions - I am skipping through the book from chapter to chapter as my interests guide me, using the search tool when needed. As this is an important book, I will write up my comments here in a number of posts as I work my way through it.

What of course is missing in Roy's thesis, which is a high level abstract description of an architectural style, are practical examples, which is what this book sets out to provide. The advantage of Roy's level of abstraction is that it permitted him to make some very important points without losing himself in arbitrary implementation debates. Many implementations can fit his architectural style. That is the power of speaking at the right level of abstraction: it permits one to say something well, in such a way that it can withstand the test of time. Developers of course want to see how an abstract theory applies to their everyday work, and so a cookbook such as "RESTful Web Services" is going to appeal to them. The danger is that by stepping closer to implementation details, certain choices are made that turn out to be in fact arbitrary, ill conceived, non-optimal or incomplete. The risk is well worth taking if it can help people find their way around more easily in a sea of standards. This is where the rubber hits the road.

Right from the beginning the authors, Sam Ruby and Leonard Richardson, coin the phrase "Resource-Oriented Architecture".

Why come up with a new term, Resource-Oriented Architecture? Why not just say REST? Well, I do say REST, on the cover of this book, and I hold that everything in the Resource-Oriented Architecture is also RESTful. But REST is not an architecture: it's a set of design criteria. You can say that one architecture meets those criteria better than another, but there is no one "REST architecture."

The emphasis on Resources is, I agree with them, fundamental. Their chapter 4 does a very good job of showing why. URIs name Resources. URLs in particular name Resources that can return representations in well defined ways. REST stands for "Representational State Transfer", and the representations transferred are representations of the resources identified by URLs. The whole thing fits like a glove.

Except that where there is a glove, there are two, one for each hand. And they are missing the other glove, so to speak. And the lack is glaringly obvious. Just as important as Roy Fielding's work, just as abstract, and developed by some of the best minds on the web, even in the world, is RDF, which stands for Resource Description Framework. I emphasize the "Resource" in RDF because, for someone writing a book on Resource-Oriented Architecture, to have only three short mentions of the framework for describing resources standardized by no less than the World Wide Web Consortium is just... flabbergasting. Ignoring this work is like trying to walk around on one leg. It is possible. But it is difficult. And certainly a big waste of energy, time and money. Of course since what they are proposing is so much better than what may have gone on previously, which seems akin to trying to walk around on a gloveless hand, it may not immediately be obvious what is missing. I shall try to make this clear in this series of notes.

Just as REST is very simple, so is RDF. It is easiest to describe something on the web if you have a URL for it. If you want to say something about it, that it relates to something else for example, or that it has a certain property, you need to specify which property it has. Since a property is a thing too, it is easiest to speak about if it has a URL. So once you have identified, in the global namespace, the property you want to use, you need to specify the value of that property, which can be a string or another object. That's RDF for you. It's so simple I am able to explain it to people in bars within a minute. Here is an example, which says that my name is Henry:

<http://bblfish.net/people/henry/card#me> <http://xmlns.com/foaf/0.1/name> "Henry Story" .

Click on the URLs and you will GET their meaning. Since resources can return any number of representations, different user agents can get the representation they prefer. For the name relation you will get an HTML representation back if you are requesting it from a browser. With this system you can describe the world. We know this since it is simply a generalization of the system found in relational databases, where instead of identifying things with table-dependent primary keys, we identify them with URIs.
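The statement above can be held in the simplest of data structures, a set of (subject, predicate, object) tuples, and queried by pattern matching. This is a sketch, not any particular RDF library's API; only the triple itself comes from the text.

```python
# The one statement from the text, as a (subject, predicate, object) tuple;
# querying is just pattern matching, with None acting as a wildcard.

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
ME = "http://bblfish.net/people/henry/card#me"

graph = {(ME, FOAF_NAME, "Henry Story")}

def query(graph, s=None, p=None, o=None):
    """Return the triples matching the given pattern."""
    return [(ts, tp, to) for (ts, tp, to) in graph
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

names = [o for (_, _, o) in query(graph, s=ME, p=FOAF_NAME)]
```

Because subject and predicate are global names rather than table-local keys, the same query works no matter which document the triple was merged in from.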

So RDF, just like REST, is at its base very easy to understand, and furthermore the two are complementary. Even though REST is simple, it nevertheless needs a book such as "RESTful Web Services" to help make it practical. There are many dispersed standards out there which this book helps bring together. It would have been a great book if it had not missed out the other half of the equation. Luckily this should be easy to fix. And I will do so in the following notes, showing how RDF can help you become even more efficient in establishing your web services. Can it really be even easier? Yes. And furthermore without contradicting what this book says.

Friday Jun 01, 2007

Semantic Wonderland

Among the most impressive demos at JavaOne was the open sourced Project Wonderland[1] which James Gosling presented during his Toy show. It is a virtual world that grew out of project Looking Glass, the 2.5D Java Desktop that was unveiled a couple of years ago. The desktop has now been integrated into a full 3D world (or should it be 4D? space+time) where one can move around, meet people, work together on projects, etc...

It was not too difficult to get it to work on OSX (even though Apple is lagging with an 8-month-old beta release of Java 6, grrrr!), by following the instructions on the main page, and reading the Java.net thread "Building Wonderland on MacOS" [2].

Once I got it started I noticed this billboard entitled "Knowledge Driven Hyperlinks: A Semantic Web Application". Really intriguing!

Apparently one gets the best out of Wonderland by running it on Linux, as one can then interact with real X applications. The best-tested option for OSX users is Ubuntu Edgy, running under Boot Camp. A new version of Parallels has just come out though, with OpenGL and DirectX graphics acceleration for Windows, so it may soon be possible to run Wonderland in Parallels using Ubuntu, and so get all the features, before making the leap to a full Linux OS again.

I am going to try one of these options out. This is going to be real fun! :-)

Notes

Saturday May 26, 2007

Answers to "Duck Typing Done Right"

I woke up this morning to a large number of comments on my previous post "Duck Typing Done Right". It would be confusing to answer them all together in the comments section there, so I have aggregated my responses here.

I realize the material covered here is very new to many people. Luckily it is very easy to understand. For a quick introduction see my short Video introduction to the Semantic Web.

Also I should mention that RDF is a declarative framework, so its relationship to method-based duck typing is not a direct one. But nevertheless there is a lot to learn from understanding the simplicity of the RDF framework.

On the reference of Strings

Kevin asks why the URI "http://a.com/Duck" is less ambiguous than a string "Duck". In one respect Kevin is completely correct. In RDF they are both equally precise. But what they refer to is quite different from what one might expect. The string "Duck" refers to the string "Duck". A URI on the other hand refers to the resource identified by it; URIs are Uniform Resource Identifiers after all. The URI "http://a.com/Duck", as defined above, refers to the set of ducks. How do you know? Well, you should be able to GET <http://a.com/Duck> and receive a human or machine representation for it, selectable via content negotiation. This won't work in the simple examples I gave in my previous post, as they were just quick examples I hacked together by way of illustration. But try GETing <http://xmlns.com/foaf/0.1/knows> for a real working example. See my longer post on this issue: GET my meaning?
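The content negotiation mentioned here can be sketched with the standard library's urllib: the two requests below name the same resource and differ only in their Accept header, which is how a browser gets HTML back while a machine agent gets RDF. No network call is made in this sketch; we only build the requests.

```python
import urllib.request

# Two requests for the same resource, differing only in the Accept header.
# A server doing content negotiation would return a different representation
# for each. Nothing is sent over the network here; we just build the requests.
uri = "http://xmlns.com/foaf/0.1/knows"
as_human = urllib.request.Request(uri, headers={"Accept": "text/html"})
as_machine = urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})
```

The URI, and hence the resource referred to, is identical in both cases; only the representation requested changes.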

Think about the web. Every day you type URLs into a web browser and you get the page you want. When you type "http://google.com/" you don't sometimes get <http://altavista.com>. The web works as well as it does because URLs identify things uniquely. Everyone can mint their own if they own some section of the namespace, and PUT the meaning for that resource at that resource's location.

On Ambiguity and Vagueness

Phil Daws is correct to point out that URIs don't remove all fuzziness or vagueness. We can have fuzzy or vague concepts, and that is a good thing. foaf:knows (<http://xmlns.com/foaf/0.1/knows>), whilst unambiguous, is quite a fuzzily defined relation. If you click on its URL this is what you will get:

We take a broad view of 'knows', but do require some form of reciprocated interaction (ie. stalkers need not apply). Since social attitudes and conventions on this topic vary greatly between communities, counties and cultures, it is not appropriate for FOAF to be overly-specific here.

If someone foaf:knows a person, it would be usual for the relation to be reciprocated. However this doesn't mean that there is any obligation for either party to publish FOAF describing this relationship. A foaf:knows relationship does not imply friendship, endorsement, or that a face-to-face meeting has taken place: phone, fax, email, and smoke signals are all perfectly acceptable ways of communicating with people you know.

You probably know hundreds of people, yet might only list a few in your public FOAF file. That's OK. Or you might list them all. It is perfectly fine to have a FOAF file and not list anyone else in it at all. This illustrates the Semantic Web principle of partial description: RDF documents rarely describe the entire picture. There is always more to be said, more information living elsewhere in the Web (or in our heads...).

Since foaf:knows is vague by design, it may be surprising that it has uses. Typically these involve combining other RDF properties. For example, an application might look at properties of each foaf:weblog that was foaf:made by someone you "foaf:knows". Or check the newsfeed of the online photo archive for each of these people, to show you recent photos taken by people you know.

For more information on this see my post "Fuzzy thinking in Berkeley".

On UFOs

Paddy worries that this requires a Universal Class Hierarchy. No worries there. The Semantic Web is designed to work in a distributed way. People can grow their vocabularies, just like we all have grown the web by each publishing our own files on it. The Semantic Web is about linked data. The semantic web does not require UFOs (Unified Foundational Ontologies) to get going, and it may never need them at all, though I suspect that having one could be very helpful. See my longer post UFO's seen growing on the Web.

Relations are first class objects

Paddy and Jon Olson were misled by my use of classes into thinking that RDF ties relations/properties to classes. It does not. Relations in RDF are first class citizens, as you may see in the Dublin Core metadata initiative, which defines a set of very simple and very general relations to describe resources on the web, such as dc:creator, dc:date, etc... I think we need a :sparql relation that would relate anything to an authoritative SPARQL endpoint, for example. There clearly is no need to constrain the domain of such a relation in any way.

Scalability and efficiency

Jon Olson agrees with me that duck typing is good enough for some very large and good software projects. One of my favorite semantic web tools, for example, is cwm, which is written in Python. When I say duck typing as implemented in those languages does not scale, I mean really big scale: like, you know, the web. URIs are what has allowed the web to scale to the whole planet, and what will allow it to scale into structured data way beyond what we may even be comfortable imagining right now. This is not over-engineered at all, as Eric Biesterfeld fears. In fact it works because it gets the key elements right. And they are very simple, as I demonstrated in my recent JavaOne BOF. The key concepts are:
  • URIs refer to resources,
  • resources return representations,
  • to describe something on the web one needs to
    • first refer to the thing one wishes to describe (and that requires a URI),
    • then specify the property one wishes to attribute to it (and that also requires a URI),
    • and finally specify the value of that property.
That's it.

Semantics

An anonymous writer mentions the "ugliness" of the syntax. This is not a problem. The semantic web is about semantics (see the illustration in this post). It defines the relationship of a string to what it names. It does not require a specific syntax. If you don't like the RDF/XML syntax, which most people think is overly complicated, then use the N3 syntax, or come up with something better.

On Other Languages

As mentioned above, there need not be one syntax for RDF. Of course it helps in communication if we agree on something, and currently, for better or for worse, that is RDF/XML.

But that does not mean that other procedural languages cannot play well with it. They can since the syntax is not what is important, but the semantics, and those are very well defined.

There are a number of very useful bindings in pretty much every language: from Franz Lisp to the Redland library for C, Python, Perl and Ruby, to Prolog bindings, and many Java bindings such as Sesame and Jena. Way too many to list here. For a very comprehensive overview see Mike Bergman's full survey of Semantic Web tools.

Note

I have received a huge amount of hits from reddit. Way over 500. If it is still on the top page when you read this, take the time to vote for it :-)

Friday May 25, 2007

Duck Typing Done Right

Dynamic languages such as Python, Ruby and Groovy make a big deal of their flexibility. You can add new methods to classes, extend them, etc... at run time, and do all kinds of funky stuff. You can even treat an object as being of a certain type by looking at its methods. This is called Duck Typing: "If it quacks like a duck and swims like a duck, then it's a duck", goes the well-known saying. The main criticism of Duck Typing has been that what is gained in flexibility is lost in precision: it may be good for small projects, but it does not scale. I want to show here both that the criticism is correct, and how to overcome it.

Let us look at Duck Typing a little more closely. If something is a bird that quacks like a duck and swims like a duck, then why not indeed treat it like a duck? Well, one reason that occurs immediately is that in nature there are always weird exceptions. It may be difficult to see the survival advantage of looking like a duck, as opposed to, say, looking like a lion, but one should never be surprised at the surprising nature of nature.
Anyway, that's not the type of problem people working with duck typing ever have. How come? Well it's simple: they usually limit the interactions of their objects to a certain context, where the objects being dealt with are such that if any one of them quacks like a duck, then it is a duck. And so here we in essence have the reason for the criticism: In order for duck typing to work, one has to limit the context, one has to limit the objects manipulated by the program, in such a way that the duck typing falls out right. Enlarge the context, and at some point you will find objects that don't fit the presuppositions of your code. So: for simple semantic reasons, those programs won't scale. The more the code is mixed and meshed with other code, the more likely it is that an exception will turn up. The context in which the duck typing works is a hidden assumption, usually held in the head of the small group of developers working on the code.

A slightly different way of coming to the same conclusion is to realize that these programming languages don't really do an analysis of the sound of quacking ducks. Nor do they look at objects and try to classify the way these are swimming. What they do is look at the names of the methods attached to an object, and then do a simple string comparison. If an object has the swim method, they will assume that swim stands for the same type of thing that ducks do. Now of course it is well established that natural language is ambiguous and hence very context dependent. The method names gain their meaning from their association with English words, which are ambiguous. There may for example be a method named swim, where those letters stand for the acronym "See What I Mean". That method may return a link to some page on the web that describes the subject of the method in more detail, and have no relation to water activities. Calling that method in expectation of a sound will lead to some unexpected results.
But once more, this is not a problem duck typing programs usually have. Programmers developing in those languages will be careful to limit the execution of the program to only deal with objects where swim stands for the things ducks do. But it does not take much for that presupposition to fail. Extend the context somewhat by loading some foreign code, and at some point these presuppositions will break down and nasty, difficult-to-locate bugs will surface. Once again, the criticism of duck typing not being scalable is perfectly valid.

So what is the solution? Well, it requires one very simple step: one has to use identifiers that are context free. If you can use identifiers for swimming that are universal, then they will always mean the same thing, and so the problem of ambiguity will never surface. Universal identifiers? Oh yes, we have those: they are called URIs.
Here is an example. Let us

  • name the class of ducks
    <http://a.com/Duck> a owl:Class;
             rdfs:subClassOf <http://a.com/Bird>;
             rdfs:comment "The class of ducks, those living things that waddle around in ponds" .
    
  • name the relation <http://a.com/swimming> which relates a thing to the time it is swimming
     <http://a.com/swimming> a owl:DatatypeProperty;
                             rdfs:domain <http://a.com/Animal> ;
                             rdfs:range xsd:dateTime .
     
  • name the relation <http://a.com/quacking> which relates a thing to the time it is quacking (like a duck)
     <http://a.com/quacking> a owl:DatatypeProperty;
                             rdfs:domain <http://a.com/Duck> ;
                             rdfs:range xsd:dateTime .
    
  • state that a duck is an animal
     <http://a.com/Duck> rdfs:subClassOf <http://a.com/Animal> .
    
Now if you ever see the relation
:d1  <http://a.com/quacking> "2007-05-25T16:43:02"^^xsd:dateTime .

then you know that :d1 is a duck (or that the relation is false, but that is another matter), and this will be true whatever context you find the relation in. You know this because the URL http://a.com/quacking always refers to the same relation, and that relation was defined as linking ducks to times.
Furthermore, notice how you may conclude many more things from the above statement. Perhaps you have an ontology of animals written in OWL, which states that ducks are among those animals that always have two parents. Given that, you would be able to conclude that :d1 has two parents, even if you don't know which they are. Animals, you may discover by clicking on the http://a.com/Animal URL, are physical beings, and in particular among those physical things that always have a location. It would therefore be quite correct to query for the location of :d1...
You can get to know a lot of things from just one simple statement. In fact with the semantic web, what a single statement tells you gets richer the more you know. The wider the context of your knowledge, the more you learn when someone tells you something, since you can use inferencing to deduce all the things you have not been told. The more things you know, the easier it is to make inferences (see Metcalfe's law).
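The "you know that :d1 is a duck" step above is just the rdfs:domain rule at work, and can be sketched in a few lines of Python. This is a toy, not a real RDFS reasoner, and reuses the invented http://a.com/ vocabulary from the example:

```python
# A toy version of the rdfs:domain inference (not a real reasoner):
# from (p rdfs:domain C) and (s p o), conclude (s rdf:type C).

EX = "http://a.com/"
schema = {(EX + "quacking", "rdfs:domain", EX + "Duck")}
data = {("_:d1", EX + "quacking", "2007-05-25T16:43:02")}

def infer_types(schema, data):
    """Apply the domain rule to every statement in the data graph."""
    domains = {s: o for (s, p, o) in schema if p == "rdfs:domain"}
    inferred = set(data)
    for (s, p, o) in data:
        if p in domains:
            inferred.add((s, "rdf:type", domains[p]))
    return inferred

conclusions = infer_types(schema, data)
```

One quacking statement in, one new type fact out; add more schema (subclass axioms, cardinalities) and the same statement yields ever more conclusions, which is the Metcalfe-style effect the paragraph describes.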

In conclusion, duck typing is done right on the semantic web. You don't have to know everything about something to work with what you have, and the more you know the more you can do with the information given to you. You can have duck typing and scale.

Thursday May 17, 2007

Webcards: a Mozilla microformats plugin

Last week Jiri Kopsa pointed me to Webcards, a very useful Mozilla extension for microformats. Install it, restart, and Mozilla will then pop up a discreet green bar on web pages that follow the microformats guidelines. So for example on Sean Bechhofer's page 1 vCard is detected. Clicking on that information brings up a slick panel with more links to other sources on the web, such as delicious if tags are detected, or in this case LinkedIn, and a stylish button that will add the vCard to one's address book [1].

Microformats are a really simple way to add a little structure to a web page. As I understand it from our experimentation a year ago [2] in adding this to BlogEd [3], one reason it was successful even before the existence of such extensions is that it allowed web developers to exchange CSS style sheets more easily, and reduced the need to come up with one's own style sheet vocabulary. People agreeing to name the classes the same way, however far apart they lived, could then build on each other's work. As a result a lot of data appeared that can then be used by extensions such as Webcards.

Webcards really shows how useful a little structure can be. One can add addresses to one's address book, and appointments to one's calendar, with the click of a button. The publisher gains by using these in improvements to their web site design. So everybody is happy. One downside as far as structure goes is that, due to lack of namespace support, there is a bottleneck in extending the format. One has to go through the very friendly microformats group, and they have stated that they really only want to deal with the most common formats. So it is not a solution to any and every data need. For that one should look at the eRDF or RDFa extensions to XHTML and HTML. I don't have an opinion on which is best. Perhaps a good starting point is this comparison chart.

The structured web is a continuum, from the least structured (plain text), on through html, to rdf. For a very good but long analysis, see the article by Mike Bergman An Intrepid Guide to Ontologies which covers this in serious depth.

As RDF vocabularies such as FOAF and SIOC gain momentum, similar, but perhaps less slick, Mozilla extensions have appeared on the web. One such is the Semantic Radar extension. Perhaps Webcards will be able to detect the use of such vocabularies in RDFa or eRDF extended web pages too, using technologies similar to those offered by the Operator plugin, as described recently by Elias Torres.

[1] note this does not work on OSX. I had to save the file to the hard drive, rename it with a vcf extension, before this could get added to Apple's Address Book.
[2] Thanks to the help of Antoine Moreau de Bellaing (no web page?)
[3] I know, BlogEd has not been moving a lot recently. It's just that I felt the blog editor space was pretty crowded already, and that there were perhaps more valuable things to do on the structured data front at Sun.

Sunday May 13, 2007

Metamorphosis: RDF for Veterans

Yesterday evening I decided to walk from Market westwards. I walked all the way past the San Francisco Opera, through to Hayes Street when I noticed a crowd at an Opening Exhibition of the works of final year industrial design students called Metamorphosis. It was open to all, so I entered.

Looking around I noticed an exhibit with an icon that struck me as amazingly similar to the official RDF icons. More surprising even was that this icon was clearly meant to represent relationships, the foundation of the semantic web. So I looked around for the creator and found Trishatah Hunter, who explained her work to me in more detail. She had never heard of RDF or the Semantic Web!

Trishatah's device is designed to help war veterans find support when in need, and feel part of a community, of a larger social network on which they can rely. Is this unintentionally the first piece of FOAF jewelry?

PS. Not sure what exactly the name for that type of jewelry is...
PPS. Another very nice work was Reflections, a work on the importance of objects to memory. It is a space to place objects in. The lights dim very slowly until the object is invisible behind its mirrored glass container. To see it again one has to touch the object, as if to call it back to memory.

Barbara McQueen's opening in SF

Last Saturday I was walking up Sutter Street in San Francisco in search of a restaurant, having just checked in to my hotel. A lady approached me and remarked on the little badge with an RDF icon pinned to my grey pullover, wondering what it was about. She herself had two large badges, one of which read "Borat for President!" under a smiling picture of the comedian. She then went on to invite me to an exhibition opening down the road. I followed, a little bemused, and indeed ended up at the opening of Barbara McQueen's exhibit of photos, many very nice ones of the late Steve McQueen. There were a few drinks and nice finger food, so I stayed around to hear her speak, and ended up meeting Barbara Traub (photo of her), who has just recently published a new book, Desert to Dream: A Decade of Burning Man Photography, a collection of stunning pictures of the crazy desert festival.

So in one go I linked up the Semantic Web, Steve McQueen, Burning Man and Borat.

Friday May 11, 2007

Semantic Web Birds of a Feather at JavaOne 2007

Nova Spivack, Lew Tucker, and Tim Boudreau joined me today in a panel discussion on the Semantic Web at JavaOne. Given that it was at 8pm, and that we were competing with a huge party downstairs with free drinks, robots fighting each other, live bands, and numerous other private pub parties, the turnout of over 250 participants was quite extraordinary [1]. There was a huge amount of material to cover, and we managed to save 13 minutes at the end for questions. The line of questioners was very long, and I think most questions were answered to their satisfaction. It was really great having Nova and Lew over. They brought a lot of experience to the discussion, which I hope gave everyone a feel for the richness of what is going on in this area.

Since many people asked for the presentation it is available here.

[1] It was quite difficult to tell from the stage how many people were in the room, but a good third of the 1200-seat room was full. 580 people had registered for the talk.

Tuesday May 08, 2007

Dropping some Doap into NetBeans

Yesterday evening I gave a short 10 minute presentation on the Semantic Web in front of a crowd of 1000 NetBeans developers during James Gosling's closing presentation at NetBeans Day in San Francisco.

Working with Tim Boudreau, we managed to give a super condensed introduction to the Semantic Web, something that is only possible because its foundations are so crystal clear - which was the underlying theme of the talk. It's just URIs, REST and relations, used to build clickable data. (See the pdf of the presentation.)

All of this led to a really simple demonstration of an integration with NetBeans that Tim Boudreau was key in helping me put together. Tim wrote the skeleton of a simple NetBeans plugin (stored in the contrib/doap section of the NetBeans CVS repository), and I used Sesame 2 beta 3 to extract the data from the so(m)mer doap file that was dropped onto NetBeans. As a result, NetBeans asked us where we wanted to download the project, and after we selected a directory on the file system, it proceeded to check out the code. On completion it asked us whether we wanted to install the modules in our NetBeans project. Now that is simple. Drag a DOAP url onto NetBeans: project checked out!

This Thursday at 8pm we will be giving a much more detailed overview of the Semantic Web in BOF-6746 - Web 3.0: This is the Semantic Web, taking place at Moscone. Hope to see you there!

Wednesday Feb 14, 2007

JSR-311: a Java API for RESTful Web Services?

JSR 311: Java (TM) API for RESTful Web Services has been put forward by Marc Hadley and Paul Sandoz from Sun. The initial expert group members come from Apache, BEA, Google, JBoss, Jerome Louvel of the very nice RESTlet framework, and TmaxSoft.

But it has also created some very strong negative pushback. Elliotte Rusty Harold does not like it at all:

Remember, these are the same jokers who gave us servlets and the URLConnection class as well as gems like JAX-RPC and JAX-WS. They still seem to believe that these are actually good specs, and they are proposing to tunnel REST services through JAX-WS (Java API for XML Web Services) endpoints.

They also seem to believe that "building RESTful Web services using the Java Platform is significantly more complex than building SOAP-based services". I don't know that this is false, but if it's true it's only because Sun's HTTP API were designed by architecture astronauts who didn't actually understand HTTP. This proposal does not seem to be addressing the need for a decent HTTP API on either the client or server side that actually follows RESTful principles instead of fighting against them.

To give you an idea of the background we're dealing with here, one of the two people who wrote the proposal "represents Sun on the W3C XML Protocol and W3C WS-Addressing working groups where he is co-editor of the SOAP 1.2 and WS-Addressing 1.0 specifications. Marc was co-specification lead for JAX-WS 2.0 (the Java API for Web Services) developed at the JCP and has also served as Sun's technical lead and alternate board member at the Web Services Interoperability Organization (WS-I)."

Heavy words indeed.

Roy Fielding is dead against the name:
Marc, I already explained to Rajiv last November that I would not allow Sun to go forward with the REST name in the API. It doesn't make any sense to name one API as the RESTful API for Java, and I simply cannot allow Sun to claim ownership of the name (which is what the JSR process does by design). Change the API name to something neutral, like JAX-RS.

Jerome Louvel, whose RESTlet framework is very promising, has a fuller explanation of what is being attempted in this JSR.

I am still not so sure what they want to do exactly, but it seems to be based on JRA, which has annotations of the form:

@Get
@HttpResource(location="/customers")
public List<Customer> getCustomers();

@Post
@HttpResource(location="/customers")
public void addCustomer(Customer customer);
which frankly does seem a little weird. But I am willing to give it some time to sink in. Marc Hadley has more details about what he is thinking of.

If one were to standardize an API, why not standardize the RESTlet API? That makes more immediate sense to me. It is working code, people are already participating in the process, and the feedback is very good. Now I don't know exactly how JSRs work and what the relationship is between the initial proposal and the final solution. Is the final proposal going to be close to the current one, with those types of annotations? Or could it be something much closer to RESTlets?

On the other hand I am pleased to see a JSR that is proposing to standardize annotations. That makes me think the time may be ripe to think of standardizing the @rdf annotations mapping POJOs to the semantic web the way so(m)mer and Elmo are doing.
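To give a feel for the idea, here is a minimal, self-contained sketch of what such an annotation-based mapping looks like. The @rdf annotation below is a hypothetical stand-in written for this example, modelled on the idea described above, not so(m)mer's actual code; a real mapper would read these annotations reflectively to turn objects into triples.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical annotation in the spirit of so(m)mer's @rdf: the value is
// the URI of the class or property that the Java element maps to.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.FIELD})
@interface rdf {
    String value();
}

// A POJO mapped to the foaf vocabulary.
@rdf("http://xmlns.com/foaf/0.1/Person")
class Person {
    @rdf("http://xmlns.com/foaf/0.1/name")
    String name;

    @rdf("http://xmlns.com/foaf/0.1/mbox")
    String mbox;
}

public class RdfAnnotationDemo {
    public static void main(String[] args) {
        // A mapper would walk these annotations to serialize instances as triples.
        Class<?> c = Person.class;
        System.out.println(c.getSimpleName() + " -> " + c.getAnnotation(rdf.class).value());
        for (Field f : c.getDeclaredFields()) {
            rdf a = f.getAnnotation(rdf.class);
            if (a != null) {
                System.out.println("  " + f.getName() + " -> " + a.value());
            }
        }
    }
}
```

The point of standardizing would be precisely so that the annotation above, rather than being invented per framework, is shared between libraries like so(m)mer and Elmo.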

So in summary: it is great that REST has attracted so much attention as to stimulate research into how to make it easy for Java developers to do the right thing by default. Let's hope that out of this something is born that does indeed succeed in reaching that goal.

Off topic: Elliotte Rusty Harold recently wrote up 10 predictions for xml in 2007, where he gives a very partial description of RDF, correctly pointing out the importance of GRDDL but completely missing the importance of SPARQL.

Friday Feb 09, 2007

Beatnik: change your mind

Some people lie, sometimes people die, people make mistakes: one thing's for certain you gotta be prepared to change your mind.

Whatever the cause, when we drag information into the Beatnik Address Book (BAB) we may later want to remove it. In a normal Address Book this is straightforward. Every vcard you pick up on the Internet is a file. If you wish to remove it, you remove the file. Job done. Beatnik though does not just store information: Beatnik also reasons.

When you decide that information you thought was correct is wrong, you can't just forget it. You have to disassociate yourself from all the conclusions you drew from it initially. Sometimes this can require a lot of change. You thought she loved you, so you built a future for the two of you: a house and kids you hoped, and a party next week for sure. Now she's left with another man. You'd better forget that house. The party is booked, but you'd better rename it. She no longer lives with you, but there, with him. In the morning there is no one scurrying around the house. This is what the process of mourning is all about. You've got the blues.

Making up one's mind

Beatnik won't initially reason very far. We want to start simple. We'll just give it some simple rules to follow. The most useful one perhaps is to simply work with inverse functional properties.

This is really simple. In the friend of a friend ontology the foaf:mbox relation is declared as being an InverseFunctionalProperty. That means that if I get the graph at http://eg.com/joe I can add it to my database like this.

If I then get the graph at http://eg.com/hjs

I can then merge both graphs and get the following

Notice that I can merge the blank nodes in the two graphs because each has the same relation foaf:mbox to the resource mailto:henry.story@sun.com. Since there can be only one thing related to that mbox in that way, we know they are the same node. As a result we learn that :joe knows a person whose home page is http://bblfish.net, and that that same person foaf:knows :jane; neither of those relations was known (directly) beforehand.
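Reconstructing the merge from the description above in N3 (the URIs and blank node labels here are illustrative, not the original graphs):

```n3
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# From the graph at http://eg.com/joe :
:joe foaf:knows [ foaf:mbox <mailto:henry.story@sun.com> ] .

# From the graph at http://eg.com/hjs :
[] foaf:mbox <mailto:henry.story@sun.com> ;
   foaf:homepage <http://bblfish.net> ;
   foaf:knows :jane .

# Because foaf:mbox is an InverseFunctionalProperty, the two blank
# nodes must denote the same person, so the merge entails:
:joe foaf:knows _:h .
_:h foaf:mbox <mailto:henry.story@sun.com> ;
    foaf:homepage <http://bblfish.net> ;
    foaf:knows :jane .
```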

Nice. And this is really easy to do. A couple of pages of Java code can work through this logic, add the required relationships and merge the required blank nodes.
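As a rough illustration of how little code this takes, here is a toy "smushing" sketch over triples represented as String arrays. This is not Beatnik's actual code; the triple representation and method names are invented for the example, and a real implementation would only unify blank nodes and deduplicate the result.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy inverse-functional-property merging: subjects that share the same
// value for an inverse functional property (such as foaf:mbox) must
// denote the same thing, so one node is renamed to the other.
public class IfpSmush {

    // Replace every occurrence of node `from` with node `to`.
    static void rename(List<String[]> triples, String from, String to) {
        for (String[] t : triples) {
            if (t[0].equals(from)) t[0] = to;
            if (t[2].equals(from)) t[2] = to;
        }
    }

    // Unify all subjects that share a value for the given IFP.
    static void smush(List<String[]> triples, String ifp) {
        Map<String, String> seen = new HashMap<>(); // ifp value -> subject
        for (String[] t : new ArrayList<>(triples)) {
            if (!t[1].equals(ifp)) continue;
            String other = seen.get(t[2]);
            if (other == null) {
                seen.put(t[2], t[0]);
            } else if (!other.equals(t[0])) {
                rename(triples, t[0], other); // same mbox => same node
            }
        }
    }

    public static void main(String[] args) {
        List<String[]> db = new ArrayList<>();
        // graph from http://eg.com/joe
        db.add(new String[]{":joe", "foaf:knows", "_:a"});
        db.add(new String[]{"_:a", "foaf:mbox", "mailto:henry.story@sun.com"});
        // graph from http://eg.com/hjs
        db.add(new String[]{"_:b", "foaf:mbox", "mailto:henry.story@sun.com"});
        db.add(new String[]{"_:b", "foaf:knows", ":jane"});

        smush(db, "foaf:mbox");
        // After smushing, _:b has been renamed to _:a, so the person
        // :joe knows is now also known to know :jane.
        for (String[] t : db)
            System.out.println(t[0] + " " + t[1] + " " + t[2]);
    }
}
```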

Changing one's mind

The problem comes if I ever come to doubt what Joe's foaf file says. I could not simply remove all the relations that spring from or reach the :joe node, since the relation that Henry knows :jane is not directly attached to :joe, and yet that relation came from Joe's foaf file.

Not trusting :joe's foaf file may be expressed by adding a new relation <http://eg.com/joe> a :Falsehood . to the database. Since doing this forces a change to other statements in the database, we have what is known as non-monotonic reasoning.

To allow the removal of statements and the consequences those statements led to, an rdf database has to do one of two things:

  • If it adds the consequences of every statement to the default graph (the graph of things believed by the database), then it has to keep track of how those facts were derived. Removing a statement then requires searching the database for statements that relied on it and nothing else, in order to remove them too, provided the statement one is removing is not itself the consequence of other things one still believes (tricky). This is the method that Sesame 1.0 employs, described in “Inferencing and Truth Maintenance in RDF Schema - a naive practical approach” [1]. The algorithm Jeen and Arjohn develop shows how this works with RDF Schema, but it is not very flexible. It requires the database to use hidden data structures that are not available to the programmer, so in our case, where we want to make inverse functional property deductions, we will not easily be able to adapt their procedure to our needs.
  • Not to add the consequences of statements to the database, but to do Prolog-like backtracking when answering a query over the union of only those graphs that are trusted. For example, one could ask the engine to find all people. Depending on a number of things, the engine might first look for things related to the class foaf:Person. It would then look at things related to subclasses of foaf:Person, if any. Then it might look for things with relations whose domain is foaf:Person, such as foaf:knows. Finally, with all the people gathered, it would check whether any of them are the same.
    All this could be done by trying to apply a number of rules to the data in the database when answering the query, in a Prolog manner. Given that Beatnik has very simple views on the data, it is probably simple enough to do this kind of work efficiently.
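The second approach can be sketched very compactly. In the toy quad store below (an illustration written for this post, not any real framework's API), every triple is tagged with the graph it came from, nothing is materialized, and queries only see the union of currently trusted graphs; distrusting a graph never requires retracting derived statements, because none were ever stored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy quad store: triples are kept per source graph, and queries are
// answered over the union of the trusted graphs only.
public class TrustedUnionStore {
    private final Map<String, List<String[]>> graphs = new HashMap<>();
    private final Set<String> trusted = new HashSet<>();

    public void add(String graph, String s, String p, String o) {
        graphs.computeIfAbsent(graph, g -> new ArrayList<>())
              .add(new String[]{s, p, o});
    }

    public void trust(String graph)    { trusted.add(graph); }
    public void distrust(String graph) { trusted.remove(graph); }

    // Find all objects of (s, p, ?o) over the union of trusted graphs.
    public List<String> objects(String s, String p) {
        List<String> result = new ArrayList<>();
        for (String g : trusted)
            for (String[] t : graphs.getOrDefault(g, List.of()))
                if (t[0].equals(s) && t[1].equals(p))
                    result.add(t[2]);
        return result;
    }

    public static void main(String[] args) {
        TrustedUnionStore store = new TrustedUnionStore();
        store.add("http://eg.com/joe", ":joe", "foaf:knows", "_:a");
        store.trust("http://eg.com/joe");
        System.out.println(store.objects(":joe", "foaf:knows")); // [_:a]

        // Coming to doubt joe's file is just a matter of dropping it
        // from the union: no statement-by-statement retraction needed.
        store.distrust("http://eg.com/joe");
        System.out.println(store.objects(":joe", "foaf:knows")); // []
    }
}
```

A real engine would of course also apply inference rules while answering the query, backtracking through them in a Prolog manner, but the trust bookkeeping stays this simple.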

So what is needed to do this well is the following:

  • notion of separate graphs/context
  • the ability to easily union over graphs of statements and query the union of those easily
  • defeasible inferencing or backtracking reasoning
  • flexible inferencing would be best. I like N3 rules, where one can make statements about rules belonging to certain types of graphs. For example it would be great to be able to write rules such as { ?g a :Trusted. ?g => { ?a ?r ?b } } => { ?a ?r ?b }, a rule to believe all statements that belong to trusted graphs.

From my incomplete studies (please let me know if I am wrong) none of the Java frameworks for doing this are ideal yet, but it looks like Jena is at present the closest. It has good reasoning support, but I am not sure it is very good yet at making it easy to reason over contexts. Sesame is building up support for contexts, but has no reasoning abilities right now in version 2.0. Mulgara has very foresightedly always had context support, but I am not sure if it has Prolog like backtracking reasoning support.

[1] “Inferencing and Truth Maintenance in RDF Schema - a naive practical approach” by Jeen Broekstra and Arjohn Kampman
