By bblfish on Aug 09, 2006
Where I prove that thinking of the web as a database does not mean thinking of it in terms of an xml database. Rather, we will have to think in terms of graphs.
XML applied to Documents
XML stands for eXtensible Markup Language. The most successful markup language is and remains html and its cleaned up xhtml successor. So let me start with this.
Xhtml is designed to mark up strings such as the following "Still pond A
frog jumps in Plop!" with presentation information. In a wysiwyg editor
I would do this by highlighting parts of the text using my mouse, then
setting a property for that part of the text by clicking some button,
such as bold, available to me from some menu. XML allows me to save
this information in the document by giving me a way to create arbitrary
parenthesizing properties. So for example using the xhtml markup
language, the following
face="papyrus,helvetica"≻Still pond≺br/≻A frog jumps
should display in a compliant browser as
A frog jumps in
XML applied to Data
Take some information written out one way or another. This will usually be in some tabular format as in a receipt, a spreadsheet or a database table. Take this receipt I have in front of me for example. The text on it is "1 Fresh Orange Juice 1,95 1 Café au Lait 1,20 1 Sparkling Water 1,40". This is not very easy to parse for a human, let alone a machine. But extrapolating from the experience with html, I can use xml to mark the text up like this:
≺bill≻ ≺item≻≺description≻1 Fresh Orange Juice≺/description≻≺price currency="euro"≻1,95≺/price≻≺/item≻ ≺item≻≺description≻1 Café au Lait≺/description≻≺price currency="euro"≻1,20≺/price≻≺/item≻ ≺item≻≺description≻1 Sparkling Water≺/description≻≺price currency="euro"≻1,40≺/price≻≺/item≻ ≺/bill≻
This is now much more accessible to computer automation, as the machine can deduce from the tags what interpretation to give the enclosed string. And so was born the great enthusiasm for xml formats and web services.
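As a rough illustration of what "accessible to computer automation" means here, the following Python sketch parses the bill and adds up the prices. It assumes ordinary angle brackets in the markup, and the function name `parse_bill` is made up for this example; the decimal commas on the receipt are converted to points before the amounts are read.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# The receipt from the text, written with ordinary angle brackets.
BILL_XML = """
<bill>
  <item><description>1 Fresh Orange Juice</description><price currency="euro">1,95</price></item>
  <item><description>1 Café au Lait</description><price currency="euro">1,20</price></item>
  <item><description>1 Sparkling Water</description><price currency="euro">1,40</price></item>
</bill>
"""


def parse_bill(xml_text):
    """Return a list of (description, amount, currency) tuples.

    parse_bill is a hypothetical helper, not part of any library.
    """
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("item"):
        description = item.findtext("description")
        price = item.find("price")
        # The receipt uses a decimal comma; Decimal expects a point.
        amount = Decimal(price.text.replace(",", "."))
        items.append((description, amount, price.get("currency")))
    return items


items = parse_bill(BILL_XML)
total = sum((amount for _, amount, _ in items), Decimal("0"))
print(total)
```

The tags tell the program which string is a description and which is a price, something the raw receipt text does not.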
World Wide Data
It is clear that by following the above procedure we can create machine readable documents for every type of data, stored in every kind of database available worldwide. We just need to mark it up. Wait. We need to do more. We need to agree on a vocabulary and a tree-like way of displaying it, since xml forms a markup tree. Let us assume for this article that the naming problem has somehow been solved, and let's look more closely at the data format problem.
Say I want to describe a house, then I will want to have an xml format something like this
≺house≻ ≺owner≻...≺/owner≻ ≺address≻...≺/address≻ ≺rooms≻ ≺room≻...≺/room≻ ... ≺/rooms≻ ≺/house≻
but if I want to describe a person I would of course describe them like this
≺person≻ ≺name≻...≺/name≻ ≺owns≻≺house≻...≺/house≻≺/owns≻ ≺friend≻...≺/friend≻ ≺friend≻...≺/friend≻ .... ≺/person≻
In the first case the person object is part of the house document, whereas in the second case the house information is part of the person document. Both are equally valid ways of doing this. This is not an isolated case. It will happen whenever we wish to describe some object. No object has priority over any other. Here is another example. We may want to describe a book like this:
≺book≻ ≺author≻Ken Wilber≺/author≻ ≺title≻Sex, Ecology, Spirituality≺/title≻ ... ≺/book≻
but of course if I had a CV/resumé of Ken Wilber then my xml would be like this
≺CV≻ ≺name≻Ken Wilber≺/name≻ ≺publications≻ ≺book≻...≺/book≻ ≺/publications≻ ≺/CV≻
Again there is no natural way of putting things. In one case the book
element is the root of the tree, in another it is an element of the
tree. It follows that every type of object will require its own type of
document to describe it. This would not be a problem if the world were
composed just of turtles. But it isn't: there are an infinite number of
types of things in our very rich world. Furthermore what is of interest
in each type of document depends completely on the context. In one type
of document we may be more interested in the friends a person has, in
another in his medical history, in yet another his academic
achievements, etc... So there is not even one objective way to describe
anything! If we were to create a tree structured document to describe
every type of thing we are interested in, we would therefore also need
to create an uncountable number of document formats for every different
way we wanted to describe each class of objects.
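To make the house and person case concrete, here is a small Python sketch. It fills the elided parts of the two formats with invented values ("Alice", "1 Example Street") and uses two hypothetical helper functions, one per format. Both documents nest things differently, yet each boils down to the same statement of the form (subject, relation, object).

```python
import xml.etree.ElementTree as ET

# House-centred document: the owner sits inside the house.
# "Alice" and "1 Example Street" are invented values for illustration.
HOUSE_DOC = """
<house>
  <address>1 Example Street</address>
  <owner>Alice</owner>
</house>
"""

# Person-centred document: the house sits inside the person.
PERSON_DOC = """
<person>
  <name>Alice</name>
  <owns><house><address>1 Example Street</address></house></owns>
</person>
"""


def triples_from_house_doc(xml_text):
    """Hypothetical mapping for the house format."""
    house = ET.fromstring(xml_text)
    address = house.findtext("address")
    return {(owner.text, "owns", address) for owner in house.findall("owner")}


def triples_from_person_doc(xml_text):
    """Hypothetical mapping for the person format."""
    person = ET.fromstring(xml_text)
    name = person.findtext("name")
    return {
        (name, "owns", house.findtext("address"))
        for house in person.findall("owns/house")
    }


house_triples = triples_from_house_doc(HOUSE_DOC)
person_triples = triples_from_person_doc(PERSON_DOC)
print(house_triples == person_triples)
```

Note that each format needed its own extraction function, which is exactly the cost that grows with every new way of nesting the same facts.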
This is summarized simply by saying "The World is a Graph". The world
can just be described holistically as consisting of objects and
relations between those objects. Take any object in the world: you will
be able to reach any other object by following relations stemming from
it. Make that type of object the root of your graph, and you have an xml document.
So the problem is not so much that it is not possible to describe each subgraph we find using XML. One can! The problem emerges rather when considering the tools required to query and understand these documents. It is clear from the arguments above that when thinking web wide, one has to give up the idea that information will reach one in a limited number of hierarchically structured ways. As a result tools such as XQuery, which are designed to query documents at the xml structure level, are not suited to querying information across documents, since the tree structure of the xml documents gets in the way of describing the graph that the world is and that the documents are attempting to describe. XQuery people know this, which is why they don't like RDF. But it is not RDF that is the problem. It is reality that is the problem. And that is a lot more difficult to change.
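The point about structure-level queries can be sketched in Python with the path expressions of the standard `xml.etree.ElementTree` module standing in for XQuery. The CV below reuses the title from the book example as an assumption, since the original leaves that part elided, and both function names are invented. The same question, "what did Ken Wilber write?", needs a different path for each document shape, and a path written for one shape finds nothing in the other.

```python
import xml.etree.ElementTree as ET

BOOK_DOC = ET.fromstring(
    "<book><author>Ken Wilber</author>"
    "<title>Sex, Ecology, Spirituality</title></book>"
)
# Assumption: the CV lists the same book; the original elides its content.
CV_DOC = ET.fromstring(
    "<CV><name>Ken Wilber</name><publications>"
    "<book><title>Sex, Ecology, Spirituality</title></book>"
    "</publications></CV>"
)


def titles_via_book_path(root, author):
    """Query written for the book-centred format (hypothetical helper)."""
    if root.tag == "book" and root.findtext("author") == author:
        return [root.findtext("title")]
    return []


def titles_via_cv_path(root, author):
    """Query written for the CV-centred format (hypothetical helper)."""
    if root.tag == "CV" and root.findtext("name") == author:
        return [t.text for t in root.findall("publications/book/title")]
    return []


for doc in (BOOK_DOC, CV_DOC):
    print(doc.tag,
          titles_via_book_path(doc, "Ken Wilber"),
          titles_via_cv_path(doc, "Ken Wilber"))
```

Every additional document format that mentions books and authors would need yet another query of this kind.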
To repeat, if RDF had never been invented, your database of documents would end up containing an infinitely large number of different types of xml documents describing the infinite types of objects out there, each of course requiring its own specific interpretation (since XML does not come with a semantics). And so you may as well start off using RDF, since that is where you will end up anyway.
The world is an interconnected graph of things. RDF allows one to describe the world as a graph. SPARQL is the query language to query such a graph. Use the tools that fit the world!
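For readers who want to see the idea in miniature, here is a toy Python sketch of a graph held as a set of (subject, relation, object) triples, with a pattern-matching query over it. This is not RDF or SPARQL: real RDF names things with URIs, and the people, relation names and the friendship below are invented for illustration. It only shows that once the data is a graph, a query follows relations and does not care which object happened to be the root of which document.

```python
# A toy graph: each triple is (subject, relation, object).
# All names and the "friend" link are invented for illustration.
GRAPH = {
    ("Ken Wilber", "wrote", "Sex, Ecology, Spirituality"),
    ("Alice", "owns", "1 Example Street"),
    ("Alice", "friend", "Ken Wilber"),
}


def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def books_by_friends_of(graph, person):
    """Follow two relations in a row: person -friend-> f -wrote-> book."""
    return sorted(
        book
        for _, _, friend in match(graph, subject=person, predicate="friend")
        for _, _, book in match(graph, subject=friend, predicate="wrote")
    )


print(match(GRAPH, predicate="owns"))
print(books_by_friends_of(GRAPH, "Alice"))
```

The two-step query starts from a person and ends at a book without any notion of a root element; a SPARQL query over real RDF data expresses the same kind of pattern declaratively.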
- This is not to say that rdf/xml is perfect. I myself believe it is a really good first attempt at trying to do something very ambitious. Sadly it was done a little too early. Something better will certainly come along. In the meantime it is good enough for nearly everything anyone may want to do with it when wishing to send data out on the web.
- Having many XML documents is not a problem for the Semantic web since it is easy to convert each of the formats using GRDDL to rdf.