Tuesday Sep 18, 2007

Microsoft Media Manager

David Seth reports today on Microsoft's Interactive Media Manager, based on RDF and OWL, two key semantic web technologies.

This RDF model allows companies to add nuance and intelligence to media management beyond what is possible with traditional metadata.

While I am on media, I might as well mention, for those who don't already know, that Joost, which I believe is somehow related to Skype and is working on peer-to-peer video, is also using RDF. I am not sure how, but since Dan Brickley, one of the brains behind FOAF, is working there, it is quite likely going to be very interesting.

Friday Jun 15, 2007

semantics of invalid passports

For those travelers out there who are into XML and semantics, the question is how does one specify that a valid passport has a passport number in it, semantically.

In OWL one can specify that a relation has a cardinality. So for example the mortal:parent relation with domain and range of mortal:Human, would usually be defined as having cardinality 2. So whenever we have something that is a human, we can deduce that it has two parents, even if we don't know who they are.
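To make the deduction concrete, here is a minimal sketch in Python of what such a cardinality rule licenses. It is a toy, not an OWL reasoner: the function name apply_parent_cardinality, the ex: identifiers and the blank node labels are all made up for illustration, and a real reasoner need not materialise blank nodes at all. The invented parents are deliberately not typed as mortal:Human, otherwise the rule would apply to them in turn without end.

```python
from itertools import count

# Hypothetical toy "reasoner": for every known human, make sure the graph
# holds two parent triples, inventing blank nodes for the unknown parents.
_blank_ids = count(1)

def apply_parent_cardinality(triples, cardinality=2):
    """Return a copy of `triples` in which every Human has `cardinality` parents."""
    inferred = set(triples)
    humans = {s for (s, p, o) in triples
              if p == "rdf:type" and o == "mortal:Human"}
    for human in sorted(humans):
        known = [o for (s, p, o) in triples
                 if s == human and p == "mortal:parent"]
        # Add one blank node per parent we know must exist but cannot name.
        for _ in range(cardinality - len(known)):
            inferred.add((human, "mortal:parent", f"_:b{next(_blank_ids)}"))
    return inferred

graph = {
    ("ex:henry", "rdf:type", "mortal:Human"),
    ("ex:henry", "mortal:parent", "ex:mother"),
}
closed = apply_parent_cardinality(graph)
parents = sorted(o for (s, p, o) in closed
                 if s == "ex:henry" and p == "mortal:parent")
print(parents)  # ['_:b1', 'ex:mother']
```

The second parent is known to exist even though nothing in the input names it.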

Note that this is not expressible in a DTD or even in RelaxNG. The best one could say is that a <Person> element could have 0 to 2 <parent> elements. So one could have something like this:

<Person>
  <name>Henry Story</name>
  <parent><Person>...</Person></parent>
  <parent><Person>...</Person></parent>
</Person>
One cannot say that it must have exactly 2 <parent> elements without thereby specifying documents that are necessarily infinitely long, since each parent is a Person and so would itself have to have parent elements, and so on. With XML we are stuck at the level of syntax.

Working at the level of syntax does have some advantage: it is obvious how to query for the existence of information. One just searches the document using an XQuery. Say I have an XML passport and want to find out if it contains a number: it would be easy to find out by searching for the passportNumber attribute, for example. The disadvantage is that the query will only succeed on certain types of XML documents, namely those that put the information in that spot. It won't work with information written in other XML passport formats or with real paper passports.
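A small sketch of that limitation, using Python's standard xml.etree.ElementTree as a stand-in for XQuery. Both passport formats and the helper has_number_attribute are invented for the example; the only thing taken from the text above is the passportNumber attribute.

```python
import xml.etree.ElementTree as ET

# Two hypothetical passport formats carrying the same fact in different places.
FORMAT_A = '<Passport passportNumber="X123456"><holder>Henry Story</holder></Passport>'
FORMAT_B = '<passport><holder>Henry Story</holder><number>X123456</number></passport>'

def has_number_attribute(xml_text):
    """Syntactic check: does any element carry a passportNumber attribute?"""
    root = ET.fromstring(xml_text)
    return any("passportNumber" in el.attrib for el in root.iter())

print(has_number_attribute(FORMAT_A))  # True
print(has_number_attribute(FORMAT_B))  # False
```

Both documents state the passport number, but the syntactic query only finds it in the format it was written for.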

Now how does one specify that a passport has a number printed in it? We don't want to say that a doc:Passport has a relation doc:passportNumber with cardinality of 1. Though that seems correct, it would fail to help us find invalid passports that did not have a number printed on them, since

  • an OWL reasoner would add the relation to a blank node anyway by following the suggested OWL cardinality rule.
  • there could be a statement as to the passport number written down somewhere else completely, which might have been merged with the information about the passport. A passport with a passport number written on a separate piece of paper won't help you cross the border...
  • the passport might have had a passport number in it until I cut that information out of the passport, or it got erased in some way by a mischievous border guard. The government databases would still attribute a passport number to my passport. So as soon as I asked them what it is, I would end up having correct knowledge of my passport number, yet my passport would still be invalid.

Here is the solution presented by Tim Berners-Lee:

OWL cardinality might say that a person must have at least one passport number, but it can NOT say that a document about a person contains information about at least one passport number.

N3 rules can, with log:semantics (which relates a document to the graph you get from parsing it) and log:includes, and nested graphs:

@forAll x, p, g1.
{   x a Person; passport p.
    p log:semantics g1.
    g1 log:notIncludes { x passportNumber [] }
}
    =>
{   p a InvalidPassport  }.

On the semantic web, as anyone can in principle say anything about anything, you can never make statements about how many passportNumber statements there are without specifying the document in question.

A passport is quite clearly both a document and an object. As an object it can have properties such as being in your pocket. As a document it tells us something about the world, among other things information about the owner of the passport. There are many sources of information in the world. If one wants to find out what possible worlds a particular source of information describes, one has to limit one's query to that source of information.
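The same idea can be sketched in plain Python by keeping each source of information as its own set of triples. This is only a rough analogue of the N3 rule, not an implementation of log:semantics or log:notIncludes; the graphs, the doc: property names and the helpers includes and is_invalid_passport are all assumptions made for the example.

```python
# Each source of information is modelled as its own set of (s, p, o) triples.
# All names and data here are hypothetical.
passport_doc = {
    ("ex:henry", "doc:name", "Henry Story"),
    # no doc:passportNumber triple: the number was cut out of the passport
}
government_db = {
    ("ex:henry", "doc:passportNumber", "X123456"),
}

def includes(graph, subject, predicate):
    """Rough analogue of log:includes for the pattern { subject predicate [] }."""
    return any(s == subject and p == predicate for (s, p, o) in graph)

def is_invalid_passport(document_graph, holder):
    """The passport is invalid if its own graph lacks a number for the holder."""
    return not includes(document_graph, holder, "doc:passportNumber")

# Merging the sources hides the problem: the number is known from elsewhere.
merged = passport_doc | government_db
print(includes(merged, "ex:henry", "doc:passportNumber"))  # True

# Restricting the query to the passport itself reveals it.
print(is_invalid_passport(passport_doc, "ex:henry"))       # True
```

The query against the merged graph answers "do we know the number?", while the query against the passport's own graph answers "does the passport state it?".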

Note: Semantics of Graphs

Following David Lewis I like to think of a graph as a set of possible worlds which satisfy the patterns of the graph. Tim Berners-Lee's formula is saying: find me the set of possible worlds that correctly interpret the passport I have. If this set includes worlds where I don't have a passport number, then my passport is invalid. That is because I can only have such worlds in my interpretation of my passport if I don't have the number on my passport.

This interpretation of graphs must be a little too strong semantically, as it leads to the following problem:

How does one query for documents about mathematical truths? A document that says "2+2=4" is true in all possible worlds, just like a document that says "1+1=2", so a query for the one would equally match the other. Perhaps here one has to query literally, namely for the string or regex "2+2=4".
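The problem shows up even in a tiny model. The sketch below is my own toy illustration, with two made-up atomic facts standing in for everything that can vary between worlds: a statement is identified with the set of worlds in which it holds.

```python
from itertools import product

# A "world" assigns True/False to each atomic fact; the atoms are hypothetical.
ATOMS = ("has_passport_number", "passport_in_pocket")
WORLDS = [dict(zip(ATOMS, values))
          for values in product([True, False], repeat=len(ATOMS))]

def worlds_satisfying(statement):
    """Return the indices of the worlds in which `statement` holds."""
    return frozenset(i for i, w in enumerate(WORLDS) if statement(w))

# Mathematical truths hold whatever the world looks like.
two_plus_two = worlds_satisfying(lambda w: 2 + 2 == 4)
one_plus_one = worlds_satisfying(lambda w: 1 + 1 == 2)
# A contingent statement only holds in some worlds.
has_number = worlds_satisfying(lambda w: w["has_passport_number"])

print(two_plus_two == one_plus_one)  # True
print(two_plus_two == has_number)    # False
```

On this reading the two arithmetic statements are the same set of worlds, so nothing at the semantic level can tell them apart.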

About

bblfish
