semantics of invalid passports

For those travelers out there who are into XML and semantics, the question is how does one specify that a valid passport has a passport number in it, semantically.

In OWL one can specify that a relation has a cardinality. So for example the mortal:parent relation with domain and range of mortal:Human, would usually be defined as having cardinality 2. So whenever we have something that is a human, we can deduce that it has two parents, even if we don't know who they are.

Note that this is not expressible in a DTD or even in RelaxNG . The best one could say is that a <Person> element could have 0 to 2 <parent> elements. So one could have something like this

  <name>Henry Story</name>
One can not say that it must have 2 elements, without thereby specifying documents that are necessarily infinitely long, since each parent is a Person, and so would have to itself have a parent element, etc. etc... With XML we are stuck to the level of syntax.

Working at the level of syntax does have some advantage: it is obvious how to query for the existence of information. One just searches the document using an XQuery. So say I have an XML passport, and I want to find out if it contains a number, it would be easy to find out by searching for the passportNumber attribute for example. The disadvantage is that the query will only succeed on certain types of xml documents, namely those that put the information in that spot. It won't work with information written in other xml passport formats or real paper passports.

Now how does one specify that a passport has a number printed in it? We don't want to say that a doc:Passport has a relation doc:passportNumber with cardinality of 1. Though that seems correct, it would fail to help us find invalid passports that did not have a number printed on them, since

  • a OWL reasoner would add the relation to a blank node anyway by following the suggested owl cardinality rule.
  • there could be a statement as to the passport number written down somewhere else completely, which might have been meshed with the information about the passport. A passport with a passport number written on a separate piece of paper won't help you cross the border...
  • The passport might have had a passport number in it until I cut that information out of the passport, or it got erased in some way by a mischievous border guard. The government databases would still attribute a passport number to my passport. So as soon as I asked them what it is, I would end up having correct knowledge of my passport, yet my passport still be invalid.

Here is the solution presented by Tim Berners Lee:

OWL cardinality might say that a person must have at least one passport number, but it can NOT say that a document about a person contains information about at least one passport number.

N3 rules can, with log:semantics (which related a document to the graph you get from parsing it) and log:includes, and nested graphs:

@forAll x, p, g1.
{   x a Person; passport p.
	p log:semantics g1.
        g1 log:notIncludes  { x passportNumber []   }
{  ?p a InvalidPassport  }.

On the semantic web, as anyone can in principle say anything about anything, you can never make statements about how many passportNumber statements there are without specifying the document in question.

A passport is quite clearly both a document and an object. As an object it can have properties such as a being in your pocket. As a document it tells us something about the world, among other things information about the owner of the passport. There are many source of information in the world. If one wants to find out what possible worlds a particular source of information describes, one has to limit one's query to that source of information.

Note: Semantics of Graphs

Following David Lewis I like to think of a graph as a set of possible worlds which satisfy the patterns of the graph. Tim Berners Lee's formula is saying: find me the set of possible worlds that correctly interpret the passport I have. If this set includes worlds where I don't have a passport number, then my passport is invalid. That is because I can only have such worlds in my interpretation of my passport if I don't have the number on my passport.

This interpretation of graphs must be a little too strong semantically, as it leads to the following problems:

How does one query for documents about mathematical truths? If a document says that "2+2=4" it will be true in all possible worlds, just as the document that says "1+1=2", and so querying for the one will be right if I query for the other. Perhaps here one has to query literally, namely for the string or regex "2+2=4".


I don't know anything about RelaxNG, but I suspect that you can declare that any human: hasBioFather (required) hasBioMother (required) Where BioFather and BioMother are subclasses of Parent.

Posted by Taylor on June 15, 2007 at 03:42 PM CEST #

Relax NG is a schema language. It has no notion of subclass. It can just tell you how your document should be written out, where the elements can appear, where the attributes can appear, etc... It is a way of specifying a subset of all the possible xml documents. But it does \*NOT\* give you semantics in the sense of relating the words to the world. It gives you the minimal semantics of relating relaxNG to xml documents perhaps, but that won't be what you want if you want to say that all humans have 2 parents.

Posted by Henry Story on June 16, 2007 at 10:17 AM CEST #

You say: Tim Berners Lee's formula is saying: find me the set of possible worlds that correctly interpret the passport I have. If this set includes worlds where I don't have a passport number, then my passport is invalid.

Since log:notIncludes is a builtin in the CWM - the closed world machine, its probably better (and simpler) paraphrased as: Find me the documents that do not include (and for which cannot be inferred) a number for my passport...

And yes, it is sometimes said that cwm isn't really based on the closed world assumption, but as I understand it, this is because it can load additional data from the web and that's really a question not related to the point here

Posted by Valentin on June 17, 2007 at 07:54 AM CEST #

Post a Comment:
Comments are closed for this entry.



« July 2016