Answers to "Duck Typing Done Right"

I woke up this morning with a large number of comments to my previous post "Duck Typing Done right" . It would be confusing to answer them all together in the comments section there, so I aggregated my responses here.

I realize the material covered here is very new to many people. Luckily it is very easy to understand. For a quick introduction see my short Video introduction to the Semantic Web.

Also I should mention that the RDF is a declarative framework. So its relationship to method Duck Typing is not a direct one. But nevertheless there is a lot to learn by understanding the simplicity of the RDF framework.

On the reference of Strings

Kevin asks why the URI "http://a.com/Duck" is less ambigous than a string "Duck". In one respect Kevin is completely correct. In RDF they are both equally precise. But what they refer to is quite different from what one expects. The string "Duck" refers to the string "Duck". A URI on the other hand refers to the resource identified by it; URIs stand for Universal Resource Identifiers after all. The URI "http://a.com/Duck", as defined above, refers to the set of Ducks. How do you know? Well you should be able to GET <http://a.com/Duck> and receive a human or machine representation for it, selectable via content negotiation. This won't work in the simple examples I gave in my previous post, as they were just quick examples I hacked together by way of illustration. But try GETing <http://xmlns.com/foaf/0.1/knows> for a real working example. See my longer post on this issue GET my meaning?

Think about the web. Everyday you type in URLs into a web browser and you get the page you want. When you type "http://google.com/" you don't sometimes get <http://altavista.com>. The web works as well as it does, because URLs identify things uniquely. Everyone can mint their own if they own some section of the namespace, and PUT the meaning for that resource at that resource's location.

On Ambiguity and Vagueness

Phil Daws is correct to point out that URIs don't remove all fuzziness or vagueness. We can have fuzzy or vague concepts, and that is a good thing. foaf:knows (<http://xmlns.com/foaf/0.1/knows> ) whilst unambigous is quite a fuzzily defined relation. If you click on it's URL this is what you will get:

We take a broad view of 'knows', but do require some form of reciprocated interaction (ie. stalkers need not apply). Since social attitudes and conventions on this topic vary greatly between communities, counties and cultures, it is not appropriate for FOAF to be overly-specific here.

If someone foaf:knows a person, it would be usual for the relation to be reciprocated. However this doesn't mean that there is any obligation for either party to publish FOAF describing this relationship. A foaf:knows relationship does not imply friendship, endorsement, or that a face-to-face meeting has taken place: phone, fax, email, and smoke signals are all perfectly acceptable ways of communicating with people you know.

You probably know hundreds of people, yet might only list a few in your public FOAF file. That's OK. Or you might list them all. It is perfectly fine to have a FOAF file and not list anyone else in it at all. This illustrates the Semantic Web principle of partial description: RDF documents rarely describe the entire picture. There is always more to be said, more information living elsewhere in the Web (or in our heads...).

Since foaf:knows is vague by design, it may be suprising that it has uses. Typically these involve combining other RDF properties. For example, an application might look at properties of each foaf:weblog that was foaf:made by someone you "foaf:knows". Or check the newsfeed of the online photo archive for each of these people, to show you recent photos taken by people you know.

For more information on this see my post "Fuzzy thinking in Berkeley"

On UFOs

Paddy worries that this requires a Universal Class Hierarchy. No worries there. The Semantic Web is designed to work in a distributed way. People can grow their vocabularies, just like we all have grown the web by each publishing our own files on it. The Semantic Web is about linked data. The semantic web does not require UFOs (Unified Foundational Ontologies) to get going, and it may never need them at all, though I suspect that having one could be very helpful. See my longer post UFO's seen growing on the Web.

Relations are first class objects

Paddy and Jon Olson were mislead by my uses of classes to think that RDF ties relations/properties to classes. They don't. Relations in RDF are first class citizens, as you may see in the Dublin Core metadata initiative, which defines a set of very simple and very general relations to describe resources on the web, such as dc:author, dc:created etc... I think we need a :sparql relation that would relate anything to an authoritative SPARQL endpoint, for example. There clearly is no need to constrain the domain of such a relation in any way.

Scalability and efficiency

Jon Olson agrees with me that duck typing is good enough for some very large and good software projects. One of my favorite semantic web tools for example is cwm, which is written in python. When I say Duck Typing does not scale as implemented in those languages, I mean really big scale, like you know, the web. URIs is what has allowed the web to scale to the whole planet, and what will allow it to scale into structured data way beyond what we may even be comfortable imagining right now. This is not over engineered at all as Eric Biesterfeld fears. In fact it works because it gets the key elements right. And they are very simple as I demonstrated in my recent JavaOne BOF. The key concepts are:
  • URIs refer to resources,
  • resources return representations,
  • to describe something on the web one needs to
    • refer to the thing one wishes to describe, and that requires a URI,
    • second specify the property relation one wishes to attribute to it (and that also requires a URI)
    • and finally specify the value of that property.
That's it.

Semantics

An anonymous writer mentions the "ugliness" of the syntax. This is not a problem. The semantic web is about semantic (see the illustration on this post) It defines the relationship of a string to what it names. It does not require a specific syntax. If you don't like the xml/rdf syntax, which most people think is overly complicated, then use the N3 syntax, or come up with something better.

On Other Languages

As mentioned above there need not be one syntax for RDF. Of course it helps in communication if we agree on something, and currently, for better of for worse that is rdf/xml.

But that does not mean that other procedural languages cannot play well with it. They can since the syntax is not what is important, but the semantics, and those are very well defined.

There are a number of very useful bindings in pretty much every language. From Franz lisp to the redland library for c, python, perl, ruby, to Prolog bindings, and many Java bindings such as Sesame and Jena. Way too much to list here. For a very comprehensive overview see Mike Bergman's full survey of Semantic tools.

Note

I have received a huge amount of hits from reddit. Way over 500. If it is still on the top page when you read this, take the time to vote for it :-)
Comments:

Lots of good comments on reddit by the way.

Posted by Henry Story on May 29, 2007 at 09:36 AM CEST #

Could someone please explain why, when URIs are required for things like namespaces, everyone invariably uses URLs rather than URNs? (this is not a sarcastic comment, by the way, I would really like to know).

When I first learnt HTML years ago, one of the most confusing concepts was that of namespaces. The examples always used URLs for namespaces. It was really hard to grasp that these apparent addresses did not really exist at all, they were really just being used as unique names.

Relatively recently, last year, I finally got around to sussing out the differences between URIs, URLs and URNs (a couple of good explanations in the Wikipedia: http://en.wikipedia.org/wiki/Uniform_Resource_Identifier and http://en.wikipedia.org/wiki/Uniform_Resource_Name).

It seems to me that URNs are far clearer for naming or categorizing things than URLs are. Compare "http://a.com/Duck" (URL) with "urn:animal:ferret" (URN - from Wikipedia). The URL looks like an address, the URN doesn't.

Posted by SimonTeW on May 30, 2007 at 08:54 PM CEST #

Hi Simon,

I think the people who use URLs as namespaces and then don't put the description of what the names mean or the relax ng for the syntax at that location , are confused, not you. It is a bit like having an anchor href on a web page to some other page that does not exist. Sites that do that get to be annoying. So much so that one will tend not to visit them anymore.

The web allows pointers to point to nowhere. It is a simple pragmatic fact that sites that do that won't get much traffic.

The same will ably to the machine readable web. It is just that it is so new and people developing it have come from such diverse backgrounds that these type of obvious things got lost in the fray. (I don't want to say that using URNs is always a bad idea btw.)

Another reason is perhaps nobody could quite agree on what should be there. In the semantic web community it is now widely agreed that each term should be dereferenceable at...

As far as the discussion between URLs and URNs go and read Norman Walsh's post Names and Addresses. It summarises the issue very well.

Posted by Henry Story on May 30, 2007 at 09:27 PM CEST #

Reading a long blog comment on this post by Jesse Noller, I came across this very good discussion Why type systems are interesting.

Posted by Henry Story on May 31, 2007 at 02:11 PM CEST #

Post a Comment:
Comments are closed for this entry.
About

bblfish

Search

Archives
« April 2014
MonTueWedThuFriSatSun
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
    
       
Today