Duck Typing done right

Dynamic languages such as Python, Ruby and Groovy make a big deal of their flexibility. You can add new methods to classes, extend them, and so on at run time, and do all kinds of funky stuff. You can even treat an object as being of a certain type by looking at its methods. This is called Duck Typing: "If it quacks like a duck and swims like a duck then it's a duck", goes the well-known saying. The main criticism of Duck Typing has been that what is gained in flexibility is lost in precision: it may be good for small projects, but it does not scale. I want to show here both that the criticism is correct, and how to overcome it.

Let us look at Duck Typing a little more closely. If something is a bird that quacks like a duck and swims like a duck, then why not indeed treat it like a duck? Well, one reason that occurs immediately is that in nature there are always weird exceptions. It may be difficult to see the survival advantage of looking like a duck, as opposed to, say, looking like a lion, but one should never be surprised at the surprising nature of nature.
Anyway, that's not the type of problem people working with duck typing ever have. How come? Well it's simple: they usually limit the interactions of their objects to a certain context, where the objects being dealt with are such that if any one of them quacks like a duck, then it is a duck. And so here we in essence have the reason for the criticism: In order for duck typing to work, one has to limit the context, one has to limit the objects manipulated by the program, in such a way that the duck typing falls out right. Enlarge the context, and at some point you will find objects that don't fit the presuppositions of your code. So: for simple semantic reasons, those programs won't scale. The more the code is mixed and meshed with other code, the more likely it is that an exception will turn up. The context in which the duck typing works is a hidden assumption, usually held in the head of the small group of developers working on the code.

A slightly different way of coming to the same conclusion is to realize that these programming languages don't really do an analysis of the sound of quacking ducks. Nor do they look at objects and try to classify the way these are swimming. What they do is look at the names of the methods attached to an object, and then do a simple string comparison. If an object has the swim method, they will assume that swim stands for the same type of thing that ducks do. Now of course it is well established that natural language is ambiguous and hence very context dependent. The method names gain their meaning from their association with English words, which are ambiguous. There may for example be a method named swim, where those letters stand for the acronym "See What I Mean". That method may return a link to some page on the web that describes the subject of the method in more detail, and have no relation to water activities. Calling that method in the expectation of duck-like behaviour will lead to some unexpected results.
But once more, this is not a problem duck typing programs usually have. Programmers developing in those languages will be careful to limit the execution of the program so that it only deals with objects where swim stands for the thing ducks do. But it does not take much for that presupposition to fail. Extend the context somewhat by loading some foreign code, and at some point these presuppositions will break down and nasty, difficult-to-locate bugs will surface. Once again, the criticism of duck typing not being scalable is perfectly valid.
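
To make the failure concrete, here is a minimal Python sketch. The names `Duck`, `HelpTopic` and `take_a_dip`, and the example.org link, are hypothetical and invented purely for illustration. The only thing the language checks is that an attribute called `swim` can be looked up on the object.

```python
class Duck:
    def swim(self):
        # The meaning the calling code has in mind: paddling on water.
        return "paddling across the pond"


class HelpTopic:
    """Hypothetical foreign class where swim stands for "See What I Mean"."""

    def swim(self):
        # Returns a link to a page describing the topic; no water involved.
        return "http://example.org/help/topic"


def take_a_dip(thing):
    # Duck typing: no type check, only a lookup of the name "swim".
    return thing.swim()


results = [take_a_dip(Duck()), take_a_dip(HelpTopic())]
# Both calls succeed. The second one quietly returns a link rather than
# a description of swimming, and nothing in the language flags the mismatch.
```

As long as only `Duck`-like objects reach `take_a_dip`, the code behaves. The trouble starts when code containing something like `HelpTopic` is loaded into the same program.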

So what is the solution? Well, it requires one very simple step: one has to use identifiers that are context free. If you can use identifiers for swimming that are universal, then they will always mean the same thing, and so the problem of ambiguity will never surface. Universal identifiers? Oh yes, we have those: they are called URIs.
Here is an example. Let us

  • name the class of ducks
    <http://a.com/Duck> a owl:Class;
             rdfs:subClassOf <http://a.com/Bird>;
             rdfs:comment "The class of ducks, those living things that waddle around in ponds" .
    
  • name the relation <http://a.com/swimming> which relates a thing to the time it is swimming
     <http://a.com/swimming> a owl:DatatypeProperty;
                             rdfs:domain <http://a.com/Animal> ;
                             rdfs:range xsd:dateTime .
     
  • name the relation <http://a.com/quacking> which relates a thing to the time it is quacking (like a duck)
     <http://a.com/quacking> a owl:DatatypeProperty;
                             rdfs:domain <http://a.com/Duck> ;
                             rdfs:range xsd:dateTime .
    
  • state that a duck is an animal
     <http://a.com/Duck> rdfs:subClassOf <http://a.com/Animal> .
    
Now if you ever see the relation
:d1  <http://a.com/quacking> "2007-05-25T16:43:02"^^xsd:dateTime .

then you know that :d1 is a duck (or that the relation is false, but that is another matter), and this will be true whatever the context you find the relation in. You know this because the URL http://a.com/quacking always refers to the same relation, and that relation was defined as linking ducks to times.
Furthermore, notice how you may conclude many more things from the above statement. Perhaps you have an ontology of animals written in OWL that states that ducks are among those animals that always have two parents. Given that, you would be able to conclude that :d1 has two parents, even if you don't know which they are. Animals are physical beings, you may discover by clicking on the http://a.com/Animal URL, and in particular they are among those physical things that always have a location. It would therefore be quite correct to query for the location of :d1...
You can get to know a lot of things with just one simple statement. In fact with the semantic web, what that single statement tells you gets richer and richer the more you know. The wider the context of your knowledge, the more you know when someone tells you something, since you can use inferencing to deduce all the things you have not been told. The more things you know, the easier it is to make inferences (see Metcalfe's law).
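
The inference described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are held in plain Python sets rather than in a real RDF store, the function `infer_types` is a hypothetical helper, and it applies just two simplified rules (rdfs:domain and rdfs:subClassOf) to the statements from the example.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_SUBCLASS = "rdfs:subClassOf"

# Schema statements copied from the example in the text.
schema = {
    ("http://a.com/quacking", RDFS_DOMAIN, "http://a.com/Duck"),
    ("http://a.com/swimming", RDFS_DOMAIN, "http://a.com/Animal"),
    ("http://a.com/Duck", RDFS_SUBCLASS, "http://a.com/Bird"),
    ("http://a.com/Duck", RDFS_SUBCLASS, "http://a.com/Animal"),
}

# The single statement we were given about :d1.
facts = {(":d1", "http://a.com/quacking", "2007-05-25T16:43:02")}


def infer_types(subject, facts, schema):
    """Return every class the subject can be inferred to belong to."""
    triples = set(facts) | set(schema)
    changed = True
    while changed:  # repeat until no new triple is derived
        changed = False
        new = set()
        for s, p, o in triples:
            # Domain rule: using property p puts the subject in p's domain.
            for prop, rel, cls in triples:
                if rel == RDFS_DOMAIN and prop == p:
                    new.add((s, RDF_TYPE, cls))
            # Subclass rule: a member of a class is a member of its superclasses.
            if p == RDF_TYPE:
                for sub, rel, sup in triples:
                    if rel == RDFS_SUBCLASS and sub == o:
                        new.add((s, RDF_TYPE, sup))
        if not new <= triples:
            triples |= new
            changed = True
    return {o for s, p, o in triples if s == subject and p == RDF_TYPE}


types_of_d1 = infer_types(":d1", facts, schema)
```

From the one quacking statement, `types_of_d1` contains the Duck, Bird and Animal classes. Adding more schema statements would let the same single fact yield more conclusions, which is the point made above.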

In conclusion, duck typing is done right on the semantic web. You don't have to know everything about something to work with what you have, and the more you know the more you can do with the information given to you. You can have duck typing and scale.

Comments:

I must be missing something subtle here. I don't see how the URI http://a.com/Duck is any more or less unique/ambiguous than the single 4-letter word "Duck". Is this fowl or "Duck and cover"? Where's the win? Help!

Posted by Kevin on May 26, 2007 at 04:53 AM CEST #

Just have to say, Perl has all this as well, as does Lisp and any number of other dynamic languages. It was given a name in Python and then Ruby, so the ecstatic fanboys think it's something that only their faves have.

Posted by Perl Defender on May 26, 2007 at 08:46 AM CEST #

Every time I read something like this, I have to silently thank the universe that industry hasn't tried to foist this 'semantic web' business off on me yet. It looks awful! For "duck typing done right", look at O'Caml? (Both in its object typing, and in how it uses polymorphic variants....)

Posted by guest on May 26, 2007 at 10:30 AM CEST #

Hi Henry,

"So what is the solution? Well it requires one very simple step: one has to use identifiers that are context free. If you can use identifiers for swimming that are universal, then they will alway mean the same thing, and so the problem of ambiguity will never surface."

I don't agree with this - just because the identifier is a URI doesn't mean that people will use it to mean the 'same thing'. Meaning isn't discrete - it's a continuum that varies with context of communication, and I don't think having identifiers with low risk of collision changes this.

Posted by Phil Dawes on May 26, 2007 at 10:30 AM CEST #

Nooooo! Not a universal class hierarchy, please.
You state the need as being when you: "Extend the context somewhat by loading some foreign code". All foreign code would need to be understood and appropriate adapter methods used if signatures didn't match.
If I have a class method that takes an object and calls that object's talk() method expecting a string, and have used this extensively in my application, then if I were to re-use a body of code where that information is elsewhere in the objects, I would have to insert a new talk() method in the objects that returned appropriate data. The original, duck-typing method still would not have to explicitly check for the type of object passed to it.
The almost canonical use of duck typing in Python is in the acceptance of file-like objects in the standard library. This allows actual file objects to be substituted by instances of StringIO, which allows a string to look like a file by mimicking many of the file class's methods.

- Paddy.

Posted by paddy3118 on May 26, 2007 at 12:27 PM CEST #

In principle this works, but in Java terms, why not just have the method as a top-level entity in a package, then when you import that method, you can use it on any object? If it doesn't exist on that object (including if another package's method with the same name exists) then fail.

Posted by Ricky Clarkson on May 26, 2007 at 01:42 PM CEST #

This post kind of defeats the advantage of duck typing for me. One of the best features that languages supporting this sort of operation have is that I don't have to care whether what I'm actually being passed is in fact a duck. An example of this is Ruby's Comparable mixin. It takes the stance of "I don't care what sort of object you are... As long as you declare the <=> method and it behaves the way I expect it to I'm happy." The best metaphor that comes to mind is "Close enough for government work." So rather than having to support some universal class hierarchy with what looks disturbingly like multiple inheritance (have we learned NOTHING from C++?), duck typists simply limit their scope. I would also argue they don't do this intentionally but implicitly. Nobody I've encountered says "I'm calling the swim method... What if someone passes me an object that implements this method but does something different than I'd expect?" Instead they simply code the method call under the assumption that they'll never encounter these other objects. This only fails to scale if people start calling into code without understanding what that code is supposed to do, thus violating encapsulation (and producing code that might scale in implementation, but not in maintenance). In short I think what you're proposing would require just as much careful planning and plotting as duck typing, but it wouldn't be nearly as fun.

Posted by Jon Olson on May 26, 2007 at 01:58 PM CEST #

So, what you're suggesting is to remove the primary advantage of duck typing: its ease of use! The point of these languages is their comparative painlessness in developing software where you don't *need* something like this. Not everything is an over-engineered enterprise system!

Posted by Eric Biesterfeld on May 26, 2007 at 04:11 PM CEST #

Your suggestion is just a different way of using an interface or abstract base class, but it doesn't solve anything. It still doesn't ensure that the method does what you intend it to do. Any solution requires people to adhere to a contract. The issue is whether the contract should be implicit (duck typing) or explicit (interfaces/ABCs). Languages like Java and C# may ensure type compatibility, but they will never ensure behavior compatibility. The real beauty of duck typing is that you can get better reuse of libraries without enforcing a specific type hierarchy. Sure, you have to "do the right thing" to make duck typing work, but that's true for just about anything in programming.

Posted by David Avraamides on May 26, 2007 at 05:25 PM CEST #

Hi, thanks for the numerous responses. I have answered as many as I could in the next post on this blog, entitled "Answers to 'Duck Typing done Right'". I have heard a lot of good about O'Caml by the way, I just don't know much about it. I need to find some time to learn it and see how what I say here applies to it.

Posted by Henry Story on May 26, 2007 at 05:53 PM CEST #

You've essentially reinvented Lisp packages. From my reddit comment:

See, in Common Lisp a package owns its symbols: foo:bar is a different symbol from baz:bar. Furthermore, a package can use symbols from another package: quux might use foo, making quux:bar the same symbol as foo:bar. But since foo:bar/quux:bar on the one hand and baz:bar on the other are different symbols, calling a function named with the one symbol won't ever mistakenly call a function named with the other.

This is essentially no different from naming all functions, classes and objects with URL contexts, save that foo:bar is somewhat more attractive than <http://www.foo.com/names/bar>. Packages aren't URLs, but they are unique identifiers (and since package names are themselves Lisp symbols, and symbols can be any string, one could use a URL as a package name if one wished: |http://www.foo.com/names/bar| is a valid Lisp package name, albeit an ugly one).

Posted by Bob Uhl on May 27, 2007 at 12:26 AM CEST #

I'm a little late to post it seems. I don't think that your solution is the best method for handling Duck Typing. What you need is something that lets you describe meaning. Basically, something akin to Concepts in C++.

Posted by Kiriai on May 27, 2007 at 01:12 AM CEST #

Thanks for the comment Bob. This flexibility of Lisp probably explains why Franz has been able to put together such a scalable and efficient RDF server powered by Lisp. (note: I have not tested this myself yet, but am going off reports of many others)

I don't see why URIs would be thought to be ugly though. They are well understood, and with namespaces become very readable. I usually write foaf:knows, rather than "http://xmlns.com/foaf/0.1/knows" . I used the full URIs without namespaces in the examples to emphasize the point.

URIs have the advantage of being well standardized, widely understood and language independent, and have been very successful in creating the largest information space known to man: the web we know today.

Posted by Henry Story on May 27, 2007 at 01:13 AM CEST #

Lots of good comments on this post at reddit by the way.

Posted by Henry Story on May 29, 2007 at 09:34 AM CEST #

Fanboy? Ha. Perl sucks!

Posted by guest on May 30, 2007 at 11:53 PM CEST #

mh.. very late reply.. as someone who has been using python, ruby and ST for more than a decade in total, I'd like to point out that the duck-approach in dynamic languages is not really to check if some object responds to a method and behave accordingly, but rather to use the method, stop. Tests will ensure that everything works (or not, thus it was not a duck you were cooking). Whenever you see a class/method check in a dynamic language, 90% of the time it comes from someone not familiar with the language.

Posted by riffraff on June 20, 2007 at 12:26 PM CEST #

I wrote a more detailed critique of this that I would like mentioned in the comments. The critique is at http://paddy3118.blogspot.com/2008/05/duck-typing-done-right-is-wrong.html entitled "Duck Typing Done Right Is Wrong!"

- Paddy.

Posted by Paddy3118 on May 24, 2008 at 01:36 AM CEST #

But what dynamic languages do with duck typing (very successfully and in many large systems), 'other' languages do with interfaces.

Just as a programmer who violates the contract inherent in duck typing by passing in an object that quacks like a duck but isn't a duck, a programmer can also violate the interface contract by 'implementing' the interface incorrectly.

Posted by Michael Foord on May 25, 2008 at 07:17 AM CEST #

> Just as a programmer who violates the contract inherent in duck typing by passing in an object that quacks like a duck but isn't a duck, a programmer can also violate the interface contract by 'implementing' the interface incorrectly.

With interfaces the responsibility is on the interface implementer to follow the contract of the interface. So if something breaks it should be quite clear who was wrong: the interface designer for being unclear in his specification, or the implementor with his broken class.

Now if the methods added in ducktyping had a global namespace, then the problem would be much reduced, because the global namespace would make it very clear what 'quack' was meant. Was it

- info.animals.duck.quack
- gov.us.nasa.bomb.quack ( Quick Attack )

and that would much reduce the danger of calling the wrong method on an object.
This is exactly what the semantic web gives us. It allows us to name things with Universal Resource Identifiers, thereby both making it easy to access information about the thing named, and clearly distinguish the things named.

Posted by Henry Story on May 25, 2008 at 12:35 PM CEST #

Hi Henry,
I can see how the semantic web could help us pin down any foreign IP we bring to a project. Knowing we have gov.us.nasa.bomb.quack/revision/6.3.1 might help us to easily convey any problems we have with external IP back to the vendor, and aid us in keeping track of what constitutes our system, but this has very little to do with Duck Typing, or its scalability. You have to know what that external IP _does_, whether you use Duck Typing or not.

P.S what happens in the semantic web if companies get bought-out/amalgamated? Often products are re-named, or subsumed into larger packages.

- Paddy.

Posted by Paddy3118 on May 26, 2008 at 03:56 AM CEST #
