Proof: Data Portability requires Linked Data

Data Portability requires Linked Data. To show this, let me take a concrete and topical example, the core use case of the Data Portability movement: Jane wants to move her account from social network A to social network B, and she wants to do this with minimal loss of information.

Let us suppose Jane wants to make a rich copy, and that she wants to do this without hyperdata. Ideally she would like to have exactly the same information in the new space as she had in the old one. So if Jane had a network of friends in social network A, she would like to have the same network of friends in B. But this implies moving all the information about all her friends from A to B, including their social networks too. For after all, the great thing about one's friends is how they can help us make new friends. But then would one not want to move the whole social network of each of those friends too? Where does it stop? As William Blake said so well in Auguries of Innocence:

        To see a world in a grain of sand,
        And a heaven in a wild flower,
        Hold infinity in the palm of your hand,
        And eternity in an hour.
The problem is that everything is linked in some way, so it is impossible to move one thing and all its relations from one place to another using copy by value alone without moving everything. A full and rich copy is therefore impossible.
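
To make this concrete, here is a minimal sketch. It assumes a toy friend graph; the names and the `knows` mapping are hypothetical, invented only for illustration. A copy by value that insists on keeping every relation ends up copying everyone reachable from Jane:

```python
from collections import deque

# Hypothetical toy graph: who lists whom as a friend on network A.
knows = {
    "jane": ["jack", "mary"],
    "jack": ["jane", "tom"],
    "mary": ["ann"],
    "tom": ["ann"],
    "ann": [],
}

def rich_copy(start, knows):
    """Return every person that must be copied so that no relation is lost.

    This is a breadth-first walk: copying a person by value means copying
    their friends too, then their friends' friends, and so on.
    """
    to_copy = {start}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in knows[person]:
            if friend not in to_copy:
                to_copy.add(friend)
                queue.append(friend)
    return to_copy
```

Jane lists only two friends here, yet `rich_copy("jane", knows)` returns all five people in the graph. On a real network the walk would only stop once it had swallowed everything connected to her.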

So what about pragmatically limiting ourselves to some subset of the information? We have to reduce our ambitions. So let us limit the data Jane can move to just her personal data and her closest social network. She copies some subset of the information about her friends over to network B. Nice, but who is going to keep that information up to date? When Jane's friend Jack moves house, how is Jane going to know about this in her new social network? Would Jack not have to keep his information on social network B up to date too? And if every one of Jack's 1000 friends moves to a different social network, won't he then have to keep 1000 identities up to date, one on each of those networks? Making it easy for Jane to move social networks is going to make life hell for Jack, it seems. Well, of course not: Jack is never going to keep the information about himself up to date on these other social networks, however limited it is. And so if Jane moves social network she is going to have to leave her friends behind.

The solution, of course, is not to try to copy the information about one's friends from one social network to another, but rather to move one's own information over and then link back to one's friends in their preferred social networks. By linking by reference to one's friends' identities, one reduces to a minimum the information that needs to be ported whilst maintaining all the relationships that existed previously. Thus one can move one's identity without loss.
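
As a rough sketch of linking by reference: the dictionary below stands in for the web, and the URIs, field names and helper functions are all hypothetical. Jane's profile stores only the URIs of her friends, so moving her means moving one record, and Jack keeps a single profile up to date wherever he prefers:

```python
# Hypothetical toy web: each URI maps to the profile its owner maintains.
web = {
    "https://a.example/jack#me": {"name": "Jack", "city": "London"},
    "https://a.example/jane#me": {
        "name": "Jane",
        "knows": ["https://a.example/jack#me"],  # a link, not a copy
    },
}

def move_profile(web, old_uri, new_uri):
    """Port one profile: only the owner's own record changes location."""
    web[new_uri] = web.pop(old_uri)
    return new_uri

def friends_of(web, uri):
    """Follow the links to read each friend's current profile."""
    return [web[friend_uri] for friend_uri in web[uri]["knows"]]

# Jane moves from network A to network B; Jack stays where he is.
jane = move_profile(web, "https://a.example/jane#me", "https://b.example/jane#me")

# Jack moves house and updates his one and only profile.
web["https://a.example/jack#me"]["city"] = "Paris"

print(friends_of(web, jane)[0]["city"])  # prints "Paris"
```

Nothing about Jack was copied to network B, so there is nothing there for him to keep in sync. The sketch deliberately ignores links that still point at Jane's old URI.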

The rest follows nearly immediately from these observations. Since the only way to refer to resources in a global namespace is via URIs (and the most practical way to do this currently is with URLs), URIs will play the role of pointers in our space. This is the key architectural decision of the semantic web. So by giving people URLs as names we can point to our friends wherever they are, and even move our data without loss. All we need to do when we move our FOAF file is to have the web server serve up an HTTP redirect message at the old URL, and all links to our old file will be redirected to our new home.
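
The redirect step can be sketched without any network access. The `servers` table below is a hypothetical in-memory stand-in for two web servers, and the URLs and the `fetch` helper are invented for illustration. A 301 entry plays the role of the HTTP redirect left behind at the old URL:

```python
# Hypothetical in-memory stand-in for web servers: URL -> (status, payload).
# A 301 entry carries the new location; a 200 entry carries the document.
servers = {
    "https://a.example/jane/foaf": (301, "https://b.example/jane/foaf"),
    "https://b.example/jane/foaf": (200, "<Jane's FOAF file>"),
}

def fetch(url, servers, max_redirects=5):
    """Return (final_url, document), following 301 redirects as a client would."""
    for _ in range(max_redirects + 1):
        status, payload = servers[url]
        if status == 301:
            url = payload  # the location the old server points to
            continue
        return url, payload
    raise RuntimeError("too many redirects")
```

A friend who still holds the old link calls `fetch("https://a.example/jane/foaf", servers)` and ends up with the document at Jane's new home, without having to update anything on their side.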

Notes

Comments:

The hyperdata link is invalid; it is actually missing an s in folktologie(s).

Posted by remy on February 21, 2008 at 05:28 AM CET #

Thanks for the note Remy. I just fixed that.

Posted by Henry Story on February 21, 2008 at 05:38 AM CET #

I must admit that I do not see that you are making an important point. Linked data, whatever means is used for linking (hypertext links, for example), simply means that *symbolic addresses* are used. While an *address proper* uniquely denotes a memory location, and hence depends on both the computer and the operating system, a *symbolic address* is an abstraction independent of the hardware and the base software.

Symbolic addresses can be all sorts of things, provided there is an associated resolution mechanism. If I say to my wife, "this nice little running sushi close to Harajuku", then it is a symbolic address for her.

Hypertext links in whatever formalism, for example URIs, might be good symbolic addresses. Note, however, that URIs do not have an agreed-upon resolution mechanism. If I mention http://bry.de/abX12/ to you, you understand that I am referring to a unique ... well, even what I am speaking of is unknown, say a unique "thing", whatever that word might mean. It is an assumption that software applications exchanging URIs on the web know how to resolve them. Thus, if I tell you http://bry.de/abX12/, it is assumed that both of us know whether I speak of my daughter, my beloved Moroccan restaurant in the 14th arrondissement of Paris, or your blog entry on "proof data portability requires linked data".

Now, let us turn to social network software exchanging data. If such applications decide, as is common in Germany, to use first names, last names, and dates of birth to uniquely identify people, then they have perfect symbolic addresses. Well, not so perfect, as there surely are many people named Andreas Schroeder born on the 21st of April 1984. But they will find a way out, or just not care and leave it to the users to find out which Andreas Schroeder is the right one. In France, such applications might use the national identification number, built up, more or less, of a digit for the sex, the date of birth, and the running entry number in the birth register. This is also a perfect symbolic identifier. Well, not so perfect, because France is not the whole world (which some in France have difficulty accepting).

In fact, symbolic identifiers can only be perfect if there is an entity controlling their use. Often it is not desirable at all to have such an entity, because it would mean costly centralized management with social and political side effects that are far from negligible.

Thus, symbolic identifiers in an open and decentralized world turn out not to be perfect. Is that so bad? Not at all. If I have, as a user, to find out myself which Andreas Schroeder born on the 21st of April 1984 is the one I am looking for, I can manage very well in the search engine age: I add, for example, that the one I am looking for was in the Bay Area in the 1990s. That is it.

Let's summarize:

- Symbolic identifiers are necessary in computer networks (as they already are in high-level programming)
- Perfect symbolic identifiers require a centralized administration
- Imperfect symbolic identifiers work well in the search engine age

My conclusion is that it is perfectly reasonable to have open standards for social networks that do not strive for perfect symbolic identifiers managed by a soviet-like admin.

Get it?

Posted by François Bry on April 04, 2008 at 01:36 AM CEST #

Francois Bry wrote:
> Hypertext links in whatever formalism, for example URIs, might be good symbolic
> addresses. Note, however, that URIs do not have an agreed-upon resolution mechanism.

'http://...' addresses have a very well established resolution mechanism: the HTTP protocol specifies clearly how it works.

> If I mention http://bry.de/abX12/ to you, you understand that I am referring to a unique ... well,
> even what I am speaking of is unknown, say a unique "thing", whatever that word might
> mean.

It refers to a resource, yes.

> It is an assumption that software applications exchanging URIs on the web know
> how to resolve them. Thus, if I tell you http://bry.de/abX12/, it is assumed that both of
> us know whether I speak of my daughter, my beloved Moroccan restaurant in the 14th
> arrondissement of Paris, or your blog entry on "proof data portability requires linked data".

URIs refer to resources by definition: URI stands for Uniform Resource Identifier. You are right that as far as mathematical logic goes it is not important how they refer, and this is reflected in the core semantics specs published by the W3C [0]. In practice though, the usefulness of these URIs will depend very much on how easy it is to understand their meaning.

So yes, <http://bry.de/abX12/> could refer to anything. One way of determining its meaning would be to ask the owner of the domain bry.de what that URL means. Clearly that would not be a very efficient way to proceed, and the URL would not get widely reused. This is why, in a Linked Data network, the resource <http://bry.de/abX12/> is made to return a human-readable HTML representation and/or a machine-readable RDF representation. URLs that point to resources set up this way are easy to understand: just click on the URL to GET its meaning [1]. As a result their use will spread: language is a virus [2].

This is the key to understanding Linked Data (hyperdata). When you click on my FOAF name <http://bblfish.net/people/henry/card#me> you get back information about me. That is why the data is LINKED: you can jump from one document to another this way.

Just GET it!

Henry

[0] eg: http://www.w3.org/TR/rdf-mt/
[1] http://blogs.sun.com/bblfish/entry/get_my_meaning
[2] http://blogs.sun.com/bblfish/entry/language_is_a_virus

Posted by Henry Story on April 07, 2008 at 02:35 AM CEST #
