Friday Apr 18, 2008

The OpenId Sequence Diagram

OpenId very neatly solves the global identity problem within the constraints of working with legacy browsers. It is a complex protocol though, as the following sequence diagram illustrates, and this may be a problem for automated agents that need to jump around the web from hyperlink to hyperlink, as hyperdata agents tend to do.

The diagram illustrates the following scenario. Romeo wants to find the current location of Juliette. So his semantic web user agent GETs her current foaf file. But Juliette wants to protect information about her current whereabouts and reveal it only to people she trusts, so she configures her server to require the user agent to authenticate itself in order to get more information. If the user agent can prove that it is owned by one of her trusted friends, and Romeo in particular, she will deliver the information to it (and so to him).

The steps numbered in the sequence diagram are as follows:

  1. A User Agent fetches a web page that requires authentication. OpenId was designed with legacy web browsers in mind, for which the server would return a page containing an OpenId login box. In the case of a hyperdata agent, as in our use case, the agent would GET a public foaf file, which might contain a link to an OpenId authentication endpoint, perhaps with some rdf such as the following N3:
    <> openid:login </openidAuth.cgi> .
    
    Perhaps some more information would indicate which resources were protected.
  2. In current practice a human user notices the login box and types his identifying URL into it, such as http://openid.sun.com/bblfish. This is the brilliant invention of OpenId: getting hundreds of millions of people to find it natural to identify themselves via a URL, instead of an email address. The user then clicks the "Login" button.
    In our semantic use case the hyperdata agent would notice the above openid link and deduce that it needs to log in to the site to get more information. Romeo's Id ( http://romeo.net/ perhaps ) would then be POSTed to the /openidAuth.cgi authentication endpoint.
  3. The OpenId authentication endpoint then fetches the web page by GETing Romeo's URL http://romeo.net/. The returned representation contains a link in the header of the page pointing to Romeo's OpenId server URL. If the representation returned is html, the header would contain the following:
     <link rel="openid.server" href="https://openid.sun.com/openid/service" />
    
  4. The representation returned in step 3 could contain a lot of other information too. A link to a foaf file may not be a bad idea, as I described in foaf and openid. The returned representation in step 3 could even be RDFa extended html, in which case this step may not even be necessary. For a hyperdata server the information may be useful: it may reveal connections between Romeo and other people that help it decide whether it wishes to continue the login process.
  5. Juliette's OpenId authentication endpoint then sends a redirect to Romeo's user agent, directing it towards his OpenId Identity Provider. The redirect also contains the URL of the OpenId authentication cgi, so that in step 8 below the Identity Provider can redirect a message back.
  6. Romeo's user agent dutifully redirects him to the identity provider, which then returns a form with username and password entry boxes.
  7. Romeo's user agent could learn to fill in the username and password automatically, and even skip step 6 entirely. In any case, given the username and password, the Identity Provider sends back some cryptographic tokens to the User Agent and has it redirect to the OpenId authentication cgi at http://juliette.net/openidAuth.cgi.
  8. Romeo's hyperdata user agent then dutifully redirects back to the OpenId authentication endpoint.
  9. The authentication endpoint sends a request to the OpenId Identity Provider to verify that the cryptographic token is authentic. If it is, an affirmative answer is sent back. (Steps 3 and 9 are sketched on the command line just after this list.)
  10. The OpenId authentication endpoint finally sends a response back with a session cookie, giving access to various resources on Juliette's web site. Perhaps it even knows to redirect the user agent to a protected resource, though that would require some information concerning this to have been sent in step 2.
  11. Finally Romeo's user agent can GET Juliette's protected information if Juliette's hyperdata web server permits it. In this case it will, because Juliette loves Romeo.
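
Two of the machine-to-machine steps above are easy to make concrete on the command line. The following sketch uses the illustrative URLs of the scenario; the handle and signature values are stand-ins for real tokens, and the check_authentication fields are abbreviated.

# step 3: Juliette's endpoint fetches Romeo's URL and looks for the
# openid.server link in the html header
curl -s http://romeo.net/ | grep 'rel="openid.server"'

# step 9: the endpoint asks the Identity Provider whether the token
# it received in step 8 is authentic
curl -s https://openid.sun.com/openid/service \
     --data 'openid.mode=check_authentication' \
     --data 'openid.assoc_handle=HANDLE' \
     --data 'openid.sig=SIGNATURE'
# an affirmative reply contains the line "is_valid:true"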

All of the steps above could be automated, so from the user's point of view they need not be complicated. The user agent could even learn to fill in the username and password required by the Identity Provider. But there are still a very large number of connections between the User Agent and the different services. If these connections are to be secure they need to be protected by SSL (as hinted at by the double line arrows), and SSL connections are not cheap, so the above may be unacceptably slow. On the other hand it would work with a protocol that is growing fast in acceptance.

It is certainly worth comparing this sequence diagram with the very lightweight one presented in "FOAF & SSL: creating a global decentralised authentication protocol".

Thanks again to Benjamin Nowack for bringing the discussion on RDFAuth around to using the OpenId protocol directly, as described above. See his post on the semantic web mailing list. Benjamin also pointed to the HTTP OpenID Authentication proposal, which shows how some of the above can be simplified if certain assumptions about the capabilities of the client are made. It would be worth making a sequence diagram of that proposal too.

Friday Mar 28, 2008

RDFAuth: sketch of a buzzword compliant authentication protocol

Here is a proposal for an authentication scheme that is even simpler than OpenId ( see sequence diagram ), more secure, more RESTful, and with fewer points of failure and fewer points of control. Something like it is needed to make Open Distributed Social Networks with privacy controls possible.

Update

The following sketch led to the even simpler protocol described in Foaf and SSL: creating a global decentralized authentication protocol. It is very close to what is proposed here, but builds directly on SSL, so as to reduce what is new to nearly nothing.

Background

OK, so now that I have your attention, I would like to first mention that I am a great fan of OpenId. I have blogged about it numerous times and enthusiastically in this space. I came across the idea I will develop below not because I thought OpenId needed improving, but because I have chosen to follow some very strict architectural guidelines: it had to satisfy RESTful, resource oriented hyperdata constraints. With the Beatnik Address Book I have proven, to myself at least, that the creation of an Open Distributed Social Network (a hot topic at the moment, see the Economist's recent article on online social networks) is feasible and easy to do. What was missing was a way for people to keep some privacy, clearly a big selling point for the large Social Network Providers such as Facebook. So I went in search of a solution for creating an Open Distributed Social Network with privacy controls. And initially I had thought of using OpenId.

OpenId Limitations

But OpenId has a few problems:

  • First it is really designed to work within the limitations of current web browsers. It is partly because of this that there is a lot of hopping around from the service to the Identity Provider with HTTP redirects. Hyperdata user agents such as the Tabulator, Knowee or Beatnik need not be so limited.
  • Parts of OpenId 2, and especially the Attribute Exchange spec, really don't feel very RESTful. There is a method for PUTting new property values into a database and a way to remove them, yet neither uses the HTTP PUT or DELETE methods.
  • The OpenId Attribute Exchange is nice but not very flexible. It can keep some basic information about a person, but it does not make use of hyperdata, and the way it is set up, it could only do so with great difficulty. A RESTfully published foaf file can give the same information, is a lot more flexible and extensible, makes use of Linked Data, and as it happens also solves the Social Network Data Silo problem. Just that!
  • OpenId requires an Identity Server. There are a couple of problems with this:
    • This server provides a dynamic service but not a RESTful one, i.e. the representations sent back and forth to it cannot be cached.
    • The service is a control point. Anyone owning such a service will know which sites you authenticate onto. True, you can set up your own service, but that is clearly not what is happening. The big players are offering their customers OpenIds tied to particular authentication servers, and that is what most people will accept.
As I found out by developing what I am here calling RDFAuth, for want of a better name, none of these restrictions are necessary.

RDFAuth, a sketch

So following my strict architectural guidelines, I came across what I am just calling RDFAuth, but like everything else here this is a sketch and open to change. I am not a security specialist nor an HTTP specialist. I am like someone who comes to an architect to have a house built on some land he owns, with a sketch of what he would like the house to look like, some ideas of the functionality he needs, and the price he is willing to pay. What I want here is something very simple, that can be made to work with a few perl scripts.

Let me first present the actors and the resources they wish to act upon.

  • Romeo has a Semantic Web Address Book, his User Agent (UA). He is looking for the whereabouts of Juliette.
  • Juliette has a URL identifier ( as I do ) which returns a public foaf representation and links to a protected resource.
  • The protected resource contains information she only wants some people to know, in this instance Romeo. It contains information as to her current whereabouts.
  • Romeo also has a public foaf file. He may have a protected one too, but it does not make an entrance in this scene of the play. His public foaf file links to a public PGP key. I described how that is done in Cryptographic Web of Trust.
  • Romeo's Public key is RESTfully stored on a server somewhere, accessible by URL.

So Romeo wants to find out where Juliette is, but Juliette only wants to reveal this to Romeo. Juliette has told her server to only allow Romeo, identified by his URL, to view the site. She could also have had a more open policy, allowing any of her or Romeo's friends to have access, as specified in their foaf files. The server could then crawl their respective foaf files at regular intervals to see whether anyone needed to be added to the list of people having access. This is what the DIG group did in conjunction with OpenId. Juliette could also have a policy that decides just in time, as a person presents herself, whether or not to grant access, using the information in that person's foaf file together with some trust metric. How Juliette specifies who gets access to the protected resource is not part of this protocol; it is completely up to Juliette and the policies she chooses her agent to follow.

So here is the sketch of the sequence of requests and responses.

  1. First Romeo's User Agent knows that Juliette's foaf name is http://juliette.org/#juliette, so it sends an HTTP GET request to Juliette's foaf file, located of course at http://juliette.org/.
    The server responds with a public foaf file containing a link to the protected resource, perhaps with the N3
      <> rdfs:seeAlso <protected/juliette> .
    
    Perhaps this could also contain some relations describing that resource as protected, which groups may access it, etc... but that is not necessary.
  2. Romeo's User Agent then decides it wants to check out protected/juliette. It sends a GET request to that resource, but this time receives a variation on the Basic Authentication scheme, perhaps something like:
    HTTP/1.0 401 UNAUTHORIZED
    Server: Knowee/0.4
    Date: Tue, 01 Apr 2008 10:18:15 GMT
    WWW-Authenticate: RdfAuth realm="http://juliette.org/protected/*" nonce="ILoveYouToo"
    
    The idea is that Juliette's server returns a nonce (in order to avoid replay attacks), and a realm over which this protection will be valid. But I am really making this up here. Better ideas are welcome.
  3. Romeo's web agent then encrypts some string (the realm?) and the nonce with Romeo's private key. Only an agent trusted by Romeo can do this.
  4. The User Agent then sends a new GET request with the encrypted string, and his identifier, perhaps something like this
    GET /protected/juliette HTTP/1.0
    Host: juliette.org
    Authorization: RdfAuth id="http://romeo.name/#romeo" key="THE_REALM_AND_NONCE_ENCRYPTED"
    Accept: application/rdf+xml, text/rdf+n3
    
    Since we need an identifier, why not just use Romeo's foaf name? It happens to also point to his foaf file. All the better. (A command line sketch of steps 3 to 8 follows this list.)
  5. Because Juliette's web server can then use Romeo's foaf name to GET his public foaf file, which contains a link to his public key, as explained in "Cryptographic Web of Trust".
  6. Juliette's web server can then query the returned representation, perhaps meshed with some other information in its database, with something equivalent to the following SPARQL query
    PREFIX wot: <http://xmlns.com/wot/0.1/>
    SELECT ?pgp
    WHERE {
         [] wot:identity <http://romeo.name/#romeo>;
            wot:pubkeyAddress ?pgp .
    } 
    
    The nice thing about working at the semantic layer is that it decouples the spec a lot from the representation returned. Of course as usage grows, those representations that are understood by the most servers will create a de facto convention. Initially I suggest using RDF/XML of course. But it could just as well be N3, RDFa, perhaps even some microformat dialect, or even some GRDDLable XML, as the POWDER working group is proposing to do.
  7. Having found the URL of the PGP key, Juliette's server can GET it - and, as with much else in this protocol, cache it for future use.
  8. Having the PGP key, Juliette's server can now decrypt the encrypted string sent to her by Romeo's User Agent. If the decrypted string matches the expected string, Juliette will know that the User Agent has access to Romeo's private key. So she decides this is enough to trust it.
  9. As a result Juliette's server returns the protected representation.
Now Romeo's User Agent knows where Juliette is, displays it, and Romeo rushes off to see her.
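
To make the exchange concrete, here is a rough command line sketch of both sides, using gpg and curl. The sketch above speaks of "encrypting" the realm and nonce with Romeo's private key; in PGP terms that is a detached signature, which is how I render it here. The key ids, file names and the base64 packing of the token are all assumptions of this sketch, not part of any spec.

# steps 3 and 4, Romeo's side: sign realm + nonce with his private
# key, pack the armored signature into a single header line, and
# send it along with his foaf name
TOKEN=$(printf '%s' 'http://juliette.org/protected/* ILoveYouToo' \
        | gpg --armor --detach-sign --local-user romeo \
        | base64 | tr -d '\n')
curl http://juliette.org/protected/juliette \
     -H "Authorization: RdfAuth id=\"http://romeo.name/#romeo\" key=\"$TOKEN\"" \
     -H 'Accept: application/rdf+xml, text/rdf+n3'

# steps 5 to 8, Juliette's side: fetch the public key found via the
# SPARQL query of step 6, then check the signature against the realm
# and nonce she handed out
curl -s http://romeo.name/romeo.pubkey.asc | gpg --import
printf '%s' 'http://juliette.org/protected/* ILoveYouToo' > expected
echo "$TOKEN" | base64 -d > token.asc
gpg --verify token.asc expected

If the verification succeeds, Juliette's server returns the protected representation of step 9.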

Advantages

It should be clear from the sketch what the numerous advantages of this system are over OpenId. (I can't speak of other authentication services as I am not a security expert).

  • The User Agent has no redirects to follow. In the above example it needs to request one resource http://juliette.org/ twice (2 and 4), but that may only be necessary the first time it accesses this resource. The second time the UA can immediately jump to step 3. [but see the problem with replay attacks raised in the comments by Ed Davies, and my reply] Furthermore it may be possible - this is a question for HTTP specialists - to merge steps 1 and 2. Would it be possible for request 1 to return a 20x code with the public representation, plus a WWW-Authenticate header, suggesting that the UA can get a more detailed representation of the same resource if authenticated (sketched just after this list)? In any case the redirect rigmarole of OpenId, which is really there to overcome the limitations of current web browsers, is not needed.
  • There is no need for an Attribute Exchange type service. Foaf deals with that in a clear and extensible RESTful manner. This simplifies the spec dramatically.
  • There is no need for an identity server, so there is one less point of failure and one less point of control in the system. The public key plays that role in a clean and simple manner.
  • The whole protocol is RESTful. This means that all representations can be cached, meaning that steps 5 and 7 need only occur once per individual.
  • As RDF is built for extensibility, and we are being architecturally very clean, the system should be able to grow cleanly.
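
The merged step 1+2 suggested in the first point might look like this on the wire. This is purely hypothetical, reusing the header values from the sketch above; no server behaves this way today, and the response shown in comments is only what one would hope to receive.

curl -i http://juliette.org/
# HTTP/1.0 200 OK
# Content-Type: text/rdf+n3
# WWW-Authenticate: RdfAuth realm="http://juliette.org/protected/*" nonce="ILoveYouToo"
#
# <> rdfs:seeAlso <protected/juliette> .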

Contributions

I have been quietly exploring these ideas on the foaf and semantic web mailing lists, where I received a lot of excellent suggestions and feedback.

Finally

So I suppose I am now looking for feedback from a wider community. PGP experts, security experts, REST and HTTP experts, semantic web and linked data experts, only you can help this get somewhere. I will never have the time to learn these fields in enough detail by myself. In any case all this is absolutely obviously simple, and so completely unpatentable :-)

Thanks for taking the time to read this

Friday Feb 15, 2008

Proof: Data Portability requires Linked Data

Data Portability requires Linked Data. To show this let me take a concrete and topical example that is the core use case of the Data Portability movement: Jane wants to move her account from social network A to social network B. And she wants to do this in a way that entails the minimal loss of information.

Let us suppose Jane wants to make a rich copy, and that she wants to do this without hyperdata. Ideally she would like to have exactly the same information in the new space as she had in the old. So if Jane had a network of friends in social network A she would like to have the same network of friends in B. But this implies moving all the information about all her friends from A to B, including their social networks too. For after all, the great thing about one's friends is how they can help us make new friends. But then would one not want to move all the social networks of one's friends too? Where does it stop? As William Blake said so well in Auguries of Innocence:

        To see a world in a grain of sand,
        And a heaven in a wild flower,
        Hold infinity in the palm of your hand,
        And eternity in an hour.

The problem is that everything is linked in some way, and so it is impossible to move one thing and all its relations from one place to another using just copy by value, without moving everything. A full and rich copy is therefore impossible.

So what about pragmatically limiting ourselves to some subset of the information? We have to reduce our ambitions. So let us limit the data Jane can move to just her personal data and her closest social network. She copies some subset of the information about her friends over to network B. Nice, but who is going to keep that information up to date? When Jane's friend Jack moves house, how is Jane going to find out in her new social network? Would Jack not have to keep his information on social network B up to date too? And if every one of Jack's 1000 friends moves to a different social network, won't he now have to keep 1000 identities up to date on each of those networks? Making it easy for Jane to move social networks is going to make life hell for Jack, it seems. Well of course not: Jack is never going to keep the information about himself up to date on these other social networks, however limited it may be. And so if Jane moves social networks she is going to have to leave her friends behind.

The solution of course is not to try to copy the information about one's friends from one social network to another, but rather to move one's own information over and then link back to one's friends in their preferred social networks. By linking by reference to one's friends' identities, one reduces to a minimum the information that needs to be ported whilst maintaining all the relationships that existed previously. Thus one can move one's identity without loss.

The rest follows nearly immediately from these observations. Since the only way to refer to resources in a global namespace is via URIs (and the most practical way currently is to do this with URLs), URIs will play the role of pointers in our space. This is the key architectural decision of the semantic web. So by giving people URLs as names we can point to our friends wherever they are, and even move our data without loss. All we need to do when we move our foaf file is to have the web server serve up an HTTP redirect message at the old URL, and all links to our old file will be redirected to our new home.
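
A sketch of that last step, with made-up hostnames: once the foaf file has moved, the old URL only needs to answer every request with a permanent redirect (the response shown in comments is the expected exchange, not captured output).

curl -I http://old.example.org/people/jane/foaf
# HTTP/1.1 301 Moved Permanently
# Location: http://new.example.org/people/jane/foaf

Any agent following its nose then ends up at the new location, and the links in other people's foaf files keep working unchanged.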


Wednesday Aug 22, 2007

SPARQLing AltaVista: the meaning of forms

Did you know that AltaVista has a SPARQL endpoint? And that all of its results are served up that way? No? Well take the Red Pill and I will show you how deep the rabbit hole goes...

Take the query for the three words "matrix rabbit hole". Go to AltaVista and enter those words into the search box. Press "Find" and you will end up at the result page http://www.altavista.com/web/results?itag=ody&q=rabbit+hole+matrix&kgs=1&kls=0. This page lists the following information for each result:

  • The title of the page
  • The link to the page
  • An extract of the page containing the relevant words
  • A link to more results from that particular web site
In other words it is just the result of the following SPARQL query [1]:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX eg: <http://altavista.eg/ont#>
PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>
CONSTRUCT {
     ?page dc:title ?title;
           eg:summary ?summary;
           eg:moreResults ?more .
} WHERE {
     ?page dc:title ?title;
           eg:content ?content;
           eg:summary ?summary;
           eg:moreResults ?more .
     ?content pf:textMatch "+matrix +rabbit +hole" .
}
LIMIT 10
OFFSET 0

The AltaVista engineers - and I know them well, having worked there for 5 years - of course understand User Interface issues very well, and so they don't return the default XML result format. They pass all their results first through a clever and optimised XSLT transform that gives you the page you now see in your browser.

In order to do this, the AltaVista engineers developed - and I can now speak openly about this - a clever mapping between html forms and SPARQL queries. Sadly it is such a long time since I worked there that my memory is a little dim on the exact manner in which they did this. So please forgive my mistakes. But I am sure we can work this out together.
Html forms consist essentially of a number of key-value pairs, where the end user is asked to provide the values, a form processing agent URL, and an action button to press once the user has answered the question asked of him. Given that, the trick is just to create a simple SPARQL template language, so that one can relate a form processing agent to one or more SPARQL query templates. What does a SPARQL query template look like? Well it is really very similar to a SPARQL query, except that it has ??vars which need to be replaced by values from the form. So the SPARQL template associated with the front page form could be expressed like this:

@prefix fm: <http://altavista.eg/ont#> .

<http://www.altavista.com/web/results>  a fm:FormHandler;
    fm:template """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX eg: <http://altavista.eg/ont#>
PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>
CONSTRUCT {
     ?page dc:title ?title;
           eg:summary ?summary;
           eg:moreResults ?more .
} WHERE {
     ?page dc:title ?title;
           eg:content ?content;
           eg:summary ?summary;
           eg:moreResults ?more .
     ?content pf:textMatch ??(\" ??q \")
}
LIMIT ???lmt
OFFSET ??( ??stq * ???lmt )
""" .

So when AltaVista receives a form submission, the AV front end decodes it and replaces all the ??key patterns above with the value of key as passed by the form. It then replaces all the ???keyparam patterns with default values set by the user and available from his session state. Finally the ??( ... ) operations are executed. These can be multiplications or string concatenations as shown above, or other operations such as dealing with defaults. Having done that, you end up with the previous SPARQL query, which is ready to be sent to the back end engines. The back end engines have a more powerful language to play with, allowing AltaVista to propose new paying services to large customers such as Yahoo [2].
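
For those who like to see such things run, here is a toy sketch of the expansion in bash, under my assumptions about the template language above. The file name and the rewriting of q into +-prefixed terms are made up for the example.

T=$(cat template.rq)             # the SPARQL template shown above
q='+matrix +rabbit +hole'        # built from the form's q parameter
lmt=10; stq=0                    # the ???-defaults from session state
T=${T//'??( ??stq * ???lmt )'/$(( stq * lmt ))}   # evaluate ??( ... )
T=${T//'??(\" ??q \")'/"\"$q\""}                  # quote and insert q
T=${T//'???lmt'/$lmt}                             # fill in the default
printf '%s\n' "$T"               # ready for the back end engines

Note the order: the ??( ... ) operations are expanded before the plain ???lmt default, since the offset expression contains it.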

This mapping has a few advantages. It reduces the size of the forms POSTed to AltaVista: a SPARQL query would take a lot more space, which would end up taking a fraction of a second off the processing time and so make it ever so much more painfully obvious that the redirect they are doing to av.rds.yahoo.com is destroying their performance. It would also give end users more freedom than is needed: it is a good policy decision to reduce the uses of a tool when one aims to shrink one's market.

This mapping could be extremely useful in a number of other ways.
For one, it would help make clear to machines what the meaning of a form is. Forms are questions asked of an agent. The meaning of the question is usually obvious to a human end user who speaks the language of the web page being shown. But for a machine to do the same, it helps to map the form to a semantically defined query, which can be reasoned with. In this case the answers given by the human are used to construct a question that is sent to the server. In other cases the form asks the user for his desires, and uses these to construct an action. There is some interesting work remaining in mapping the different uses of forms to rdf, but I think this does bring a key element into play.
Having a machine understandable version of a form means that a robot can put on his rdf glasses and see things right. All that would be needed would be to link the form handler to an XSLT that could transform the resulting html to the SPARQL result format, and each of the thousands of existing web forms would suddenly become transparent to the world of machine agents. [3]
It could also help reduce the work of defining new protocols. The good part of OpenId Attribute Exchange, for example, is just a complex specification for a limited SPARQL template, if you put your rdf glasses on. [4]

With time you get to see the real structure of the world. As that happens the questions you start asking become a lot more interesting.

Notes

  1. The pf:textMatch relation is defined by LARQ, the Jena Lucene-ARQ free text indexing for SPARQL extension, and it makes a lot of sense.
    The namespace is sadly not dereferenceable, i.e. clicking on http://jena.hpl.hp.com/ARQ/property does not give you the definition of the relation. It would be nice if it did.
    Note also how in SPARQL you can have literals as subjects. This is explained in section 12.1.4 of the current SPARQL query language specification.
    Thanks to Andy Seaborne for the link.
  2. I do know of course that AltaVista is part of Yahoo! now. And by the way all of the above is meant to be taken with a pinch of salt red pills.
  3. This is called screen scraping, and is of course more work for the consumer. It is nicer when the information provider has a strong interest in providing a stable format.
  4. A large part of the spec is a duplication of the work that should be done by HTTP verbs such as GET, PUT, POST and DELETE. Using the Atom Protocol to publish a foaf file would deal with a large part of the spec in a few lines. Well that's what it looks like to me after studying the spec for a few hours only, and so I may have missed something.

Friday Aug 17, 2007

Open Data: Information wants to be linked

With over 2 billion relations from the great web community data projects such as Wikipedia, Project Gutenberg, MusicBrainz, and many more, the Linking Open Data initiative is tying together a vast pool of quality machine readable information on which one can run any of the over 500 Semantic Web tools. As the value of linked information increases much faster than that of the networks described by Metcalfe's Law, the value of this must be tremendous.

By creating data browsing interfaces such as Tabulator, one has a very simple RESTful, Resource Oriented Architecture API to work with. With various SPARQL endpoints available or to be built, one can treat that information like a hugely powerful database.

Forget Web APIs: long live linked data!


Thursday Aug 09, 2007

cryptographic web of trust

As our identity moves more and more onto the Web, our ability to have people trust that what we write has not been altered is becoming increasingly important. As our home page becomes our OpenId, linking to our CV, blogs and foaf information, it will become more important for services to be able to trust that the information they see really is what I wrote. Otherwise the following scenario becomes all too easy to imagine. Someone breaks into my web server and changes the OpenId link on my home page to point to a server they control. They then go and post comments around the web using my openid as identity. Their server of course always authenticates them. People who receive the posts then click on the OpenId URL, which happens to be my home page, read information about me, and seeing that information trust that the comment really came from me.

What is needed is some way to increase the methods people can use to trust information I state. Here I describe how one can use cryptography to increase that trust level. Starting from my creation of a PGP key, I show how I can describe my public key in my foaf file, use PGP to sign that file, and then link to the signature so that people can detect a tampered file. From this basis I show how one can build a very solid cryptographically enhanced web of trust.

Creating your PGP public key

After reading the first part of a book on PGP at SafariBooks Online, and feeling comfortable that I understood the basics, I decided it was high time for me to create myself a public PGP key. Reading the GPG Manual and a few other HOWTOs on the web, and using the gnu GPG tools, I managed to make myself one quite easily.
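
For the record, the whole thing comes down to three commands with GnuPG (the key id "bblfish" here stands for whatever user id you chose at the prompts):

bblfish$ gpg --gen-key                                    # answer the prompts
bblfish$ gpg --fingerprint bblfish                        # note the fingerprint and hex id
bblfish$ gpg --armor --export bblfish > henry.pubkey.asc  # the file to publish on the web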

Linking to the Public Key from your foaf file

Having this, it was just a matter of placing it on my web site and using the Web Of Trust ontology developed by Dan Brickley to point to it and describe it, with the following triples:

@prefix wot: <http://xmlns.com/wot/0.1/> .
@prefix : <http://bblfish.net/people/henry/card#> .

:me  is wot:identity of [ a wot:PubKey;
                        wot:pubkeyAddress <http://bblfish.net/people/henry/henry.pubkey.asc>;
                        wot:fingerprint "0DF560B5DADF6D348CC99EA0FD76F60D4CAE10D7";
                        wot:hex_id "4CAE10D7";
                        wot:length 1024 ] .

The is wot:identity of construct is a nice N3 shorthand for referring to the inverse of the wot:identity relation without having to name one. This states that my public key can be found at http://bblfish.net/people/henry/henry.pubkey.asc, and describes its fingerprint, key length and hex id. I am not sure why the wot:PubKey resource has to be a blank node, and can't be the URL of the public key itself, which would make for the following N3:

:me  is wot:identity of [ = <http://bblfish.net/people/henry/henry.pubkey.asc>
                          a wot:PubKey;
                          wot:fingerprint "0DF560B5DADF6D348CC99EA0FD76F60D4CAE10D7" ] .

Perhaps it is simply because it is quite likely that one would want to put copies of one's public key in different places? owl:sameAs could have done the trick there too though...

Signing your foaf file

Anyway, once that is done, I want to be able to sign my foaf file. Of course it would be pretty tricky to sign the foaf file and put the signature into the foaf file simultaneously, as adding the signature would change the content of the file and invalidate the signature. So the easiest solution is to simply have the foaf file point to the signature with something like this [1]:

<> wot:assurance <card.asc> .

The problem with this solution is that my foaf file at http://bblfish.net/people/henry/card currently returns two different representations: an rdf/xml one and an N3 one, depending on how it is called. (More on this in I have a Web 2.0 name!) Now a signature is valid only for a sequence of bits, and the rdf/xml sequence of bits is different from the N3 sequence, so they can't both have the same signature. There is a complicated solution developed by Jeremy Carroll in his paper Signing RDF Graphs, which proposes to sign a canonicalised RDF graph. The problem is that the algorithm to create such a graph takes some time to compute, only works on a subset of all RDF graphs, and above all is not implemented by any current software.
Luckily there is a simple solution, which came to me by inspiration from my work on creating an ontology for Atom, Atom OWL: link my card explicitly to its alternate representations (the <link type="alternate" href="...">) [2] and sign those alternate representations. This gives me the following triples:

@prefix iana: <http://www.iana.org/assignments/relation/> .
@prefix awol: <http://bblfish.net/work/atom-owl/2006-06-06/#> .

<http://bblfish.net/people/henry/card>   a foaf:PersonalProfileDocument;
     iana:alternate <http://bblfish.net/people/henry/card.rdf>,
                    <http://bblfish.net/people/henry/card.n3> .

<http://bblfish.net/people/henry/card.rdf>
       wot:assurance <http://bblfish.net/people/henry/asc.card.rdf.asc> ;
       awol:type "application/rdf+xml" .
<http://bblfish.net/people/henry/card.n3>
       wot:assurance <http://bblfish.net/people/henry/asc.card.n3.asc> ;
       awol:type "text/rdf+n3" .

Here I am saying that my card has two alternate representations, an rdf and an n3 one, what their mime types are, and where the signature for each can be found. Simple.
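
The signatures themselves are produced with gpg as detached, ascii-armored files, named here to match the URLs above:

bblfish$ gpg --armor --detach-sign --output asc.card.rdf.asc card.rdf
bblfish$ gpg --armor --detach-sign --output asc.card.n3.asc card.n3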

So if I put all of the above together I get the following extract from my N3 file:

@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix wot: <http://xmlns.com/wot/0.1/> .
@prefix awol: <http://bblfish.net/work/atom-owl/2006-06-06/#> .
@prefix iana: <http://www.iana.org/assignments/relation/> .
@prefix : <http://bblfish.net/people/henry/card#> .

<http://bblfish.net/people/henry/card>   a foaf:PersonalProfileDocument;
     foaf:maker :me;
     foaf:title "Henry Story's FOAF file";
     foaf:primaryTopic :me ;
     iana:alternate <http://bblfish.net/people/henry/card.rdf>,
                    <http://bblfish.net/people/henry/card.n3> .

<http://bblfish.net/people/henry/card.rdf>
       wot:assurance <http://bblfish.net/people/henry/asc.card.rdf.asc> ;
       awol:type "application/rdf+xml" .
<http://bblfish.net/people/henry/card.n3>
       wot:assurance <http://bblfish.net/people/henry/asc.card.n3.asc> ;
       awol:type "text/rdf+n3" .

:me    a foaf:Person;
       foaf:title "Mr";
       foaf:family_name "Story";
       foaf:givenname "Henry";
       foaf:openid <http://openid.sun.com/bblfish> ;
       foaf:openid <http://bblfish.videntity.org/> ;
       is wot:identity of [ a wot:PubKey;
                            wot:pubkeyAddress <http://bblfish.net/people/henry/henry.pubkey.asc>;
                            wot:fingerprint "0DF560B5DADF6D348CC99EA0FD76F60D4CAE10D7";
                            wot:hex_id "4CAE10D7";
                            wot:length 1024 ];


Building a web of Trust

From here it is easy to see how I can use the wot ontology to sign other files, link to my friends' public keys, sign their public keys, have them sign mine, etc., and thereby create a cryptographically enhanced Web of Trust built on decentralised identity. Of course this is still a little complicated to put together by hand, but it should be really easy to automate, by incorporating it into the Beatnik Address Book for example. To illustrate, I will show how I linked up to Dan Brickley's and Tim Berners Lee's signatures.

Dan Brickley has a foaf file which links to his public key and to the signature of his file. The relevant part of the graph is:

   <http://danbri.org/foaf.rdf>     a foaf:PersonalProfileDocument;
               wot:assurance <http://danbri.org/foaf.rdf/foaf.rdf.asc> .

   <http://danbri.org/foaf.rdf#danbri> 
               foaf:pubkeyAddress <http://danbri.org/danbri-pubkey.txt> .

(foaf:pubkeyAddress is a relation that is not defined in foaf yet. Dan, one of the co-creators of foaf, is clearly experimenting here.) Now with this information I can download Danbri's rdf, his public key and the signature, and test that the document has not been tampered with. On the command line I do it like this:

bblfish$ gpg --import danbri.pubkey.asc
bblfish$ curl http://danbri.org/foaf.rdf > danbri.foaf.rdf
bblfish$ curl http://danbri.org/foaf.rdf.asc > danbri.foaf.rdf.asc
bblfish$ gpg --verify danbri.foaf.rdf.asc danbri.foaf.rdf

Of course this assumes that someone has not broken into his server and changed all those files. As it happens I chatted with Dan over Skype (an address I got from his foaf file!) and he sent me his public key that way. It certainly felt like I had Dan on the other side of the line, so I trust this public key enough. But why not publish which public key I am relying on? Then Dan and others can know what signature I am using and correct me if I am wrong. So I added the following to my foaf file:

:me foaf:knows    [ = <http://danbri.org/foaf.rdf#danbri>;
                    a foaf:Person;
                    foaf:name  "Dan Brickley";
                    is wot:identity of
                          [ a wot:PubKey;
                            wot:pubkeyAddress <http://danbri.org/danbri-pubkey.txt>;
                            wot:hex_id "B573B63A" ]
                  ] .

In the above I am linking to Dan's public key on his server, the one that might have been compromised. Note that I specify the wot:hex_id of the public key though. Finding another public key with the same hex id would be tremendously difficult, I am told. But who knows; I have not done the maths on this. I can make it even more difficult by signing his public key with my key, and placing that signature on my server.

bblfish$ gpg -a --detach-sign danbri.pubkey.asc

Then I can make that signature public by linking to it from my foaf file:

<http://danbri.org/danbri-pubkey.txt> wot:assurance <danbri.pubkey.asc.asc> .

Now people who read my foaf file will know how I verify Dan's information, and can detect if something does not fit between what I say and what Dan says. If they then do the same when linking to my foaf file - that is, mention in it where one is to find my public key and its hash, and sign my public key with their signature, placing that on their server - then anyone who wanted to compromise any of us consistently to the outside world would have to compromise all of the information on each of our servers consistently. As more people are added to the network, and link up to each other, the complexity of doing this grows exponentially. As there are more ways to tie information together there are more ways people can find the information, so the information becomes more valuable (as described in RDF and Metcalf's Law), and there are more ways to find inconsistencies, thereby making the information more reliable, thereby making it more valuable, and so on, and so on...

One can even make things more complicated for a wannabe hacker by placing other people's public keys on one's own server, thereby duplicating information. Tim Berners Lee does not link to his public key, but I got it over irc, and published it on my web server. Then I can add that information to my foaf file:

:me foaf:knows [ = <http://www.w3.org/People/Berners-Lee/card#i>;
                    a foaf:Person;
                    foaf:name "Tim Berners Lee";
                    is wot:identity of
                           [ a wot:PubKey;
                             wot:pubkeyAddress <timbl.pubkey.asc> ;
                             wot:hex_id "9FC3D57E" ];
                  ] .

By making public what public keys I use, I get the following benefits:

  • people can contact me and let me know if I am being taken for a ride,
  • it helps them locate public keys
  • the metadata associated with a public key grows
  • more people link into the cryptography network, making it more likely that more tools will build into this system

Encrypting parts of one's foaf file

Now that you have my public key you can send me encrypted mail or other files, verify signatures I place around the web, and read files I encrypt with my private key. Of the files that I could encrypt, one interesting one is my own foaf file. Of course encrypting the whole foaf file is not so helpful, as it breaks the web of trust piece. But there may well be pieces of a foaf file that I would like to encrypt for only some people to see. This can be done quite easily as described in "Encrypting Foaf Files" [3]. Note that a file can be encrypted for a number of different users simultaneously. This would be simple to do, and would be of great help in making current practice available in an intelligent and coherent way to the semantic web. One thing I could do is encrypt it for all of my friends whose public keys I know. I am not sure what kind of information I would want to do this with, but well, it's good to know it's possible.
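
As a sketch of that last point, gpg will happily encrypt one file for several recipients at once; the file name and key ids below are made up for the example:

bblfish$ gpg --armor --encrypt \
             --recipient danbri@example.org --recipient timbl@example.org \
             --output private.n3.asc private.n3

Each named friend can then decrypt private.n3.asc with his own private key.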

Notes

  1. This is how "PGP Signing FOAF Files" describes the procedure.
  2. The iana alternate (http://www.iana.org/assignments/relation/) relation is not dereferenceable. It would be very helpful if iana made ontologies for each of their relations available.
  3. Thanks to Stephen Livingstone for pointing that way in a reply to this blog post on the openid mailing list.
  4. A huge number of resources on the subject can be found in the Semantic Web Trust and Security Resource Guide, which I found via the Wikipedia Web of Trust page.
  5. This can be used to build a very simple but powerful web authentication protocol as described in my March 2008 article RDFAuth: sketch of a buzzword compliant authentication protocol

Wednesday Jun 06, 2007

http://openid.sun.com/bblfish

That's my new OpenId.

I was able to successfully log onto several OpenId enabled sites (using Safari, but not Firefox 2.0.0.4 !?).

And I did not have to invent a new username and password, nor fill out any form other than to enter my id. I did not have to wait for an email confirmation, nor send an email response, nor go to a web verification site. I did not have to add one more password to my keychain.
A really small step, but oh what a useful one!

I can add this info to my foaf file with the simple relation:

<http://bblfish.net/people/henry/card#me> <http://xmlns.com/foaf/0.1/openid> <http://openid.sun.com/bblfish> .
This will come in very useful, one way or another. See the article "foaf and openid", for an example.

Thursday May 17, 2007

Webcards: a Mozilla microformats plugin

Last week Jiri Kopsa pointed me to Webcards, a very useful mozilla extension for microformats. Install it, reboot, and Mozilla will then pop up a discreet green bar on web pages that follow the microformats guidelines. So for example on Sean Bechhofer's page one vcard is detected. Clicking on that information brings up a slick panel with more links to other sources on the web, such as delicious if tags are detected, or in this case linkedin, and a stylish button that will add the vcard to one's address book [1].

Microformats are a really simple way to add a little structure to a web page. As I understand it from our experimentation a year ago [2] in adding this to BlogEd [3], one reason it was successful even before the existence of such extensions is that it allowed web developers to exchange css style sheets more easily, and it reduced the need to come up with one's own style sheet vocabulary. So people agreeing to name their classes the same way, however far apart they lived, could build on each other's work. As a result a lot of data appeared that can then be used by extensions such as Webcards.

Webcards really shows how useful a little structure can be. One can add addresses to one's address book, and appointments to one's calendar, with the click of a button. The publisher gains by using these in improvements to their web site design. So everybody is happy. One downside as far as structure goes is that, due to the lack of namespace support, there is a bottleneck in extending the format. One has to go through the very friendly microformats group, and they have stated that they really only want to deal with the most common formats. So it is not a solution to any and every data need. For that one should look at the eRDF or RDFa extensions to xhtml and html. I don't have an opinion on which is best. Perhaps a good starting point is this comparison chart.

The structured web is a continuum, from the least structured (plain text), on through html, to rdf. For a very good but long analysis, see the article by Mike Bergman An Intrepid Guide to Ontologies which covers this in serious depth.

As rdf formats such as foaf and sioc gain momentum, similar, though perhaps less slick, mozilla extensions have appeared on the web. One such is the Semantic Radar extension. Perhaps Webcards will be able to detect the use of such vocabularies in RDFa or eRDF extended web pages too, using technologies similar to those offered by the Operator plugin, as described recently by Elias Torres.

[1] Note that this does not work on OSX. I had to save the file to the hard drive and rename it with a vcf extension before it could be added to Apple's Address Book.
[2] Thanks to the help of Antoine Moreau de Bellaing (no web page?)
[3] I know, BlogEd has not been moving a lot recently. It's just that I felt the blog editor space was pretty crowded already, and that there were perhaps more valuable things to do on the structured data front at Sun.

Monday Nov 06, 2006

Long term vision for the future of the world

Nova Spivack has just published a long but very well written and accessible paper in which he describes his long term vision of the evolution of the web and the world. Don't read it if you don't like science fiction, or if standing at great heights makes you dizzy.

Starting with a quick summary of the latest developments in web technology, as they have started to become visible in what we are now calling Web 2.0, he draws a picture showing how the initial semantic stepping stones provided by services such as del.icio.us, flickr, digg, linkedin etc. are leading us towards a world in which the web will slowly gain in intelligence, in little distributed pockets here and there, and then in wider and wider areas of the web, until, linked up altogether, we develop what could be called a global mind, one that has many similarities to Ken Wilber and Jean Gebser's integral stage of consciousness. Just as we are made up of cells - the same types of cells that could be found billions of years ago, but organized and coordinated to form an intelligent whole - so in the global mind we will be the cells, and the semantically integrated web will be our neural network.

In The Meaning and Future of the Semantic Web, Nova Spivack, who runs the Semantic Web startup Radar Networks, and who has a good history of being ahead of the curve (he started EarthWeb, also known as Gamelan, a Yahoo for Java applets, which gave my first java applet back in 1996 a little "cool" prize, when java had only just popped onto the landscape), gives a good long term vision of where to go. If you read it, don't forget that a journey of a thousand miles begins beneath one's feet. It is good to know where you are going, but the length of the journey may be frightening to some. Don't try to be at the end of the journey already. You'll be there soon enough; trust me, time passes very quickly. Enjoy every second of it. That's what I do when I go on long bicycle rides. If you know where you are going, you'll have more time to admire the landscape around you.

It is worth comparing such a vision with the one painted by Douglas Adams in 1990, in A Hitch Hiker's Guide to cyberspace.
