RDFAuth: sketch of a buzzword compliant authentication protocol

Here is a proposal for an authentication scheme that is even simpler than OpenId (see sequence diagram), more secure, more RESTful, and has fewer points of failure and fewer points of control. Something like it is needed in order to make Open Distributed Social Networks with privacy controls possible.

Update

The following sketch led to the even simpler protocol described in "Foaf and SSL creating a global decentralized authentication protocol". That protocol is very close to what is proposed here, but builds directly on SSL so as to reduce what is new to nearly nothing.

Background

Ok, so now that I have your attention, I would like to first mention that I am a great fan of OpenId. I have blogged about it numerous times and enthusiastically in this space. I came across the idea I will develop below not because I thought OpenId needed improving, but because I had chosen to follow some very strict architectural guidelines: the solution had to satisfy RESTful, resource-oriented hyperdata constraints. With the Beatnik Address Book I have proven, to myself at least, that the creation of an Open Distributed Social Network (a hot topic at the moment, see the Economist's recent article on online social networks) is feasible and easy to do. What was missing was a way for people to keep some privacy, clearly a big selling point for the large Social Network Providers such as Facebook. So I went in search of a solution for creating an Open Distributed Social Network with privacy controls. And initially I had thought of using OpenId.

OpenId Limitations

But OpenId has a few problems:

  • First, it is really designed to work within the limitations of current web browsers. It is partly because of this that there is a lot of hopping around between the service and the Identity Provider with HTTP redirects. Semantic Web user agents such as the Tabulator, Knowee or Beatnik need not be bound by those limitations.
  • Parts of OpenId 2, and especially the Attribute Exchange spec, really don't feel very RESTful. There is a method for PUTing new property values in a database and a way to remove them, but neither uses the HTTP PUT or DELETE method.
  • The OpenId Attribute Exchange is nice but not very flexible. It can keep some basic information about a person, but it does not make use of hyperdata. And the way it is set up, it would only be able to do so with great difficulty. A RESTfully published foaf file can give the same information, is a lot more flexible and extensible, whilst also making use of Linked Data, and as it happens also solves the Social Network Data Silo problems. Just that!
  • OpenId requires an Identity Server. There are a couple of problems with this:
    • This server provides a dynamic service but not a RESTful one. That is, the representations sent back and forth to it cannot be cached.
    • The service is a control point. Anyone owning such a service will know which sites you authenticate onto. True, you can set up your own service, but that is clearly not what is happening. The big players are offering their customers OpenIds tied to particular authentication servers, and that is what most people will accept.
As I found out by developing what I am here calling RDFAuth, for want of a better name, none of these restrictions are necessary.

RDFAuth, a sketch

So, following my strict architectural guidelines, I came across what I am just calling RDFAuth, but like everything else here this is a sketch and open to change. I am neither a security specialist nor an HTTP specialist. I am like someone who comes to an architect in order to build a house on some land he has, with a sketch of what he would like the house to look like, some ideas of what functionality he needs and the price he is willing to pay. What I want here is something very simple that can be made to work with a few Perl scripts.

Let me first present the actors and the resources they wish to act upon.

  • Romeo has a Semantic Web Address Book, his User Agent (UA). He is looking for the whereabouts of Juliette.
  • Juliette has a URL identifier (as I do) which returns a public foaf representation and links to a protected resource.
  • The protected resource contains information she only wants some people to know, in this instance Romeo. It contains information as to her current whereabouts.
  • Romeo also has a public foaf file. He may have a protected one too, but it does not make an entrance in this scene of the play. His public foaf file links to a public PGP key. I described how that is done in "Cryptographic Web of Trust".
  • Romeo's public key is RESTfully stored on a server somewhere, accessible by URL.

So Romeo wants to find out where Juliette is, but Juliette only wants to reveal this to Romeo. Juliette has told her server to allow only Romeo, identified by his URL, to view the protected resource. She could also have had a more open policy, allowing any of her or Romeo's friends, as specified in their foaf files, to have access. The server could then crawl their respective foaf files at regular intervals to see if it needed to add anyone to the list of people having access. This is what the DIG group did in conjunction with OpenId. Juliette could also have a policy that decides Just In Time, as each person presents themselves, whether or not to grant them access. She could use the information in that person's foaf file and relate it to some trust metric to make her decision. How Juliette specifies who gets access to the protected resource is not part of this protocol. It is completely up to Juliette and the policies she chooses for her agent to follow.
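
To make the policy options above a little more concrete, here is a minimal Python sketch of the kind of check Juliette's server might run. It assumes the server already holds a map from each person's URI to the set of URIs they foaf:knows, gathered by crawling foaf files. The function name may_access and the Mercutio URI are hypothetical, and keeping the map fresh is left out.

```python
from typing import Dict, Iterable, Set

# Hypothetical cache of foaf:knows links, gathered by crawling foaf files:
# person URI -> URIs of the people that person says they know.
Knows = Dict[str, Set[str]]

def may_access(requester: str, allowed: Set[str], knows: Knows,
               friends_of: Iterable[str] = ()) -> bool:
    """Grant access if the requester is listed explicitly, or is known
    (via foaf:knows) by one of the people named in friends_of."""
    if requester in allowed:
        return True
    return any(requester in knows.get(person, set()) for person in friends_of)

JULIETTE = "http://juliette.org/#juliette"
ROMEO = "http://romeo.name/#romeo"
MERCUTIO = "http://mercutio.example/#me"  # hypothetical friend of Romeo

knows = {JULIETTE: {ROMEO}, ROMEO: {MERCUTIO}}

# Policy 1: only Romeo.
print(may_access(ROMEO, {ROMEO}, knows))      # True
print(may_access(MERCUTIO, {ROMEO}, knows))   # False
# Policy 2: any friend of Juliette or of Romeo.
print(may_access(MERCUTIO, set(), knows, friends_of=[JULIETTE, ROMEO]))  # True
```

A Just In Time policy would fetch the requester's foaf file at request time instead of relying on a crawled map. Either way, the policy remains Juliette's choice and sits outside the protocol itself.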

So here is the sketch of the sequence of requests and responses.

  1. First, Romeo's User Agent knows that Juliette's foaf name is http://juliette.org/#juliette, so it sends an HTTP GET request to Juliette's foaf file, located of course at http://juliette.org/
    The server responds with a public foaf file containing a link to the protected resource, perhaps with the following N3:
      <> rdfs:seeAlso <protected/juliette> .
    
    Perhaps this could also contain some relations describing that resource as protected, which groups may access it, etc... but that is not necessary.
  2. Romeo's User Agent then decides it wants to check out protected/juliette. It sends a GET request to that resource but this time receives a variation of the Basic Authentication Scheme, perhaps something like:
    HTTP/1.0 401 UNAUTHORIZED
    Server: Knowee/0.4
    Date: Tue, 01 Apr 2008 10:18:15 GMT
    WWW-Authenticate: RdfAuth realm="http://juliette.org/protected/*", nonce="ILoveYouToo"
    
    The idea is that Juliette's server returns a nonce (in order to avoid replay attacks), and a realm over which this protection will be valid. But I am really making this up here. Better ideas are welcome.
  3. Romeo's web agent then encrypts some string (the realm?) and the nonce with Romeo's private key. Only an agent trusted by Romeo can do this.
  4. The User Agent then sends a new GET request with the encrypted string and Romeo's identifier, perhaps something like this:
    GET /protected/juliette HTTP/1.0
    Host: juliette.org
    Authorization: RdfAuth id="http://romeo.name/#romeo", key="THE_REALM_AND_NONCE_ENCRYPTED"
    Accept: application/rdf+xml, text/rdf+n3
    
    Since we need an identifier, why not just use Romeo's foaf name? It happens to also point to his foaf file. All the better.
  5. Juliette's web server can then use Romeo's foaf name to GET his public foaf file, which contains a link to his public key, as explained in "Cryptographic Web of Trust".
  6. Juliette's web server can then query the returned representation, perhaps meshed with some other information in its database, with something equivalent to the following SPARQL query:
    PREFIX wot: <http://xmlns.com/wot/0.1/>
    SELECT ?pgp
    WHERE {
         [] wot:identity <http://romeo.name/#romeo>;
            wot:pubkeyAddress ?pgp .
    } 
    
    The nice thing about working at the semantic layer is that it largely decouples the spec from the representation returned. Of course, as usage grows, the representations understood by the most servers will create a de facto convention. Initially I suggest using RDF/XML, of course. But it could just as well be N3, RDFa, perhaps even some microformat dialect, or even some GRDDLable XML, as the POWDER working group is proposing to do.
  7. Having found the URL of the PGP key, Juliette's server can GET it and, as with much else in this protocol, cache it for future use.
  8. Having the PGP key, Juliette's server can now decrypt the encrypted string sent to her by Romeo's User Agent. If the decrypted string matches the expected string, Juliette will know that the User Agent has access to Romeo's private key. So she decides this is enough to trust it.
  9. As a result Juliette's server returns the protected representation.
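
The core of steps 2 to 8 can be sketched in a few lines of Python. This is a toy built on several assumptions. Textbook RSA with tiny primes stands in for the PGP keys, so "encrypting with the private key" amounts to a raw RSA signature over a hash. The signed string is assumed to be the realm and the nonce joined by a space. A dictionary stands in for fetching Romeo's foaf file and key in steps 5 to 7. The Authorization value also carries an extra, hypothetical nonce parameter so the server knows which challenge is being answered. None of these names come from an existing spec, and the code offers no real security.

```python
import hashlib
import re
import secrets
from typing import Dict, Tuple

# --- Toy textbook RSA (INSECURE: tiny primes, no padding) -------------------
P, Q = 61, 53
N = P * Q                          # public modulus
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)

def digest(message: bytes, n: int) -> int:
    """Hash the message and reduce it into the RSA group."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes, d: int, n: int) -> int:
    """'Encrypt' the digest with the private key: a raw RSA signature."""
    return pow(digest(message, n), d, n)

def verify(message: bytes, signature: int, e: int, n: int) -> bool:
    """'Decrypt' with the public key and compare with the expected digest."""
    return pow(signature, e, n) == digest(message, n)

def parse_params(header: str) -> Tuple[str, Dict[str, str]]:
    """Split 'Scheme k1="v1", k2="v2"' into (scheme, {k1: v1, k2: v2})."""
    scheme, _, rest = header.partition(" ")
    return scheme, dict(re.findall(r'(\w+)="([^"]*)"', rest))

# --- Juliette's server -------------------------------------------------------
# Stand-in for steps 5 to 7: WebID -> (e, n), as if fetched via foaf and cached.
PUBLIC_KEYS = {"http://romeo.name/#romeo": (E, N)}
pending_nonces = set()

def challenge(realm: str) -> str:
    """Step 2: build a WWW-Authenticate value carrying a fresh nonce."""
    nonce = secrets.token_hex(8)
    pending_nonces.add(nonce)
    return f'RdfAuth realm="{realm}", nonce="{nonce}"'

def check(authorization: str, realm: str) -> bool:
    """Step 8: verify the signature; each nonce is accepted at most once."""
    scheme, params = parse_params(authorization)
    key = PUBLIC_KEYS.get(params.get("id", ""))
    nonce = params.get("nonce", "")
    if scheme != "RdfAuth" or key is None or nonce not in pending_nonces:
        return False
    pending_nonces.discard(nonce)  # burn the nonce so a replay is rejected
    try:
        signature = int(params["key"])
    except (KeyError, ValueError):
        return False
    return verify(f"{realm} {nonce}".encode(), signature, *key)

# --- Romeo's user agent ------------------------------------------------------
def respond(www_authenticate: str, webid: str, d: int, n: int) -> str:
    """Steps 3 and 4: sign realm + nonce and build the Authorization value."""
    _, params = parse_params(www_authenticate)
    signature = sign(f'{params["realm"]} {params["nonce"]}'.encode(), d, n)
    return f'RdfAuth id="{webid}", nonce="{params["nonce"]}", key="{signature}"'

realm = "http://juliette.org/protected/*"
auth = respond(challenge(realm), "http://romeo.name/#romeo", D, N)
print(check(auth, realm))  # True
print(check(auth, realm))  # False: that nonce has already been used
```

The second call fails because the server forgets a nonce as soon as it has been answered. That is the replay protection the nonce in step 2 is meant to provide.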
Now Romeo's User Agent knows where Juliette is, displays it, and Romeo rushes off to see her.

Advantages

It should be clear from the sketch what the numerous advantages of this system are over OpenId. (I can't speak of other authentication services as I am not a security expert).

  • The User Agent has no redirects to follow. In the above example it needs to request one resource, http://juliette.org/protected/juliette, twice (steps 2 and 4), but that may only be necessary the first time it accesses this resource. The second time, the UA can immediately jump to step 3. [But see the problem with replay attacks raised in the comments by Ed Davies, and my reply.] Furthermore, it may be possible (this is a question for HTTP specialists) to merge steps 1 and 2. Would it be possible for request 1 to return a 20x code with the public representation, plus a WWW-Authenticate header, suggesting that the UA can get a more detailed representation of the same resource if authenticated? In any case, the redirect rigmarole of OpenId, which is really there to overcome the limitations of current web browsers, is not needed.
  • There is no need for an Attribute Exchange type service. Foaf deals with that in a clear and extensible RESTful manner. This simplifies the spec dramatically.
  • There is no need for an identity server, so one less point of failure and one less point of control in the system. The public key plays that role in a clean and simple manner.
  • The whole protocol is RESTful. This means that all representations can be cached, meaning that steps 5 and 7 need only occur once per individual.
  • As RDF is built for extensibility, and we are being architecturally very clean, the system should be able to grow cleanly.
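
Because everything involved is an ordinary cacheable web resource, the claim that steps 5 and 7 need only occur once per individual comes down to memoising the key lookup. The following is a minimal Python sketch. It assumes a hypothetical fetch_key function that performs steps 5 to 7 (GET the foaf file, find wot:pubkeyAddress, GET the key) and returns the key in a hypothetical (exponent, modulus) form. The network calls are stubbed out here.

```python
from typing import Callable, Dict, Tuple

PublicKey = Tuple[int, int]  # hypothetical (exponent, modulus) representation

class KeyCache:
    """Remember each WebID's public key so steps 5 and 7 run once per person."""

    def __init__(self, fetch_key: Callable[[str], PublicKey]) -> None:
        self._fetch_key = fetch_key  # hypothetical: performs steps 5 to 7
        self._keys: Dict[str, PublicKey] = {}
        self.fetches = 0             # how many lookups actually hit the network

    def get(self, webid: str) -> PublicKey:
        if webid not in self._keys:
            self.fetches += 1
            self._keys[webid] = self._fetch_key(webid)
        return self._keys[webid]

# A stub fetcher standing in for the real HTTP requests.
cache = KeyCache(lambda webid: (17, 3233))
cache.get("http://romeo.name/#romeo")
cache.get("http://romeo.name/#romeo")
print(cache.fetches)  # 1
```

A real server would presumably honour the Cache-Control or Expires headers on the foaf and key documents rather than cache forever, so that a replaced or revoked key is eventually picked up.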

Contributions

I have been quietly exploring these ideas on the foaf and semantic web mailing lists, where I received a lot of excellent suggestions and feedback.

Finally

So I suppose I am now looking for feedback from a wider community. PGP experts, security experts, REST and HTTP experts, semantic web and linked data experts, only you can help this get somewhere. I will never have the time to learn these fields in enough detail by myself. In any case all this is absolutely obviously simple, and so completely unpatentable :-)

Thanks for taking the time to read this

Comments:

First off, I think this is an excellent approach which is well overdue for wider use. I hope you're right that a token server isn't required. This is exactly the sort of thing public keys should be used for.

A few questions about details, though:

"The second time the UA can immediately jump to step 4."

Which nonce does it use? Or, how could this reuse be distinguished from a replay attack?

At step 4 Romeo's user agent identifies the person it is working for (id="http://romeo.name/#romeo") but not which of their public keys are being used. I've not used cryptography much but my understanding is that some people have more than one key; e.g. they have a master key which is long term and a working key which can be revoked. Presumably they'd want the public keys for both referenced in their foaf file (even though the private working key's just on a USB stick in their desk drawer whereas the private master key is in a safe in a locked filing cabinet in the basement...). Wouldn't it be better to indicate which key is to be used rather than assume there's just one?

Posted by Ed Davies on March 28, 2008 at 03:19 PM CET #

This proposal appears identical to the 2005 version of LID ... including URLs, FOAF and GPG...

Posted by Johannes Ernst on March 28, 2008 at 05:32 PM CET #

Ed, your point about which PGP key to use came up in the discussion on the semantic web mailing lists. I agree that adding an HTTP header with the URL of the public key would solve the problem best. Otherwise Juliette's server would have to try a number of public keys in succession, which is clearly not optimal.

Johannes: do you have a URL for the 2005 version of LID? If this has been solved already I'd be really happy to know. Then I could implement it immediately in Beatnik. I'll try searching the web tomorrow for those keywords. A pointer would be helpful though.

Posted by Henry Story on March 28, 2008 at 06:34 PM CET #

Ok I found a URL for LID:
http://lid.netmesh.org/wiki/Main_Page

I like the Light Weight Identity Name.

Ed wrote:
>"The second time the UA can immediately jump to step 4."
>
>Which nonce does it use? Or, how could this reuse be distinguished from a replay attack?

Ah yes. Quite right. I wonder if one could turn things around and have the client generate the nonce, and specify this in the reply. The server could then check that the nonce had not been used before. Perhaps the nonce could be generated with a timestamp. In case the server does not like the nonce it could return the response in 2 with a new nonce. Would that make sense? In any case this would always require step 3. I'll change that and point people to the comments section.

What other techniques are there otherwise to reduce the encryption/decryption cost? Is it high?

Posted by Henry Story on March 29, 2008 at 02:23 AM CET #
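
The client-generated nonce suggested in the reply above could look roughly like the following Python sketch. The nonce format (Unix seconds, a dash, then random hex) and the five-minute freshness window are assumptions made for illustration only. A server that rejects such a nonce would fall back to step 2 and issue one of its own.

```python
import secrets
import time
from typing import Dict, Optional

MAX_SKEW = 300  # seconds; hypothetical freshness window

def make_client_nonce(now: Optional[float] = None) -> str:
    """Romeo's agent: build '<unix seconds>-<random hex>'."""
    stamp = int(time.time() if now is None else now)
    return f"{stamp}-{secrets.token_hex(8)}"

class NonceChecker:
    """Juliette's server: accept only fresh nonces it has never seen."""

    def __init__(self, max_skew: int = MAX_SKEW) -> None:
        self.max_skew = max_skew
        self.seen: Dict[str, int] = {}  # nonce -> its timestamp

    def accept(self, nonce: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        stamp, sep, rand = nonce.partition("-")
        if not sep or not stamp.isdigit() or not rand:
            return False                 # malformed
        ts = int(stamp)
        if abs(now - ts) > self.max_skew:
            return False                 # stale, or dated in the future
        # Forget nonces old enough to fail the freshness test on their own.
        self.seen = {n: t for n, t in self.seen.items()
                     if now - t <= self.max_skew}
        if nonce in self.seen:
            return False                 # replay
        self.seen[nonce] = ts
        return True

checker = NonceChecker()
nonce = make_client_nonce(now=1000)
print(checker.accept(nonce, now=1000))  # True
print(checker.accept(nonce, now=1001))  # False: replayed
```

The timestamp lets the server forget old nonces, so it only has to remember those issued within the window instead of every nonce it has ever seen.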

If the Capulets can watch the HTTP traffic to see the request (4) in order to replay it then, presumably, they can also see the reply (9) in which case that reply would also need to be encrypted, with Romeo's public key I would imagine. In which case, what's the harm of a replay attack?

I suppose, when they see the results change they could guess that Juliet has moved but that's not much help. I'm not even sure of that - I don't know enough about cryptography but I imagine any practical encryption would put a salt in.

Would this all be better going over HTTPS anyway?

Posted by Ed Davies on March 29, 2008 at 03:06 AM CET #

Ed Davies wrote:
"...in which case that reply would also need to be encrypted, with Romeo's public key I would imagine. In which case, what's the harm of a replay attack?"

I had always thought of using HTTPS myself, but now I think of it, why indeed not encrypt the response with Romeo's public key? Then, as you point out, only he could read it anyway. That could simplify the protocol even further...

I don't see that anything can be deduced by people observing changes in the returned document. A single white space difference should cause massive change in an encrypted document I believe.

Mhh. That turns things around in quite an interesting way... Wow!

I suppose people will say PGP encryption is expensive. Is that true? How expensive? I know that PGP key generation takes forever.

Posted by Henry Story on March 29, 2008 at 03:37 AM CET #

Johannes,

No, this is not identical to LID. For one thing, LID seems to break web architecture in that it uses a single URI to identify both a person (or persona) and a document.

HTTP GET on http://lid.netmesh.org/liddemouser/ (which, it states on http://lid.netmesh.org/wiki/EssenceOfLID , "...identifies a (hypothetical) person, called Mr. LID Demo User") returns a 200 response with a text/html body.

FOAF and its friends are careful to make that distinction.

Posted by Ed Davies on March 29, 2008 at 05:13 AM CET #

About merging step 1 and 2 (partly repeating my comments on #swig for your readers ;) ):
One might just use one FOAF resource for Juliette and just return a 200 on every request. If there's some authentication sent with the request and it is valid and leads to some authorisation, then - depending on the identity of the requester - more/different information is sent back. (Only if it's not valid/recognised, return a 403.) When there is no authentication stuff in the request, only some basic information is returned. In that basic information you could leave some pointers that you could get more if you try to access it with authentication. Note that this information could be in any format, we're not relying on RDF here.
But that might not be very RESTful, putting that in the message body. Also, what do WWW architecture people say about a resource changing its contents in that way? But Juliette's protected resource would just do the same, it would (probably) return different information depending on the identity of the requester. Is that ok? And: is it against the HTTP spec to have a resource being openly accessible (returning 200) but allowing authentication for it nonetheless?

Posted by Simon Reinhardt on March 29, 2008 at 06:57 AM CET #

Oh and another thing:
RDF data is just data and can be accessed in many ways, e.g. by SPARQL or through documents. But indeed the concept of a document is, as Leigh Dodds says (http://www.ldodds.com/blog/archives/000324.html), a useful one.
So maybe I want to split up my FOAF file into several documents which all have their own topic. One for contact data, one for my social network, one for my CV, one for my projects, one for my calendar data & travel plan, ...
How the RDF is stored is orthogonal to that, it could just reside in one triple store and you would create views through SPARQL CONSTRUCT.
Using several documents is also better if you have sectioned your website like that. Each page of your website could link to a different RDF file which is an alternate of it (+ doing conneg for both of them).
Now I wouldn't want to have public and protected versions of all of them. Or even new resources for every requester. Web pages don't work like that either, people just get different information when they login.

Posted by Simon Reinhardt on March 29, 2008 at 07:11 AM CET #

At step 3, the string which is encrypted (digested, really) needs to include the full URI otherwise the Capulets could use the request to probe which other resources Juliette has made accessible to Romeo in the realm, even if they couldn't decode the bits returned.

This could either be via a replay attack if there isn't a nonce to protect against this or, with a nonce, just by being quick and getting their request in before Romeo's step 4 request (thereby also executing a denial of service attack).

Posted by Ed Davies on March 29, 2008 at 12:42 PM CET #

Romeo [nm] should know [e] Juliette [nf] loves [e] him [pm] deeply [o] in [prep] her [pf] heart [a] ... ;-)

Posted by Colm Kennedy on March 30, 2008 at 04:28 AM CEST #

I like this.

I don't (yet) understand the detail, but I'd like to understand why one would criticise openid for being dynamic (not cache-aware) when one would expect cache hits to effectively look like replay attacks. Am I being stupid (again)?

Posted by Bryan on April 01, 2008 at 09:53 AM CEST #

Hi Bryan,

the difference between the sketch proposed here and OpenId is that the public key can be cached by Juliette's server. The next time Romeo requests a protected resource with a different encrypted string, Juliette's server can decrypt the string without connecting to any service.

In the case of OpenId on the other hand, if Romeo requests a different protected resource ( and needs a new session ) then Juliette's server will need to go through the following rigmarole:
- make a request again to the Identity Provider via a redirect through Romeo's browser
- receive the response via a redirect from the Identity Provider through Romeo's browser
- send an encrypted string directly to the Identity Provider, and receive an ok back

Posted by Henry Story on April 01, 2008 at 10:12 AM CEST #

Who killed Alice and Bob?

Posted by Nicolas Mendoza on April 01, 2008 at 11:13 AM CEST #

On the IETF mailing list Henrik Nordstrom commented in the following mail

http://lists.w3.org/Archives/Public/ietf-http-wg/2008AprJun/0008.html

that

[[
You cannot shortcut 1+2 unfortunately while keeping RESTful as HTTP does not allow mixing public and authenticated content on the same URI, and attempting to do so will mess up the cache model of HTTP. But 2 only needs to be performed once for the whole duration of the session (which can be arbitrary long, subject to server controlled constraints).
]]

ok that makes sense: we need two resources, one for public and one for protected content, as illustrated above.

I can imagine, though it may make the protocol more complex, that the client could guess or know that it needs to authenticate at 2 and somehow, by self-generating a nonce, step right to 4. This would require the server to approve of the nonce, and so make life more difficult for it. But who knows. One could at least allow it in the protocol. Lightweight servers would, as a policy, just jump back to 2 and send a nonce they approve of.

Henrik also suggests

[[
If you use pgp for authentication then the identification key in the authentication should be the pgp identity (which is a pgp key with some name & email recorded). But it's also worth noting that pgp signatures do embed the needed information to identify which key has made the signature (but not it's distribution points).
]]

Yes, I agree too. I think Romeo's User Agent should send the URI of the public key. If there were a way to add a link from the public key back to the URI of Romeo, that would be even better. Perhaps that could be done by linking to an RDF representation of the public key.

More interesting replies in his post.

Posted by Henry Story on April 02, 2008 at 05:23 AM CEST #

"You cannot shortcut 1+2 unfortunately while keeping RESTful as HTTP does not allow mixing public and authenticated content on the same URI, and attempting to do so will mess up the cache model of HTTP."

What about with a response header like this?

Vary: authorization, content-type, ...

I think it's better to make the extra information be a different resource but maybe if somebody argued that the "redacted" public version is just a different representation I'd only grumble quietly rather than scream loudly.

Posted by Ed Davies on April 02, 2008 at 08:52 AM CEST #

Forgive me if I didn't catch something essential here, but what prevents Mallory (to get back to the standard Alice/Bob-names) from sending a fake request, pretending she is http://romeo.name/#romeo - but signed with her own key?

She could then intercept the server's request #5 and give a fake FOAF referring to her forged key, and the server would still be very happy and believe the request is from http://romeo.name/#romeo - which was previously noted as a trusted source.

If this is to be avoided, the server must already know which PGP key http://romeo.name/#romeo uses - or do so indirectly - but this means there is no need for the fancy RDF-REST requests from the server.

(In PGP you can say for instance that you trust a key belongs to http://romeo.name/#romeo if three of the people you trust (and met in person) have met and signed Romeo's key.)

BTW, instead of encrypting the returned payload (message-level security) I would rather just let proven technology do the transport-level security (HTTPS), and use this for signing and authentication.

Posted by Stian Soiland-Reyes on April 04, 2008 at 04:31 PM CEST #

I think Toby Inkster found the best solution. I illustrate it here:

http://lists.w3.org/Archives/Public/ietf-http-wg/2008AprJun/0017.html

There is a UML diagram at the end of the mail that makes it clear how simple this is.

I am working on an implementation of that. It is really nice because it is built seamlessly on top of https. It just requires one extra header.

Posted by Henry Story on April 13, 2008 at 04:49 PM CEST #

About

bblfish
