Web Finger proposals overview

If all you had was an email address, would it not be nice to have a mechanism for finding someone's home page or OpenId from it? Two proposals have been put forward to show how this could be done. I will look at both and add a sketch of my own that should hopefully lead us to a solution that takes the best of each.

The WebFinger GoogleCode page explains what webfinger is very well:

Back in the day you could, given somebody's UNIX account (email address), type
$ finger email@example.com 
and get some information about that person, whatever they wanted to share: perhaps their office location, phone number, URL, current activities, etc.

The new ideas generalize this to the web, by following a very simple insight: if you have an email address like henry.story@sun.com, then the owner of sun.com is responsible for managing the email. That is the same organization responsible for managing the web site http://sun.com. So all that is needed is some machine readable pointer from http://sun.com/ to a lookup giving more information about the owner of the email address. That's it!
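
To make that first step concrete, here is a minimal Python sketch of going from an email address to the root of the web site run by the same organization. The function name site_root_for is my own invention for illustration, and the choice of plain http is an assumption; neither proposal prescribes this code.

```python
def site_root_for(email: str) -> str:
    """Return the root URL of the web site run by the email's domain.

    Hypothetical helper: it only splits the address on its last "@"
    and assumes the site is served over plain http.
    """
    local_part, sep, domain = email.rpartition("@")
    if not sep or not local_part or not domain:
        raise ValueError(f"not an email address: {email!r}")
    return f"http://{domain.lower()}/"


if __name__ == "__main__":
    print(site_root_for("henry.story@sun.com"))
```

Everything else in the proposals below is about what a client should look for once it has fetched that root resource.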

The WebFinger proposal

The WebFinger proposed solution showed the way so I will start from here. It is not too complicated, at least as described by John Panzer's "Personal Web Discovery" post.

John suggests that there should be a convention that servers have a file in the /host-meta root location of the HTTP server to describe metadata about the site. (This seems to me to break web architecture. But never mind: the resource http://sun.com/ can have a link to some file that describes a mapping from email ids to information about them.) The WebFinger solution is to have that resource be in a new application/host-meta file format (not XML, by the way). This would contain a mapping of the form

Link-Pattern: <http://meta.sun.com/?q={%uri}>; 
    rel="describedby";type="application/xrd+xml"
So if you wanted to find out about me, you'd be able to do a simple HTTP GET request on http://meta.sun.com/?q=henry.story@sun.com, which would return a representation of the user in yet another new format, application/xrd+xml.
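
As a rough illustration of how a client might expand such a pattern, here is a small Python sketch. It assumes that "{%uri}" stands for the percent-encoded identifier and "{uri}" for the identifier as is; that reading, the function name expand_link_pattern, and the decision to leave "@" unescaped (so that the result matches the URL above) are my assumptions, not something the host-meta draft is quoted as saying here.

```python
from urllib.parse import quote


def expand_link_pattern(pattern: str, uri: str) -> str:
    """Fill a Link-Pattern template with the identifier being looked up.

    Assumption: "{%uri}" means the percent-encoded identifier and
    "{uri}" the raw one. "@" is kept unescaped to match the example URL.
    """
    encoded = quote(uri, safe="@")
    return pattern.replace("{%uri}", encoded).replace("{uri}", uri)


if __name__ == "__main__":
    pattern = "http://meta.sun.com/?q={%uri}"
    print(expand_link_pattern(pattern, "henry.story@sun.com"))
```

The client would then issue a plain GET on the resulting URL and parse whatever representation comes back.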

The idea is really good, but it has three more or less important flaws:

  • It seems to require, by convention, all web sites to set up a /host-meta location on their web servers. Making such a global requirement seems a bit strong, and does not in my opinion follow web architecture. It is not up to a spec to describe the meaning of URIs, especially those belonging to other people.
  • It seems to require a non-XML application/host-meta format.
  • It creates yet another file format to describe resources, application/xrd+xml. It is better to describe resources at a semantic level using the Resource Description Framework, and not enter the format battle zone. To describe people there is already the widely known Friend of a Friend (FOAF) ontology, which can clearly be extended by anyone. Luckily it would be easy for the XRD format to participate in this, by simply creating a GRDDL mapping to the semantics.

All these new formats are a real pain. They require new parsers, testing of the spec, mapping to semantics, and so on. There is no reason to do this anymore: it is a solved problem.

But lots of kudos for the good idea!

The FingerPoint proposal

Toby Inkster, co-inventor of foaf+ssl, authored the Fingerpoint proposal, which avoids the problems outlined above.

Fingerpoint defines one useful relation, sparql:fingerpoint (available at the namespace of the relation of course, as all good linked data should be), which is defined as

sparql:fingerpoint
	a owl:ObjectProperty ;
	rdfs:label "fingerpoint" ;
	rdfs:comment """A link from a Root Document to an Endpoint Document 
                        capable of returning information about people having 
                        e-mail addresses at the associated domain.""" ;
	rdfs:subPropertyOf sparql:endpoint ;
	rdfs:domain sparql:RootDocument .
It is then possible to have the root page link to a SPARQL endpoint that can be queried very flexibly for information. Because the link is defined semantically there are a number of ways to point to the SPARQL endpoint:
  • Using the up and coming HTTP-Link HTTP header,
  • Using the well tried HTML <link> element,
  • Using RDFa embedded in the HTML of the page,
  • By having the home page return any other representation that may be popular or not, such as RDF/XML, N3, or XRD...
Toby does not mention those last two options in his spec, but the beauty of defining things semantically is that one is open to such possibilities from the start.
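
To give a feel for the HTML <link> option, here is a minimal Python sketch of a client pulling the endpoint out of a root page. The relation URI in FINGERPOINT_REL is what I assume the sparql: prefix expands to (the Fingerpoint spec is the authority on that), and the sample page and the endpoint http://example.org/sparql are made up for illustration.

```python
from html.parser import HTMLParser

# Assumption: full URI of sparql:fingerpoint; check the Fingerpoint spec.
FINGERPOINT_REL = "http://ontologi.es/sparql#fingerpoint"


class LinkFinder(HTMLParser):
    """Collect the href of every <link> element carrying a given rel."""

    def __init__(self, rel):
        super().__init__()
        self.rel = rel
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel is a space-separated list of relation names
        rels = (attributes.get("rel") or "").split()
        if self.rel in rels and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def find_fingerpoints(html, rel=FINGERPOINT_REL):
    finder = LinkFinder(rel)
    finder.feed(html)
    return finder.hrefs


# Made-up root page, for illustration only.
ROOT_PAGE = """<html><head>
<link rel="stylesheet" href="/style.css" />
<link rel="http://ontologi.es/sparql#fingerpoint" href="http://example.org/sparql" />
</head><body></body></html>"""

if __name__ == "__main__":
    print(find_fingerpoints(ROOT_PAGE))
```

A fuller client would also look at the Link header and at RDFa, but the principle is the same: find the one relation, then talk to whatever it points to.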

So Toby gets more power than the WebFinger proposal by inventing only one new relation! All the rest is already defined by existing standards.

The only problem one can see with this is that SPARQL, though not that difficult to learn, is perhaps a bit too powerful for what is needed. You can really ask anything of a SPARQL endpoint!

A possible intermediary proposal: semantic forms

What is really going on here? Let us think in simple HTML terms, and forget about machine readable data for a bit. If this were done for a human being, what we would really want is a page that looks like the webfinger.org site, which currently is just one query box and a search button (just like Google's front page).

Here is the HTML for this form at its purest, without styling:

     <form  action='/lookup' method='GET'>
         <img src='http://webfinger.org/images/finger.png' />
         <input name='email' type='text' value='' />         
         <button type='submit' value='Look Up'>Look Up</button>
     </form>

What we want is some way to make it clear to a robot that the above form somehow maps onto the following SPARQL query:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?homepage
WHERE {
   [] foaf:mbox ?email;
      foaf:homepage ?homepage
}

Perhaps this could be done with something as simple as an RDFa extension such as:

     <form  action='/lookup' method='GET'>
         <img src='http://webfinger.org/images/finger.png' />
         <input name='email' type='text' value='' />         
         <button type='submit' value='homepage' 
                sparql='PREFIX foaf: <http://xmlns.com/foaf/0.1/> 
                 GET ?homepage
                 WHERE {
                   [] foaf:mbox ?email;
                      foaf:homepage ?homepage
                 }'>Look Up</button>
     </form>

When the user (or robot) submits the form, the page he ends up on is the result of the SPARQL query in which the variables have been replaced by the values of the identically named form fields. So if I entered henry.story@sun.com in the form, I would end up on the page http://sun.com/lookup?email=henry.story@sun.com, which could perhaps just be a redirect to this blog page... This would then be the answer to the SPARQL query

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?homepage
WHERE {
   [] foaf:mbox "henry.story@sun.com";
      foaf:homepage ?homepage
}
(note: that would be wrong as far as the definition of foaf:mbox goes, which relates a person to an mbox, not a string... but let us pass on this detail for the moment)
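
The substitution step can be sketched in a few lines of Python. This is only an illustration of the idea under my own assumptions: the names bind_form_values and sparql_literal are hypothetical, the form value is bound as a plain string literal (with the same caveat about foaf:mbox as in the note above), and I use SELECT since the GET keyword does not exist in SPARQL today.

```python
import re
from urllib.parse import parse_qs

QUERY_TEMPLATE = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?homepage
WHERE {
   [] foaf:mbox ?email;
      foaf:homepage ?homepage
}"""


def sparql_literal(value):
    """Quote a form value as a SPARQL string literal, escaping as needed."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    escaped = escaped.replace("\n", "\\n").replace("\r", "\\r")
    return f'"{escaped}"'


def bind_form_values(template, query_string):
    """Replace each ?variable that has a same-named form field by its value."""
    fields = {name: values[0] for name, values in parse_qs(query_string).items()}

    def substitute(match):
        name = match.group(1)
        # Variables without a matching form field stay as they are.
        return sparql_literal(fields[name]) if name in fields else match.group(0)

    return re.sub(r"\?([A-Za-z_][A-Za-z0-9_]*)", substitute, template)


if __name__ == "__main__":
    print(bind_form_values(QUERY_TEMPLATE, "email=henry.story@sun.com"))
```

Only ?email is bound here, because it is the only variable that shares its name with a form field; ?homepage remains free and is what the resulting page answers.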

Here we would be defining a new GET method in SPARQL, which finds the type of web page that the request would end up landing on: namely a page that is the homepage of whoever's email address we have.

The nice thing about this is that as with Toby Inkster's proposal we would only need one new relation from the home page to such a finder page, and once such a sparql form mapping mechanism is defined, it could be used in many other ways too, so that it would make sense for people to learn it. For example it could be useful to make web sites available to shopping agents, as I had started thinking about in RESTful semantic web services before RDFa was out.

But most of all, something along these lines would allow services to have a very simple CGI to answer such a query, without needing to invest in a full blown SPARQL query engine. At the same time it makes the mapping to the semantics of the form very clear. Perhaps someone has a solution to do this already. Perhaps there is a better way of doing it. But it is along these lines that I would be looking for a solution...
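
To show how little is needed on the server side, here is a hedged Python sketch of the logic such a CGI could contain. The HOMEPAGES table, its single entry, and the function name lookup are placeholders of mine; a real service would read from its user database, and how the status and headers are written out depends on the server it runs under.

```python
from urllib.parse import parse_qs

# Placeholder data: a real service would consult its user database.
HOMEPAGES = {
    "henry.story@sun.com": "http://blogs.sun.com/bblfish/",
}


def lookup(query_string):
    """Answer GET /lookup?email=... with a (status, headers) pair.

    No SPARQL engine is involved: the form's declared semantics say the
    target is the homepage of the mailbox owner, so a redirect suffices.
    """
    emails = parse_qs(query_string).get("email", [])
    if not emails:
        return "400 Bad Request", []
    homepage = HOMEPAGES.get(emails[0].strip().lower())
    if homepage is None:
        return "404 Not Found", []
    return "302 Found", [("Location", homepage)]


if __name__ == "__main__":
    print(lookup("email=henry.story@sun.com"))
```

The point is that the SPARQL annotation on the form documents what the lookup means, while the implementation behind it stays a simple table lookup.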

(See also an earlier post of mine SPARQLing AltaVista: the meaning of forms)

How this relates to OpenId and foaf+ssl

One of the key use cases for such a Web Finger comes from the difficulty people have thinking of URLs as identifiers of people. Such a WebFinger proposal, if successful, would allow people to type their email address into an OpenId login box, and from there the Relying Party (the server that the user wants to log into) could find their homepage (usually the same as their OpenId page), and from there find their FOAF description (see "FOAF and OpenID").

Of course this user interface problem does not come up with foaf+ssl, because by using client side certificates, foaf+ssl does not require the user to remember his WebID. The browser does that for him - it's built in.

Nevertheless it is good that OpenId is creating the need for such a service. It is a good idea, and could be very useful even for foaf+ssl, but for different reasons: making it easy to help people find someone's foaf file from the email address could have many very neat applications, if only for enhancing email clients in interesting new ways.

Updates

It was remarked in the comments to this post that the format for /host-meta is now XRD. So that removes one criticism of the first proposal. I wonder how flexible XRD is now. Can it express everything RDF/XML can? Does it have a GRDDL?

Comments:

I've got a small demo of Fingerpoint here:

http://buzzword.org.uk/2009/fingerpoint/

It includes an example e-mail address you can query. If you do a little detective work on how querying that e-mail address works, you'll find that the server providing data for it uses entirely static files - it doesn't run its own SPARQL endpoint, but instead relies on a third-party endpoint provided by sparql.org.

Posted by Toby Inkster on November 29, 2009 at 03:30 PM CET #

Hi Henry - as far as I remember, from Blaine Cook's (http://romeda.org/blog/) session on Webfinger at the recent IIW, the reason for the initial pointer being in a 'well-known location' (http://example.com/.well-known/host-meta in the current Webfinger spec at http://code.google.com/p/webfinger/wiki/WebFingerProtocol) is that folks like Google count every byte on the home page, and would not add stuff that would be retrieved in every response when it is only useful to a tiny fraction of requesters. Oh - and the current spec has an XRD file there, rather than application/host-meta.

Posted by Pat Patterson on November 29, 2009 at 06:14 PM CET #

On the html form -> sparql thing, I've been extending rdfa for a while like this:

http://github.com/shellac/java-rdfa/blob/master/src/test/resources/query-tests/1.html

which results in the query:

http://github.com/shellac/java-rdfa/blob/master/src/test/resources/query-tests/1.rq

(That's a test, but hopefully you can see how it works for more complex cases).

Posted by Damian on November 30, 2009 at 08:03 AM CET #

I just found the interesting work by Michael Hausenblas on RDF Forms, "an (X)HTML form decorated with RDFa using the RDForms vocabulary":

http://esw.w3.org/topic/PushBackDataToLegacySourcesRDForms

Posted by Henry Story on November 30, 2009 at 08:21 AM CET #

Agree on the application/host-meta format question: XRDS is powering discovery for a huge number of OpenIDs now (among other things), is well known and has stable libraries available for processing the metadata.

In the OpenID community, we've wrestled with the email->URL conversion problem. There have been various conventions proposed like yours above (e.g. "johndoe@domain.com" => "http://openid.domain.com/johndoe" as a conversion method) but all of these are challenged; a "standard" is unlikely to be embraced any time soon on that, and even a convention requires OpenID providers to "normalize" their web apps to support the "normalized OpenID URLs" -- great if it happens, but don't hold your breath.

In any case, I'm confused by the webfinger proposal, especially as it's from Brad Fitzpatrick (the original OpenID guy), because if you have an OpenID URL, you have everything you need to "finger" that user. I know the emphasis is on email->URL conversion here, but once you have the URL, you are just one fetch away from an XRDS, which can be loaded up with all the metadata one likes -- homepage, FOAF info, RDF, service endpoints, etc.

Instead of the /host-meta, why not adopt the convention of treating the domain itself as an endpoint for an XRDS that has the metadata containing the pattern for email->URL patterning? That is, just do a fetch for an XRDS on "sun.com", and get the "root XRDS", which contains (among other things) all that info, including how to go from "henry.story@sun.com" to the appropriate OpenID URL?

That still requires adding an XRDS for the domain itself, but leverages all the machinery already in place for OpenID, and would provide all the needed flexibility for mapping emails to URLs for that domain.

Posted by Michael Graves on November 30, 2009 at 09:45 AM CET #

Thanks for the write-up Henry!

Others have commented on the wide deployment of XRD, and that host-meta doesn't have a separate format. The other thing to keep in mind is that the XRD can point to other URLs. I've been using Microformats, but it would be easy to point to a SPARQL or FOAF endpoint if you prefer.

I wonder if foaf+ssl could be enhanced by Webfinger; one of the problems with browser-centric identity approaches is that when the user isn't using their primary browser, they're stuck. With Webfinger, the location of their SSL certificate could be discoverable. Comments aren't a great space to discuss the details and issues, but it's maybe worth some thought.

To confirm Pat's comment, the well-known location is definitely about traffic concerns – neither Google nor Yahoo are able to add any more content (headers or otherwise) to their homepages. The alternative is to use OPTIONS, but there are a number of major issues with that approach, so the consensus thus far is to build on the success of robots.txt and favicon.ico. It's not ideal, but real deployment issues mean that for this stuff to get adopted, we need to do our best to work around constraints.

Michael: your third paragraph is exactly what Webfinger does! We can't use the root itself because of deployment constraints, but otherwise that's exactly what's happening.

The reason to use email addresses rather than URLs is because URLs have inconsistent formatting – sometimes they look like "http://service.com/username", but more often they look like "http://www.google.com/search?q=blah&hl=en&ie=UTF-8&oe=UTF-8&rls=en&client=firefox"; for OpenID, the use of URLs is a major barrier to adoption. My hope is that Webfinger will help much more than OpenID, but it's a good place to start.

Posted by Blaine Cook on November 30, 2009 at 08:12 PM CET #

Hi Blaine,

thanks for taking the time to comment here!

[[
I wonder if foaf+ssl could be enhanced by Webfinger; one of the problems with browser-centric identity approaches is that when the user isn't using their primary browser, they're stuck.
]]

That is not really a problem. It is cheap to produce certificates the foaf+ssl way, so a client just has to go to his WebID provider, and there he can get a new certificate with one click (using the keygen element in the browser). Imagine that LinkedIn becomes a WebID provider, and you are at an internet café. Then you would go to your LinkedIn page - I think most people can remember that, now that they have given people vanity URLs ( mine is http://linkedin.com/bblfish ). LinkedIn can help people find their ID as they do now by for example asking people to enter their cell phone number, and then sending them an SMS with a reminder of their vanity URL and a one time password. With this they can then get a time limited certificate for the internet café they are staying at. More on this at:

http://esw.w3.org/topic/foaf%2Bssl/FAQ

But in a sense that scenario supports the machine readable webfinger proposal I suggested above, since the same human readable form can then be made machine readable.

[[
...the well-known location is definitely about traffic concerns – neither Google nor Yahoo are able to add any more content (headers or otherwise) to their homepages.
]]

I know the problem well, Blaine. It was a major concern when I worked at AltaVista. Every byte was counted on those pages, as they have extremely high traffic, being often used as the default browser page. In fact at AV we rarely changed the logos. It was a shock for us to see Google changing the logo once a day in 1998, and having quite a large one too, for that matter. But I think web caches had improved by that time, so that Google could be sure that the images would be cached in every cache in the world very quickly. So the same will be true of that home page.

But if you look carefully, there seems to be one place where they could put that information, namely in the same place as the Google About page. Every site seems to have a page like that.

Or they could have a page listing all the different services they have. So if one of those pages could be repurposed, and marked up with rdfa, or served up in a different representation with an XML format,... then... Perhaps just a link from the html page to an 'alternative' representation of that home page would be enough.

But yes, the examples of favicon and robots.txt exist, though I do feel those are a mistake.

Btw. Is there a GRDDL in the works for XRD?

Posted by Henry Story on December 01, 2009 at 12:34 AM CET #

Coming from outside the Semantic Web community, my intuition is that the use of SPARQL is a non-starter for broad adoption: it's too little-known and too complex.

Yes, from your standpoint "Toby gets more power as the WebFinger proposal, by only inventing 1 new relation! All the rest is already defined by existing standards." But as most Web developers aren't familiar with these standards, their reaction would be that fingerpoint introduces a bunch of weird buzzwords and a scary query syntax.

Me, I have a passing familiarity with RDF and am willing to use it, but SPARQL looks quite intimidating, and I would presumably have to find a custom server for it and build and deploy that; this significantly dampens my enthusiasm. I much prefer the WebFinger approach of plugging the email address into a simple HTTP query and getting back data (I'd prefer it were in FOAF format, but whatever.)

Posted by Jens Alfke on December 07, 2009 at 04:09 PM CET #

In reply to Jens: if you're running a small site with no more than 20 to 30 users, Fingerpoint will work pretty well using plain old static files - not even any server-side scripting needed. You just need to hook your data into a public third-party SPARQL endpoint, such as sparql.org's.

If you take a look at http://fingerpoint.tobyinkster.co.uk/ there's an example server set up like that. It just includes a plain old HTML LINK element which links to sparql.org, passing the server's data file (http://fingerpoint.tobyinkster.co.uk/data.ttl) as a query string parameter. Let sparql.org take care of running a SPARQL server for you! There's an online query form at http://buzzword.org.uk/2009/fingerpoint/ - if you use that form to finger "somebody@fingerpoint.tobyinkster.co.uk" then you should see that it works, and reasonably quickly too.

If you're operating a larger site, with dozens, hundreds, thousands or millions of users, then you're realistically going to want to run your own SPARQL server. But the ARC2 library for PHP makes it pretty easy to run one using off-the-shelf LAMP hosting.

In answer to Blaine & Pat, my initial tests seem to suggest that Fingerpoint uses significantly *less* bandwidth than Webfinger, when an actual finger request is being made. But you're right that it adds a header to every HEAD request to the root path, whether the client is attempting a finger request or not. (You could theoretically omit it from GET requests, but I don't know how in keeping that would be with the spirit of HTTP.)

I don't really know of a way around this without either requiring clients to make oddball requests (either an unusual HTTP verb or a special request header) or forcing the use of well-known URIs. It's a question of which method is the least evil. The "/.well-known/" proposal goes a long way to making well-known URIs less evil, but I would still say that there's a trace of evilness there.

DNS might provide an answer.

Posted by Toby Inkster on December 11, 2009 at 04:56 AM CET #
