Thursday Sep 04, 2008

Building Secure, Open and Distributed Social Network Applications

Current Social Networks don't allow you to have friends outside their network. When on Facebook, you can't point to your friend on LinkedIn. They are data silos. This audio enhanced slide show explains how a distributed decentralized social network is being built, how it works, and how to make it secure using the foaf+ssl protocol (a list of pointers on the esw wiki).

It is licenced under a CC Attribution ShareAlike Licence.
My voice is a bit odd on the first slide, but it gets better I think as I go along.

Building Secure Open & Distributed Social Networks( Viewing this slide show requires a flash plugin. Sorry I only remembered this limitation after having put it online. If you know of a good Java substitute let me know. The other solution would have been to use Slidy. PDF and Annotated Open Document Format versions of this presentation are available below. (why is this text visible in Firefox even when the plugin works?) )

This is the presentation I gave at JavaOne 2008 and at numerous other venues in the past four months.

The slidecast works a lot better as a presentation format than my previous semantic web video RDF: Connecting Software and People, which I published as an H.264 video more than a couple of years ago, and which takes close to 64MB of disk space. The problem with that format is that it is not easy to skip through the slides to the ones that interest you, or to go back and listen to a passage carefully again. Or at least it feels very clunky. My mp3 sound file only takes 17MB of space in comparison, and the graphics are much better quality in this slide show.

It is hosted by the excellent slideshare service, which translated my OpenOffice odp document (once it was cleaned up a little: I had to make sure it had no pointers to local files remaining accessible from the Edit>Links menu, which otherwise choked their service). I used the Audacity sound editor to create the mp3 file, which I then placed on my bblfish.net server. Syncing the sound and the slides was then very easy using SlideShare's SlideCast application. I found that the quality of the slides was a lot better once I had created an account on their servers. The only thing missing would be a button, in addition to the forward and backward buttons, that would allow one to show the text of the audio for people with hearing problems - something equivalent to the Notes view in Open Office.

You can download the OpenOffice Presentation which contains my notes for each slide and the PDF created from it too. These are all published under a Creative Commons Attribution, Share Alike license. If you would like some of the base material for the slides, please contact me. If you would like to present them in my absence feel free to.

Thursday Apr 17, 2008

semantic camp paris

picture of Karima Rafes

A couple of weeks ago I attended the second Semantic Bar Camp which took place at the Orange research labs at Issy les Moulineaux, near Paris. This was a great opportunity to meet many of the French researchers in the Semantic Web space, to take part in the French debate, and to help convince interested parties of the reality of the technology.

Jean Rohmer of the large French defense group Thales played the role of the devil's advocate, arguing that the Semantic Web was just pie in the sky theory without practical applications. We delved into various aspects of the theory of the Semantic Web, and I underlined how the biological/evolutionary aspect of language, the Academie Francaise notwithstanding, was a key aspect in understanding the evolution of the web of data. But the best argument was a simple demonstration of the Beatnik Address Book, which showed how hyperdata could solve the serious problem of 2008: the growing number of closed social networks. At the next camp I hope we will be able to delve much more deeply into how to build real practical applications.

Many thanks to Karima Rafes for organizing this well attended bar camp ( pictures ). Stephane Lauriere from XWiki, who is on the Nepomuk Semantic Desktop project, also posted some photos. And I would like to recommend Alexandre Passant's blog to all French-speaking readers.

Update

The talk I gave is now available online with audio as "Building Secure, Open and Distributed Social Network Applications".

Friday Mar 28, 2008

RDFAuth: sketch of a buzzword compliant authentication protocol

Here is a proposal for an authentication scheme that is even simpler than OpenId ( see sequence diagram ), more secure, more RESTful, with fewer points of failure and fewer points of control, that is needed in order to make Open Distributed Social Networks with privacy controls possible.

Update

The following sketch led to the even simpler protocol described in Foaf and SSL creating a global decentralized authentication protocol. It is very close to what is proposed here but builds very closely on SSL, so as to reduce what is new down to nearly nothing.

Background

Ok, so now I have your attention, I would like to first mention that I am a great fan of OpenId. I have blogged about it numerous times and enthusiastically in this space. I came across the idea I will develop below, not because I thought OpenId needed improving, but because I have chosen to follow some very strict architectural guidelines: it had to satisfy RESTful, Resource oriented hyperdata constraints. With the Beatnik Address Book I have proven - to myself at least - that the creation of an Open Distributed Social Network (a hot topic at the moment, see the Economist's recent article on Online social network) is feasible and easy to do. What was missing is a way for people to keep some privacy, clearly a big selling point for the large Social Network Providers such as Facebook. So I went in search of a solution to create an Open Distributed Social Network with privacy controls. And initially I had thought of using OpenId.

OpenId Limitations

But OpenId has a few problems:

  • First it is really designed to work with the limitations of current web browsers. It is partly because of this that there is a lot of hopping around from the service to the Identity Provider with HTTP redirects. As the Tabulator, Knowee or Beatnik.
  • Parts of OpenId 2, and especially the Attribute Exchange spec really don't feel very RESTful. There is a method for PUTing new property values in a database and a way to remove them that does not use either the HTTP PUT method or the DELETE method.
  • The OpenId Attribute Exchange is nice but not very flexible. It can keep some basic information about a person, but it does not make use of hyperdata. And the way it is set up, it would only be able to do so with great difficulty. A RESTfully published foaf file can give the same information, is a lot more flexible and extensible, whilst also making use of Linked Data, and as it happens also solves the Social Network Data Silo problems. Just that!
  • OpenId requires an Identity Server. There are a couple of problems with this:
    • This server provides a Dynamic service but not a RESTful one. Ie. the representations sent back and forth to it, cannot be cached.
    • The service is a control point. Anyone owning such a service will know which sites you authenticate onto. True, you can set up your own service, but that is clearly not what is happening. The big players are offering their customers OpenIds tied to particular authentication servers, and that is what most people will accept.
As I found out by developing what I am here calling RDFAuth, for want of a better name, none of these restrictions are necessary.

RDFAuth, a sketch

So following my strict architectural guidelines, I came across what I am just calling RDFAuth, but like everything else here this is a sketch and open to change. I am not a security specialist nor an HTTP specialist. I am like someone who comes to an architect in order to build a house on some land he has, with some sketch of what he would like the house to look like, some ideas of what functionality he needs and what the price he is willing to pay is. What I want here is something very simple, that can be made to work with a few perl scripts.

Let me first present the actors and the resources they wish to act upon.

  • Romeo has a Semantic Web Address Book, his User Agent (UA). He is looking for the whereabouts of Juliette.
  • Juliette has a URL identifier ( as I do ) which returns a public foaf representation and links to a protected resource.
  • The protected resource contains information she only wants some people to know, in this instance Romeo. It contains information as to her current whereabouts.
  • Romeo also has a public foaf file. He may have a protected one too, but it does not make an entrance in this scene of the play. His public foaf file links to a public PGP key. I described how that is done in Cryptographic Web of Trust.
  • Romeo's Public key is RESTfully stored on a server somewhere, accessible by URL.

So Romeo wants to find out where Juliette is, but Juliette only wants to reveal this to Romeo. Juliette has told her server to only allow Romeo, identified by his URL, to view the site. She could also have had a more open policy, allowing any of her or Romeo's friends to have access to this site, as specified by their foaf file. The server could then crawl their respective foaf files at regular intervals to see if it needed to add anyone to the list of people having access to the site. This is what the DIG group did in conjunction with OpenId. Juliette could also have a policy that decides Just In Time, as a person presents themselves, whether or not to grant them access. She could use the information in that person's foaf file and relate it to some trust metric to make her decision. How Juliette specifies who gets access to the protected resource here is not part of this protocol. This is completely up to Juliette and the policies she chooses her agent to follow.
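
How Juliette expresses her policy is outside the protocol, but a small sketch may make the options above more concrete. The Python below is only an illustration: the class `AccessPolicy`, its fields and the example.org style URLs for Mercutio and Tybalt are all made up here, and the `knows` dictionary stands in for whatever a crawl of the relevant foaf files would return.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Hypothetical policy for one protected resource."""
    # URLs explicitly granted access, e.g. Romeo's foaf name.
    allowed: set = field(default_factory=set)
    # People whose friends are also let in (the more open policy).
    friends_of: set = field(default_factory=set)

    def permits(self, visitor, knows):
        """Decide whether `visitor` may see the resource.

        `knows` maps a person's URL to the set of URLs their foaf file
        says they know; here it stands in for a periodic crawl.
        """
        if visitor in self.allowed:
            return True
        return any(visitor in knows.get(person, set())
                   for person in self.friends_of)


JULIETTE = "http://juliette.org/#juliette"
ROMEO = "http://romeo.name/#romeo"
MERCUTIO = "http://mercutio.example/#me"   # made-up friend of Romeo
TYBALT = "http://tybalt.example/#me"       # made-up stranger

knows = {ROMEO: {MERCUTIO}, JULIETTE: {ROMEO}}

strict = AccessPolicy(allowed={ROMEO})
open_policy = AccessPolicy(allowed={ROMEO}, friends_of={ROMEO, JULIETTE})
```

With the strict policy only Romeo gets in; with the open one Mercutio does too, because Romeo's foaf file lists him. A Just In Time policy would replace the lookup in `permits` with a call to some trust metric.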

So here is the sketch of the sequence of requests and responses.

  1. First Romeo's user Agent knows that Juliette's foaf name is http://juliette.org/#juliette so it sends an HTTP GET request to Juliette's foaf file located of course at http://juliette.org/
    The server responds with a public foaf file containing a link to the protected resource perhaps with the N3
      <> rdfs:seeAlso <protected/juliette> .
    
    Perhaps this could also contain some relations describing that resource as protected, which groups may access it, etc... but that is not necessary.
  2. Romeo's User Agent then decides it wants to check out protected/juliette. It sends a GET request to that resource but this time receives a variation of the Basic Authentication Scheme, perhaps something like:
    HTTP/1.0 401 UNAUTHORIZED
    Server: Knowee/0.4
    Date: Sat, 1 Apr 2008 10:18:15 GMT
    WWW-Authenticate: RdfAuth realm="http://juliette.org/protected/*" nonce="ILoveYouToo"
    
    The idea is that Juliette's server returns a nonce (in order to avoid replay attacks), and a realm over which this protection will be valid. But I am really making this up here. Better ideas are welcome.
  3. Romeo's web agent then encrypts some string (the realm?) and the nonce with Romeo's private key. Only an agent trusted by Romeo can do this.
  4. The User Agent then sends a new GET request with the encrypted string, and his identifier, perhaps something like this
    GET /protected/juliette HTTP/1.0
    Host: juliette.org
    Authorization: RdfAuth id="http://romeo.name/#romeo" key="THE_REALM_AND_NONCE_ENCRYPTED"
    Accept: application/rdf+xml, text/rdf+n3
    
    Since we need an identifier, why not just use Romeo's foaf name? It happens to also point to his foaf file. All the better.
  5. Juliette's web server can then use Romeo's foaf name to GET his public foaf file, which contains a link to his public key, as explained in "Cryptographic Web of Trust".
  6. Juliette's web server can then query the returned representation, perhaps meshed with some other information in its database, with something equivalent to the following SPARQL query
    PREFIX wot: <http://xmlns.com/wot/0.1/>
    SELECT ?pgp
    WHERE {
         [] wot:identity <http://romeo.name/#romeo>;
            wot:pubkeyAddress ?pgp .
    } 
    
    The nice thing about working at the semantic layer is that it decouples the spec a lot from the representation returned. Of course as usage grows those representations that are understood by the most servers will create a de facto convention. Initially I suggest using RDF/XML of course. But it could just as well be N3, RDFa, perhaps even some microformat dialect, or even some GRDDLable XML, as the POWDER working group is proposing to do.
  7. Having found the URL of the PGP key, Juliette's server, can GET it - and as with much else in this protocol cache it for future use.
  8. Having the PGP key, Juliette's server can now decrypt the encrypted string sent to her by Romeo's User Agent. If the decrypted string matches the expected string, Juliette will know that the User Agent has access to Romeo's private key. So she decides this is enough to trust it.
  9. As a result Juliette's server returns the protected representation.
Now Romeo's User Agent knows where Juliette is, displays it, and Romeo rushes off to see her.
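
The exchange above can be tried out in a few lines of Python. This is a sketch under loud assumptions, not an implementation of the protocol: the key pair is a textbook RSA toy (far too small to be secure) standing in for Romeo's PGP key, "encrypting with the private key" is done as a bare modular exponentiation over a hash of the realm and nonce, and the function names are invented here. It only shows how the two headers and the final check fit together.

```python
import hashlib
import re
import secrets

# Toy RSA key pair (p=61, q=53). Far too small to be secure; it only
# stands in for Romeo's PGP key so that the sketch is self-contained.
N, E, D = 3233, 17, 2753


def digest(realm, nonce):
    """Reduce realm+nonce to a number smaller than the toy modulus."""
    data = (realm + " " + nonce).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def challenge(realm):
    """Step 2: the server's challenge header and the nonce it issued."""
    nonce = secrets.token_urlsafe(16)
    header = 'RdfAuth realm="%s" nonce="%s"' % (realm, nonce)
    return header, nonce


def parse_params(header):
    """Split 'RdfAuth k="v" ...' into a dict of its parameters."""
    scheme, _, rest = header.partition(" ")
    if scheme != "RdfAuth":
        raise ValueError("unexpected scheme: " + scheme)
    return dict(re.findall(r'(\w+)="([^"]*)"', rest))


def respond(www_authenticate, webid, private_exponent):
    """Steps 3 and 4: Romeo's agent answers the challenge."""
    p = parse_params(www_authenticate)
    proof = pow(digest(p["realm"], p["nonce"]), private_exponent, N)
    return 'RdfAuth id="%s" key="%d"' % (webid, proof)


def verify(authorization, realm, nonce, public_exponent):
    """Step 8: check the proof with the key found via the foaf file."""
    p = parse_params(authorization)
    ok = pow(int(p["key"]), public_exponent, N) == digest(realm, nonce)
    return p["id"] if ok else None


realm = "http://juliette.org/protected/*"
header, nonce = challenge(realm)
auth = respond(header, "http://romeo.name/#romeo", D)
```

Calling `verify(auth, realm, nonce, E)` returns Romeo's foaf name, while a tampered `key` value makes it return `None`. Steps 5 to 7, where the server dereferences the foaf name to find the public key, are left out; here the public exponent `E` is simply passed in.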

Advantages

It should be clear from the sketch what the numerous advantages of this system are over OpenId. (I can't speak of other authentication services as I am not a security expert).

  • The User Agent has no redirects to follow. In the above example it needs to request one resource, http://juliette.org/protected/juliette, twice (steps 2 and 4), but that may only be necessary the first time it accesses this resource. The second time the UA can immediately jump to step 3. [but see problem with replay attacks raised in the comments by Ed Davies, and my reply] Furthermore it may be possible - this is a question to HTTP specialists - to merge steps 1 and 2. Would it be possible for request 1 to return a 20x code with the public representation, plus a WWW-Authenticate header, suggesting that the UA can get a more detailed representation of the same resource if authenticated? In any case the redirect rigmarole of OpenId, which is really there to overcome the limitations of current web browsers, is not needed.
  • There is no need for an Attribute Exchange type service. Foaf deals with that in a clear and extensible RESTful manner. This simplifies the spec dramatically.
  • There is no need for an identity server, so one less point of failure, and one less point of control in the system. The public key plays that role in a clean and simple manner.
  • The whole protocol is RESTful. This means that all representations can be cached, meaning that steps 5 and 7 need only occur once per individual.
  • As RDF is built for extensibility, and we are being architecturally very clean, the system should be able to grow cleanly.
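
To illustrate the caching point, here is a minimal sketch in Python of a server-side cache in front of steps 5 and 7. The class `KeyCache` and the `fake_fetch` function are invented for this example; a real agent would do an HTTP GET and honour the usual HTTP caching headers rather than keeping entries forever.

```python
class KeyCache:
    """Remember fetched representations so each URL is fetched once."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable: url -> representation
        self._store = {}
        self.fetches = 0         # counts calls that went to the network

    def get(self, url):
        if url not in self._store:
            self._store[url] = self._fetch(url)
            self.fetches += 1
        return self._store[url]


def fake_fetch(url):
    """Stand-in for an HTTP GET of a foaf file or a public key."""
    return "representation of " + url


cache = KeyCache(fake_fetch)
first = cache.get("http://romeo.name/")
second = cache.get("http://romeo.name/")
```

The second call is answered from the cache, so `cache.fetches` stays at 1 however often Romeo comes back.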

Contributions

I have been quietly exploring these ideas on the foaf and semantic web mailing lists, where I received a lot of excellent suggestions and feedback.

Finally

So I suppose I am now looking for feedback from a wider community. PGP experts, security experts, REST and HTTP experts, semantic web and linked data experts, only you can help this get somewhere. I will never have the time to learn these fields in enough detail by myself. In any case all this is absolutely obviously simple, and so completely unpatentable :-)

Thanks for taking the time to read this

Wednesday Mar 05, 2008

Opening Sesame with Networked Graphs

Simon Schenk just recently gave me an update to his Networked Graphs library for the Sesame RDF Framework. Even though it is in early alpha state the jars have already worked wonders on my Beatnik Address Book. With four simple SPARQL rules I have been able to tie together most of the loose ends that appear between foaf files, as each one often uses different ways to refer to the same individual.

Why inferencing is needed

So for example in my foaf file I link to Simon Phipps- Sun's very popular Open Source Officer - with the following N3:

 :me foaf:knows   [ a foaf:Person;
                    foaf:mbox_sha1sum "4e377376e6977b765c1e78b2d0157a933ba11167";
                    foaf:name "Simon Phipps";
                    foaf:homepage <http://www.webmink.net/>;
                    rdfs:seeAlso <http://www.webmink.net/foaf.rdf>;
                  ] .
For those who still don't know N3 (where have you been hiding?) this says that I know a foaf:Person named "Simon Phipps" whose homepage is specified and for which more information can be found at the http://www.webmink.net/foaf.rdf rdf file. Now the problem is that the person in question is identified by a '[' which represents a blank node. Ie we don't have a name (URI) for Simon. So when the Beatnik Address Book gets Simon's foaf file, by following the rdfs:seeAlso relation, it gets among others something like
[] a foaf:Person;
   foaf:name "Simon Phipps";
   foaf:nick "webmink";
   foaf:homepage </>;
   foaf:knows [ a foaf:Person;
                foaf:homepage <http://www.buzzword-compliant.com/>;
                rdfs:seeAlso <http://www.buzzword-compliant.com/foaf.rdf>;
             ] .
This file then contains at least two people. Which one is the same person? Well a human being would guess that the person named "Simon Phipps" is the same in both cases. Networked Graphs helps Beatnik make a similar guess by noting that the foaf:homepage relation is an owl:InverseFunctionalProperty.
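
One detail worth spelling out is why the two homepages match at all: my file gives the full URL, while Simon's gives the relative reference `</>`. A short Python check, assuming his file is retrieved from http://www.webmink.net/foaf.rdf and declares no other base, shows that both resolve to the same URI.

```python
from urllib.parse import urljoin

# Simon's file was fetched from this URL, so it is taken as the base URI.
base = "http://www.webmink.net/foaf.rdf"

# foaf:homepage </> in his file, resolved against the base.
homepage_in_his_file = urljoin(base, "/")

# foaf:homepage as written, in full, in my file.
homepage_in_my_file = "http://www.webmink.net/"

same_homepage = homepage_in_his_file == homepage_in_my_file
```

Since `same_homepage` is true, the two blank nodes share a value for an inverse functional property, which is what lets them be identified.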

Some simple rules

After downloading Simon Phipps's foaf file and mine and placing the relations found in them in their own Named Graph, we can in Sesame 2.0 create a merged view of both these graphs just by creating a graph that is the union of the triples of each.

The Networked Graph layer can then do some interesting inferencing by defining a graph with the following SPARQL rules

#foaf:homepage is inverse functional
grph: ng:definedBy """
  CONSTRUCT { ?a <http://www.w3.org/2002/07/owl#sameAs> ?b .  } 
  WHERE { 
       ?a <http://xmlns.com/foaf/0.1/homepage> ?pg .
       ?b <http://xmlns.com/foaf/0.1/homepage> ?pg .
      FILTER ( ! SAMETERM (?a , ?b))   
 } """^^ng:Query .
This is simply saying that if two names for things have the same homepage, then these two names refer to the same thing. I could be more general by writing rules at the owl level, but those would be more complicated, and I just wanted to test out the Networked Graph sail to start with. So the above will add a bunch of owl:sameAs relations to our NetworkedGraph view on the Sesame database.
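
For readers more at home with ordinary code than with SPARQL, the effect of this rule can be mimicked in plain Python over a set of (subject, predicate, object) tuples. This is only a sketch of what the rule computes, not of how Networked Graphs or Sesame evaluate it, and the blank node labels `_:b1` to `_:b3` are made up.

```python
from collections import defaultdict

HOMEPAGE = "http://xmlns.com/foaf/0.1/homepage"
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_from_homepage(triples):
    """Return the owl:sameAs triples the rule would construct."""
    by_page = defaultdict(set)
    for s, p, o in triples:
        if p == HOMEPAGE:
            by_page[o].add(s)
    inferred = set()
    for subjects in by_page.values():
        for a in subjects:
            for b in subjects:
                if a != b:            # FILTER ( ! SAMETERM(?a, ?b) )
                    inferred.add((a, SAME_AS, b))
    return inferred


# "_:b1", "_:b2" and "_:b3" are made-up labels for blank nodes.
triples = {
    ("_:b1", HOMEPAGE, "http://www.webmink.net/"),
    ("_:b2", HOMEPAGE, "http://www.webmink.net/"),
    ("_:b3", HOMEPAGE, "http://www.buzzword-compliant.com/"),
}
inferred = same_as_from_homepage(triples)
```

Here `inferred` relates `_:b1` and `_:b2` in both directions and leaves `_:b3` alone, since nobody else shares its homepage.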

The following two rules then just complete the information.

# owl:sameAs is symmetric
#if a = b then b = a 
grph: ng:definedBy """
  CONSTRUCT { ?b <http://www.w3.org/2002/07/owl#sameAs> ?a . } 
  WHERE { 
     ?a <http://www.w3.org/2002/07/owl#sameAs> ?b . 
     FILTER ( ! SAMETERM(?a , ?b) )   
  } """^^ng:Query .

# indiscernibility of identicals
#two identical things have all the same properties
grph: ng:definedBy """
  CONSTRUCT { ?b ?rel ?c . } 
  WHERE { ?a <http://www.w3.org/2002/07/owl#sameAs> ?b .
          ?a ?rel ?c . 
     FILTER ( ! SAMETERM(?rel , <http://www.w3.org/2002/07/owl#sameAs>) )   
  } """^^ng:Query .
They make sure that when two things are found to be the same, they have the same properties. I think these two rules should probably be hard coded in the database itself, as they seem so fundamental to reasoning that there must be some very serious optimizations available.
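
The combined effect of these two rules can be sketched in Python as a naive loop that keeps applying them until no new triple appears. This is an assumption about the outcome only: it says nothing about how Networked Graphs actually evaluates its rules, and a database with these rules hard coded would certainly do something smarter. The blank node labels are again made up.

```python
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
NAME = "http://xmlns.com/foaf/0.1/name"
NICK = "http://xmlns.com/foaf/0.1/nick"


def close_under_same_as(triples):
    """Apply the symmetry and property-copying rules until nothing changes."""
    graph = set(triples)
    while True:
        new = set()
        for a, p, b in graph:
            if p != SAME_AS:
                continue
            if a != b:
                new.add((b, SAME_AS, a))          # owl:sameAs is symmetric
            for s, rel, c in graph:
                if s == a and rel != SAME_AS:
                    new.add((b, rel, c))          # copy a's properties to b
        if new <= graph:
            return graph
        graph |= new


triples = {
    ("_:b1", SAME_AS, "_:b2"),
    ("_:b1", NAME, "Simon Phipps"),
    ("_:b2", NICK, "webmink"),
}
closed = close_under_same_as(triples)
```

After the loop both nodes carry the name and the nick, which is the "coming together" one sees in the address book when inferencing is switched on.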

Advanced rules

Anyway the above illustrates just how simple it is to write some very clear inferencing rules. Those are just the simplest that I have bothered to write at present. Networked Graphs allows one to write much more interesting rules, which should help me solve the problems I explained in "Beatnik: change your mind" where I argued that even a simple client application like an address book needs to be able to make judgements on the quality of information. Networked Graphs would allow one to write rules that would amount to "only believe consequences of statements written by people you trust a lot". Perhaps this could be expressed in SPARQL as

CONSTRUCT { ?subject  ?relation ?object . }
WHERE {
    ?g tr:trustlevel ?tl .
    GRAPH ?g { ?subject ?relation ?object . }
    FILTER ( ?tl > 0.5 )
}
Going from the above it is easy to start imagining very interesting uses of Networked Graph rules. For example we may want to classify some ontologies as trusted and only do reasoning on relations over those ontologies. The inverse functional rule could then be generalized to
  PREFIX owl: <http://www.w3.org/2002/07/owl#>
  PREFIX : <https://sommer.dev.java.net/ontologies/beatnik#>

  CONSTRUCT { ?a owl:sameAs ?b .  } 
  WHERE { 
      GRAPH ?g { ?inverseFunc a owl:InverseFunctionalProperty . }
      ?g a :TrustedOntology .

       ?a ?inverseFunc ?pg .
       ?b ?inverseFunc ?pg .
      FILTER ( ! SAMETERM (?a , ?b))   
 }
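
The trust-level rule sketched earlier in this section can also be pictured in plain Python. The sketch below assumes named graphs are held as a dictionary from graph name to a set of triples, and trust levels as a second dictionary; both structures, and the example.org graph names, are made up for illustration.

```python
def trusted_view(named_graphs, trust_level, threshold=0.5):
    """Union of the triples from graphs trusted above the threshold."""
    view = set()
    for graph_name, triples in named_graphs.items():
        # A graph with no stated trust level matches nothing, as in the
        # SPARQL pattern  ?g tr:trustlevel ?tl . FILTER ( ?tl > 0.5 )
        if graph_name in trust_level and trust_level[graph_name] > threshold:
            view |= set(triples)
    return view


NAME = "http://xmlns.com/foaf/0.1/name"

named_graphs = {
    "http://example.org/trusted.rdf": {("_:b1", NAME, "Simon Phipps")},
    "http://example.org/doubtful.rdf": {("_:b1", NAME, "Someone Else")},
    "http://example.org/unrated.rdf": {("_:b2", NAME, "Nobody")},
}
trust_level = {
    "http://example.org/trusted.rdf": 0.9,
    "http://example.org/doubtful.rdf": 0.2,
}
view = trusted_view(named_graphs, trust_level)
```

Only the triple from the graph rated 0.9 ends up in `view`. Restricting reasoning to trusted ontologies, as in the generalized rule, would follow the same pattern with a set of trusted graph names in place of the numeric threshold.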

Building the So(m)mer Address Book

I will be trying these out later. But for the moment you can already see the difference inferencing brings to an application by downloading the Address Book from subversion at sommer.dev.java.net and running the following commands (leave the password to the svn checkout blank)


> svn checkout https://sommer.dev.java.net/svn/sommer/trunk sommer --username guest
> cd sommer
> ant jar
> cd misc/AddressBook/
> ant run
Then you can just drag and drop the foaf file on this page into the address book, and follow the distributed social network by pressing the space bar to get foaf files. To enable inferencing you currently need to set it in the File>Toggle Rules menu. You will see things coming together suddenly when inferencing is on.

There are still a lot of bugs in this software. But you are welcome to post bug reports, or help out in any way you can.

Where this is leading

Going further it seems clear to me that Networked Graphs is starting to realise what Guha, one of the pioneers of the semantic web, wrote about in his thesis "Contexts: A Formalization and Some Applications", on which I wrote a short note, Keeping track of Context in Life and on the Web, a couple of years ago. That really helped me get a better understanding of the possibilities of the semantic web.

Monday Feb 25, 2008

Semantic Bar Camp London and Flu

Last Saturday early early morning I took the train to London to go to the weekend Semantic Bar Camp that was held at Imperial College, in the computer science department I studied in. I arrived, late, because I had missed the train in Paris by one minute, and so missed getting an overview of the event. On arrival I was asked to put my name down for a presentation and stick the paper on the board on the first empty slot available. 15 minutes later I improvised a talk on Linked Data. I did not realize that there were a lot of microformats people in the audience with little semantic web experience, so I did not take care enough to lay some important foundations, and show how microformats information should be able to work well with information in an RDF database [1]. I demonstrated the Beatnik Address Book and gave an overview of why this was now filling a really important gap, enabling distributed social networks, a topic on which I have written a lot recently. It inspired Dan Brickley who has been working on SPARQL over XMPP to give me some code and show how this could be integrated into Beatnik... It seems pretty easy to do. What would the use case be though...

There were a number of very interesting talks over the weekend. Daniel Lewis collected a few of the blogs covering the event. Ian Davis presented the work he has been leading on Open Data Licences (pic). Yves Raimond and his team presented some interesting work on semantics and music and an advanced inferencing engine based on SWI Prolog called Henry (picture). Tom Shelley from the Economist got us all asking questions on the pros and cons of personal knowledge in a short presentation (picture). The more information is known on us the better services can be offered, but also what are the risks? Is this not a reason one may end up needing agent technology: ie one may prefer programs to move rather than data to move? Georgi Kobilarov gave a nice overview of the very useful Linked Data project DBPedia (picture)...

All during the weekend I felt very tired, which I put down for a while to the trip from Paris. On Monday morning, as my condition had gotten much worse, it became clear that I had caught a virus. For two days I could hardly get out of bed, struck by a vicious flu, which has only just left me today. On Friday I was too tired to do any thinking work, so I went to see the Duchamp, Man Ray and Picabia exhibition at the Tate Modern, where you can see Duchamp's irreverent rendition of the Mona Lisa - below the picture are written the letters "L.H.O.O.Q" which if pronounced speedily enough sounds like "Elle a chaud au cul".

Notes

  1. All I need is some XSLT or XQuery transform to turn microformatted html into RDF (any well known format will do). Mind you, at a later microformats talk it turned out that this may not be quite so easy, as it seems that the microformat community has not yet agreed on a clear grammar...

Update

The talk I gave is now available online as "Building Secure, Open and Distributed Social Network Applications".

Friday Feb 01, 2008

3 semantic web talks for JavaOne 2008

At least 3 semantic web talks were accepted for JavaOne 2008, taking place on May 6-9 in San Francisco. There may be more, but the following I am sure of:

  • A talk by Dean Allemang on practical ontology writing based on his soon to be published book "The Working Ontologist". I am really looking forward to it coming out, as it is a book that should help cut down the learning curve dramatically.
  • Über programmer Tim Boudreau and I will be presenting Beatnik: Building an Open Social Network Browser at a Birds of a Feather session. We will look at both the client and server side components and how the theory developed by Dean can turn into a practical product that solves real problems: the data silo effect of current social networking sites.
  • Finally some key players will be joining the "Developing Semantic Web Applications on the Java™ Platform" panel where we will hopefully start a discussion and get feedback on what can be done to bring many many more of the 5 million Java developers on board the semantic web. This panel discussion ( the list of panelists is not complete yet ) will be hosted by Rob Frost of BEA and me.

Hopefully this should allow the 20 thousand or so attendees joining us at JavaOne to get a good overview of the practical developments in this area. And if they like it, the Semantic Conference in San Jose will be taking place a week later from the 18th to the 22nd of May, where they will be able to meet many of the leading companies and researchers in this area.

For detailed session information see my later post.

Sunday Jan 06, 2008

2008: The Rise of Linked Data

Here is my one prediction for 2008. Social Networking's breakdown will lead to the rise of Linked Data. Here is the logic:

  1. Social Networking sites have grown tremendously over the last few years fuelled by huge profits from advertising dollars. When I worked at AltaVista it was well known that the more you knew about your users the more valuable an ad became. If you know all the friends, interests, habits of someone, and you know what they are doing right now, you can suggest exactly the right product at the right time to them. The cost of a simple ad on AltaVista was $5 per thousand page views. If you knew a lot about what someone was looking for the value could go up to $50.
  2. The allure of profit is leading to an ever increasing number of players in this space. See the Social Networking 3.0 talk at Stanford earlier in 2007.
  3. This in turn leads to a fracturing of the Social Networking space. As more players enter the space, each ends up with a smaller and partial view of the whole graph of social relations.
  4. Which is leading to the need for Social Network Portability, and more generally Data Portability. Users such as Scoble want to use their data on their own computer and link it together. Social Network Providers such as Plaxo or Facebook have a financial interest in helping their users move with their social network to their service. Facebook helps users extract all the information from GMail. Plaxo wants to help users extract all the information from every other social network.
  5. Privacy concerns will mount tremendously as a result. Each social network will increase in their users the fear of giving their data over to other "spamming" services, to defend their position. But to do this they will make it more and more difficult to extract the data from their service, annoying their users and so going against their users' desires for linking their information. This will seem more and more like an issue for anti trust involvement as the ire of more and more people mounts.

The force of the above logic will release the energy needed for an investment in Linked Data tools such as Beatnik, since it solves all the problems mentioned above - at the expense of killing the dream some investors may have had of a world where they own, Nineteen Eighty-Four like, the world.

Data Portability: Scoble Right or Wrong and beyond

Scoble explains Video

In this video Scoble explains how he got thrown off Facebook.

Here is a short summary, but the video is well worth watching as the emotions come through much better...
Facebook, which asks its users for their Gmail password in order to extract all the contacts someone has from their mail history and build up a possible list of friends, Facebook which scans the web for information to suggest friendships you may have, that same Facebook does not want anyone, including YOU, to be able to extract the data in your account on their web site even were it only into your own electronic address book. To do this they encode all email addresses as images, which makes them very difficult for a computer to decode, and so makes it tedious to move and use that information. So when Scoble tried to extract his 5000 friends using Optical Character Recognition - an idea suggested by Plaxo, which wants to be a hub of people information - Facebook noticed this and cut off his account. (I think he may have been reinstated now - but whether there is a point in belonging to such a service is a serious question now). As a result Scoble and others have asked people to join the conversation on the Data Portability group.

This clearly is a very important issue. But his solution to the issue was not the best one. By using Plaxo - which wants to be the social graph hub of the web - to extract his data, he would have been able to do what clearly he should be able to do, namely add his contact information easily to Outlook. But he did this at the cost of allowing a third entity to gather a lot of information about him and his contacts. CNET's The Scoble scuffle: Facebook, Plaxo at odds over data portability, touches on the issue. Allowing a third service provider to extract all your data in order to give you access to it is not improving your freedom. It is just giving another commercial entity access to a huge network of information about you. And the more a company knows about its users the more valuable the advertising it sells becomes. There is no mystery here as to why Social Networking sites have had so much money pumped into them over the last few years. So you have jumped out of the frying pan right into the fire here. Clearly if you are concerned with security of your information - with Facebook you had one commercial entity that had a lot more information about you than it should - now you have two.

Really what you want is the following:

  1. Selectivity in who gets what information about you:
    • Strangers should be able to see the minimum information I want to make public.
    • acquaintances should see more
    • family should see other information
    • ... these policies should be flexible and determinable by the owner of the information, by the person making the speech act of affirming it.
    And even though I may be happy for a service provider to maintain this data, you may not even wish to allow them access to it. It should be possible to have this information on your server at home controlled only by you.
  2. Link to friends wherever they are. After all if you have to go through one central aggregator of relationship information, then that aggregator will have a view of all the relationship information available, giving one actor complete and overwhelming advantage as opposed to everyone else. You need distributed data, also known as linked data or hyperdata.
  3. An Open Data structure so as to allow ecosystems to grow and use that information. I want the tools on my computer to all be able to work with my social network information.
  4. A way to determine trust

Allowing different people to see more or less information (point 1 above) should be quite easy to set up by having the server return different representations depending on who is viewing the information, determined by their having logged in to your site with something like OpenId. Linking information in a distributed way is easy using Semantic Web technologies, and is demonstrated by tools such as Beatnik. Beatnik is just one of the tools that could use such information on my desktop (thereby fulfilling point 3 above).

What you say, out loud or on your web site, is a speech act. All information is the speech act of someone, and it is this that allows us to determine our level of trust in it. This is also why one should try to say less rather than more, since every piece of information one publishes is information one may have to defend. It is therefore much better if we have a system where everyone can look after a small part of the graph of information they have a responsibility for and defend it. They can then point to information maintained by other people, who will have to defend their piece. But since pointing to information maintained by others is a vote of confidence in them, an economy of links will emerge whereby people want to increase the number of quality links to them, which will only happen if they are deemed trustworthy. So the system allows for distributed trust. For a simple but excellent example see the Distributed Information Group wiki's policy for allowing people to post.

Thursday Jan 03, 2008

Scoble gets thrown off Facebook

picture of current version of Beatnik

Scoble, who became very famous for getting blogging started at Microsoft, got ejected from Facebook for crawling his network of friends. This is the problem with closed social networks and data silos in general. He seems to think the solution is data portability. More than that: the solution is Open Social Networks. You should be able to use a simple web server and just link up to your friends' friend of a friend (foaf) files, whichever service they are using, be it their own machine located in their basement, a service provider, a government owned machine, ... Just as I can link from this blog to any blog. This would allow people to own their piece of the network, like they can own their blogs.

This is what Beatnik, a friend of a friend browser, which I described in this email to the social network portability group, will make it easy for anyone to do.

Everyone is welcome to help on this open source project: artists, documenters, Swing experts, testers, RESTafarians, ...

Wednesday Dec 19, 2007

Hyperdata in Sao Paulo

In the past week I gave a couple of presentations on Hyperdata, illustrating the concept with demos of the Tabulator and Beatnik, the Hyper Address Book I am currently working on.

The first talk I gave at the University of Sao Paulo, which was called at the last minute by Professor Imre Simon, who had led the Yochai Benkler talk the week before. It was a nice turnout of over 20 people, and I spoke at a more theoretical level about the semantic web, how it relates to Metcalfe's law, as explained in more detail in a recently published paper by Prof. James Hendler, and how an application like Beatnik could give a powerful social meaning to all of this. I also looked at some of the interesting problems related to trust and belief revision that come up in a simple application like Beatnik, which struck a chord with Renata Wassermann, who has written extensively in that field of the Semantic Web.
Many thanks to Prof Simon, for allowing me to speak. For a view from the audience see Rafael Ferreira's blog (in English) and Professor Ewout's blog (in Portuguese).

Yesterday I gave a more Java oriented technical talk at GlobalCode, an evening learning center in Sao Paulo, with a J2EE project on dev.java.net. I touched on how one may be able to use OpenId and foaf to create a secure yet open social network.
About 25 people attended which must be a really good turnout for a period so close to Christmas, when everyone is looking forward to the surf board present from Santa Claus, getting into their swimming trunks and paddling off to catch the next big wave. Well the really big wave that everyone in the know will be preparing for is the hyperdata wave. And to catch it one needs to practice one's skills. And a good way to do this is to help out with a simple application like Beatnik.
Thanks to Vinicius and Yara Senger for organising this.

Update

The talk I gave is now available online with audio as "Building Secure, Open and Distributed Social Network Applications".

Thursday Dec 06, 2007

Yochai Benkler: The Wealth of Networks

This afternoon I attended a teleconference at the University of Sao Paulo where Yochai Benkler talked from the Berkman Center for Internet and Society at Harvard, about his now famous book "The Wealth of Networks" (available online) and answered questions from the audience. Yochai talked about the impact of open source and peer to peer modes of co-operative production on economics, politics, arts and education. The book has many excellent and illuminating examples of how massively parallel and distributed use of human resources can outperform large centrally organised Taylorist production methods. He does point out that this won't work in every field of endeavour, but more naturally in knowledge based ones, where the cost of reproduction is close to zero. More details in the freely available book.

The conference was organised by Imre Simon from the Institute of Advanced Studies of the University of Sao Paulo. A web site in Portuguese is dedicated to this talk, and it was broadcast live on the web.

At the end of the talk, as the last question from the floor, I asked what research had been done into applying Metcalfe's law to networks as powerful as the Semantic Web, and how this would affect questions on the wealth of networks. Yochai seemed to think that the Semantic Web was too much about data, and not about people. Of course Beatnik, the semantic address book I am working on right now, is going to show how this dichotomy is completely illusory, and how the distributed, decentralised world of hyperdata should fit perfectly into the central thesis of the book. :-)

Tuesday Aug 28, 2007

My Bloomin' Friends

Closed Social Networks are blossoming all over the place. They provide a semblance of protection, at a price: lock in. Locked into the social network provider you get convenience in the form of tools to make conversation easier (video, email, chat boards, ...), some form of privacy protection (if you trust the provider), introductions to 'like minded' people, and other niceties.

Some of us work in the open air: we have to set standards in public view; we stand by what we say; we accept criticism from wherever it comes; and we can't choose our friends based on their social network provider. We describe ourselves in our foaf files where we can specify what we do, how to contact us, our interests, and links to who we know by pointing to their Universal Identifiers. There is no trouble linking between people who are open in this way. We are happy to reference each other: it strengthens the exposure of our work and the quality of the web. This is how I link to Paul Gearon:

:me foaf:knows  [ = <http://web.mac.com/thegearons/people/PaulGearon/foaf.rdf#me>; 
                  a foaf:Person;
                  foaf:name "Paul Gearon" ] .
I could just point to his URL, but the little extra duplicate information can make life easier for people/robots browsing the data web. It can help people notice inconsistencies and help me correct them.

But not everyone lives in the open the same way, and not everyone wants to make the same amount of information about themselves public. There are a number of different ways to deal with this. I want to discuss a few of them here.

Content Negotiation

How much someone says about themselves is up to them, and so is how they protect their information. The same URL that identifies someone, could return more or less information depending on who is asking. I could set up my foaf file so only friends who log in via openid can see my friends. Others would just get default information about me. I could be even more clever. I could allow any friend of my friend who logs in via their openid to see my full foaf file; others would see information about me, and a select group of open friends. Closed Social Networks could open up by making it convenient to specify these policies, and providing the right infrastructure to do so.
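
The policy described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the requester is taken to have already authenticated with OpenID, the function name choose_view and the example identifiers are made up, and the two sets of known OpenIDs would in practice be gathered from my foaf file and from the foaf files of my friends.

```python
# Hypothetical sketch: decide which version of a foaf file to serve,
# given the OpenID the requester has already authenticated with.

def choose_view(requester_openid, friends, friends_of_friends):
    """Return "full" for friends and friends of friends, "public" otherwise.

    requester_openid is None when the visitor has not logged in.
    """
    if requester_openid is None:
        return "public"
    if requester_openid in friends or requester_openid in friends_of_friends:
        return "full"
    return "public"


# Made-up OpenIDs, for illustration only.
friends = {"http://alice.example/"}
friends_of_friends = {"http://bob.example/"}

print(choose_view(None, friends, friends_of_friends))                       # public
print(choose_view("http://bob.example/", friends, friends_of_friends))      # full
print(choose_view("http://mallory.example/", friends, friends_of_friends))  # public
```

A real server would then return the matching representation from the same URL, so that the identifier stays stable while the amount of detail varies.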

Indirect Identification

By directly identifying someone via a URL (as I do) we can leave a lot of the policy of what they make visible up to them. But those that don't have a foaf name, need to be identified indirectly. We can do that by identifying them via some property such as their blog, their home page, their email address, or their openid. I am very open about my email addresses. They are published and visible to all.
 <http://bblfish.net/people/henry/card#me>     <http://xmlns.com/foaf/0.1/mbox> <mailto:henry.story@bblfish.net> .
I value it more that people can contact me easily - living as I do in the middle of nowhere and often living nowhere in particular - than the pain of spammers. Too many people are lazy about security, using virus filled Windoze computers, obvious passwords, cracked software for me to be under any illusion that hiding my email is going to prevent the bad guys from getting it.

However I can't assume that everyone else will accept me applying this argument to their email address. For this there is a nice mathematical technique: I can hash their email address using the SHA1 hash function. This creates a close to unique string that cannot be reversed: you cannot go from the sha1 sum of an email address back to the email, but you can always calculate the same sha1sum from an email. This is how I identify Simon Phipps, Sun's Open Source Officer:

:me foaf:knows [ a foaf:Person;
                 foaf:mbox_sha1sum "4e377376e6977b765c1e78b2d0157a933ba11167";
                 foaf:name "Simon Phipps";
                 rdfs:seeAlso <http://www.webmink.net/foaf.rdf>
               ].
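
A minimal Python sketch of how such a value can be computed. It assumes that the hash is taken over the full mailto: URI, as the FOAF vocabulary describes for foaf:mbox_sha1sum, and that addresses are lowercased first (the convention suggested in note 1). The function name and the example address are made up for illustration.

```python
import hashlib


def mbox_sha1sum(email):
    """Return the hex SHA1 of the mailto: URI for an email address."""
    # Lowercasing is an assumed convention, not something SHA1 requires.
    uri = "mailto:" + email.strip().lower()
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


# The same address always gives the same 40 hex digit string,
# but the string cannot be turned back into the address.
print(mbox_sha1sum("someone@example.org"))
print(mbox_sha1sum("Someone@Example.org") == mbox_sha1sum("someone@example.org"))  # True
```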

If you know Simon's email, then you will know that I know him. "What use is that?" I can hear someone ask. It's all about Working with People on the Internet. Imagine you are reading email on a newsgroup with a foaf enabled mail tool linked to a foaf enabled Address Book (such as Beatnik). You come on an email by Simon saying something interesting about how Sun has changed its stock ticker to JAVA for example. My logo and perhaps that of a couple of other people appears on the mail reader in a way that indicates to you that we know Simon. The post is no longer anonymous for you, and so has more trust value. You feel part of a community.[1]

So spammers cannot use that information to spam. Either they already know your email address, and so they are probably already spamming you, or they don't, and this won't help them. They can only [2] learn about social network claims: who claims to know whom. They could use this, it is true, to introduce themselves as an acquaintance of a friend of yours. A bit of a risky strategy that could quickly get them on a black list. Currently being black listed may not be an expensive proposition. But in a cryptographic web of trust this will be both much easier to notice, and more damaging for the infringers.

Fuzzy Identification

I can directly and indirectly identify a lot of people in my Address Book as described above. This is perfectly acceptable for people who have an open life, like I do, and a large portion of the Open Source community, bloggers, standard setters, etc... But on last count I had over 700 people in my AddressBook. It is a lot of work to identify all of them individually, and to decide how much visibility I should give them. I may not even want people to know how many people I know this way. Also I may want deniability: there are people one may know, but one may not want to highlight that, and one may want to be able to deny that one knows them to some people. The foaf:mbox_sha1sum gives me a way to identify someone, but if some nosy person comes to me and asks me about that person's life after having identified the corresponding email address, there is no escape route other than refusing any conversation, which by itself can easily be taken to be significant. What we need is a way to fuzzily identify a group of one's acquaintances.

Bloom Filters

This is what Bloom Filters enable one to do. Originally used in times when memory was expensive, they allowed the whole vocabulary of a language to be condensed into a reasonably short string. Here we can use one to group all the email addresses of our friends together in one opaque string. I could express this as follows in RDF (bear in mind that the rdf vocabulary has not been settled on):
:me foaf:bloomMbox [ a bloom:Bloom;
                     bloom:base64String """
            IAOgQgSAAAICCAADAoQgDABAAiQKgIABgyAIBEhAAAAIUKBACCYAABAAaEkGQAGIEAHRUAgAAQUw 
            hCgwACJNQxQAAggAgCIgAAAAKgICEKAAAABCQiB0JCAAAIkgDASAYiAAAEIQAAIAABDCEAZACOpA 
            ICEEMAGAEGEAxIA=""";
                     bloom:hashNum 4;
                     bloom:length 1000 ] .

Given the above Bloom, someone can query it with an email address by applying the same hash functions, and the Bloom will answer either that I may know that person, or that the address was never added to it. The loaf project explains some of the advantages of having this in more detail.
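
To make the mechanism concrete, here is a small Bloom filter sketch in Python. It is not the code behind the applet, and its strings will not be compatible with the Bloom shown above: the way the hash positions are derived (SHA1 over a counter byte plus the lowercased address) and the bit ordering are assumptions chosen for brevity. The length and hash_num parameters mirror bloom:length and bloom:hashNum, and the example address is made up.

```python
import base64
import hashlib


class Bloom:
    """A fixed-size bit set filled through several hash functions."""

    def __init__(self, length=1000, hash_num=4):
        self.length = length        # number of bits in the filter
        self.hash_num = hash_num    # number of hash positions per address
        self.bits = bytearray((length + 7) // 8)

    def _positions(self, email):
        # Assumed scheme: one SHA1 per counter value, reduced modulo length.
        key = email.strip().lower().encode("utf-8")
        for i in range(self.hash_num):
            digest = hashlib.sha1(bytes([i]) + key).digest()
            yield int.from_bytes(digest, "big") % self.length

    def add(self, email):
        for pos in self._positions(email):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, email):
        # True means "possibly added"; False means "certainly not added".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(email))

    def to_base64(self):
        return base64.b64encode(bytes(self.bits)).decode("ascii")


mine = Bloom(length=1000, hash_num=4)
mine.add("alice@example.org")
print(mine.might_contain("Alice@Example.org"))  # True: added above, case is ignored
print(mine.to_base64())
```

Testing an address uses exactly the same hash positions as adding it. A positive answer can be a coincidence of bits set by other addresses, and that is where the deniability comes from.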

The best way to get a feel for how it works is to try it. Here I have written a little java applet [3] that allows you to test my Bloom for people I know, and to create your own bloom [4].

Some emails you can try with positive results are tbray at textuality dot C O M, or bill at dehora dot net (suitably transformed of course). The applet lowercases all email addresses when creating and when testing the bloom.

To create your own bloom just click the "Create Bloom" tab. An easy way to extract all your email addresses from an OSX Address Book is to run the following on the command line:

hjs@bblfish:0$ osascript -e 'tell application "Address Book" to get the value of every email of every person' | perl -pe 's/,+ /\n/g' | sort | uniq | pbcopy

You should now be able to paste the list of all your contacts in the applet. To restrict the Addresses to one of your groups, named "foaf" for example, replace the relevant section above with tell application "Address Book" to get the value of every email of every person in foaf.

You will need to choose the number of hashes and the maximal size of the bucket you wish to fill. The greater the number of hashes and the greater the size of the bucket, the more precision you get and the less deniability.[5]
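
The trade-off can be estimated with the usual approximation for the false positive rate of a Bloom filter, (1 - e^(-kn/m))^k, where k is the number of hashes, m the number of bits and n the number of addresses added. The sketch below assumes ideal, independent hash functions, and the figure of 100 contacts is made up. One caveat: under this approximation a larger bucket always gives more precision, but adding hashes only helps up to roughly (m/n) ln 2 of them, after which the filter fills up and precision drops again.

```python
import math


def false_positive_rate(num_hashes, num_bits, num_items):
    """Approximate chance that an address never added still tests positive."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


# 4 hashes and 1000 bits, as in the Bloom above.
print(false_positive_rate(4, 1000, 100))   # about 1%: precise, little deniability
print(false_positive_rate(4, 1000, 700))   # close to 78%: very deniable, not very useful
```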

Conclusion

None of the above tools are by themselves the complete solution for creating an Open Social Network that will satisfy everyone. But for people willing to live in the open, the correct and astute use of them should satisfy most people's requirements. Access Control on URLs can make it possible to reveal more or less information depending on who is looking; indirect identification can allow one to name people even without direct identification; sha1sums allow one to partially hide sensitive identifying information; and Blooms allow one to make fuzzy statements of set membership. All of these can be combined in different ways. So one can make statements about sha1sum identified people on the open web, or one can do so behind an access controlled file that only friends logged in with OpenId can see. There are bound to be more fun things to be discovered here. But this should make clear just how much can be done in this space.

Notes

  1. For the link from email addresses to sha1sums to work, it helps to canonicalise the emails to all lowercase. This should probably be made more explicit in the foaf:mbox_sha1sum definition.
  2. "They can 'only' learn about social network claims", is quite a lot more than some people are willing to accept. See the article by Mark Wahl "Organizing principles for identity systems: Attacks on anonymized social networks and fudging oracles" which contains some very good pointers. For people who want to retain complete anonymity, and this is what people subscribe to when they answer public surveys, any leakage of information is too much leakage. The problem is that because of Metcalfe's Law it is nearly impossible to stop information combining itself: Information wants to be linked. So I think, when we are not tied to stringent laws, we should accept this rather than fight it, and use it to our advantage when hunting down spammers: the law holds for them too.
  3. You can get the source code for the applet on the so(m)mer repository in the misc/Bloom subdirectory. I used the pt.tumba.spell.BloomFilter class which I adapted a little for my needs. This was just the first one I found out there. It is probably not the most efficient one, as it uses an array of booleans, when it could use a byte array. If you know of other libraries please let me know.
    The code was put together really quickly and may well contain bugs. Feedback and patches and contributions are welcome.
  4. the advantage of Java Applets over server side code is really obvious here:
    • I don't need a server with a fixed port number to show you this
    • someone can't easily start a denial of service attack to bring the server down
    • Your email addresses never leave your computer, so there is no fear of loss of privacy.
    On the last point it would be nice if browser vendors made it easier to get info about the exact restrictions a Java Applet had. I would like to be able to click on an Applet and verify or set it to "no network communication whatever". This would increase trust even more in cases like this.
  5. More info on the loaf site. Apparently one needs more than 1/4 deniability if one is to preserve some measure of privacy, according to the paper "the price of privacy and the limits of LP decoding" by Cynthia Dwork, Frank McSherry and Kunal Talwar (Microsoft Research) who suggest that
    ... any privacy mechanism, interactive or non-interactive, providing reasonably accurate answers to a 0.761 fraction of randomly generated weighted subset sum queries, and arbitrary answers on the remaining 0.239 fraction, is blatantly non-private.
    Thanks again to Mark Wahl for these references.
  6. Thanks a lot to Dan Brickley for working together with me on this last Friday, and pointing me to much of the important work done here. Dan also wrote a little python script to do something similar. Some of the sites I came across during our discussion: Not having studied bloom filters in detail, I am not sure how compatible the blooms of each of these libraries are. The super simple ruby bloom library does not seem to specify the number of hashes that were used to create a Bloom.
  7. Nick Lothian reminded me in a comment to this that he has written a Bloom Filter demo for facebook. I don't have a facebook account (because I am already on LinkedIn, and I can't really be bothered to move all my information, and because I don't like closed networks), so I was not able to use it. Perhaps I should get a facebook account just for this... Let me know.

Thursday Aug 09, 2007

cryptographic web of trust

As our identity moves more and more onto the Web our ability to have people trust that what we write has not been altered is becoming increasingly important. As our home page becomes our OpenId, linking to our CV, blogs and foaf information, it will become more important for services to be able to trust that the information they see really is what I wrote. Otherwise the following scenario can become all too easy to imagine. Someone breaks into my web server and changes the OpenId link on my home page to point to a server they control. They then go and post comments around the web using my openid as identity. Their server of course always authenticates them. People who receive the posts then click on the OpenId URL which happens to be my home page, read information about me, and seeing that information trust that the comment really came from me.

What is needed is some way to increase the methods people can have to trust information I state. Here I describe how one can use cryptography to increase that trust level. Starting from my creation of a PGP key, I show how I can describe my public signature in my foaf file, use PGP to sign that file and then link to the signature so that people can detect a tampered file. From this basis I show how one can build a very solid cryptographically enhanced web of trust.

Creating your PGP public key

After reading the first part of "SafariBooks Online, and feeling comfortable that I understood the basics, I decided it was high time for me to create myself a Public PGP key. Reading the GPG Manual and a few other HOWTOs on the web, using the gnu GPG library, I managed to make myself one quite easily.

Linking to the Public Key from your foaf file

Having this it was just a matter of placing it on my web site and using the Web Of Trust ontology developed by Dan Brickley to point to it and describe it, with the following triples:

@prefix wot: <http://xmlns.com/wot/0.1/> .
@prefix : <http://bblfish.net/people/henry/card#> .

:me  is wot:identity of [ a wot:PubKey;
                        wot:pubkeyAddress <http://bblfish.net/people/henry/henry.pubkey.asc>;
                        wot:fingerprint "0DF560B5DADF6D348CC99EA0FD76F60D4CAE10D7";
                        wot:hex_id "4CAE10D7";
                        wot:length 1024 ] .

The is wot:identity of construct is a nice N3 short hand for referring to the inverse relation of wot:identity, without having to name one. This states that my public key can be found at http://bblfish.net/people/henry/henry.pubkey.asc, and describes its fingerprint, key length and its hex id. I am not sure why the wot:PubKey resource has to be a blank node, and can't be the URL of the public key itself, which would make for the following N3:

:me  is wot:identity of [ = <http://bblfish.net/people/henry/henry.pubkey.asc>;
                          a wot:PubKey;
                          wot:fingerprint "0DF560B5DADF6D348CC99EA0FD76F60D4CAE10D7" ] .

Perhaps simply because it is quite likely that one would want to put copies of one's public key in different places? owl:sameAs could have done the trick there too though...

Signing your foaf file

Anyway, once that is done, I want to be able to sign my foaf file. Of course it would be pretty tricky to sign the foaf file and put the signature into the foaf file simultaneously, as that would change the content of the foaf file and make the signature invalid. So the easiest solution is to simply have the foaf file point to the signature with something like this [1]

<> wot:assurance <card.asc> .

The problem with this solution is that my foaf file at http://bblfish.net/people/henry/card currently returns two different representations: an rdf/xml one and an N3 one, depending on how it is called. (More on this in I have a Web 2.0 name!). Now a signature is valid only for a sequence of bits, and the rdf sequence of bits is different from the N3 sequence, so they can't both have the same sig. There is a complicated solution developed by Jeremy Carroll in his paper Signing RDF Graphs, which proposes to sign a canonicalised RDF graph. The problem is that the algorithm to create such a graph takes some time to compute and only works on a subset of all RDF graphs, but mostly that no software currently implements that algorithm.
Luckily there is a simple solution, which I got by inspiration from my work on creating an ontology for Atom, Atom OWL, and that is to link my card explicitly to its alternate representations (the <link type="alternate" href="...">) [2] and to sign those alternate representations. This gives me the following triples:

@prefix iana: <http://www.iana.org/assignments/relation/> .
@prefix awol: <http://bblfish.net/work/atom-owl/2006-06-06/#> .

<http://bblfish.net/people/henry/card>   a foaf:PersonalProfileDocument;
     iana:alternate <http://bblfish.net/people/henry/card.rdf>,
                    <http://bblfish.net/people/henry/card.n3> .

<http://bblfish.net/people/henry/card.rdf>
       wot:assurance <http://bblfish.net/people/henry/asc.card.rdf.asc> ;
       awol:type "application/rdf+xml" .
<http://bblfish.net/people/henry/card.n3>
       wot:assurance <http://bblfish.net/people/henry/asc.card.n3.asc> ;
       awol:type "text/rdf+n3" .

Here I am saying that my card has two alternate representations, an rdf and an n3 one, what their mime type is, and where I can find the signature for each. Simple.
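
Why each representation needs its own signature can be seen with a short Python sketch. A signature is computed over the exact bytes of a file, so two serialisations of the same statement, which differ byte for byte, cannot share one. The two snippets below are made-up miniature files, SHA1 merely stands in for whatever digest the signing tool uses, and the lookup table only restates the wot:assurance links given above.

```python
import hashlib

# The same statement written in N3 and in RDF/XML (illustrative content only).
N3 = b'@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n<#me> foaf:name "Henry Story" .\n'
RDFXML = (
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n'
    b'         xmlns:foaf="http://xmlns.com/foaf/0.1/">\n'
    b'  <rdf:Description rdf:about="#me">\n'
    b'    <foaf:name>Henry Story</foaf:name>\n'
    b'  </rdf:Description>\n'
    b'</rdf:RDF>\n'
)


def digest(data):
    """Hex digest of the raw bytes, standing in for what gets signed."""
    return hashlib.sha1(data).hexdigest()


# One signature location per mime type, as in the triples above.
ASSURANCE = {
    "application/rdf+xml": "http://bblfish.net/people/henry/asc.card.rdf.asc",
    "text/rdf+n3": "http://bblfish.net/people/henry/asc.card.n3.asc",
}


def signature_url_for(mime_type):
    return ASSURANCE[mime_type]


print(digest(N3) == digest(RDFXML))       # False: different bytes, different digest
print(signature_url_for("text/rdf+n3"))
```

A client that fetched the N3 representation would then fetch the signature listed for text/rdf+n3 and verify it against the bytes it actually received.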

So if I put all of the above together I get the following extract from my N3 file:

@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix wot: <http://xmlns.com/wot/0.1/> .
@prefix awol: <http://bblfish.net/work/atom-owl/2006-06-06/#> .
@prefix iana: <http://www.iana.org/assignments/relation/> .
@prefix : <http://bblfish.net/people/henry/card#> .

<http://bblfish.net/people/henry/card>   a foaf:PersonalProfileDocument;
     foaf:maker :me;
     foaf:title "Henry Story's FOAF file";
     foaf:primaryTopic :me ;
     iana:alternate <http://bblfish.net/people/henry/card.rdf>,
                    <http://bblfish.net/people/henry/card.n3> .

<http://bblfish.net/people/henry/card.rdf>
       wot:assurance <http://bblfish.net/people/henry/asc.card.rdf.asc> ;
       awol:type "application/rdf+xml" .
<http://bblfish.net/people/henry/card.n3>
       wot:assurance <http://bblfish.net/people/henry/asc.card.n3.asc> ;
       awol:type "text/rdf+n3" .

:me    a foaf:Person;
       foaf:title "Mr";
       foaf:family_name "Story";
       foaf:givenname "Henry";
       foaf:openid <http://openid.sun.com/bblfish> ;
       foaf:openid <http://bblfish.videntity.org/> ;
       is wot:identity of [ a wot:PubKey;
                            wot:pubkeyAddress <http://bblfish.net/people/henry/henry.pubkey.asc>;
                            wot:fingerprint "0DF560B5DADF6D348CC99EA0FD76F60D4CAE10D7";
                            wot:hex_id "4CAE10D7";
                            wot:length 1024 ];

Which can be graphically represented as follows:

Building a web of Trust

From here it is easy to see how I can use the wot ontology to sign other files, link to my friends' public signatures, sign their public signatures, how they could sign mine, etc, etc. and thereby create a cryptographically enhanced Web of Trust built on decentralised identity. Of course this is still a little complicated to put together by hand, but it should be really easy to automate, by incorporating it into the Beatnik Address Book for example. To illustrate this I will show how I linked up to Dan Brickley's and Tim Berners Lee's signatures.

Dan Brickley has a foaf file which links to his public key and to the signature of his file. The relevant part of the graph is:

   <http://danbri.org/foaf.rdf>     a foaf:PersonalProfileDocument;
               wot:assurance <http://danbri.org/foaf.rdf/foaf.rdf.asc> .

   <http://danbri.org/foaf.rdf#danbri> 
               foaf:pubkeyAddress <http://danbri.org/danbri-pubkey.txt>

(foaf:pubkeyAddress is a relation that is not defined in foaf yet. Dan, one of the co-creators of foaf, is clearly experimenting here) Now with this information I can download Danbri's rdf, the public key and the signature and test that the document has not been tampered with. On the command line I do it like this:

bblfish$ curl http://danbri.org/danbri-pubkey.txt > danbri.pubkey.asc
bblfish$ gpg --import danbri.pubkey.asc
bblfish$ curl http://danbri.org/foaf.rdf > danbri.foaf.rdf
bblfish$ curl http://danbri.org/foaf.rdf.asc > danbri.foaf.rdf.asc
bblfish$ gpg --verify danbri.foaf.rdf.asc danbri.foaf.rdf

Of course this assumes that someone has not broken into his server and changed all those files. As it happens I chatted with Dan over Skype (which I got from his foaf file!) and he sent me his public key that way. It certainly felt like I had Dan on the other side of the line, so I trust this public key enough. But why not publish what public key I am relying on? Then Dan and others can know what signature I am using and correct me if I am wrong. So I added the following to my foaf file

:me foaf:knows    [ = <http://danbri.org/foaf.rdf#danbri>;
                    a foaf:Person;
                    foaf:name  "Dan Brickley";
                    is wot:identity of
                          [ a wot:PubKey;
                            wot:pubkeyAddress <http://danbri.org/danbri-pubkey.txt>;
                            wot:hex_id "B573B63A" ]
                  ] .

In the above I am linking to Dan's public key on his server, the one that might have been compromised. Note that I specify the wot:hex_id of the public key though. Finding another public key with the same hex would be tremendously difficult I am told. But who knows. I have not done the maths on this. But I can make it even more difficult by signing his public key with my key, and placing that signature on my server.

bblfish$ gpg -a --detach-sign danbri.pubkey.asc

Then I can make that signature public by linking to it from my foaf file

<http://danbri.org/danbri-pubkey.txt> wot:assurance <danbri.pubkey.asc.asc> .
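
On the question of how hard it would be to find another key with the same hex id, a little arithmetic helps. In the description of my own key earlier, the wot:hex_id is simply the last eight hex digits of the wot:fingerprint, which is how short OpenPGP key ids are formed. Eight hex digits are 32 bits, so there are about 4.3 billion possible ids, against 2^160 possible fingerprints. The Python sketch below just checks this; the helper names are made up.

```python
def short_key_id(fingerprint):
    """Return the last 8 hex digits of a fingerprint, i.e. the short key id."""
    return fingerprint.replace(" ", "").upper()[-8:]


def id_space(hex_digits):
    """Number of distinct values that fit in the given number of hex digits."""
    return 16 ** hex_digits


MY_FINGERPRINT = "0DF560B5DADF6D348CC99EA0FD76F60D4CAE10D7"

print(short_key_id(MY_FINGERPRINT))   # 4CAE10D7, the wot:hex_id given earlier
print(id_space(8))                    # 4294967296, i.e. 2**32
print(id_space(40) == 2 ** 160)       # True
```

Matching a 32 bit id is therefore far less demanding than matching a full fingerprint, which suggests that publishing the wot:fingerprint of a friend's key, as I did for my own, gives readers a much stronger check than the hex id alone.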

Now people who read my foaf file will know how I verify Dan's information, and detect if something does not fit between what I say and what Dan says. If they then do the same when linking to my foaf file - that is mention in it where one is to find my public key and its hash, and sign my public key with their signature, placing that on their server - then anyone who would want to compromise any of us consistently to the outside world, would have to compromise all of the information on each of our servers consistently. As more people are added to the network, and link up to each other, the complexity of doing this grows exponentially. As there are more ways to tie information together there are more ways people can find the information, so the information becomes more valuable (as described in RDF and Metcalf's Law), and there are more ways to find inconsistencies, thereby making the information more reliable, thereby making it more valuable, and so on, and so on...

One can even make things more complicated for a wannabe hacker by placing other people's public keys on one's server, thereby duplicating information. Tim Berners Lee does not link to his public key, but I got it over irc, and published it on my web server. Then I can add that information to my foaf file:

:me foaf:knows [ = <http://www.w3.org/People/Berners-Lee/card#i>;
                    a foaf:Person;
                    foaf:name "Tim Berners Lee";
                    is wot:identity of
                           [ a wot:Pubkey;
                             wot:pubkeyAddress <timbl.pubkey.asc> ;
                             wot:hex_id "9FC3D57E" ];
                  ] .

By making public what public keys I use, I get the following benefits:

  • people can contact me and let me know if I am being taken for a ride,
  • it helps them locate public keys
  • the metadata associated with a public key grows
  • more people link into the cryptography network, making it more likely that more tools will build into this system

Encrypting parts of one's foaf file

Now that you have my public key you can send me encrypted mail or other files using it, verify signatures I place around the web, and read files I encrypt with my private key. Of the files that I can encrypt, one interesting one is my own foaf file. Of course encrypting the whole foaf file is not so helpful, as it breaks down the web of trust piece. But there may well be pieces of a foaf file that I would like to encrypt for only some people to see. This can be done quite easily as described in "Encrypting Foaf Files" [3]. Note that the encrypted file can be encrypted for a number of different users simultaneously. This would be simple to do, and would be of great help in making current practice available in an intelligent and coherent way to the semantic web. One thing I could do is encrypt it for all of my friends whose public keys I know. I am not sure what kind of information I would want to do this with, but well, it's good to know it's possible.

Notes

  1. This is how "PGP Signing FOAF Files" describes the procedure.
  2. The iana alternate (http://www.iana.org/assignments/relation/) relation is not dereferenceable. It would be very helpful if iana made ontologies for each of their relations available.
  3. Thanks to Stephen Livingstone for pointing that way in a reply to this blog post on the openid mailing list.
  4. A huge number of resources on the subject can be found on the Semantic Web Trust and Security Resource Guide, which I found via the Wikipedia Web of Trust page.
  5. This can be used to build a very simple but powerful web authentication protocol, as described in my March 2008 article RDFAuth: sketch of a buzzword compliant authentication protocol.

Friday Jul 20, 2007

foaf and openid

My Sun OpenId is helping me use many services I would not have used before. For example I have started using DZone which is a service like DIGG in that it allows one to vote for interesting stories on the web. But unlike DIGG, I don't have to go through the rigmarole of setting up a new account, waiting for an email, replying to the email, remembering one more password which I have to look up in my keychain anyway, etc, etc...

From my short experience I have identified some simple ways one can improve the user experience. Currently for example all the server knows about me is my openId URL. That makes for an impersonal experience, as you can see from this comment I posted:

I am identified as "openid.sun.com/bblfish" and there is no icon to represent me. If I want a more personal experience I need to register! Which means just entering my name, an email address and a few passwords. Ouch! So we are back to pre-openid land. One more password to enter, and to remember...

Luckily there is an obvious and easy fix to this. My openid http://openid.sun.com/bblfish should not just return a representation that contains a link to the openid server

<link rel="openid.server" href="https://openid.sun.com/openid/service" />

but also a link to a representation that contains more information about me, which would be my foaf file. This could be done very simply by growing the header of my openid html by one line, as specified by the foaf FAQ:

<link rel="openid.server" href="https://openid.sun.com/openid/service" />
<link rel="meta" type="application/rdf+xml" title="FOAF" href="http://bblfish.net/people/henry/card"/>

which is what videntity.org has been doing since 2005 [1], and openid.org has been providing since early July [2]. All that would then be needed is for dzone to read the foaf file pointed to, and extract the name, email and logo of the person described in the foaf file with the same openid. This could be done with a simple SPARQL query such as

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?mbox ?logo ?nick
WHERE {
        ?p foaf:openid <http://openid.sun.com/bblfish>.
     OPTIONAL { ?p foaf:mbox ?mbox } .
     OPTIONAL { ?p foaf:logo ?logo } .
     OPTIONAL { ?p foaf:nick ?nick } .
}

If you save the above to a file - say openid.sparql - you can run it on the command line using the python cwm script like this:

hjs@bblfish:2$ cwm http://bblfish.net/people/henry/card --sparql=./openid.sparql 
#   Base was: http://bblfish.net/people/henry/card
     @prefix : <http://www.w3.org/2000/10/swap/sparqlCwm#> .
    {
        "bblfish"     :bound "nick" .
        </pix/bfish.large.jpg>     :bound "logo" .
        <mailto:henry.story@bblfish.net>     :bound "mbox" .

        }     a :Result .
    {
        "bblfish"     :bound "nick" .
        </pix/bfish.large.jpg>     :bound "logo" .
        <mailto:henry.story@gmail.com>     :bound "mbox" .

        }     a :Result .
    {
        "bblfish"     :bound "nick" .
        </pix/bfish.large.jpg>     :bound "logo" .
        <mailto:henry.story@sun.com>     :bound "mbox" .

        }     a :Result .

That's how simple it is! [3]
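For a consumer such as dzone that has no SPARQL engine at hand, the same lookup can be sketched in a few lines of plain Python. This is only an illustration: the triples are hand-copied from the cwm output above rather than parsed from the foaf file, the subject is assumed to be http://bblfish.net/people/henry/card#me, and profile_for_openid is a made-up name.

```python
# Sketch: what the SPARQL query does, over an in-memory list of triples.
# The triples below are hand-copied from the cwm output; a real consumer
# would parse them out of the foaf file with an RDF library instead.

FOAF = "http://xmlns.com/foaf/0.1/"
ME = "http://bblfish.net/people/henry/card#me"   # assumed subject

TRIPLES = [
    (ME, FOAF + "openid", "http://openid.sun.com/bblfish"),
    (ME, FOAF + "nick", "bblfish"),
    (ME, FOAF + "logo", "http://bblfish.net/pix/bfish.large.jpg"),
    (ME, FOAF + "mbox", "mailto:henry.story@bblfish.net"),
    (ME, FOAF + "mbox", "mailto:henry.story@gmail.com"),
    (ME, FOAF + "mbox", "mailto:henry.story@sun.com"),
]

def profile_for_openid(triples, openid):
    """Collect nick, logo and mbox values of whoever has the given openid."""
    people = {s for s, p, o in triples if p == FOAF + "openid" and o == openid}
    profile = {"nick": set(), "logo": set(), "mbox": set()}
    for s, p, o in triples:
        if s in people and p.startswith(FOAF):
            name = p[len(FOAF):]
            if name in profile:      # plays the role of the OPTIONAL parts
                profile[name].add(o)
    return profile

profile = profile_for_openid(TRIPLES, "http://openid.sun.com/bblfish")
print(sorted(profile["mbox"]))
```

Unlike the SPARQL result, which repeats the nick and logo once per mailbox, this sketch groups the values into sets, which is closer to what a page showing a name and an icon would need.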

For those who are still trying to keep their info private, one could add some content negotiation mechanism to the serving of the foaf file, such that depending on the authentication level of the requestor (dzone in this case), the server would return more or less information. If dzone could somehow show, on requesting my foaf file, that I had authenticated them (and that should not be difficult to do, since I just gave them some credentials), I could give them more information about me. How much information exactly could be decided in the same box that pops up when I have to enter the password for the service... A few extra checkboxes on that form could ask me if I want to allow a full, partial or minimal view of my foaf relations. Power users with more time on their hands could even decide on a relation by relation basis.

Notes

[1]
Videntity.org works nicely, and can even import all the information from an existing foaf file! I would rather they give me the option to link to my original foaf file, which I am maintaining, rather than create yet another one on their server. Their foaf creates bnode urls, which makes me a little nervous (The only bnode url that makes me smile is Benjamin Nowack's). Also there is a bug in their foaf file, in that they have given me a URL which makes me both a foaf:Person and a foaf:Document. foaf does specify that there is nothing in the intersection of those sets. Does this make me a Buddhist?
[2]
Sadly I have not been able to use that openid.org account to log into anything yet. There seems to be a bug in their windows service. Their foaf file returns nearly no information at present and is incomplete. But the idea is good.
[3]
Here cwm returns an N3 representation. SPARQL servers usually can return both a simple XML and a simple JSON representation. Those working with a programming library will skip the serialization step and end up directly with a collection of solution objects that can be iterated through.

Tuesday Feb 13, 2007

mSpace: web 2.0 meets web 3.0 meets iTunes

Have you ever found the category browsing of iTunes to be a little limited? If so you have to try out mSpace, a Web 2.0 music browser, but also a whole new way of thinking about exploring relational data. Before reading any further just try it out!

So what does that mSpace application do? If you have used iTunes and you view it in Browser mode by hitting ⌘B for example, you will have noticed that you are only confronted with three selection panes titled "Genre", "Artist" and "Album". You can't add any more, nor can you re-arrange them. Well the default is good as a default, but if you like to listen to classical music, then you may find that constraining your search by "Artist" is really not quite as interesting as constraining it by "Composer". So really you would like to have three columns "Genre", "Composer", "Album". This is what mSpace allows you to do. Not only that, but you can add any number of other columns and rearrange these columns any way you want by dragging and dropping them. You can then use this to search the information space the way that makes most sense to you.

As interesting as the UI is the theory behind it. Based on some four-year-old Semantic Web research (see their papers), this recent implementation makes all the points in an instant. For a detailed description of the thinking behind this it is worth reading "Applying mSpace Interfaces to the Semantic Web", which gives a Description Logic basis for their work (Description Logic being, in short, an Object Oriented declarative logical formalism).

A Java version, called jSpace is being implemented by Clark and Parsia. Looks like one just would need to resuscitate the work on jTunes and presto, one could have something a lot more interesting than iTunes, that worked on all platforms. The theory behind this is certainly going to be really useful to help me implement Beatnik.


Friday Feb 09, 2007

Beatnik: change your mind

Some people lie, sometimes people die, people make mistakes: one thing's for certain you gotta be prepared to change your mind.

Whatever the cause, when we drag information into the Beatnik Address Book (BAB) we may later want to remove it. In a normal Address Book this is straightforward. Every vcard you pick up on the Internet is a file. If you wish to remove it, you remove the file. Job done. Beatnik though does not just store information: Beatnik also reasons.

When you decide that information you thought was correct is wrong, you can't just forget it. You have to disassociate yourself from all the conclusions you drew from it initially. Sometimes this can require a lot of change. You thought she loved you, so you built a future for you two: a house and kids you hoped, and a party next week for sure. Now she's left with another man. You'd better forget that house. The party is booked, but you'd better rename it. She no longer lives with you, but there with him. In the morning there is no one scurrying around the house. This is what the process of mourning is all about. You've got the blues.

Making up one's mind

Beatnik won't initially reason very far. We want to start simple. We'll just give it some simple rules to follow. The most useful one perhaps is to simply work with inverse functional properties.

This is really simple. In the friend of a friend ontology the foaf:mbox relation is declared as being an InverseFunctionalProperty. That means that if I get the graph at http://eg.com/joe I can add it to my database like this.

If I then get the graph at http://eg.com/hjs

I can then merge both graphs and get the following

Notice that I can merge the blank nodes in both graphs because they each have the same relation foaf:mbox to the resource mailto:henry.story@sun.com. Since there can only be one thing that is related to that mbox in that way, we know they are the same node. As a result we can learn that :joe knows a person whose home page is http://bblfish.net, and that this same person foaf:knows :jane. Neither of those relations was known (directly) beforehand.

Nice. And this is really easy to do. A couple of pages of Java code can work through this logic, add the required relationships and merge the required blank nodes.
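The merge can be sketched as follows, in Python rather than Java for brevity. The two graphs are assumptions reconstructed from the description above, not the original figures, and the names http://eg.com/joe#joe, http://eg.com/hjs#jane and smush are made up for the example. Blank nodes are written as strings starting with "_:" and are assumed to carry different labels in each graph.

```python
# Sketch: merging blank nodes that share an inverse functional property value.
# The two graphs are assumptions reconstructed from the prose; blank nodes
# are written as strings starting with "_:".

FOAF = "http://xmlns.com/foaf/0.1/"
INVERSE_FUNCTIONAL = {FOAF + "mbox"}

JOE_GRAPH = [
    ("http://eg.com/joe#joe", FOAF + "knows", "_:a"),
    ("_:a", FOAF + "mbox", "mailto:henry.story@sun.com"),
    ("_:a", FOAF + "homepage", "http://bblfish.net"),
]
HJS_GRAPH = [
    ("_:b", FOAF + "mbox", "mailto:henry.story@sun.com"),
    ("_:b", FOAF + "knows", "http://eg.com/hjs#jane"),
]

def smush(triples):
    """Rewrite nodes sharing an inverse functional property value to one node.

    Single pass only: chains of merges across several properties are not
    followed, which is enough for this two-graph example.
    """
    canonical = {}   # (property, value) -> first subject seen with it
    rename = {}      # later subject -> canonical subject
    for s, p, o in triples:
        if p in INVERSE_FUNCTIONAL:
            first = canonical.setdefault((p, o), s)
            if first != s:
                rename[s] = first

    def fix(node):
        return rename.get(node, node)

    return {(fix(s), p, fix(o)) for s, p, o in triples}

merged = smush(JOE_GRAPH + HJS_GRAPH)
for triple in sorted(merged):
    print(triple)
```

After the merge the node known to joe is also the one that knows jane, which is the new relation discussed above.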

Changing one's mind

The problem comes if I ever come to doubt what Joe's foaf file says. I could not simply remove all the relations that spring from or reach the :joe node, since the relation that Henry knows :jane is not directly attached to :joe, and yet part of what led to it came from Joe's foaf file.

Not trusting :joe's foaf file may be expressed by adding a new relation <http://eg.com/joe> a :Falsehood . to the database. Since doing this forces changes to the other statements in the database, we have what is known as non-monotonic reasoning.

To allow the removal of statements and the consequences those statements led to, an rdf database has to do one of two things:

  • If it adds the consequences of every statement to the default graph (the graph of things believed by the database) then it has to keep track of how these facts were derived. Removing any statement will then require searching the database for statements that relied on it and nothing else in order to remove them too, given that the statement that one is attempting to remove is not itself the consequence of some other things one also believes (tricky). This is the method that Sesame 1.0 employs and that is described in “Inferencing and Truth Maintenance in RDF Schema - a naive practical approach” [1]. The algorithm Jeen and Arjohn develop shows how this works with RDF Schema, but it is not very flexible. It requires the database to use hidden data structures that are not available to the programmer, and so for example in this case, where we want to do Inverse Functional property deductions, we are not going to be easily able to adapt their procedure to our needs.
  • Not to add the consequences of statements to the database, but to do Prolog like backtracking when answering a query over only the union of those graphs that are trusted. So for example one could ask the engine to find all People. Depending on a number of things, the engine might first look if there are any things related to the class foaf:Person. It would then look at all things that were related to subclasses of foaf:Person if any. Then it may look for things that have relations that have domains that are foaf:Person such as foaf:knows for example. Finally with all the people gathered it would check whether any of them were the same.
    All this could be done by trying to apply a number of rules to the data in the database in attempting to answer the query, in a Prolog manner. Given that Beatnik has very simple views on the data it is probably simple enough to do this kind of work efficiently.

So what is needed to do this well is the following:

  • notion of separate graphs/context
  • the ability to easily union over graphs of statements and query the union of those easily
  • defeasible inferencing or backtracking reasoning
  • flexible inferencing would be best. I like N3 rules where one can make statements about rules belonging to certain types of graphs. For example it would be great to be able to write rules such as: { ?g a :Trusted. ?g => { ?a ?r ?b } } => { ?a ?r ?b } a rule to believe all statements that belong to trusted graphs.
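The first two requirements, together with the idea of believing only what trusted graphs say, can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Jena, Sesame or Mulgara do it: GraphStore and its methods are made-up names, and nothing derived is ever stored, so distrusting a source needs no clean-up.

```python
# Sketch: keep each fetched graph under its source URL, and answer queries
# over the union of the graphs currently trusted. Nothing derived is stored,
# so distrusting a source needs no clean-up.

class GraphStore:
    def __init__(self):
        self.graphs = {}      # source URL -> set of triples
        self.trusted = set()  # source URLs currently believed

    def add(self, source, triples, trusted=True):
        """Store a graph under the URL it was fetched from."""
        self.graphs[source] = set(triples)
        if trusted:
            self.trusted.add(source)

    def distrust(self, source):
        """Stop believing a source; its triples stay stored but unused."""
        self.trusted.discard(source)

    def believed(self):
        """Union of all triples from trusted graphs."""
        union = set()
        for source in self.trusted:
            union |= self.graphs[source]
        return union

store = GraphStore()
store.add("http://eg.com/joe", [("_:a", "foaf:mbox", "mailto:henry.story@sun.com")])
store.add("http://eg.com/hjs", [("_:b", "foaf:knows", ":jane")])
before = len(store.believed())
store.distrust("http://eg.com/joe")
after = len(store.believed())
print(before, after)
```

Any inferencing would then run over believed() at query time, so conclusions that depended on a distrusted graph simply stop being derivable.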

From my incomplete studies (please let me know if I am wrong) none of the Java frameworks for doing this are ideal yet, but it looks like Jena is at present the closest. It has good reasoning support, but I am not sure it is very good yet at making it easy to reason over contexts. Sesame is building up support for contexts, but has no reasoning abilities right now in version 2.0. Mulgara has very foresightedly always had context support, but I am not sure if it has Prolog like backtracking reasoning support.

[1] “Inferencing and Truth Maintenance in RDF Schema - a naive practical approach” by Jeen Broekstra and Arjohn Kampman

Wednesday Feb 07, 2007

foaf enabling an enterprise

Developing the Beatnik Address Book (BAB), I have found, requires the software to implement the following capabilities:

  1. being able to read rdf files and understand foaf and vcard information at the very least. (on vcard see also the Notes on the vcard format).
  2. being able to store the information locally, keeping track of the source of the information so as to be able to merge and unmerge information on user demand
  3. being able to write semantic files out at some unchanging location
  4. an easy to use GUI (see the article on F3).

Today I would like to look at point 3: writing out a foaf file. At its simplest this is really easy. But in an enterprise environment, if one wants to give every employee a foaf file so as to allow Universal Drag and Drop of people between applications inside the firewall, some questions need to be answered.

General Solution

The main solution and the obvious one is just to write a foaf file out to a server using ftp, scp, WebDav or the nascent Atom Protocol. Ftp and scp are a little tricky for the end user, as he would have to understand the relation between the directory structure of the ftp server and its mapping to the web server urls, as well as what is required to specify the mime type of the foaf file, which is very much dependent on the setup of the web server (see what I had to do to enable my personal foaf file). This may end up being a lot of work with a steep learning curve for someone who wishes to just publish their contact information. WebDav on the other hand, being a RESTful protocol, makes it much easier to specify the location of the file. Wherever the file is PUT, that is its name. The same goes for the Atom Protocol, though I am not sure how well either of them copes with arbitrary mime types. My guess is that WebDav should do much better here.
In any case, using either of the above methods one can always later overwrite a previous version if one's address book changes. This is indeed the solution that will satisfy most use cases.

Professional Solution

In a professional setting though, things get to be a little more complicated. Consider for example any large Fortune 500 company. These companies already have a huge amount of information on their employees in their ldap directory. This is reliable and authoritative information, and should be used to generate foaf files for each employee. These companies usually have some web interface to the ldap server which aggregates information about the person in human readable form. Such a web interface - call it a Namefinder - could easily point to the machine generated foaf file.

Now the question is: should this foaf file be read only or read/write? If it is read/write then an agent such as the Beatnik Address Book, could overwrite the file with different information from that stored in ldap, which could cause confusion, and be frowned upon. Well of course the WebDav server could be clever and parse the graph in such a way as to enforce the existence of a certain subgraph. So given the following graph generated from the ldap database

<#hjs> a foaf:Person;
             foaf:mbox <mailto:henry.story@sun.com>;
             foaf:name "Henry Story";
             org:manager <12345#hjs> .

An Address Book that would want to PUT a graph containing the following subgraph

<#hjs> a foaf:Person;
             foaf:mbox <mailto:henry.story@sun.com>;
             foaf:name "Henry Story";
             org:manager <#hjs> .

might

  • get rejected, because the server decides it owns some of the relations, especially the org:manager one in this case. (What HTTP return code should be returned on failure?)
  • or it may decide to rewrite the graph and remove the elements it does not approve of and replace them. That is, replace the triple <#hjs> org:manager <#hjs> with <#hjs> org:manager <12345#bt> for example. (Again what should the HTTP return code be?)
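Both behaviours can be sketched as a single server-side check. This is a minimal illustration under assumptions: triples are plain tuples of strings, the server is taken to own every org:manager triple, check_put and hjs_graph are made-up names, and the question of which HTTP status code to return is deliberately left open, as above.

```python
# Sketch: a server-side check that a PUT graph keeps the triples the server owns.
# Triples are (subject, predicate, object) tuples of strings.

PROTECTED_PREDICATES = {"org:manager"}   # assumption: relations owned by ldap

def check_put(official, submitted, rewrite=False):
    """Return (accepted, graph to store) for a submitted graph.

    official  -- triples generated from ldap
    submitted -- triples the address book wants to PUT
    rewrite   -- False: reject on conflict; True: restore the official triples
    """
    owned = {t for t in official if t[1] in PROTECTED_PREDICATES}
    claimed = {t for t in submitted if t[1] in PROTECTED_PREDICATES}
    if claimed == owned:
        return True, set(submitted)
    if not rewrite:
        return False, set(official)      # stored graph stays unchanged
    kept = {t for t in submitted if t[1] not in PROTECTED_PREDICATES}
    return True, kept | owned

def hjs_graph(manager):
    """The example graph from the text, with a chosen manager."""
    return {
        ("<#hjs>", "rdf:type", "foaf:Person"),
        ("<#hjs>", "foaf:mbox", "<mailto:henry.story@sun.com>"),
        ("<#hjs>", "foaf:name", '"Henry Story"'),
        ("<#hjs>", "org:manager", manager),
    }

OFFICIAL = hjs_graph("<12345#hjs>")   # as generated from ldap
SUBMITTED = hjs_graph("<#hjs>")       # what the address book tries to PUT

print(check_put(OFFICIAL, SUBMITTED)[0])                             # False
print(check_put(OFFICIAL, SUBMITTED, rewrite=True)[1] == OFFICIAL)   # True
```

In the rewriting mode the server silently restores the manager it generated from ldap, so the client should re-read the file after a successful PUT.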

Both of those solutions are valid, but they end up creating a file of mixed ownership. Perhaps it would be better to have the file be read only, officially owned by the company, and have it contain a relation pointing to some other file owned by the user himself. Perhaps something like the following would appear in the file at http://foaf.sun.com/official/294551 :

<>   a foaf:PersonalProfileDocument;
       foaf:maker  <http://www.nasdaq.com/SUNW>;
       lnk:moreInfo </personal/294551> .

</personal/294551> a rights:EditableFile;
                    rights:ownedBy <#hjs> .

That is, in plain English the resource would say that it is a PersonalProfileDocument and that Sun Microsystems is the maker of the file and that more information is available at the resource </personal/294551>. It would also give ownership permissions on that resource. A PROPFIND on each of those files could easily confirm the access rights of each of them.

Now from there it should be possible for the user agent (BAB in this case) to deduce that it has space to write information at </personal/294551>. There it can then write out all the personal information it likes: adding relations to DOAP files, to a personal home page, to interests and to other people known, etc... It could even add a pointer to a public foaf file with a statement such as

<http://foaf.sun.com/official/294551#hjs> owl:sameAs <http://bblfish.net/people/henry/card#me> .

Multiple Agent Problems

Having solved the problem of a writable user agent file, there remains one more distant problem of the same person ending up in a more Semantically enabled future with multiple user agents all capable of writing foaf files but each perhaps with slightly different interests. How would these user agents write files making sure that they don't contradict each other, overwrite important information that the other requires, etc...? The Netbeans user agent may want to write out some relations in the foaf file using the doap ontology to point to the projects the employee is working on... Well perhaps it is as easy as just adding those triples to the file or if then the same problem of ownership arises as above, it may be worth placing each triple into a different user agent space... Well. This seems a bit far out for the moment, I'll look at that problem when I build the next Semantic application. If people have already come across this problem please let me know.

Questions

The above are just some initial thoughts on how to do this. Are there perhaps already relations out there to help cut up the responsibility of writing out these files between different agents be they political or software ones? Are there other solutions I am missing?

Tuesday Jan 30, 2007

Semantic Web School

Andreas Blumauer, Managing Director for the Semantic Web School, came to Sun in Zürich to give a one day course as part of a four day conference whose main topic was helping relate knowledge and people. We had people fly in from the USA, the UK (Dave Levy and Chris Gerhard), France (me: photo), Austria and Germany. (more pictures).

The very impressive SkillMap java applet opened up the conference, laying the ground work for thinking about the relations between people and skills. This was followed by Andreas' one day course, which flew the team up to a 50000 foot height, where we all jumped out till we could see the details of the landscape, including rdf, Dublin Core, foaf, doap, skos, sioc, and much more.

The landing was softened by a few demonstrations. I gave a quick presentation on Universal Drag and Drop using an early version of the Beatnik Address Book to demonstrate the simplicity of the concept (another Micro Killer App?). D2RQ as usual opened everyone's eyes, as I gave a demo of SPARQLing Roller. (After the conference, people kept referring to it as R2D2 though...)

Organized by Peter Reiser, sponsored by Dan Berg and Hal Stern, the conference was according to everyone present a huge success. There were quite a number of "aha!" moments and everyone went through at least one. I myself finally got a full overview of the problems large organizations like Sun need to solve in the knowledge management area. We cleared away some major Semantic Web fears, the most important being that it would require a complete retooling of the enterprise. This is perhaps again why D2RQ (and its competitors) are so important. But most of all the idea that by giving everyone in the enterprise a foaf name, one could tag them like any other resource, relate them to other people, documents, processes ... and create an open and flexible space for linking knowledge together was like discovering a new horizon.

Of course no conference goes without good restaurants (and in Switzerland this means fondue) and drinking into the evening. To top it all I went to an amazing restaurant called Clowns & Kalorien which if you ever happen to be in Zürich I highly recommend. All these calories had to be shed somehow, so as luck would have it we had some fresh snow, and I went to Engelberg, a beautiful resort close by (images).

Thursday Dec 14, 2006

Universal Drag And Drop

As the world becomes completely interconnected, we need to drag and drop anything over computers, between operating systems and across continents. There is one easy way to do this, which we have been using for a long time without perhaps realizing it: by using URLs to send people web pages we like. Now we can generalize this to drag and drop anything.

In RDF we can use URLs to name anything we want. We can name people for example. My foaf name is http://bblfish.net/people/henry/card#me. If you GET its meaning by clicking on the above link, the resource http://bblfish.net/people/henry/card will return a representation in rdf/xml describing me and people I know. Using this feature I will be able to drag and drop my foaf name onto the AddressBook I am writing. It will be able to ask for its preferred representation from the resource and from the returned rdf, deduce that the given url points to a foaf:Person, and so display those connections it understands in the associated graph.

But we could go one further with this mechanism. The operating system could easily be adapted to use this information to change the picture of the mouse cursor when the drag event occurs. By fetching a cached representation (REST is designed for caching), or by querying the cached graph, it could find what the type of the resource is (a foaf:Person) and so display a stick person or, if a relationship to a logo is available, the logo of the person, with perhaps their initials printed below.

All one needs for drag and drop are URLs - metadata is just one GET away.
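The drop target's side of this can be sketched with the Python standard library. It is only an illustration: describe_request is a made-up name, the Accept header simply asks for the rdf/xml representation mentioned above, and the request is built but not sent, so the sketch runs offline.

```python
# Sketch: turning a dropped URL into a request for its RDF description.
from urllib.parse import urldefrag
from urllib.request import Request

def describe_request(dropped_url):
    """Build a GET request for the document that describes dropped_url."""
    document_url, fragment = urldefrag(dropped_url)
    # The fragment (#me) names the thing; only the document part is fetched.
    return Request(document_url, headers={"Accept": "application/rdf+xml"})

request = describe_request("http://bblfish.net/people/henry/card#me")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would perform the actual GET; the returned graph can then be searched for the type of the dropped name.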
