Wednesday Dec 19, 2007

Hyperdata in Sao Paulo

In the past week I gave a couple of presentations on Hyperdata, illustrating the concept with demos of the Tabulator and Beatnik, the Hyper Address Book I am currently working on.

I gave the first talk at the University of Sao Paulo. It was called at the last minute by Professor Imre Simon, who had led the Yochai Benkler talk the week before. It was a nice turnout of over 20 people, and I spoke at a more theoretical level about the semantic web, how it relates to Metcalfe's law, as explained in more detail in a recently published paper by Prof. James Hendler, and how an application like Beatnik could give a powerful social meaning to all of this. I also looked at some of the interesting problems related to trust and belief revision that come up in a simple application like Beatnik, which struck a chord with Renata Wassermann, who has written extensively on that field of the Semantic Web.
Many thanks to Prof Simon, for allowing me to speak. For a view from the audience see Rafael Ferreira's blog (in English) and Professor Ewout's blog (in Portuguese).

Yesterday I gave a more Java oriented technical talk at GlobalCode, an evening learning center in Sao Paulo, with a J2EE project on I touched on how one may be able to use OpenId and foaf to create a secure yet open social network.
About 25 people attended which must be a really good turnout for a period so close to Christmas, when everyone is looking forward to the surf board present from Santa Claus, getting into their swimming trunks and paddling off to catch the next big wave. Well the really big wave that everyone in the know will be preparing for is the hyperdata wave. And to catch it one needs to practice one's skills. And a good way to do this is to help out with a simple application like Beatnik.
Thanks to Vinicius and Yara Senger for organising this.


The talk I gave is now available online with audio as "Building Secure, Open and Distributed Social Network Applications".

Saturday Dec 15, 2007

James Gosling has a foaf name

And so do Tim Bray, Greg Papadopoulos, Jonathan Schwartz, Sun Microsystems, and Java. All thanks to the great work of the DBPedia people, a loose network of highly skilled distributed self selected avant garde force de frappe, who are extracting all the metadata possible from Wikipedia and making it available as hyperdata, ready to be linked to. :-)

You can browse their information on the web, or with the Tabulator generic data browser which will merge information it finds into one large graph as you explore it. As a result of this I can now add Tim Bray and James Gosling to my foaf file (foaf icon), by adding the following N3 statements:

:me foaf:knows [ = <>;
                    a foaf:Person;
                    foaf:name "James Gosling" ],
               [ = <>;
                    a foaf:Person;
                    foaf:name "Tim Bray" ] .

It is worth looking at how DBPedia works. is now a Universal Resource Identifier for James Gosling. You cannot fetch James because he is not an information resource, ie, he is not a document, though he is very resourceful, and full of interesting information. You can tell that James is not an information resource because you can't copy him easily. So when you do an HTTP GET on that URI you get the following:

hjs@bblfish:0$ curl -I
HTTP/1.1 303 See Other
Date: Sat, 15 Dec 2007 17:57:54 GMT
Server: Apache-Coyote/1.1
Vary: Accept,User-Agent
Content-Type: text/plain
Content-Length: 90

i.e. you get a redirect to the page about James Gosling. This is because curl by default does not ask for any particular format (it sends Accept: */*), and the server then falls back to the html representation. Had you specified that you wanted the machine readable rdf/xml representation you would get a redirect to another resource:

hjs@bblfish:0$ curl -I -H "Accept: application/rdf+xml"
HTTP/1.1 303 See Other
Date: Sat, 15 Dec 2007 18:01:10 GMT
Server: Apache-Coyote/1.1
Vary: Accept,User-Agent
Content-Type: text/plain
Content-Length: 210

Here you get a redirect to a SPARQL query to DESCRIBE James Gosling. To get the full content, in N3 try:

hjs@bblfish:0$ curl -L -H "Accept: text/rdf+n3" 

the -L flag follows all the redirects...
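
The same exchange can be reproduced from Java. What follows is only a minimal sketch, not anything DBPedia ships: the class name, the method name and the example.org URI are placeholders of mine, to be replaced by the URI of a real non-information resource. It prepares a HEAD request with an explicit Accept header and switches off automatic redirect following, so that the 303 and its Location header stay visible, as they do with curl -I.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

class SeeOtherProbe {
    /** Prepares (but does not yet send) a HEAD request asking for the given media type. */
    static HttpURLConnection prepare(String resourceUri, String accept) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(resourceUri).toURL().openConnection();
            conn.setRequestMethod("HEAD");              // headers only, like curl -I
            conn.setInstanceFollowRedirects(false);     // keep the 303 visible, unlike curl -L
            conn.setRequestProperty("Accept", accept);  // like curl -H "Accept: ..."
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URI (an assumption): pass a real resource URI as the first argument.
        String uri = args.length > 0 ? args[0] : "http://example.org/resource/Thing";
        HttpURLConnection conn = prepare(uri, "application/rdf+xml");
        int status = conn.getResponseCode();            // the request is actually sent here
        System.out.println(status + " -> " + conn.getHeaderField("Location"));
        conn.disconnect();
    }
}
```

Changing the second argument of prepare to "text/rdf+n3" or "text/html" should show the server choosing a different redirect target for the same URI.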

Tuesday Nov 27, 2007

Jazoon call for papers

Jazoon is a Java conference taking place in Zürich, Switzerland in the summer, June 23 - 26 2008 to be precise. The Jazoon conference committee has sent out a request for papers. My submission last year "Web 3.0, this is the semantic web" was very well attended. I also met some great folk such as Dean Allemang and Harold Carr. It is a smaller conference so you get more time to meet people than at JavaOne, but they also get some famous people to come along, such as Roy Fielding who gave a keynote presentation.

Anyway, the call for papers is up till the end of December. So hurry up to add your contribution.

I will put forward a talk on a Semantic Address Book that I submitted for this year's JavaOne. The good news there is that there have been at least three talks on the Semantic Web put forward for J1 this year that I directly know about, and indirectly I believe there are a lot more. I really hope that at least three of them will get accepted, which would make for a nice SemWeb track. Hey, there were two talks on the subject at Jazoon! There should be at least the following talks in my opinion at J1:

  • a talk to introduce the semantic web and reasoning - even if this touches only slightly on Java
  • a talk explaining how a real hyperdata application was written in Java
  • a panel on the current state and the future of semantic web tools on the Java platform.

I suppose I should also put something forward for JavaPolis... Anyone know what the deadlines are there?

Friday Nov 02, 2007

Vote for Java6 on Leopard!

As mentioned previously a lot of Java developers on OSX are upset at Apple's silence as to its intentions with respect to the release of Java 6. There used to be a developer preview available, which was pulled recently with no indication as to when a replacement would be available. People like me who upgraded in the hope of having the latest and greatest - which we have been waiting for very patiently for over a year - are very disappointed. It creates all kinds of annoyances, like not being able to run Java Tutorial examples. Some who are working on Java 6 projects cannot use their computer easily to do their job without resorting to the installation of a separate OS in a virtual machine. We all like OSX: it's a beautiful, easy to use Unix that usually really helps us get our work done. I have been very happily using it since 2004.

The first solution of course is to have our voice heard. One way to do this is to file a bug with Apple. Please do this! The only problem I have with it is that as opposed to the Java bug database which is completely open, the Apple bug database is completely closed. So there's no real way of verifying how many people have posted a report. We must therefore complement that action with an equal Open action. Following the noble example given to us by Nova Spivack, when he asked for people to make their voice heard in support of the Burmese people and got some real results, let us do the same to help Apple make the right decision.
Anybody who would like to support this issue in the blogosphere, should help post a blog with the string


The first part of the string is the decimal notation for 0xCAFEBABE [1], the magic cookie for JavaClass files (thanks David for the number and the pointer to Fredericiana's photo). Then post similar instructions on your blog or point people here. Let's see how far this gets us! [2]

We should then be able to use any search engine, Google is a good choice, to search for this string [3], and hopefully motivate the managers at Apple to invest more time on Java and be more open about their plans with the community.

Your vote may also be an energizer to those groups that are starting to port the OpenJDK to OSX (via the mac java community).


  1. Oops I just noticed a mistake here. 13949712720901 in dec = 0xCAFEBABE405 in Hex. Even better. So that's CAFEBABE + the HTTP 405 Response, which means "Method Not Allowed". :-)
  2. If you know a foreign language then please translate the instructions and explanations so that more people can understand what is going on. Always post a link to some instructions. Language is a Virus, but it is most virulent when it is understandable and hyperlinked, of course.
  3. A search on Google Web returns more results - more than AllTheWeb or AltaVista - but Google Blog Search contains fewer duplicates. The real number of votes is somewhere between those two numbers, as some people are voting on their open source web sites, which are not always feed enabled. Simon is keeping count.
  4. Karussell is keeping a list of related articles.
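
The arithmetic in footnote 1 is easy to check in Java. A small sketch (the class and method names are mine):

```java
class MagicCookie {
    /** Upper-case hexadecimal form of a decimal number. */
    static String toHex(long value) {
        return Long.toHexString(value).toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(toHex(13949712720901L)); // prints CAFEBABE405
        System.out.println(0xCAFEBABEL);            // prints 3405691582, the class-file magic number in decimal
    }
}
```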


Tuesday Nov 13: Landon Fuller has been able to get a very nice hello world GUI app running on OSX using the FreeBSD jdk1.6 port. It runs under X Windows only. Excellent work!

Nov 20th, 2007: Dave Dirbin publishes the first beta release of the open source java 6. This campaign has gathered 105 blog votes if we count the results from Google Blog Search, placing it easily among the top 10 bug reports at the Java Bug database. The Google web search returns 256 results, which will contain the blog search, many duplicate pages pointing to blogs + some extra votes people may have placed on the web. I guess that those extra votes may pop this bug report up to the top 5 position.

Wednesday Dec 19: Apple has put a developer preview of Java 6 up on Apple Developer Connection. It is nice to see things progress on that side. As a result of this conflict, Java development on OSX has become a lot richer, with an open source JDK starting to compete with the closed one from Apple. This can only be good for both, and for developer and customer confidence in the platform.

Saturday Oct 27, 2007

Echo2: building Web 2.0 in Java

During some heated discussion on the Apple Java Dev mailing list as to why there still is not Java 6 available in Leopard, with some serious calls to port OpenJDK to OSX, Will Gilbert pointed to the Echo2 demo by Nextapp, which is written with a Java™ library. Here is what Will had to say:
I eventually found a little known framework which blew my socks off with performance and seduced me with the source code I would be writing. It was Echo2 from Fully open source and free. There is a nice demo at:

Use the accordion pane to select the "Technology" panel, then click on the "Java Development" button. You can see the source code, then run the app which is defined by the source. You will see an immediate similarity to AWT with a flavor of Swing.

I don't know if you have already committed to a development platform, if not, check out Echo2. More specifically take a look at the Echo2 fork which was done by some guys in Australia called Cooee at They took the Echo2 source, added a nice bug reporting system (JIRA), scheduled releases, and Maven repository support.

If you have Leopard, you now have Maven installed. You can run an archetype which I've been developing to create a runnable demo application. This archetype is under development and is likely to change.

From the terminal run the following. Maven will install what it needs and then create the application.

% mvn archetype:create \
-DarchetypeGroupId=org.karora.cooee.sandbox.informagen \
-DarchetypeArtifactId=webapp \
-DarchetypeVersion=1.0.0 \
-DremoteRepositories= \ \
% cd xxxxx
% more readme.txt

Anyway, that is a really nice discovery that got my mind off my disappointment with the unavailability of even a preview of Java 6 on Leopard. This is an OSX release that otherwise has many very nice features. Here are some of my initial impressions:

  • The user interface is snappier
  • Spotlight finally works (It used to take forever to find things, and the user interface was terrible).
  • The Finder is very much improved. The nice big graphics previews helped me find a few videos I did not know I had. With the click of a space you can see the first page of a NeoOffice document. (NeoOffice is a free and excellent Office suite with the User Interface written in Java btw)
  • Spaces, the multi windowing environment, is dead beautiful, but broken. You can drag one web browser window from one virtual desktop to another with one simple gesture, but then when you want to switch between a number of web browser windows using alt-tab you usually end up in the original window. Using Expose keys does not help that much either. So you have to keep all windows from one application on the same virtual desktop. You also don't get much choice for selecting your short cut keys, which means that you may have trouble with them interfering with other applications.
  • Netbeans 6 beta 2 works very well on Leopard. There have been some improvements to the graphics library in their new Java release which has broken IntelliJ and Eclipse though. NetBeans 6 by the way is getting to be really really good.
  • Matt Neuburg posts some well illustrated criticisms of the new features in Leopard, which I agree with. The last point about the help windows being impossible to get out of the foreground is, it is true, quite bizarre.
  • Some minor and not so minor bugs in The lame Leopard blog

But the final and best review is as usual John Siracusa's Ars Technica review. No marketing hype. Real facts, and good criticism.

Wednesday Oct 17, 2007

Java for iPhone by Feb 2008?

I just received this email as a response to a bug I had filed at Apple concerning Java's non presence on the iPhone.
(Hmm... Reading the message closely it does not say "Java" though, it just mentions an SDK... I wonder what kind of Software Development Kit they mean?)

Subject: Re: Bug ID 4918928: java on iphone

Dear Henry,

This is a follow-up to Bug ID# 4918928 . Apple has just announced via Apple HotNews an iPhone SDK will be made available to developers in February 2008.

------------------- [no permalink it seems]

Third Party Applications on the iPhone Let me just say it: We want native third party applications on the iPhone, and we plan to have an SDK in developers’ hands in February. We are excited about creating a vibrant third party developer community around the iPhone and enabling hundreds of new applications for our users. With our revolutionary multi-touch interface, powerful hardware and advanced software architecture, we believe we have created the best mobile platform ever for developers.

It will take until February to release an SDK because we’re trying to do two diametrically opposed things at once—provide an advanced and open platform to developers while at the same time protect iPhone users from viruses, malware, privacy attacks, etc. This is no easy task. Some claim that viruses and malware are not a problem on mobile phones—this is simply not true. There have been serious viruses on other mobile phones already, including some that silently spread from phone to phone over the cell network. As our phones become more powerful, these malicious programs will become more dangerous. And since the iPhone is the most advanced phone ever, it will be a highly visible target.

Some companies are already taking action. Nokia, for example, is not allowing any applications to be loaded onto some of their newest phones unless they have a digital signature that can be traced back to a known developer. While this makes such a phone less than “totally open,” we believe it is a step in the right direction. We are working on an advanced system which will offer developers broad access to natively program the iPhone’s amazing software platform while at the same time protecting users from malicious programs.

We think a few months of patience now will be rewarded by many years of great third party applications running on safe and reliable iPhones.


P.S.: The SDK will also allow developers to create applications for iPod touch.

Monday Oct 15, 2007

Java Picture Editor

Over the past year I have found myself again and again using the free Java Image Editor from JH Labs, a company that seems to be doing a lot of interesting work in the java graphics area.

ImageEditor starts much quicker than the Gimp, does not require X11, and has a modern look, as you can see from the snapshot I made, which integrates beautifully in OSX. It comes with a lot of filters, which are available in source code under an Apache licence. (I have not really used them, to tell the truth.)
Today I have been using it for the very simple task of editing a foaf icon. I had tried for hours to get X11 going after it got clobbered by an installation I made using fink, the open source distribution platform for OSX. What I now have keeps crashing after a few minutes of use. This is where Image Editor came to the rescue.

ImageEditor has quite a few limitations, other than it not being fully open source. I don't think it has more than one step undo for example. And some of the menus are a little obscure.

But what I really like about it is that it shows that you can build applications in Java that look as good as native OSX apps.

Friday Oct 05, 2007

Doap Bean available

I have just made the NetBeans Doap Bean available on the plugin portal. Just download it onto your desktop and install it in a version of NetBeans 6 (check Tools > Plugins in the menu).

This is the module I demonstrated at James Gosling's 'fun things' presentation on NetBeans day in San Francisco. I have updated the code to make it easy to understand for people who would wish to emulate and enhance it. It is easy to do that. Install the plugin, and go to the project. Then drag the blue button next to the URL

from your browser (I have checked that it works with Safari and Firefox on OSX) onto the DOAP button on the toolbar. This will fetch the information from the web page and pop up a window with a human readable representation of the RDF. This window should look like this:

window describing the so(m)mer project

Clicking on the other tabs will show you the original RDF/XML or an easier to read Turtle representation of the data. It is really important to show these tabs so that you can distinguish good from bad doap. Of course one can also go to the W3C Validator for an independent opinion.
In any case if the source code is available via a CVS or Subversion repository, you should be able to download it with just a click on the "download" button. (Make sure that NetBeans knows where your svn command line tool is though, by going to the menu Versioning > Subversion > Checkout... )

If you want to try dropping other projects onto the button go to DoapSpace, they have put together a large collection of doap files for all the projects on SourceForge, Freshmeat and PyPi.

As I mention this is really only version 0.1 of the doap integration of Netbeans. Clearly one could do a lot more, such as:

  • Having it produce Doap for a project automatically
  • Tying it into NetBeans's Project panel
  • Describing the relationships between a project and the others it depends on
  • Linking bug reports to information gleaned from the doap:bugdatabase relation
  • Perhaps see if one can set things up so that one can immediately find the javadoc online for a doap project one has information of
  • find a way to view source on a jar, by relating jars to source code repositories... (more difficult this one)
  • and a lot more...

Now you may wonder: How is one going to know that there is a doap link on some project's source page? Searching for the doap link seems a lot of work, right? Well to get an idea of how things will integrate you can install the Firefox Semantic Radar plugin, and go to the So(m)mer project again. You will then see displayed at the bottom of your browser an icon of square smiley faces, as shown on the following screenshot

semantic radar icon in Firefox

I should probably add this icon to the Doap button come to think of it...
The Doap button is in the So(m)mer repository, which is all published under the very generous BSD licence, so you are welcome to help out and add your own features... I may be having to work on a few other things next, so I won't be getting in your way :-)

Tuesday Sep 18, 2007

M2N: building Swing apps with N3

At the Triple-I conference in Graz, I came across one very interesting demo by the M2N Intelligence Management company, where they showed a Development environment powered by N3, the powerful, easy to read notation for the Semantic Web. Using a Visual Editor that mapped UML diagrams to N3, instead of the usual limited and difficult to understand OMG MOF family of standards, we could see how one could build a complete User Interface application, including logic in a visual way. The same could be done manually in vi by editing the N3 directly, for those proficient enough. I think they describe parts of this very generally on their solutions page.

It is a pity that M2N does not open source this library, as that would allow one to get a better idea as to the advantages of doing things this way. Sebastian Schaffert - who works at a research company in Salzburg, and was looking at their demo with me - was quite enthusiastic about the idea. There was a lot one could do with such a tool, he thought, such as being able to SPARQL query one's user interface, test it for constraints, etc...

It would be nice to have some feedback from people who had used this on the pros and the cons of their implementation, or of the general idea.

Monday Sep 10, 2007

The Church of NetBeans

If there is anyone else who is close to being as homeless as me at Sun, it is certainly Tim Boudreau, who is now on a World Tour in a truck he bought for $1000. As he is the ultimate NetBeans evangelist, he painted up his car with the NetBeans logo, and will evangelise to whomever wants to hear the word :-) Read up on his story on his blog.

Tim is also a creative guitar player and song composer so don't hesitate to ask him to play you a song.
This reminds me of Timbuk 3's song "Reverend Jack and his Roamin Cadillac Church" (iTunes).

Come hell or high water
A soul's got to find some release
Some find it in power
And some in heavenly peace
Some look to the preacher
As he speaks from his holy perch
Me, I back Rev. Jack & his Roamin Cadillac Church
So if you're stuck at the station
On the road to the Glory on High
If you need some inspiration
He's got more than your money can buy
If you're lookin for salvation
Well my friend it's the end of your search
Here comes Rev. Jack & his Roamin Cadillac Church
Ain't no use watchin the road, son
When you ride in his automobile
Cause we're all back seat drivers,
& there's nobody at the wheel
Now for the well-to-do doctor
There's a home & a summer retreat
And for the jet-settin banker
There's a place in the social elite
But for the poor & the hungry
All the lost souls left in the lurch
There's just Rev. Jack & his Roamin Cadillac Church

Tuesday Aug 28, 2007

My Bloomin' Friends

Closed Social Networks are blossoming all over the place. They provide a semblance of protection, at a price: lock in. Locked into the social network provider you get convenience in the form of tools to make conversation easier (video, email, chat boards, ...), some form of privacy protection (if you trust the provider), introductions to 'like minded' people, and other niceties.

Some of us work in the open air: we have to set standards in public view; we stand by what we say; we accept criticism from wherever it comes; and we can't choose our friends based on their social network provider. We describe ourselves in our foaf files where we can specify what we do, how to contact us, our interests, and links to who we know by pointing to their Universal Identifiers. There is no trouble linking between people who are open in this way. We are happy to reference each other: it strengthens the exposure of our work and the quality of the web. This is how I link to Paul Gearon:

:me foaf:knows  [ = <>; 
                  a foaf:Person;
                  foaf:name "Paul Gearon" ] .
I could just point to his URL, but the little extra duplicate information can make life easier for people/robots browsing the data web. It can help people notice inconsistencies and help me correct them.

But not everyone lives in the open the same way, and not everyone wants to make the same amount of information about themselves public. There are a number of different ways to deal with this. I want to discuss a few of them here.

Content Negotiation

How much someone says about themselves is up to them, and so is how they protect their information. The same URL that identifies someone, could return more or less information depending on who is asking. I could set up my foaf file so only friends who log in via openid can see my friends. Others would just get default information about me. I could be even more clever. I could allow any friend of my friend who logs in via their openid to see my full foaf file; others would see information about me, and a select group of open friends. Closed Social Networks could open up by making it convenient to specify these policies, and providing the right infrastructure to do so.
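
As a rough illustration of such a policy, here is a minimal Java sketch. Everything in it is an assumption made for the example: the class name, the two views, and the in-memory sets that stand in for foaf:knows data parsed from real foaf files. It also assumes the OpenID login has already been verified elsewhere.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

class FoafAccessPolicy {
    enum View { PUBLIC, FULL }

    /**
     * @param requester OpenID of the logged-in visitor, or null if anonymous
     * @param myFriends OpenIDs of the people I say I know
     * @param friendsOf for each of my friends, the OpenIDs they say they know
     */
    static View viewFor(String requester, Set<String> myFriends,
                        Map<String, Set<String>> friendsOf) {
        if (requester == null) {
            return View.PUBLIC;                 // not logged in: default information only
        }
        if (myFriends.contains(requester)) {
            return View.FULL;                   // a direct friend
        }
        for (String friend : myFriends) {
            Set<String> theirs = friendsOf.get(friend);
            if (theirs != null && theirs.contains(requester)) {
                return View.FULL;               // a friend of a friend
            }
        }
        return View.PUBLIC;
    }

    public static void main(String[] args) {
        // Hypothetical OpenIDs, for illustration only.
        Set<String> mine = Collections.singleton("https://alice.example/");
        Map<String, Set<String>> theirs = Collections.singletonMap(
                "https://alice.example/", Collections.singleton("https://bob.example/"));
        System.out.println(viewFor("https://bob.example/", mine, theirs)); // FULL
        System.out.println(viewFor("https://eve.example/", mine, theirs)); // PUBLIC
        System.out.println(viewFor(null, mine, theirs));                   // PUBLIC
    }
}
```

The server would then serialise either the full foaf graph or the reduced one at the same URL, depending on the view returned.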

Indirect Identification

By directly identifying someone via a URL (as I do) we can leave a lot of the policy of what they make visible up to them. But those that don't have a foaf name, need to be identified indirectly. We can do that by identifying them via some property such as their blog, their home page, their email address, or their openid. I am very open about my email addresses. They are published and visible to all.
 <>     <> <> .
I value it more that people can contact me easily - living as I do in the middle of nowhere and often living nowhere in particular - than the pain of spammers. Too many people are lazy about security, using virus filled Windoze computers, obvious passwords, cracked software for me to be under any illusion that hiding my email is going to prevent the bad guys from getting it.

However I can't assume that everyone else will accept me applying this argument to their email address. For this there is a nice mathematical technique: I can hash their email address using the SHA1 hash function. This creates a close to unique string that cannot be reversed: you cannot go from the sha1 sum of an email address back to the email. But you can always calculate the same sha1sum from an email. This is how I identify Simon Phipps, Sun's Open source Officer:

:me foaf:knows [ a foaf:Person;
                 foaf:mbox_sha1sum "4e377376e6977b765c1e78b2d0157a933ba11167";
                 foaf:name "Simon Phipps";
                 rdfs:seeAlso <> ] .

If you know Simon's email, then you will know that I know him. "What use is that?" I can hear someone ask. It's all about Working with People on the Internet. Imagine you are reading email on a newsgroup with a foaf enabled mail tool linked to a foaf enabled Address Book (such as Beatnik). You come on an email by Simon saying something interesting about how Sun has changed its stock ticker to JAVA for example. My logo and perhaps that of a couple of other people appears on the mail reader in a way that indicates to you that we know Simon. The post is no longer anonymous for you, and so has more trust value. You feel part of a community.[1]
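
For anyone who wants to check such a claim against an address they already know, here is a minimal Java sketch of the calculation. The foaf specification defines foaf:mbox_sha1sum as the SHA1 of the full mailto: URI; the lowercasing and trimming are my own assumption, following the canonicalisation suggested in footnote 1, and the class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

class MboxSha1 {
    /** Lowercase hex SHA-1 digest of a string, encoded as UTF-8. */
    static String sha1Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** Value to compare with a foaf:mbox_sha1sum; lowercasing is an assumption (see footnote 1). */
    static String mboxSha1sum(String email) {
        return sha1Hex("mailto:" + email.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Hypothetical address: pass a real one as the first argument.
        String email = args.length > 0 ? args[0] : "someone@example.org";
        System.out.println(mboxSha1sum(email));
    }
}
```

If the printed string equals the foaf:mbox_sha1sum in someone's foaf file, the file is talking about the owner of that address.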

So spammers cannot use that information to spam. Either they already know your email address, and so they are probably already spamming you, or they don't, and this won't help them. They can only [2] learn about social network claims: who claims to know who. They could use this, it is true, to introduce themselves as an acquaintance of a friend of yours. A bit of a risky strategy that could quickly get them on a black list. Currently being black listed may not be an expensive proposition. But in a cryptographic web of trust this will be both much easier to notice, and more damaging for the infringers.

Fuzzy Identification

I can directly and indirectly identify a lot of people in my Address Book as described above. This is perfectly acceptable for people who have an open life, like I do, and a large portion of the Open Source community, bloggers, standard setters, etc... But on last count I had over 700 people in my AddressBook. It is a lot of work to identify all of them individually, and to decide how much visibility I should give them. I may not even want people to know how many people I know this way. Also I may want deniability: there are people one may know, but one may not want to highlight that, and one may want to be able to deny that one knows them to some people. The foaf:mbox_sha1sum gives me a way to identify someone, but if some nosy person comes to me and asks me about that person's life after having identified the corresponding email address, there is no escape route other than refusing any conversation, which by itself can easily be taken to be significant. What we need is a way to fuzzily identify a group of one's acquaintances.

Bloom Filters

This is what Bloom Filters enable one to do. Originally used in times when memory was expensive, they allowed the whole vocabulary of a language to be condensed into a reasonably short string. Here we can use one to group all the email addresses of our friends together in one opaque string. I could express this as follows in RDF (bear in mind that the rdf vocabulary has not been settled on):
:me foaf:bloomMbox [ a bloom:Bloom;
                     bloom:base64String """"
                     bloom:hashNum 4;
                     bloom:length 1000 ] .

Given the above Bloom someone can query it with an email address, using the same hash functions that were used to build it, and the Bloom will answer either that the address may be in it, so that I may know that person, or that the address was definitely never put in it. The loaf project explains some of the advantages of having this in more detail.
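
To make the mechanics concrete, here is a minimal Bloom filter sketch in Java. It is not the pt.tumba.spell.BloomFilter class used by the applet, and the way it derives its hash positions (two String hash codes combined by double hashing) is purely my assumption for the example, so its bit strings will not be compatible with the applet's. The constructor arguments correspond to bloom:length and bloom:hashNum above.

```java
import java.util.BitSet;
import java.util.Locale;

class MboxBloom {
    private final BitSet bits;
    private final int length;    // number of bits, cf. bloom:length
    private final int hashNum;   // number of hash functions, cf. bloom:hashNum

    MboxBloom(int length, int hashNum) {
        this.bits = new BitSet(length);
        this.length = length;
        this.hashNum = hashNum;
    }

    /** i-th bit position for an address, derived from two base hashes (double hashing). */
    private int position(String email, int i) {
        String key = email.trim().toLowerCase(Locale.ROOT);
        int h1 = key.hashCode();
        int h2 = new StringBuilder(key).reverse().toString().hashCode() | 1;
        return Math.floorMod(h1 + i * h2, length);
    }

    void add(String email) {
        for (int i = 0; i < hashNum; i++) {
            bits.set(position(email, i));
        }
    }

    /** false: never added; true: possibly added (false positives can happen). */
    boolean mightContain(String email) {
        for (int i = 0; i < hashNum; i++) {
            if (!bits.get(position(email, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        MboxBloom bloom = new MboxBloom(1000, 4);
        bloom.add("friend@example.org");                               // hypothetical address
        System.out.println(bloom.mightContain("Friend@Example.org"));  // true: it was added
        System.out.println(bloom.mightContain("other@example.org"));   // usually false
    }
}
```

The deniability comes from the second kind of answer: a "true" may always be a false positive, so it never proves that a given address was put in.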

The best way to get a feel for how it works is to try it. Here I have written a little java applet [3] that allows you to test my Bloom for people I know, and to create your own bloom [4].


Some emails you can try with positive results are tbray at textuality dot C O M, or bill at dehora dot net (suitably transformed of course). The applet lowercases all email addresses when creating and when testing the bloom.

To create your own bloom just click the "Create Bloom" tab. An easy way to extract all your email addresses from an OSX Address Book is to run the following on the command line:

hjs@bblfish:0$ osascript -e 'tell application "Address Book" to get the value of every email of every person' | perl -pe 's/,+ /\n/g' | sort | uniq | pbcopy

You should now be able to paste the list of all your contacts in the applet. To restrict the Addresses to one of your groups named "foaf" for example replace the relevant section above with tell application "Address Book" to get the value of every email of every person in foaf.

You will need to choose the number of hashes and the maximal size of the bucket you wish to fill. The greater the number of hashes and the greater the size of the bucket, the more precision you get and the less deniability.[5]
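
To get a feel for this trade-off, one can use the standard textbook estimate of a Bloom filter's false positive rate, (1 - e^(-kn/m))^k for k hashes, m bits and n entries. The sketch below only evaluates that formula; the class and method names are mine, and the estimate says nothing about the particular hash functions the applet uses.

```java
class BloomDeniability {
    /** Textbook estimate (1 - e^(-k*n/m))^k of the false positive rate. */
    static double falsePositiveRate(int bits, int hashes, int entries) {
        double fillRatio = 1.0 - Math.exp(-(double) hashes * entries / bits);
        return Math.pow(fillRatio, hashes);
    }

    public static void main(String[] args) {
        // 700 addresses squeezed into 1000 bits: most answers are false positives.
        System.out.println(falsePositiveRate(1000, 4, 700));   // roughly 0.78
        // The same addresses in 10000 bits: far more precise, far less deniable.
        System.out.println(falsePositiveRate(10000, 4, 700));
    }
}
```

By this estimate, adding hashes only improves precision up to a point: once the bucket is mostly full, extra hashes just set more bits and the false positive rate rises again.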


None of the above tools are by themselves the complete solution for creating an Open Social Network that will satisfy everyone. But for people willing to live in the open, the correct and astute use of them should satisfy most of people's requirements. Access Control on URLs can make it possible to reveal more or less information depending on who is looking; indirect identification can allow one to name people even without direct identification; sha1sums allow one to partially hide sensitive identifying information; and Blooms allow one to make fuzzy statements of set membership. All of these can be combined in different ways. So one can make statements about sha1sum identified people on the open web, or one can do so behind an access controlled file that only friends logged in with OpenId can see. There are bound to be more fun things to be discovered here. But this should make clear just how much can be done in this space.


  1. For the link from email addresses to sha1sums to work, it helps to canonicalise the emails to all lowercase. This should probably be made more explicit in the foaf:mbox_sha1sum definition.
  2. "They can 'only' learn about social network claims" is quite a lot more than some people are willing to accept. See the article by Mark Wahl "Organizing principles for identity systems: Attacks on anonymized social networks and fudging oracles" which contains some very good pointers. For people who want to retain complete anonymity, and this is what people subscribe to when they answer public surveys, any leakage of information is too much leakage. The problem is that because of Metcalfe's Law it is nearly impossible to stop information combining itself: Information wants to be linked. So I think, when we are not tied to stringent laws, we should accept this rather than fight it, and use it to our advantage when hunting down spammers: the law holds for them too.
  3. You can get the source code for the applet on the so(m)mer repository in the misc/Bloom subdirectory. I used the pt.tumba.spell.BloomFilter class which I adapted a little for my needs. This was just the first one I found out there. It is probably not the most efficient one, as it uses an array of booleans, when it could use a byte array. If you know of other libraries please let me know.
    The code was put together really quickly and may well contain bugs. Feedback and patches and contributions are welcome.
  4. The advantages of Java Applets over server side code are really obvious here:
    • I don't need a server with a fixed port number to show you this
    • someone can't easily start a denial of service attack to bring the server down
    • Your email addresses never leave your computer, so there is no fear of loss of privacy.
    On the last point it would be nice if browser vendors made it easier to get info about the exact restrictions a Java Applet had. I would like to be able to click on an Applet and verify or set it to "no network communication whatever". This would increase trust even more in cases like this.
  5. More info on the load site. Apparently one needs more than 1/4 deniability if one is to preserve some measure of privacy, according to the paper "The Price of Privacy and the Limits of LP Decoding" by Cynthia Dwork, Frank McSherry and Kunal Talwar (Microsoft Research), who suggest that
    ... any privacy mechanism, interactive or non-interactive, providing reasonably accurate answers to a 0.761 fraction of randomly generated weighted subset sum queries, and arbitrary answers on the remaining 0.239 fraction, is blatantly non-private.
    Thanks again to Mark Wahl for these references.
  6. Thanks a lot to Dan Brickley for working together with me on this last Friday, and pointing me to much of the important work done here. Dan also wrote a little python script to do something similar. Some of the sites I came across during our discussion: Not having studied bloom filters in detail, I am not sure how compatible the blooms of each of these libraries are. The super simple ruby bloom library does not seem to specify the number of hashes that were used to create a Bloom.
  7. Nick Lothian reminded me in a comment to this that he has written a Bloom Filter demo for facebook. I don't have a facebook account (because I am already on LinkedIn, and I can't really be bothered to move all my information, and because I don't like closed networks), so I was not able to use it. Perhaps I should get a facebook account just for this... Let me know.

Thursday Jul 12, 2007

java on the iPhone

According to Ed Burnette's misleadingly entitled post "Apple sneaks Java support onto the iPhone", a Java execution technology named Jazelle is built natively into the CPU that the iPhone uses, and this feature is enabled on the processor. Apparently it is very small and very efficient, blatantly contradicting Steve Jobs' comments:

Jobs: “Java’s not worth building in. Nobody uses Java anymore. It’s this big heavyweight ball and chain.”
Java is available on pretty much every cell phone except his, on nearly every computer shipped, on robots, and credit cards... Presumably because nobody uses it. And now we find he would not even have to build it into the iPhone, as it is already written for that CPU - well perhaps Apple would have to do some work on the graphics libraries.
Perhaps it's not surprising that he would think this, given that he is surrounded by Objective-C programmers. On the other hand I have heard an interesting argument that this may be a way to entice various providers to start creating video streams in H.264 format...
Myself, I don't see the point of having such a phone if I can't have a good, usable version of Java on it. I can wait.

Monday Jul 02, 2007

refactoring xml

Refactoring is defined as "Improving a computer program by reorganising its internal structure without altering its external behaviour". This is incredibly useful in OO programming, and is what has led to the growth of IDEs such as NetBeans, IntelliJ and Eclipse, and is behind very powerful software development movements such as Agile and Extreme Programming. It is what helps every OO programmer get over the insidious writer's block. Don't worry too much about the model or field names now, it will be easy to refactor those later!

If maintaining behavior is what defines refactoring of OO programs - change the code, but maintain the behavior - what would the equivalent be for XML? If XML is considered a syntax for declarative languages, then refactoring XML would be changing the XML whilst maintaining its meaning. So this brings us right to the question of meaning. Meaning in a procedural language is easy to define. It is closely related to behavior, and behavior is what programming languages do their best to specify very precisely. Java pushes that very far, creating very complex and detailed tests for every aspect of the language. Nothing can be called Java if it does not pass the compatibility tests (the JCK), if it does not act the way specified.
So again what is the meaning of an XML document? XML does not define behavior. It does not even define an abstract semantics, how the symbols refer to the world. XML is purely specified at the syntactic level: how can one combine strings to form valid XML documents, or valid subsets of XML documents. If there is no general mapping of XML to one thing, then there is nothing that can be maintained to retain its meaning. There is nothing in general that can be said to be preserved by transforming one XML document into another.
So it is not really possible to define the meaning of an XML document in the abstract. One has to look at subsets of it, such as the Atom syndication format. These subsets are given more or less formal semantics. The Atom syndication format is given an English readable one for example. Other XML formats in the wild may have none at all, other than what an English reader will be able to deduce by looking at them. Now it is not always necessary to formally describe the semantics of a language for it to gain one. Natural languages for example do not have formal semantics, they evolved one. The problem with artificial languages that don't have a formal semantics is that in order to reconstruct it one has to look at how they are used, and so one has to make very subtle distinctions between appropriate and inappropriate uses. This inevitably ends up being time consuming and controversial. None of this is going to make it easy to build automatic refactoring tools.

This is where frameworks such as RDF come in very handy. The semantics of RDF are very well defined using model theory. This defines clearly what every element of an RDF document means, what it refers to. To refactor RDF is then simply to make any change that preserves the meaning of the document. If two RDF names refer to the same resource, then one can replace one name with the other, and the meaning will remain the same, or at least the facts described by the one will be the same as those described by the other, which may be exactly what the person doing the refactoring wishes to preserve.
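The simplest such refactoring, renaming a resource, can be sketched in a few lines of Python. This is a toy: the graph is just a set of (subject, predicate, object) tuples of strings, the example.org URIs are made up, and the function rename_resource is my own name for it. It also assumes we already know the two names denote the same resource; establishing that is the hard part, not the rewrite.

```python
def rename_resource(triples, old, new):
    """Return a new graph where every occurrence of `old` is replaced by `new`.

    Assumes `old` and `new` are known to refer to the same resource,
    so the facts described stay the same.
    """
    def swap(term):
        return new if term == old else term

    return {(swap(s), swap(p), swap(o)) for (s, p, o) in triples}


# Hypothetical example graph using two names for the same person.
graph = {
    ("http://example.org/people#hs", "http://xmlns.com/foaf/0.1/name", "Henry"),
    ("http://example.org/people#dan", "http://xmlns.com/foaf/0.1/knows",
     "http://example.org/people#hs"),
}

refactored = rename_resource(
    graph, "http://example.org/people#hs", "http://example.org/id/henry")
```

Because every statement is just a triple of names, the rename is a purely mechanical substitution over the whole graph. Nothing comparable can be written for XML in general, since there is no agreed notion of what an element name refers to.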

In conclusion: to refactor a document is to change it at the syntactic level whilst preserving its meaning. One cannot refactor XML in general, and in particular instances it will be much easier to build refactoring tools for documents with clear semantics. XML documents that have clear RDF interpretations will be very very easy to refactor mechanically. So if you are ever asking yourself what XML format you want to use: think how useful it is to be able to refactor your Java programs. And consider that by using a format with clear semantics you will be able to make use of similar tools for your data.

Thursday Jun 28, 2007

Jazoon: Web 3.0

Well over 100 people attended my Web 3.0 talk at the Jazoon conference in Zurich today, which covered the same topics as my JavaOne BOF (slides here). You can count them here in the picture which I took at the end of the talk.

The Jazoon conference is 1/17th of the size of JavaOne, so the attendance numbers were tremendous. As a comparison, 250 people attended the JavaOne BOF. Had the same percentage of the JavaOne conference attended my talk, I would have had an audience of 110*17=1870!

I had just about time to cover the slides in the 40 minutes allocated to me, so it was really great to have a follow on question and answer session which at least a third of the people remained for. Dean Allemang (blog) shared the space with me on the Q&A, and was able to bring his vast experience to bear. The attendees were thus able to get a quick overview of TopBraid Composer which Dean presented quickly in response to a question on tooling. Questions on security popped up, which allowed me to speak a little more about RDF graphs and quads, essential pieces of the Semantic Web story.

Tuesday Jun 26, 2007


Roy Fielding gave his very well attended keynote presentation today (Tuesday 26) at Jazoon, the new Java developers conference taking place for the first time in Zurich this week. Coming here just to hear Roy talk was worth the whole trip in itself.

This is the first year of Jazoon, and yet the venue was able to attract over 800 developers (I am not sure of the exact number), which bodes well for its future. So to have close to 10% of the attendees (photo) come to Dean Allemang's talk "Semantic Mashups using RDF, RSS and microformats" was a very good surprise. Dean, who is working for TopQuadrant, producers of the Eclipse based TopBraid Composer, is not just a very good presenter, but also a very knowledgeable Semantic Web evangelist. He gave Harold Carr (blog) and others a demo (photo) of TopQuadrant that started up outside the conference room and moved down into the bar at the entrance (photo). It kept being interrupted by great side tracks into Philosophy, Jungian psychology (Jung of course worked in Zurich), Semantic Web company adoption, Literature, Mathematics, Religion, sexual politics, and so much more, so that the demo only came to a tentative conclusion around 1am in a bar in the center of Zurich, discussing the relations between REST and RDF and how this differed from SOAP. (For Dean's impressions of Jazoon, see his "Swiss Java" blog post.)

My talk, "Web 3.0: This is the Semantic Web", will be taking place on Thursday at 11am. I will be going into more technical details, looking at the foundations of the Semantic Web step by step. As a surprise I may even be able to get a slot for Dean to present his TopBraid Composer, which is not just an ontology editor, but also a complete mashup environment.

Time for me to go to sleep!

Friday Jun 01, 2007

Semantic Wonderland

Among the most impressive demos at JavaOne was the open sourced Project Wonderland[1] which James Gosling presented during his Toy show. It is a virtual world that grew out of project Looking Glass, the 2.5D Java Desktop that was unveiled a couple of years ago. The desktop has now been integrated into a full 3D world (or should it be 4D? space+time) where one can move around, meet people, work together on projects, etc...

It was not too difficult to get it to work on OSX (even though Apple is lagging with an 8-month-old beta release of Java 6, grrrr!), by following the instructions on the main page, and reading the thread "Building Wonderland on MacOS" [2].

Once I got it started I noticed this billboard entitled "Knowledge Driven Hyperlinks: A Semantic Web Application". Really intriguing!

Apparently one gets the best out of wonderland by running it on Linux, as one can then interact with real X applications. The one that was most tested with OSX is Ubuntu Edgy, under BootCamp. A new version of Parallels has just come out though, that has OpenGL and DirectX graphics acceleration for Windows, so it may soon be possible to run Wonderland in Parallels using Ubuntu, and so get all the features, before making the leap to a full Linux OS again.

I am going to try one of these options out. This is going to be real fun! :-)


Friday May 11, 2007

Semantic Web Birds of a Feather at JavaOne 2007

Nova Spivack, Lew Tucker, and Tim Boudreau joined me today in a panel discussion on the Semantic Web at JavaOne. Given that it was at 8pm and that we were competing with a huge party downstairs with free drinks, with robots fighting each other, with live bands, and numerous other private pub parties, the turnout of over 250 participants was quite extraordinary [1]. There was a huge amount of material to cover, and we managed to save 13 minutes at the end for questions. The line of questioners was very long and I think most were answered to the satisfaction of the questioners. It was really great having Nova and Lew over. They brought a lot of experience to the discussion, which I hope gave everyone a feel for the richness of what is going on in this area.

Since many people asked for the presentation it is available here.

[1] It was quite difficult to tell from the stage how many people were in the room, but a good third of the 1200-seat room was full. 580 people had registered for the talk.

Tuesday May 08, 2007

Dropping some Doap into NetBeans

Yesterday evening I gave a short 10 minute presentation on the Semantic Web in front of a crowd of 1000 NetBeans developers during James Gosling's closing presentation at NetBeans Day in San Francisco.

In interaction with Tim Boudreau we managed to give a super condensed introduction to the Semantic Web, something that is only possible because its foundations are so crystal clear - which was the underlying theme of the talk. It's just URIs and REST and relations to build clickable data. (see the pdf of the presentation)

All of this led to a really simple demonstration of an integration with NetBeans that Tim Boudreau was key in helping me put together. Tim wrote the skeleton of a simple NetBeans plugin (stored in the contrib/doap section of the NetBeans CVS repository), and I used Sesame 2 beta 3 to extract the data from the so(m)mer doap file that got dropped onto NetBeans. As a result NetBeans asked us where we wanted to download the project, and after selecting a directory on my file system, it proceeded to check out the code. On completion it asked us if we wanted to install the modules in our NetBeans project. Now that is simple. Drag a DOAP url onto NetBeans: project checked out!
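To give an idea of what "extracting the data" from a DOAP file amounts to, here is a rough Python sketch that pulls the repository location out of a small, made-up DOAP document. This is not the plugin's code: the plugin used Sesame, a real RDF library, whereas this sketch cheats by reading one particular RDF/XML layout with a plain XML parser. The sample project, its URL and the function name repository_locations are all hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DOAP = "http://usefulinc.com/ns/doap#"

# Hypothetical DOAP description of a project with one Subversion repository.
SAMPLE_DOAP = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:doap="http://usefulinc.com/ns/doap#">
  <doap:Project>
    <doap:name>example-project</doap:name>
    <doap:repository>
      <doap:SVNRepository>
        <doap:location rdf:resource="https://svn.example.org/example-project/trunk"/>
      </doap:SVNRepository>
    </doap:repository>
  </doap:Project>
</rdf:RDF>"""


def repository_locations(doap_xml):
    """Collect the doap:location URLs found under doap:repository elements."""
    root = ET.fromstring(doap_xml)
    found = []
    for repo in root.iter("{%s}repository" % DOAP):
        for loc in repo.iter("{%s}location" % DOAP):
            url = loc.get("{%s}resource" % RDF)
            if url:
                found.append(url)
    return found
```

Calling repository_locations(SAMPLE_DOAP) returns the list of checkout URLs, which is all a tool needs in order to ask where to put the code and start the checkout. A real implementation should query the RDF graph rather than the XML tree, since the same DOAP facts can be serialised in many different ways.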

This Thursday we will be giving a much more detailed overview of the Semantic Web in BOF-6746, "Web 3.0: This is the Semantic Web", taking place at 8pm at Moscone. Hope to see you there!

Wednesday Mar 21, 2007

James Gosling on Web N

James Gosling had a couple of slides on Web N during his presentation on the Java Platform. Is it "a piece of Jargon" as Tim Berners-Lee is quoted as saying? Well James seems to agree in part with that assessment. It is a lot of hype for what seems to be a very simple thing: just different User Interfaces on ways of storing data on servers. The one consistent similarity of these services, he points out in the next slide, is the way they build communities, using the input of millions to create services that no single organization could have provided.

But in that respect, how does that differ from projects such as Linux, which I was using as my desktop OS in the 90s? That was a huge piece of engineering developed on the internet, using the web and other tools, in a communal fashion. How does that differ from services such as imdb, the largest online database of films, which I was happily using ten years ago, whose whole content was updated by its users? Is it that the improvements in the web interface are making it easier and easier for people to contribute content? Partly so. If adding photos to a flickr account forced one to fetch a new page for every change, it would be a lot less appealing. But how much then do bandwidth improvements have to do with this? Services such as flickr would have been unbearable in the early web. Certainly YouTube would have gotten nowhere, not even taking into account the difficulty of editing videos on 400MHz machines. So is Web 2.0 a technical thing, or is it something else?

I'll agree that Web 2.0 is a social phenomenon, in more ways than one. It is a meme that also has a psychological dimension. People who thought that by 2000 they had understood all about the web, the .com aspect, never quite grokking the huge open source wave, those people then declared the Web bubble burst. As more and more amazing things continued happening after the .com bust, they needed a way to change their tune without feeling that they had gotten something wrong. Hence Web 2.0. The web just keeps evolving. It's always more than you thought it could be.

Another thought is that if we can trace Web 2.0 all the way back to Open Source programming, then my feeling is that this is where one should look to sow the seeds for Web 3.0. The Open Source community is full of small island projects. True, they can all exchange code between each other, but the interaction between the groups could be a lot better, just as the interaction between Web 2.0 sites could be. If one could make the interactions between these communities a lot more fluid, then one will certainly be able to unleash a whole new wave of energy. This is why I am so enthusiastic about Baetle, the bug ontology we are developing, which should be an important element in helping open source projects work together.

The next generation of the Web is not going to be obvious: how could it be? If it were obvious it would, technical issues aside, already be here. The people most apt to be able to move those technical issues aside, are of course going to be developers themselves. As they see the benefits, these will be distilled into something useful and easy to understand for everyone else.

Sun Tech Days, Paris

The second day of the Sun Tech Days conference in Paris was more directly focused on Java. Under the Grande Arche de la Defense, Bruno Hourdel prided himself (photo) on the even better turnout for this event than for the previous week's London one. Europe is an eternal competition between nations. Good for them. I think of myself as English, French and Austrian, even a little American, though my passport is British. So I don't take offense either way.

After Eric Mahe's opening welcome (photo), James Gosling came on stage to give an overview of the directions of the Java Platform. More on this in the next blog. This was followed by some really quick demos, among which:

  • A revisited Pet Store example (photo) with an AJAX and dynamic html makeover. I am starting to get the feeling that these things are getting stable enough that they may be worth learning, without having to spend one's life on it.
  • Romain Guy gave a quick overview of his extreme Java Makeover (photo), which he showed a year ago. I wonder if it is open source yet? Romain is now an intern at Google. Hope he comes back to Sun soon.
  • A presentation of the small Java DB (photo)
  • And the lovely Pet Flower Store presentation by Doris Chen (photo)

I was busy taking pictures of the session so I did not quite capture everyone's name. Please help me fill that out. Thanks.

I was not so good at attending the afternoon sessions. Instead I spent some time meeting people, such as the Solaris group, catching up on some questions I had. I'll be working harder on filling in my gaps at JavaOne. But now I really need to prepare my BOF. Sorry I won't be able to attend today's meetings.



