Thursday Apr 17, 2008

KiWi: Knowledge in a Wiki


Last month I attended the European Union KiWi project startup meeting in Salzburg, to which Sun Microsystems Prague is contributing some key use cases.

KiWi is a project to build an Open Source Semantic Wiki. It is based on the IkeWiki [don't follow this link if you have Safari 3.1] Java wiki, which uses the Jena Semantic Web framework, the Dojo toolkit for its Web 2.0 functionality, and any one of the databases Jena can connect to, such as PostgreSQL. KiWi is in many ways similar to Freebase in its hefty use of JavaScript and its emphasis on structured data. But instead of being a closed source platform, KiWi is open source and builds upon the Semantic Web standards. In my opinion it currently overuses JavaScript features, to the extent that all clicks lead to dynamic page rewrites that do not change the URL of the browser page. This I feel is unRESTful, and the permalink link in the socialise toolbar to the right does not completely remove my qualms. Hopefully this can be fixed in this project. It would also be great if KiWi could participate fully in the Linked Data movement.

The meeting was very well organized by Sebastian Schaffert and his team. It was four long days of meetings that made sure that everyone was on the same page, understood the rules of the EU game, and most of all got to know each other. (See the kiwiknows tagged pictures on flickr.) Many thanks also to Peter Reiser for moving and shaking the various Sun decision makers to sign the appropriate papers and dedicate the resources for us to be part of this project.

You can follow the evolution of the project on the Planet Kiwi page.

Anyway, here is a video that shows the resourceful kiwi mascot in action:

Friday Mar 09, 2007

Metaweb: a semantic wiki startup

O'Reilly groks the Semantic Web in the latest article "Freebase will prove addictive". From his article:

But hopefully, this narrative will give you a sense of what Metaweb is reaching for: a wikipedia like system for building the semantic web. But unlike the W3C approach to the semantic web, which starts with controlled ontologies, Metaweb adopts a folksonomy approach, in which people can add new categories (much like tags), in a messy sprawl of potentially overlapping assertions.

Now that's a very partial simplification. The Semantic Web has always been designed to be grown, though there has been a lot of misunderstanding on this issue as I reported in UFO's seen growing on the web.

Using semantic wikis to grow ontologies is an excellent idea. Seed with a few tags, nourish with plain text, add a little structure with simple ontologies; water; repeat with a little more complexity at each iteration. With love and attention and a few lullabies the Semantic Web will be born (see Search, tagging and wikis).

A little further he says:

Metaweb still has a long way to go, but it seems to me that they are pointing the way to a fascinating new chapter in the evolution of Web 2.0.

Soon O'Reilly is going to use the word Web 3.0, just you wait and see!

See also:

Monday Feb 26, 2007

High Tech Vienna

Museum of Modern Art, Vienna

Last week I traveled to Vienna for a few days to meet with Andreas Blumauer, Alois Reitbauer, and Max Wegmüller (Sun) and work on Semantic Web/tagging related ideas. These were a few very intense days and evenings, with discussions going late into the night at the Heurigen.

We looked at Semantic Wikis, especially the Java-based IkeWiki and the famous MediaWiki (see comparison), for ideas on how one could link search, tagging and wikis. IkeWiki has some very nice features, including relation completion, which is somewhat akin to method completion in modern Java IDEs. If IkeWiki knows the type of the resource it is on, it will be able to use Ajax calls to list a number of possible relations. IkeWiki is probably not stable enough for immediate deployment though, as we had trouble after Andreas entered a contradictory statement into it [1]. I will have to play with this more to get a better understanding of where things are going in this area. Please send me any suggestions of other cool semantic wikis I should look at.

Another thing I am going to have to look into in more detail now is the scalability of Semantic Web tools and the size of applications that have been deployed. People don't yet have a good feeling for the size of projects currently being developed, and it worries them. There are large databases out there now that can handle billions of triples, such as AllegroGraph, BigOWLIM, or Oracle's 10g database... Projects such as neuroweb, as described in "Semantic Web Meets e-Neuroscience: An RDF Use Case", are using the Semantic Web in big ways. I don't like to talk about things I don't know about, so I should probably find the biggest projects, interview some of the people there and blog about it.

After three days of hard work I had a short visit around Vienna. The picture I had of Vienna was one of an old town full of beautiful old monuments, and so I was pleasantly surprised to find something completely different. On Friday evening I walked into a cafe near the Semantic Web School, on Lerchenfelder Gürtel, and found an excellent funk band called Groove Coalition playing (pix of bar, pic of band). The next day I walked around Vienna and ended up in the museum district, where I visited the museum of modern art (pictured above). As I had to be at the airport at 5am, I decided to stay overnight in the Cafe Leopold, which stayed open until 4am.

On the whole it was a very enjoyable stay, and I wish I could have stayed longer. A full set of pictures is available on my flickr account under the tag vienna.

[1] Andreas made a page be both a foaf:Document and a foaf:Person, which are defined as disjoint sets: foaf:Person owl:disjointWith foaf:Document. In fact this must be a tricky thing to do correctly in a semantic wiki, as the URL for the page has to be a Document and not the thing one wishes to describe. To do this right each page should therefore have a #about anchor, which would be the thing the page is about, such as Java or the Black Box. If a page needs an anchor, it may as well have a number of them too, so that one could describe a number of concepts on the same page, which may well sometimes be handy...
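
The contradiction in footnote [1], and the #about fix, can be sketched in a few lines of Python. This is only an illustration over plain tuples, not how IkeWiki or Jena checks consistency; the wiki.example.org URLs and the function name are hypothetical.

```python
# Minimal sketch: detect a resource typed with two classes declared disjoint.
# Triples are plain (subject, predicate, object) tuples; the example.org URLs
# are hypothetical and only serve the illustration.
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "rdf:type"
DISJOINT = "owl:disjointWith"


def find_disjoint_violations(triples):
    """Return (resource, class_a, class_b) for every disjointness clash."""
    disjoint_pairs = set()
    types = {}
    for s, p, o in triples:
        if p == DISJOINT:
            disjoint_pairs.add(frozenset((s, o)))
        elif p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    violations = []
    for resource, classes in types.items():
        ordered = sorted(classes)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                if frozenset((a, b)) in disjoint_pairs:
                    violations.append((resource, a, b))
    return violations


page = "http://wiki.example.org/SomePerson"

# The page itself is declared to be both a Document and a Person: a clash.
contradictory = [
    (FOAF + "Person", DISJOINT, FOAF + "Document"),
    (page, RDF_TYPE, FOAF + "Document"),
    (page, RDF_TYPE, FOAF + "Person"),
]

# With an #about anchor the page stays a Document and the anchor is the Person.
with_anchor = [
    (FOAF + "Person", DISJOINT, FOAF + "Document"),
    (page, RDF_TYPE, FOAF + "Document"),
    (page + "#about", RDF_TYPE, FOAF + "Person"),
]

print(find_disjoint_violations(contradictory))
print(find_disjoint_violations(with_anchor))
```

The first call reports the clash on the page URL; the second reports nothing, because the page and the thing it describes now have different names.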

Tuesday Feb 06, 2007

Search, Tagging and Wikis

Problem Statement

Large companies such as Sun Microsystems have a number of knowledge management needs. Plain text search engines can help find documents by searching through keywords in document content. The problems are that:
  1. This can return a huge number of documents
  2. The keywords the user is thinking of may not be the right ones
  3. The things searched for may not be text documents – they could be pictures, people or things

Search and its limitations

The first problem can be solved using PageRank algorithms such as those used by Google on the Internet, which look at the topology of the graph made from the links between pages to rank pages by how important the web community as a whole considers them. The second problem can be solved by automatic category extraction tools such as those developed by Exalead, which analyze the frequency of occurrence of words in documents to create a graph of the similarity space of concepts. This can be presented to the user, who can then find concepts similar to those he is looking for and so narrow down his search. (I developed a Topic Graph Java Applet at AltaVista to do exactly this.)
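
To make the link-topology idea concrete, here is a minimal sketch of a simplified PageRank computed by power iteration. It is a textbook version, not Google's production algorithm; the damping factor of 0.85, the iteration count and the toy link graph are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly over its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without out-links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph: three pages link to "c", and "c" links back to "a".
toy_links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph the page that everyone links to ends up on top, and the page it links to inherits most of that importance. With few links, as on an intranet, there is much less signal for such a calculation to work with.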

On the intranet the PageRank algorithms are not going to be as successful, as the user community is much smaller and people have less need to link to each other's documents: those who work on the same topic will tend to know each other well. The value of the link will therefore be less evident.
Concept extraction tools should be much more successful on the intranet, as there will be a lot of high quality content available. But even though they will help narrow down a query to more relevant search terms, they will not help find those documents that are the most authoritative. And so the results returned could just as well be out of date documents, only vaguely related documents, or other irrelevant ones. The ones everyone in the know is reading will not pop up to the top automatically.

Finally concept extraction tools or keyword searches are not helpful for finding information about documents or things that are not textual in nature.

To find what a company deems most important, one needs the behavioral input of its members. And so the question is really: how does one generate this information?

From Bookmarking to Tagging to Wikis

One thing one can assume is that useful documents or things will be those people keep wanting to return to. Bookmarking is the way to find one's way back to documents in a stable information space - a space where things have reliable links (aka permalinks) - especially so when search itself is not reliable. Bookmarking is the only solution available to a successful information gatherer apart from search.
As bookmark lists grow they become unwieldy, and so they need to be categorized or tagged so that the user can find his way around his own private information space. Developing one's own private categorization/tagging scheme is itself complex, time consuming and unreliable. Working with a larger community helps this process dramatically, and so is the first incentive for a lone information gatherer to participate in a larger communal structure. Bookmarking tools such as slynkr just help the information gatherer do what he needs to do anyway.

Once bookmarking services are available and people are busy tagging the resources bookmarked, it is possible to find resources that other people have found to be similar (with respect to a tag) to the one one is looking at. This is the beginning of the development of a conceptual scheme. This only requires a space to make these concepts more explicit and to reinforce their use. As tagging comes to be a communal activity, the space for defining the concepts develops communally too. It may start as just a helpful suggestion for other tags to use, but it can easily develop into something more informational.
An enhanced wiki would be such an informational space. It would help define the publicly agreed meaning of each tag as its users work on filling out the wiki pages associated with it. This meaning would itself be agreed to communally, that is, in a distributed fashion. Writing down the meaning more carefully would help disambiguate the tags and also serve as a repository of knowledge about the concept in question. An empty tag-wiki page would just link to all the documents that had been tagged that way. A more worked upon tag-wiki page would contain information about the meaning of the concept in question, its history, the people leading the changes, the place to find technical documentation, other related concepts, and much more. A fully semantic tag/wiki page would express some important elements of that content in terms of machine readable semantic relations.

An Ontology For Tagging

During a week in Zürich we came up with the following elements needed to describe a Tag.
  1. the tag
  2. the event of tagging
  3. the thing tagged
  4. the person or agent doing the tagging

This gives the really simple UML diagram:

It is important to distinguish the tag from the tagging event itself, as otherwise one cannot count the number of times a tag was applied to a resource, which is one element in calculating the value of a resource. The other element in keeping track of the value of the resource is to find out who did the tagging, as the value one gives to the Tagger flows to the tagging event.

Finally it is important in a Tag to keep track of the schema of the tag. Tags evolve within a social context - be it one provided by flickr, stumbleupon or slynkr - and this social evolution gives them particular meaning. The label “bank” will end up being associated with a very different tag if part of a tag cloud at a large banking institution or if part of a tag cloud in an environmental agency. Using a URL scheme (as opposed to URN schemes as Tim Bray proposed recently) is clearly a big advantage, as it can help locate the relevant context.
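
The four elements and the scheme can be sketched as a small data model. This is only one possible reading of the ontology above, written in Python rather than RDF; the class and field names, the tagger weights and the example.com URLs are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Tag:
    label: str
    scheme: str  # URL of the social context the tag evolved in


@dataclass(frozen=True)
class Tagging:
    """The event of tagging, kept distinct from the tag itself."""
    tag: Tag
    resource: str  # URL of the thing tagged
    tagger: str    # identifier of the person or agent doing the tagging
    date: datetime


def count_taggings(taggings):
    """How many times was each tag applied to each resource?"""
    return Counter((t.resource, t.tag) for t in taggings)


def weighted_value(taggings, tagger_weight):
    """Let the value given to each tagger flow to the tagged resource."""
    value = Counter()
    for t in taggings:
        value[t.resource] += tagger_weight.get(t.tagger, 1.0)
    return value


java = Tag("java", "http://tags.example.com/")
doc1 = "http://docs.example.com/doc1"
doc2 = "http://docs.example.com/doc2"
alice = "mailto:alice@example.com"
bob = "mailto:bob@example.com"

events = [
    Tagging(java, doc1, alice, datetime(2007, 2, 3)),
    Tagging(java, doc1, bob, datetime(2007, 2, 4)),
    Tagging(java, doc2, alice, datetime(2007, 2, 5)),
]

counts = count_taggings(events)
values = weighted_value(events, {alice: 2.0})
print(counts[(doc1, java)], values[doc1], values[doc2])
```

Because each Tagging is its own object, the same tag applied twice to doc1 counts twice, which would be lost if the tag were attached directly to the resource.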

As it turns out, the ontology proposed above is pretty much isomorphic with what Richard Newman came up with a few years ago in Tag ontology, and we should certainly try to work with what is happening at the Tag Commons[1].

Richard Newman points out that a Tag has many of the properties of a skos concept. Things that are tags could therefore also be skos:Concepts, giving us a handy and simple vocabulary for relating tags.

With the above frameworks it should be possible to import tags from any of the tag engines and keep the meanings of the labels used in those tag engines separated enough to be able to go back to the context of the tagging, yet close enough together to be able to do searches across tag engines on the labels, as I showed in Folksonomies Ontologies and Atom[2]. This is really important for intranet tagging engines, as people do not want to duplicate at work the tagging work they do at home.
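
A rough sketch of that idea: index imported tags by label, but keep the scheme of each tag engine alongside, so that one can search across engines or stay within one context. The scheme URLs, function names and data below are hypothetical.

```python
from collections import defaultdict


def index_by_label(tags):
    """Build label -> scheme -> resources from (scheme, label, resource) rows."""
    index = defaultdict(lambda: defaultdict(set))
    for scheme, label, resource in tags:
        index[label.lower()][scheme].add(resource)
    return index


def search(index, label, scheme=None):
    """Search across all tag engines, or only within one scheme if given."""
    by_scheme = index.get(label.lower(), {})
    if scheme is not None:
        return set(by_scheme.get(scheme, set()))
    return set().union(*by_scheme.values())


BANK_SCHEME = "http://tags.bank.example/"    # hypothetical banking intranet
GREEN_SCHEME = "http://tags.green.example/"  # hypothetical environmental agency

imported = [
    (BANK_SCHEME, "bank", "http://bank.example/loans"),
    (GREEN_SCHEME, "bank", "http://green.example/river-erosion"),
    (GREEN_SCHEME, "Bank", "http://green.example/flood-plain"),
]

index = index_by_label(imported)
print(sorted(search(index, "bank")))
print(sorted(search(index, "bank", scheme=BANK_SCHEME)))
```

The label “bank” matches in both engines, yet each hit can still be traced back to the scheme, and so to the context, in which it was tagged.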


Bookmarking is a necessary activity for information gatherers to keep track of content of interest to them. Tagging one's bookmarks is an important tool to help find them again. The energy required in keeping this information can be gathered to help build enterprise encyclopedias in a distributed fashion: these are also known as wikis. These wikis can provide deep content and pointers to everything of interest to members of a group. Using Semantic Web technologies, it should be possible to build this in an open way.

Further thoughts

  • How should the tag be related to the wiki page? Should the tag be given the URL of a wiki page? I.e. should we have something like this:
    <> :tagedWith [ :tag <>;
                    :by [ foaf:mbox <> ];
                    :date "2007-02-03..."^^xsd:dateTime ].
    <> a :WikiPage, skos:Concept, :Tag;
       :scheme <>.
    <> a :WikiPage, skos:Concept, :Tag;
       :scheme <>.
    Or should there be a rdfs:seeAlso relation from the tag to the wiki page?
  • If a tagging is an association of a tag with an object, would it be useful to give the user some more granularity as to what the type of the relation is? Or are we going beyond tagging here? It would also mean that one would have to limit the number of tags in a tagging event to one.
  • Any other problems?
[1] Thanks Danny for rectifying my initial error. I thought Tom Gruber had been responsible for the rdf ontology. That was in fact Richard Newman. Tom Gruber has done a lot of work in helping people see the relation between ontologies and folksonomies and owns the Tag Commons site.
[2] Richard Newman also notes a very interesting parallel between a Tag and an RSS 1.0 item. This is not surprising. The evolution described earlier from bookmarking has been taken once before. Blog is short for weblog, and early weblogs were largely logs of links, that is, bookmarks; the various formats of RSS and Atom are just formats for describing bookmarks. These evolved over time into a system that not only describes information about a resource but is itself the new information resource. So we should not be surprised to find some very close commonalities between the syndication data models and what is needed for describing a Tag.
I found a similar parallel for Atom. An Atom Entry is very reminiscent of a tagging. An Entry is the event of changing something about a resource at the updated time, initiated by an author, with which one can associate a category that is isomorphic with a Tag. The structure of an Atom Entry is a little heavier though, as it forces one to give the event a URI (the id) and some content.
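
The parallel can be made concrete by reading an Atom Entry as a set of taggings. The sketch below uses Python's standard XML parser; the entry is invented, the mapping (category to tag, author to tagger, updated to date, id to event) is my reading of the parallel and not part of the Atom specification, and the choice of the first link as the thing tagged is an assumption.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# A made-up entry; all URLs and names are hypothetical.
ENTRY_XML = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>tag:example.org,2007:entry-1</id>
  <title>Example entry</title>
  <updated>2007-02-03T12:00:00Z</updated>
  <author><name>Alice</name><email>alice@example.org</email></author>
  <link rel="alternate" href="http://example.org/doc1"/>
  <category term="semweb" scheme="http://tags.example.org/"/>
  <category term="wiki" scheme="http://tags.example.org/"/>
  <content type="text">Some content</content>
</entry>"""


def entry_to_taggings(entry_xml):
    """Read each category of an Atom entry as one tagging."""
    entry = ET.fromstring(entry_xml)
    link = entry.find(ATOM + "link")
    resource = link.get("href") if link is not None else None
    return [
        {
            "event": entry.findtext(ATOM + "id"),       # the URI Atom forces on the event
            "tag": category.get("term"),
            "scheme": category.get("scheme"),
            "resource": resource,                         # assumption: first link is the thing tagged
            "tagger": entry.findtext(ATOM + "author/" + ATOM + "email"),
            "date": entry.findtext(ATOM + "updated"),
        }
        for category in entry.findall(ATOM + "category")
    ]


taggings = entry_to_taggings(ENTRY_XML)
for tagging in taggings:
    print(tagging["tagger"], "tagged", tagging["resource"], "with", tagging["tag"])
```

Note that the entry carries two categories, so under this reading one Atom Entry corresponds to two taggings that share the same event URI, author and date.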

Tuesday Jan 30, 2007

Semantic Web School

Andreas Blumauer, Managing Director of the Semantic Web School, came to Sun in Zürich to give a one day course as part of a four day conference whose main topic was helping relate knowledge and people. We had people fly in from the USA, the UK (Dave Levy and Chris Gerhard), France (me: photo), Austria and Germany. (more pictures).

The very impressive SkillMap Java applet opened up the conference, laying the groundwork for thinking about the relations between people and skills. This was followed by Andreas' one day course, which flew the team up to a 50,000 foot height, where we all jumped out till we could see the details of the landscape, including rdf, Dublin Core, foaf, doap, skos, sioc, and much more.

The landing was softened by a few demonstrations. I gave a quick presentation on Universal Drag and Drop using an early version of the Beatnik Address Book to demonstrate the simplicity of the concept (another Micro Killer App?). D2RQ as usual opened everyone's eyes, as I gave a demo of SPARQLing Roller. (After the conference, people kept referring to it as R2D2 though...)

Organized by Peter Reiser, sponsored by Dan Berg and Hal Stern, the conference was according to everyone present a huge success. There were quite a number of "aha!" moments and everyone went through at least one. I myself finally got a full overview of the problems large organizations like Sun need to solve in the knowledge management area. We cleared away some major Semantic Web fears, the most important being that it would require a complete retooling of the enterprise. This is perhaps again why D2RQ (and its competitors) are so important. But most of all the idea that by giving everyone in the enterprise a foaf name, one could tag them like any other resource, relate them to other people, documents, processes ... and create an open and flexible space for linking knowledge together was like discovering a new horizon.

Of course no conference goes without good restaurants (and in Switzerland this means fondue) and drinking into the evening. To top it all I went to an amazing restaurant called Clowns & Kalorien which if you ever happen to be in Zürich I highly recommend. All these calories had to be shed somehow, so as luck would have it we had some fresh snow, and I went to Engelberg, a beautiful resort close by (images).



