Aperture to the Semantic Desktop

How can Unix/Linux compete with Apple's Spotlight functionality? How can all the open source desktop applications communicate in an integrated yet completely distributed way? To do this, open source software needs an architecture that is far more flexible than anything developed so far, and it needs to be built on open standards. This is what various research projects on Semantic Desktops are exploring. The opening of Aperture could well be a key event in the unfolding of this vision.

Aperture, a set of Java libraries, has been released on SourceForge under the Academic Free License version 3.0 and the Open Software License version 3.0. It has already been used, as I understand it, in Aduna's freely downloadable AutoFocus product. Here is a quote from their documentation:

Aperture is an open source library and framework for crawling and indexing information sources such as file systems, websites and mail boxes. Aperture supports a number of common source types and document formats out-of-the-box and provides easy ways to extend it with custom implementations.

The Aperture code consists of a number of related but independently usable parts:

  • Crawling of information sources: file systems, websites, mail boxes
  • MIME type identification
  • Full-text and metadata extraction of various file formats
  • Expansion of archives (to be done)
  • Indexing and querying of crawled and extracted information (to be done)
  • Opening of crawled resources (to be done)
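To make the first three bullets concrete, here is a toy sketch of that crawl / MIME-identification / metadata-extraction pipeline. This is plain Python using only the standard library, not Aperture's actual Java API — just an illustration of what a crawler emits for each resource it visits:

```python
import mimetypes
import os
from datetime import datetime, timezone

def crawl(root):
    """Walk a directory tree and emit one metadata record per file:
    a miniature version of the crawl -> identify -> extract pipeline."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mime, _ = mimetypes.guess_type(path)  # MIME type identification
            stat = os.stat(path)
            records.append({
                # every crawled resource gets a URI, so it can later be
                # referred to in RDF and queried
                "uri": "file://" + os.path.abspath(path),
                "mime": mime or "application/octet-stream",
                "size": stat.st_size,
                "modified": datetime.fromtimestamp(
                    stat.st_mtime, timezone.utc).isoformat(),
            })
    return records
```

In the real thing each record would be a set of RDF statements rather than a dict, and the extractors would also pull full text and format-specific metadata (EXIF, ID3, mail headers) out of the file contents.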

This is pretty much what Spotlight does today, except that Spotlight only works on the local file system. Just as with Apple's Spotlight, the crawled information can then be queried. But unlike Spotlight, this could be done using the much more powerful W3C standard SPARQL query language. Imagine then that every Linux distribution shipped by default with such an endpoint, one that all applications could always rely upon. It would then be dead easy to write specialised file browsers, the equivalent of Apple's iPhoto, that would just query the endpoint to find all images on the local hard drive - and even remote ones if the computer were connected to the internet - and present them in a user-friendly way. The photos would not need to be in any special location on the hard drive, as they do with iPhoto. Indeed the tool could move them, transparently to the user, to an external hard drive, an NFS location, or a WebDAV server. The same goes for music collections and every other file type we like to work with.
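Such an image browser could be little more than one SPARQL query plus a results parser. The sketch below shows what that might look like: the `desk:` vocabulary and the endpoint URL are made-up placeholders (a real desktop would settle on a shared ontology), but the query protocol encoding and the SPARQL JSON results format it parses are the W3C standards:

```python
import json
import urllib.parse

# Hypothetical vocabulary - the property names are illustrative only.
IMAGE_QUERY = """
PREFIX desk: <http://example.org/desktop#>
SELECT ?file ?modified WHERE {
    ?file desk:mimeType ?mime ;
          desk:lastModified ?modified .
    FILTER regex(?mime, "^image/")
}
ORDER BY DESC(?modified)
"""

def endpoint_url(base, query):
    """Encode a SPARQL query for the standard HTTP GET protocol."""
    return base + "?" + urllib.parse.urlencode({"query": query})

def parse_results(payload):
    """Pull (file, modified) pairs out of the standard
    SPARQL JSON results format."""
    data = json.loads(payload)
    return [(b["file"]["value"], b["modified"]["value"])
            for b in data["results"]["bindings"]]
```

The point is that the browser never touches the file system itself: it asks the endpoint, gets back URIs, and displays whatever those URIs name.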

But that is just the beginning. As more and more applications become SPARQL aware, it becomes possible for large corporations like Siemens, Philips, NASA, Boeing, and others to have dedicated machines (a bunch of Thumpers perhaps) index the metadata for all the files, people, and other things on the intranet, and these applications would then work just as easily as if the search engine were local. Since RDF is based on URIs/URLs, the core architecture of the system does not care at all where the endpoint is or where the files are. As long as you can name something, you can name it with a URL, and so it can become available and queryable.
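This location independence is worth making concrete. RDF statements are just (subject, predicate, object) triples whose subjects are URIs, so a file on your disk and a document on a distant server sit side by side in the same graph and fall out of the same query. A toy illustration (the URIs and property name are invented for the example):

```python
# Local files and remote resources in one graph: the subjects are
# simply URIs, so the query below cannot tell (and need not care)
# where each resource actually lives.
TRIPLES = [
    ("file:///home/me/pics/cat.jpg", "mimeType", "image/jpeg"),
    ("http://intranet.example.com/specs/wing.pdf", "mimeType", "application/pdf"),
    ("http://photos.example.org/holiday/beach.png", "mimeType", "image/png"),
]

def images(triples):
    """One match pattern covers local disk and distant servers alike."""
    return [s for (s, p, o) in triples
            if p == "mimeType" and o.startswith("image/")]
```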

And further, one can then imagine specialised search engines appearing on the web - a Technorati, Wikipedia, IMDb or other SPARQL endpoint - which these simple desktop applications could then work with, not even aware that they were far away, in a distant land.

The possibilities are so vast that the mind boggles. If yours does, remember: start off simple. But start thinking about it now. And while you are at it, check out D2RQ (and especially the pdf presentation).

Comments:

good article, "start off simple. But start thinking about it now. " good XP principle as well!

Posted by Jeryl Cook on June 07, 2007 at 09:10 AM CEST #

About

bblfish
