Excel and RDF
By bblfish on Aug 29, 2008
Scott McNealy rarely had much nice to say about spreadsheet software that was not web enabled. And indeed spreadsheets come with a huge number of problems. Off the top of my head, some of these are:
- Hidden formulas that nobody looks at and that get tweaked without anyone being alerted
- Data that is never synchronized, with parts of it out of date
- Data that cannot be merged
- Some products even had virus problems...
And yet they are immensely popular, especially with the people who never see the problems that they lead to.
As it happens, these are problems within the scope of the semantic web. Every spreadsheet is like a mini SQL database. As long as you query the information inside one database owned by one administrator, all is fine. But what happens when you want to merge information from different databases? Ouch! That's really tough, because there is usually no clear understanding of which pieces should fit together. Do the columns in each database mean the same thing? If you have just a few big databases you can tediously link them together, but what if you have thousands of such databases, each wielded by a complete novice to this problem? What if someone just renames a column in one spreadsheet? What does that mean?
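To make the merge problem concrete, here is a minimal Python sketch. The two sheets, their headers, and the `naive_merge` helper are all made up for illustration; the point is only that a merge keyed on column headers cannot tell that "Name" and "Employee" mean the same thing.

```python
# Two hypothetical spreadsheets that describe the same person,
# but whose authors picked different column headers.
sheet_a = [{"Name": "Alice", "Dept": "Sales"}]
sheet_b = [{"Employee": "Alice", "Department": "Sales"}]


def naive_merge(*sheets):
    """Merge sheets by header text alone, the way a copy-and-paste merge would."""
    headers = []
    rows = []
    for sheet in sheets:
        for row in sheet:
            for header in row:
                if header not in headers:
                    headers.append(header)
            rows.append(row)
    # Cells for headers a row never had are left empty.
    table = [[row.get(header, "") for header in headers] for row in rows]
    return headers, table


headers, table = naive_merge(sheet_a, sheet_b)
print(headers)
for line in table:
    print(line)
```

The result has four columns instead of two, and Alice appears twice with half of each row empty. Nothing in the header strings says that the columns were meant to line up, which is exactly the gap a shared identifier for each column is supposed to fill.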
The topic of spreadsheets and the semantic web came to be one of the highlights of the conferences I went to in May. Dean Allemang, in his talk at JavaOne (the soundtrack-enhanced slides are now online!), used this problem in one of his examples. Eric Miller talked about a solution that involved using the momentum behind spreadsheets to help build ontologies (I think; it's a while back now). None of this is entirely new, of course. In a reply to this post, Mike Bergman pointed to his year-old article entitled "RDF123 Makes Generating Flexible RDF a Snap".
But often a demo helps a lot, and the one that made me see the light was given by Lee Feigenbaum of Cambridge Semantics just before the end of the Semantic Tech Conference. Lee, who had been working on semantic web tools at IBM before leaving to start his own company, gave me a quick summary of the benefits of his SHAPE middleware. Essentially, by adding URLs to a spreadsheet you can tie down the meaning of its contents a lot more carefully. By writing a plugin for Microsoft Excel (they had a prototype working for OpenOffice before deciding to focus on M$ tools) that works together with the middleware, they let users keep on behaving as they are used to, whilst helping link all the information together. Instead of working against each other, people in a company can build a web of information together. Here is a highlight from Lee's talk, entitled Getting to Web Semantics for Spreadsheets in the U.S. Government:
- Tight integration into Excel allows semantic concepts to be dragged and dropped from the semantic repository onto data tables
- The data table's implicit row/column relations are explicitly stored in an RDF semantic database
- Cells, columns, and regions are tagged with explicit semantics
- Publish the data tables on the Web
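The idea in these points can be sketched in a few lines of Python. This is not how the Cambridge Semantics plugin or SHAPE works internally; it is only an illustration under stated assumptions. The `example.org` URIs, the sheet layout, and the helpers `sheet_to_triples` and `to_ntriples` are hypothetical. The one real vocabulary term used is FOAF's `name` property.

```python
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"  # real FOAF property
EX = "http://example.org/vocab#"              # hypothetical vocabulary
PEOPLE = "http://example.org/people/"         # hypothetical URI space for rows

# Each sheet keeps its own headers, but every column is tagged with a URI.
sheet_a = {
    "key": "ID",
    "columns": {"Name": FOAF_NAME, "Dept": EX + "department"},
    "rows": [{"ID": "42", "Name": "Alice", "Dept": "Sales"}],
}
sheet_b = {
    "key": "Staff No",
    "columns": {"Employee": FOAF_NAME, "Phone": EX + "phone"},
    "rows": [{"Staff No": "42", "Employee": "Alice", "Phone": "555-0100"}],
}


def sheet_to_triples(sheet):
    """Turn the implicit row/column relations into explicit triples."""
    triples = set()
    for row in sheet["rows"]:
        subject = PEOPLE + row[sheet["key"]]
        for header, predicate in sheet["columns"].items():
            if row.get(header):
                triples.add((subject, predicate, row[header]))
    return triples


def to_ntriples(triples):
    """Serialize triples with plain string literals as N-Triples lines."""
    lines = []
    for s, p, o in sorted(triples):
        literal = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{literal}" .')
    return "\n".join(lines)


# Merging is now just a set union: identical facts collapse, new facts add up.
merged = sheet_to_triples(sheet_a) | sheet_to_triples(sheet_b)
print(to_ntriples(merged))
```

Because "Name" and "Employee" are tagged with the same URI, Alice's name is stored once, and her department and phone number from the two sheets end up attached to the same subject. Renaming a header no longer changes what the column means, since the meaning lives in the URI.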
Spreadsheets may yet be back again, but for the good.
PS. Please send me further links on this so I can flesh out this story better.
13 September 2008:
- Sean Martin, President and CTO of Cambridge Semantics, talked to Paul Miller in a long and instructive podcast. The discussion takes a bit of time to get going, so those who would like to zoom straight into the matters discussed here should skip to about the 34th minute of the interview.
- By the way, this is different from but complementary to the idea of linking office documents to the web. Here is a video of how one can use OpenOffice as a wiki, where one can edit a spreadsheet and publish it directly to the web.