Excel and RDF

Scott McNealy rarely had much nice to say about spreadsheet software when it was not web-enabled. And indeed there are a huge number of problems with spreadsheets. Off the top of my head, some of these are:

  • Hidden formulas that nobody looks at and that get tweaked without alerting people
  • Data that is never synchronized, with parts of it out of date
  • Data that cannot be merged
  • Some products even had virus problems...

And yet they are immensely popular, especially with the people who never see the problems that they lead to.

As it happens, these are problems within the scope of the semantic web. Every spreadsheet is like a mini SQL database. As long as you query the information inside one database owned by one administrator, all is fine. But what happens when you want to merge information from different databases? Ouch! That's really tough, because there is usually no clear understanding of which pieces should fit together. Do the columns in each database mean the same thing? Well, if you have just a few big databases you can tediously link them together, but what if you have thousands of such databases? And what if each person wielding one is a complete novice to this problem? What if someone just renames a column in one spreadsheet? What does that mean?
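To make the merging problem concrete, here is a minimal sketch in Python of what changes once each column is tied to a shared URL. Everything in it is an assumption made up for illustration: the example.org URLs, the column headers, the sheet contents and the helper names. It is not how any particular product works, just the general idea that two sheets with different header text can be joined once both headers point at the same identifier.

```python
# Hypothetical mapping from each sheet's local column header to a shared URL.
# The example.org URLs and the sheet contents are invented for illustration.
EX = "http://example.org/terms/"

SHEET_A_COLUMNS = {"Emp No": EX + "employeeId", "Name": EX + "fullName"}
SHEET_B_COLUMNS = {"staff_id": EX + "employeeId", "Office": EX + "office"}

sheet_a = [{"Emp No": "17", "Name": "Ada"}, {"Emp No": "23", "Name": "Grace"}]
sheet_b = [{"staff_id": "17", "Office": "Paris"}]


def to_uri_rows(rows, column_uris):
    """Re-key every row by column URL instead of the local header text."""
    return [
        {column_uris[header]: value for header, value in row.items()}
        for row in rows
    ]


def merge_on(key_uri, *tables):
    """Combine rows from several tables that share the same value for key_uri."""
    merged = {}
    for table in tables:
        for row in table:
            merged.setdefault(row[key_uri], {}).update(row)
    return merged


merged = merge_on(
    EX + "employeeId",
    to_uri_rows(sheet_a, SHEET_A_COLUMNS),
    to_uri_rows(sheet_b, SHEET_B_COLUMNS),
)
for key, record in sorted(merged.items()):
    print(key, record)
```

Renaming "Emp No" to something else in the first sheet then only requires updating one entry in its mapping; the meaning, carried by the URL, stays put.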

The topic of spreadsheets and the semantic web came to be one of the highlights of the conferences I went to in May. Dean Allemang, in his talk at JavaOne (the soundtrack-enhanced slides are now online!), used this problem in one of his examples. Eric Miller talked about a solution that involved using the momentum behind spreadsheets to help build ontologies (I think; it's a while back now). This is not all new, of course. In a reply to this post, Mike Bergman pointed to his year-old article entitled "RDF123 Makes Generating Flexible RDF a Snap".

But often a demo helps a lot, and the one that made me see the light was given by Lee Feigenbaum of Cambridge Semantics just before the end of the Semantic Tech Conference. Lee, who had been working on semantic web tools at IBM before going to start his own company, gave me a quick summary of the benefits of his SHAPE middleware. Essentially, by adding URLs into the spreadsheet you can tie the meaning of its contents down a lot more carefully. By writing a plugin for Microsoft Excel (they had a prototype working for OpenOffice before deciding to focus on M$ tools) that works together with the middleware, users can keep on behaving as they are used to, whilst helping link all the information together. Instead of working against each other, people in a company can build a web of information together. Here is a highlight from Lee's talk entitled "Getting to Web Semantics for Spreadsheets in the U.S. Government":

  • Tight integration into Excel allows semantic concepts to be dragged and dropped from the semantic repository onto data tables
  • The data table's implicit row/column relations are explicitly stored in an RDF semantic database
  • Cells, columns, and regions are tagged with explicit semantics
  • Publish the data tables on the Web

Intriguing for sure.
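To give a feel for what "explicitly storing the implicit row/column relations" could look like, here is a small Python sketch that turns a table into RDF statements in N-Triples syntax: each row becomes a subject, each column a property, and each cell a value. This is only my own illustration under stated assumptions. The example.org URLs, the sample table and the function name are invented, and nothing here is taken from the actual Excel plugin or middleware.

```python
import csv
import io

# Hypothetical URLs; nothing here comes from the actual Excel plugin.
ROW_BASE = "http://example.org/sheet1/row/"
COLUMN_URIS = {
    "Agency": "http://example.org/terms/agency",
    "Budget": "http://example.org/terms/budget",
}

# Invented sample data standing in for a published data table.
CSV_TEXT = "Agency,Budget\nAgency A,100\nAgency B,50\n"


def table_to_ntriples(csv_text, column_uris, row_base):
    """Return one N-Triples line per cell: <row> <column> "value" ."""
    lines = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, row in enumerate(reader, start=1):
        subject = f"<{row_base}{index}>"
        for header, value in row.items():
            predicate = f"<{column_uris[header]}>"
            # Escape backslashes and quotes so the literal stays valid.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} {predicate} "{literal}" .')
    return lines


triples = table_to_ntriples(CSV_TEXT, COLUMN_URIS, ROW_BASE)
print("\n".join(triples))
```

Once the cells are statements like these, tables published by different people can be loaded into the same RDF store and queried together, provided they use the same column URLs.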

Spreadsheets may yet be back again, but for the good.

PS. Please send me further links on this so I can flesh out this story better.


13 September 2008:


Hi Henry,

I have been hearing about Lee's Excel app for some time and am anxious to see it myself. Maybe this will prompt an update or an access link! :)

I have not seen any recent updates, but about a year ago (http://www.mkbergman.com/?p=394) I covered RDF123 from Tim Finin's group at UMBC Ebiquity. It, too, was trying to move toward spreadsheets as RDF input frameworks. My write-up has some links to still earlier initiatives.

Maybe Tim can provide an update as well?!

Thanks, Mike

Posted by Mike Bergman on August 29, 2008 at 04:44 PM CEST #

There is a very interesting blog post with video showing how SDS allows one to integrate two or more spreadsheets, and even spreadsheets and databases.


I have not tried it, as I don't work a lot with spreadsheets. But those who do will find this alluring.

Posted by Henry Story on September 02, 2008 at 04:32 PM CEST #

Hi Mike,

There is a new update from RDF123. Look at this website: http://logos.cs.umbc.edu:8080/termpredict/wordswithtype.html

It tries to find the most standard and consistent schema for a set of English words/phrases (concepts).



Posted by Lushan on September 11, 2008 at 01:49 PM CEST #

I am not sure exactly how that relates to spreadsheets. Can you develop your thought a little?

Posted by Henry Story on September 11, 2008 at 01:58 PM CEST #

First, people need to create a semantic graph to describe the relations between the columns in a spreadsheet. However, people are allowed to use English words for the names of the classes and properties occurring in the semantic graph. The above web service will try to map the set of names to the most standard and consistent RDF schema. What is important is that if two sets of names reflect the same domain/context information, they should be mapped to the same RDF schema in spite of the different ways people may name their concepts.

Posted by Lushan on September 11, 2008 at 04:02 PM CEST #

In my consulting life, I quite often come across Heath-Robinson arrangements of spreadsheets that have been cobbled together to run a company or perform some critical operational function (but often the guy who wrote it left the company...ergh).

From an enterprise architecture perspective, Excel is really an end-user tool and should not be used inside mission-critical systems, though it is quite OK at the periphery. So it is not a good idea to make it easier for people to create yet more complex systems depending on Excel, which is a bit like trying to change a light bulb whilst standing on a revolving chair - death or serious injury is likely to follow!

Indeed, the problem is a second-generation version of the old automation conundrum - don't just automate the old manual process, rethink and redesign. The same thing applies here: you need to restructure, extract the business rules, move the data to a proper database, tie down the interfaces, and fire up a properly designed architecture.

Notwithstanding all that, I got really frustrated that I could not directly query SPARQL in Excel, and as far as I could see nobody had written the bits to do it. So I pulled on my coding hat, and got down to writing an OLE DB provider for SPARQL.

You can find the result at www.sixhills-software.com/SPARQLProv/ if you want to download and have a play!

Posted by Andy Gueritz on August 07, 2009 at 02:08 PM CEST #
