Office 2007 files in SES 10.1.8.4

Following on from my previous article, here's another handy use for a document service.

SES 10.1.8.4 cannot currently handle the Office 2007 file formats (.docx, .pptx and .xlsx), but the filters in the latest versions of Oracle Text can. The next release of SES will also use these new filters, and thus have full support for Office 2007. Meanwhile, if we happen to have an Oracle 11.1.0.7 database installed on the same machine as SES, we can use the 11.1.0.7 filters to do the filtering within SES.

We can do this because document services have access to the original, pre-filtered binary documents. So even though the built-in SES filters will have failed or refused to index the documents, we can pick them up in the document service, filter them using the external filter executable from the 11g installation, and feed the resulting HTML stream back to SES for indexing.
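To make that flow a little more concrete, here is a rough sketch of the "pick up, filter externally, hand back HTML" step. It is only an illustration in Python, not the code from the zip file below and not the SES document service API. The names `is_office_2007` and `filter_to_html` are made up for this example, and the filter command is passed in as a parameter because I am assuming (not asserting) that the external filter takes an input path and an output path as its last two arguments. Check your own 11g installation for the real executable and its options.

```python
import os
import subprocess
import tempfile

# The three Office 2007 formats mentioned in this article.
OFFICE_2007_EXTENSIONS = (".docx", ".pptx", ".xlsx")


def is_office_2007(filename):
    """Return True if the file name has an Office 2007 extension."""
    return filename.lower().endswith(OFFICE_2007_EXTENSIONS)


def filter_to_html(document_bytes, filter_command, suffix=""):
    """Run an external filter over the original binary and return its output.

    filter_command is a list such as ["/path/to/filter", "some-option"].
    ASSUMPTION: the filter accepts "<input file> <output file>" as its
    final two arguments. That calling convention is hypothetical here.
    """
    with tempfile.TemporaryDirectory() as workdir:
        in_path = os.path.join(workdir, "input" + suffix)
        out_path = os.path.join(workdir, "output.html")

        # Write the pre-filtered binary document to disk for the filter.
        with open(in_path, "wb") as f:
            f.write(document_bytes)

        # check=True raises CalledProcessError if the filter fails.
        subprocess.run(list(filter_command) + [in_path, out_path], check=True)

        # Read back the filtered HTML, which would then go to the indexer.
        with open(out_path, "rb") as f:
            return f.read()
```

In a real document service the same two decisions apply: only intercept documents whose names match the Office 2007 extensions, and leave everything else to the built-in filters.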

The document service to do this can be found here: OfficexFilter.zip. Unzip it and check the readme.txt file for installation instructions.

Please note that although Oracle 11.1.0.6 is downloadable from Oracle.com, you need the 11.1.0.7 patchset, which is available from Metalink, to get the new filters. Also, I'm not trying to guess what the licensing implications are here; you would need to discuss that with your Oracle Sales Representative, or someone else who deals with that sort of thing (which isn't me!).

About This Entry

This page contains a single entry from the blog posted on November 20, 2009 8:33 PM.
