Full-text index contents of email attachments and zip files

I recently found out about new indexing functionality that lets UCM full-text index files attached to an email message, as well as the contents of a compressed zip file. This means that a Word document attached to a checked-in email message, or the PDF documents inside a zip file, can now be full-text indexed.

If you are using UCM 11g, this functionality is already built-in and configured. But if you are using UCM 10g, you need an extra patch file. I've made a copy available here. This will most likely make it into future update bundle patches.

To implement the patch, create the directory <ucm dir>/classes/intradoc/taskmanager/tasks/ and place the patch file there. To handle emails with attachments, you don't need to do anything beyond that. But to handle zip files, you'll need to change the configuration so that UCM knows to full-text index that file type as well. The easiest way is to add this configuration value to <ucm dir>/config/config.cfg (as a single line):


TextIndexerFilterFormats=pdf,msword,ms-word,doc*,ms-excel,xls*,ms-powerpoint,powerpoint,ppt*,rtf,xml,msg,zip

Make those changes, restart the server, and then rebuild the search collection if you want to catch any existing content items. Otherwise, only items checked in from then on will get indexed this way.
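If you would rather script the config change than edit the file by hand, here is a minimal sketch in Python. It assumes config.cfg is a plain list of key=value lines. The function names (`ensure_format`, `update_config_file`) are my own and not part of UCM. If the key is not already set, it falls back to the value quoted above.

```python
from pathlib import Path

KEY = "TextIndexerFilterFormats"
# Value quoted in this post; used only when the key is not already in the file.
POST_VALUE = ("pdf,msword,ms-word,doc*,ms-excel,xls*,ms-powerpoint,"
              "powerpoint,ppt*,rtf,xml,msg,zip")


def ensure_format(config_text: str, fmt: str = "zip") -> str:
    """Return config_text with fmt present in the TextIndexerFilterFormats list."""
    lines = config_text.splitlines()
    for i, line in enumerate(lines):
        name, sep, value = line.partition("=")
        if sep and name.strip() == KEY:
            # Key already set: keep the existing formats and add fmt if missing.
            formats = [f.strip() for f in value.split(",") if f.strip()]
            break
    else:
        # Key not set: start from the value in the post and add a new line.
        formats = POST_VALUE.split(",")
        lines.append("")
        i = len(lines) - 1
    if fmt not in formats:
        formats.append(fmt)
    lines[i] = f"{KEY}={','.join(formats)}"  # always written as a single line
    return "\n".join(lines) + "\n"


def update_config_file(path: Path, fmt: str = "zip") -> None:
    """Rewrite the given config file in place (back it up first)."""
    path.write_text(ensure_format(path.read_text(), fmt))
```

Call `update_config_file(Path("<ucm dir>/config/config.cfg"))` with your actual install path filled in. You still need to restart the server and rebuild the collection afterwards.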

Comments:

Great post Kyle, right on target as usual!

Posted by Tal on February 11, 2011 at 06:50 AM CST #

Hi,
Is it possible to get the content of OCR documents in full-text search?

Posted by guest on February 21, 2012 at 02:52 AM CST #

Yes, if the OCR document is a PDF with the full text within the document. This is something Document Capture can produce.

Thanks,
-Kyle

Posted by Kyle Hatlestad on February 21, 2012 at 01:26 PM CST #

Hi

I am hoping you can help me. We have UCM 11g, and we use the Lotus Notes functionality to transfer emails, with or without attachments, into UCM. One of our users is checking in a mail item from Lotus Notes to UCM; everything goes over and we get the UCM number. However, when he views the .eml primary file of that item in UCM, all he gets is the body of the email, not the header or the attachment. On my PC, and for other users, we see what we should see: the full email with the header and the attachment embedded in it. Any ideas from anyone on what is wrong on this user's PC? I'm thinking settings of some kind, but his Email Integration Settings for UCM are correct. Cheers, Sally

Posted by guest on August 02, 2012 at 05:03 AM CDT #

About

Kyle Hatlestad is a Solution Architect in the WebCenter Architecture group (A-Team) who works with WebCenter Content and other products in the WebCenter & Fusion Middleware portfolios. The WebCenter A-Team blog can be found at: https://blogs.oracle.com/ateam_webcenter/
