Oracle Text Search; How Many is "Too Many?"

Don't you hate it when you have a question, and the best answer you can get is; "It depends?"  It's one thing if you're a child and you're asking your mom, "What's the best cookie?"  But if you're a DBA and you want to know, "How many pieces of content can I safely check into WebCenter Content Server before I take a significant performance hit," you're not going to be happy with the "It depends" answer that you'll likely get from Support.  Here are a few questions that we often get from customers, and some meaningful answers below them:

  1. Is Oracle Text the best method (performance wise) to search and index 600,000 + PDF reports?
  2. Is it advisable to store the files on a file system and index them or to store them within the database as BLOBS?
  3. New documents get added each day, how does the index update when new documents are added? Does this index ever need to be dropped and rebuilt?

600,000 is a relatively small number for WCC to handle. There are customers with more than 1 billion documents stored. The best strategy will depend on how much your storage will grow overtime: if these 600k documents will become 900k next year, than you can have one strategy; if they will become 6 million next year, you should go with a different strategy.  For a high number of documents (say, a few millions), we would probably advise you to disable your PDF conversion. For a very high number (several millions), you might want to think about disabling full-text indexing as well. Of course, you need to keep in mind that, in theory, there’s no limit for how many documents you can store in WCC. If you have the right architecture, you can support any amount of documents. These are just a few guidelines.

  1. Yes, we would recommend OracleText to index and search through the 600k documents. It solves, for instance, case sensitivity issues. But the recommendation is to use at least the database version, or, if possible, the 12c. It seems there are some improvements in OracleText in these versions. You don’t need SecureFiles for this. SecureFiles is the evolution of BLOBs, so they are only used to store and handle files.
  2. This is typically more of a ‘business’ decision than a 'technical' one. If you will store the documents in the DB, your DBA team must be on-board and be willing to monitor and support this.  This is because DBAs see storing files in the DB as their worst enemies, as they can bring performance down; and it’s the DBA’s fault if that happens.  So you must get your DBAs to commit to it.  Again, storing 600k documents in Oracle Database should not be a problem, if you have the right architecture in place. There are advantages and disadvantages to each strategy.   If you decide to store documents in the database, you should use the Database Enterprise Edition, with RAC, Partitioning, Deduplication and Advanced Compression, at least. These will pay for themselves with the benefits in performance and storage usage they will provide. Other options might be interesting, such as Database Vault and the security options. In the database, documents are stored as SecureFiles; blobs are no longer used. SecureFiles is the evolution of blobs.
  3. The Content Server has an incremental index, which will automatically index new additions. Index rebuild is a very rare process, and if everything goes fine, your customer will probably never have to do it.

Post a Comment:
Comments are closed for this entry.

This is the official blog of the Proactive Support Team for WebCenter Content Products: Content Management, Imaging, Records, Site Studio, DIS, Outside In, and more. Find information about our activities, publications, product related information and more.


« May 2016