Automate with documents using AI

August 29, 2023 | 4 minute read
Wes Prichard
Senior Principal Product Manager

Documents are the currency of business. We work on slide presentations and text documents and create wiki pages and work tickets of various types. Our organizations process invoices, receipts, and sales orders. Documents aren’t as paper-based as they used to be, but they still drive many business processes. Some industries, like healthcare and government, are still heavily paper-driven. I know I get weary of filling out a clipboard full of paper every time I visit a doctor.

Since Oracle Cloud Infrastructure (OCI) Document Understanding was released in November 2022, we’ve added a number of features that help it support a wide variety of use cases. Several of the document AI features in Document Understanding were originally in OCI Vision, but after January 2024, those features are available only in OCI Document Understanding.

What is OCI Document Understanding?

If you’re not familiar with the service, OCI Document Understanding has five main features.

  • It does text extraction using optical character recognition (OCR).

  • It can extract tables from documents, preserving their content and structure.

  • It can recognize specific types of documents such as receipts, invoices, passports, driver’s licenses, resumes, tax forms, bank statements, checks, and pay slips.

  • It can extract relevant information from receipts, invoices, passports, and driver’s licenses. We call this feature key-value extraction because it finds specific values in the document like the passport holder’s name and birthdate and assigns them to labels (keys).

  • It lets users create custom document classification and key-value extraction models for their business-specific use cases.
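
To make the key-value idea concrete, here is a minimal Python sketch of how a client might turn extracted fields into a record. The `ExtractedField` type, the field names, the sample values, and the confidence threshold are all illustrative assumptions, not the actual response format of the service.

```python
from typing import Dict, Iterable, NamedTuple


class ExtractedField(NamedTuple):
    """Hypothetical shape for one extracted key-value pair."""
    key: str           # label such as "FirstName"
    value: str         # text found in the document
    confidence: float  # assumed score between 0 and 1


def to_record(fields: Iterable[ExtractedField],
              min_confidence: float = 0.8) -> Dict[str, str]:
    """Keep the most confident value per key, dropping low-confidence ones."""
    best: Dict[str, ExtractedField] = {}
    for field in fields:
        if field.confidence < min_confidence:
            continue
        current = best.get(field.key)
        if current is None or field.confidence > current.confidence:
            best[field.key] = field
    return {key: field.value for key, field in best.items()}


# Made-up passport fields, for illustration only.
sample = [
    ExtractedField("FirstName", "Ana", 0.97),
    ExtractedField("BirthDate", "1990-04-12", 0.91),
    ExtractedField("BirthDate", "1990-04-17", 0.55),
]
print(to_record(sample))
```

A downstream process would then work with the labeled record rather than with raw OCR text.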

New features for OCI Document Understanding

Document Understanding already supported multiple file types for text extraction and OCR, such as JPEG, PDF, PNG, and TIFF. These formats are common for scanned documents. What’s new is support for additional document formats, such as PowerPoint, Word, Excel, and HTML, in limited availability. This expansion opens use cases that rely on common business documents, such as loyalty program management and claims processing, where you can receive attachments in many formats.
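
When attachments arrive in mixed formats, a pre-check by file extension can help route them. The sketch below is only an illustration: the `support_status` helper is hypothetical, and the mapping from the format names above to specific file extensions is an assumption rather than a documented list.

```python
from pathlib import Path

# Formats named above as already supported (common for scanned documents).
SCANNED_FORMATS = {".jpg", ".jpeg", ".pdf", ".png", ".tif", ".tiff"}
# Formats named above as limited availability; these extensions are assumed.
LIMITED_FORMATS = {".pptx", ".docx", ".xlsx", ".html"}


def support_status(filename: str) -> str:
    """Classify a file by extension into the groups described in the post."""
    suffix = Path(filename).suffix.lower()
    if suffix in SCANNED_FORMATS:
        return "supported"
    if suffix in LIMITED_FORMATS:
        return "limited availability"
    return "not listed"


for name in ["claim.PDF", "slides.pptx", "notes.txt"]:
    print(name, "->", support_status(name))
```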

OCI has regions in 23 countries, making language support in Document Understanding more important. OCR support for Spanish and Portuguese is already in limited availability, and we expect to make many more languages available soon.

Document Understanding was released with an asynchronous API, which is useful for analyzing batches and large document files. However, it requires staging documents in, and retrieving results from, object storage. For low-latency use cases like search, and when working with sensitive data like protected health information (PHI), a synchronous API that doesn’t depend on object storage is desirable. Such a synchronous API is now generally available. Be aware that it has a page count limit of five to maintain low latency.
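
The trade-off between the two APIs can be sketched as a small routing rule. This is a minimal sketch under stated assumptions: `DocumentJob` and `choose_api` are hypothetical names, the only figure taken from the text is the five-page limit, and the rule itself is one possible policy rather than service behavior.

```python
from dataclasses import dataclass

SYNC_PAGE_LIMIT = 5  # page count limit stated for the synchronous API


@dataclass
class DocumentJob:
    """Hypothetical description of one document to analyze."""
    name: str
    page_count: int
    is_sensitive: bool = False       # for example, contains PHI
    needs_low_latency: bool = False  # for example, interactive search


def choose_api(job: DocumentJob) -> str:
    """Return 'sync' or 'async' for a job, following the rule sketched above."""
    if job.page_count < 1:
        raise ValueError("page_count must be at least 1")
    fits_sync = job.page_count <= SYNC_PAGE_LIMIT
    if fits_sync and (job.is_sensitive or job.needs_low_latency):
        return "sync"
    # Batches and large files are staged in object storage for the async API.
    return "async"


print(choose_api(DocumentJob("lab_report.pdf", 3, is_sensitive=True)))
print(choose_api(DocumentJob("archive.pdf", 40)))
```

A sensitive document longer than five pages falls through to the asynchronous path in this sketch. A real workflow would need its own policy for that case.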

Custom models, which were announced in June, enable you to create key-value extraction and document classification models that are tailored for your documents. Custom models build on the pretrained models through transfer learning. You supply a labeled data set to Document Understanding to train a new custom model with no data science experience required. Since releasing custom models, we’ve added new features, such as full multipage support.

For document-driven business processes, it’s natural to want to couple process automation technology with document AI technology. So, Oracle is integrating Document Understanding as a native resource in Oracle Process Automation to help organizations analyze documents and drive the process based on the types and content of documents. You can then use extracted text as metadata in the process.

We’ve seen document AI use cases in every industry and in many business processes common to all industries, including the following examples:

  • Invoice and expense processing using invoice and receipt key-value extraction

  • Validation of personal identification documents in employment verification, customer identification, shipping, and healthcare using driver’s license and passport key-value extraction

  • Forms processing for applications, such as travel and immigration applications and membership programs using text extraction and key-value extraction

  • Extracting content from banking transaction documents, such as account numbers, bank codes, and line-item details using key-value and table extraction

  • Expense analysis by extracting line-item details from complex bills, such as cloud bills and utility bills

  • Fraud case processing by extracting content from case attachments using text extraction

  • In healthcare, extracting specific fields from lab reports using custom key-value extraction

Dive in to learn more

Join us at Oracle CloudWorld, September 18–21 in Las Vegas. Register today and check out the following hands-on labs and sessions for OCI Document Understanding and OCI AI services.

For more information on the Oracle Cloud Infrastructure Document Understanding service, see the following resources:
