Introduction
Oracle Cloud Infrastructure (OCI) Document Understanding is an AI-driven service aimed at assisting developers in classifying documents and extracting text, tables, and essential data. It enables the automation of labor-intensive business processes using prebuilt AI models and provides the flexibility to tailor document extraction to meet specific customer needs with custom models.
Oracle Analytics Cloud (OAC) integrates with Document Understanding, allowing customers to utilize both pretrained and custom models from the OCI Document Understanding service within OAC.
In this article, you’ll learn about custom Document Understanding models and how to register and invoke a custom key value extraction model from OAC. It includes the following sections:
- What are Custom Models?
  - Custom Document Classification
  - Custom Key Value Extraction
- What Prerequisites are Required?
  - Create an OCI Connection
  - Set Up OCI Policies
  - Create a Staging Bucket
  - Create a Custom Model in OCI
  - Prepare Input Dataset
- Register a Custom Key Value Extraction Model
- Invoke a Custom Key Value Extraction Model
To learn about pretrained Document Understanding models, see the Register and Invoke Pretrained Document Understanding Models in OAC blog.
What are Custom Models?
Document Understanding empowers users to develop custom models tailored to industry-specific requirements for document classification and key value extraction. A custom model must be created in OCI before you can register and access it in OAC.
Custom Document Classification
Create a custom model to classify documents according to customer-defined types that aren’t available in the pretrained classification model. For instance, a bank might need to categorize the documents required for a mortgage application into credit reports, lease agreements, and application forms. The output provides a confidence score for each classification.
Custom Key Value Extraction
Develop a model to identify the positions of customer-defined fields within customer-specific documents. For instance, a healthcare client designs a custom key value model for their claim processing based on submitted insurance cards including fields such as Member Name, Member ID, and Insurance Provider. The output provides a confidence score along with the coordinates of the bounding box for each identified field. Here’s a sample insurance card for reference.
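To make the output described above concrete, here’s a minimal Python sketch of how a set of extracted fields could be represented and filtered by confidence. The class names, the normalized coordinate layout, the 0.8 threshold, and the sample values are all illustrative assumptions, not the actual response format of the Document Understanding service.

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    # Hypothetical layout: corner coordinates normalized to the 0..1 range.
    x_min: float
    y_min: float
    x_max: float
    y_max: float


@dataclass
class ExtractedField:
    label: str         # customer-defined field name, e.g. "Member ID"
    value: str         # text found for that field
    confidence: float  # score between 0 and 1
    box: BoundingBox   # where the field was found on the document


def confident_fields(fields, threshold=0.8):
    """Return {label: value} for fields at or above the threshold."""
    return {f.label: f.value for f in fields if f.confidence >= threshold}


# Made-up values for illustration only.
sample = [
    ExtractedField("Member Name", "Jane Doe", 0.97, BoundingBox(0.10, 0.20, 0.45, 0.26)),
    ExtractedField("Member ID", "ABC123456", 0.93, BoundingBox(0.10, 0.30, 0.40, 0.36)),
    ExtractedField("Insurance Provider", "Example Health", 0.61, BoundingBox(0.55, 0.05, 0.95, 0.12)),
]

print(confident_fields(sample))
```

Keeping the confidence score next to each value makes it easy to route low-confidence fields to a manual review step instead of discarding them.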

What Prerequisites are Required?
The following are the prerequisites for registering and invoking a custom Document Understanding model in OAC.
Create an OCI Connection
Your OCI connection stores relevant credentials and connection information used by OAC to access various OCI services such as Functions, Vision, Language, and Document Understanding. See Create a Connection to Your OCI Tenancy.
Set Up OCI Policies
Ensure you have the required security policies before you create a custom Document Understanding model in OCI.
The OCI user that you specify in the connection between OAC and your OCI tenancy must have read, write, and delete permissions on the compartment containing the OCI resources you want to use. See Policies Required to Integrate OCI Document Understanding with Oracle Analytics.
Create a Staging Bucket
In OCI Object Storage, create a bucket with a suitable name in a compartment that the OCI user can access. You must create this staging bucket before you register the model in OAC. It can have private visibility, and you can use the same bucket for multiple models.
Create a Custom Model in OCI
Custom models that you use in Oracle Analytics must first be created in the OCI Console.
First, create a suitable dataset for training the model with the OCI Data Labeling service, and then proceed to build your custom model. See the Create a Custom Document Understanding Model in OCI blog post.
Prepare Input Dataset
Store the documents that you want to analyze in OCI Object Storage buckets. Next, create a dataset in Oracle Analytics to access these documents. In this blog, a collection of 100 insurance card images from different members is used as input for the custom Document Understanding model.
- On OCI Console, navigate to Storage, and then Object Storage & Archive Storage.
- Click Buckets and create a bucket to store your documents.
- Once the bucket is created, click the bucket name.
- Under Objects, click Upload, and upload your documents including images and PDF files. See Limits for Document Understanding.

- Verify that the bucket doesn’t contain extraneous files for processing. Oracle Analytics processes every file in the bucket.
- Prepare an input dataset by creating a CSV file with ID, Bucket Name, and Bucket URL columns that identify where your documents are stored.
- Upload this CSV as a dataset to your OAC instance, which serves as the input for the dataflow to invoke the Document Understanding service.
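The input CSV from the last two steps can also be generated with a short script. The following is a minimal Python sketch that uses only the standard library. The bucket name and URL are placeholders and the helper name is hypothetical; copy the real bucket URL from the OCI Console for your own bucket.

```python
import csv
import io


def build_input_csv(buckets):
    """Build CSV text with ID, Bucket Name, and Bucket URL columns.

    buckets is a list of (bucket_name, bucket_url) tuples.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ID", "Bucket Name", "Bucket URL"])
    for row_id, (name, url) in enumerate(buckets, start=1):
        writer.writerow([row_id, name, url])
    return buffer.getvalue()


# Placeholder values: replace with your bucket name and its actual URL.
csv_text = build_input_csv(
    [("insurance-cards", "https://example.com/placeholder-bucket-url")]
)
print(csv_text)
```

Save the printed text to a .csv file and upload that file as the dataset in OAC. One row per bucket is enough, because Oracle Analytics processes every file in each listed bucket.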

Register a Custom Key Value Extraction Model
- On your OAC instance Home page, click Page Menu (ellipsis).
- Select Register Model/Function, and then OCI Document Understanding Models.

- Select the connection you created earlier.

- Select a compartment accessible to the OCI user.

A list of pretrained and custom Document Understanding models displays. Custom models are contained in projects.

- Select the Insurance_Cards_Template3 model.
- Enter a model name and description.
- To the right of Staging Bucket Compartment, click Select, and choose the staging bucket compartment.
- In Staging Bucket, select the staging bucket.
- Click Register.

- Verify that the model is successfully registered. On the Home page, click Navigator (in the left corner), and then Machine Learning.
The model is displayed on the Machine Learning Models page.

Invoke a Custom Key Value Extraction Model
Create a data flow that uses the dataset you created earlier, which contains the Bucket URL of the input insurance card files.
- On the OAC Home page, click Create, and then Data Flow.
- On the Add Data dialog, double-click the dataset you created earlier to select it.

- Click Add a step (+) and select Apply AI Model.

- From the list of AI models, select the Insurance_Cards_Template3 model you registered earlier.

- In Parameters, for Input Column, select the Bucket URL column, and for Input Type, select Buckets.
Alternatively, if you have a smaller number of documents to process, you can create the input dataset with a list of individual document URLs. In that case, select the Document URL column as the Input Column and Documents as the Input Type.

- Click Add a step (+) and then select Save Data.
- Provide the output dataset details and then click Run Data Flow.
Upon successful completion, the output dataset is available with the relevant columns such as Member Name, Member ID, and Provider. You can aggregate and visualize the output to create insightful dashboards.
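As one example of aggregating the output outside of a workbook, the following Python sketch counts members per provider. It assumes you’ve exported the output dataset to CSV and that it contains a Provider column as described above; the function name and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def members_per_provider(csv_text):
    """Count rows per value of the Provider column, skipping blank values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(
        row["Provider"].strip() for row in reader if row.get("Provider")
    )


# Made-up rows standing in for an exported output dataset.
sample_export = """Member Name,Member ID,Provider
Jane Doe,ABC123,Example Health
John Roe,XYZ789,Example Health
Ana Poe,LMN456,Sample Mutual
"""

counts = members_per_provider(sample_export)
for provider, total in counts.most_common():
    print(provider, total)
```

The same grouping can be built directly in an OAC workbook; the script is only a quick way to sanity-check the extracted values.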

Call to Action
In this article, you learned how to register and invoke a custom Document Understanding key value extraction model from OAC to extract customer-defined values and recognize them as key value pairs.
Now that you’ve read this post, try it yourself and let us know your results in the Oracle Analytics Community, where you can also ask questions and post ideas.
