Introduction
Oracle Cloud Infrastructure (OCI) Document Understanding is an AI-powered service that helps developers classify documents and extract text, tables, and critical data from them. It enables you to automate time-consuming business processes with prebuilt AI models while also offering the flexibility to customize document extraction for customer-specific requirements.
Oracle Analytics Cloud (OAC) provides integration with Document Understanding, empowering customers to use both pretrained and custom models from the OCI Document Understanding service within OAC.
This blog covers how to create and train a custom key value extraction model in OCI Document Understanding and includes the following sections:
- What Custom Models are Supported in OCI?
  - Custom Document Classification
  - Custom Key Value Extraction
- What Prerequisites are Required?
  - Set Up OCI Policies
  - Upload Training Documents in OCI Object Storage
- Create a Custom Key Value Extraction Model in OCI
  - Create a Dataset in OCI
  - Build and Train a Custom Model
What Custom Models are Supported in OCI?
Document Understanding enables users to create custom models designed to meet industry-specific needs for Document Classification and Key Value Extraction. A custom model must be created in OCI before you can register and access it in OAC.
Custom Document Classification
Develop a custom model to classify documents into customer-defined categories that the pretrained classification model doesn't cover. For example, a bank may categorize mortgage application documents into Credit Reports, Lease Agreements, and Application Forms. The model outputs a confidence score for each classification.
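To illustrate how the per-category confidence scores might be consumed downstream, here is a minimal sketch. The `ClassificationResult` shape, the `pick_category` helper, the 0.8 threshold, and the sample scores are all hypothetical and are not the service's actual response format.

```python
from dataclasses import dataclass


@dataclass
class ClassificationResult:
    """One candidate category for a document (hypothetical shape)."""
    label: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


def pick_category(results, min_confidence=0.8):
    """Return the most confident label, or None if it is below the threshold."""
    if not results:
        return None
    best = max(results, key=lambda r: r.confidence)
    return best.label if best.confidence >= min_confidence else None


# Illustrative values only, not real model output.
sample = [
    ClassificationResult("Credit Report", 0.91),
    ClassificationResult("Lease Agreement", 0.06),
    ClassificationResult("Application Form", 0.03),
]
print(pick_category(sample))
```

Returning `None` below the threshold gives you a simple way to route low-confidence documents to manual review instead of accepting a weak guess.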
Custom Key Value Extraction
Create a model to locate customer-defined fields within client-specific documents. For example, a healthcare provider could design a custom key value model for claim processing, using submitted insurance cards with fields like Member Name, Member ID, and Insurance Provider. The model will output both a confidence score and the coordinates of the bounding box for each identified field. Below is a sample insurance card for reference:
What Prerequisites are Required?
Before you begin, you must have a paid OCI tenancy. The following prerequisites are also required for creating a custom Document Understanding model in OCI.
Set Up OCI Policies
Ensure you have the required security policies before you create a custom Document Understanding model in OCI.
Your OCI user must have read, write, and delete permissions on the compartment containing the OCI resources you want to use. See Document Understanding Policies.
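As a rough sketch of what such policies can look like, the helper below builds policy statements for one group and one compartment. The group name, the compartment name, and the three resource-type names are assumptions for illustration. Confirm the exact statements, and any additional ones your tenancy needs, in Document Understanding Policies.

```python
def document_understanding_policies(group, compartment):
    """Build policy statements for a group in one compartment.

    The resource-type names below are assumptions to verify against
    the Document Understanding Policies documentation.
    """
    resource_types = [
        "ai-service-document-family",  # Document Understanding projects and models
        "data-labeling-family",        # Data Labeling datasets
        "object-family",               # Object Storage buckets and objects
    ]
    return [
        f"allow group {group} to manage {rt} in compartment {compartment}"
        for rt in resource_types
    ]


# Hypothetical group and compartment names.
for statement in document_understanding_policies("docu-admins", "docu-compartment"):
    print(statement)
```

The `manage` verb is used here because it covers the read, write, and delete permissions mentioned above.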
Upload Training Documents in OCI Object Storage
Store the documents for training the model in OCI Object Storage buckets. In this blog, a collection of 20 insurance card images from different members is used for training the custom Document Understanding model.
- Create a bucket to store your documents. In the OCI Console, navigate to Storage, select Object Storage & Archive Storage, and then Buckets.
- Once the bucket is created, click the bucket name, and then under Objects, click Upload to upload your documents, including images and PDF files. See Limits for Document Understanding.

- Verify that the bucket contains only the files you want processed. Oracle Analytics processes every file in the bucket.
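The last check in the list above can be sketched in a few lines. This example flags object names whose extension is not in an allowed set. The extension list and the sample names are assumptions, so check Limits for Document Understanding for the formats that are actually supported.

```python
from pathlib import PurePosixPath

# Assumed set of accepted formats; verify against Limits for Document Understanding.
ALLOWED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def find_extraneous(object_names, allowed=ALLOWED_EXTENSIONS):
    """Return object names whose extension is not in the allowed set."""
    return [
        name for name in object_names
        if PurePosixPath(name).suffix.lower() not in allowed
    ]


# Hypothetical bucket contents.
names = ["card_01.png", "card_02.JPG", "notes.txt", "cards/card_03.pdf", ".DS_Store"]
print(find_extraneous(names))
```

In practice, the list of names could come from a bucket listing, for example through the OCI Python SDK's Object Storage client. The function itself only needs plain strings.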
Create a Custom Key Value Extraction Model in OCI
Before you create the custom model, create a dataset of labeled documents to train it. In this blog, you learn how to create and train a custom key value extraction model that captures details including Member Name, Member ID, Provider, RxBIN, RxPCN, and RxGRP from a specific insurance card template.

All the training documents must use the same template, which means that each field, such as Member ID, must appear in the same location in every document.
Create a Dataset in OCI
You can develop a custom model tailored to your specific needs by creating a dataset and using Document Understanding to train a model based on that dataset. Custom key value extraction requires a set of labeled documents that define the fields you wish to extract in the trained model, such as Member ID and Member Name.
- On the OCI Home page, select Analytics & AI, and then Data Labeling.
- Under Data Labeling, click Datasets.
- Click Create dataset.
- On the Add dataset details page, in Name, enter the dataset name.
- Under Dataset format, select Documents.
- Under Annotation class, select Key Value.
- Click Next.
- On the Add files and labels page, select Select from Object Storage.
- Under Object Storage Location, select the compartment and bucket where you stored the training documents.
- Scroll down and under Add labels, in Label set, enter the labels you want to extract, for example member_id, member_name, and provider.
- Click Next.
- On the Review page, confirm the details and click Create.
- After the dataset is created, a list of the documents included in it is displayed under Data records.
- Click a data record name.
The text values are displayed in rectangular boxes on the document, and the label set you created earlier is also displayed on the right side.
- Click the highlighted text in the image and tag it with the corresponding label on the right.
- Repeat the previous step for all the labels.
- Once all the labels are correctly tagged, click Save & next.
- Tag the labels in all the training images.
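Before moving on to training, it can help to confirm that every record carries every label. The sketch below assumes you have a simple mapping of record names to the labels tagged so far. The `annotations` data and the `missing_labels` helper are hypothetical and are not an export format of the Data Labeling service.

```python
# Label set from the example above.
REQUIRED_LABELS = {"member_id", "member_name", "provider"}


def missing_labels(annotations, required=REQUIRED_LABELS):
    """Map each record name to the sorted list of required labels it lacks."""
    gaps = {}
    for record, labels in annotations.items():
        missing = required - set(labels)
        if missing:
            gaps[record] = sorted(missing)
    return gaps


# Hypothetical labeling progress, keyed by data record name.
annotations = {
    "card_01.png": ["member_id", "member_name", "provider"],
    "card_02.png": ["member_id", "provider"],
    "card_03.png": [],
}
print(missing_labels(annotations))
```

An empty result means every record has the full label set.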
Build and Train a Custom Model
Now that all the training documents are labeled, you can create the model.
- From the OCI main menu, click Analytics & AI, and then Document Understanding.
- Click Projects on the left and then Create project.
- Click the project name you created, and then click Create Model.
- From Type, select Key value extraction.
- Under Training data, select Choose existing dataset.
- Under Data Source, select Data Labeling Service.
- From Choose a dataset in OAC deploy (OAC deploy being the compartment used in this example), select the dataset you created in the previous section.
- Click Next.
- Under Model display name, enter the model’s name.
- Select a training duration.
- Click Next.
- Review the details and click Create and train.
Model training may take a while depending on the complexity of the detection problem, the typical number of labels in a document, the resolution, and other factors. When training completes successfully, you have a ready-to-use custom key value extraction model in the Document Understanding service.
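If you later want to script the same steps, the console choices map naturally onto a request body. The sketch below only assembles that body as a plain dictionary. The field names, the enum values, and the placeholder OCIDs are assumptions modeled on the console form, so verify them against the Document Understanding API reference before using them.

```python
import json


def build_create_model_request(display_name, compartment_id, project_id,
                               dataset_id, max_training_hours=None):
    """Assemble a request body for a key value extraction model.

    Field names and enum values are assumptions; verify them against
    the Document Understanding API reference before sending a request.
    """
    body = {
        "displayName": display_name,
        "modelType": "KEY_VALUE_EXTRACTION",
        "compartmentId": compartment_id,
        "projectId": project_id,
        "trainingDataset": {
            "datasetType": "DATA_SCIENCE_LABELING",  # dataset from Data Labeling
            "datasetId": dataset_id,
        },
    }
    # Only include a training duration when one was chosen.
    if max_training_hours is not None:
        body["maxTrainingTimeInHours"] = max_training_hours
    return body


# Placeholder identifiers, not real OCIDs.
request = build_create_model_request(
    display_name="insurance-card-kv",
    compartment_id="ocid1.compartment.oc1..example",
    project_id="ocid1.aidocumentproject.oc1..example",
    dataset_id="ocid1.datalabelingdataset.oc1..example",
    max_training_hours=1,
)
print(json.dumps(request, indent=2))
```

Keeping the payload construction separate from the call that sends it makes the choices easy to review and test without touching a tenancy.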
Call to Action
In this blog, you learned how to create and train a custom Document Understanding model in OCI to extract customer-defined values and recognize them as key value pairs.
As a next step, you can register and invoke this model in OAC. See the Register and Invoke Custom Document Understanding Models in OAC blog post.
