Make AI Document Understanding output easy to consume within your applications

Oracle has just released a new AI service that allows customers to uncover insights in unstructured documents powered by deep learning models. This service is called OCI Document Understanding, which allows you to extract text, tables and identify document types among other great capabilities.

Although the AI services are very easy to use, the output is complex to analyze and requires some extra processing to get the desired output. Other providers encourage you to use functions or Python code to analyze, process and consume that information. But, what happens if you need more information in the future? How can this process be simplified?

Using Autonomous Database (JSON, ATP or ADW), you can not only store the JSON output of the AI Services. Any application, business user or data scientist can query and use it! In this blog we are going to use a feature called table extraction, which identifies all the tables in a document and extracts the content in tabular format maintaining the row/column relationship.

The table extraction feature comes with a sample PDF which we are going to use. You can see that there is a small table that automatically is recognized and printed.

table-extraction

The response you get is a JSON document that contains a lot of useful information about our PDF, but we want to focus just on the table because it has the information we need. You will soon see that the JSON is quite big and complex. Let’s have a look into the structure to better understand how to get the needed information:

text can be complex

In this case, the PDF is just two pages long, but the table is on the first page. We need the table structure, which already is providing some information like the number of rows and columns. Also you can see the headers and the body rows:

view json results

If we have a look into the body rows, each cell of the table is stored in the JSON with a rowIndex and columnIndex. We will need to use these values to navigate the table:

json indexes used for navigation

Notice the cells are stored consecutively. It can be tricky to work with these documents; normally, you have to start coding with Python or other languages. However, using Autonomous Database, you can store and analyze them directly! Let’s have a look to the power of Autonomous Database. First, let’s create a new collection:

create a new collection

The output of the analysis is normally stored in object storage and there are many ways of consuming object storage data into Autonomous Database – and even automate the ingestion. For this blog, we are just going to copy and paste the JSON to consume it:

create new json document

Now we can switch to SQL for analyzing it. The first thing we are going to do is to have a look into the headers. We are going to use the JSON_TABLE function to show the results in a structured way. We can see that is a nested document:

json_table enables queries

Using the same query, we can change the headerRows with bodyRows to have a look into the content:

view header content

If you inspect the result, you’ll see just one column. However, the result we are looking for is one row with three columns. We need to use a small trick! Use the function JSON_ARRAYAGG to convert that column into a single row. This row will be an array:

create an array

Now that we have the array, we can use the JSON_TABLE function again to obtain the three columns:

Use json table

You can see that we are looking into the rowIndex 1. Now we can create a loop to iterate over all the rows. The first thing we are going to do is create a table to store the results of the query. As it’s just three columns, I will create it manually without the need of the headers:

create table

Finally, let’s run the code that will iterate over the cell rows and them insert into the table:

insert into table

If we run a simple query, we can see all the data is stored in the table:

simple query

You have seen how easy is to use all the power of SQL over native JSON stored in the Autonomous Database.

Learn more about Oracle Autonomous Database
Check out the JSON Developer’s Guide
Try it out! Go to Oracle LiveLabs for a hands-on experience with Oracle Database JSON capabilities

Make AI Document Understanding output easy to consume within your applications

Next:

Essbase Patch Set Update 11.1.2.4.049 is Available

Store Your Autonomous Database ADMIN Password as a Secret in OCI Vault

Make AI Document Understanding output easy to consume within your applications

Next:

Authors

Essbase Patch Set Update 11.1.2.4.049 is Available

Store Your Autonomous Database ADMIN Password as a Secret in OCI Vault