Extracting insights from unstructured data using AI services

November 19, 2021 | 4 minute read
Luis Cabrera-Cordon
Senior Director of Product Management

Data is gold! But only if you can properly analyze it to make it useful.

About 80% of the world’s data is in an unstructured format: video files, PDFs, spreadsheets, images, and so on. Many of these unstructured documents contain plain text written in prose. Even databases have many semi-structured tables because they contain textual fields, such as descriptions or customer support requests. Dealing with large quantities of text data is a complex task, but this information is valuable!

What if we could apply state-of-the-art AI models to unstructured data to extract insights from it? Oracle Cloud Infrastructure (OCI) now offers several AI services that allow you to harness the power of artificial intelligence. They offer pretrained models ready for you to use and easy-to-train custom models that any developer can tweak. AI services are geared toward developers, and you don’t need to be an expert data scientist to consume them.

A graphic depicting a customer feedback analytics flow.

In this blog post, we describe a pattern that uses OCI Language, one of our AI services, as part of an enrichment pipeline: you ingest unstructured textual data, extract insights from it, and then analyze and visualize those insights.

Using AI to simplify your work

Let’s take the task of dealing with customer feedback, a problem common to many industry verticals. Imagine that you’re working in the hospitality business. You manage several hotels, and you have received thousands of reviews for your properties. A treasure trove of information! But only if you can convert those reviews into actionable insights!

A graphic depicting an example review.

OCI Language can perform sophisticated text analysis at scale. It offers pretrained models, so you don’t need to be a data scientist to harness the power of state-of-the-art models, such as sentiment analysis and named entity extraction.

OCI Data Integration is a service that allows you to define custom data transformation flows. For our hotel reviews problem, you can create a data flow to read your unstructured data, call OCI Language to extract insights from the text, and then project the extracted insights into structured tables in a database, as shown in the following data flow.

A screenshot of an example data flow for extracting insights from text in customer reviews.
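
The three steps of such a flow (read the reviews, analyze the text, project the results into rows) can be sketched in a few lines of Python. This is a minimal illustration, not the Data Integration flow itself: `toy_analyzer` is a hypothetical keyword-based stand-in for the call to OCI Language, and the review text, word lists, and function names are all made up for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical word lists for the stand-in analyzer (not part of any OCI API).
ASPECTS = {"rooms": "Rooms", "bed": "Bed", "location": "Location"}
POSITIVE = {"clean", "comfortable", "great"}
NEGATIVE = {"noisy", "dirty", "far"}

# Made-up review text; only the record_id comes from the example table.
SAMPLE_REVIEWS = [
    {
        "record_id": 1230234,
        "text": "The rooms were clean. The bed was comfortable. "
                "The location was noisy.",
    }
]


def toy_analyzer(text: str) -> List[Tuple[str, str]]:
    """Stand-in for the AI service: return (aspect, sentiment) pairs.

    A real flow would send the text to OCI Language instead of
    matching keywords sentence by sentence.
    """
    pairs = []
    for sentence in text.split("."):
        words = {w.strip(",!?").lower() for w in sentence.split()}
        if words & POSITIVE:
            sentiment = "Positive"
        elif words & NEGATIVE:
            sentiment = "Negative"
        else:
            continue  # no sentiment word found in this sentence
        for keyword, label in ASPECTS.items():
            if keyword in words:
                pairs.append((label, sentiment))
    return pairs


def enrich(
    reviews: List[Dict],
    analyze: Callable[[str], List[Tuple[str, str]]],
) -> List[Dict]:
    """Project each review into one structured row per detected aspect."""
    rows = []
    for review in reviews:
        for aspect, sentiment in analyze(review["text"]):
            rows.append(
                {
                    "record_id": review["record_id"],
                    "aspect": aspect,
                    "sentiment": sentiment,
                }
            )
    return rows


if __name__ == "__main__":
    for row in enrich(SAMPLE_REVIEWS, toy_analyzer):
        print(row["record_id"], row["aspect"], row["sentiment"])
```

Passing the analyzer in as a function keeps the projection step independent of how the text is analyzed, so the stand-in could be swapped for a real service call without changing `enrich`.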

When Data Integration runs this flow, the output of the AI services ends up in structured tables that you can use for analytics purposes. For example, you can have a table with one record for each aspect found in a sentence, together with its sentiment, as shown in the following example.

A graphic depicting a review with highlighted words that AI would pick up.

record_id  aspect    sentiment
1230234    Rooms     Positive
1230234    Bed       Positive
1230234    Location  Negative

Transforming thousands of unstructured reviews into structured formats, such as the aspects table, opens the door to scenarios such as data analytics, machine learning model training, and search. In this specific scenario, you can load the data into Oracle Analytics Cloud to visualize the insights and explore the information in a way that helps you identify actionable tasks.

A graphic depicting a bar graph and word cloud as results for analyzed data.

A screenshot of analyzed reviews with the word "breakfast" highlighted in them.
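
To make the analytics step concrete, here is a small sketch of the kind of aggregation that sits behind a chart of sentiment per aspect. It assumes an in-memory SQLite database as a stand-in for the target database, and the table name `review_aspects` and both function names are hypothetical. The first three rows come from the aspects table; the rows for record 1230235 are invented for illustration.

```python
import sqlite3
from typing import List, Tuple

ASPECT_ROWS = [
    # Rows from the example aspects table.
    (1230234, "Rooms", "Positive"),
    (1230234, "Bed", "Positive"),
    (1230234, "Location", "Negative"),
    # Made-up rows for a second, hypothetical review.
    (1230235, "Rooms", "Positive"),
    (1230235, "Location", "Negative"),
]


def load_aspects(conn: sqlite3.Connection, rows: List[Tuple]) -> None:
    """Create the (hypothetical) aspects table and insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS review_aspects ("
        "record_id INTEGER, aspect TEXT, sentiment TEXT)"
    )
    conn.executemany("INSERT INTO review_aspects VALUES (?, ?, ?)", rows)
    conn.commit()


def sentiment_counts(conn: sqlite3.Connection) -> List[Tuple[str, str, int]]:
    """Count how often each aspect appears with each sentiment."""
    cursor = conn.execute(
        "SELECT aspect, sentiment, COUNT(*) "
        "FROM review_aspects "
        "GROUP BY aspect, sentiment "
        "ORDER BY aspect, sentiment"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_aspects(connection, ASPECT_ROWS)
    for aspect, sentiment, count in sentiment_counts(connection):
        print(f"{aspect:10} {sentiment:10} {count}")
```

Once the insights are in a table like this, questions such as "which aspect draws the most negative feedback?" become ordinary grouped queries.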

The pattern exemplified by this customer feedback analytics scenario, transforming information using AI capabilities, is core to dealing with unstructured content. As shown in this example, Oracle Cloud Infrastructure provides the tools that you need to perform advanced analytics at scale. With AI services, you don’t even have to be a data scientist to seize the benefits of artificial intelligence!

