We’re excited to announce the general availability of Oracle Cloud Infrastructure (OCI) Language, which allows customers to uncover insights in unstructured text using natural language processing (NLP). Developers can integrate pretrained NLP capabilities into applications, without needing data scientists to create customized models. By automating text analysis at scale, customers can gain insights for improving customer experience and increasing efficiency.
You can access the service through the OCI Console, the OCI SDKs for Python, Java, Go, TypeScript, .NET, and PowerShell, or the OCI CLI.
This service is the first of a family of upcoming OCI AI Services, which will make it easy for businesses to add prebuilt AI capabilities to applications for a variety of industry use cases. To learn more about the AI Services, watch the keynote from Developer Live: AI and ML for Your Enterprise.
Let’s look at the key capabilities and use cases for OCI Language.
OCI Language is visible in the main menu of the OCI Console
Text data, such as social media posts, news, and surveys, provides valuable business and customer insights. However, it’s often too time-consuming for humans to analyze large amounts of textual data, so companies turn to NLP to gain insights effectively and at scale.
To use NLP, companies typically rely on data scientists to build and train custom machine learning models and then deploy those models into applications. This process is often time-consuming and expensive.
OCI Language reduces this time and effort by providing key language processing capabilities as production-ready REST APIs. These capabilities include language detection, named entity recognition, text classification, key phrase extraction, and aspect-based sentiment analysis.
Let’s dive into more detail about these key capabilities.
Within OCI Language, the Language Detection feature identifies the language of a given text and currently recognizes more than 100 languages. Language-specific models and services can use Language Detection as a preliminary step, and other OCI services can use it to catch language-related barriers before running deeper inference on the text.
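One way to use Language Detection as a preliminary step is to route text to a language-specific handler only when the detection is confident enough. The sketch below assumes a detection result has already been obtained and reduced to a language code and a confidence score. The `DetectedLanguage` type, the handler functions, and the 0.8 threshold are hypothetical choices for illustration, not part of the OCI Language API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class DetectedLanguage:
    """Hypothetical container for a detection result (not an OCI SDK type)."""
    code: str          # for example "en"
    confidence: float  # between 0.0 and 1.0


def handle_english(text: str) -> str:
    return f"[en pipeline] {text}"


def handle_spanish(text: str) -> str:
    return f"[es pipeline] {text}"


def handle_unknown(text: str) -> str:
    return f"[manual review] {text}"


# Illustrative routing table: language code -> language-specific handler.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "en": handle_english,
    "es": handle_spanish,
}


def route(text: str, detected: DetectedLanguage, min_confidence: float = 0.8) -> str:
    """Send text to a language-specific handler, or to a fallback when the
    language is unsupported or the detection confidence is too low."""
    if detected.confidence < min_confidence:
        return handle_unknown(text)
    handler = HANDLERS.get(detected.code, handle_unknown)
    return handler(text)
```

Keeping the fallback explicit means low-confidence or unsupported text is set aside rather than passed to a model trained for a different language.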
Named Entity Recognition (NER) enables you to identify and categorize entities in the text as people, places, organizations, date and time, quantities, percentages, or currencies. This feature is currently able to identify 19 entity types and 5 personally identifiable information (PII) entities. NER automatically scans entire articles to identify significant entities. The PII feature identifies and redacts sensitive entities in the text, such as a phone number, email address, mailing address, or URL.
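To make the redaction step concrete, here is a minimal sketch of how detected PII spans could be masked in the original text. It assumes each detected entity has been reduced to a dictionary with `offset`, `length`, and `type` keys. That shape, the `redact` helper, and the type labels are assumptions for illustration and may not match the actual response format of the service.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict]) -> str:
    """Replace each entity span with a placeholder such as <EMAIL>.

    Each entity is assumed to be a dict with 'offset', 'length', and 'type'
    keys; this shape is hypothetical.
    """
    # Work from the end of the text so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["offset"], reverse=True):
        start = ent["offset"]
        end = start + ent["length"]
        text = text[:start] + f"<{ent['type']}>" + text[end:]
    return text


sample = "Contact Jane at jane@example.com or 555-0100."
email = "jane@example.com"
phone = "555-0100"

# Hand-built stand-ins for detected entities (hypothetical type labels).
found = [
    {"offset": sample.index(email), "length": len(email), "type": "EMAIL"},
    {"offset": sample.index(phone), "length": len(phone), "type": "TELEPHONE_NUMBER"},
]

print(redact(sample, found))
```

Replacing spans from right to left avoids recomputing offsets after each substitution, since placeholders rarely have the same length as the text they replace.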
Text Classification identifies the topic of a given input document or text. It returns a category, along with a confidence score, from a set of predefined categories. You can use this capability to tag textual data by topic, making it easier to classify and retrieve relevant information.
Text Classification can also be applied across a collection of documents to determine common themes. For example, a collection of news articles might be classified as News and Media/Business News/Company News. The service determines the theme and presents the result as a primary/secondary/tertiary category hierarchy.
Key Phrase Extraction identifies essential phrases in unstructured input text, helping customers understand key customer intent and attributes at a glance. The service is built on deep learning and helps define the context-based intent or aspect of a given text. Key phrases summarize the content of a text and reveal its main topics, making it easy to extract the most relevant words and find insights related to the text's main points.
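The primary/secondary/tertiary form can be handled in application code by splitting the category label on its separator. The sketch below assumes the label arrives as a single slash-delimited string, as in the example above. The `CategoryPath` type and `split_category` helper are hypothetical names, not part of any OCI SDK.

```python
from typing import NamedTuple, Optional


class CategoryPath(NamedTuple):
    """Hypothetical three-level view of a slash-delimited category label."""
    primary: str
    secondary: Optional[str]
    tertiary: Optional[str]


def split_category(label: str, sep: str = "/") -> CategoryPath:
    """Split a label such as 'A/B/C' into primary, secondary, and tertiary levels."""
    parts = [p.strip() for p in label.split(sep) if p.strip()]
    if not parts:
        raise ValueError("empty category label")
    # Pad to three levels; anything deeper than tertiary is ignored.
    padded = (parts + [None, None])[:3]
    return CategoryPath(*padded)


path = split_category("News and Media/Business News/Company News")
print(path.primary, "|", path.secondary, "|", path.tertiary)
```

A label with fewer than three levels simply leaves the deeper fields as `None`, so downstream code can group documents at whichever level is available.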
The Aspect-Based Sentiment Analysis feature extracts the critical components of text and provides the associated sentiment – either positive, negative, or neutral.
With aspect-based sentiment analysis, businesses can become more customer-centric. The feature is vital for understanding feedback in reviews, surveys, and social media posts. It helps companies listen to their customers’ needs, analyze their feedback, and learn more about customer experiences and expectations for a product or service.
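To turn per-document aspect sentiment into something a business can act on, the results can be aggregated across many pieces of feedback. The sketch below assumes the aspects and their sentiments have already been extracted and reduced to `(aspect, sentiment)` pairs. The `summarize_aspects` helper and the sample feedback are made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def summarize_aspects(results: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count sentiment labels per aspect across many pieces of feedback."""
    summary: Dict[str, Counter] = defaultdict(Counter)
    for aspect, sentiment in results:
        # Normalize case so "Battery" and "battery" are counted together.
        summary[aspect.lower()][sentiment.lower()] += 1
    return dict(summary)


# Illustrative (aspect, sentiment) pairs, not real service output.
feedback = [
    ("battery", "negative"),
    ("screen", "positive"),
    ("Battery", "negative"),
    ("battery", "positive"),
]

summary = summarize_aspects(feedback)
for aspect, counts in summary.items():
    print(aspect, dict(counts))
```

Counting by aspect rather than by document highlights which parts of a product or service attract the most negative feedback, even when the overall tone of a review is mixed.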
Here’s an example showing what all of the features would return for this input text:
"A booster shot of Moderna's Covid-19 vaccine revs up the immune response against two worrying coronavirus variants, and a booster dose formulated specifically to match the B.1.351 variant first seen in South Africa was even more effective, Moderna says"
Language Detection indicates the language as English, with about a 0.99 confidence score.
Text Classification identifies the main category as "health and medical/conditions and disease."
Named Entity Recognition finds five entities: "Moderna's Covid-19" as a product, "two" as a cardinal (meaning a cardinal number), "B.1.351" as a product, "South Africa" as a GPE (geopolitical entity), and "Moderna" as an organization.
Key Phrase Extraction identifies "booster shot of Moderna's Covid-19 vaccine", "immune response", "coronavirus variants", "booster dose", "B.1.351 variant", "South Africa", and "Moderna" as important phrases.
Finally, Aspect-Based Sentiment Analysis extracts "booster shot", "vaccine", and "booster dose" as aspects, all with a positive sentiment.
OCI Language will help businesses improve their customer experience while reducing the time and effort to analyze text data.
The service has multiple use cases across lines of business, including:
We will publish additional blog posts that dive deeper into some of these individual areas in the coming weeks. Stay tuned.