Extending Oracle Fusion SaaS with Natural Language Processing (NLP) capabilities

May 26, 2023 | 7 minute read
Rekha Mathew
Consulting Solutions Architect | A-Team
Text Size 100%:

Introduction

Integration of artificial intelligence (AI) into business applications offers significant advantages for businesses that are looking to stay ahead in today's competitive landscape.

Natural language processing (NLP), is a field of AI that focuses on enabling machines to understand, interpret, and generate human language.Extending Oracle Fusion SaaS applications with AI can offer more intelligent, personalized, and efficient solutions, enabling users to achieve their goals faster and with greater ease.

This blog will show how we can incorporate NLP into Fusion SaaS applications.

NLP Techniques

NLP has a wide range of capabilities that can be applied to various real-world problems, and new techniques and applications continue to emerge as the field progresses.

Some of the most common techniques are,

  • Sentiment analysis: An NLP technique used to determine the emotional tone or attitude of a piece of text. It can identify whether the sentiment is positive, negative, or neutral.
  • Summarization: An NLP technique used to create a shorter version of a larger text. This can be helpful for readers who are short on time or need to quickly understand the main points of a lengthy document.
  • Keyword extraction: An NLP technique used to identify the most important words or phrases in a piece of text.
  • Tokenization: The process of breaking text down into individual tokens, which can be words, subwords, or characters. Tokenization is a fundamental step in NLP and is used in many tasks, such as text classification, language modeling, and machine translation.
  • Named Entity Recognition: An NLP  task of identifying and categorizing key information (entities) in text. 
  • Topic Modeling: It is a technique in NLP that extracts important topics from the given text or document.
  • Stemming and Lemmatization: Stemming in NLP reduce the words to their root form. It works on the principle that certain kinds of words which are having slightly different spellings but have the same meaning can be placed in the same token. In Lemmatization, words are converted into lemma which is the dictionary form of the word.

Some of these techniques can be used in business applications to,

  • Analyze a text input for specific length, wording, phrases, grammatical structure, and usage of biased/discriminatory words. In addition to readability, these factors impact the reader's attention, accessibility, and comprehension.
  • Perform sentiment analysis on customer feedback.
  • Generate summary text from document attachments.
  • Extract insights from reviews, documents, etc.

Let's now see how we can integrate NLP capabilities in Fusion SaaS. 

Architecture

This reference architecture describes how you can extend Fusion SaaS with an NLP model using OCI Native services and Oracle Visual Builder. 
 

Architecture Diagram

The following diagram illustrates the reference architecture.

Architecture Diagram

Components

This architecture has the following components.

What do these components do?

spaCy

spaCy comes with a set of pre-trained models that can be used to analyze text data.spaCy's pre-trained models are designed to perform various tasks such as named entity recognition, part-of-speech tagging, and text classification. These models are trained on large datasets of text data, allowing them to identify patterns and relationships in the text more accurately than traditional rule-based approaches.

When you call the spaCy API called nlp on a text, spaCy first tokenizes the text to produce a Doc object. The Doc is then processed in several different steps of the processing pipeline. The pipeline used by the trained pipelines typically includes a tagger, a lemmatizer, a parser, and an entity recognizer. Each pipeline component returns the processed Doc, which is then passed on to the next component. 

The following diagram illustrates a spaCy Processing Pipeline.

spaCy pipeline
image credit: spaCy.io

 

spaCy also has various matchers like PhraseMatcher, Matcher and DependencyMatcher using which you can match sequences of tokens, based on pattern rules or match subtrees within a dependency parse.

OCI Object Storage bucket

During a text analysis you will be looking for patterns, lemmas etc in the input text. If you want to use a file to store these patterns,base words etc you can use an Object Storage bucket to store that file.

OCI Function

 It is used to,

  • Pre-process text to remove HTML tags or any other unwanted information from the input.
  • Get the pattern files from the Object Storage bucket.
  • Pass the preprocessed text to the spaCy pipeline for NLP analysis. The spaCy pipeline returns the spaCy Doc object. This is then used for analysis. You can do several analysis using  – Token Matcher, Phrase Matcher, Dependency matcher, NER, etc. 
  • Return the analysis results.

spaCy’s trained pipelines can be installed as Python packages.  

To use spaCy in OCI Functions, create a Docker file and include the line below.

RUN python3 -m spacy download en_core_web_sm

This downloads the en_core_web_sm spaCy pre-trained model in the docker.

A sample of the Docker file  is given below

FROM oraclelinux:7-slim

WORKDIR /function

RUN groupadd --gid 1000 fn && adduser --uid 1000 --gid fn fn

RUN yum -y install python3 oracle-release-el7 && \

rm -rf /var/cache/yum

ADD . /function/

RUN pip3 install --upgrade pip

RUN pip3 install --no-cache --no-cache-dir -r requirements.txt

RUN rm -fr /function/.pip_cache ~/.cache/pip requirements.txt func.yaml Dockerfile README.md

ENV PYTHONPATH=/python

RUN python3 -m spacy download en_core_web_sm

ENTRYPOINT ["/usr/local/bin/fdk", "/function/func.py", "handler"]

 

In func. yaml, point the OCI Function runtime to docker.

schema_version: 20180708
name: analyze-jd
version: 0.0.172
runtime: docker
build_image: fnproject/python:3.8-dev
run_image: fnproject/python:3.8
entrypoint: /python/bin/fdk /function/func.py handler
memory: 2048

 

The OCI Function code summary is below

import io
import logging
import pandas as pd
import numpy as np
import oci.object_storage
from fdk import response
import os
import spacy
from spacy.matcher import Matcher
from spacy.matcher import PhraseMatcher
import re
import json

......

# Get the Function Configuration variables
def get_env_var(var_name: str):
  
   value = os.getenv(var_name)
   if value is None:
      raise ValueError("ERROR: Missing configuration key {var_name}")
   return value

#Function Initialization
try:
   logging.getLogger().info("inside function initialization")

   signer = oci.auth.signers.get_resource_principals_signer()
   object_storage_client = oci.object_storage.ObjectStorageClient(config={}, signer=signer)

   env_vars = {
      "NAMESPACE_NAME": None,
      "JD_STORAGE_BUCKET": None,
   ......
   }

   for var in env_vars:
      env_vars[var] = get_env_var(var)

   NAMESPACE_NAME = env_vars["NAMESPACE_NAME"]
   INPUT_STORAGE_BUCKET = env_vars["JD_STORAGE_BUCKET"]
  ...
   #Load spacy model
   nlp = spacy.load("en_core_web_sm")

except Exception as e:
   logging.getLogger().error(e)
   raise

 

# Function Handler
def handler(ctx, data: io.BytesIO = None):
   # Get the payload
   payload_bytes = data.getvalue()
   if payload_bytes == b'':
      raise KeyError('No keys in payload')
   payload = payload_bytes.decode()
   # preprocess the payload by removing html tags
   .....

   # pass the preprocessed payload to spacy pipeline
   nlp_doc = nlp(preprocessed_payload)

# perform your analysis and get the results
   ........
   analysis_results = ......
   except Exception as handler_error:
   logging.getLogger().error(handler_error)
   return response.Response(
      ctx,
      status_code=500,
      response_data="Processing failed due to " + str(handler_error)
   )
   return response.Response(ctx, response_data=analysis_results)

API Gateway

API Gateway exposes the OCI Function as a REST endpoint.

 

API Gateawy

 

 

Oracle Visual Builder

Use Oracle Visual Builder(VB) to design your user interface. You can embed the Visual Builder application within a Fusion SaaS application page.

  • Use the Visual Builder page designer to design your pages by adding components from the Components palette to the canvas. The page can contain text area to input your text for analysis or if the text is already present in Fusion SaaS, use appropriate REST APIs to obtain it.
  • Create a VB Service Connection to connect to the REST endpoint exposed by the API Gateway and wire your VB Action Chain to it.

 

VB Action Chain

  • Use Visual Builder components to appropriately display the results of the NLP analysis. You can do things like colour coding the specific portions of your input text to highlight the patterns detected,  show a summary of the analysis, etc. 

 

Conclusion

This blog showed a reference architecture for text analysis using spaCy and OCI native services. I hope you will be able to use the architecture described in this blog to incorporate NLP techniques into your business applications. 

Rekha Mathew

Consulting Solutions Architect | A-Team


Previous Post

Configuring SQLDeveloper to Centrally Authenticate Oracle Database Users with Azure Active Directory

Ty Stahl | 5 min read

Next Post


How-Tos : Oracle Fusion Applications Expense Bot

Abhay Kumar | 6 min read