Integration of artificial intelligence (AI) into business applications offers significant advantages for businesses that are looking to stay ahead in today's competitive landscape.
Natural language processing (NLP) is a field of AI that focuses on enabling machines to understand, interpret, and generate human language. Extending Oracle Fusion SaaS applications with AI can offer more intelligent, personalized, and efficient solutions, enabling users to achieve their goals faster and with greater ease.
This blog will show how we can incorporate NLP into Fusion SaaS applications.
NLP has a wide range of capabilities that can be applied to various real-world problems, and new techniques and applications continue to emerge as the field progresses.
Some of the most common techniques are:
Stemming and Lemmatization: Stemming reduces words to their root form. It works on the principle that words with slightly different spellings but the same meaning can be mapped to the same token. Lemmatization converts each word into its lemma, which is the dictionary form of the word.
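The difference between the two can be seen in a small sketch. This is a toy illustration only: the suffix list and the lemma dictionary below are hypothetical and far simpler than what a real stemmer or lemmatizer uses.

```python
# Toy illustration only: real stemmers and lemmatizers use far richer
# rules and dictionaries than the hypothetical ones below.

SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical, checked in this order

# Hypothetical mini-dictionary mapping inflected forms to their lemma.
LEMMA_TABLE = {"studies": "study", "studying": "study", "better": "good", "ran": "run"}


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word: str) -> str:
    """Look the word up in the dictionary; fall back to the word itself."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


for word in ("connected", "connecting", "connects", "studies", "better"):
    print(word, "| stem:", toy_stem(word), "| lemma:", toy_lemmatize(word))
```

Stemming is purely mechanical, so a stem does not have to be a real word, while a lemma always is one because it comes from a dictionary.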
Some of these techniques can be used in business applications to,
Let's now see how we can integrate NLP capabilities in Fusion SaaS.
This reference architecture describes how you can extend Fusion SaaS with an NLP model using OCI native services and Oracle Visual Builder.
The following diagram illustrates the reference architecture.
This architecture has the following components.
spaCy
spaCy comes with a set of pre-trained models for analyzing text data. These models are designed to perform tasks such as named entity recognition, part-of-speech tagging, and text classification. They are trained on large text datasets, allowing them to identify patterns and relationships in the text more accurately than traditional rule-based approaches.
When you call nlp on a text, spaCy first tokenizes the text to produce a Doc object. The Doc is then processed in several steps, known as the processing pipeline. The pipeline used by the trained pipelines typically includes a tagger, a lemmatizer, a parser, and an entity recognizer. Each pipeline component returns the processed Doc, which is then passed on to the next component.
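The tokenize-then-process flow can be sketched as follows. To keep the example runnable without a model download, it assumes a blank English pipeline with a single rule-based component (the sentencizer) rather than the en_core_web_sm pipeline used later in this blog, so it has no tagger, parser, or entity recognizer.

```python
import spacy

# Assumption: a blank English pipeline, which needs no model download.
# A trained pipeline would be loaded with spacy.load("en_core_web_sm").
nlp = spacy.blank("en")

# Add a rule-based sentence splitter as the only pipeline component.
nlp.add_pipe("sentencizer")

# Calling nlp tokenizes the text into a Doc, then runs each component on it.
doc = nlp("Oracle Fusion is a SaaS suite. spaCy analyzes text.")

print(nlp.pipe_names)                     # component names, in pipeline order
print([token.text for token in doc])      # tokens produced by the tokenizer
print([sent.text for sent in doc.sents])  # sentence spans set by the sentencizer
```

With a trained pipeline, the same Doc would also carry part-of-speech tags, lemmas, dependencies, and entities, each filled in by its own component.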
The following diagram illustrates a spaCy Processing Pipeline.
spaCy also provides matchers such as PhraseMatcher, Matcher, and DependencyMatcher, which let you match sequences of tokens against pattern rules or match subtrees within a dependency parse.
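A minimal sketch of the two token-sequence matchers is shown below. The rule name, the phrase list, and the sample sentence are hypothetical, and a blank English pipeline is assumed because these patterns only need the tokenizer. DependencyMatcher is left out since it requires a dependency parse from a trained pipeline.

```python
import spacy
from spacy.matcher import Matcher, PhraseMatcher

nlp = spacy.blank("en")  # assumption: tokenizer-only pipeline, no model download

# Token-based rule: "machine" followed by "learning", case-insensitive.
matcher = Matcher(nlp.vocab)
matcher.add("ML_TERM", [[{"LOWER": "machine"}, {"LOWER": "learning"}]])

# Phrase list matched on the lowercase form; the terms are hypothetical.
phrase_matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
terms = ("python", "oracle visual builder")
phrase_matcher.add("SKILL", [nlp.make_doc(term) for term in terms])

doc = nlp("We need Machine Learning, Python and Oracle Visual Builder experience.")

# Each match is a (match_id, start, end) tuple indexing into the Doc.
rule_hits = [doc[start:end].text for _, start, end in matcher(doc)]
phrase_hits = [doc[start:end].text for _, start, end in phrase_matcher(doc)]
print(rule_hits)
print(phrase_hits)
```

PhraseMatcher is usually the better fit for long term lists, while Matcher suits patterns that depend on token attributes.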
OCI Object Storage bucket
During text analysis, you will be looking for patterns, lemmas, and so on in the input text. If you want to keep these patterns and base words in a file, you can store that file in an Object Storage bucket.
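Reading such a file might look like the sketch below. The file layout (one pattern per line), the function name load_patterns, and the bucket and object names are assumptions for illustration. A stand-in client is used so the example runs without OCI credentials; in the function itself you would pass the real ObjectStorageClient.

```python
from types import SimpleNamespace


def load_patterns(client, namespace: str, bucket: str, object_name: str) -> list:
    """Read a newline-separated pattern file from a bucket.

    `client` is expected to behave like oci.object_storage.ObjectStorageClient,
    whose get_object() response exposes the raw bytes as `.data.content`.
    """
    obj = client.get_object(namespace, bucket, object_name)
    text = obj.data.content.decode("utf-8")
    # Skip blank lines and trim surrounding whitespace.
    return [line.strip() for line in text.splitlines() if line.strip()]


class FakeClient:
    """Stand-in for the OCI client so the sketch runs without credentials."""

    def get_object(self, namespace, bucket, object_name):
        body = b"python\nmachine learning\n\nsql\n"
        return SimpleNamespace(data=SimpleNamespace(content=body))


# Hypothetical namespace, bucket, and object names.
patterns = load_patterns(FakeClient(), "my-namespace", "my-bucket", "patterns.txt")
print(patterns)
```

Loading the file once during function initialization, rather than on every invocation, avoids repeated Object Storage calls.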
OCI Function
It is used to,
spaCy’s trained pipelines can be installed as Python packages.
To use spaCy in OCI Functions, create a Dockerfile and include the line below.
RUN python3 -m spacy download en_core_web_sm
This downloads the en_core_web_sm pre-trained spaCy model into the Docker image.
A sample Dockerfile is given below.
FROM oraclelinux:7-slim
WORKDIR /function
RUN groupadd --gid 1000 fn && adduser --uid 1000 --gid fn fn
RUN yum -y install python3 oracle-release-el7 && \
    rm -rf /var/cache/yum
ADD . /function/
RUN pip3 install --upgrade pip
RUN pip3 install --no-cache --no-cache-dir -r requirements.txt
RUN rm -fr /function/.pip_cache ~/.cache/pip requirements.txt func.yaml Dockerfile README.md
ENV PYTHONPATH=/python
RUN python3 -m spacy download en_core_web_sm
ENTRYPOINT ["/usr/local/bin/fdk", "/function/func.py", "handler"]
In func.yaml, set the OCI Function runtime to docker.
schema_version: 20180708
name: analyze-jd
version: 0.0.172
runtime: docker
build_image: fnproject/python:3.8-dev
run_image: fnproject/python:3.8
entrypoint: /python/bin/fdk /function/func.py handler
memory: 2048
A summary of the OCI Function code is given below.
import io
import logging
import pandas as pd
import numpy as np
import oci.object_storage
from fdk import response
import os
import spacy
from spacy.matcher import Matcher
from spacy.matcher import PhraseMatcher
import re
import json
......
# Get the Function Configuration variables
def get_env_var(var_name: str):
    value = os.getenv(var_name)
    if value is None:
        # f-string so that the missing key name appears in the message
        raise ValueError(f"ERROR: Missing configuration key {var_name}")
    return value

# Function Initialization
try:
    logging.getLogger().info("inside function initialization")
    signer = oci.auth.signers.get_resource_principals_signer()
    object_storage_client = oci.object_storage.ObjectStorageClient(config={}, signer=signer)
    env_vars = {
        "NAMESPACE_NAME": None,
        "JD_STORAGE_BUCKET": None,
        ......
    }
    for var in env_vars:
        env_vars[var] = get_env_var(var)
    NAMESPACE_NAME = env_vars["NAMESPACE_NAME"]
    INPUT_STORAGE_BUCKET = env_vars["JD_STORAGE_BUCKET"]
    ...
    # Load spacy model
    nlp = spacy.load("en_core_web_sm")
except Exception as e:
    logging.getLogger().error(e)
    raise
# Function Handler
def handler(ctx, data: io.BytesIO = None):
    try:
        # Get the payload
        payload_bytes = data.getvalue()
        if payload_bytes == b'':
            raise KeyError('No keys in payload')
        payload = payload_bytes.decode()
        # preprocess the payload by removing html tags
        .....
        # pass the preprocessed payload to spacy pipeline
        nlp_doc = nlp(preprocessed_payload)
        # perform your analysis and get the results
        ........
        analysis_results = ......
    except Exception as handler_error:
        logging.getLogger().error(handler_error)
        return response.Response(
            ctx,
            status_code=500,
            response_data="Processing failed due to " + str(handler_error)
        )
    return response.Response(ctx, response_data=analysis_results)
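The handler above leaves the HTML preprocessing step elided. One possible way to do it, using only the Python standard library, is sketched below. The helper name strip_html is hypothetical and this is not necessarily how the original function removes tags.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw: str) -> str:
    """Return the text of `raw` with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    # Join text nodes with a space, then collapse runs of whitespace.
    return " ".join(" ".join(parser.parts).split())


print(strip_html("<p>Senior <b>Python</b> developer &amp; data analyst</p>"))
```

Joining text nodes with a space keeps words from adjacent elements apart, at the cost of splitting a word that is only partly wrapped in a tag.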
API Gateway
API Gateway exposes the OCI Function as a REST endpoint.
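A client could call that endpoint roughly as sketched below. The URL and the function names are hypothetical, and the sketch assumes the text is sent as the raw request body, matching the way the handler decodes its payload. Any authentication configured on the gateway is not covered here.

```python
import json
import urllib.request

# Hypothetical API Gateway deployment URL; replace with your own endpoint.
ENDPOINT = "https://gateway.example.com/nlp/analyze"


def build_request(text: str, url: str = ENDPOINT) -> urllib.request.Request:
    """Build a POST request carrying the text to analyze as the raw body."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def analyze(text: str, url: str = ENDPOINT, timeout: float = 30.0) -> str:
    """Send the text to the endpoint and return the response body as a string."""
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the request does not touch the network; analyze() does.
req = build_request("<p>Job description text</p>")
print(req.get_method(), req.full_url)
```

In the reference architecture the call would come from the Visual Builder application rather than from Python, but the request shape is the same.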
Oracle Visual Builder
Use Oracle Visual Builder (VB) to design your user interface. You can embed the Visual Builder application within a Fusion SaaS application page.
This blog showed a reference architecture for text analysis using spaCy and OCI native services. I hope you will be able to use the architecture described in this blog to incorporate NLP techniques into your business applications.