Using Enterprise data in Large Language Models

May 22, 2024 | 8 minute read
Rekha Mathew
Cloud Solution Architect | A-Team

Introduction

Enterprise data consists of structured and unstructured information generated, collected, and leveraged by an organization to support its core business activities and operations. This data helps in gaining insights, optimizing processes, and enhancing customer satisfaction.

Examples of enterprise data include:

  • Data from business management systems such as Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), Human Capital Management (HCM), Supply Chain Management (SCM), etc.
  • Logs, metrics, and other data generated by IT infrastructure.
  • Documents like contracts, reports, policies, and manuals stored in organizational repositories.
  • Real-time data from sensors and connected machines monitoring physical assets and equipment.
  • Emails, support tickets, product feedback, and other forms of communication from customers.

Large Language Models (LLMs) are trained on a wide range of public data and have no knowledge of enterprise data, which is usually private. When asked questions about enterprise data, an LLM may therefore respond incorrectly or hallucinate. For businesses, it is crucial that LLMs understand their specific industry and data and answer from that data context instead of giving broad, generalized responses. This blog shows a few techniques for getting the desired responses from LLMs based on enterprise data.

This blog describes two approaches: model fine-tuning and in-context learning. In fine-tuning, the pre-trained model is trained further with a labeled dataset of enterprise data. In in-context learning, you feed the pre-trained model a context, your enterprise data, at inference time, and the model generates a response using that context.

Model fine-tuning

Fine-tuning involves training an LLM on a particular task or domain. To do this, we start with the existing pre-trained model and then train it further using labeled data specific to the task. This process adjusts the model’s weights to match the data better, resulting in an improved version of the LLM tailored to the training data.

Fine Tuning

In full fine-tuning, the entire pre-trained model, including all its layers and parameters, is trained. This process can be computationally expensive and time-consuming, especially for large models. Parameter-efficient fine-tuning (PEFT), on the other hand, trains only a subset of the pre-trained model's parameters and layers. It involves identifying the parameters most important for the new task and updating only those during training, which reduces computational cost and training time compared to full fine-tuning (a minimal PEFT sketch appears after the list below). There are a few key points to consider when choosing a fine-tuning approach:

  • Cost: Fine-tuning, especially full fine-tuning, is computationally expensive. It requires large amounts of storage, compute, and memory.

  • Time: The quality and size of the training data are crucial for fine-tuning, and creating quality training data can be time-consuming as well as expensive.

  • Catastrophic Forgetting: When fine-tuning a model for a particular task, the base model may “forget” the general knowledge it had previously acquired due to adjustments made to its parameters. This is referred to as catastrophic forgetting. PEFT shows greater robustness to this behavior because it leaves most of the pre-trained weights unchanged.

  • Monitoring and Guidelines: Regular performance monitoring is essential to ensure that an LLM behaves in alignment with desired outcomes, and the model should undergo regular updates to adapt to evolving data. Establishing guidelines and guardrails is also important for defining ethical boundaries and addressing potential model biases.
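
Here is the minimal PEFT sketch mentioned above, using the Hugging Face transformers and peft libraries with a LoRA adapter. The base model name and hyperparameters are illustrative placeholders, not recommendations; treat this as one possible way to set up parameter-efficient fine-tuning rather than a prescribed approach.

```python
# Minimal PEFT (LoRA) setup sketch using Hugging Face "transformers" and "peft".
# Model name and LoRA hyperparameters are illustrative placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in for your base LLM

# LoRA freezes the original weights and adds small trainable adapter matrices.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,              # rank of the adapter matrices
    lora_alpha=32,
    lora_dropout=0.05,
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model's parameters

# From here you would train `model` with a Trainer (or your own loop) on a tokenized,
# labeled dataset built from enterprise data; only the adapter weights are updated.
```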

In-context learning

Unlike fine-tuning, in-context learning doesn’t involve training the model with a dataset or changing the model’s parameters. It also doesn't require additional computational resources beyond what's necessary for model inference. Below are some techniques used in in-context learning.

Prompts

One way of achieving in-context learning is to provide the model with a prompt or a set of instructions. Apart from instructions, the prompt can include context that guides the LLM to generate a context-specific response.

Let's look at a few ways of passing enterprise data as context to a prompt.

Prompt Templates

One technique is to make prompts variable-friendly. For this, you define a prompt template, a template string with placeholders for variables. The template can then be formatted with input values to generate the final prompt string, replacing the placeholders with the corresponding values. These values can be sourced from enterprise data via API calls or data extraction techniques. Once the variable substitution is complete, the resulting prompt is context-specific, incorporating data from the enterprise. This final prompt is passed to the LLM, improving its ability to generate relevant, context-specific responses based on enterprise data.

Prompt Templates
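
As a minimal sketch, here is one way to build such a template with LangChain's PromptTemplate class (any templating mechanism would work as well). The customer and order values are hypothetical; in practice they would be fetched from enterprise systems such as a CRM or an order API.

```python
# Prompt-template sketch using LangChain's PromptTemplate.
# The variable values below are hypothetical examples of enterprise data.
from langchain.prompts import PromptTemplate

template = PromptTemplate(
    input_variables=["customer_name", "open_orders", "question"],
    template=(
        "You are a support assistant.\n"
        "Customer: {customer_name}\n"
        "Open orders: {open_orders}\n"
        "Answer the question using only the data above.\n"
        "Question: {question}"
    ),
)

# Values retrieved from enterprise systems replace the placeholders.
prompt = template.format(
    customer_name="Jane Doe",
    open_orders="PO-1021, PO-1088",
    question="When will my latest order ship?",
)
print(prompt)  # the final, context-specific prompt sent to the LLM
```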

 

Few-shot prompting

Few-shot prompting guides LLMs to generate a desired response by providing examples of input-output pairs. These input-output pairs can be obtained from enterprise data. This way, your prompt contains examples, known as “shots,” with which you condition the model to generate outputs based on the context.

Few shot prompting
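
A minimal sketch of assembling a few-shot prompt is shown below. The example question-and-answer pairs are hypothetical; in practice they would come from enterprise data such as resolved support tickets or policy documents.

```python
# Few-shot prompt assembly sketch. The example pairs are hypothetical stand-ins
# for input-output pairs drawn from enterprise data.
examples = [
    {"question": "What is the return window for hardware?",
     "answer": "Hardware can be returned within 30 days of delivery."},
    {"question": "Do software licenses auto-renew?",
     "answer": "Yes, licenses renew annually unless cancelled 60 days in advance."},
]

new_question = "Can I transfer a license to another department?"

shots = "\n\n".join(f"Q: {e['question']}\nA: {e['answer']}" for e in examples)
prompt = (
    "Answer questions about company policy in the same style as these examples.\n\n"
    f"{shots}\n\nQ: {new_question}\nA:"
)
print(prompt)  # send this few-shot prompt to the LLM
```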

 

Prompt Chaining

Prompt chaining is a technique used when a single prompt would have to handle a complex task with multiple instructions. It involves breaking the task down into smaller, more manageable sub-tasks and executing them as a chain of prompts. These sub-task prompts can be executed sequentially, as depicted in the diagram, when the sub-tasks are interdependent, or in parallel when they are independent.

In this process, the response to one sub-task becomes part of the prompt for the next, and the chain continues until the overall task is complete. At intermediate steps, the output of each LLM call can be parsed or manipulated before being fed into the next step of the chain.

Prompt Chaining
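
A minimal sequential-chain sketch follows. The call_llm helper is hypothetical and stands in for whatever API you use to reach an LLM; the ticket text is a placeholder.

```python
# Sequential prompt-chaining sketch: each sub-task's output feeds the next prompt.
# `call_llm` is a hypothetical helper for your LLM provider's API.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("send the prompt to your LLM endpoint and return its text")

support_ticket = "...full ticket text pulled from the service desk system..."

# Sub-task 1: extract the key facts from the ticket.
facts = call_llm(f"List the key facts in this support ticket:\n{support_ticket}")

# Sub-task 2: the output of sub-task 1 becomes part of the next prompt.
draft_reply = call_llm(f"Using these facts, draft a reply to the customer:\n{facts}")

# Sub-task 3: review the draft; the final output completes the overall task.
final_reply = call_llm(
    f"Check this reply for tone and accuracy and return a corrected version:\n{draft_reply}"
)
```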

 

Retrieval Augmented Generation (RAG)

Retrieval Augmented Generation (RAG) is an architecture that adds an information retrieval system to augment the prompt with relevant context. To be effective, the retrieved information must be sufficiently detailed yet compact enough to fit within the maximum sequence length allowed for a prompt in an LLM. Because of this constraint, most RAG approaches use vector similarity search to retrieve contextual information.

Enterprise knowledge is segmented into smaller chunks, converted to embeddings with an embedding model, and stored in a vector store, which enables later retrieval through vector similarity search. The prompt is also passed through the embedding model so that its embedding can be compared with the document embeddings during retrieval. The retrieval system uses this comparison to find the most relevant chunks, which are then passed as supplemental context to create an augmented prompt that is sent to the LLM.

There are also newer approaches, such as GraphRAG, that use a knowledge graph to search for relevant content.

RAG
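
The sketch below shows the retrieval step of this pattern using a plain cosine-similarity search over in-memory chunk embeddings. The embed helper is hypothetical and stands in for your embedding model; in a real system the chunks and vectors would live in a vector store rather than in a Python list.

```python
# RAG retrieval sketch: find the chunks most similar to the question and build
# an augmented prompt. `embed` is a hypothetical stand-in for an embedding model.
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError("call your embedding model and return its vector")

chunks = [
    "...chunk 1 of an enterprise document...",
    "...chunk 2...",
    "...chunk 3...",
]
chunk_vectors = np.array([embed(c) for c in chunks])

def retrieve(question: str, k: int = 2) -> list[str]:
    q = embed(question)
    # Cosine similarity between the question vector and every chunk vector.
    sims = chunk_vectors @ q / (np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q))
    top = np.argsort(sims)[::-1][:k]
    return [chunks[i] for i in top]

question = "What is our data-retention policy for customer emails?"
context = "\n\n".join(retrieve(question))
augmented_prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# augmented_prompt is then sent to the LLM
```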

Agents and Tools

Often, LLMs need to interact with other software, databases, APIs, and external data sources to accomplish complex tasks. Agents are programs that use an LLM to reason through a task, create a plan to solve it, and execute the plan using a set of tools. LLM agents use tools and task-planning abilities to interact with outside systems and solve complex problems involving enterprise data access. Tools can take a variety of forms, such as API calls, Python functions, or database calls. One popular task-planning technique is the chain-of-thought approach, where the model is prompted to think step by step, allowing for self-correction. This method has evolved into more sophisticated versions, such as tree of thoughts, in which multiple thoughts are generated, re-evaluated, and consolidated to produce a final output. Agents also have a memory module, which can be thought of as a store of the agent's internal logs as well as its interactions with a user.

 

Agent
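
A minimal, hand-rolled sketch of this loop is shown below. The call_llm helper and both tools are hypothetical stand-ins; real agents would typically use an agent framework, richer planning, and tools backed by enterprise APIs.

```python
# Agent-loop sketch: the LLM plans which tool to call, the program executes the
# tool, and the observation is appended to memory for the next planning step.
# `call_llm`, `lookup_order`, and `search_kb` are hypothetical stand-ins.
import json

def call_llm(prompt: str) -> str:
    raise NotImplementedError("send the prompt to your LLM endpoint and return its text")

def lookup_order(order_id: str) -> str:
    return f"Order {order_id}: shipped on 2024-05-02"   # would call an enterprise API

def search_kb(query: str) -> str:
    return "Returns are accepted within 30 days."        # would query a knowledge base

TOOLS = {"lookup_order": lookup_order, "search_kb": search_kb}

def run_agent(task: str, max_steps: int = 5) -> str:
    memory = []  # the agent's internal log of actions and observations
    for _ in range(max_steps):
        decision = json.loads(call_llm(
            "Decide the next step for the task. Reply as JSON: "
            '{"action": "<tool name or finish>", "input": "...", "answer": "..."}\n'
            f"Task: {task}\nAvailable tools: {list(TOOLS)}\nHistory: {memory}"
        ))
        if decision["action"] == "finish":
            return decision["answer"]
        observation = TOOLS[decision["action"]](decision["input"])
        memory.append({"action": decision["action"], "observation": observation})
    return "Stopped after reaching the step limit."
```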

 

Key Considerations when choosing an approach

The following are a few important factors to consider when choosing a technique.

  • Expertise: Fine-tuning requires expertise in model training, while in-context learning techniques are less complex.
  • Cost: Fine-tuning is costly in terms of infrastructure as well as the preparation of training data. In-context learning requires less labeled data and fewer computing resources.
  • Reusability: In-context learning is highly reusable because the pre-trained model's parameters are not modified for a specific task, so the same model can serve various tasks by choosing appropriate in-context learning techniques.
  • Up-to-date responses: Fine-tuning requires regular retraining as the data evolves over time. In-context learning keeps responses current by retrieving information from external, up-to-date data sources.
  • Model inference time: Inference time is usually smaller for fine-tuned models. In in-context learning, the prompt must include context and demonstrations, which increases inference time.
  • Context size limitations: Models have a fixed context size that limits the amount of context-specific information that can be passed.
  • Model size: In-context learning usually requires large models to work well, while fine-tuning works well even with smaller models.
 

Oracle Cloud Infrastructure (OCI) Offerings

Let's review some of the Oracle offerings you can use to implement these techniques. It is often necessary to combine a few of these patterns to achieve the desired result.

OCI Generative AI is a fully managed service that provides access to pre-trained foundational models. You can also create custom models by fine-tuning the base models with your own data set.

OCI Generative AI Agents combines large language models (LLMs) and retrieval-augmented generation (RAG) with your data.

OCI Data Science is a platform where you collaboratively build, train, deploy, and manage machine learning (ML) models with the frameworks of your choice.

AI Vector Search is a new capability in the Oracle database. It includes a new vector data type, vector indexes, and vector search SQL operators that enable the Oracle Database to store and retrieve the semantic content of documents, images, and other unstructured data as vectors.

Select AI enables you to query your data using natural language. Combining LLMs with Oracle SQL empowers you to describe what you want and let the database generate the SQL query relevant to your schema.

OCI AI infrastructure provides the highest-tier performance and value for various types of AI workloads.
