Developing an AI bot powered by RAG and Oracle Database

What is RAG?

An excellent introduction to Retrieval Augmented Generation (RAG) can be found here

We can build a Knowledge Assistant with LangChain and OCI Generative AI, and store its data in the Oracle 23ai Vector DB.

Customers can easily build the chatbot using Oracle 23ai Vector DB and OCI Generative AI.

To test these features, visit our GitHub repository for the Python RAG chatbot and follow the instructions in the README file to install the required versions of the software libraries.

Code and functionalities may change as a result of customer feedback.

High Level Flow

Building a Full-Stack RAG Chatbot with OCI Generative AI and Oracle Vector Database (Python Powerhouse)

In the realm of chatbots, where responsiveness and knowledge are paramount, Retrieval-Augmented Generation (RAG) offers a compelling solution. This approach combines the power of large language models (LLMs) with the precision of database retrieval, making chatbots more informative and up-to-date. This blog delves into crafting a full-stack RAG chatbot using Oracle Cloud Infrastructure (OCI) Generative AI and Oracle Vector Database, all orchestrated by the versatile Python language.

Why OCI and Oracle 23ai Vector DB?

OCI Generative AI: This managed service streamlines LLM integration, providing pre-trained models from providers such as Cohere and Meta, some of which can be fine-tuned for specific tasks.
Oracle Vector Database: Oracle Database 23ai includes AI Vector Search, which stores vector embeddings in the database and performs semantic similarity search, enabling the chatbot to find the most relevant information in your knowledge base.
Python: Libraries such as LangChain, Hugging Face Transformers, and the OCI SDK simplify building the retrieval and generation components, making development efficient.

Building Blocks of the Chatbot:

Knowledge Base Preparation:
- Structure your data (documents, FAQs) for easy processing.
- Preprocess the text, including cleaning, tokenization, and stemming/lemmatization.
Vectorization:
- Use a Cohere embedding model or a comparable model (for example, a Hugging Face model) to create vector embeddings for your knowledge base documents.
- Store these embeddings in Oracle Vector Database for efficient retrieval.
LLM Integration with OCI Generative AI:
- Optionally fine-tune a pre-trained LLM on your specific domain or chatbot purpose.
- Utilize the OCI Generative AI Python SDK to interact with the LLM for text generation.
Retrieval Engine:
- Develop a Python function to search the Oracle Vector Database based on user queries.
- Retrieve the most relevant documents (top K) based on vector similarity.
Chatbot Framework and User Interface:
- Choose a framework like Streamlit or Flask to build the user interface for interacting with the chatbot.
- Integrate the LLM generation, retrieval engine, and response formatting logic.
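The preprocessing step above can be sketched in plain Python. This is a toy illustration, not code from our repository: the suffix-stripping `crude_stem` helper is hypothetical and far cruder than a real stemmer or lemmatizer (such as those in NLTK or spaCy). Neural embedding models such as Cohere's apply their own tokenization, so aggressive stemming is often skipped when embeddings are used.

```python
import re

# Toy suffix list for illustration only; a real pipeline would use a proper
# stemmer or lemmatizer instead of this hypothetical helper.
_SUFFIXES = ("ing", "ed", "s")


def clean(text: str) -> str:
    """Lowercase, replace non-alphanumeric characters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Split cleaned text on whitespace."""
    return clean(text).split()


def crude_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in _SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    return [crude_stem(t) for t in tokenize(text)]


print(preprocess("Indexing PDFs: chunked pages are stored!"))
# ['index', 'pdf', 'chunk', 'page', 'are', 'stor']
```

Note how `stored` becomes `stor`: over-stemming like this is one reason to prefer a proper lemmatizer when preprocessing matters.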

Putting it All Together:

User Input: The user interacts with the chatbot interface, posing a question.
Query Processing: The query is preprocessed and vectorized.
Retrieval: The vectorized query is used to search the Oracle Vector Database for relevant documents.
LLM Generation: The retrieved documents (or snippets) are fed into the LLM, prompting it to generate a response.
Response Formatting: The response is refined and formatted for presentation to the user.
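To make the five stages concrete, here is a self-contained toy version of the flow. It is only a sketch under heavy assumptions: `embed` is a bag-of-words counter standing in for a real embedding model, `retrieve` does brute-force cosine search over an in-memory list instead of querying Oracle Database, and `fake_llm` is a hypothetical stub in place of the OCI Generative AI call.

```python
import math
import re
from collections import Counter

DOCS = [
    "Oracle Database 23ai stores vector embeddings next to relational data.",
    "OCI Generative AI provides hosted large language models.",
    "Streamlit builds simple web user interfaces in Python.",
]


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words term-frequency vector (not a neural model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, k: int = 1) -> list[str]:
    """Retrieval: rank every document by similarity to the query vector, keep top k."""
    q = embed(question)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for the LLM: echoes the first context line."""
    context = prompt.split("Context:\n", 1)[1].split("\n", 1)[0]
    return f"Based on the context: {context}"


def answer(question: str) -> str:
    """User input -> retrieval -> prompt -> generation -> formatted response."""
    context = "\n".join(retrieve(question))
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return fake_llm(prompt)


print(answer("Where are vector embeddings stored?"))
```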

Now let’s walk through each step of the Python RAG chatbot, which is available on GitHub.

Step 1: The user asks the chatbot a question


Internally, our Python code passes the question to the RAG chain:

response = get_answer(rag_chain, question)

Step 2: How to build the RAG chain

Building the RAG chain involves multiple steps, as outlined below:

Load a list of documents. Currently, our GitHub code supports PDF files.

all_pages = load_all_pages(BOOK_LIST)  # BOOK_LIST = [BOOK1, BOOK2, BOOK3, BOOK4, BOOK5, BOOK6]

Split the PDF pages into chunks

document_splits = split_in_chunks(all_pages)  # CHUNK_SIZE = 1000, CHUNK_OVERLAP = 50
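We don't reproduce `split_in_chunks` here. The sketch below is a simplified, hypothetical fixed-size character splitter that shows what `CHUNK_SIZE = 1000` and `CHUNK_OVERLAP = 50` mean: each chunk holds at most 1000 characters and starts 950 characters after the previous one, so neighbors share 50 characters. Library splitters, such as LangChain's text splitters, typically also try to break on separators like paragraphs or sentences, so real chunk boundaries will differ.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 50) -> list[str]:
    """Fixed-size character chunks; each chunk repeats the last
    `chunk_overlap` characters of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # distance between chunk start positions
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last chunk reached the end of the text
            break
    return chunks


text = "".join(str(i % 10) for i in range(2000))
chunks = split_text(text)
print([len(c) for c in chunks])  # [1000, 1000, 100]
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of storing a little duplicated text.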

Load the embeddings model

Users can choose either a local Hugging Face embedding model or the Cohere embedding model; the EMBED_TYPE setting selects which one is used.

embedder = create_cached_embedder()

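from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore
from langchain_community.embeddings import CohereEmbeddings, HuggingFaceEmbeddings

# EMBED_TYPE, EMBED_COHERE_MODEL_NAME, EMBED_HF_MODEL_NAME and COHERE_API_KEY
# come from the chatbot's RAG config file.

def create_cached_embedder():
    """Initialize the embeddings model, wrapped in an on-disk cache."""
    fs = LocalFileStore("./vector-cache/")  # caches embeddings to make re-indexing faster

    if EMBED_TYPE == "COHERE":
        print(f"Loading Cohere Embeddings Model: {EMBED_COHERE_MODEL_NAME}")
        embed_model = CohereEmbeddings(
            model=EMBED_COHERE_MODEL_NAME, cohere_api_key=COHERE_API_KEY
        )
        model_name = EMBED_COHERE_MODEL_NAME
    elif EMBED_TYPE == "LOCAL":
        print(f"Loading HF Embeddings Model: {EMBED_HF_MODEL_NAME}")
        embed_model = HuggingFaceEmbeddings(
            model_name=EMBED_HF_MODEL_NAME,
            model_kwargs={"device": "cpu"},
            # True for BAAI models: unit-length vectors, so dot product = cosine similarity
            encode_kwargs={"normalize_embeddings": True},
        )
        model_name = EMBED_HF_MODEL_NAME
    else:
        raise ValueError(f"Unsupported EMBED_TYPE: {EMBED_TYPE}")

    # One cache namespace per model, so switching models never reuses stale vectors.
    # CohereEmbeddings exposes its name as `model`, not `model_name`, hence the variable.
    cached_embedder = CacheBackedEmbeddings.from_bytes_store(
        embed_model, fs, namespace=model_name
    )
    return cached_embedder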
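`CacheBackedEmbeddings` avoids recomputing an embedding for text it has already seen. The class below is a simplified, in-memory illustration of that idea, not LangChain's implementation: `CachedEmbedder` and `toy_embed` are hypothetical names, and where the real code persists vectors to `./vector-cache/` through `LocalFileStore`, this sketch keeps them in a dict. Keys combine the namespace with the text, which is why the cache is namespaced by model name.

```python
import hashlib
from typing import Callable


class CachedEmbedder:
    """Wraps an embedding function and memoizes vectors by a hash of
    (namespace, text), so re-indexing unchanged chunks skips the model call."""

    def __init__(self, embed_fn: Callable[[str], list[float]], namespace: str):
        self.embed_fn = embed_fn
        self.namespace = namespace
        self.store: dict[str, list[float]] = {}
        self.calls = 0  # how many times the underlying model actually ran

    def _key(self, text: str) -> str:
        return hashlib.sha256(f"{self.namespace}:{text}".encode()).hexdigest()

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            key = self._key(text)
            if key not in self.store:  # cache miss: compute and remember
                self.store[key] = self.embed_fn(text)
                self.calls += 1
            vectors.append(self.store[key])
        return vectors


def toy_embed(text: str) -> list[float]:
    # Hypothetical stand-in for a real model: text length and vowel count.
    return [float(len(text)), float(sum(c in "aeiou" for c in text))]


embedder = CachedEmbedder(toy_embed, namespace="toy-model")
embedder.embed_documents(["alpha", "beta"])
embedder.embed_documents(["alpha", "gamma"])
print(embedder.calls)  # 3: "alpha" was served from the cache the second time
```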

Create a Vector Store and store embeddings within the Oracle 23ai database.

Our chatbot lets customers choose which database stores the embeddings. The following code demonstrates this feature:

vectorstore = create_vector_store(VECTOR_STORE_NAME, document_splits, embedder)

For example, to use Oracle Database, set the store type to “ORACLEDB” in the RAG config file; this value is passed to create_vector_store as store_type:

import oracledb
from langchain_community.vectorstores import FAISS, Chroma
from langchain_community.vectorstores.oraclevs import OracleVS
from langchain_community.vectorstores.utils import DistanceStrategy

def create_vector_store(store_type, document_splits, embedder):
    print(f"Indexing: using {store_type} as Vector Store...")

    if store_type == "ORACLEDB":
        # Replace the placeholders with your own credentials and connect string.
        connection = oracledb.connect(user="ADMIN", password="XXXXXX", dsn="XXXXXXX")

        vectorstore = OracleVS.from_documents(
            documents=document_splits,
            embedding=embedder,  # cached embedder from create_cached_embedder()
            client=connection,
            table_name="oravs",
            distance_strategy=DistanceStrategy.DOT_PRODUCT,
        )
        print(f"Vector Store Table: {vectorstore.table_name}")
    elif store_type == "FAISS":
        vectorstore = FAISS.from_documents(
            documents=document_splits, embedding=embedder
        )
    elif store_type == "CHROME":  # "CHROME" selects the Chroma vector store
        vectorstore = Chroma.from_documents(
            documents=document_splits, embedding=embedder
        )
    else:
        raise ValueError(f"Unsupported store_type: {store_type}")
    return vectorstore
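The Oracle branch ranks results with `DistanceStrategy.DOT_PRODUCT`, and the Hugging Face branch of `create_cached_embedder` sets `normalize_embeddings=True`. The two fit together because, for unit-length vectors, the dot product equals cosine similarity: cos(a, b) = a·b / (|a||b|) = a·b when |a| = |b| = 1. The short check below demonstrates this with made-up 3-dimensional vectors; it does not call Oracle Database. If an embedding model returns vectors that are not unit-length, dot-product ranking can favor longer vectors, as `d2` shows.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query = [1.0, 2.0, 0.5]
docs = {"d1": [2.0, 4.0, 1.0], "d2": [12.0, 0.0, 0.0], "d3": [0.5, 1.5, 0.5]}

# On raw vectors, the dot product rewards the long vector d2;
# cosine ignores length and picks d1, which points the same way as the query.
by_raw_dot = max(docs, key=lambda d: dot(query, docs[d]))
by_cosine = max(docs, key=lambda d: cosine(query, docs[d]))

# After unit-normalizing both sides, the dot product ranks exactly like cosine.
qn = normalize(query)
by_norm_dot = max(docs, key=lambda d: dot(qn, normalize(docs[d])))

print(by_raw_dot, by_cosine, by_norm_dot)  # d2 d1 d1
```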

Create a retriever

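The retriever takes the user's question, runs a similarity search against the vector store, and returns the most relevant chunks (top K). These chunks become the context for the LLM; the retriever itself does not generate the answer.

Reranking is disabled by default in our chatbot code.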

# optionally adds a reranker (disabled by default)
retriever = create_retriever(vectorstore)
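We don't show `create_retriever` here. As a rough sketch of what a retriever with an optional, off-by-default reranker does, the pure-Python function below ranks chunks by dot product and, only when a `rerank` callable is supplied, first pulls a wider candidate pool (`fetch_k`) before keeping the best `top_k`. All names (`retrieve`, `fetch_k`, `keyword_rerank`, the toy index) are hypothetical; in LangChain itself, a vector store retriever is typically created with `vectorstore.as_retriever(search_kwargs={"k": 4})`.

```python
from typing import Callable, Optional


def retrieve(
    query: str,
    embed: Callable[[str], list[float]],
    index: dict[str, list[float]],
    top_k: int = 2,
    fetch_k: int = 10,
    rerank: Optional[Callable[[str, list[str]], list[str]]] = None,  # off by default
) -> list[str]:
    q = embed(query)
    by_similarity = sorted(
        index, key=lambda doc: sum(a * b for a, b in zip(q, index[doc])), reverse=True
    )
    if rerank is None:  # default path: plain top-k vector search
        return by_similarity[:top_k]
    # Reranking path: take a wider pool from vector search, reorder it with a
    # (usually more expensive) scorer, then keep the best top_k.
    return rerank(query, by_similarity[:fetch_k])[:top_k]


index = {
    "Vector search in Oracle Database 23ai": [0.90, 0.10],
    "OCI Generative AI models": [0.95, 0.05],
    "Streamlit UI": [0.10, 0.90],
}


def fixed_query_embed(text: str) -> list[float]:
    return [1.0, 0.0]  # hypothetical: every query maps to the same vector


def keyword_rerank(query: str, docs: list[str]) -> list[str]:
    # Hypothetical reranker: order candidates by words shared with the query.
    words = set(query.lower().split())
    return sorted(docs, key=lambda d: len(words & set(d.lower().split())), reverse=True)


question = "Oracle vector search"
print(retrieve(question, fixed_query_embed, index, top_k=1))
print(retrieve(question, fixed_query_embed, index, top_k=1, rerank=keyword_rerank))
```

Real rerankers usually score each (question, chunk) pair with a cross-encoder model, which tends to be more accurate but slower than vector search; that cost is why the candidate pool is kept small.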

Build the OCI Generative AI LLM with the following code:

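from langchain_community.llms import OCIGenAI

if llm_type == "OCI":
    llm = OCIGenAI(
        model_id="cohere.command",
        service_endpoint="XXXXXXX",  # OCI Generative AI inference endpoint for your region
        compartment_id="mycompartmentId",  # OCID of your compartment
        model_kwargs={"max_tokens": 200},
        auth_type="SECURITY_TOKEN",  # session-token authentication via the OCI CLI
    )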

Define the prompt (currently hard-coded):

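from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
rag_prompt = ChatPromptTemplate.from_template(template)

Build the entire RAG chain:

print("Building rag_chain...")
# The retriever fills {context}; the raw question passes through to {question}.
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()} | rag_prompt | llm
)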
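In the chain above, the retriever's output (a list of Document objects) goes straight into `{context}`, so the prompt receives the string form of that list, metadata included. A common LangChain pattern is to pipe the retriever through a small function that joins only the page text, for example `{"context": retriever | format_docs, ...}`. The sketch below shows that formatting step without LangChain; `Doc` is a hypothetical stand-in for a Document, and `format_docs` and `build_prompt` are illustrative names, not code from our repository.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
"""


def format_docs(docs: list[Doc]) -> str:
    # Join only the text, so metadata and object reprs don't leak into the prompt.
    return "\n\n".join(d.page_content for d in docs)


def build_prompt(docs: list[Doc], question: str) -> str:
    return TEMPLATE.format(context=format_docs(docs), question=question)


docs = [
    Doc("Chunks are 1000 characters long.", {"page": 3}),
    Doc("Consecutive chunks overlap by 50 characters.", {"page": 4}),
]
print(build_prompt(docs, "How large is each chunk?"))
```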

Step 3: Answer the question

The user's question is passed through the RAG chain: the retriever fetches the most relevant chunks, and the OCI Generative AI LLM generates the answer from them.

def get_answer(rag_chain, question):
    response = rag_chain.invoke(question)
    return response

Highlighting one of the customer use cases solved with the chatbot

Benefits of a Full-Stack RAG Chatbot:

Accurate and Up-to-Date Responses: Grounds LLM answers in documents retrieved from your knowledge base, making them more reliable.
Scalability: The system can be scaled to accommodate larger datasets and user loads.
Flexibility: New information can be readily added to the knowledge base, keeping the chatbot updated.
Customization: The LLM can be fine-tuned for specific domains or conversational styles.
Python Powerhouse: Python’s rich ecosystem of libraries empowers this project. From data manipulation to vectorization and LLM interaction (LangChain, Transformers), Python provides the tools to streamline development.

Why Oracle 23ai?

The “best” vector database depends heavily on specific use cases, performance requirements, integration needs, and cost constraints. It’s crucial to evaluate these factors for your particular application.

That said, let’s analyze the three options we have mentioned in the demo:

Database Comparison

ChromaDB, Oracle 23ai DB, and FAISS DB

ChromaDB is a popular open-source vector database designed for flexibility and ease of use. It’s often preferred for rapid prototyping and smaller-scale projects due to its Python-centric nature.

Oracle 23ai DB is a proprietary, converged database: Oracle Database 23ai adds native vector storage and search (AI Vector Search) to the Oracle Database platform. It offers robust performance and scalability, especially for large-scale enterprise applications. However, it might have a steeper learning curve and higher costs associated with Oracle licensing.

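FAISS (Facebook AI Similarity Search) is an open-source library from Meta for efficient similarity search over dense vectors, rather than a standalone database server. It runs in-process on CPU or GPU and is very fast, but concerns such as persistence management, metadata filtering, and access control are largely left to the application or to a wrapper such as LangChain’s FAISS integration.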

Key Factors to Consider

When choosing a vector database, focus on these aspects:

Performance: Evaluate query latency, throughput, and scalability to meet your application’s demands.
Scalability: Consider how the database can handle increasing data volumes and query loads.
Features: Assess the availability of features like similarity search, indexing, filtering, and integrations.
Cost: Compare pricing models, including licensing fees, cloud storage costs, and operational expenses.
Ease of use: Evaluate the database’s API, documentation, and community support.
Integration: Consider how the database integrates with your existing infrastructure and applications.

Recommendations

Start with ChromaDB: If you’re new to vector databases or working on a smaller project, ChromaDB is a good starting point due to its simplicity and active community.
Evaluate Oracle 23ai DB: For large-scale enterprise applications with high performance and integration needs, Oracle 23ai DB might be worth considering, especially if you already have an Oracle database infrastructure.
Consider FAISS: If you need fast, in-process similarity search and can handle persistence and filtering yourself, FAISS is a strong option; benchmark it with your own data before committing.

Additional Considerations

Open-source vs. proprietary: Evaluate the trade-offs between flexibility, cost, and support.
Cloud vs. on-premises: Consider the deployment options based on your infrastructure and security requirements.
Benchmarking: Conduct performance tests with your specific data and workloads to make an informed decision.
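Following the benchmarking advice above, a minimal latency harness might look like this. It is a sketch: `benchmark` and `brute_force_search` are hypothetical, and the brute-force scan over random vectors only stands in for a real query. To compare stores, you could pass a function that calls, for example, a LangChain vector store's `similarity_search(query, k=4)` on Oracle Database 23ai, FAISS, and Chroma with the same queries.

```python
import random
import statistics
import time
from typing import Callable


def benchmark(search: Callable[[str], object], queries: list[str], repeats: int = 5) -> dict:
    """Time each query `repeats` times and report latency statistics in milliseconds."""
    if repeats * len(queries) < 2:
        raise ValueError("need at least two timed runs")
    timings_ms = []
    for _ in range(repeats):
        for q in queries:
            start = time.perf_counter()
            search(q)
            timings_ms.append((time.perf_counter() - start) * 1000)
    # "inclusive" keeps percentiles within the observed range of timings.
    cuts = statistics.quantiles(timings_ms, n=100, method="inclusive")
    return {
        "runs": len(timings_ms),
        "p50_ms": statistics.median(timings_ms),
        "p95_ms": cuts[94],
        "max_ms": max(timings_ms),
    }


random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(2000)]


def brute_force_search(query: str) -> int:
    # Hypothetical toy search: derive a vector from the query, scan all vectors.
    q = [float(len(query) % 7 + i) for i in range(8)]
    return max(range(len(vectors)), key=lambda i: sum(a * b for a, b in zip(q, vectors[i])))


report = benchmark(brute_force_search, ["chunk size", "vector index", "OCI models"], repeats=5)
print(report)
```

Report tail latency (p95) as well as the median: a store with a good median but slow outliers can still make the chatbot feel sluggish.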

Conclusion

Leveraging OCI Generative AI, Oracle Vector Database, and Python empowers you to build a robust RAG chatbot that delivers engaging and informed user experiences across various domains. For enterprise-scale RAG chatbots, Oracle 23ai DB often emerges as a strong contender due to its performance, scalability, and integration with Oracle infrastructure.

Authors

Paul Parkinson

Architect and Developer Evangelist, Microservices and Converged Database

Pankaj Tiwari

Principal Member Of Technical Staff
