Retrieval-Augmented Generation (RAG) is a powerful technique that enhances the capabilities of Large Language Models (LLMs) when working with enterprise data. By grounding the model in information from your vector store or knowledge base, RAG helps produce more accurate, relevant, and trustworthy answers.

MySQL HeatWave is a fully managed MySQL database service that combines transactions, analytics, machine learning, and GenAI in one service, without the need for ETL duplication. It also includes MySQL HeatWave Lakehouse, which lets users query data in object storage, in MySQL databases, or a combination of both. Users can deploy MySQL HeatWave–powered apps on a choice of public clouds: Oracle Cloud Infrastructure (OCI), Amazon Web Services (AWS), and Microsoft Azure.

This post will guide you through building a simple RAG workflow using the popular LangChain framework, powered entirely by the in-database AI capabilities of MySQL HeatWave GenAI. Following the LangChain guide to creating vector stores and using them for RAG, we will ingest a public blog post, index it in the HeatWave Vector Store, and use a HeatWave-hosted LLM to answer questions based on its content. The code in this article requires mysql-connector-python >= 9.5.0.

The RAG Workflow at a Glance

Our process follows a standard but powerful pattern for creating a question-answering system grounded in your own data. All vector operations and model inference calls are handled within the database by HeatWave GenAI components for high performance and low latency.

  1. Ingestion: We’ll start by loading a LangChain blog post directly from the web.
  2. Chunking: The document will be split into smaller, overlapping text chunks to ensure we can effectively find relevant context later.
  3. Embed and Store: We will create vector embeddings for each chunk and store them in the MySQL HeatWave Vector Store.
  4. Retrieval: When a user asks a question, we’ll retrieve the most relevant text chunks from our vector store.
  5. Generation: Finally, we’ll combine the user’s question with the retrieved context in a prompt and use the HeatWave GenAI model to generate a grounded answer.

Step 1: Connect to Your Database

First, we need to establish a connection to our MySQL HeatWave instance. We’ll use the mysql-connector-python library to do this. You’ll need to configure your database credentials, including the host IP, user, password, and schema name.

    import mysql.connector
    
    # Connect to the database
    mydb = mysql.connector.connect(
        host="127.0.0.1", port=3306,
        user="root", password="",
        database="rag_test", allow_local_infile=True,
        use_pure=True, autocommit=True,
    )
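
As a quick, optional sanity check (not part of the original walkthrough), you can run a trivial query to confirm that the connection and the target schema are reachable:

    # Optional: confirm the connection and schema are reachable
    cur = mydb.cursor()
    cur.execute("SELECT VERSION(), DATABASE()")
    print(cur.fetchone())
    cur.close()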

Step 2: Load and Chunk the Document

With our connection established, we can start the ingestion process. First, we use LangChain’s WebBaseLoader to fetch the content from a blog post about “Deep Agents”. Next, we split the loaded document into smaller pieces. This is a critical step. Splitting the text into manageable segments with a slight overlap helps preserve the context between them. We use RecursiveCharacterTextSplitter which intelligently splits text by paragraphs and sentences first, creating chunks that are structurally coherent and ideal for embedding.

    from langchain_community.document_loaders import WebBaseLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter
    
    # Load the blog post
    loader = WebBaseLoader(web_paths=("https://blog.langchain.com/deep-agents/",))
    docs = loader.load()
    
    # Split the document into chunks of 1000 characters with a 200-character overlap
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200) 
    splits = text_splitter.split_documents(docs)
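
It can be worth inspecting the splits before indexing them. The snippet below is an optional addition that prints how many chunks were produced and previews the first one (LangChain Document objects expose page_content and metadata):

    # Optional: inspect the chunks before embedding them
    print(f"Produced {len(splits)} chunks")
    print(splits[0].page_content[:200])  # first 200 characters of the first chunk
    print(splits[0].metadata)            # e.g. the source URL recorded by the loader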

Step 3: Embed and Store in HeatWave Vector Store

Now we’ll transform our text chunks into vector embeddings and store them. HeatWave GenAI provides two classes for this: MyEmbeddings (to create embeddings) and MyVectorStore (to store them). We initialize MyVectorStore with our database connection and the MyEmbeddings model. When we add our text chunks (splits), the system automatically converts each one into a dense vector representation and indexes it. This creates an efficient similarity-search index that allows us to find semantically related content later on.

    from mysql.ai.genai import MyEmbeddings, MyVectorStore
    
    # Initialize the HeatWave Vector Store with an embedding model
    vectorstore = MyVectorStore(mydb, MyEmbeddings(mydb))
    # Add the document chunks to the store
    vectorstore.add_documents(documents=splits)
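
Before wiring up the full chain, you can optionally query the store directly. Assuming MyVectorStore follows the standard LangChain VectorStore interface (as its use with as_retriever() in the next step suggests), similarity_search returns the chunks closest to a query:

    # Optional: run a direct similarity search to verify the index
    hits = vectorstore.similarity_search("What are deep agents?", k=3)
    for doc in hits:
        print(doc.page_content[:120], "...")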

Step 4: Build the RAG Chain

With our data indexed, we can assemble the RAG pipeline using the LangChain Expression Language (LCEL). A “chain” defines the sequence of operations from input question to final answer. The chain is assembled as follows:

  1. Retrieve Context: We create a retriever from our vector store that fetches relevant document chunks.
  2. Format Context: The retrieved documents are formatted into a single string to be used as context.
  3. Get Prompt: We pull a pre-built RAG prompt template from the LangChain Hub.
  4. Invoke LLM: The prompt, filled with the context and user question, is sent to the MyLLM model hosted in HeatWave.
  5. Parse Output: The final output is parsed into a clean string.

    from langchain import hub
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.runnables import RunnablePassthrough
    from mysql.ai.genai import MyLLM
    
    # Create a retriever from the vector store
    retriever = vectorstore.as_retriever()
    # Pull a standard RAG prompt
    prompt = hub.pull("rlm/rag-prompt")
    
    # Helper function to format retrieved documents
    def format_docs(docs):
        return "\n\n".join(doc.page_content for doc in docs)
    
    # Assemble the RAG chain
    rag_chain = (
     {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | MyLLM(mydb)
        | StrOutputParser()
    )
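
It can also help to look at what the retriever alone returns for a question before invoking the whole chain. In recent LangChain versions retrievers are Runnables, so they can be invoked directly; this optional snippet previews the context the LLM will receive:

    # Optional: inspect the retrieved context on its own
    retrieved = retriever.invoke("What are deep agents?")
    print(f"Retrieved {len(retrieved)} chunks")
    print(format_docs(retrieved)[:500])  # preview of the formatted context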

Step 5: Ask a Question!

The setup is complete! We can now invoke our RAG chain with a question about the blog’s content. The chain will automatically handle the retrieval and generation steps to produce a grounded answer.

Let’s ask: “What are deep agents?”

    response = rag_chain.invoke("What are deep agents?")
    print(response)
    
    Output:
    Deep agents are advanced AI models that can plan over longer time horizons and execute complex tasks, allowing them to dive deeper into topics and achieve more sophisticated results. They use the same core algorithm as LLMs but with additional features such as detailed system prompts, planning tools, sub-agents, and a virtual file system. These agents are capable of executing more complex tasks and can be customized with custom prompts, tools, and sub-agents to suit specific needs.

As you can see, the model provides a detailed and accurate answer based on the information it retrieved from the blog post we indexed.
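
The same chain can be reused for any question over the indexed content. As an optional extra, LCEL chains also expose .stream(); whether MyLLM streams token by token depends on the connector, and if it does not, the complete answer simply arrives as a single chunk:

    # Ask another (illustrative) question with the same chain
    print(rag_chain.invoke("What tools do deep agents use?"))

    # Streaming variant: incremental chunks if MyLLM supports streaming,
    # otherwise the complete answer arrives in one chunk
    for chunk in rag_chain.stream("What are deep agents?"):
        print(chunk, end="", flush=True)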

Get Started Today

You’ve just seen how to build a powerful, low-latency RAG pipeline using LangChain and MySQL HeatWave GenAI. By keeping your data, embeddings, and LLM all within the database, you can create highly efficient and scalable AI applications.

We invite you to try HeatWave AutoML and GenAI. If you’re new to Oracle Cloud Infrastructure, sign up for the Oracle Cloud Free Trial, a free 30-day trial with US$300 in credits.