MySQL HeatWave is a fully managed MySQL database service that combines transactions, analytics, machine learning, and GenAI services, without ETL duplication. Also included is MySQL HeatWave Lakehouse, which allows users to query data in object storage, MySQL databases, or a combination of both. Users can deploy MySQL HeatWave–powered apps on a choice of public clouds: Oracle Cloud Infrastructure (OCI), Amazon Web Services (AWS), and Microsoft Azure.
This post will guide you through building a simple RAG workflow with the popular LangChain framework, powered entirely by the in-database AI capabilities of MySQL HeatWave GenAI. Following the LangChain guide to creating vector stores and using them for RAG, we will ingest a public blog post, index it in the HeatWave Vector Store, and use a HeatWave-hosted LLM to answer questions based on its content. The code in this article requires mysql-connector-python >= 9.5.0.
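If you want to follow along, install the dependencies with pip. The mysql-connector-python version requirement comes from above; the exact LangChain package set is an assumption inferred from the imports used later in this post (WebBaseLoader also needs beautifulsoup4):

pip install "mysql-connector-python>=9.5.0" langchain langchain-community langchain-core langchain-text-splitters beautifulsoup4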
The RAG Workflow at a Glance
Our process follows a standard but powerful pattern for creating a question-answering system grounded in your own data. All vector operations and model inference calls are handled within the database by HeatWave GenAI components for high performance and low latency.
- Ingestion: We’ll start by loading a LangChain blog post directly from the web.
- Chunking: The document will be split into smaller, overlapping text chunks to ensure we can effectively find relevant context later.
- Embed and Store: We will create vector embeddings for each chunk and store them in the MySQL HeatWave Vector Store.
- Retrieval: When a user asks a question, we’ll retrieve the most relevant text chunks from our vector store.
- Generation: Finally, we’ll combine the user’s question with the retrieved context in a prompt and use the HeatWave GenAI model to generate a grounded answer.
Step 1: Connect to Your Database
First, we need to establish a connection to our MySQL HeatWave instance. We’ll use the mysql-connector-python library to do this. You’ll need to configure your database credentials, including the host IP, user, password, and schema name.
import mysql.connector

# Connect to the database
mydb = mysql.connector.connect(
    host="127.0.0.1",
    port=3306,
    user="root",
    password="",
    database="rag_test",
    allow_local_infile=True,
    use_pure=True,
    autocommit=True,
)
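Before moving on, it can be worth a quick sanity check that the connection works. This optional snippet is not part of the main flow; it simply runs a trivial query through the connector:

# Optional: verify the connection is live
cur = mydb.cursor()
cur.execute("SELECT VERSION()")
print("Connected to MySQL server version:", cur.fetchone()[0])
cur.close()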
Step 2: Load and Chunk the Document
With our connection established, we can start the ingestion process. First, we use LangChain’s WebBaseLoader to fetch the content from a blog post about “Deep Agents”. Next, we split the loaded document into smaller pieces. This is a critical step: splitting the text into manageable segments with a slight overlap helps preserve context between them. We use RecursiveCharacterTextSplitter, which intelligently splits text by paragraphs and sentences first, creating chunks that are structurally coherent and ideal for embedding.
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the blog post
loader = WebBaseLoader(web_paths=("https://blog.langchain.com/deep-agents/",))
docs = loader.load()

# Split the document into chunks of 1000 characters with a 200-character overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
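Before indexing, it can help to verify that the splitter produced sensible chunks. This optional check just counts the chunks and previews the first one:

# Optional: inspect the chunking result
print(f"Produced {len(splits)} chunks")
print(splits[0].page_content[:200])  # preview the start of the first chunk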
Step 3: Embed and Store in HeatWave Vector Store
Now we’ll transform our text chunks into vector embeddings and store them. HeatWave GenAI provides two components for this: MyEmbeddings (to create embeddings) and MyVectorStore (to store them). We initialize MyVectorStore with our database connection and the MyEmbeddings model. When we add our text chunks (splits), the system automatically converts each one into a dense vector representation and indexes it, creating an efficient similarity index that lets us find semantically related content later on.
from mysql.ai.genai import MyEmbeddings, MyVectorStore

# Initialize the HeatWave Vector Store with an embedding model
vectorstore = MyVectorStore(mydb, MyEmbeddings(mydb))

# Add the document chunks to the store
vectorstore.add_documents(documents=splits)
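Because we use MyVectorStore as a LangChain vector store (we call as_retriever() on it in the next step), it should also expose LangChain’s standard similarity_search method. Assuming that interface, you can spot-check retrieval quality directly before wiring up the full chain:

# Optional: spot-check semantic retrieval
# (assumes MyVectorStore implements LangChain's standard VectorStore interface)
hits = vectorstore.similarity_search("What are deep agents?", k=3)
for doc in hits:
    print(doc.page_content[:120], "...")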
Step 4: Build the RAG Chain
With our data indexed, we can assemble the RAG pipeline using the LangChain Expression Language (LCEL). A “chain” defines the sequence of operations from input question to final answer. The chain is assembled as follows:
- Retrieve Context: We create a retriever from our vector store that fetches relevant document chunks.
- Format Context: The retrieved documents are formatted into a single string to be used as context.
- Get Prompt: We pull a pre-built RAG prompt template from the LangChain Hub.
- Invoke LLM: The prompt, filled with the context and user question, is sent to the MyLLM model hosted in HeatWave.
- Parse Output: The final output is parsed into a clean string.
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from mysql.ai.genai import MyLLM

# Create a retriever from the vector store
retriever = vectorstore.as_retriever()

# Pull a standard RAG prompt
prompt = hub.pull("rlm/rag-prompt")

# Helper function to format retrieved documents
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Assemble the RAG chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | MyLLM(mydb)
    | StrOutputParser()
)
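As written, as_retriever() uses the vector store’s default settings. If you want to control how much context reaches the prompt, LangChain’s standard search_kwargs mechanism lets you set how many chunks are retrieved per question; the value of 4 below is purely illustrative:

# Optional: retrieve a fixed number of chunks per question
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})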
Step 5: Ask a Question!
The setup is complete! We can now invoke our RAG chain with a question about the blog’s content. The chain will automatically handle the retrieval and generation steps to produce a grounded answer.
Let’s ask: “What are deep agents?”
response = rag_chain.invoke("What are deep agents?")
print(response)

Output:

Deep agents are advanced AI models that can plan over longer time horizons and execute complex tasks, allowing them to dive deeper into topics and achieve more sophisticated results. They use the same core algorithm as LLMs but with additional features such as detailed system prompts, planning tools, sub-agents, and a virtual file system. These agents are capable of executing more complex tasks and can be customized with custom prompts, tools, and sub-agents to suit specific needs.
As you can see, the model provides a detailed and accurate answer based on the information it retrieved from the blog post we indexed.
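Because LCEL chains share a common runnable interface, the same chain can also answer several questions in one call via batch(); the second question below is just an illustrative example:

# Ask several questions in one call
questions = ["What are deep agents?", "What tools do deep agents use?"]
for answer in rag_chain.batch(questions):
    print(answer)
    print("---")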
Get Started Today
You’ve just seen how to build a powerful, low-latency RAG pipeline using LangChain and MySQL HeatWave GenAI. By keeping your data, embeddings, and LLM all within the database, you can create highly efficient and scalable AI applications.
We invite you to try HeatWave AutoML and GenAI. If you’re new to Oracle Cloud Infrastructure, try Oracle Cloud Free Trial, a free 30-day trial with US$300 in credits.