Introduction

MySQL HeatWave is a fully managed MySQL database service that combines transactions, analytics, machine learning, and GenAI services, without ETL duplication. Also included is MySQL HeatWave Lakehouse, which allows users to query data in object storage, MySQL databases, or a combination of both. Users can deploy MySQL HeatWave–powered apps on a choice of public clouds: Oracle Cloud Infrastructure (OCI), Amazon Web Services (AWS), and Microsoft Azure.

With the rapid growth of unstructured data and the rise of enterprise use cases for Generative AI, effective information retrieval has become a fundamental requirement. MySQL HeatWave is expanding its intelligent search capabilities with the introduction of Hybrid Search, a retrieval approach that unifies semantic and exact search methods.

Traditional search paradigms typically fall into two categories:

Semantic search: Uses embeddings generated by large language models to find meaning-based matches, excelling at understanding contextual or paraphrased queries. However, it may miss proper nouns, codes, or new jargon that are outside an embedding model’s training data, reducing its effectiveness for queries involving product numbers, SKUs, or niche terminology.

Keyword (exact) search: Matches tokens and keywords directly using established algorithms such as BM25 or TF-IDF, ensuring precise retrieval of structured identifiers, technical terms, and entity names. However, while precise, it lacks conceptual understanding, failing to connect paraphrased queries with relevant content or infer user intent from context.

Hybrid Search combines both approaches, making it possible to retrieve relevant information whether the user’s query is vague and conceptual or highly precise and fact-based. This is especially important for business applications dealing with diverse datasets, from product catalogs and document repositories to regulatory filings and customer interactions. It empowers organizations to:

  • Achieve best-in-class relevance by retrieving both exact keyword matches and meaningfully related content.
  • Support compliance and audit use cases requiring exact phrase hits, as well as conversational AI workflows demanding contextual richness.

This dual approach is crucial for enabling Retrieval Augmented Generation workflows, where the combination of semantic relevance and exact matching ensures both accuracy and completeness in enterprise answers.

Use-case Scenario for Hybrid Search

This section describes a few real world examples which can benefit from Hybrid Search

Scenario 1: Product Catalog & Inventory Search (End to End Example)

A retailer wants to implement a smart search function for its product database. Users often search by SKU, brand names, or specific attributes (like “blue running shoes”) and sometimes by more generic queries (“comfortable sneakers for summer”). Hybrid Search enables the system to retrieve exact product codes or brand names via BM25 keyword matching, while also surfacing products that are semantically relevant based on detailed descriptions, even if those specific keywords aren’t present. This increases recall and ensures customers find both exact and related items efficiently

Customers may search in various ways, including:

  • Exact identifiers: SKUs (e.g., “BOOK001”)
  • Authors or publishers: (e.g., “Acme Publishing”)
  • Specific attributes: (e.g., “hardcover historical fiction”)
  • Generic queries: (e.g., “inspiring novels for teenagers”)

Here are the steps involved:

  • Upload the catalog data to the your object store bucket
  • Ingest the data into Heatwave Vector Store : Heatwave manages semantic embeddings and segment level metadata for search
  • Use ML_RAG to search for most relevant sections based on the query and then generate final answer
    • Note that ML_RAG can independently enable/disable semantic and hybrid search

Loading the data to vector store  

CALL sys.vector_store_load(
        'oci://{your-bucket}@{your-namespace}/sample_demo/SKU_sample_data.pdf',
        '{
            "formats": ["pdf"],
            "table_name": "sku_sample_data",
            "schema_name": "demo_db",
            "embed_model_id": "multilingual-e5-small"
        }'
    );
    
    DESCRIBE demo_db.sku_sample_data;
    +-------------------+---------------+------+-----+---------+-------+
    | Field             | Type          | Null | Key | Default | Extra |
    +-------------------+---------------+------+-----+---------+-------+
    | document_name     | varchar(1024) | NO   |     | NULL    |       |
    | metadata          | json          | NO   |     | NULL    |       |
    | document_id       | int unsigned  | NO   | PRI | NULL    |       |
    | segment_number    | int unsigned  | NO   | PRI | NULL    |       |
    | segment           | varchar(1024) | NO   |     | NULL    |       |
    | segment_embedding | vector(384)   | NO   |     | NULL    |       |
    | segment_metadata  | json          | NO   |     | NULL    |       |
    +-------------------+---------------+------+-----+---------+-------+

First, load your product catalog (for example, in PDF format) into a HeatWave vector store. This table will include the segment_metadata column necessary for keyword search

This ensures your table has:

  • segment_embedding for semantic search
  • segment_metadata for BM25 keyword search

Search for the book with Specific SKU number

This section highlights the benefits of combining semantic and exact search together. For this use case, while semantic search fails, the keyword search (or exact search) does yield the correct answers

Semantic Search for the book with Specific SKU number

CALL sys.ML_RAG(
      "What is Price of the Book SKU  BOOK001", 
      @output,  
      JSON_OBJECT(
        "vector_store", JSON_ARRAY("demo_db.sku_sample_data"),
        "keyword_search", False,
        "semantic_search", True,
        "n_citations", 1,
        "embed_model_id", "multilingual-e5-small"
      )
    );
    SELECT @output; 
    {
      "text": "There is no information about a book with SKU BOOK001 in the provided context. The available SKUs are BOOK003 and BOOK004, but there is no mention of a book with SKU BOOK001.",
      "citations": [{
        "segment": "“Healthy Recipes” Book (SKU: BOOK003) ... Price: $29.99 ...",
        "distance": 0.1213,
        "document_name": "https://objectstorage.uk-london-1.oraclecloud.com/n/lrsrfayerklw/b/hwml-test-bucket/o/sample_demo/SKU_sample_data.pdf",
        "segment_number": 6
      }]
    }

Insight: With only semantic search enabled it not able to retrieve right segment for the SKU asked in the query therefore it is unable to generate right answer.

Exact(Keyword) Search for the book with Specific SKU number

CALL sys.ML_RAG(
      "What is Price of the Book SKU  BOOK001",
      @output, 
      JSON_OBJECT(
        "vector_store", JSON_ARRAY("demo_db.sku_sample_data"),
        "keyword_search", True,
        "semantic_search", False,
        "n_citations", 1,
        "embed_model_id", "multilingual-e5-small"
      )
    );   
    SELECT @output;

 

{
      "text": "The price of the book \"The Art of Coding\" (SKU: BOOK001) is $24.99.",
      "citations": [{
        "segment": "“The Art of Coding” Book (SKU: BOOK001) ... Price: $24.99 | Stock: In Stock.",
        "bm25_score": 3.7612,
        "document_name": "https://objectstorage.uk-london-1.oraclecloud.com/n/lrsrfayerklw/b/hwml-test-bucket/o/sample_demo/SKU_sample_data.pdf",
        "segment_number": 5
      }]
    }

Insight: Keyword search retrieves segment for SKU BOOK001 as asked in the query. It then generated right answer with factual correctness for the query

CALL sys.ML_RAG(
      "What is Price of the Book SKU  BOOK001", 
      @output,  
      JSON_OBJECT(
        "vector_store", JSON_ARRAY("demo_db.sku_sample_data"),
        "keyword_search", True,
        "semantic_search", True,
        "n_citations", 1,
        "embed_model_id", "multilingual-e5-small"
      )
    );
    SELECT @output; 
    {
      "text": "The price of the book \"The Art of Coding\" (SKU: BOOK001) is $24.99.",
      "license": "Your use of this Llama model is subject to the Llama 3.2 Community License Agreement available at https://docs.oracle.com/cd/E17952_01/heatwave-9.4-license-com-en/",
      "citations": [
        {
          "segment": "Stock: Limited Stock.  “Healthy Recipes” Book (SKU: BOOK003)  Explore easy and quick meals with “Healthy Recipes,” designed for busy lifestyles. Price: $29.99  | Stock: In Stock.  “Mystery Novel” Book (SKU: BOOK004)  Dive into a thrilling 350-page “Mystery Novel” filled with suspense and twists. Price: $14.99 |  Stock: Out of Stock.  Bluetooth Speaker (SKU: ELEC005)  Portable Bluetooth Speaker with rich bass and 12-hour battery life for on-the-go entertainment.  Price: $49.99 | Stock: In Stock.",
          "distance": 0.1213,
          "document_name": "https://objectstorage.uk-london-1.oraclecloud.com/n/lrsrfayerklw/b/hwml-test-bucket/o/sample_demo/SKU_sample_data.pdf",
          "segment_number": 6
        },
        {
          "segment": "Confidential – Oracle Internal Stainless Steel Kettle (SKU: HOME004)  Cordless Stainless Steel Kettle with 1.7L capacity, perfect for quick boiling. Price: $44.99 |  Stock: In Stock.  “The Art of Coding” Book (SKU: BOOK001)  Enhance your programming skills with “The Art of Coding,” a comprehensive guide for  beginners and professionals. Price: $24.99 | Stock: In Stock.  “Travel Guide 2025” Book (SKU: BOOK002)  Discover top travel destinations with the “Travel Guide 2025,” filled with tips and itineraries.  Price: $19.99 |",
          "bm25_score": 3.7612,
          "document_name": "https://objectstorage.uk-london-1.oraclecloud.com/n/lrsrfayerklw/b/hwml-test-bucket/o/sample_demo/SKU_sample_data.pdf",
          "segment_number": 5
        }
      ],
      "vector_store": ["demo_dbsku_sample_data"],
      "retrieval_info": null
    }

Insight: Hybrid search combines the reliability of keyword matching with semantic coverage. In this case, it correctly retrieves the SKU-grounded product, ensuring the answer’s precision and completeness.

How to Use Hybrid Search in HeatWave ML RAG

Hybrid Search in HeatWave empowers you to search your knowledge base using both contextual meaning (semantic search) and precise keyword matches (BM25). Here’s how to use it, with a clear, consistent example from query to output.

1. Prepare Your Data

Load documents into a HeatWave vector store table. For hybrid search, make sure loading is done so that the segment_metadata column (for BM25 keyword match) is created.

Example: Load documents from object storage

CALL sys.vector_store_load(
        'oci://your-bucket@your-namespace/docs/',
        '{
            "formats": ["pdf"],
            "table_name": "my_store",
            "schema_name": "enterprise",
            "embed_model_id": "multilingual-e5-small"
        }'
    );

2. Run a Hybrid Search Query

Use the sys.ML_RAG procedure to retrieve both semantically relevant and keyword-relevant results at once.

Below are few examples that show the results of executing Hybrid Search on two sets of queries.

Use Case: Comparing Ages

Query:

“Who is older, Annie Morton or Terry Richardson?”

    {
      "citations": [
        {
          "segment": "Annie Morton was born in 1970, while Terry Richardson was born in 1965.",
          "similarity_score": 0.93,
          "document_name": "https://example.com/biographies.pdf"
        },
        {
          "segment": "Terry Richardson (born August 14, 1965) is an American photographer.",
          "BM25Score": 6.11,
          "document_name": "https://example.com/terry-profile.pdf"
        },
        {
          "segment": "Annie Morton, born in 1970, is a model and activist.",
          "BM25Score": 5.97,
          "document_name": "https://example.com/annie-profile.pdf"
        }
      ]
    }
  • Segments with a similarity_score were selected by semantic search, using question context and meaning.
  • Segments with a BM25Score were selected by exact matching of the keywords (names and dates).
  • Each document_name provides the source, supporting LLM grounding or direct citation.

Interpret the Results

In this example, both semantic and keyword search results are combined for context, allowing the LLM to correctly compare birth years (and thus answer who is older).

Use Case: IT Error Troubleshooting

Query:

“How do I resolve error code ORA-12514 in Oracle database?”

CALL sys.ML_RAG(
      'How do I resolve error code ORA-12514 in Oracle database?',
      @output,
      JSON_OBJECT(
        "vector_store", JSON_ARRAY('support.docs_store'),
        "n_citations", 3,
        "semantic_search", true,
        "keyword_search", true
      )
    );

sample output

{
      "citations": [
        {
          "segment": "ORA-12514 indicates that the listener does not currently know of service requested in connect descriptor.",
          "BM25Score": 7.2,
          "document_name": "https://support.example.com/oracle-errors.pdf"
        },
        {
          "segment": "To resolve ORA-12514, check your tnsnames.ora configuration and ensure the service name matches.",
          "similarity_score": 0.91,
          "document_name": "https://support.example.com/oracle-troubleshooting.pdf"
        }
      ]
    }
  • BM25 search surfaces content with the exact error code.
  • Semantic search reveals paraphrased, relevant troubleshooting guides.

Tips

  • Use both search modes for the most complete set of supporting facts.
  • Check both scores, semantic ones for context, BM25 for precise hits.
  • Increase n_citations if your queries are broad or need more supporting context.

Additional Use cases

 Policy, Legal, and Regulatory Document Retrieval

A law firm or compliance team frequently searches large corpora of contracts, policy documents, or regulatory filings. Queries may reference exact policy numbers or statute codes or may ask about the meaning or implications of certain clauses (“termination penalties”, “data retention requirements”). Hybrid Search leverages BM25 to ensure documents containing cited codes or policy references are retrieved with high importance, while semantic search fetches clauses and passages related in meaning, even where terminology varies. This supports in-depth research, compliance audits, and due diligence work

Support Knowledge Base & Ticket Resolution

A software support center manages a knowledge base of troubleshooting articles, error logs, and customer tickets. Users might search using an exact error code (“ORA-12514”) or with a natural question (“how do I fix a database connection error”). Hybrid Search ensures that support agents and end-users can quickly retrieve resources containing exact error messages/IDs alongside semantically similar solutions and explanations, improving first-call resolution rates and customer satisfaction

Enterprise RAG (Retrieval-Augmented Generation) Pipelines

An enterprise deploys a RAG pipeline to enrich LLM responses with authoritative internal documentation. End-user queries might combine freeform questions, specific terminology, or structured codes (“Show me reports related to SEC-18 compliance from 2025”). Hybrid Search underpins the retrieval phase, combining keyword based (BM25) and semantic search results before deduplication, ensuring that LLMs ground their answers in both the literal content and the broader context the query demands. This results in more accurate, actionable generative AI outputs

Medical Records and Scientific Literature

Healthcare professionals or researchers query large sets of patient records or scientific reports. Exact clinical codes (ICD-10, medication names) or identifiers might be mentioned, but queries may also involve describing symptoms or asking about treatment efficacy in plain language. Hybrid Search ensures critical information is retrieved regardless of whether it appears as a code, technical term, or paraphrased concept, improving care, research, and operational outcomes

Limitations

  • Hybrid Search introduces extra latency:

Because both BM25 (keyword-based) and semantic (embedding-based) retrieval strategies are executed sequentially and then results are deduplicated and merged, response times will generally be longer compared to single-method search. This additional processing is necessary to ensure both precision and coverage in results

  • Only supported for MySQL HeatWave vector store tables:

The hybrid (full-text search) function is managed internally by HeatWave. It does not support user-managed segment metadata. BM25 metadata are created and maintained automatically, users cannot directly create segment_metadata used for keyword search.

  • No user control over underlying BM25/FTS structures:

The FTS system is fully managed by HeatWave. Users interact only via the documented ML_RAG and ML_RAG_TABLE APIs; they cannot modify the FTS helper tables directly

  • Limited to retrieval (ML_RAG) workflows:

The hybrid search and BM25 keyword retrieval functions are currently designed to power retrieval-augmented generation (RAG) and are not exposed for arbitrary or direct full-text search APIs

Conclusion

Hybrid Search in HeatWave improves enterprise information retrieval by combining semantic relevance and keyword precision for superior recall and accuracy. This powerful feature allows organizations to seamlessly match user queries to both contextual content and exact terms, such as product codes, policy numbers, and technical errors, scenarios where either search method alone would be insufficient.

By integrating BM25 and semantic similarity within Heatwave’s managed ML RAG pipeline, Hybrid Search streamlines workflows, enhances user experience, and supports business-critical use cases ranging from customer support to knowledge mining and compliance research. The hybrid architecture is backward compatible, easy to enable, and manages all necessary metadata automatically for an effortless user experience.

While there are resource and latency considerations, the balanced approach offered by Hybrid Search enables customers to realize the full value of generative AI and RAG ensuring users always receive the most relevant, comprehensive, and authoritative information, no matter how their query is phrased.

Learn More

For more information, please visit: