What if your database could hunt for fraud rings on its own — no human analyst seeding the investigation, no external AI pipeline to maintain, no data leaving the building? With Oracle Autonomous Database, that is not a hypothetical. In this post, I walk through building a fully autonomous, multi-agent fraud detection system using five converged AI capabilities available today in Autonomous AI Database when powered by Oracle AI Database 26ai: Select AI profiles, AI Vector Search with local ONNX embeddings, SQL Property Graphs, Select AI Agents, and ADB’s embedded MCP server. Every component runs inside the database. No external vector store, no standalone graph engine, no separate agent orchestration framework.
The Use Case: Banking Fraud Ring Detection
Fraud rings are clusters of accounts controlled by the same bad actor, often sharing an IP address, device fingerprint, or behavioural pattern. Traditional rule-based systems catch individual transactions; they rarely surface the ring itself. The system I built does three things that rules engines cannot:
- Semantic search across unstructured support ticket text to find conceptually related fraud signals — even when customers describe the same problem in completely different words.
- Graph traversal to map ring membership through shared-IP edges, revealing structure that flat SQL queries miss.
- Autonomous multi-agent orchestration where a team of AI agents scouts for rings, profiles members, and produces structured case reports — without a human providing a seed customer ID.
Architecture Overview
The entire pipeline lives in a single Autonomous AI Database (ADB) ADMIN schema:
ADB (ADMIN schema)
├── BANK_CUSTOMERS ← customer profiles with IP addresses
├── SUPPORT_TICKETS ← fraud tickets w/ VECTOR column (384-dim)
├── BANK_FRAUD_GRAPH ← SQL Property Graph (shared-IP edges)
├── LOCAL_EMBEDDING_MODEL ← ONNX model (all-MiniLM-L12-v2)
│
├── Select AI Profiles
│ ├── AI_WITH_SCHEMA ← NL→SQL over ADMIN tables
│ └── GRAPH_SLEUTH_PROFILE ← NL→SQL with GRAPH_TABLE support
│
├── DBMS_CLOUD_AI_AGENT Tools (exposed via MCP)
│ ├── EXECUTE_SQL / LIST_SCHEMAS / LIST_OBJECTS
│ ├── FRAUD_SQL_TOOL
│ ├── FRAUD_GRAPH_TOOL
│ ├── SEMANTIC_SEARCH_TICKETS
│ ├── AGENT_RUN_TEAM
│ └── AGENT_MANAGE
│
└── Agent Teams
├── FRAUD_TASK_FORCE ← 2-agent: investigate a customer
└── FRAUD_RING_COMMAND ← 2-agent: autonomous ring hunter
Let’s build it layer by layer.
Layer 1: Local ONNX Embeddings and AI Vector Search
The foundation is semantic understanding of support ticket text. Rather than calling an external embedding API — which adds latency, cost, and a data egress concern — Oracle AI Database lets you load an ONNX embedding model directly into the database and run inference on Exadata CPUs.
I used all-MiniLM-L12-v2, a 384-dimensional model that strikes a good balance between quality and speed. Loading it is a one-time operation:
BEGIN
DBMS_VECTOR.LOAD_ONNX_MODEL(
model_name => 'LOCAL_EMBEDDING_MODEL',
model_data => < onnx_model_blob >,
model_type => 'EMBEDDING'
);
END;
/
With the model loaded, you generate embeddings using VECTOR_EMBEDDING — no external call, no network round-trip:
UPDATE support_tickets
SET issue_vector = VECTOR_EMBEDDING(
LOCAL_EMBEDDING_MODEL USING issue_text AS DATA
);
To keep embeddings in sync as new tickets arrive, a trigger handles it automatically:
CREATE OR REPLACE TRIGGER trg_embed_ticket
BEFORE INSERT ON support_tickets
FOR EACH ROW
BEGIN
:NEW.issue_vector := VECTOR_EMBEDDING(
LOCAL_EMBEDDING_MODEL USING :NEW.issue_text AS DATA
);
END;
/
For fast retrieval, an HNSW (Hierarchical Navigable Small World) vector index eliminates full-table scans:
CREATE VECTOR INDEX idx_ticket_vector
ON support_tickets(issue_vector)
ORGANIZATION INMEMORY NEIGHBOR GRAPH
DISTANCE COSINE
WITH TARGET ACCURACY 95;
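With the index in place, a similarity query can ask for an approximate, index-assisted top-N instead of an exact scan. A minimal sketch — the query string is illustrative, and the APPROX keyword opts into the HNSW index:

```sql
SELECT ticket_id, issue_text
FROM   support_tickets
ORDER  BY VECTOR_DISTANCE(
          issue_vector,
          VECTOR_EMBEDDING(LOCAL_EMBEDDING_MODEL
                           USING 'unrecognized transaction abroad' AS DATA),
          COSINE)
FETCH APPROX FIRST 5 ROWS ONLY;
```

Dropping APPROX forces an exact scan, which is useful when validating that index recall meets the target accuracy you declared.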
How Good Is It?
I ran a structured quality assessment across nine test queries covering exact matches, semantic inference, edge cases, and adversarial nonsense inputs. The results were eye-opening.
A query for “unrecognized transaction abroad” returned as its top match a ticket about debit card fees incurred in a foreign country — the model understood that “abroad” maps to international travel and foreign transactions, even though neither word appeared in the ticket text. Queries for “I think I’ve been hacked” correctly surfaced fraud, compromised account, and unauthorized access tickets. Distance scores were well-distributed (0.0 for near-duplicates, 0.05–0.4 for semantically related content), and HNSW ranking consistently placed the best matches in positions one through three.
The initial dataset of 100 tickets scored 8.4/10, with gaps in fraud, security, and account lockout categories. After adding more synthetic tickets to balance those categories, the score rose to 9.2/10. Fraud detection capabilities improved significantly, and the system gained entirely new handling for identity theft and phishing scenarios that previously had zero coverage.
The key takeaway: vector search quality is a function of dataset coverage, not just model quality. A balanced corpus with representative examples across all categories matters more than choosing the largest embedding model.
Layer 2: SQL Property Graph for Ring Detection
Relational queries can find customers who share an IP address. But to traverse a network — “show me everyone connected to this cluster, and everyone connected to them” — you need a graph. Oracle AI Database’s SQL Property Graphs let you define one directly over existing tables, with no ETL and no separate graph database:
CREATE OR REPLACE PROPERTY GRAPH bank_fraud_graph
VERTEX TABLES (
admin.bank_customers
KEY (customer_id)
LABEL bank_customers
PROPERTIES (customer_id, full_name, email,
country, account_type, ip_address)
)
EDGE TABLES (
(SELECT a.customer_id AS src_id,
b.customer_id AS dst_id,
a.ip_address
FROM admin.bank_customers a
JOIN admin.bank_customers b
ON a.ip_address = b.ip_address
AND a.customer_id < b.customer_id
) e
SOURCE KEY (src_id) REFERENCES bank_customers (customer_id)
DESTINATION KEY (dst_id) REFERENCES bank_customers (customer_id)
LABEL shares_ip_with
PROPERTIES (ip_address)
);
The edge table is a self-join: two customers are connected if they share a non-null IP address. The a.customer_id < b.customer_id predicate avoids duplicate and self-loop edges. With this graph defined, you can query it with standard SQL using GRAPH_TABLE:
SELECT ring_ip, COUNT(DISTINCT member_id) AS ring_size
FROM GRAPH_TABLE(bank_fraud_graph
MATCH (a)-[e:shares_ip_with]-(b)
COLUMNS (e.ip_address AS ring_ip,
a.customer_id AS member_id)
)
GROUP BY ring_ip
HAVING COUNT(DISTINCT member_id) >= 3
ORDER BY ring_size DESC;
This returns all IP clusters with three or more members — your fraud ring candidates. No separate graph engine, no Cypher-to-SQL bridge. The same SQL optimizer that handles your OLTP workload handles graph pattern matching.
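The same GRAPH_TABLE mechanism handles the multi-hop traversal described earlier ("everyone connected to this cluster, and everyone connected to them"). A sketch of a two-hop expansion from a seed customer — the ID 1042 is a made-up example:

```sql
SELECT DISTINCT reachable_id
FROM GRAPH_TABLE(bank_fraud_graph
       MATCH (a)-[:shares_ip_with]-(b)-[:shares_ip_with]-(c)
       WHERE a.customer_id = 1042
       COLUMNS (c.customer_id AS reachable_id)
     );
```

Because the edges are undirected here, the pattern reaches both direct IP-sharers and their IP-sharers in one query.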
Layer 3: Select AI Profiles
Autonomous AI Database Select AI profiles connect the database to an LLM provider and define the scope of tables the AI can query (as of this writing, Select AI is available only on ADB). I created two profiles, each with a distinct personality:
AI_WITH_SCHEMA — a standard NL-to-SQL profile scoped to BANK_CUSTOMERS and SUPPORT_TICKETS. This powers the SQL analyst agents that need to count tickets, sum financial exposure, and classify fraud patterns.
GRAPH_SLEUTH_PROFILE — an NL-to-SQL profile augmented with instructions for generating GRAPH_TABLE syntax against BANK_FRAUD_GRAPH. This powers the graph traversal agents that need to map ring membership.
BEGIN
DBMS_CLOUD_AI.CREATE_PROFILE(
profile_name => 'GRAPH_SLEUTH_PROFILE',
attributes => TO_CLOB(JSON_OBJECT(
'provider' VALUE 'oci',
'credential_name' VALUE 'OCI_GENAI_CRED',
'model' VALUE 'cohere.command-r-plus',
'object_list' VALUE JSON_ARRAY(
JSON_OBJECT('owner' VALUE 'ADMIN',
'name' VALUE 'BANK_CUSTOMERS'),
JSON_OBJECT('owner' VALUE 'ADMIN',
'name' VALUE 'SUPPORT_TICKETS')
),
'comments' VALUE TRUE
))
);
END;
/
Both profiles use OCI Generative AI as the LLM backend, which means all inference traffic stays within the OCI network — no data leaves the tenancy.
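Once a profile exists, you can smoke-test it straight from a SQL session using the SELECT AI syntax. A quick sketch — the question is only an illustration:

```sql
BEGIN
  DBMS_CLOUD_AI.SET_PROFILE('AI_WITH_SCHEMA');
END;
/
SELECT AI how many support tickets have Critical severity;
```

SET_PROFILE scopes the profile to the current session, so subsequent SELECT AI statements route through it without restating the profile name.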
Layer 4: MCP Tools and the Embedded MCP Server
Autonomous AI Database ships with a built-in MCP (Model Context Protocol) server. Once enabled, any MCP-compatible client — Claude Desktop, Cursor, Cline, VS Code with an MCP extension — can securely connect and call registered tools directly.
The key insight is that DBMS_CLOUD_AI_AGENT.CREATE_TOOL lets you expose any PL/SQL function as an MCP tool. The system relies on these tools (the first two come preregistered; I created the rest):
| Tool | Purpose |
|---|---|
| EXECUTE_SQL | Read-only SQL execution (SELECT only) |
| LIST_SCHEMAS / LIST_OBJECTS | Schema introspection |
| FRAUD_SQL_TOOL | NL→SQL via AI_WITH_SCHEMA profile |
| FRAUD_GRAPH_TOOL | NL→SQL with GRAPH_TABLE via GRAPH_SLEUTH_PROFILE |
| SEMANTIC_SEARCH_TICKETS | Vector search over support tickets |
| AGENT_RUN_TEAM | Invoke an agent team from outside the database |
| AGENT_MANAGE | Create/modify agents, tasks, and teams via PL/SQL |
The SEMANTIC_SEARCH_TICKETS tool wraps a PL/SQL function that takes a natural language query string and a TOP_N parameter, generates an embedding using the local ONNX model, runs a cosine distance search against the HNSW index, and returns results as JSON:
CREATE OR REPLACE FUNCTION semantic_search_tickets(
  p_query IN VARCHAR2,
  p_top_n IN NUMBER DEFAULT 5
) RETURN CLOB IS
  l_result CLOB;
BEGIN
  SELECT JSON_ARRAYAGG(
           JSON_OBJECT(
             'ticket_id'  VALUE t.ticket_id,
             'category'   VALUE t.category,
             'severity'   VALUE t.severity,
             'issue_text' VALUE t.issue_text,
             'distance'   VALUE ROUND(t.dist, 4)
           )
           ORDER BY t.dist
           RETURNING CLOB
         )
  INTO l_result
  FROM (
    -- Compute the distance once per row so the same value drives
    -- both the top-N ranking and the JSON output.
    SELECT ticket_id, category, severity, issue_text,
           VECTOR_DISTANCE(
             issue_vector,
             VECTOR_EMBEDDING(LOCAL_EMBEDDING_MODEL
                              USING p_query AS DATA),
             COSINE) AS dist
    FROM support_tickets
    ORDER BY dist
    FETCH FIRST p_top_n ROWS ONLY
  ) t;
  RETURN l_result;
END;
/
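Registering that function as an agent tool follows the same declarative pattern as the profiles. A sketch — the attribute keys here are illustrative; check the DBMS_CLOUD_AI_AGENT package reference for the exact schema supported by your ADB version:

```sql
BEGIN
  DBMS_CLOUD_AI_AGENT.CREATE_TOOL(
    tool_name  => 'SEMANTIC_SEARCH_TICKETS',
    attributes => TO_CLOB(JSON_OBJECT(
      -- the instruction tells the LLM when and how to call the tool
      'instruction' VALUE 'Semantic search over support tickets. '
                       || 'Input: a natural language query and a top-N limit.',
      -- bind the tool to the PL/SQL function defined above
      'function'    VALUE 'ADMIN.SEMANTIC_SEARCH_TICKETS'
    ))
  );
END;
/
```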
With these tools registered, an external AI agent connecting via MCP can search tickets semantically, traverse the fraud graph, run arbitrary read-only SQL, and even invoke agent teams — all through a single, authenticated endpoint.
Layer 5: Autonomous Agent Teams
This is where everything converges. DBMS_CLOUD_AI_AGENT (Select AI Agent) provides a declarative framework for building AI agent teams: you define tasks (what to do), agents (who does it, with which profile and tools), and teams (the execution plan).
FRAUD_TASK_FORCE: Reactive Investigation
The first team is reactive — you give it a customer ID and it produces a comprehensive profile. Two agents work together:
- ANALYST_AGENT uses FRAUD_SQL_TOOL to pull the customer’s transaction history, ticket patterns, and account metadata.
- SLEUTH_AGENT uses FRAUD_GRAPH_TOOL to traverse BANK_FRAUD_GRAPH and map anyone connected to the target customer through shared IP addresses.
This is useful when a human analyst flags a suspicious account and wants the database to do the legwork.
FRAUD_RING_COMMAND: Autonomous Hunting
The second team is where it gets interesting. No seed input required. Two agents hunt proactively:
RING_SCOUT_AGENT — uses FRAUD_SQL_TOOL to discover shared-IP clusters across all customers, correlates them with Critical and High severity ticket counts, and ranks the top three rings by fraud signal density.
CASE_BUILDER_AGENT — takes the rings found by RING_SCOUT and produces structured case reports: member count, estimated financial exposure (extracted from ticket text), dominant fraud pattern classification (ACCOUNT_TAKEOVER, TRANSACTION_FRAUD, CARD_TESTING, IDENTITY_FRAUD), and a recommended action (monitor, investigate, or freeze).
Here is the task definition for RING_SCOUT:
BEGIN
DBMS_CLOUD_AI_AGENT.CREATE_TASK(
task_name => 'RING_SCOUT_TASK',
description => 'Discovers shared-IP fraud rings and ranks
them by fraud ticket density.',
attributes => TO_CLOB(JSON_OBJECT(
'instruction' VALUE
'You are RING_SCOUT. Run EXACTLY 2 queries using
FRAUD_SQL_TOOL. Query 1: Find every IP address in
ADMIN.BANK_CUSTOMERS shared by 3+ customers. Return
IP, member count, and all customer_ids. Query 2: For
those IPs, count Critical and High severity tickets
in ADMIN.SUPPORT_TICKETS. Rank the top 3 rings by
fraud ticket count descending.',
'tools' VALUE JSON_ARRAY('FRAUD_SQL_TOOL')
))
);
END;
/
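Each task is then bound to an agent that carries a Select AI profile and, per the lessons learned later, explicitly disables the human-in-the-loop tool. A sketch — the attribute keys follow the documented pattern, but verify them against the package reference for your ADB version:

```sql
BEGIN
  DBMS_CLOUD_AI_AGENT.CREATE_AGENT(
    agent_name => 'RING_SCOUT_AGENT',
    attributes => TO_CLOB(JSON_OBJECT(
      -- the profile supplies the LLM backend and table scope
      'profile_name'      VALUE 'AI_WITH_SCHEMA',
      'role'              VALUE 'Fraud ring scout that finds shared-IP '
                             || 'clusters and ranks them by fraud signals.',
      -- required for unattended runs (see Lessons Learned)
      'enable_human_tool' VALUE 'false'
    ))
  );
END;
/
```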
Assembling the team:
BEGIN
DBMS_CLOUD_AI_AGENT.CREATE_TEAM(
team_name => 'FRAUD_RING_COMMAND',
description => 'Autonomous 2-agent fraud ring hunter.',
attributes => TO_CLOB(JSON_OBJECT(
'process' VALUE 'sequential',
'agents' VALUE JSON_ARRAY(
JSON_OBJECT('name' VALUE 'RING_SCOUT_AGENT',
'task' VALUE 'RING_SCOUT_TASK'),
JSON_OBJECT('name' VALUE 'CASE_BUILDER_AGENT',
'task' VALUE 'CASE_BUILDER_TASK')
)
))
);
END;
/
Running it requires a single PL/SQL call:
DECLARE
l_result CLOB;
BEGIN
l_result := DBMS_CLOUD_AI_AGENT.RUN_TEAM(
team_name => 'FRAUD_RING_COMMAND',
user_prompt => TO_CLOB(
'Hunt for the top fraud rings across all customer data. '
|| 'Produce actionable case reports.'
),
params => TO_CLOB('{"max_tokens":4096}')
);
DBMS_OUTPUT.PUT_LINE(SUBSTR(l_result, 1, 32767));
END;
/
The output is a structured intelligence report. In my test environment with five synthetic fraud rings (25 members each, clustered around distinct IP addresses), FRAUD_RING_COMMAND consistently produced results like:
| Ring | IP | Members | Fraud Signal | Exposure | Action |
|---|---|---|---|---|---|
| Alpha | 192.168.1.1 | 25 | Account Takeover | $79K+ | FREEZE |
| Beta | 8.8.8.8 | 25 | Transaction Fraud | $62K+ | FREEZE |
| Gamma | 10.0.0.1 | 25 | Card Testing | $18K | INVESTIGATE |
No human pointed the system at these rings. The database found them, profiled them, and recommended actions — all within a single RUN_TEAM call.
Lessons Learned
1. Agent tool call budgets matter
OCI GenAI connections have SSL timeout thresholds. If an agent makes too many sequential LLM calls, the connection drops with ORA-28862. The fix is to constrain each agent’s task instruction to a specific number of tool calls — I found that two to three queries per agent is the sweet spot for staying under the timeout while still producing meaningful results.
2. Natural language in action recommendations
Early iterations used single-word uppercase tokens like FREEZE in agent instructions. The LLM occasionally misinterpreted these as tool names, causing ORA-20052: Invalid value for tool object errors. Switching to natural language descriptions (“immediate account suspension”) eliminated the problem.
3. Disable the human-in-the-loop tool explicitly
Some Select AI profiles implicitly enable enable_human_tool, which causes agents to pause in WAITING_FOR_HUMAN state during autonomous runs. Setting 'enable_human_tool' VALUE 'false' in the agent attributes is essential for unattended execution.
4. Semantic search quality is a dataset problem
The embedding model I used (all-MiniLM-L12-v2) is modest by today’s standards. Yet the system scored 9.2/10 on a comprehensive quality assessment after the ticket corpus was rebalanced. Coverage across categories — fraud, security, account lockouts, customer service — mattered far more than model size.
What This Means for Oracle Developers
Every component in this system — vector embeddings, property graphs, Select AI profiles, MCP tools, agent teams — is available today in ADB with Oracle AI Database 26ai. You do not need a separate vector database, a standalone graph engine, an external agent framework, or a third-party embedding API. The converged architecture means:
- Security is inherited. Database privileges, VPD policies, and SQL Firewall apply to agent-generated queries the same way they apply to human-generated ones.
- No data movement. Embeddings, graphs, and agent orchestration all operate on the same data in the same engine. There is no ETL pipeline to maintain or synchronize.
- SQL is the interface. Everything from creating an agent team to running an autonomous fraud hunt is a PL/SQL call. If you know SQL, you can build this.
Get Started
- Documentation: DBMS_CLOUD_AI_AGENT Package Reference
- Select AI: Learn more about Autonomous AI Database Select AI
- AI Vector Search: Oracle AI Vector Search Documentation
- SQL Property Graphs: SQL/PGQL Pattern Matching
- ADB MCP Server: Launch Blog Post
