Co-authored with Confluent development: Matthew O’Keefe and Scott Sappenfield

TL;DR — Confluent Tableflow can automatically materialize your Kafka topics as Iceberg tables in Amazon S3. Oracle Autonomous Database (ADB) can then treat those tables as external tables, so you can run rich Oracle SQL (or even Select AI) across real-time events without copying any data.

In this post we walk through the end-to-end recipe, share lessons learned, and show how you can try it yourself.

Kafka data streaming has become very popular since Kafka was first open sourced in 2011. Streaming allows event producers such as databases, applications, and logs to process and share event streams at scale, in real time, with consumers such as microservices, observability tools, and analytics applications.

Open table formats like Iceberg have recently become popular. They provide a way for different compute engines, including Snowflake, Databricks, Trino, and Oracle, to work on the same physical tables.

Oracle Database supports Iceberg. Combining Confluent Tableflow (Iceberg Table materialization) with ADB presents Oracle customers with a simple way to ingest and query Kafka data.

In this blog post, we provide a “How To” that walks you through a simple use case to show how to configure and operate Tableflow with ADB.

Why Pair Confluent & Oracle ADB?

When people ask “why this combo?”, we like to answer with a simple thought experiment: picture the limitless event firehose of Kafka meeting the battle-tested SQL brain of Oracle. Put them together and you unlock real-time insights without the usual data copying.

Problem/Solution flow

Analysts need SQL access to Kafka topics without slowing the streams, and copying data into a warehouse adds latency and cost. Confluent Tableflow materializes Kafka topics as Iceberg tables — adding schema, time-travel, and ACID. Oracle Autonomous Database External Tables then query those Iceberg datasets in place with full Oracle SQL power. Governance across clouds stays simple because ADB reuses your AWS Glue catalog and IAM roles; there’s no extra catalog to manage. In short: Tableflow gives your streams an open-table format, and ADB provides the enterprise-grade SQL engine to explore them — via ad hoc queries, BI tools, or AI-augmented queries.

Architecture at a Glance

Events land in Confluent Cloud → Tableflow continuously writes them into Iceberg tables in S3 → AWS Glue tracks metadata → Oracle ADB — deployed on OCI, AWS, Azure, or GCP — queries the tables on demand.

Your Integration Toolkit In Action

In the remainder of this post we’ll walk through the full integration, start-to-finish, so you can replicate it yourself. Imagine you run an online shoe store: every landing-page view and product click is streamed into Kafka as a high-velocity clickstream topic for real-time insight, while your operational data — PRODUCT_CATALOG, CUSTOMERS, and SALES_ORDERS — is stored in Oracle Autonomous Database to power the checkout experience. This pattern highlights the best of both worlds: Kafka for limitless event ingestion and Oracle ADB’s mixed-workload engine that serves OLTP transactions and analytics in the same place.

Step 1 - Generate Sample Clickstream Data in Kafka

Confluent Cloud’s Datagen connector makes it trivial to spin up realistic click events. For this demo we use the built-in clickstream dataset to create a single Avro topic:

● clickstream — every landing-page view, product detail click, and add-to-cart action

For details on setting up a topic and using the Datagen connector to generate clickstream data, click here.

Meanwhile we seed Oracle ADB with an operational schema:

CREATE TABLE PRODUCT_CATALOG (
product_id NUMBER PRIMARY KEY,
product_name VARCHAR2(128),
price NUMBER(10,2)
);


CREATE TABLE CUSTOMERS (
customer_id NUMBER PRIMARY KEY,
first_name VARCHAR2(64),
last_name VARCHAR2(64),
loyalty_tier VARCHAR2(32)
);

CREATE TABLE SALES_ORDERS (
order_id NUMBER PRIMARY KEY,
customer_id NUMBER REFERENCES CUSTOMERS,
product_id NUMBER REFERENCES PRODUCT_CATALOG,
quantity NUMBER,
order_ts TIMESTAMP
);

For a quick test you can seed a handful of rows inline (the multi-row VALUES syntax below requires Oracle Database 23ai; on earlier releases, issue one INSERT per row):

-- Products
INSERT INTO PRODUCT_CATALOG (product_id, product_name, price) VALUES
(1, 'Trail Runner 3000', 119.99),
(2, 'Urban Sneaker X', 89.50),
(3, 'Alpine Boot Pro', 179.00);
-- Customers
INSERT INTO CUSTOMERS (customer_id, first_name, last_name, loyalty_tier) VALUES
(101, 'Ava', 'Nguyen', 'Gold'),
(102, 'Liam', 'Patel', 'Silver'),
(103, 'Mia', 'Garcia', 'Bronze');
-- Orders (note timestamps for realistic filtering)
INSERT INTO SALES_ORDERS (order_id, customer_id, product_id, quantity, order_ts) VALUES
(1001, 101, 2, 1, SYSTIMESTAMP - INTERVAL '45' MINUTE),
(1002, 102, 3, 2, SYSTIMESTAMP - INTERVAL '2' HOUR),
(1003, 103, 1, 1, SYSTIMESTAMP - INTERVAL '25' HOUR);
COMMIT;
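
A quick join across the three tables (a minimal sanity check, nothing more) confirms the seed data lines up before we bring Kafka into the picture:

-- Each order with its customer and product
SELECT o.order_id,
       c.first_name || ' ' || c.last_name AS customer,
       p.product_name,
       o.quantity,
       o.order_ts
FROM SALES_ORDERS o
JOIN CUSTOMERS c ON c.customer_id = o.customer_id
JOIN PRODUCT_CATALOG p ON p.product_id = o.product_id
ORDER BY o.order_ts;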

Step 2 - Enable Tableflow → Iceberg on S3 for the Clickstream

If this is your first time using Tableflow in Confluent Cloud, the following steps will guide you through configuring it correctly.

The high-level steps are as follows:

2.1 In Confluent Cloud ⇢ Tableflow, enable Tableflow for the clickstream topic.

2.2 Click the Configure custom storage button, since you will be using your own S3 bucket instead of Confluent-managed storage.

2.3 Select Iceberg (Tableflow handles Avro ➜ Parquet conversion automatically).

2.4 Point Tableflow at your S3 bucket, e.g. s3://oracleadbtest-clickstream: select Store in your own bucket, choose your Provider Integration from the dropdown list, and provide your S3 bucket name.

Tableflow now writes every new Kafka message into partitioned Iceberg files. If you look at your topic, you should see data flowing and a green check mark next to “Tableflow syncing.”

If you hit any errors or do not see data in your S3 bucket, review this guide to make sure you configured all the IAM permissions properly.

Step 3 - Wire-Up AWS Permissions & Glue

● Create an S3 bucket oracleadbtest-clickstream in the same region as Confluent.

● Attach an IAM role (e.g., oracleadbtest-tableflowrole) with S3 + Glue permissions.

● Register the Glue catalog in Tableflow.

Step 4 - Create Cloud Credentials in Oracle ADB

Before ADB can read Iceberg metadata from AWS Glue and pull Parquet files from S3, it needs a cloud credential.

BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'CRED_AWS',
    username        => '<AWS_ACCESS_KEY_ID>',
    password        => '<AWS_SECRET_ACCESS_KEY>'
  );
END;
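
To confirm the credential works before moving on, you can list the bucket contents directly from ADB with DBMS_CLOUD.LIST_OBJECTS (a quick sanity check; the URI below assumes the example bucket name and the us-east-1 region used elsewhere in this post):

-- List the Iceberg/Parquet objects Tableflow has written so far
SELECT object_name, bytes
FROM DBMS_CLOUD.LIST_OBJECTS(
       credential_name => 'CRED_AWS',
       location_uri    => 'https://oracleadbtest-clickstream.s3.us-east-1.amazonaws.com/')
ORDER BY object_name;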

Step 5 - Expose the Clickstream as an External Table

With Glue now reachable, the next move is to tell ADB where the Iceberg table lives. One DBMS_CLOUD.CREATE_EXTERNAL_TABLE call does the heavy lifting — no file lists, no path juggling, and (crucially) no data copied.

BEGIN
  DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
    table_name      => 'CLICKSTREAM',
    credential_name => 'CRED_AWS',
    format          => '{"access_protocol":
                          {"protocol_type": "iceberg",
                           "protocol_config":
                             {"iceberg_catalog_type": "aws_glue",
                              "iceberg_glue_region": "us-east-1",
                              "iceberg_table_path": "customers.clickstream"}}}'
  );
END;

Run the block and ADB instantly registers the Iceberg table behind the scenes. From here on out, analysts can run SELECT statements against CLICKSTREAM just like any regular table — while Kafka continues to ingest without a hiccup.
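
Before moving on, a quick sanity check (a minimal sketch; the columns you see depend on your clickstream schema) confirms ADB can reach the new table and shows what Tableflow has landed so far:

-- Confirm ADB can read the Iceberg metadata and data files
SELECT COUNT(*) FROM CLICKSTREAM;

-- Peek at a few events
SELECT *
FROM CLICKSTREAM
FETCH FIRST 10 ROWS ONLY;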

Your operational tables (PRODUCT_CATALOG, CUSTOMERS, SALES_ORDERS) remain native in Oracle ADB, ready for low-latency OLTP plus cross-joins with the new external Iceberg Table.

Step 6 -  Query Everything, Live

Below are three practical queries that blend real-time clickstream events (Iceberg) with your transactional tables to answer everyday e-commerce questions — zero ETL, zero impact on Kafka.

----------------------------------------------------------------------------
----------------------------------------------------------------------------
-- 1) Hot products with high attention but no sales in the last 30 minutes
-- Why mix real-time + relational?
--   • Real-time CLICKSTREAM shows fresh interest (views in last 30m).
--   • Relational SALES_ORDERS is the ground truth for conversions.
--   • Relational PRODUCT_CATALOG provides names/categories for actioning.
-- Joining them lets marketing spot attention spikes that aren’t converting yet.
----------------------------------------------------------------------------
----------------------------------------------------------------------------
WITH recent_clicks AS (
SELECT product_id, COUNT(*) AS views_30m
FROM CLICKSTREAM
WHERE ts > SYSDATE - INTERVAL '30' MINUTE
GROUP BY product_id),
recent_sales AS (
SELECT product_id
FROM SALES_ORDERS
WHERE order_ts > SYSDATE - INTERVAL '30' MINUTE)
SELECT p.product_name, c.views_30m
FROM recent_clicks c
JOIN PRODUCT_CATALOG p ON p.product_id = c.product_id
LEFT JOIN recent_sales s ON s.product_id = c.product_id
WHERE s.product_id IS NULL
ORDER BY c.views_30m DESC
FETCH FIRST 5 ROWS ONLY;
----------------------------------------------------------------------------
----------------------------------------------------------------------------
-- 2) Gold-tier customers who browsed >5 products today but bought nothing
-- Why mix real-time + relational?
--   • Real-time CLICKSTREAM indicates active browsing behavior today.
--   • Relational SALES_ORDERS confirms absence of purchases.
--   • Relational CUSTOMERS adds loyalty tier so outreach can be prioritized.
-- The blend targets valuable customers at risk of churn or needing a nudge.
----------------------------------------------------------------------------
----------------------------------------------------------------------------
WITH today_clicks AS (
SELECT customer_id, COUNT(DISTINCT product_id) AS products_viewed
FROM CLICKSTREAM
WHERE ts > TRUNC(SYSDATE)
GROUP BY customer_id
),
today_buyers AS (
SELECT DISTINCT customer_id
FROM SALES_ORDERS
WHERE order_ts > TRUNC(SYSDATE)
)
SELECT cu.customer_id,
cu.first_name || ' ' || cu.last_name AS full_name,
tc.products_viewed
FROM today_clicks tc
JOIN CUSTOMERS cu ON cu.customer_id = tc.customer_id
LEFT JOIN today_buyers tb ON tb.customer_id = tc.customer_id
WHERE cu.loyalty_tier = 'Gold'
AND tb.customer_id IS NULL
AND tc.products_viewed > 5
ORDER BY tc.products_viewed DESC;
----------------------------------------------------------------------------
-- 3) Real-time conversion dashboard: clicks vs. sales by hour (last 6 hours)
-- Why mix real-time + relational?
--   • Real-time CLICKSTREAM provides top-of-funnel traffic by hour.
--   • Relational SALES_ORDERS provides bottom-of-funnel conversions.
-- Together they yield conversion rates that drive live ops and bidding.
----------------------------------------------------------------------------
WITH hourly_clicks AS (
SELECT TRUNC(ts, 'HH24') AS hour_bucket, COUNT(*) AS clicks
FROM CLICKSTREAM
WHERE ts > SYSDATE - INTERVAL '6' HOUR
GROUP BY TRUNC(ts, 'HH24')
),
hourly_sales AS (
SELECT TRUNC(order_ts, 'HH24') AS hour_bucket, COUNT(*) AS sales
FROM SALES_ORDERS
WHERE order_ts > SYSDATE - INTERVAL '6' HOUR
GROUP BY TRUNC(order_ts, 'HH24')
)
SELECT c.hour_bucket,
c.clicks,
NVL(s.sales,0) AS sales,
ROUND(NVL(s.sales,0) / c.clicks * 100, 2) AS conversion_pct
FROM hourly_clicks c
LEFT JOIN hourly_sales s USING (hour_bucket)
ORDER BY c.hour_bucket;

These statements power dashboards, marketing campaigns, and churn-prevention workflows in seconds — leveraging ADB’s mixed-workload engine to marry real-time streams with trusty OLTP data.
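
Because CLICKSTREAM now looks like any other table to ADB, Select AI (mentioned in the TL;DR) can sit on top of it as well. A minimal sketch, assuming you have already created an AI profile that covers CLICKSTREAM and the operational tables; the profile name MY_AI_PROFILE is a placeholder, and creating the profile with DBMS_CLOUD_AI.CREATE_PROFILE is outside the scope of this post:

-- Point the session at an existing Select AI profile (placeholder name)
EXEC DBMS_CLOUD_AI.SET_PROFILE('MY_AI_PROFILE');

-- Ask a natural-language question; Select AI generates and runs the SQL
SELECT AI which products had the most clicks in the last hour;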

What We Learned

● Tableflow to Iceberg means no bespoke ETL scripts.

● Oracle ADB treats Iceberg as a first-class citizen — same SQL, same optimizer (see the sketch after this list).

● The combo scales from tiny side projects to petabyte-scale workloads.
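
To see the second point in action, you can ask the optimizer for a plan over the external Iceberg table exactly as you would for a native table (a minimal sketch; the plan output will vary with your data):

-- Same optimizer, same tooling: explain a mixed Iceberg + relational query
EXPLAIN PLAN FOR
SELECT p.product_name, COUNT(*) AS clicks
FROM CLICKSTREAM c
JOIN PRODUCT_CATALOG p ON p.product_id = c.product_id
GROUP BY p.product_name;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);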

Next Steps

  1. Try the LiveLab – Building a Data Lake with ADB.
  2. Read the docs: