OSS data to ADW in almost real-time

January 31, 2020 | 3 minute read
Igor Aragao de Souza
Principal Big Data Consultant

Hey you.

Did you know that it is possible to take Oracle Streaming Service (OSS) data and put it in Autonomous Data Warehouse (ADW) in almost real-time using just Object Store? And more: you can do it without writing code.


You will need:

  • An OSS instance;

  • An ADW instance;

  • An Object Store bucket;


How does this work?

The idea is to use Apache Flume to write the data to files and store them in Object Store. We can then use ADW to create an External Table over those files, and with this we have the data in ADW in almost real-time.

You can play around with the number of files and how often they are written to get as close to real-time as you need.


So let's divide this into 2 steps:

  1. Read data from OSS and write in Object Store
  2. Create the external table using the Object Store file path


1-) Read data from OSS and write in Object Store

This assumes that you already have an OSS stream with some content.

Option 1:

I don't want to re-invent the wheel, so you can have a look at Publishing To Object Storage From Oracle Streaming Service.


If you followed the tutorial, you will have seen that you need a "connector-config.json" file where you add the connector configuration.

You can check the connector properties here and change how often the files will be written.

Example: streaming the content of a log file, file1.json:
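A minimal "connector-config.json" could look like the sketch below. The connector, storage, format, and partitioner classes come from the Confluent S3 sink; the namespace, region, bucket, and topic values are placeholders to replace with your own:

```json
{
  "name": "oss-to-object-store",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "mytopic",
    "tasks.max": "1",

    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner",

    "s3.bucket.name": "mybucket",
    "s3.region": "us-phoenix-1",
    "store.url": "https://mynamespace.compat.objectstorage.us-phoenix-1.oraclecloud.com",

    "flush.size": "3"
  }
}
```

The store.url points at the Object Storage S3-compatible endpoint for your namespace and region; flush.size controls how many records go into each object, so together with the partitioner it determines how often new files land in the bucket.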

This connector configuration will generate 3 files, each encoded according to the connector's format settings.


You should see the generated files in your bucket, and you can check the content of one of them with:

oci os object get -ns mynamespace -bn mybucket --name myfile.txt --file -

The property partitioner.class can be io.confluent.connect.storage.partitioner.TimeBasedPartitioner. Here you can play around and configure it to create one file a day or one file every minute, and with this change how real-time your solution will be.
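For instance, a time-based partitioning setup might look like this sketch (the property names come from the Confluent storage partitioners; the one-day duration is just an example):

```properties
partitioner.class=io.confluent.connect.storage.partitioner.TimeBasedPartitioner
# Start a new object every 24 hours (in ms); use 60000 for one file a minute
partition.duration.ms=86400000
# Folder layout inside the bucket
path.format='year'=YYYY/'month'=MM/'day'=dd
locale=en-US
timezone=UTC
timestamp.extractor=Record
```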

You can have a look at the Amazon S3 Sink Connector for Confluent Platform docs here.

Option 2: using Apache Flume


Download Apache Flume here.

Untar the download in any place of your preference:

tar xzf apache-flume-1.9.0-bin.tar.gz

You need to set up JAVA_HOME.
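For example (the JDK path below is just an illustration; point it at wherever your JDK is installed):

```shell
# Set JAVA_HOME for Flume; adjust the path to your JDK install
export JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/java-8-openjdk-amd64}"
export PATH="$JAVA_HOME/bin:$PATH"
echo "JAVA_HOME=$JAVA_HOME"
```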


We are going to use the Flume Kafka source and the HDFS sink.

HDFS sink doc

Kafka source doc


Because we are using the HDFS sink, you need to add the Hadoop jars to the Flume lib folder.

You can download them here.

The idea here is to use the Oracle Object Store Amazon S3 Compatibility.

For this, we will use Fuse, and you can check this tutorial for a step-by-step Fuse configuration.


Create a JAAS file in /home/user/path/to/flume/conf/flume_kafka_jaas.conf
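The JAAS file carries the credentials the Kafka client uses to authenticate against OSS. A sketch, with the tenancy name, user name, stream pool OCID, and auth token all as placeholders:

```
KafkaClient {
  org.apache.kafka.common.security.plain.PlainLoginModule required
  username="mytenancy/myuser/ocid1.streampool.oc1..example"
  password="my-auth-token";
};
```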

Create a Flume conf file in /home/user/path/to/flume/conf/conf.properties
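A sketch of what conf.properties could contain, wiring the Kafka source to the HDFS sink. The bootstrap server, topic, and mount path are placeholders, and hdfs.path points at the local Fuse mount of the bucket:

```properties
# Agent a1: OSS (Kafka source) -> memory channel -> HDFS sink
a1.sources = src1
a1.channels = ch1
a1.sinks = oci1

a1.sources.src1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.src1.kafka.bootstrap.servers = cell-1.streaming.us-phoenix-1.oci.oraclecloud.com:9092
a1.sources.src1.kafka.topics = mytopic
a1.sources.src1.kafka.consumer.group.id = flume-group
a1.sources.src1.kafka.consumer.security.protocol = SASL_SSL
a1.sources.src1.kafka.consumer.sasl.mechanism = PLAIN
a1.sources.src1.channels = ch1

a1.channels.ch1.type = memory
a1.channels.ch1.capacity = 10000

# Write into the bucket through the local Fuse mount
a1.sinks.oci1.type = hdfs
a1.sinks.oci1.hdfs.path = file:///mnt/mybucket/data/kafka/%{topic}/%y-%m-%d
a1.sinks.oci1.hdfs.fileType = DataStream
a1.sinks.oci1.hdfs.rollInterval = 60
a1.sinks.oci1.hdfs.useLocalTimeStamp = true
a1.sinks.oci1.channel = ch1
```

hdfs.rollInterval (in seconds) controls how often a file is rolled, which is what drives how "real-time" the data in ADW will be.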

Now you can run Flume with the command:

bin/flume-ng agent --conf conf --conf-file conf/conf.properties --name a1 -Dflume.root.logger=INFO,console

Change the file names and paths to whatever you used before.


As an example, you can use a1.sinks.oci1.hdfs.path = /data/kafka/%{topic}/%y-%m-%d, and this will generate subfolders per year, month, and day, with one file a day.

%{topic} >> Kafka topic name

%y-%m-%d >> year-month-day


2-) Create the external table using the Object Store file path

Assuming that you already have the ADW ready, you just need to create the external table.

Check here for more info.

You can add several files in file_uri_list, and you can play with table partitioning as well.

file_uri_list Doc

format Doc

You can change the column_list and create the specific columns that you want.
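Putting it together, a sketch of the call using DBMS_CLOUD.CREATE_EXTERNAL_TABLE; the credential, table name, URL, and columns are placeholders, and it assumes you have already created a credential with DBMS_CLOUD.CREATE_CREDENTIAL:

```sql
-- Sketch: external table over the files the connector/Flume wrote
BEGIN
  DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
    table_name      => 'OSS_EVENTS_EXT',
    credential_name => 'OBJ_STORE_CRED',
    file_uri_list   => 'https://objectstorage.us-phoenix-1.oraclecloud.com/n/mynamespace/b/mybucket/o/data/kafka/mytopic/*',
    format          => json_object('delimiter' value ','),
    column_list     => 'event_time VARCHAR2(30), payload VARCHAR2(4000)'
  );
END;
/
```

After that, SELECT * FROM OSS_EVENTS_EXT reads the files currently in the bucket, so new files show up in queries as soon as they are written.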

Note: This idea is perfect for the Big Data scenario using HDFS and Hive. Oracle is about to release its new Big Data PaaS solution (BDS - Big Data Service), and when that happens I will publish a second version of this tutorial using HDFS and Hive.



The frequency at which you write files to Object Store determines how real-time your solution is:

  • If you write one file a day, your latency is 1 day;

  • If you write one file every minute, your latency is 1 minute.


You can create one table partition for each file, or you can use several files for one table partition. It's a trade-off: the balance between the number of files and the number of table partitions is what determines the performance of your SQL commands.


You can create the table using one file, and SQL performance will be good until that file becomes a big one.

If you create more files, you can create table partitions over different files and then query by partition as the volume of data increases.

