
Integrating Autonomous Data Warehouse and Big Data Using Object Storage

Peter Jeffcock
Big Data Product Marketing

While you can run your business on the data stored in Oracle Autonomous Data Warehouse, there’s a lot of other potentially valuable data out there. Using Oracle Big Data Cloud, you can store and process that data, making it ready to be loaded into or queried by Autonomous Data Warehouse. The point of integration for these two services is object storage, which I explore below. Of course, a complete big data solution needs more than this. If that's what you're looking for, read about data lake solution patterns.


Use Cases for the Data Lake and Data Warehouse

Autonomous Data Warehouse and Big Data

Almost all big data use cases involve data that resides in both a data lake and data warehouse. With predictive maintenance, for example, we would want to combine sensor data (stored in the data lake) with official maintenance and purchase records (stored in the data warehouse).

When trying to determine the next best action for a given customer, we would want to work with both customer purchase records (in the data warehouse) and customer web browsing or social media usage (details of which would most likely be stored in the data lake). In use cases from manufacturing to healthcare, having a complete view of all available data means working with data in both the data warehouse and the data lake.

The Data Lake and Data Warehouse for Predictive Maintenance

Take predictive maintenance as an example. Official maintenance records and purchase or warranty information are all important to the business. Regulators may need this information to check that proper processes are being followed, and purchasing departments may need it to manage budgets or order new components.

On the other hand, machines, weather stations, thermometers, seismometers, and similar devices all produce sensor data that is potentially useful for understanding and predicting the behavior of a piece of equipment. If you asked your data warehouse administrator to store many terabytes of this raw, less well-understood, multi-structured data, they would not be very enthusiastic. This kind of data is much better suited to a data lake, where it can be transformed or used as input to machine learning algorithms. Ultimately, though, you want to combine both data sets to predict failures or detect a component moving out of tolerance.
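Combining the two data sets usually comes down to a join on a shared identifier, such as a machine ID. The sketch below is a minimal, hypothetical illustration in plain Python: the record types, field names, thresholds, and the rule-based flag (a simple stand-in for a trained predictive model) are all assumptions, not part of Oracle Big Data Cloud or Autonomous Data Warehouse.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record types; the field names are illustrative assumptions.
@dataclass
class SensorSummary:
    """Aggregated sensor data, derived from raw readings in the data lake."""
    machine_id: str
    max_vibration_mm_s: float

@dataclass
class MaintenanceRecord:
    """Official maintenance record, held in the data warehouse."""
    machine_id: str
    last_serviced: date

def flag_for_inspection(summaries, records, today, vibration_limit, max_days):
    """Join the two data sets on machine_id and return the machines whose peak
    vibration exceeds vibration_limit AND whose last service was more than
    max_days ago. A rule-based stand-in for a real predictive model."""
    last_service = {r.machine_id: r.last_serviced for r in records}
    flagged = []
    for s in summaries:
        serviced = last_service.get(s.machine_id)
        if serviced is None:
            continue  # no warehouse record for this machine, so skip it
        overdue = (today - serviced).days > max_days
        if s.max_vibration_mm_s > vibration_limit and overdue:
            flagged.append(s.machine_id)
    return flagged

# Example usage with made-up data
summaries = [SensorSummary("pump-1", 7.2), SensorSummary("pump-2", 3.1),
             SensorSummary("pump-3", 9.0)]
records = [MaintenanceRecord("pump-1", date(2018, 1, 10)),
           MaintenanceRecord("pump-2", date(2017, 6, 1)),
           MaintenanceRecord("pump-3", date(2018, 5, 20))]
print(flag_for_inspection(summaries, records, date(2018, 6, 1), 5.0, 90))
# ['pump-1']
```

Neither source alone gives this answer: pump-2 is overdue for service but shows normal vibration, and pump-3 vibrates heavily but was serviced recently.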

Examples: How Object Storage Works with the Data Warehouse

We talked previously about how object storage is the foundation for a modern data lake. But it’s much more than that. Object storage is also used, among other things, for backup and archive, to stage data for a data warehouse, and to offload data that is no longer kept there. These use cases require that the data warehouse can work easily with object storage, including the data in the data lake.

Let’s go back to the predictive maintenance use case. After being loaded into the data lake (in object storage), the sensor data can be processed in a Spark cluster spun up by Oracle Big Data Cloud. “Processing” in this context could be anything from a simple filter or aggregation to running a complex machine learning algorithm that uncovers hidden patterns.
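A simple filter-and-aggregate job of the kind just described might look like the following. In this architecture the job would run in a Spark cluster (in PySpark, roughly a `filter` followed by `groupBy(...).agg(...)`); this sketch uses plain Python so it runs anywhere. The CSV layout, column names, and sample readings are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Stand-in for raw sensor data read from object storage.
# The columns (machine_id, timestamp, temperature_c) are hypothetical.
RAW_CSV = """machine_id,timestamp,temperature_c
pump-1,2018-06-01T00:00,71.5
pump-1,2018-06-01T00:01,
pump-2,2018-06-01T00:00,65.0
pump-1,2018-06-01T00:02,74.5
pump-2,2018-06-01T00:01,66.0
"""

def summarize(raw_text):
    """Filter out readings with no temperature, then aggregate per machine."""
    stats = defaultdict(lambda: {"count": 0, "total": 0.0, "max": float("-inf")})
    for row in csv.DictReader(io.StringIO(raw_text)):
        value = (row.get("temperature_c") or "").strip()
        if not value:
            continue  # filter step: drop incomplete readings
        t = float(value)
        s = stats[row["machine_id"]]
        s["count"] += 1
        s["total"] += t
        s["max"] = max(s["max"], t)
    # Aggregation result: one row per machine, sorted by machine_id
    return [
        {"machine_id": m, "readings": s["count"],
         "mean_c": s["total"] / s["count"], "max_c": s["max"]}
        for m, s in sorted(stats.items())
    ]

def to_csv(rows):
    """Serialize the results table, e.g. for writing back to object storage."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["machine_id", "readings", "mean_c", "max_c"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

print(to_csv(summarize(RAW_CSV)))
```

The output is a small results table (one row per machine) rather than the raw readings, which is what makes it a reasonable candidate for loading into the warehouse.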

Once that work is done, a table of results is written back to object storage. At that point, it could be loaded into Autonomous Data Warehouse or queried in place. Which approach is best? It depends on the use case. In general, if the data is accessed frequently, or query performance matters more, then loading it into Autonomous Data Warehouse is probably the better choice. Here you can think of object storage as another tier in your storage hierarchy (Autonomous Data Warehouse already has RAM, flash, and disk as storage tiers).
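The rule of thumb above can be written down as a small decision helper. This is a hypothetical sketch: the inputs and the default threshold of 10 queries per day are illustrative assumptions, not Oracle guidance, and a real decision would also weigh factors such as data volume and cost.

```python
from enum import Enum

class Placement(Enum):
    LOAD_INTO_ADW = "load into Autonomous Data Warehouse"
    QUERY_IN_PLACE = "query in place in object storage"

def choose_placement(queries_per_day: float, latency_sensitive: bool,
                     frequent_threshold: float = 10.0) -> Placement:
    """Frequently accessed or performance-critical data is loaded into the
    warehouse; otherwise it stays in object storage and is queried there.
    frequent_threshold is an arbitrary, hypothetical default."""
    if latency_sensitive or queries_per_day >= frequent_threshold:
        return Placement.LOAD_INTO_ADW
    return Placement.QUERY_IN_PLACE

print(choose_placement(queries_per_day=2, latency_sensitive=False).value)
```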

We can also see a similar approach in an ETL offload use case. Raw data is staged into object storage. Transformation processes then run in one or more Big Data Cloud Spark clusters, with the results written back to object storage. This transformed data is then available to load into Autonomous Data Warehouse.
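The ETL offload flow can be sketched end to end with a Python dict standing in for an object storage bucket and a list standing in for a warehouse table. Everything here is hypothetical: the object names, the record layout, and the cents-to-dollars transformation are assumptions for illustration, and a real deployment would use the actual object storage, Spark, and warehouse loading interfaces instead.

```python
import json

# A dict stands in for an object storage bucket (object name -> bytes).
# Object names and record fields are hypothetical.

def stage_raw(bucket, records):
    """Step 1: stage raw data into object storage as JSON lines."""
    bucket["raw/orders.jsonl"] = "\n".join(json.dumps(r) for r in records).encode()

def transform(bucket):
    """Step 2: stand-in for the Spark transformation. Drops invalid records,
    converts cents to dollars, and writes the results back to the bucket."""
    cleaned = []
    for line in bucket["raw/orders.jsonl"].decode().splitlines():
        r = json.loads(line)
        if r.get("amount") is None:
            continue  # drop records that fail validation
        cleaned.append({"order_id": r["order_id"],
                        "amount_usd": round(r["amount"] / 100, 2)})
    bucket["transformed/orders.jsonl"] = "\n".join(json.dumps(r) for r in cleaned).encode()

def load_into_warehouse(bucket, table):
    """Step 3: load the transformed object into a warehouse table (a list here)."""
    for line in bucket["transformed/orders.jsonl"].decode().splitlines():
        table.append(json.loads(line))

bucket = {}
stage_raw(bucket, [{"order_id": 1, "amount": 1999},
                   {"order_id": 2, "amount": None},
                   {"order_id": 3, "amount": 500}])
transform(bucket)
warehouse_orders = []
load_into_warehouse(bucket, warehouse_orders)
print(warehouse_orders)
```

Keeping both the raw and transformed objects in the bucket mirrors the pattern described above: object storage is the hand-off point between the processing cluster and the warehouse.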

Autonomous Data Warehouse and Big Data Cloud: Working Together

Don’t think of Oracle Autonomous Data Warehouse and Oracle Big Data Cloud as two totally separate services. They have complementary strengths and can interoperate via object storage. And when they do, it will make it easier to take advantage of all your data, to the benefit of your business as a whole.

If you're interested in learning more, sign up for an Oracle free trial to build and populate your own data lake. We have tutorials and guides to help you along. 
