Building Data Lakes with Oracle: Exploring Innovations and Integrations

June 6, 2023 | 5 minute read
Alexey Filanovskiy
Product Manager


In today’s data-driven world, building efficient and scalable data lakes is crucial for organizations looking to unlock the full potential of their data. With the advent of new technologies and innovations, Oracle Autonomous Data Warehouse (ADW) has emerged as a powerful solution for creating robust data lakes. In this post, we look at the latest advancements in Oracle ADW and explore how it integrates with other services.

Two potential data lake architectures

When considering a data lake architecture, there are two main options: a database-centric approach and an object store-centric architecture. The database-centric approach offers excellent performance, concurrency, governance, and security, but the object store-centric architecture has several distinct advantages of its own. The main reasons for choosing each are:

Reasons for an Object Store-Centric Architecture:

  • Instant Data Access: Query raw data in place, without time-consuming extract, transform, and load (ETL) processes, and avoid spending database resources on loading data unnecessarily.
  • Multiple-Engine Compatibility: Use the same datasets from different analytical engines and tools, such as Spark, Jupyter notebooks, and Oracle ADW, for greater flexibility in data processing and analysis.
  • Multi-Cloud Support: Store data in popular cloud platforms like AWS, Azure, or GCP. Choose the cloud provider that best fits your needs, gaining scalability and vendor diversification while avoiding vendor lock-in.

Reasons for a Database-Centric Approach:

  • High Performance: Benefit from superior query performance and concurrency. Database-centric architectures are optimized for efficient data retrieval and analysis, ensuring fast and responsive analytics.
  • Governance and Security: Leverage robust governance and security features provided by databases. Maintain data integrity, enforce access controls, and meet regulatory compliance requirements with greater ease.
  • Converged Platform: A single, unified database platform that brings together a rich set of capabilities, including graph analysis, machine learning, spatial processing, JSON support, and in-memory data caching, reduces complexity, improves security, and eliminates the need for multiple integration points.

In summary, the object store architecture offers advantages such as instant data access, compatibility with multiple engines, and multi-cloud support. On the other hand, the database-centric approach excels at high performance, concurrency, governance, and security.

Breaking News: Cost is No Longer a Barrier!

Oracle ADW’s New eCPU Model Makes Database Storage Cost Comparable to Object Store.

One of the classic arguments for storing data outside the database has been cost-effectiveness: object storage is generally cheaper than traditional database storage. That was true until recently. Oracle has just announced a groundbreaking change to the cost of data lake architectures: with the introduction of the new eCPU model for Oracle Autonomous Data Warehouse (ADW), cost is no longer a decisive factor in choosing between object store-centric and database-centric approaches.

Traditionally, object store architectures were favored for their cost-effectiveness, while database-centric approaches offered superior performance. Oracle ADW’s eCPU model combines the best of both worlds: organizations can now enjoy the performance, concurrency, governance, and security of a database-centric approach at a storage cost comparable to that of an object store architecture.

This game-changing advancement lets businesses reap the benefits of ADW’s advanced query optimization and parallel processing capabilities without cost being the deciding factor. Organizations can now build their data lakes with Oracle ADW with confidence, knowing that they no longer have to trade cost against performance, concurrency, governance, and security.

Embracing Object Store-Centric Data Lakes: Freedom of Engine Choice, Multicloud Compatibility, and Instant Data Access

It’s important to note that object store-centric data lakes still hold significant value in specific scenarios. If your use case calls for an object store-centric architecture, Oracle ADW is well-equipped to fulfill your requirements with the following capabilities:

  • Multiple File Format Support: ADW supports various file formats, including CSV, Parquet, JSON, ORC, and Avro. This versatility enables efficient processing and analysis of data in different structures and formats.
  • Upcoming Iceberg Table Format: ADW is set to introduce support for the Apache Iceberg table format. Iceberg is an open table format that tracks the schema, partitioning, and snapshots of tables stored as files in object storage, which makes large data lakes easier to manage and scale.
  • Multicloud Compatibility: One of the key advantages of ADW is its compatibility across multiple cloud platforms. With Oracle ADW, data in the supported table and file formats can be accessed directly in the object stores of popular cloud environments such as AWS, Azure, and Google Cloud Platform (GCP). This flexibility lets organizations choose the cloud provider that best suits their requirements, take advantage of specific cloud features, and avoid vendor lock-in.
  • Delta Sharing Protocol: ADW can read shared data using the open Delta Sharing protocol, which enables collaboration and data sharing across different environments and systems.
  • Data Catalog Integration: ADW can derive metadata from data catalogs such as OCI Data Catalog and AWS Glue. This integration streamlines data discovery and governance and improves overall data management.
  • Connectivity with Non-Oracle Databases: ADW offers connectivity to other non-Oracle databases, such as Redshift or Snowflake, using database links (DBLINKs).

Furthermore, despite the diverse range of features and integrations, ADW applies a unified security model across the entire data lake architecture. This includes roles, grants, data masking, row-level security, and column-level security, ensuring comprehensive data protection and access control.
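The idea behind unified security is that one policy governs data regardless of where it is stored. The Python sketch below models this: a single hypothetical policy, combining a row filter with a column mask, is applied identically to rows from an internal table and from an external table over object storage. It is a conceptual illustration only. In ADW these controls are configured with database features, not application code, and the policy, role, and column names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    region_column: str         # row-level security filters rows on this column
    masked_columns: frozenset  # values masked for users without unmask_role
    unmask_role: str


def apply_policy(policy, user, rows):
    """Return only the rows and values this user may see (hypothetical model)."""
    visible = []
    for row in rows:
        # Row-level security: drop rows outside the user's allowed regions.
        if row[policy.region_column] not in user["regions"]:
            continue
        if policy.unmask_role in user["roles"]:
            visible.append(dict(row))
        else:
            # Data masking: replace sensitive values with a fixed placeholder.
            visible.append({k: ("****" if k in policy.masked_columns else v)
                            for k, v in row.items()})
    return visible


policy = Policy(region_column="region", masked_columns=frozenset({"email"}),
                unmask_role="PII_READER")
analyst = {"roles": {"ANALYST"}, "regions": {"EMEA"}}

internal_rows = [  # e.g. a table stored inside the database
    {"id": 1, "region": "EMEA", "email": "a@example.com"},
    {"id": 2, "region": "APAC", "email": "b@example.com"},
]
external_rows = [  # e.g. an external table over files in object storage
    {"id": 3, "region": "EMEA", "email": "c@example.com"},
]

# The same policy object governs both storage locations.
for source, rows in (("internal", internal_rows), ("external", external_rows)):
    print(source, apply_policy(policy, analyst, rows))
```

Because the policy lives in one place, adding a new source (another external table, say) does not require a second copy of the access rules.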

Conclusion

In the modern era of data, an efficient data lake architecture is pivotal for harnessing the full potential of organizational data. The two principal architectures, database-centric and object store-centric, each bring their own strengths. While the database-centric model shines with its high performance, robust governance, and superior security, the object store-centric architecture is distinguished by its instant data access, multi-cloud compatibility, and flexibility with various analytical engines.

Traditionally, the choice between these two has often been influenced by cost considerations, with object store-centric models generally seen as more economical. However, Oracle’s groundbreaking eCPU model for its Autonomous Data Warehouse has transformed this narrative. By offering database storage costs comparable to object stores, Oracle ADW is tearing down the barriers to embracing high-performing database-centric architectures.

That being said, object store-centric models continue to hold substantial value for specific use cases. Oracle ADW provides versatile support for such scenarios, including multiple file format support, Delta Sharing, multi-cloud compatibility, and comprehensive security measures. Upcoming support for the Iceberg table format is set to further enhance its offerings.

Decision tree for picking the right tool

Ultimately, the decision between the two architectures depends on your specific needs and use cases. Oracle ADW, with its evolved capabilities, offers flexibility and performance across both database-centric and object store-centric models, thereby empowering organizations to build efficient, cost-effective, and robust data lakes. As the landscape of data storage and management continues to evolve, Oracle ADW’s innovative solutions promise to keep organizations at the cutting edge of data-driven decision-making.
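As a rough illustration of the decision, the Python sketch below encodes only the criteria discussed in this post. It does not reproduce the decision tree figure, whose contents are not in the text, and the field names are hypothetical. Following the argument above, storage cost is deliberately not an input: the database-centric approach is the default unless a specific object store requirement applies.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Requirements:
    # Object store-centric drivers discussed above.
    instant_access_without_etl: bool = False
    multiple_engines: bool = False
    multicloud: bool = False
    # Database-centric drivers discussed above.
    high_performance: bool = False
    governance_and_security: bool = False


def suggest_architecture(req: Requirements) -> Tuple[str, List[str]]:
    """Return (architecture, reasons); cost is intentionally not considered."""
    object_store = [label for label, needed in (
        ("instant access without ETL", req.instant_access_without_etl),
        ("same data from multiple engines", req.multiple_engines),
        ("multicloud storage", req.multicloud),
    ) if needed]
    if object_store:
        return "object store-centric", object_store
    database = [label for label, needed in (
        ("high performance and concurrency", req.high_performance),
        ("governance and security", req.governance_and_security),
    ) if needed]
    return "database-centric", database or ["no object store-specific requirement"]


print(suggest_architecture(Requirements(multiple_engines=True, high_performance=True)))
```

Letting any object store requirement win is a design choice of this sketch. It reflects the point that ADW can still serve as the query engine over object storage, so choosing that architecture does not give up the database features entirely.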

Hands-on

Interested in getting hands-on? Explore our Live Lab workshop, which guides you through building data lakes with Autonomous Database.

Alexey Filanovskiy

Product Manager

My role involves overseeing the product management of Data Lake features for the Oracle Autonomous Database.
