Introduction
The term online analytic processing (OLAP) describes a set of expectations or requirements that together allow data analysts to efficiently analyze data across multiple dimensions, each with their own aggregation and navigation paths. These expectations also include the ability to define calculations across dimensions and within aggregations, as well as very fast query response times.
Dimensional models present data to end users and application developers using an easily understood semantic model.
End users intuitively understand the concept that sales, for example, might vary by time, product, geography, and distribution channels. A social scientist understands that population characteristics vary by time, location, and demographics characteristics such as age, income, and gender. These users easily understand the idea that data is available at detailed and aggregate levels.
Application developers use dimensional models to organize data for query, to aggregate data and provide navigation paths using hierarchies, and to extend the model with calculations.
Databases that support dimensional models include dimensionally aware query languages or extensions to standard languages such as SQL, as well as rich structural and descriptive metadata.
Implementations of dimensional models have varied widely over time and requirements. This post examines the evolution of OLAP and dimensional modeling in the Oracle Database in the context of the broader market.
The Early Years
Early implementations of many decision supporrt databases were stand-alone multi-dimensional databases, where ‘multi-dimensional’ meant databases built for dimensional analysis, rather than the transaction processing use case of a standard relational database.
Stand-alone Multidimensional Database Rule the OLAP Universe
The earliest example of an OLAP database was MDB from Management Decision Systems. MDB included data storage, a semantic model, and a language that was designed for dimensional analysis. MDB was acquired by Information Resources Inc. and renamed IRI Express Server.
IRI Express Server supported dimensional reporting and budgeting/planning database use cases. The dimensional model and supporting query language was ideal for self-service reporting. A few simple commands could be used to filter, aggregate and report on data. Express Server also had support for long-running ‘sandbox’ sessions to run ‘what if?’ types of analysis, a statistical forecasting engine, and complex systems of simultaneous equations, making it an ideal platform for budgeting and planning, and financial consolidation applications.
IRI Express Server was joined in the market by Arbor Software’s Essbase (1992). Essbase is also a dedicated multi-dimensional database with its own storage, model, and language. Essbase was designed to appeal to financial analysts and was at the time, and still is, a strong platform for budgeting, planning, and financial consolidation.
Although what we now know as OLAP software existed for many years, the term OLAP was coined by E.F. Codd in 1993, who was best known at the time as one of the inventors of the relational database. E.F. Codd argued that the requirements of OLAP and transaction processing applications necessitated their own specialized databases. These multi-dimensional databases become known as multi-dimensional OLAP (MOLAP) databases
Microsoft joined the party with Microsoft OLAP Services in 1998, which was shipped as a companion to the Microsoft SQL Server relational database. Microsoft OLAP Services was separate from SQL Server, with a separate server and data store. Microsoft OLAP Services became Microsoft Analysis Services in 2000.
The Beginning of Relational OLAP
At the same time, middle tier business intelligence servers such as Business Objects, Cognos, and most notably MicroStrategy were all playing in the multi-dimensional query and reporting space. These products blended dimensional style reporting with relational style reporting in the middle tier, with data stored in relational tables. MicroStrategy was most focused on OLAP, promoting the term ROLAP (Relational OLAP).
MOLAP vs ROLAP
The multi-dimensional (MOLAP) servers had the advantages of specialized, and more powerful, calculation engines, and specialized multi-dimensional data types and files. The relational OLAP (ROLAP servers) had the advantages of the scalability of relational databases and real-time access to data in relational tables. ROLAP solutions typically supported, but did not require, pre-aggregation of data (with the expected performance trade-offs).
To provide very fast query response times, often sub-second, MOLAP servers used different forms of array-based storage which are not dissimilar to the columnar formats used by modern relational databases. They also relied on pre-aggregating data. Together, these allowed MOLAP servers to provide excellent query performance at a time when both CPU and memory were in very short supply.
The compromise made by MOLAP servers was scalability and latency. Building (loading and aggregating) MOLAP cubes takes time and computing resources. The cube build process limited the data volumes (data loaded and aggregated) and model complexity (numbers of dimensions, hierarchies, and levels). The term ‘aggregate explosion’ was often used to describe the growth in load times and sizes of MOLAP databases. However, applications that fit within these limits were well rewarded with extremely fast query performance and computational capability.
Oracle Acquires IRI Software
Oracle entered the OLAP market with the acquisition of IRI Express Server in 1995, as well dimensional reporting and financial analysis applications which were were rebranded at Oracle Sales Analyzer and Oracle Financial Analyzer.
Oracle acquisition of this OLAP technology was to complement the Oracle’s push into the data warehouse space. Express Server was the MOLAP solution while the Oracle relational database was the platform for ROLAP solutions.
The Middle Years
Oracle Express Server added a hybrid model (HOLAP) with Express Relational Access Manager (RAM). Express Server + RAM could access data at runtime from tables and/or load and aggregate data in the Express cube. Often, higher aggregates (for example year and country level data), which are the most expensive to query, were cached in the Express cube and lower-level data was accessed dynamically, a true double threat. Express Server with RAM provided full access to the Express calculation engine, eliminated or reduced cube build time and size, and provided great query performance.
Oracle Express Server was fully embedded in release 11.2 of Oracle Database, as the Oracle OLAP Option, making Oracle the first and only converged relational and dimensional database. The OLAP Option kept its multi-dimensional ‘engine’, data types, programming language, and transaction model. It added SQL query of multi-dimensional data types. This represented a significant advancement in multi-dimensional server technology. (Relational Access Manager was not included with the OLAP Option.)
Arbor Essbase introduced the Aggregate Storage Option (ASO) which focused on dimensional reporting. ASO was a new storage format that dramatically improved scalability and reduced cube build times. ASO was used for dimensional reporting, while the traditional block storage option (BSO) cubes were used for budgeting, planning, and ‘what if?’ use cases that require write-back.
The Oracle OLAP Option compressed cubes in Oracle Database release 12.1. Compressed cubes were another major advancement, building 10-20 times faster than uncompressed cubes, thus allowing for larger data volumes and increased model complexity, and they were 100% compatible with the programming and query language of the OLAP Option.
Like Essbase BSO, OLAP uncompressed cubes were used for budget/planning and ‘what-if?’ with write back, while Essbase ASO and OLAP Option compressed cubes were used for dimensional reporting and analysis.
While both the OLAP Option and Essbase could support both types of applications, the OLAP Option was used mostly for dimensional reporting, and Essbase remained strong in budgeting, planning, and financial consolidation. For example, Essbase is the engine used in the Oracle Enterprise Performance Management (EPM) planning, budgeting and forecasting modules.
At the same time, the business intelligence middle tiers were busy connecting to lots of different data sources, blending data in their semantic layers, and often caching data in the middle tier for improved query performance, or for other purposes.
One More Twist
So far, we’ve seen multi-dimensional databases with multi-dimensional data stores, hybrid multi-dimensional databases that access detail data in relational tables and cache aggregate data in MOLAP data stores, and the introduction of specialized MOLAP supporting much larger data sets in dimensional query and reporting use cases.
In Oracle Database release 12.2, the OLAP Option introduced the cube-organized materialized view. Data was loaded into the OLAP Option compressed cube and accessed transparently by SQL applications querying relational data using materialized view query rewrite. This allowed the OLAP compressed cube (MOLAP) to act as the aggregate cache for queries selecting from relational tables. Perhaps we could call this a reverse HOLAP implementation.
Column Store Databases
Another trend was emerging in relational databases; Column Stores. Vertica (2005) and SAP HANA (2010) were prime examples. Column Store databases promised query performance that was like MOLAP cubes, but without the cube build.
In response, Oracle introduced hybrid columnar compression (HCC) on Exadata in 2009, hybrid columnar compression in Exadata Flash Cache in 2019, and the Database In-Memory Option (Oracle 12.1 in 2014) with in-memory column store tables. Column Store tables optimized scan and filter operations while In-Memory Aggregation, part of the Database In-Memory option, optimized joins and aggregation in the star query use case.
(Read Andy Rivenes’s post Columnar Formats in Exadata Flash Cache for an excellent summary of this topic.)
Cheap(er) Computing and OLAP
In addition to advances in Oracle Database technology, computing resources were becoming cheap and plentiful. Servers with large numbers of CPU cores and large amounts of DRAM became commonplace, enabling high levels of parallelism and large in-memory columnar tables. Or, in the case of Exadata, query offload to columnar tables on the Exadata Storage Server.
The convergence of these factors allowed Oracle with the Database In-Memory option to process dimensional queries (star queries) ten to a hundred times faster (or sometimes even more) than row store tables, without pre-built aggregations.
The next picture that should be coming into focus is one where Oracle Column Store tables and In-Memory Aggregation could provide OLAP-like query performance without building separate, multi-dimensional cubes.
Modern Dimensional Analysis in the Oracle Database
What about the dimensional model? That’s where Oracle Analytic Views come in. Analytic Views provide a dimensional semantic model similar to the OLAP Option’s dimensional model. Analytic Views include dimensions, hierarchies, levels, attributes, base measures, and calculated measures. Like the OLAP Option, the model is deterministic, meaning the semantic model determines the calculated value of every data value (aggregations and measure calculations), regardless of the style of SQL used to query the analytic view.
Separation of Dimensional Model and Data Storage
The crucial difference with the Analytic View is that the semantic model is directly layered over storage in the database (that is, tables and views). The Analytic View does not store data or require a cube build, and thus can include an unlimited number of dimensions, hierarchies, levels, base measures, and calculated measures, and it can be layered over tables of any size.
The Analytic View includes several components:
- The semantic, dimensional model. This is metadata stored in, and accessible from, the Oracle Data Dictionary. The model is defined using SQL DDL.
- An expression language, used to define calculated measures.
- Query interfaces, including both hierarchical and relational style SQL, and MDX (a specialized multi-dimensional language used by applications such as Excel pivot tables).
- A SQL generator, which generates SQL used for execution in response to queries that select from the Analytic View.
If you like, you can think of the Analytic View as a Business Intelligence server, focused on a dimensional model, embedded in the Oracle Database. (I prefer that over refering to the Analytic View as a cube.)
This separation of the dimensional model and storage layers has many advantages:
- The dimensional model is highly scalable (with regard to numbers of dimensions, hierarchies, measures, etc.)
- There is no required cube build or pre-aggregation.
- Analytic Views can be layered over tables and views, including external tables that access files on cloud storage (including CSV, Parquet, and other formats).
- Because Analytic Views can be layered over any table or view, it is possible to layer an Analytic View over just about anything in the database (using appropriate data types)
This provides tremendous flexibility. It also means that performance will vary depending on the underlying implementation.
Putting all the Pieces Together
The best possible OLAP-like performance will be when Analytic Views are layered over in-memory column tables or Exadata column store tables, in a star schema. This allows all the fast scan, filter, join and aggregation optimizations to be used in query execution.
At the other extreme are collections of views over row store tables in a transaction schema requiring lots of joins, or external tables over files in a faraway cloud store. These will require extra processing and data access times, likely disabling join and aggregation optimizations that work in a star schema.
The great thing about the separation of the dimensional model and storage is that you are not locked into any one implementation. You can choose the implementation that best balances all your requirements. And the implementation can change over time, even within a partitioned table, without changing how the Analytic View presents data to an application.
Remember the discussion about Oracle Express Server and Relational Access Manager? Enter the Analytic View materialized cache. The materialized cache allows you to aggregate subsets of data using one or more tables in the local database instance. This can be done automatically or at the direction of the analytic view designer. Those tables are then used as aggregate caches by the Analytic View.
I’ll save the details of the materialized cache for a later post. For now, let’s just say that the materialized cache can offer a tremendous performance boost, and they are smaller and build or update more quickly than a typical materialized view. They help the slowest data sources and the most expensive queries the most.
Now you can see all the pieces of a modern dimensional model in the Oracle Database:
- Exadata column store and In-Memory column store for compressed storage, and fast scan and filter.
- In-Memory Aggregation, with join and aggregation optimizations.
- Analytic Views, providing the dimensional model and query semantics.
- The Analytic View materialized cache, as an optional optimization.
- SQL and MDX query interfaces, with REST supported over SQL (just like any other view).
Should OLAP Option Users Switch to Analytic Views?
If you are an OLAP Option user, the question you probably have now is whether you should switch to Analytic Views. The answer depends on your use case.
If you are using OLAP for dimensional queries and reporting, and your calculated measures do not use OLAP DML programs, Analytic Views might be a good fit. You will likely find Analytic Views provide similar performance and function, without the cube build and related limitations.
If you are using OLAP DML programs, you might use Analytic Views and the dimensional query and reporting layer, but calculation logic might need to be implemented elsewhere.
If you are using the OLAP Option for ‘what if?’ analysis, you will need to refactor your application to implement rewrite back to tables. The Analytic View might be used as the dimensional query and reporting layer.
The Final Twist
MOLAP, ROLAP, HOLAP, ‘reverse’ HOLAP … what else?
One of the great things about Analytic Views is that they allow you to define a dimensional model in the database where it can be accessed by any number of applications, allowing the users of each application all to get consistent results. Same model, same data. That includes Essbase. Essbase Federated Partitions can access data in the Oracle Database through Analytic Views. That’s about as hybrid as OLAP gets!
First on Autonomous Database
If you want the very latest and greatest features in Analytic Views, Autonomous Database is the place to be. New Analytic View features are released on Autonomous Database first, often well in advance of the next on-premise release.
Summary
For dimensional query and reporting applications, Analytic Views provide a modern implementation of a dimensional model over data that is stored elsewhere in the database or cloud storage (or some mix thereof).
Analytic Views provide a strong dimensional model, with support for dimensional and hierarchical calculation expressions, relational and dimensional query semantics, and OLAP-like performance when used with columnar tables and the Analytic View materialized cache. All without the scalability limitations of loading, building and aggregating OLAP cubes.
Applications that use OLAP cubes that are built using OLAP DML or make use of OLAP DML programs and formulas will likely continue with OLAP, at least for the time being. Over time, they might switch to Essbase or other solutions.
Learn More
Analytic Views are included in every edition of the Oracle Database as a no-added-cost feature.
There are two great places to learn about Analytic Views for free.
Oracle Autonomous Database in the Oracle Cloud Infrastructure Free tier is a great place to get started. Begin with a simple model, just a few dimensions, hierarchies, and measures, and build up from there once you learn the concepts.
Oracle LiveSQL provides an environment where you can learn just about any SQL, including Analytic Views. Search for ‘Analytic Views’ to access tutorials on creating, querying, and enhancing Analytic Views with calculations, multiple language support, and more.
