X

Enabling Real-Time Analytics With Database In-Memory

New Database In-Memory Features in Oracle Database Release 12.2

Tirthankar Lahiri
Vice President, Data and Inmemory Technologies

Oracle Database 12.2.0.1 was made generally available on March 1, 2017. (Note that Oracle Database 12.2 has also been available on Oracle Cloud since late
September 2016.) Oracle Database 12.2  includes major enhancements to
Database In-Memory covering four general areas: Performance,
Availability, Capacity, and Ease of Use. This post is a high level
summary of these enhancements.

1. Performance: Database In-Memory allows real-time analytics by
processing data at the rate of billions of rows per second. With the
Oracle Database 12.2 release, the following Database In-Memory features provide further performance improvements for more use cases:

a) Join Groups: With Oracle Database 12.2., it is possible to declare
that two tables will be joined on a set of columns using a new DDL
construct known as a Join Group. Doing so enables a variety of join
processing optimizations, such as co-encoding the compressed join
columns with the same dictionary so that joins can be performed on
compact dictionary codes instead of on values. Join groups have been
shown to provide an additional 2-3x performance boost on top of already
very fast in-memory joins.

b) In-Memory Expressions: While Database In-Memory speeds up analytic
data access by orders of magnitude, another cost of Analytic workloads
is Analytic data processing or computation. In-Memory Expressions
attempt to substantially reduce the cost of analytic data processing by
using the column store to materialize the result of commonly evaluated
expressions. For instance, if an application frequently computes the net_price as (price * ( 1  - discount) * (1 + tax)) it is possible to
declare this net_price expression as an inmemory virtual column. The result of
the evaluation is then stored within the column store, and is maintained
along with other columns as the base data columns change. In addition
to explicitly declared inmemory virtual columns, it is also possible to
automatically detect frequently evaluated expressions using a mechanism
known as the Expressions Statistics Store. Omitting the repeated reevaluation of complex expressions has been
found to improve the performance of complex queries by 5x or more

c) In-Memory JSON columns: With Database 12.2, JSON values are stored
in a highly optimized inmemory format within the inmemory column
store.  This allows all the machinery of the inmemory columnar engine to
be applied to JSON queries. In-Memory JSON can provide speedups of up
to 60x
for queries against semi-structured JSON data (e.g. Find all books where books.author.name contains "Kafka" ).

2. Availability: In Oracle Database 12.1, the column store must be rebuilt (populated) on instance restart, although the database is immediately available and queries need not wait for population to complete. While the column store is being rebuilt, queries may encounter some loss of performance. With Exadata and Supercluster, in-memory fault tolerance allows an object to be populated into memory of more than one instance in a RAC cluster (similar to storage mirroring) so that there is no performance downtime while an instance is being restarted. Oracle Database 12.2 provides the following additional enhancements to increase overall column store availability:  

a) In-Memory Fast-Start: Ordinarily when an instance is restarted,
the inmemory column store must be rebuilt from scratch, a process
referred to as inmemory populate. The process of populating the column
store can be CPU intensive since it must convert row major data into
compressed columnar data. With Oracle Database 12.2, the In-Memory
Fast-Start
mechanism can significantly reduce the total time required
for population by keeping a checkpointed copy of the column store on storage (using Secure File lobs preferably on a Flash-based tablespace). As a result when the instance is restarted, the checkpointed
copy can be read back into memory without requiring any further
transformation of the data. In-Memory Fast-Start can provide a 3x
reduction
in the time required to instantiate the column store.

b) In-Memory on Active Data Guard: With Oracle Database 12.2 on Exadata,
Supercluster or on Oracle Public Cloud PaaS offerings (e.g. DBaaS - Extreme Performance, Exadata Cloud Service,  Exadata Cloud Machine), it is now
possible to use the Active Data Guard standby for inmemory query
processing. In those environments it is possible to designate tables and
partitions to be populated into memory on the Active Data Guard
standby, resulting in orders of magnitude improvement for analytic
queries running on the standby database. In-Memory on Active Data Guard can significantly improve column store availability: When the same objects are  brought
into memory on both the primary site and the standby site, inmemory
queries can be run on either site regardless of temporary outages of
either site. This is effectively another form of in-memory fault-tolerance across the primary and the standby sites.  

3. Capacity: The following features in Oracle Database 12.2 help to increase the total effective inmemory columnar capacity, 

a) In-Memory on Active Data Guard: The same feature described in 2(b) above can also help to
increase the total columnar capacity of a deployment featuring a Primary
database and one or more Active Data Guard Standbys. This is because it
is possible to populate completely different objects into the column
store of a Standby database. For instance, it is possible to populate the
partition for the current month of orders in the Primary database, and
to populate the partitions for the prior one year of orders into the
memory of a Standby database. Queries on the one month partition must run on the Primary database, while queries on the last one year of partitions must run on the Standby database. This mechanism thus provides a much larger effective column
store across the two databases.

b) In-Memory on Exadata Flash Cache: The 12.2 release of Exadata cell
software features a format referred to CELLMEMORY, that is identical to
the format of INMEMORY  tables on DB nodes. Using the same inmemory
optimized algorithms to format and access data on much larger (10-100x
larger) Flash allows a much larger effective inmemory column store
capacity for the system as a whole. Very hot tables and partitions that
can fit in memory should still be populated into the In-Memory column
store on DB nodes, while tables of intermediate activity and very large tables
should be kept in the CELLMEMORY format on Storage nodes. As a result, scans offloaded to Exadata storage servers can benefit from the same inmemory optimizations as inmemory queries on the
DB nodes.

Note that this feature requires the 12.2 release of Exadata Storage Server software, but works with the 12.1 release of the database. Use of the 12.2 release of the database enables additional performance optimizations for offloaded CELLMEMORY scan processing. 

4. Ease of Use: Database In-Memory is a very easy to use inmemory
technology, requiring only that the user specify the INMEMORY attribute on
selected tables, partitions, and sub-partitions that are required for real-time
in-memory processing. Once an object is no longer required in the column store,
it can be marked NOINMEMORY in order to evict it in order to make room for more
important objects. Even so, further improvements were made in Oracle Database 12.2
so that system administrators can manage the column store even more seamlessly.

a) Dynamic In-Memory area: In Oracle Database 12.1, the size of the inmemory area  (specified by the parameter inmemory_size) is static and cannot be altered without restarting the instance. Oracle Database 12.2, the current size of the inmemory area can be dynamically increased without
requiring the instance to be restarted. This allows a DBA to accommodate
an increase in the total size of the objects that need to be populated
into the inmemory column store.

b) Automatic In-Memory Policies:  Oracle Database 12.1  introduced Automatic Data Optimization
to perform automated Information Lifecycle Management (ILM) on database
objects. Automatic Data Optimization allows the user to define custom
ILM storage-tiering policies, for instance, to change the compression
and storage properties of objects as accesses to the objects cool with
time.

With Oracle Database 12.2, the Automatic Data Optimization feature is
extended to allow automated policy based management of the inmemory
column store. It is now possible to set policies that:

  • Increase the inmemory compression level for a table as the table
    cools down. A  table that is heavily modified should be minimally
    compressed, a table that is heavily read and less frequently modified
    can be compressed optimally for query processing performance, and a
    table that is infrequently accessed can be compressed to maximize space
    savings.
  • Evict a table altogether from the column store after a certain
    number of days or after a certain period of inactivity. This is ideal
    for sliding window access patterns – for instance a near-term reporting
    application against data partitioned by week, but that operates only on
    the last four weeks of activity. Once data crosses the 1 month threshold
    it can be evicted from the column store in order to make room for new
    data. 

Using Automatic Data Optimization therefore, a system administrator
needs to perform the one-time task of setting up the inmemory ILM
policies after which the inmemory column store becomes self-managing.

In conclusion, Oracle Database 12.2 provides many improvements with
Database In-Memory for Performance, Availability, Improved Capacity, as
well as Ease of Use. We are looking forward to having our customers try
it out! 





Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.Captcha
Oracle

Integrated Cloud Applications & Platform Services