Thursday Apr 17, 2014

Speed, Simplicity, Saving. ZFS Storage Appliance

Your business can run faster and your IT operations can be more streamlined with the right storage.
Get the facts about Oracle’s latest generation ZFS Storage Appliance, the ZS3 Series.

See the infographic video here:

Get better performances with Oracle ZFS Storage Appliance: ask us for further informations!

Tuesday Sep 03, 2013

Optimized Storage for DB Workloads (part 2)

In a previous issue of this blog, we have analyzed why the Oracle Sun ZFS Storage Appliance is an excellent storage system to host an enterprise database. Because of its unique controller architecture based on the Hybrid Storage Pool (HSP), it guarantees best in class performance and ease of integration. In this second part of the blog, we are going to address another critical requirement in today’s Enterprise database implementations: compression.

Best in class compression capabilities is a must if we consider the current growth of structured data in the IT centers. While traditional compression algorithms typically found in disk subsystem controllers can do a good job, a more specialized compression mechanism aligned with the database structure can go much further in terms of data reduction.

At the top of that, compression is not just a matter of saving space in a database environment:

  • It also has a great side effect when properly implemented: it reduces IO traffic between storage and the database node. The point is that disk drive capacity and IO capability are quite unbalanced. We can store massive amounts of data in today’s HDD, but the IO capabilities of one given disk have remained basically unchanged in the recent past. At the same time, the more data we store, the more IOs we need from a given disk. Therefore, it is often more important to pay attention at decreasing the IOs from a storage system rather than increasing its capacity.
  • It increases efficiency not only in terms of data reduction on disk, but also for the backup tasks, Disaster Recovery replication, memory usage (SGA) and ARC/L2ARC caches when using the Sun ZFS Storage Appliance. On that aspect, it goes far beyond what a traditional compression or deduplication approach at disk controller level can achieve.

Compression has always been an important aspect of an Oracle database environment. The table below summarizes the different types of compression which are available in an Oracle database and provides for each of them:

  • the database level needed to use the compression capability;
  • what is required in terms of storage or database option to use it;
  • what type of compression it operates;
  • what type of operation the compression applies to.

Compression DB Level   Requirements Type of Compression  DB Operations 
Index Compression 8i 

  • All types of storage
  • Part of the core product, no specific option required 

Row compression on indexes through deduplication of the index leading columns Compression on indexes when created or rebuild an index to compress it.
Basic Table Compression 9i
  • All types of storage
  • Part of the core product, no specific option required
Row compression on tables through deduplication of repeated values Only applies to bulk operations, i.e. compression is not maintained during DML operations such as INSERT, UPDATE and DELETE, so basic compression should be reserved for data which is mostly read-only.
Advanced Row Compression** 11gR1
  • All types of storage
  • Requires the separately licensed ACO*
Row compression at block-level through deduplication of repeated values Compression is maintained during DML operations (INSERT and UPDATE) with some restrictions (compression is triggered when new records reach a threshold).
RMAN Backup Compression 10g
  • All types of storage
  • Requires the separately licensed ACO*
Compression levels are BASIC (BZIP2), LOW (LZO), MEDIUM (ZLIB) and HIGH (BZIP2) Backup data is compressed before being written to disk or tape and read back directly in compressed format (only for backup sets).
SecureFile Compression and Deduplication 11gR1
  • All types of storage
  • Requires the separately licensed ACO*
Compression is based on industry standard compression algorithms Deduplication is performed first on SecureFile data, then compression and finally encryption if required.
Data Pump Export Compression 11gR1
  • All types of storage
  • Requires the separately licensed ACO*
Data Pump compression provides comparable reduction to standard gzip algorithms Inline compression during export and import of dump files.
Data Guard Redo Transport Compression 11gR1
  • All types of storage
  • Requires the separately licensed ACO*
Data Guard Redo Transport Compression uses the zlib compression engine at level 1 (equivalent to gzip -1 level) Data Guard Redo data is transmitted in a compressed format to reduce network bandwidth and improve performance.
Hybrid Columnar Compression 11gR2
  • Only available on Exadata, ZFS Storage Appliance and Pillar
  • No license required
Hybrid row/column compression that comes in 4 flavors: Query Low (LZO) and Query High (ZLIB) for warehouse compression, Archive Low (ZLIB) and Archive High (BZIP2) for archive compression Hybrid Columnar Compression is used on bulk operations. When DML is performed on HCC compressed data, new or modified rows are stored uncompressed or possibly compressed by OLTP Compression if the option is available. HCC is intended for dormant or possibly near-dormant data.

Advanced Compression Option (ACO) in Oracle Database Enterprise Edition. We do not mention in the table the following ACO options: Flashback Data Archive (11g), Heat Map (12c), Automatic Data Optimization (12c), Advanced Network Compression (12c), Storage Snapshot Optimization (12c) and Online Move Partition (12c).

** Starting with Oracle Database 12c, OLTP Table Compression is called Advanced Row Compression.

The Advanced Compression Option in the recently announced Oracle 12c brings several new compression features including Heat Map and Automatic Data Optimization (ADO). Heat Map is a new Oracle Database feature that collects usage information at the block and segment levels, enabling further automated compression and movement of data which becomes less accessed and used over time. Automatic Data Optimization is the policy engine that performs compression and storage tiering of table partitions according to the data maintained in the Heat Map. Compression can be any type of compression supported by the Oracle Database including Advanced Row Compression (formerly called OLTP Table Compression) and all levels of Hybrid Columnar Compression (HCC).

The Advanced Row Compression introduced in Oracle DB 9iR2 was a great improvement compared to the previous Basic Table Compression as it supports conventional DML (Data Manipulation Language) operations such as INSERT and UPDATE. Furthermore, the database does not require data to be uncompressed and keeps data compressed in memory. This means that performance can be improved on reads since fewer IOs are performed while memory can be more efficiently used. It must be noted that Advanced Compression Options (ACO) regroups several different types of compression: Advanced Row Compression, RMAN Backup Compression, SecureFile Compression, Data Pump Export Compression and Data Guard Redo Transport Compression (as well as several new options introduced in Oracle Database 12c).

Hybrid Columnar Compression (HCC) goes to the next level in terms of compression ratios compared to the other compression techniques. Typically, you will get a compression ratio between 1:5 and 1:20 as opposed to 1:2 and 1:4 for Basic Table Compression and Advanced Row Compression. HCC can be used with two compression modes: Query and Archive. The Query compression mode will provide faster decompression for data which is relatively frequently accessed but a lower compression ratio. On the other hand, the Archive compression mode will provide slower decompression for data which is not often accessed but a very significantly higher compression ratio.

Let’s now have a deeper look at how Hybrid Columnar Compression works and why it can bring so much value for compressing warehouse and archive data on Oracle’s storage.
Imagine that we have a 7-row and 6-column table as shown on the left and that we want to compress it with HCC. The first thing to understand is how such a table would be stored in an Oracle database without any compression applied. The Oracle database splits table data into “blocks”, each block containing data in the row format, i.e. the first row is sequentially written into the block, then the second one is appended and so on. If all the rows do not fit into one block, no problem, a second block is simply chained to the first one. So our table is going to be stored in the Oracle database as shown below (we ignore here the details about the header blocks and other subtle implementation tricks):

Such an arrangement of data in the database blocks is in most cases very effective when one wants to read for instance a complete row since the column values can be sequentially read for that particular row. However, it does not land itself well to traditional industry standard compression algorithms such as LZO and ZLIB since data with similar types and values are most of the time sorted in columns and are now dispersed within the blocks as show below:

This is where HCC is coming into play… In order to perform an effective compression without losing the benefit of having the rows “locally stored” within the blocks, we are going to store the data in a columnar format, but only for a subset of rows and not the whole table. So let’s say that, for the sake of our example, we define a set of 3 rows, that we call a Compression Unit (as shown on the right). For that Compression Unit, we reorganize the data in the “columnar” way, i.e. we group first the data by columns then we move along the rows. Here is what our compression unit looks like after the data reorganization process:

In reality, a Compression Unit would include thousands or tens of thousands of lines, but for the sake of our example, we are going to limit ourselves to three… Now, before storing the Compression Unit data into the database blocks, we are going to compress it. Guess what, we are now in a much better position to achieve a good compression ratio since similar data from the same columns now sit next to each other, at least for the rows that constitute a Compression Unit. After compression using a traditional “zip-like” algorithm, here is what we are going to get:

Our Compression Unit has dramatically shrunk and is now ready to be stored into the database blocks as shown below. If the compression unit does not fit in one block, it is simply split on a second block and the blocks are chained together.

In order to understand how DELETE, UPDATE and INSERT commands are managed with HCC compressed format, let’s first see in more details how a compression structure is stored within blocks:

The Header stores the offsets and lengths of all the column entries within the Compression Unit while the Bitmap is in charge of identifying the table rows which are stored by the Compression Unit (one can see the Bitmap as a “map” that gives the status of each of the row and column of the table zone which is covered by the Compression Unit). So here is how the data is actually stored into the blocks (we also represent here the Block Headers but not the Row Directories nor the Row Headers):

Let’s say now that a table row is actually deleted through a DELETE DML statement. Since the data corresponding to that row is deeply mingled with the other rows into the Compression Unit, there is no way we can free that space into the blocks. So all what we do is to mark the row data as deleted into the Compression Unit Bitmap as shown below with the small red squares into the Bitmap:

When compressed with HCC, a deleted entry is actually never removed from the table stored on the disk. It is simply marked as invalid. The only way to get rid of the data which is not more valid is to copy the table through a database command.

If a row is updated or inserted respectively through an UPDATE or INSERT statement, then the new row is simply added as a row (and not as an HCC hybrid structure) into the free space of the last block that constitutes the Compression Unit. No compression is applied to this modified or new row, except if the Advanced Row Compression option is set for the particular table or partition the row belongs to (but this is not an HCC compression which is used here). The figure below depicts the case of a row update where the old data is marked as invalid into the Compression Unit bitmap and where the new row is added in an uncompressed format into the free space of the second block (if there is not enough free space into that block, another block is simply chained to it):

Now that we understand how DML operations are performed in an HCC compressed table or partition, it becomes clear that we should not perform too many changes on that type of table and that HCC must only be used for dormant or near-dormant data such as Data Warehouse and Database Archive environments. Despite this limitation, in a vast majority of cases, the compression ratios achieved by HCC will be much better than what can be achieved through Advanced Row Compression.

Let’s now have a look at the different HCC options that can be used:

HCC Option Target Environment Typical Compression Ratio Objective
QUERY LOW Warehouse Compression Between 1:5 and 1:10 Optimized for speed where load time is important
QUERY HIGH Warehouse Compression Between 1:10 and 1:15 Optimized for speed on data access
ARCHIVE LOW Online Archive Compression Between 1:10 and 1:20 Optimized for highly compressed historical and archive data
ARCHIVE HIGH Online Archive Compression Between 1:15 and 1:50 Optimized for maximum storage reduction for data which is lightly accessed

At the top of storage reduction, HCC often brings an additional value in terms of performance for database loads which can be limited by an IO bottleneck between the database server and the storage device. In the case of database tables stored in a ZFS Storage Appliance and HCC compressed, all the queries to the disk subsystem are returned in the HCC compressed format, resulting in a much faster transmission of the data and greatly improved query performances. This is especially true when the ZFS Storage Appliance unit is connected to the database node through a low latency Infiniband network, which represents quite a unique capability of the ZFS Storage Appliance on the database storage market.

The figure below summarizes the benefits of handling HCC compressed tables:

  1. Data remains compressed during transfers over the network;
  2. Data remains compressed in the database SGA Buffer Cache;
  3. Data remains compressed in the Smart Flash Cache of the database server;
  4. Data stays compressed when handled by RMAN for backup to disk or tape;
  5. Data remains compressed when shipping logs to the Data Guard node over the network.

It is very clear that using a traditional disk array based compression does not bring any of the advantages mentioned above, and this makes HCC an extremely powerful feature for storing efficiently Warehouse and Archive tables or partitions. When combined with the unique capabilities of the Oracle Sun ZFS Storage Appliance (IO bandwidth, throughput, snapshot and cloning, connectivity and observability), Hybrid Columnar Compression represents an unmatched tool to reduce infrastructure costs and improve performance.

Monday Mar 18, 2013

Optimized Storage for DB Workloads

Today’s Enterprise database workloads require storage devices that must embed a variety of features so that the optimal level of performance, manageability and operational agility can be achieved. Even if most Enterprise Database products have the ability to rely on their own file system to optimize access to storage, there is much more to do on the storage side in order to bring the database workloads at the level that is expected from today’s demanding applications and users.

A possible approach to this problem is to fully integrate the storage devices into an engineered stack, which includes all the components required to build a high performance database system (software, servers, network and storage). This is exactly what Oracle does with the Exadata Engineered System which embeds dedicated and highly optimized storage nodes that actually take in charge a part of the database workload in a way that would not be possible with a standalone storage device sitting next to the database engine. As a result, the performance obtained cannot be beaten by a traditional architecture.

However, one size does not fit all, and there is still a need to deliver storage systems that can be used as primary storage for databases while guaranteeing best in class performance and ease of integration. There are several requirements that must be satisfied in order to achieve that goal:

  • The Storage System must be able to meet the high IO requirements of the database in the most efficient manner;
  • High throughput must also be guaranteed for sequential access to the database as well as data protection tasks;
  • Best in class compression capabilities is a must if we consider the current growth of structured data in the IT centers. While traditional compression algorithms typically found in disk subsystem controllers can do a good job, a more specialized compression mechanism aligned with the database structure can go much further in terms of data reduction.
  • Snapshot and cloning at the storage level are absolutely mandatory features these days so that data protection, disaster recovery and database quick duplication tasks can be expedited as efficiently as possible;
  • A versatile connectivity must be provided by the storage system in order to guarantee a quick and easy integration in various production environments. At the top of that, protocols that have the ability to transport specific database metadata to and from the storage can benefit from an additional performance increase compared to non-optimized storage devices;
  • Finally, the “observability” of the storage system, i.e. the ability an administrator has to monitor and diagnose the storage device, becomes of critical importance if we consider the consolidation trend in the data center that make the management of the database storage system always more complex and resource consuming.

Oracle’s Sun ZFS Storage Appliance is a storage subsystem that brings very convincing answers to all these requirements. Let’s detail step by step why the ZFS-SA constitutes an excellent device to host the data of an Oracle Enterprise Database.

An enterprise database workload actually requires very different IO profiles in order to provide a consistent performance for all the different data exchanges that take place between the database engine and the storage. The table below summarizes the different data types in a database and their IO profile in terms of IO throughput (IOPS), IO bandwidth (MB/sec), access type (random, sequential or mixed) and IO bias (read, write or mixed).

Data Type IO Throughput IO Bandwidth Access Type IO Bias 
Control Files High  Moderate  Mixed  Mixed 
Database Index Very High  Moderate to High  Mixed  Mixed 
Database Tables Moderate  Possibly High  Mixed  Mixed 
Temporary Files Moderate Medium Write  Write 
Redo Log Files Quite High High Sequential  Write 
Archive Log Files Moderate High  Sequential  Write 

It is clear that it is a challenging task for a given storage subsystem to provide optimal performances for all these different profiles at the same time. Let’s now try to see how the ZFS-SA’s controller architecture would behave when having to cope with two typical database workloads, OLTP and Datawarehouse.

In an OLTP environment, we can expect many random reads which are repeated on subsets of data (within index and hot tables) as well as a bunch of synchronous writes (for table updates) that need to be quickly acknowledged. At the top of that, the redo logs might experience bursts of activity that need to be efficiently processed as sequential writes by the storage subsystem. The ZFS Storage Appliance architecture brings optimal answers to these concurrent requirements. This is in large part due to the Hybrid Storage Pool (HSP), a read and write caching mechanism based on a mix of DRAM, read-oriented SSD, write-oriented SSD and HDD which optimize data placement and retrieval. The HSP dynamically and intelligently caches active data over multiple tiers of storage media, without the need for administrator intervention. As a result, the HSP provides unprecedented caching benefits for both reads and writes without the traditional need for lots of Hard Disk Drive (HDD) spindles demonstrated by traditional vendors.

This diagram on the left shows the HSP in action with in blue the read activity, in red the update write activity and in orange the redo log writes. Assuming a large DRAM is available, we expect that many reads will result in cache hits at ARC level. ARC (ZFS Adaptive Replacement Cache) is the first level of read cache sitting in DRAM, and it must be maximized for high OLTP workloads in order to provide exceptional performance. We typically see 80% cache hits in ARC in normal workloads and even on benchmarks designed to be cache unfriendly, we still see 70% on average.

If data is not present in cache, it can be in the second level cache (L2ARC) on read-oriented SSD called ReadZillas.

For some OLTP workloads, it might be a good idea not to use ReadZilla and leave all DRAM for the first level cache (since using ReadZilla will consume some of the RAM to manage the second level cache). OLTP update writes on the ZFS Storage Appliance are always synchronous writes, which means that they must be stored in non-volatile memory before being acknowledged. This is what the ZIL (ZFS Intent Log) can achieve very efficiently when it is set to reside on write-oriented SSD – ZIL on SSD is called LogZilla, and it is actually more than a simple SSD. It contains a DRAM buffer, which is secured by a Super Capacitor, as well as a controller and SLC Flash memory. For that type of write, the share holding the database table spaces in the ZFS-SA must have the Synchronous Write Bias parameter set to “latency” in order to guarantee that the ZIL is resides in the LogZilla SSD.

As show on the diagram, writes are performed simultaneously in the ARC and ZIL (step #1 in the diagram), and this is only when the write is securely performed into the ZIL (at the speed of the SSD flash memory) that the acknowledgement is returned to the application. In a second time, bunches of writes are flushed to the HDD pool on a regular basis. This is marked as step #2 in the diagram. The net result is that the database update writes can be very efficiently performed since they do not have to go to rotating disks before being acknowledged, resulting in a spectacular performance improvement without the risk of a non-committed write in DRAM cache. For OLTP workloads with important update activity, it is therefore advised to maximize the LogZilla size.

Then comes the question of the Redo Logs which is a totally different beast… Redo Logs need sequential writes with a high IO bandwidth, and on that aspect this is a totally different IO profile compared to the update writes we have just reviewed. Fortunately, the ZFS-SA brings an elegant and easy solution: all what we have to do is to store the Redo Logs on a share which has the Synchronous Write Bias parameter set to “throughput”. This will result in the ZIL to reside directly in the HDD pool, and not on SSD, bringing the throughput that we need to achieve efficient Redo Log handling. It must be noted that with our latest generation of SSD, we observe excellent performance for both random and sequential IO profiles, and, in many cases, it might not even be necessary to set the ZIL on HDD to achieve the required bandwidth.

Let’s see now how the ZFS-SA Hybrid Storage Pool would behave with a Datawarehouse workload where we can expect large sequential table reads as well as bulk loads on entire table partitions.

The diagram on the left shows in blue the read activity, in red the bulk load write activity and in orange the redo log writes. It is clear that in the case of table scans, level 2 read caching is not going to be useful since data is rarely accessed twice. However, level 1 caching in the DRAM ARC continues to be important since the ZFS-SA is able to perform “prefetching” of the data on disk, resulting in optimized reads on table rows during scans. Reads are also optimized when HCC compression is activated because of the decrease in IO activity it implies on the HDD pool side.

These two optimizations make the ZFS-SA an efficient device for a typical table scan workload which is extremely common in a Datawarehouse environment.

Let’s review now how the writes are going to be handled in a typical Datawarehouse workload. The bulk load writes are still performed as synchronous writes, and in that case there is no interest in having the ZIL on SSD since we are going to require a high throughput in MB/sec rather than a high IO bandwidth in IO/sec. All what we have to do is to set the Synchronous Write Bias parameter to “throughput” for both shares supporting the data files and the Redo Logs in order to guarantee the best performance during bulk loads. All the writes (intent log, data flushed from ARC and Redo Logs) will therefore be performed directly in the HDD pool. It must be noted that for the ZFS-SA, sequential writes on rotating disks in the HDD pool are always faster than sequential reads on HDD since a write will always use the closest block while a read will follow the block allocation from the copy-on-write mechanism.

Regarding RAID levels, the ZFS-SA offers a wide range of possibilities: mirrored, triple mirrored, striped, single parity RAID with narrow stripes, double parity RAID and triple parity RAID wide stripes. For a demanding OLTP or Datawarehouse workload, it is always advised to use mirrored or triple mirrored storage profiles in order to maximize performance and availability.

More to come in a further blog...