Using Sun Storage F5100 Flash Array together with Sun Storage J4400 to Improve Query Performance for Data Warehousing Workload
By Chang Shu on Oct 12, 2009
Data warehousing applications are generally I/O bound in nature. And because there are more reads than writes, this kind of workload should be able to take more advantage of the F5100 Flash Array as flash devices generally favor reads over writes.
This write-up presents the results of a database study that I did recently to compare the performance of storage architectures taking advantage of Sun Storage F5100 Flash Array with storage architecture based only on traditional hard disks.
The testing environment was built using a Sun Fire X4275 server with two quad-core Intel Xeon X5570 processors and 72 GB of RAM to host the DB2 database server. The Sun Fire X4275 server runs the Solaris 10 OS (Update 8) and the DB2 UDB V9.5 Fixpak 4. Four SAS HBA cards installed in the database server to support the access to four array domains of the F5100 Flash Array, and up to 960 GB of raw storage (40 Sun Flash Modules). Host mirroring across HBAs and domains can be used to increase data availability, which reduces the usable capacity to 480 GB. The second SAS channel of two HBAs are connected to a Sun Storage J4400, which contains 24 x 146 GB disk drives. Two Storage I/O Modules (SIM 0 and SIM 1) are used with the J4400 to support dual-path. A Sun Fire X4200 is used as the workload driver with two duo-core AMD Opteron processors and 16 GB RAM.
The deployment for the testing is shown as the following diagram.
Due to time limitation, the testing was performed against a small data warehouse. The size of the table spaces is around 200GB. Access to index table spaces and user temporary table space on the Sun Storage F5100 Flash Array is through four SAS HBAs. Access to the production data table spaces and log containers are through 24-disk based RAID 0 volume in J4400. The page size is 32KB. A UFS file system is created for each volume and mounted with the “forcedirectio” option.
Both atomic queries such as sorts / joins, and advanced queries containing complex business logic were both tested in order to compare the performance gains of using F5100 Flash Array in both cases. I/O Per Second (IOPS) and I/O bandwidth are the main performance metrics for the measurement.
Sorts / Joins
DB2 use temporary table space to store data during its execution in a transient or temporary work table. For e.g., when doing sorts and joins, if the the size of the sort heap can not fit in all temporary data, DB2 overflows the data to the temporary table space.
Three joins including nested loop join, merge join and hash join were perform during the test process. The I/O activity for disk-only test is as the follows:
The average disks usage is measured for each individual disk in the J4400, the usage for the whole volume is actually between 48-100%.
In Comparison, the I/O activity for flash / disk system is as below:
putting the temporary table space in the F5100 Flash Array, the
IOPS is increased from 5629 to 12983, which is 3.2 times improvement.
These queries contain more advanced business logic, mixed with both sequential reads and random reads. Besides data table spaces, temporary table spaces and index table spaces are also heavily accessed during the queries execution.
The I/O activity for disk-only test is as the follows:
The I/O activity for the flash / disk system is as below:
The IOPS improves 77% to 11345 compared to 6416 in the disk-only system. The queries execution time improves 10.8 times to 180 seconds compared to 1952 seconds in the disk-only system.
Adding a Sun Storage F5100 Flash Array to off-load index and temporary table spaces I/O processing from a disk system to flash-based storage can improve system I/O performance because index and temporary table spaces I/O have been moved to the flash device so there is less workload for the disk to support, and the spinning disk can get the remaining work done more quickly.
The above testing shows how the combination
of flash and disk technology can be applied to improve IBM DB2 UDB
database performance for data warehousing workload. In the experiments,
indexes and temporary tablespaces are stored on the F5100 Flash Array,
which improves the overall performance by reducing the application response
time and increasing the I/O throughput. Hybrid flash/disk technology is becoming a practical solution for database applications.