Oracle Text on Solid State Disks

We've known for some time that putting Oracle Text indexes onto flash-memory-based solid state disks (SSDs) gives some big advantages.

A while back, we tried benchmarking Oracle Text on Fusion I/O devices, which showed huge improvements. But those PCI-based devices are expensive, whereas consumer level SSDs are getting cheaper and cheaper. I decided to compare some Oracle Text queries on an inexpensive SSD against the same queries on a rotational disk.

Test Hardware:

Dell D6410 laptop
8GB RAM
2.4GHz Intel i5 CPU (2 cores, 4 threads)
Mushkin Chronos 120GB SSD
Seagate 750GB conventional disk (5400 rpm)

How I tested

First I created two Oracle users, "flash" and "nonflash". Then I created two tablespaces of 20GB each, one on the SSD and one on the conventional disk. I set the default tablespace for each user to the appropriate one.

Next I loaded five million rows of random text, totalling about 10GB.

For this I used a oci C program "randomtext_oci.c". This creates random length texts up to 4K in length, composed of random word lengths and random characters. The words are composed of a fairly small set of characters - the consonants BCDFG alternating with the vowels AEIO. Although this is not really representative of real text, it's better than completely random characters. In our 10GB of text, there are around 210,000 unique words, which is comparable to what you'd expect on a similarly-sized collection of clean text (by "clean", I mean very few spelling mistakes or made-up words).

This text was then copied from one user to the other (using CREATE TABLE mydocs AS SELECT * FROM flash.mydocs UNRECOVERABLE)
and each table indexed with a simple Oracle Text index:

create index mydocs_index on mydocs(text) indextype is ctxsys.context
parameters ('memory 500000000')
/

The index took around 40 minutes to create. It was a little quicker on the SSD, but not much.

I then ran several queries to check the performance with different numbers of hits returned.

Each query took the form "SELECT COUNT(*) FROM mydocs WHERE CONTAINS (text, '<searchterms>') > 0;

Using COUNT(*) means that we're testing only the cost of fetching rowids from the text index - if we fetched other data from the table we'd be measuring the cost of fetching base table blocks as well.

Before running each query, I rebooted the machine (thankfully quite quick as the OS is on SSD!) then pre-loaded the $R table using the LoadAllDollarR procedure as documented here. The $R table generally gets fully cached in memory within a short time, so we are mainly interested in the time needed to fetch the $I table info.  I also ran a query which fetched zero results, to make sure the Oracle Text data dictionary was loaded into cache.

Here are the queries, and the results I found:

 Query  No. of Hits
 Conventional Disk
 SSD  Difference
 bab  1,283,890  1.14 sec 0.53 sec
 2.2 x
 bad OR bac
  2,149,822  2.73 sec 0.90 sec
 3.0 x
 ba  3,478,811  5.44 sec 1.23 sec  4.4 x
 b  4,609,044  8.26 sec 1.95 sec  4.2 x

We can see this graphically (number of hits on the horizontal axis, number of seconds on the vertical axis)

Performance graph

Conclusions

Because Oracle Text queries require lots of random I/O to fetch "posting lists" from the index, we see big improvements from using solid state disks.  Although the advantages would be less marked if we compared against faster rotational disks (such as 15K rpm SAS disks), we would still most likely see a significant improvement.  And the SSD used here was a very inexpensive model (at around US $1 per GB), so we might expect to see even better improvements using a more advanced professional device, or by using a PCI-based solution rather than a SATA device.



Comments:

would it make sence to say that oracle db might run faster if it was on a SSD?

Posted by Neo on January 09, 2014 at 05:06 PM GMT #

All aspects of Oracle would tend to benefit from SSD, but Oracle Text is particular sensitive to the effects of seek time. Flash cache, of course, provides similar benefits but the data has to get into the cache at some point.

Posted by Roger Ford on January 09, 2014 at 05:38 PM GMT #

Post a Comment:
  • HTML Syntax: NOT allowed
About

bocadmin_ww

Search

Categories
Archives
« April 2014
MonTueWedThuFriSatSun
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
    
       
Today