Oracle Text on Solid State Disks
By Roger Ford (Oracle) on Apr 18, 2012
We've known for some time that putting Oracle Text indexes onto flash-memory-based solid state disks (SSDs) gives some big advantages.
A while back, we benchmarked Oracle Text on Fusion-io devices, which showed huge improvements. But those PCI-based devices are expensive, whereas consumer-level SSDs are getting cheaper and cheaper. So I decided to compare some Oracle Text queries on an inexpensive SSD against the same queries on a rotational disk.
Test hardware:

- Dell D6410 laptop
- 2.4GHz Intel i5 CPU (2 cores, 4 threads)
- Mushkin Chronos 120GB SSD
- Seagate 750GB conventional disk (5400 rpm)
How I tested
First I created two Oracle users, "flash" and "nonflash". Then I created two tablespaces of 20GB each, one on the SSD and one on the conventional disk. I set the default tablespace for each user to the appropriate one.
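In outline, the setup looked something like this (a sketch only: the datafile paths, passwords, and quota/grant choices are illustrative assumptions, not taken from the actual setup):

```sql
-- One 20GB tablespace on each device (paths are illustrative)
create tablespace flash_ts    datafile '/ssd/oradata/flash_ts.dbf'     size 20g;
create tablespace nonflash_ts datafile '/disk/oradata/nonflash_ts.dbf' size 20g;

-- One user per tablespace, defaulting to the appropriate device
create user flash identified by flash
  default tablespace flash_ts quota unlimited on flash_ts;
create user nonflash identified by nonflash
  default tablespace nonflash_ts quota unlimited on nonflash_ts;

-- CTXAPP is the standard role for users creating Oracle Text indexes
grant connect, resource, ctxapp to flash, nonflash;
```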
Next I loaded five million rows of random text, totalling about 10GB.
For this I used an OCI C program, "randomtext_oci.c". It creates texts of random length, up to 4K characters, composed of words of random lengths and random characters. The words are built from a fairly small set of characters: the consonants BCDFG alternating with the vowels AEIO. Although this is not really representative of real text, it's better than completely random characters. In our 10GB of text there are around 210,000 unique words, which is comparable to what you'd expect in a similarly-sized collection of clean text (by "clean" I mean very few spelling mistakes or made-up words).
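The OCI program itself isn't reproduced here, but the word-generation scheme it uses can be sketched in a few lines of Python (a hypothetical re-implementation of the idea, not the original randomtext_oci.c):

```python
import random

CONSONANTS = "BCDFG"
VOWELS = "AEIO"

def random_word(length):
    """Build a word of the given length, alternating consonants and vowels."""
    letters = []
    for i in range(length):
        pool = CONSONANTS if i % 2 == 0 else VOWELS
        letters.append(random.choice(pool))
    return "".join(letters)

def random_text(max_len=4096):
    """Build a random-length text of random-length words, up to max_len chars."""
    words = []
    size = 0
    target = random.randint(1, max_len)
    while size < target:
        w = random_word(random.randint(1, 12))
        if size + len(w) + 1 > max_len:
            break
        words.append(w)
        size += len(w) + 1  # word plus separating space
    return " ".join(words)
```

With a small alphabet and short words like this, the number of distinct possible words is limited, which is what keeps the vocabulary down to a realistic couple of hundred thousand unique terms across 10GB.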
This text was then copied from one user to the other (using CREATE TABLE mydocs AS SELECT * FROM flash.mydocs UNRECOVERABLE), and each table was indexed with a simple Oracle Text index:
```sql
create index mydocs_index on mydocs (text) indextype is ctxsys.context
  parameters ('memory 500000000');
```
The index took around 40 minutes to create. It was a little quicker on the SSD, but not much.
I then ran several queries to check the performance with different numbers of hits returned.
Each query took the form "SELECT COUNT(*) FROM mydocs WHERE CONTAINS (text, '&lt;searchterms&gt;') > 0".
Using COUNT(*) means that we're testing only the cost of fetching rowids from the text index - if we fetched other data from the table we'd be measuring the cost of fetching base table blocks as well.
Before running each query, I rebooted the machine (thankfully quite quick as the OS is on SSD!) then pre-loaded the $R table using the LoadAllDollarR procedure as documented here. The $R table generally gets fully cached in memory within a short time, so we are mainly interested in the time needed to fetch the $I table info. I also ran a query which fetched zero results, to make sure the Oracle Text data dictionary was loaded into cache.
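The LoadAllDollarR procedure isn't reproduced here, but a rough equivalent can be sketched in PL/SQL. It assumes the default $R table naming for a CONTEXT index (DR$&lt;index name&gt;$R) and simply reads every BLOB in that table so its blocks end up in the buffer cache:

```sql
-- Sketch only: pre-touch every $R BLOB for mydocs_index so it gets cached.
-- Assumes the default DR$MYDOCS_INDEX$R name for the index's $R table.
declare
  v_amount integer;
  v_buffer raw(32767);
  v_offset integer;
begin
  for rec in (select data from dr$mydocs_index$r) loop
    v_offset := 1;
    loop
      v_amount := 32767;
      begin
        dbms_lob.read(rec.data, v_amount, v_offset, v_buffer);
      exception
        when no_data_found then exit;  -- reached the end of this BLOB
      end;
      v_offset := v_offset + v_amount;
    end loop;
  end loop;
end;
/
```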
Here are the queries, and the results I found:
| Query | No. of Hits | Conventional Disk | SSD | Improvement |
|---|---|---|---|---|
| bab | 1,283,890 | 1.14 sec | 0.53 sec | 2.2 x |
| bad OR bac | 2,149,822 | 2.73 sec | 0.90 sec | 3.0 x |
| ba | 3,478,811 | 5.44 sec | 1.23 sec | 4.4 x |
| b | 4,609,044 | 8.26 sec | 1.95 sec | 4.2 x |
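The improvement column is simply the ratio of the two timings, which can be checked in a couple of lines:

```python
# Query times from the table above, in seconds
conventional = [1.14, 2.73, 5.44, 8.26]
ssd = [0.53, 0.90, 1.23, 1.95]

# Speed-up of the SSD over the conventional disk, to one decimal place
ratios = [round(c / s, 1) for c, s in zip(conventional, ssd)]
print(ratios)
```

Note how the relative advantage grows with the size of the result set: the more posting-list data the query has to fetch, the more the SSD's random-read performance pays off.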
We can see this graphically (number of hits on the horizontal axis, query time in seconds on the vertical axis).
Because Oracle Text queries require lots of random I/O to fetch "posting lists" from the index, we see big improvements from using solid state disks. Although the advantages would be less marked if we compared against faster rotational disks (such as 15K rpm SAS disks), we would still most likely see a significant improvement. And the SSD used here was a very inexpensive model (at around US $1 per GB), so we might expect to see even better improvements using a more advanced professional device, or by using a PCI-based solution rather than a SATA device.