Effects of Flash/SSD on PostgreSQL - PostgreSQL East 2009

I presented a talk at PostgreSQL East 2009 at Drexel University this morning. The topic was "Effects of Flash/SSDs on PostgreSQL". Here are the slides from the presentation.

If you have questions, please leave comments.

Comments:

If I read this right, you ended up testing both HDD and SSD with write caching enabled. Wouldn't it be more sensible to compare them both with write caching disabled? Is it somehow safe to run a database system on an SSD with write caching? The best practice so far has been to disable write caching on HDDs, and I don't see anything that would have changed that with SSDs.

Posted by Peter Eisentraut on April 05, 2009 at 10:46 PM EDT #

Well, I and CELKO began newsgroup chatting about SSD quite a while ago (and he discussed SSD in "Thinking in Sets"). That led me to finally start a blog, specifically to discuss why SSD makes sense. Unfortunately, this paper doesn't get why SSD makes sense. It merely drops an existing 0NF (I suspect) on SSD. Not much sense to that. To make sense, as I have discussed, one has to strip out the redundant data (go to 5NF, which can be an order of magnitude or more of deflation) as part of the migration to SSD. Then you get not only speed, but logical consistency you didn't have before.

Maybe it's time for Sun or Postgres to pay me to do that full test I have been wanting to do. :)

Posted by Robert Young on April 06, 2009 at 05:10 AM EDT #

Peter,

One of the things I heard back was that the write cache on SSDs is really needed for write wear leveling. Early-generation SSDs do not seem to have battery-backed write cache RAM, but the industry seems likely to solve that problem in the next generation of SSDs. In the meantime, please check with your SSD manufacturer whether turning off the write cache is supported. Otherwise the best option is to use mirroring for SSDs.

Hope this helps.

Posted by Jignesh Shah on April 08, 2009 at 05:09 PM EDT #

Hey!

Do you think the performance could be improved in a hybrid setup?
* Storing WALs on a regular hard disk (more capacity, mostly sequential accesses)
* Storing tablespaces on the SSD (less capacity, mostly random accesses)

Sounds like it would make sense.

Marti

Posted by Marti Raudsepp on April 10, 2009 at 02:40 AM EDT #

Hi Marti,

If your workload has more of a read bottleneck than a write bottleneck, then yes, your suggestion of putting the WAL on regular disk and the tablespaces on SSDs will help. Though you might want to make sure that your SSD is big enough to hold the data that is causing the read latency.

Posted by Jignesh Shah on April 10, 2009 at 02:59 AM EDT #
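
As a rough illustration of the hybrid layout discussed in this exchange, the sketch below builds the SQL that would place selected read-heavy tables on an SSD-backed tablespace. The tablespace name `ssd_space`, the directory `/ssd/pgdata`, and the table names are hypothetical placeholders; `CREATE TABLESPACE ... LOCATION` and `ALTER TABLE ... SET TABLESPACE` are standard PostgreSQL statements. Relocating the WAL itself is not covered here.

```python
def hybrid_layout_sql(tablespace, ssd_dir, hot_tables):
    """Return SQL statements that move read-heavy tables onto an SSD tablespace.

    All names passed in are illustrative; nothing here talks to a database.
    """
    # The tablespace must point at a directory on the SSD that the
    # PostgreSQL server user owns.
    statements = [f"CREATE TABLESPACE {tablespace} LOCATION '{ssd_dir}';"]
    for table in hot_tables:
        # Each ALTER TABLE rewrites the table into the new tablespace.
        statements.append(f"ALTER TABLE {table} SET TABLESPACE {tablespace};")
    return statements


if __name__ == "__main__":
    for stmt in hybrid_layout_sql("ssd_space", "/ssd/pgdata", ["orders", "order_items"]):
        print(stmt)
```

The statements are only generated, not executed, so they can be reviewed before being run with a client such as `psql`.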

Jignesh,

How does one actually "disable" write caching on the drives through ZFS? I thought by default cache flush is enabled which tells the drives to flush cache at each write, this is the equivalent of "disabling write cache" no? And by disabling cache flushing we are in effect "enabling" write caching on target devices? Just confused with the terminology thrown around.

Also, I've tested the SSDs with cache flush on or off, with flushing off there was maybe a 5% differential with sync writes...I'm assuming like what you heard this is due to the write cache actually being used for write leveling with very little of it actually being used for write caching...I couldn't find what the cache size is, but some blogs mentioned the Intel X25-Es do have 64MB of volatile write cache. Is this true? There are also rumors that the cap on the X25-Es will have enough to flush whatever it has to disk, have you seen/heard anything like that?

In our environment we only use SSDs for ZIL offload, which is working great now that some of the firmware/driver issues have been worked out with OpenSolaris and the LSI 1068E rev B2 based StorageTek HBA.

Thanks for the information :)

Posted by Robert K on June 17, 2009 at 11:04 AM EDT #
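
For readers who want to repeat the kind of sync-write comparison described in the comment above, here is a minimal sketch that times small writes, each followed by `fsync`. It is not the benchmark used in the talk or by the commenter. The function name, the write count, and the 8 KB block size (chosen to match PostgreSQL's default page size) are assumptions, and whether `fsync` actually reaches non-volatile media depends on the OS, the filesystem, and the drive's cache settings.

```python
import os
import tempfile
import time


def sync_writes_per_second(directory, writes=200, block_size=8192):
    """Time small writes that are each followed by fsync, similar to commits."""
    block = b"\0" * block_size
    # Create the scratch file on the device under test.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            os.write(fd, block)
            os.fsync(fd)  # ask the OS to push this write to the device
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return writes / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    rate = sync_writes_per_second(tempfile.gettempdir())
    print(f"{rate:.0f} synced writes/s")
```

Running it once with the drive's write cache (or cache flushing) enabled and once with it disabled, pointing `directory` at a filesystem on the SSD, gives a rough view of the differential being discussed.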

Intel X25-E have 256KB (yes, kilobytes) of cache.

Posted by Bao604 on August 03, 2009 at 11:55 AM EDT #

Jignesh,

Thank you very much for publishing this. It was very interesting and I hope you are going to publish more stuff like that.

One thing I'd be interested to add in your test: PCI-E SSD (like Fusion-IO). The IO is supposed to be more than 20x higher than on a X25...

Cheers,
Mike

Posted by Mike on August 07, 2009 at 10:32 PM EDT #

Bao604, the Intel X25-E write cache is 64 MB, not 256 KB. The 256 KB part found on the board is SRAM processor cache. HDD write cache is always DRAM.

Posted by guest on August 09, 2009 at 05:05 AM EDT #

Jignesh, Re:

>>"...write cache on SSDs are really needed for Write Wear Leveling. The early generation of SSDs does not seem to have battery backed up Write Cache RAM but the industry does seem to solve the problem in the next generation of SSDs."

Ok...so Flash SSD doesn't work without volatile DRAM write cache, but the problem here is that database systems rely on the HDD write acknowledgement to ensure that data has been written to non-volatile media.

Mirroring does not help with the problem: a power loss would mean both SSD devices lose data. Wouldn't it be better to have the database server cache or buffer writes at the application level and turn off the DRAM at the SSD? That's what we do today when we turn off the HDD write cache. Why would we accept a volatile write cache on SSDs when we don't permit one on HDDs?

I just saw a result published where SSD IOPS dropped to around 1.5% of advertised performance when the volatile DRAM write-cache was disabled.

http://petereisentraut.blogspot.com/2009/07/solid-state-drive-benchmarks-and-write.html

Posted by Rick E. on August 09, 2009 at 05:28 AM EDT #

Rick said """ Wouldn't it be better to have the database server cache or buffer writes at the application level and turn off the DRAM at the SSD"""

I would think that with an SSD it would be feasible for the SSD to include enough battery power to guarantee a flush of the DRAM cache to the flash if power is removed. This is because SSDs use so much less power (both idle and under load) than HDDs.

Also, by design, SSDs need to use a write cache; they simply would be unusable without it because of the read/erase/write cycle and the write amplification that are at the heart of SSD behavior.

P.S. great presentation!

Posted by funkyj on November 13, 2009 at 11:17 AM EST #

About

Jignesh Shah is Principal Software Engineer in Application Integration Engineering, Oracle Corporation. AIE enables integration of ISV products including Oracle with Unified Storage Systems. You can also follow me on my blog http://jkshah.blogspot.com
