Running PostgreSQL on a ZFS file system: some considerations

Up till now I have compared ZFS to UFS for holding the PostgreSQL database files. Only the database processes were running on the system, and only the PostgreSQL-related files were on the file system being compared.


The file system was created on either a stripe set of nine disks using Solaris Volume Manager or a zpool using these same nine disks. The last blog entry reported ZFS outperforming UFS.
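
For reference, the two configurations amount to something along the following lines. This is a sketch only: the pool name, device names and mount point are hypothetical, and the exact commands I used are not reproduced here.

    # UFS on an SVM stripe set of nine disks
    metainit d10 1 9 c1t0d0s0 c1t1d0s0 c1t2d0s0 c1t3d0s0 c1t4d0s0 \
        c1t5d0s0 c1t6d0s0 c1t7d0s0 c1t8d0s0
    newfs /dev/md/rdsk/d10
    mount /dev/md/dsk/d10 /pgdata

    # ZFS pool on the same nine disks; recordsize set to 8 KB to match
    # the PostgreSQL block size (see the comments below)
    zpool create dbpool c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 \
        c1t5d0 c1t6d0 c1t7d0 c1t8d0
    zfs create dbpool/pgdata
    zfs set recordsize=8k dbpool/pgdata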


I did some extra tests to see if performance (read: database throughput) could be increased by tweaking some elements that are known to influence the I/O behaviour of ZFS.


The ZFS Intent Log (ZIL) is known to consume bandwidth. Its effect can be seen by disabling it altogether. Of course this is definitely NOT advised for production environments. (See http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#ZIL for some extra information.)
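
For completeness, on the Nevada builds of that time the whole ZIL could be switched off with the zil_disable tunable described in that guide. A minimal sketch, again for test systems only:

    # /etc/system -- disable the ZFS Intent Log for all pools
    # (takes effect after a reboot)
    set zfs:zil_disable = 1

    # or on a live test system, followed by a remount of the file system:
    echo zil_disable/W0t1 | mdb -kw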


I did some tests with the ZIL disabled, using 5, 15, 50 and 85% for the checkpoint_completion_target of PostgreSQL. The following graph shows the database transaction throughput:




[Figure tpmNoZIL.png: database transaction throughput with the ZIL disabled]

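For the PostgreSQL side, the parameter varied between these runs translates to the following in postgresql.conf (the value is a fraction; the rest of my benchmark configuration is not shown here):

    # postgresql.conf -- one value per run
    checkpoint_completion_target = 0.15    # also tested: 0.05, 0.5, 0.85
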

Recalling the baseline I presented in an earlier blog entry, which used seven disk arrays with caches and showed a throughput of around 90K TPM, this one comes very close. Only nine internal disks this time...


The previous throughput graph (... the saga continues) showed a rate of ~ 50K TPM: transactional integrity does not come for free.


I did some more tests where the ZIL was enabled, but now using a dedicated disk array (with 1 GByte of write cache) for the intent log. The throughput results of this setup, the disabled-ZIL setup and the 'all on nine disks' setup are in the next graph (all with a checkpoint_completion_target setting of 15%):


[Figure tpmDifferentZIL.png: throughput for the dedicated-ZIL-device, disabled-ZIL and 'all on nine disks' setups]

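For reference, on ZFS versions with separate intent-log device support, such a dedicated log device can be attached to the existing pool with a command along these lines (pool and device names hypothetical):

    # attach the cached disk array as a dedicated intent-log (slog) device
    zpool add dbpool log c3t0d0
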

The ZIL obviously takes time: time during which the throughput is clearly throttled while a database checkpoint is in progress. Using a dedicated fast device pays off, both by increasing throughput and by making it more constant over time. The I/O (write part) was measured both for the ZIL device and for the data on the "nine disk" ZFS file system. Here is the result:




[Figure zlogvsdataIO.png: write I/O for the ZIL device versus the data on the nine-disk ZFS file system]

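One way to watch these per-device write rates while a run is in progress is zpool iostat, where the log device shows up as its own vdev. A minimal sketch, using the hypothetical pool name from above:

    # report per-vdev bandwidth every 10 seconds, including the log device
    zpool iostat -v dbpool 10
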

The checkpoint at 1800 seconds resulted in a write rate of 77 MByte/sec for the ZIL alone. As already suggested by Mikael Gueck, I would surely like to do some extra tests with SSD disks. Guess I need to hold my breath a while, since right now I just do not have them.


However, right now I feel ZFS is already a realistic alternative for a PostgreSQL setup.


Will do some more tests where I'll play with some of the other features of ZFS to get a feeling for their effect on my PostgreSQL environment.

Comments:

Cool! I suspect the spikes might be due to the transaction group committing. Have you tried playing around with txg_time? What version of the OS are you using? Later versions of Nevada have write throttling, which may reduce the spikiness.

What is the zfs recordsize?

Posted by neel on June 19, 2008 at 09:54 AM PDT #

Neel, recordsize used is 8KB. Just as the database blocksize.
Running Nevada 79a.
Did not play with txg_time. Will have a look at it.
Would be cool if ZFS had an API that could be used together with the transaction log mechanism of the database. Somehow I feel the work is done twice.
All application processes wait for a (database) log write to complete. Now if this one is throttled under the hood, would that not hurt database transaction throughput?

Posted by Paul van den Bogaard on June 19, 2008 at 05:47 PM PDT #
