PostgreSQL performance study: introduction
By paulvandenbogaard on Mar 09, 2008
While working on PostgreSQL performance and scalability tuning I have found things worth mentioning. To talk to the community it seemed best to make these findings easily available, so here is the first post on this subject, with many more to come: an introduction to the environment I use for testing, some installation notes, and the IO patterns I observe.
While using an internal benchmark kit to generate load against a PostgreSQL 8.3beta setup, some scalability issues were observed. The setup being used is a Sun Fire V890 with 16 UltraSPARC IV cores at 1350 MHz. This system has 64 GB of internal memory and seven disk arrays dedicated to the database. Load is generated on a Sun Fire T2000. The machines are connected through 1 Gbit Ethernet.
The system runs Solaris Nevada, build 70a. The database is a self-compiled 64-bit binary optimized for this platform. The setup as a whole was created to ensure IO is not a problem, so that we can focus on CPU-related performance and scalability.
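For completeness, such a build looks roughly like the one below. The Sun Studio flags and install prefix are illustrative only, not the exact ones used here; --enable-dtrace is needed for the probe work described further down.

    $ ./configure CC=/opt/SUNWspro/bin/cc CFLAGS='-m64 -xO4' \
          --enable-dtrace --prefix=/opt/postgres-8.3
    $ gmake && gmake install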
The first thing done was a suite of tests to discover anomalies over time, to make sure that the load and the database behaviour were "constant". Each test ran for 3600 seconds after a 100 second ramp up. During the whole test a couple of utilities (mpstat, vmstat, prstat, intrstat) were running to gather system-wide statistics. During most of the tests DTrace was used to measure areas of interest in the PostgreSQL code. This was done by "enhancing" the DTrace provider that comes with this version of PostgreSQL with a couple of extra probes.
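To give an idea of what such a measurement looks like, the following one-liner uses only probes that ship with the stock 8.3 provider (transaction-start and transaction-commit) and aggregates a latency histogram over all backends; the probes I added are used in the same fashion.

    # dtrace -n '
        postgresql*:::transaction-start { self->ts = timestamp; }
        postgresql*:::transaction-commit
        /self->ts/
        {
            /* histogram of time between start and commit, in ns */
            @["commit latency (ns)"] = quantize(timestamp - self->ts);
            self->ts = 0;
        }'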
Load settings were altered by changing the number of concurrent users: 64, 128, 256, ..., 1280. The generator has a notion of TPM to measure throughput, and the TPM numbers of these tests will be compared. BTW these numbers are completely artificial and only usable for inter-test comparisons.
The postgresql.conf file contains the following settings that differ from the default values:
    max_connections = 2100
    shared_buffers = 2097152          # 16GB
    max_fsm_pages = 204800
    wal_sync_method = fdatasync
    wal_buffers = 256
    checkpoint_timeout = 900
    checkpoint_segments = 128
    commit_delay = 10
    commit_siblings = 10
    bgwriter_delay = 10
    bgwriter_lru_maxpages = 1000
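A quick way to double-check which values really deviate from the defaults is to ask the server itself; pg_settings carries a source column in 8.3:

    $ psql -c "SELECT name, setting FROM pg_settings WHERE source = 'configuration file';"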
Here are some graphs, first of the IO behaviour. Please note that the database was created with an additional objective in mind: to learn the IO patterns for the different parts of this test suite. Load generator specific index files are placed in a dedicated tablespace, and so are the generator specific tables. These tablespaces are each placed on a disk array dedicated to that tablespace. The WAL data is also placed on a dedicated disk array, and finally all other IO goes to a fourth array. In the IO graphs these are called: index, table, wal, and pg. The graphs only show the write calls to the devices. The file system holding the non-specific tablespaces is mounted as a plain UFS file system. All others are also mounted on UFS, but with both the forcedirectio option and the noatime option in place. For the directio mounted file systems this means that the underlying disks do 8 KB block IOs.
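This layout translates into something like the following; the device names, mount points and tablespace names are made up for illustration:

    # mount -F ufs -o forcedirectio,noatime /dev/dsk/c2t0d0s6 /pgdata/index
    # mount -F ufs -o forcedirectio,noatime /dev/dsk/c3t0d0s6 /pgdata/table
    # mount -F ufs -o forcedirectio,noatime /dev/dsk/c4t0d0s6 /pgdata/wal
    $ psql -c "CREATE TABLESPACE ts_index LOCATION '/pgdata/index';"
    $ psql -c "CREATE TABLESPACE ts_table LOCATION '/pgdata/table';"
    $ # WAL has no tablespace; it moves to its array by relocating pg_xlog
    $ mv $PGDATA/pg_xlog /pgdata/wal/pg_xlog
    $ ln -s /pgdata/wal/pg_xlog $PGDATA/pg_xlog

(the pg_xlog relocation has to be done while the server is down).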
The pictures depict the IO behaviour for the 64, 128, 512 and 1280 concurrent user runs, in that order. BTW the WAL writes are scaled to fit the pictures; in reality there are ten times as many.
I have many questions about this IO behaviour. However, this facet is not the focus of attention, and the arrays can do much more work. There was only a burst of read activity (not in the graphs) during the first couple of hundred seconds. After that the 16 GB of shared buffers were adequate to buffer all data. Realistic? No. However, good for focusing on CPU-related issues.
The only write factor that really influences the transaction times is the time it takes for the WAL to be written after a sync. However, the average service times are sub-millisecond; only during the checkpoints was this different.
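These service times are easy to keep an eye on from the outside as well; on Solaris the asvc_t column of iostat shows the average service time in milliseconds per device:

    $ iostat -xzn 10    # extended stats, skip idle devices, every 10 seconds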