Best Practices with PostgreSQL 8.3 on Solaris

With the release of PostgreSQL 8.3, it's time for people to start deploying it in their environments, so this post is primarily for people planning to use PostgreSQL 8.3 on Solaris. PostgreSQL is already bundled in Solaris, so how will PostgreSQL 8.3 fit in? PostgreSQL 8.3 should soon be integrated into the Solaris Nevada train, so it should appear within the next few builds of Solaris Express. A Solaris 10 update release, however, takes more time to appear, and 8.3 unfortunately may not make the next immediate release. Hence this post covers what can be done now to use PostgreSQL 8.3 on Solaris.

First of all, people already using Solaris Nevada or Solaris Express, Developer Edition 1/08 can now download the 32-bit community binaries from the postgresql.org download site. I believe this is the first time Solaris binaries have been available for PostgreSQL immediately at the time of general release. These binaries were built on Solaris Express and hence will not work on the Solaris 10 8/07 release (or earlier). There are a few reasons why this is the case (but I won't go into the details).
 

However, Solaris 10 customers wanting to use it now also have an option: download the 8.3 source code from postgresql.org and refer to a document written by Bjorn Munch on how to build your own PostgreSQL binaries (quite similar to the way the release team at Sun builds them, but tested only on Solaris Express for now). Bjorn plans to update it for 64-bit binaries soon (and hopefully for Solaris 10 8/07).

I think 64-bit binaries of 8.3 are now essential: with available memory going into the 256GB range even on Opteron systems (like the Sun Fire X4600), 32-bit binaries cannot capitalize on all the RAM in the system. (Though one should not go overboard and set shared_buffers in postgresql.conf to 200GB in such cases, even with 64-bit builds.)

 

Anyway, here is a quick way of generating 64-bit binaries without any bells and whistles. I am assuming you already have the Sun Studio 12 compilers installed in /opt/SUNWspro/bin. (Make sure to create a directory called /opt/postgres/8.3 and that the user generating the binaries has write permission on that directory.)

$ PATH=$PATH:/opt/SUNWspro/bin:/usr/sfw/bin:/usr/ccs/bin:/usr/sbin; export PATH
$ cd postgresql-8.3.0
$ ./configure CC=cc --without-readline --prefix=/opt/postgres/8.3 CFLAGS="-xO3 -m64"   --enable-thread-safety --enable-dtrace DTRACEFLAGS=-64

 

$ gmake  
$ gmake install

 

So that concludes the section on how to get PostgreSQL 8.3 binaries for Solaris.

Now let's look at deployment practices for PostgreSQL 8.3 on Solaris. I am assuming this is for a decent-sized database (at least in single-digit GBs, if not 100s of GBs).

First of all, the database user that will run the PostgreSQL 8.3 server should have rights to the resources required to run postgres.

Assuming that user is pguser, set the resources defined for pguser similar to what is shown below:

# projadd -U pguser user.pguser
# projmod -a -K "project.max-shm-ids=(priv,32768,deny)" user.pguser
# projmod -a -K "project.max-sem-ids=(priv,4000,deny)" user.pguser
# projmod -a -K "project.max-shm-memory=(priv,30813703372,deny)" user.pguser
# projmod -a -K "project.max-msg-ids=(priv,4000,deny)" user.pguser
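The max-shm-memory value above (30813703372 bytes, roughly 28.7GB) caps how much shared memory the project may allocate. As a sketch of how such a number can be derived (the 32GB RAM figure here is an assumption for illustration, not from this post), one approach is to allow up to about 90% of installed RAM:

```shell
# Hypothetical sizing sketch: cap project shared memory at ~90% of RAM.
# Assumes a 32 GB system; substitute your actual memory size.
ram_bytes=$((32 * 1024 * 1024 * 1024))
shm_cap=$((ram_bytes * 9 / 10))
echo "project.max-shm-memory=(priv,${shm_cap},deny)"
```

You can then plug the printed value into the projmod command above.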

Now we need three filesystems (again, I am stressing that this is for big databases, and hence not necessarily desirable if your database is only a few MBs). Why three filesystems? Based on various benchmarks like EAStress, iGen, and pgbench, and on feedback from other users, it seems most I/O operations fall into three buckets:

  1. PostgreSQL data: cluster metadata, execution state, and temporary files
  2. Log (WAL) files, which are mostly only written and synced after every write
  3. Database table and index files

People familiar with Solaris will know that many best-practice guides suggest using forcedirectio with UFS. That suggestion holds for (2) and (3) above, but for most operations on the rest of $PGDATA, which includes pg_clog and the like, it results in degraded performance. Putting them in three separate buckets makes it easy to apply tunings specific to each bucket and to monitor them separately.

The current best practice is to put:

  1. on default UFS filesystem (which is buffered)
  2. on UFS filesystem with forcedirectio enabled
  3. on UFS filesystem with forcedirectio enabled

(3) can be broken down further into more separate tablespaces if needed.
The reason to separate (2) from (3) is that (2) is in a constant state of syncing; by moving it onto separate spindles and a separate filesystem, (3) is relieved of that stress.

Now (2) does not require much disk space compared to (1) and (3); its size primarily depends on checkpoint_segments. A reasonable size could be about 10GB or so.
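As a rough sketch of why 10GB is comfortable: 8.3 normally keeps at most about (2 + checkpoint_completion_target) * checkpoint_segments + 1 WAL segment files, each 16MB by default. With the settings used later in this post:

```shell
# Rough estimate of peak WAL disk usage on the log filesystem (sketch).
# Assumes the default checkpoint_completion_target of 0.5, so the
# segment count is about 2.5 * checkpoint_segments + 1.
checkpoint_segments=128
max_segments=$(( 2 * checkpoint_segments + checkpoint_segments / 2 + 1 ))
max_wal_mb=$(( max_segments * 16 ))
echo "approx peak WAL size: ${max_wal_mb} MB"
```

That works out to roughly 5GB, so a 10GB filesystem leaves plenty of headroom.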

(1) is a bit tricky. Mostly it is small. However, temporary files are created there (for example, during index creation the intermediate files are stored here), so it needs to be relatively big, though most of the temporary data is cleared when the operation generating the files finishes.

(3) is where most of the database tables and indices (or indexes, if you prefer) reside. (Again, it can be broken into multiple filesystems if needed.)

Now, for example, consider that you have created the following three filesystems:

  1. /pg83db/data  with mount options in /etc/vfstab: - (default, buffered)
  2. /pg83db/log   with mount options in /etc/vfstab: forcedirectio
  3. /pg83db/mydb  with mount options in /etc/vfstab: forcedirectio
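The corresponding /etc/vfstab entries would look something like the following (the device names here are placeholders for illustration; substitute your actual devices):

```
/dev/dsk/c0t1d0s0  /dev/rdsk/c0t1d0s0  /pg83db/data  ufs  2  yes  -
/dev/dsk/c0t2d0s0  /dev/rdsk/c0t2d0s0  /pg83db/log   ufs  2  yes  forcedirectio
/dev/dsk/c0t3d0s0  /dev/rdsk/c0t3d0s0  /pg83db/mydb  ufs  2  yes  forcedirectio
```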

Now let's see how to set up PostgreSQL 8.3 to use these three filesystems.

First of all, make sure that the user (pguser in this case) has write permission on all three filesystems. It is recommended to change ownership of the above directories to that user:

# chown pguser /pg83db/data /pg83db/log /pg83db/mydb

Log in as the user (pguser in this case) and set the environment variables to pick up the right binaries and point to the right data directory. (Ideally add them to your .profile so you won't have to set them again on your next login.)

$ PATH=/opt/postgres/8.3/bin:$PATH; export PATH 
$ PGDATA=/pg83db/data; export PGDATA 

Confirm that the correct path is being used:

$ which initdb
/opt/postgres/8.3/bin/initdb

You can also confirm whether it is 32-bit or 64-bit:

$ file `which initdb`

(In case of SPARC you will see)

/opt/postgres/8.3/bin/initdb:      ELF 64-bit MSB executable SPARCV9 Version 1, dynamically linked, not stripped      

 (In case of x64 you will see)

/opt/postgres/8.3/bin/initdb:      ELF 64-bit LSB executable AMD64 Version 1 [SSE FXSR CMOV FPU], dynamically linked, not stripped   

 


Then initialize the database with the environment variables in place (UPDATED: using the -X option of initdb):

$ initdb -X /pg83db/log/pg_xlog

$ cd $PGDATA

Edit pg_hba.conf to allow your identified clients to connect from remote machines:

$ vi pg_hba.conf


For example, add the following line to allow clients on the 10.6.140.0/24 subnet (which includes 10.6.140.206) to connect without password authentication:

host    all         all         10.6.140.206    255.255.255.0 trust
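PostgreSQL 8.3 also accepts CIDR notation in pg_hba.conf, so equivalent entries for a single host or for the whole subnet would be:

```
# single client host
host    all    all    10.6.140.206/32    trust
# or the entire 10.6.140.0/24 subnet
host    all    all    10.6.140.0/24      trust
```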

Set popular tunables in postgresql.conf for high-performance deployments with 10-100GB configurations:

$ vi postgresql.conf 

Modify the following parameters from their defaults in postgresql.conf to values similar to these, as needed. (Set shared_buffers to about 25%-50% of RAM to begin with.) (This assumes the major workloads are OLTP-like, with PostgreSQL acting as a backend for J2EE applications.)

listen_addresses = '*'
shared_buffers=8000MB
temp_buffers = 8MB
work_mem=1MB
maintenance_work_mem = 256MB
wal_sync_method = fdatasync        
synchronous_commit = false
full_page_writes = off                  
wal_buffers = 512      
commit_delay = 10   
checkpoint_segments = 128
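As a sketch of the "25% of RAM" starting point for shared_buffers (the 32GB RAM figure is an assumption for illustration; the 8000MB above is in that ballpark):

```shell
# Sketch: derive a shared_buffers starting point as 25% of RAM.
# Assumes a 32 GB machine; substitute your actual memory size.
ram_mb=$((32 * 1024))
shared_buffers_mb=$(( ram_mb / 4 ))
echo "shared_buffers = ${shared_buffers_mb}MB"
```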


Updates: synchronous_commit = off is primarily for PostgreSQL users who would otherwise turn fsync off. There is a potential of losing up to about 3X wal_writer_delay worth of recently committed transactions, but the good thing is that the database remains consistent even after a failure. It is not meant for mission-critical systems or financial transactions.

Setting commit_delay is a big win for OLTP workloads, since it allows multiple transactions to be committed together (and hence cuts the number of IOPS to the log filesystem) at the cost of adding maybe 1ms to transactions when the load is too light for grouping to help.

Turning full_page_writes off significantly reduces IOPS on the log device and improves transaction response times thanks to the reduced load. (Of course, there is a risk of partial page writes. It is only recommended when you have a decent storage array that protects against them.)

(More note for other workloads like DSS coming soon..)

Now we are ready to start PostgreSQL 8.3 for the first time on the server

$ pg_ctl start -l /pg83db/data/logfile

Now, before creating the database, create a tablespace on /pg83db/mydb to be used for the data tables and index files of the database:

$ psql postgres -c "create tablespace datatbs location '/pg83db/mydb'"
$ psql postgres -c "create database mydb with tablespace datatbs"

Now all tables and indexes created by default will be created in /pg83db/mydb
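To confirm that the new database is indeed using the tablespace, you can query the system catalogs (this assumes the server started above is running; the query should report datatbs):

```
$ psql postgres -c "select spcname from pg_tablespace ts, pg_database db where ts.oid = db.dattablespace and db.datname = 'mydb'"
```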

With the above setup we have avoided most of the general pitfalls and have a well-configured starting database for your application. More tweaking may be needed based on the workload, but most of the common tunings have already been applied.


 

Comments:

I think you should give a more sensible reasoning for your postgresql.conf choices. Especially setting synchronous_commit = false seem rather ill advised since it will cause lost transactions in case of a power failure. And disabling full page writes is also something that one really needs to consider carefully.
I'm also wondering a bit on what kind of database workload you have in mind here - shared_buffers is fairly high but work_mem is rather low except if you only do very simple queries or if you have an unrealistic amount of concurrency (hello pgbouncer/pgpool).
Some other rather important parameters are missing completely (like effective_cache_size) which is another hint that this only tuning for very simple queries.

Posted by Stefan Kaltenbrunner on February 07, 2008 at 09:05 PM EST #

Curious you didn't mention SMF at all.

Also, is there a reason you use UFS and not ZFS? Makes snapshotting much much easier.

Posted by Dick Davies on February 07, 2008 at 09:36 PM EST #

Instead of using ln yourself to move pg_xlog, you should use the new -X option of initdb:

initdb -X /pg83db/log/pg_xlog

And you're done :)

Posted by Guillaume Lelarge on February 07, 2008 at 11:45 PM EST #

Hi Guillaume, Thanks for reminding me the new -X option, I have modified the instructions out there. Stefan, I am adding notes for the selections in postgresql.conf. Thanks for the feedback.

Posted by Jignesh Shah on February 08, 2008 at 01:50 AM EST #

Nice post Jignesh!

Wondering if sun would provide support for installs done from pg source the way you have mentioned.

I understand the reason for separating 2 from 3 and onto separate disk groups but is there a guideline or empirical formula to arrive at the number of disks in the disk groups to sustain X transactions. I guess log disks will be in raid1 and 3 on raid10.

Is it not better to stripe and mirror both 2 and 3 on raid 10 as in reality there will more disks to stripe in this mixture compared to log disk group with fewer disks.

Posted by Steve on February 08, 2008 at 06:31 AM EST #

Hi Steve,
Unfortunately if one generates their own PostgreSQL binaries then technically it is community supported binaries. Sun will only support binaries distributed/bundled with Solaris.

As your other question, I think it is worthwhile for a separate blog entry on it. Stay tuned

Posted by Jignesh Shah on February 08, 2008 at 04:18 PM EST #

Excuse my ignorance, but why is there pg_ctl start -l /pg83db/data/logfile rather than pg_ctl start -l /pg83db/log

Posted by Goran on March 19, 2008 at 03:42 AM EDT #

@Goran. The -l option tells postgres where the logfile exists. The /pg83db/log is actually the database transaction log, which is a different animal than your syslog-style daemon logging. The -l is for a log for you to read. The /pg83db/log is a set of files for the database to read (when it needs to roll back transactions or recover from a failure).

Posted by Hedley Robertson on April 07, 2008 at 02:02 AM EDT #

About

Jignesh Shah is a Principal Software Engineer in Application Integration Engineering, Oracle Corporation. AIE enables integration of ISV products including Oracle with Unified Storage Systems. You can also follow me on my blog http://jkshah.blogspot.com
