Achieving High Availability with PostgreSQL using Solaris Cluster
By rnl on Feb 04, 2008
Quick announcement: Sun has just released Solaris Express Developer Edition (SXDE) 1/08, a developer version that has all the latest and greatest Solaris features and tools. So what's new for PostgreSQL? The main updates/additions are 8.2.5, Perl driver, and pgAdmin III v1.6.3.
High availability (HA) is very important for most applications. PostgreSQL has an HA feature called warm standby or log shipping which allows one or more secondary servers to take over when the primary server fails. However, there are some limitations with PostgreSQL warm standby, two of which are:
- No mechanism in PostgreSQL to identify and perform automatic failover when the primary server fails
- No read-only support on the standby server
Detlef Ulherr has recently implemented the Solaris Cluster agent to work with PostgreSQL warm standby. We discussed different possible use-case scenarios with PostgreSQL warm standby, and he came up with a design that I think will work well for non-shared storage clustering. Maybe not everything will be perfect initially, but as more people test out the agent, we'll know what can be improved. The agent is still in beta now and will be released soon, probably in a couple of months.
The cool thing with Solaris Cluster is that you can now setup a cluster on a single node using multiple Solaris Zones . This is extremely useful because it eliminates the need for multiple machines or complicated hardware setup if you just want to try it out or if you want a simple environment for doing development. Here's more details on the Solaris Cluster and Zones integration
Of course you wouldn't want to deploy your HA application on a single machine. In production environment, you should have at least a two nodes cluster. Please refer to the Solaris Cluster documentation for more info.
From my recent test, I setup the cluster on a single machine with two Solaris Zones. Here's how the automatic failover works in a nutshell:
- Client connects to a logical host. The logical host is configured to have two resource groups, each containing a Zone.
- The logical host initially points to the primary server (Zone 1) where PostgreSQL is running, and PostgreSQL is configured to ship WAL logs to the secondary server (Zone 2) where PostgreSQL is running in continuous recovery mode.
- When primary server fails, the Solaris Cluster agent detects the failure and triggers the logical host to automatically switch to the IP address of the secondary server.
- From the PostgreSQL client perspective, it's still using the same logical host, but the actual PostgreSQL server has moved to a different machine, all happens transparently.
- If the client was connecting to the DB on the primary, the session would be disconnected momentarily and reconnected automatically to the DB on the secondary server, and the application would continue on its merry way.
The combination of PostgreSQL warm standby and Solaris Cluster can provide an enterprise class clustering solution for free (unless to need support services off course). So, please try it out and provide your feedback on what can be improved.
In my next blog, I will discuss how Solaris ZFS and Zones can be used in a clever way to overcome limitation #2. This idea has been used already, and some of you may have seen Theo's blog on this topic. I will provide a working sample code and step by step instructions for setting it up.