Tuesday Jun 30, 2009

How to add DTrace probes to your application

MC Brown and I co-presented a session at CommunityOne West on how to add probes to applications, using MySQL and PostgreSQL as case study. In the presentation, I used a very simple example to demonstrate how easy it is to add probes. If you want to try out yourself, here is the code. Extract the files, run gmake (or gnumake on OS X) to build, and run the executable in one terminal and the DTrace script in another terminal to see the output from the probes. To see more complicated examples, checkout the MySQL or PostgreSQL source code.

Monday Jun 29, 2009

More on the demos from PgCon 2009

At PgCon 2009 in Ottawa, I did a lightning talk on DTrace probes in PostreSQL 8.4. I wanted to show several demos but ran out of time. If you want to try them out, use the scripts below.

Here is the script, query_time.d, used in slide 14. This script is used to identify slow queries by printing out the query execution time.

#!/usr/sbin/dtrace -s
#pragma D option quiet

  printf("Tracing... Hit Ctrl-C to end.\\n");

    self->query = copyinstr(arg0);
    self->ts = timestamp;

    @query_time[self->query] = avg(timestamp - self->ts);
    self->query = 0;
    self->ts = 0;

     printf("%10s %s\\n", "TIME (ns)", "QUERY");
     printa("%@10d %s\\n", @query_time);

Here is the second script, sort.d, used in slide 16. This script tells the type of sort, whether the sort was done in memory or on disk, and the time to perform the sort.

#!/usr/sbin/dtrace -qs

        sorttype[0] = "TUPLE";
        sorttype[1] = "INDEX";
        sorttype[2] = "DATUM";

        sortmethod[0] = "INTERNAL";
        sortmethod[1] = "EXTERNAL";

        self->ts = timestamp;
        printf("\\nBegin %s sort, workmem = %d KB \\n", sorttype[arg0], arg3);

/self->ts && arg0 == 0/
        /\* Internal Sort \*/
        printf("%s sort ended, space used = %d KB \\n", sortmethod[arg0], arg1);
        printf("Sort time = %d ms\\n\\n", (timestamp - self->ts) / 1000000);

/self->ts && arg0 == 1 /
{        /\* External Sort \*/
        printf("%s sort ended, space used = %d disk blocks\\n", sortmethod[arg0], 
        printf("Sort time = %d ms\\n\\n", (timestamp - self->ts) / 1000000);

Below is the last script, query_trace.d, used in slide 23. This script provides useful data that will allow you to dig down deeper. In this example, the buffer reads to table 16397 (this is the OID) is huge. This signals a red flag that an index may be needed for this table. To find out the table name from OID, run "SELECT relname FROM pg_class WHERE relfilenode=16397" in psql.

#!/usr/sbin/dtrace -qs

        self->ts = timestamp;
        self->pid = pid;

        self->readts = timestamp;

/self->pid && arg7/
        /\* Buffer cache hit \*/
        @read_count[arg2, arg3, arg4] = count();
        @read_hit_total["Total buffer cache hits      : "] = count();
        @read_hit_time["Average read time from cache : "] = avg (timestamp - self->readts);
        self->readts = 0;

/self->pid && !arg7/
        /\* Buffer cache miss \*/
        @read_count[arg2, arg3, arg4] = count();
        @read_miss_total["Total buffer cache misses    : "] = count();
        @read_miss_time["Average read time from disk  : "] = avg (timestamp - self->readts);
        self->readts = 0;

        self->writets = timestamp;

        @write_count[arg2, arg3, arg4] = count();
        @write_time["Average write time to disk   : "] = avg (timestamp - self->writets);
        self->writets = 0;

/self->ts && self->pid == pid/
        printf("\\n============ Buffer Read Counts ============\\n");
        printf("%10s %10s %10s %10s\\n","Tablespace", "Database", "Table", "Count");
        printa("%10d %10d %10d %@10d\\n",@read_count);

        printf("\\n======= Buffer Write Request Counts ========\\n");
        printf("%10s %10s %10s %10s\\n","Tablespace", "Database", "Table", "Count");
        printa("%10d %10d %10d %@10d\\n",@write_count);

        printf("\\n========== Additional Statistics ===========\\n");

        printf ("Backend PID    : %d\\n", pid);
        printf ("SQL Statement  : %s\\n", copyinstr(arg0));
        printf ("Execution time : %d.%03d sec (%d ns)\\n", (timestamp - self->ts) / 1000000000, ((timestamp - self->ts) / 1000000) % 1000, timestamp - self->ts);
        printa("\\n%19s %@8d\\n",@read_hit_total);
        printa("%19s %@8d\\n",@read_miss_total);
        printa("%19s %@8d (ns)\\n",@read_hit_time);
        printa("%19s %@8d (ns)\\n",@read_miss_time);
        printa("%19s %@8d (ns)\\n",@write_time);


        self->ts = 0;
        self->pid = 0;

To see more sample scripts as well as a GUI tool, check out the PostgreSQL DTrace Toolkit.

Sunday Jun 28, 2009

PostgreSQL DTrace Toolkit

As many of you know, PostgreSQL 8.4 has quite a few more DTrace probes. See my previous blog post for more details . To use the probes, you need to write DTrace scripts, which is quite easy to do, but to make it easier to use the probes (especially for those who are new to DTrace), I have written some scripts that you can just run from the command line. In addition, I've integrated some of those script with Chime to make it even easier to visualize the data. Check out the toolkit on PgFoundry.

Monday Mar 02, 2009

New DTrace Probes in PostgreSQL 8.4

DTrace probes were introduced in PostgreSQL starting in 8.2. Initially, only a handful were added, and they were mostly for developers. In 8.4, many more (46 to be exact) have been added, and they are targeted more toward database administrators as shown below.

query-parse-start(const char \*)
query-parse-done(const char \*)
query-rewrite-start(const char \*)
query-rewrite-done(const char \*)
query-start(const char \*)
query-done(const char \*)
statement-status(const char \*)
sort-start(int, bool, int, int, bool)
sort-done(unsigned long, long)
buffer-read-start(ForkNumber, BlockNumber, Oid, Oid, Oid, bool)
buffer-read-done(ForkNumber, BlockNumber, Oid, Oid, Oid, bool, bool)buffer-flush-start(Oid, Oid, Oid)
buffer-flush-done(Oid, Oid, Oid)
buffer-sync-start(int, int)
buffer-sync-done(int, int, int)
buffer-write-dirty-start(ForkNumber, BlockNumber, Oid, Oid, Oid)
buffer-write-dirty-done(ForkNumber, BlockNumber, Oid, Oid, Oid)
checkpoint-done(int, int, int, int, int)
smgr-md-read-start(ForkNumber, BlockNumber, Oid, Oid, Oid)
smgr-md-read-done(ForkNumber, BlockNumber, Oid, Oid, Oid, const char \*, int, int)
smgr-md-write-start(ForkNumber, BlockNumber, Oid, Oid, Oid)
smgr-md-write-done(ForkNumber, BlockNumber, Oid, Oid, Oid, const char \*, int, int)
xlog-insert(unsigned char, unsigned char)

Documentation should be available in 8.4 doc page soon, but if you don't want to wait, check out the doc patch I recently submitted. If you're using or plan to use the probes, I'd love to hear your feedback, both positive and constructive!

Special thanks to Theo Schlossnagle, Robert Treat, Zdenek Kotala, Alvaro Herrera and Simon Riggs for their contributions with the probes as well as reviewing them.

Sunday Nov 30, 2008

Time for a Home NAS!

Over the years I have accumulated quite a bit of digital content (music, pictures, videos, archived tax returns, etc) and have used external drives, CDs/DVDs, and even USB sticks to backup the data. After a while, it has become increasing inconvenience to locate the content let alone share it with multiple computers, so I decided to shop for a home NAS. But which one? There are quite a few of them on the market with varying prices ($200 to $2000+) and features. I liked the Netgear's ReadyNAS, but it was quite pricy. With 2TB of storage, it was around $1,500 at the time. This costed more than a PC with the same amount of storage, so I thought to myself, "Why not use a PC with ZFS."

After some Googling, I found a number of people had already done exactly this, and Simon's blog was particularly helpful , especially for the system config that worked well for him. So, I decided to build a similar system. I ordered the components from Newegg.com, and the total price was less than $900 for a pretty powerful system (2.6 Ghz AMD Athlon, 2GB ECC RAM, 4x500GB WD drives). It took me a couple of days to put the system together, install OpenSolaris 2008.05, setup ZFS pools, and CIFS server/client, and voila, I have a system that can be used as a NAS as well as a general purpose server. Now, I have all my data in one place and be able to get to it from all the computers on the network, and be assured that if a drive fails, I won't lose data thanks to the RAID support in ZFS.

Wednesday Nov 19, 2008

Test driving Hyperic HQ 4.x

Hyperic just released Hyperic HQ 4.x, and I recently took it for a test drive on OpenSolaris 2008.05. The installation went smoothly, but I ran into one issue when starting the server. With a 64-bit kernel, the startup script assumes you have 64-bit JRE on the system. It my case, the 64-bit JRE wasn't available, so the server startup failed. I could have installed the 64 bit version, but I decide to just comment out the check in the startup script (hq-server.sh) to use the 32-bit version. Below are the details of the installation steps.

Before running the Hyperic HQ install script, setup the database first. Below are the commands I used for PostgreSQL. Run psql, and then run the following commands:

postgres=# create role admin with login createdb password 'hqadmin';
postgres=# CREATE DATABASE "HQ" OWNER admin;

Now, run the install script (assuming you're unpacked the download in /var/tmp) and note the answers I entered in bold.

-bash-3.2# /var/tmp/hyperic-hq-installer/setup.sh
Initializing Hyperic HQ 4.0.1 Installation...
Loading taskdefs...
Taskdefs loaded
Choose which software to install:
1: Hyperic HQ Server
2: Hyperic HQ Agent
You may enter multiple choices, separated by commas.
HQ server installation path [default '/home/hyperic']:
	1: Oracle 9i/10g
	2: PostgreSQL
	3: MySQL 5.x
What backend database should the HQ server use? [default '1']:
Enter the JDBC connection URL for the PostgreSQL database [default 'jdbc:postgresql://localhost:5432/HQ?protocolVersion=2']: Enter

Enter the username to use to connect to the database:
Enter the password to use to connect to the database: enter password here
(again): enter password again
HQ agent installation path [default '/usr/local/hyperic']: Enter

Loading install configuration...
Install configuration loaded.
Preparing to install...
Validating agent install configuration...
Validating server install configuration...
Checking server webapp port...
Checking server secure webapp port...
Checking server JRMP port...
Checking server JNP port...
Checking database permissions...
Verifying admin user properties
Validating server DB configuration...
Installing the agent...
Looking for previous installation
Unpacking agent to: /usr/local/hyperic/agent-4.0.1...
Setting permissions on agent binaries...
Fixing line endings on text files...
Installation Complete:
  Agent successfully installed to: /usr/local/hyperic/agent-4.0.1

 You can now start your HQ agent by running this command:

  /usr/local/hyperic/agent-4.0.1/bin/hq-agent.sh start

Installing the server...
Unpacking server to: /usr/local/hyperic/server-4.0.1...
Creating server configuration files...
Copying binaries and libraries to server installation...
Copying server configuration file...
Copying server control file...
Copying server binaries...
Copying server libs...
Setting up server database...
Setting permissions on server binaries...
Fixing line endings on text files...
Installation Complete:
  Server successfully installed to: /usr/local/hyperic/server-4.0.1

 You can now start your HQ server by running this command:

  /usr/local/hyperic/server-4.0.1/bin/hq-server.sh start

 Note that the first time the HQ server starts up it may take several minutes
 to initialize.  Subsequent startups will be much faster.

 Once the HQ server reports that it has successfully started, you can log in
 to your HQ server at: 

  username: hqadmin
  password: hqadmin

 To change your password, log in to the HQ server, click the "Administration"
 link, choose "List Users", then click on the "hqadmin" user.

Setup completed.
A copy of the output shown above has been saved to:

At this point the installation is completed, and the server can be started with the following line, but as you can see HQ failed to start.

-bash-3.2# /usr/local/hyperic/server-4.0.1/bin/hq-server.sh start
Starting HQ server...
Initializing HQ server configuration...
Checking jboss jndi port...
Checking jboss mbean port...
Setting -d64 JAVA OPTION to enable SunOS 64-bit JRE
Booting the HQ server (Using JAVA_OPTS=-XX:MaxPermSize=192m -Xmx512m -Xms512m -d64)...
HQ failed to start
The log file /usr/local/hyperic/server-4.0.1/logs/server.out may contain further details on why it failed to start.

Looking at the hq-server.sh script, I notice that it checks if you're running a 64-bit kernel. If so, it uses the 64-bit JRE. Since I only have 32-bit JRE on my system, I commented out the following section in hq-server.sh.

#if [ $THISOS = "SunOS" ] ; then
#       ARCH=`isainfo -kv`
#       case $ARCH in
#               \*64-bit\*)
#                 echo "Setting -d64 JAVA OPTION to enable SunOS 64-bit JRE"
#                       HQ_JAVA_OPTS="${HQ_JAVA_OPTS} -d64"
#                       ;;
#       esac

And rerunning the script, the server started up properly.

-bash-3.2# /usr/local/hyperic/server-4.0.1/bin/hq-server.sh start
Starting HQ server...
Removing stale pid file /usr/local/hyperic/server-4.0.1/logs/hq-server.pid
Initializing HQ server configuration...
Checking jboss jndi port...
Checking jboss mbean port...
Booting the HQ server (Using JAVA_OPTS=-XX:MaxPermSize=192m -Xmx512m -Xms512m)...
HQ server booted.
Login to HQ at:

At this point you can just hit the server by pointing the browser to, and you should see the Portal Dashboard.

This is just a quick cheatsheet to get you started. The official installation/configuration instructions are available on Hyperic website.

Monday Nov 10, 2008

Not just another NAS appliance

Sun just announced a new line of storage appliances that will forever change the NAS appliance market. By using OpenSolaris and technologies such as ZFS, DTrace, FMA and SMF with standard-based storage server, Sun is able to bring the products to market at a much cheaper price point than its competitor and yet provide more differentiated features such as:

  1. Analytics - With DTrace as the underlying technologies, Analytics allow users to ask questions about the storage system in production in realtime from a GUI.
  2. Hybrid Storage Pool - Leveraging the Flash technology, the ZFS Hybrid Storage Pool transparently manages DRAM, Flash, and low-cost hard drives, providing a high performance and cost effective storage solution.
  3. Remote Replication - Data can easily replicated with minimum configuration from the browser using the powerful ZFS features.
  4. Supported Protocols: NFS, CIFS, iSCSI, HTTP, WebDAV, FTP

To get more technical information about these exciting new NAS appliances, checkout the blogs and demos from the engineering team.

BTW, you can also try out the products for 60 days for FREE.

Friday Oct 31, 2008

OFBiz and PostgreSQL

There are a quite number of Open Source ERP apps out there, but only a few (Compiere/Adempiere, OpenBravo, xTuple, OFBiz) seem to have gained traction. Recently, I tried out OFBiz and getting it to work with PostgreSQL was quite simple. OFBiz is actually shipped with an embedded Java DB (Derby), so for testing or development, you probably don't need an external DB.

I did my test on OpenSolaris 2008.05, so if you use a different version of Solaris or a different OS, the instructions below may vary.

First, check out the source, compile, and run with the embedded database to make sure everything works properly.

  1. Create a directory where you want OFBiz to reside, and run "svn co http://svn.apache.org/repos/asf/ofbiz/trunk ofbiz"
  2. Go to the OFBiz directory (now referred to as $OFBIZDIR), and run "./ant run-install"
  3. In $OFBIZDIR, run "java -jar ofbiz.jar"
  4. Once OFBiz is started, check out the sample app by pointing the browser to http://localhost:8080/ecommerce/

Once your have it working with Derby, follow the steps below to switch to PostgreSQL. I use version 8.3.x.

  1. Assuming you have Postgres installed and ready to go, run:

    postgres $ createuser -S -D -R -P -E ofbiz
    postgres $ createdb -O ofbiz ofbizdb

  2. Edit the OFBiz DB config file in $OFBIZDIR/framework/entity/config/entityengine.xml

    <delegator name="default" entity-model-reader="main" entity-group-reader="main" entity-eca-reader="main" distributed-cache-clear-enabled="false">
    <group-map group-name="org.ofbiz" datasource-name="localpostnew"/>

    <datasource name="localpostnew"

  3. Copy PostgreSQL JDBC driver to $OFBIZDIR/framework/entity/lib/jdbc. Make sure to use the JDBC4 driver.
  4. Go to $OFBIZDIR, and run "./ant run-install" to populate the data in PostgreSQL
  5. Re/start OFBiz by running "java -jar ofbiz.jar"
  6. Once OFBiz is up and running, you should be able to access http://localhost:8080/ecommerce/

That's all there is to it.

Jignesh has a post on setting up OpenBravo with Postgres on OpenSolaris 2008.05.

Tuesday Sep 30, 2008

Setting up MediaWiki with PostgreSQL 8.3

Setting up MediaWiki (I use version 1.13.1) with PostgreSQL 8.3 is quite straightforward. Thanks to Greg Mullane for fixing the issues.

Since tsearch2 is integrated into core in 8.3, you only need to run the following three commands before running the MediaWiki install script.

$ createuser -S -D -R -P -E wikiuser
$ createdb -O wikiuser wikidb
$ createlang plpgsql wikidb

With 8.2, you have to run the following commands in addition to the above.

$ psql wikidb < /usr/postgres/8.2/share/contrib/tsearch2.sql (path for PostgreSQL on Solaris)
$ psql -d wikidb -c "grant select on pg_ts_cfg to wikiuser;"
$ psql -d wikidb -c "grant select on pg_ts_cfgmap to wikiuser;"
$ psql -d wikidb -c "grant select on pg_ts_dict to wikiuser;"
$ psql -d wikidb -c "grant select on pg_ts_parser to wikiuser;"
$ psql -d wikidb -c "update pg_ts_cfg set locale = current_setting('lc_collate') where ts_name = 'default';"

If you need to upgrade to PostgreSQL 8.3 from an older version of PostgreSQL, you may find the following links useful.

Wednesday Aug 20, 2008

Setting up Drupal with PostgreSQL on OpenSolaris 2008.05

Drupal is quickly gaining popularity among the many Open Source Content Management Systems (CMS) out there. AFAIK, Drupal is mostly used with MySQL on Linux, so I wanted to find out myself how easy (or hard) it is to get it working with PostgreSQL on OpenSolaris. Below were the steps I followed:

  1. Install OpenSolaris 2008.05. The installation from the Live CD went very smoothly, much simpler than previous versions of Solaris. Great job to the installer team! If you don't have the CD, you can download it from http://www.opensolaris.com/. Remember, OpenSolaris comes with the bare minimum, so additional software need to be installed from the network repository.

  2. Install Web Stack (Apache, PHP, MySQL, etc). After the OS installation, I fired up Firefox and just followed the instructions from the Welcome Page, clicked on "the world of OpenSolaris" link and that took me to http://www.opensolaris.com/learn/. From there, I followed these links: "Web Stack Getting Started Guide" > "2. Getting Started" > "Installing Web Stack Components" > "Setting Up Your AMP Development Environment". I decided to install everything using the following command.

  3. # pkg install amp-dev

  4. Initialize the Web Stack Environment & Start Apache. To do this, I just followed the Web Stack Getting Started Guide. Clicked on Applications > Developer Tools > Web Stack Getting Start Guide, and did the followings:
  5. Initialize the environment: Clicked on Applications > Developer Tools > Web Stack Initialize

    (The guide is a bit confusing, so run the following commands first before starting Apache/MySQL; otherwise you'll get errors)
    # svccfg import /var/svc/manifest/network/http-apache22.xml
    # svccfg import /var/svc/manifest/application/database/mysql.xml

    Start Apache & MySQL: Clicked on Applications > Developer Tools > Web Stack Admin > Start Apache2/MySQL servers

    Point browser to http://localhost so check that Apache is running.

  6. Install PostgreSQL. I wanted to install PostgreSQL 8.3 from the network repository, but I found out that 8.3 didn't make it to OpenSolaris 2008.05 which was based on Nevada build 86. To see all the available PostgreSQL packages, run.

    # pkg search -r postgres

    The output shows a lot of packages, but unfortunately the 8.3 packages are for later builds of Nevada. I did try to install one package, but it didn't work.

    I could build Postgres 8.3 myself, by fortunately Sun also provides the binary in tarball format at http://www.postgresql.org/ftp/binary/v8.3.3/solaris/opensolaris/i386/, so I just used this version.

  7. Setup PostgreSQL. I wanted to run Postgres using using the "postgres" user, but it's setup as a role by default, so I did the following to change it to a normal user:

    a) As root, run "passwd -d postgres"
    b) Edit /etc/user_attr and change the type for "postgres" from role to normal
    c) Change the home directory and shell for "postgres" using the usermod command

    Now I need to create the Postgres DB cluster. Make sure the directory where Postgres is installed is added to PATH.

    postgres$ initdb -D /directory/path/to/data
  8. Install Drupal 6.4. Follow the instructions in INSTALL.txt and copy the files to /var/apache2/2.2/htdocs.

    Follow the instructions in INSTALL.pgsql.txt to setup the database. Here's what I did. Make the Postgres is running.

    postgres$ createuser --pwprompt --encrypted --no-adduser --no-createdb drupal
    Enter password for new role:
    Enter it again:
    Shall the new role be allowed to create more new roles? (y/n) y

    postgres$ createdb --encoding=UNICODE --owner=drupal drupaldb

  9. Configure Drupal. Make sure Apache is running and point the browser to http://localhost, and you should see the Drupal setup page.

  10. That's it. Besides a couple of small hiccups, I think the whole process was quite straightforward.

Monday Jul 21, 2008

BIG News for the PostgreSQL team at Sun

If you haven't heard already, Peter Eisentraut is joining Sun to work on PostgreSQL. This is very exciting, and I'm looking forward to working with Peter on some interesting projects.

Just as Peter decided to join Sun, Josh Berkus decided to leave us. Josh has made many contributions, and he will be missed. I have learned a lot from him for the past couple of years. Josh, good luck with your new endeavors.

BTW, I'm at OSCON this week. I arrived here Saturday night so I could attend PDXPug Day. I'm glad I decided to attend this mini-conference because the talks were excellent and I also got to meet some cool PostgreSQL people.

Monday Apr 28, 2008

Austin's First PostgreSQL User Group Meeting

A few of us are getting together to start a PostgreSQL User Group in Austin. If you're interested in PostgreSQL and live in the area, come join us!

Our kick off meeting will be on Tuesday, May 6th from 6-8pm. Please RSVP to austinpug@decibel.org (also subscribe to austinpug@postgresql.org ) if you plan to attend so we can pre-register all visitors to save time and know how many pizza to get! BTW, pizza is FREE.

I'm pleased to be able to offer Sun's facility for this meeting. Here's the address and link to Google map.

Sun Microsystems
Building 8 - Longhorn Conference Room
5300 Riata Park Ct
Austin, TX

Thursday Apr 03, 2008

DTrace probes in PostgreSQL now work on Mac OS X Leopard

The issue with PostgreSQL's DTrace probes not working with Mac OS X Leopard as reported in the mail list has been fixed and checked into the 8.4 development tree. The problem had to do with the fact that Leopard's DTrace implementation not supporting the -G flag.

If you're curious about the gory details, check out the proposal, patch, and code commit.

With the new implementation, the steps for adding new probes are slightly different than before, but the provider and probe names remain the same. For details on how to use and add new probes, refer to the online doc.

Many thanks to Peter Eisentraut, Tom Lane, and Alvaro Herrera for their valuable feedback and assistance!

Monday Feb 04, 2008

Achieving High Availability with PostgreSQL using Solaris Cluster

Quick announcement: Sun has just released Solaris Express Developer Edition (SXDE) 1/08, a developer version that has all the latest and greatest Solaris features and tools. So what's new for PostgreSQL? The main updates/additions are 8.2.5, Perl driver, and pgAdmin III v1.6.3.

High availability (HA) is very important for most applications. PostgreSQL has an HA feature called warm standby or log shipping which allows one or more secondary servers to take over when the primary server fails. However, there are some limitations with PostgreSQL warm standby, two of which are:

  1. No mechanism in PostgreSQL to identify and perform automatic failover when the primary server fails
  2. No read-only support on the standby server

Fortunately, Solaris Cluster can be used to solve limitation #1. Solaris Cluster is a robust clustering solution that has been around for many years, and best of all it's open sourced and free.

Detlef Ulherr has recently implemented the Solaris Cluster agent to work with PostgreSQL warm standby. We discussed different possible use-case scenarios with PostgreSQL warm standby, and he came up with a design that I think will work well for non-shared storage clustering. Maybe not everything will be perfect initially, but as more people test out the agent, we'll know what can be improved. The agent is still in beta now and will be released soon, probably in a couple of months.

The cool thing with Solaris Cluster is that you can now setup a cluster on a single node using multiple Solaris Zones . This is extremely useful because it eliminates the need for multiple machines or complicated hardware setup if you just want to try it out or if you want a simple environment for doing development. Here's more details on the Solaris Cluster and Zones integration

Of course you wouldn't want to deploy your HA application on a single machine. In production environment, you should have at least a two nodes cluster. Please refer to the Solaris Cluster documentation for more info.

From my recent test, I setup the cluster on a single machine with two Solaris Zones. Here's how the automatic failover works in a nutshell:

  • Client connects to a logical host. The logical host is configured to have two resource groups, each containing a Zone.
  • The logical host initially points to the primary server (Zone 1) where PostgreSQL is running, and PostgreSQL is configured to ship WAL logs to the secondary server (Zone 2) where PostgreSQL is running in continuous recovery mode.
  • When primary server fails, the Solaris Cluster agent detects the failure and triggers the logical host to automatically switch to the IP address of the secondary server.
  • From the PostgreSQL client perspective, it's still using the same logical host, but the actual PostgreSQL server has moved to a different machine, all happens transparently.
  • If the client was connecting to the DB on the primary, the session would be disconnected momentarily and reconnected automatically to the DB on the secondary server, and the application would continue on its merry way.

The combination of PostgreSQL warm standby and Solaris Cluster can provide an enterprise class clustering solution for free (unless to need support services off course). So, please try it out and provide your feedback on what can be improved.

In my next blog, I will discuss how Solaris ZFS and Zones can be used in a clever way to overcome limitation #2. This idea has been used already, and some of you may have seen Theo's blog on this topic. I will provide a working sample code and step by step instructions for setting it up.

Wednesday Jan 16, 2008

Postgres and Solaris Virtualization

Many people today use virtualization technology to consolidate applications to fewer and more powerful systems to improve system utilization, save datacenter space and power. One way to achieve this on Solaris is through Zones . Combined with Solaris Resource Management , a Solaris Zone provides a virtualized environment where resources such as CPU and memory can be controlled.

To demonstrate how this works, I will show a simple example of how to run PostgreSQL in Solaris Zone and adjust CPU cap as appropriate for this application. You can also restrict other resources such as memory or number of processes in a zone, but this example only covers CPU capping.

Note: To follow this example, you need a machine with Solaris Express Developer Edition 9/07 or later installed.

1) Create a new zone. Here is a script you can use to automate the zone creation. Save it to a file called create_zone.sh, and run it like this ./create_zone.sh pgzone /zones By default, this script assign the zone 20% of a CPU. You can adjust this number(ncpus) as necessary.

    PrintMsg() {
      echo "$CURRENTTIME: $\*"
    if [[ -z "$1" || -z "$2" ]]; then
                    PrintMsg "Usage: $0  "
                    PrintMsg "Example: $0 pgzone /zones"
                    exit 2
    if [[ ! -d $2 ]]; then
                    PrintMsg "$2 does not exist or is not a directory"
                    exit 1
    PrintMsg "Creating a new zone"
    zoneadm -z $name list > /dev/null 2>&1
    if [ $? != 0 ]; then
            PrintMsg "Configuring $name"
            rm -f $commands
            echo "create" > $commands
            echo "set zonepath=$dir/$name" >> $commands
            echo "set autoboot=true" >> $commands
            echo "set scheduling-class=FSS" >> $commands
            echo "add capped-cpu" >> $commands
            echo "set ncpus=0.2" >> $commands
            echo "end" >> $commands
            echo "add capped-memory" >> $commands
            echo "set physical=256m" >> $commands
            echo "set swap=256m" >> $commands
            echo "end" >> $commands
            echo "set cpu-shares=10" >> $commands
            echo "set max-lwps=500" >> $commands
            echo "commit" >> $commands
            zonecfg -z $name -f $commands 2>&1 | \\
                sed 's/\^/    /g'
            PrintMsg "$name already configured"
    # Installing
    if [ `zoneadm -z $name list -p | \\
        cut -d':' -f 3` != "configured" ]; then
            PrintMsg "$name already installed"
            PrintMsg "Installing $name"
            mkdir -pm 0700 $dir/$name
            chmod 700 $dir/$name
            zoneadm -z $name install > /dev/null 2>&1
            PrintMsg "Setting up sysid for $name"
            rm -f $cfg
            echo "network_interface=NONE {hostname=$name}" > $cfg
            echo "system_locale=C" >> $cfg
            echo "terminal=xterms" >> $cfg
            echo "security_policy=NONE" >> $cfg
            echo "name_service=NONE" >> $cfg
            echo "timezone=US/Pacific" >> $cfg
            echo "root_password=Qexr7Y/wzkSbc" >> $cfg  # 'l1a'
    PrintMsg "Booting $name"
    zoneadm -z $name boot
2) Open a terminal. As root, log into the zone using zlogin (e.g zlogin pgzone).

3) Once you're in the zone, do the following:
   a. As root, su to postgres:
      # su - postgres

   b. Create PostgreSQL DB cluster:
      $ /usr/postgres/8.2/bin/initdb -D /var/postgres/8.2/data

   c. As root, use the SMF's svcadm command to start PostgreSQL:
      # /usr/sbin/svcadm enable postgresql:version_82

   d. Create and load a db called bench
      $ /usr/postgres/8.2/bin/createdb bench
      $ /usr/postgres/8.2/bin/pgbench -i -s 5 bench

   e. Run this Cartesian joint multiple times (try 5) to generate CPU load

      $ /usr/postgres/8.2/bin/psql -d bench -c "select count(\*) from accounts foo, accounts bar;" &
4) In the global zone, using another terminal window, run the following command to see cpu usage for each zone. Note the zone that Postgres is running should cap to around 20% if you have a single CPU system.
   # prstat -Z
5) You can dynamically adjust the amount of CPU assigned to the zone using the prctl command. In another terminal window, run:
   # prctl -n zone.cpu-cap -i zone 
So in a nutshell, that's how you can use Solaris Zones and Resource Management to improve system utilization in a virtualization environment. As I mention in the beggining, you can also cap other resouources as well which make the combination of Solaris Zones and resource management very powerful.



« July 2016