Sunday Apr 05, 2009

Effects of Flash/SSD on PostgreSQL - PostgreSQL East 2009

I presented a talk at PostgreSQL East 2009 at Drexel University this morning. The topic was "Effects of Flash/SSDs on PostgreSQL". Here are the slides from the presentation.

If you have questions please leave comments.

Wednesday Nov 12, 2008

Snapshots with PostgreSQL and Sun Storage 7000 Unified Storage Systems

Continuing from my earlier blog entry on setting up PostgreSQL with Sun Storage 7000 Unified Storage (Amber Road), let's now look at how to take snapshots on such systems.

For a typical PostgreSQL system there are two ways to take a backup:

Cold or Offline backup: Backing up PostgreSQL while the PostgreSQL server is not running. The advantage of such a backup is that it is simple to perform; the disadvantage is that the database system is unavailable for work. The general steps for a cold backup are: shut down the PostgreSQL server gracefully, then back up all the files, including $PGDATA, pg_xlog (if it is in a different location), and all the tablespace locations used by that server instance. After backing up the files, start the PostgreSQL server again and you are done. If you ever want to go back to that earlier backup version, you shut down the database, restore all the old files, and restart the database. This is essentially the same logic we will follow with snapshots, except that with a multi-gigabyte database the snapshot operation will be dramatically faster than a "tar" or "cpio" (especially considering that "tar" is a single-threaded application doing its best, but still slow on multi-core systems).

Let's consider an example of how to take a snapshot on OpenStorage systems using the Cold or Offline backup strategy.

Assume that the Sun Storage 7000 Unified Storage System has the hostname "amberroad" ("Sun Storage 7000 Unified Storage System" is a mouthful) and has a project named "postgres" with three shares exported: "pgdata", which holds $PGDATA; "pglog", which holds $PGDATA/pg_xlog; and "pgtbs1", which holds one of the tablespaces defined.

Then a backup/snapshot script (run as the postgres user) will look something like the following (Disclaimer: provided AS-IS; you will have to modify it for your own setup):

#!/bin/sh
date
echo "---------------------------------------------------------------"
echo "Shutting down the PostgreSQL database..."
echo "---------------------------------------------------------------"
pg_ctl -D $PGDATA stop
echo "---------------------------------------------------------------"
echo "Deleting earlier snapshot called testsnap.."
echo "---------------------------------------------------------------"
ssh -l root amberroad confirm shares select postgres snapshots select testsnap destroy
echo "---------------------------------------------------------------"
echo "Taking a new snapshot..."
echo "---------------------------------------------------------------"
ssh -l root amberroad shares select postgres snapshots snapshot testsnap
echo "---------------------------------------------------------------"
echo "Verifying the snapshot.."
echo "---------------------------------------------------------------"
ssh -l root amberroad shares select postgres snapshots show
ssh -l root amberroad shares select postgres snapshots select testsnap show
echo "---------------------------------------------------------------"
echo "Restarting the database.."
echo "---------------------------------------------------------------"
pg_ctl -D $PGDATA start -l $PGDATA/server.log
echo "---------------------------------------------------------------"
date

Note: A snapshot named "testsnap" must not already exist, or the snapshot operation will fail; hence the script first deletes any earlier snapshot called testsnap. You may modify the script to suit your priorities.

Now, in case you want to restore to the "testsnap" snapshot, you can use a sample script like the following:

#!/bin/sh
date
echo "Restoring the database"
echo "---------------------------------------------------------------"
echo "Shutting down the database..."
echo "---------------------------------------------------------------"
pg_ctl -D $PGDATA stop
echo "---------------------------------------------------------------"
echo "Restoring the snapshot.."
echo "---------------------------------------------------------------"
ssh -l root amberroad confirm shares select postgres select pgdata snapshots \
select testsnap rollback
ssh -l root amberroad confirm shares select postgres select pglog snapshots \
select testsnap rollback
ssh -l root amberroad confirm shares select postgres select pgtbs1 snapshots \
select testsnap rollback
echo "---------------------------------------------------------------"
echo "Restarting the database.."
echo "---------------------------------------------------------------"
pg_ctl -D $PGDATA start -l $PGDATA/server.log
echo "---------------------------------------------------------------"
echo "Database restored to earlier snapshot: testsnap."
echo "---------------------------------------------------------------"
date

NOTE: When you use the above script, all snapshots taken after "testsnap" are lost, as the shares are rolled back to the point in time when "testsnap" was taken. If you do not want to use rollback, there is also a feature to clone a snapshot.

Hot or Online backup: A hot or online backup is taken while the PostgreSQL server is running. The advantage is that the database server remains available; the disadvantage is that the setup is more complex than the Cold or Offline strategy. The recommended way to do a hot or online backup with PostgreSQL is in conjunction with its PITR (Point-in-Time Recovery) feature.

This is achieved by turning on continuous WAL archiving in PostgreSQL, issuing SELECT pg_start_backup('label'); before taking the snapshot, and issuing SELECT pg_stop_backup(); after the snapshot is completed.

For this you will also need another dedicated share under the "postgres" project just to hold the WAL archive files. (In my case I call it "pgwal" and mount it as /pgwal/archive.)

To turn on continuous WAL archiving you need to set the following variables in postgresql.conf (and then restart the PostgreSQL server):

archive_mode = on
archive_command = 'test ! -f /var/postgres/8.3/data/backup_in_progress || cp -i %p /pgwal/archive/%f < /dev/null'
archive_timeout = 3600
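The trigger-file gate in archive_command can be confusing at first: WAL segments are copied to the archive only while the backup_in_progress marker file exists. Here is a minimal local simulation of that logic (a sketch; temporary directories stand in for $PGDATA and /pgwal/archive, and the segment name is illustrative):

```shell
# Simulate the archive_command trigger-file gate locally.
WORK=$(mktemp -d)
mkdir -p "$WORK/data" "$WORK/archive"
echo "wal-bytes" > "$WORK/data/000000010000000000000001"

# $1 = %p (path of the segment), $2 = %f (file name of the segment)
archive_one() {
    test ! -f "$WORK/data/backup_in_progress" || cp -i "$1" "$WORK/archive/$2" < /dev/null
}

# No marker yet: the command exits successfully but archives nothing.
archive_one "$WORK/data/000000010000000000000001" 000000010000000000000001
BEFORE=$(ls "$WORK/archive" | wc -l)

# Marker present (as during the pg_start_backup window): the segment is copied.
touch "$WORK/data/backup_in_progress"
archive_one "$WORK/data/000000010000000000000001" 000000010000000000000001
AFTER=$(ls "$WORK/archive" | wc -l)

echo "archived before marker: $BEFORE, after marker: $AFTER"
```

Note that the command must exit 0 even when it skips the copy, which is exactly what the `test ... || cp ...` construct provides; PostgreSQL would otherwise keep retrying the segment.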

Then a sample hot backup/snapshot script can look as follows:

#!/bin/sh
#PGDATA=/var/postgres/8.3/data; export PGDATA # if not set
date
echo "---------------------------------------------------------------"
echo "Indicating PostgreSQL for a hot Snapshot.."
echo "---------------------------------------------------------------"
touch $PGDATA/backup_in_progress
psql -c "select pg_start_backup('MyHotSnapShot');" postgres
echo "---------------------------------------------------------------"
echo "Deleting earlier snapshot.."
echo "---------------------------------------------------------------"
ssh -l root amberroad confirm shares select postgres snapshots select testsnap destroy
echo "---------------------------------------------------------------"
echo "Taking a new snapshot..."
echo "---------------------------------------------------------------"
ssh -l root amberroad shares select postgres snapshots snapshot testsnap
echo "---------------------------------------------------------------"
echo "Verifying the snapshot.."
echo "---------------------------------------------------------------"
ssh -l root amberroad shares select postgres snapshots show
ssh -l root amberroad shares select postgres snapshots select testsnap show
echo "---------------------------------------------------------------"
echo "Indicating to PostgreSQL Server that Hot Snapshot is done."
echo "---------------------------------------------------------------"
psql -c "select pg_stop_backup();" postgres
rm $PGDATA/backup_in_progress
echo "---------------------------------------------------------------"
date

Restoring from a hot backup snapshot is also a bit tricky, since we need to decide whether to roll forward and up to what point. That is probably beyond the scope of this post, so I will just do a simple restore using all the files in the WAL archive.

First of all, restoring with PITR requires a recovery.conf file, which should contain the path to the WAL archives:

restore_command = 'cp /pgwal/archive/%f "%p"'

Also, since we will be rolling forward with the archive files, we need to remove all the old pg_xlog files, create an archive_status directory, remove the postmaster.pid file, and, as an artifact of our earlier script, remove the backup_in_progress marker file too.

(We will do the above steps as part of our restore script.)

A restore-from-snapshot script can look something like the following:

#!/bin/sh
#PGDATA=/var/postgres/8.3/data; export PGDATA # if not set
date
echo "Restoring the database"
echo "---------------------------------------------------------------"
echo "Shutting down the database..."
echo "---------------------------------------------------------------"
pg_ctl -D $PGDATA stop
echo "---------------------------------------------------------------"
echo "Restoring the snapshot.."
echo "---------------------------------------------------------------"
ssh -l root amberroad confirm shares select postgres select pgdata snapshots \
select testsnap rollback
ssh -l root amberroad confirm shares select postgres select pglog snapshots \
select testsnap rollback
ssh -l root amberroad confirm shares select postgres select pgtbs1 snapshots \
select testsnap rollback
#if there is reason to roll back the WAL archives, uncomment the next two lines
#ssh -l root amberroad confirm shares select postgres select pgwal snapshots \
#select testsnap rollback
echo "---------------------------------------------------------------"
echo "Make sure pg_xlog/* is empty..."
echo "Make sure pg_xlog/archive_status exists..."
echo "Make sure postmaster.pid is removed..."
echo "---------------------------------------------------------------"
rm $PGDATA/backup_in_progress
rm $PGDATA/postmaster.pid
rm $PGDATA/pg_xlog/*
mkdir -p $PGDATA/pg_xlog/archive_status
echo "---------------------------------------------------------------"
echo "Setting up recovery.conf for recovery.."
echo "---------------------------------------------------------------"
echo "restore_command = 'cp /pgwal/archive/%f \"%p\"'" > $PGDATA/recovery.conf
echo "---------------------------------------------------------------"
echo "Restarting the database.."
echo "---------------------------------------------------------------"
pg_ctl -D $PGDATA start -l $PGDATA/server.log
echo "---------------------------------------------------------------"
echo "Database restored."
echo "---------------------------------------------------------------"
date
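One line in the restore script that is easy to get wrong is the shell quoting for recovery.conf: the restore_command value needs single quotes in the file and escaped double quotes around %p. A quick standalone check (a sketch; a temporary directory stands in for $PGDATA):

```shell
# Verify the restore_command line comes out as PostgreSQL expects.
PGDATA=$(mktemp -d)
echo "restore_command = 'cp /pgwal/archive/%f \"%p\"'" > "$PGDATA/recovery.conf"
cat "$PGDATA/recovery.conf"
# → restore_command = 'cp /pgwal/archive/%f "%p"'
```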

Of course, the above scripts may not be optimal, but they give an idea of how one can use the snapshot features of Sun Storage 7000 Unified Storage Systems with PostgreSQL.



Wednesday Nov 05, 2008

Yes You Can - Save $$$ on cost of Proprietary Databases

The New Sun Microsystems announced a new GlassFish/MySQL/OpenSolaris SPECjAppServer2004 result today. The real highlight is that all the software used in this benchmark is open source. It is a win for open source software, including open source databases like PostgreSQL and MySQL. We need more such benchmarks to highlight the exorbitant prices charged by proprietary database and other software vendors, who force customers to hand over the bulk of their IT budgets.

Tom Daly's blog entry highlights that, in terms of price/performance, the proprietary database vendors (who conveniently also happen to be application server vendors) charge a much higher cost, around 10X including all hardware and software, for the same performance that can be obtained with open source database alternatives like PostgreSQL and MySQL. Check BM Seer's blog entry for more comparisons.


SPEC required disclosure: SPEC and SPECjAppServer are registered trademarks of the Standard Performance Evaluation Corporation. Results from www.spec.org as of 5th Nov 2008. All comparisons are based on the SPEC SPECjAppServer2004 JOPS@Standard metric from www.spec.org or on pricing made using the bill of materials included in each SPECjAppServer2004 result.

Thursday Oct 30, 2008

PostgreSQL 8.3 in pkg.opensolaris.org repository (Mostly)

PostgreSQL 8.3.4 is now in the pkg.opensolaris.org repository (most of it). Since there is still no single manifest tag for all the files, I thought I would highlight the names of the packages (especially the main ones with name changes) for easy reference. Also, as Hugo found, one of the packages (SUNWpostgr-83-server-data-root) is missing from the repository and needs tracking down.

The main packages to be installed (via "pkg install pkgname") are as follows:

  • SUNWpostgr-83-server
  • SUNWpostgr-83-libs  (pulled by SUNWpostgr-83-server as dependent)
  • SUNWpostgr-83-client
  • SUNWpostgr-jdbc
  • SUNWpostgr-83-docs
  • SUNWpostgr-83-contrib
  • SUNWpostgr-83-devel
  • SUNWpostgr-83-pl
  • SUNWpostgr-83-tcl
  • SUNWpgbouncer-pg83

Note that pgbouncer is now part of the repository as well.

Note that the JDBC package now includes the latest support; there is no separate version to download.
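For reference, the stack listed above can be pulled in with one invocation (a sketch; assumes a system with pkg.opensolaris.org as the publisher and root privileges, and relies on dependencies such as SUNWpostgr-83-libs being pulled in automatically):

```shell
pkg install SUNWpostgr-83-server SUNWpostgr-83-client SUNWpostgr-jdbc \
    SUNWpostgr-83-docs SUNWpostgr-83-contrib SUNWpostgr-83-devel \
    SUNWpostgr-83-pl SUNWpostgr-83-tcl SUNWpgbouncer-pg83
```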

Update: SUNWpostgr-83-server-data-root is no longer required; it is part of SUNWpostgr-83-server.



Sunday Jul 20, 2008

Is PostgreSQL dead at Sun?

Ever since Sun Microsystems acquired MySQL, the most common question I have had to answer is "Is PostgreSQL dead at Sun?". This question was raised not only by PostgreSQL contributors but also by other Sun employees and potential customers who may not even know that Sun actually sells PostgreSQL support services.

This question will be further fueled by Josh Berkus's departure. (Josh, we will miss you.) I admit that from the announcement of the MySQL acquisition until recently, I too had my doubts. But as Sun enters its new fiscal year FY09, clear direction from management has given the Sun PostgreSQL team the right goals: positive growth, a self-sustaining model, and a contribution to Sun's bottom line. The first positive outcome of the new year is the hiring of Peter Eisentraut. As Josh mentions in his blog, Peter has a lot of real customer experience, especially in migrations from other databases to PostgreSQL. He will bring that experience to the table and help us focus more on customer needs.


Of course this does not mean we have no other challenges in contributing to Sun's bottom line. One of the biggest challenges the Sun PostgreSQL team still faces is perception. Most potential customers are not even aware that Sun actually sells PostgreSQL support. I am just dazed by comments like "Didn't Sun buy MySQL? Do they really sell PostgreSQL support services?" (Hey Peter, do you have a silver bullet for this problem?)

It is important for us to overcome this perception problem this year so that we spend less time marketing that "we exist" and more time on "how do we help with your PostgreSQL problems?".

If you ask me, there are many other challenges, but I think they deserve their own blog entries. In the meantime, PostgreSQL is alive and kicking at Sun, and we look forward to a very interesting year with a refined focus.



Peter Eisentraut joining Sun

Just read on the "On the Record" blog on blogs.sun.com that Peter Eisentraut has joined Sun's PostgreSQL team. Peter Eisentraut is a PostgreSQL Core Team member. Read more about Peter on his blog.

I met Peter for the first time when I attended his tutorial at PGCon 2008. Looking forward to working with him.

Peter, Welcome to Sun.


Wednesday Jul 02, 2008

wal_buffers in postgresql 8.3.3

Recently Felix Malinkevich joined the ISV-E Open Source Team as an intern working on PostgreSQL along with us. For his first project he takes pgbench with PostgreSQL 8.3.3 for a drive on OpenSolaris and Ubuntu, using VirtualBox on his system. Read about his experience with wal_buffers in PostgreSQL 8.3.3.

Resources on  PostgreSQL here just went up.

 

Wednesday May 21, 2008

Problems with PostgreSQL on Multi-core Systems with Multi-Terabyte Data

Slides for today's presentation at PGCon 2008

Title:  Problems with PostgreSQL on Multi-core Systems with Multi-Terabyte Data

 

Friday May 16, 2008

PGCon 2008 next week in Ottawa, Canada

It is that time of the year when many PostgreSQL fans gather in Ottawa, Canada for PGCon 2008 next week. This will be my first visit to PGCon in Ottawa. Earlier this year I presented two sessions, "PostgreSQL and Benchmarks" and "Best Practices of PostgreSQL on Solaris", at PostgreSQL Conference 2008 East in Maryland. Thanks to that visit, I might recognize many people by face this time around.

Sun is a Gold Sponsor at PGCon 2008, and there will be quite a bit of Sun presence there. Josh, Max, Robert, Magne, Zdenek, Jim, Mayuresh, et al. will be present.

Josh Berkus is doing a tutorial, "GUCs: A Three Hour Tour", on Tuesday morning at 9:00 am.

Susanne Ebrecht (from the MySQL team @ Sun) will also be presenting "What PostgreSQL could learn from MySQL" on Thursday at 1:30 pm.

I will be presenting "Problems with PostgreSQL on multi-core Systems with multi-terabyte Data" on Thursday at 3:00 pm. I am presenting on behalf of the PostgreSQL Performance Team @ Sun, based on various performance work done by the group.

Robert Lor is also doing a 5-minute lightning talk on the new DTrace probes proposed for PostgreSQL 8.4. Not to steal his thunder, but the demo will be quite interesting. It is part of the Lightning Talks on Thursday at 5:30 pm.

Zdenek Kotala will be presenting "PostgreSQL Upgrade Project" on Friday at 3:00 pm. In-place upgrade is an eagerly awaited project.

Personally, I will be meeting many of the Sun folks for the first time in person; generally we talk on conference calls, over email, etc. Definitely looking forward to that.



 


Tuesday May 06, 2008

How OpenSource and ISVs can leverage OpenSolaris

I briefly mentioned in my earlier post that the OpenSolaris binary is probably just a chapter of a book. To really understand the benefits of OpenSolaris, let's look at the needs of a typical software application.

A Software Application at a very base level of being a usage product needs the following:

  1. Operating system: First and foremost, and generally nobody wants to rewrite one unless something they really need is missing. It is usually covered by the "system requirements", which define the operating systems supported.
  2. Method of distributing the software: either via CD, DVD, or internet download.
  3. Installer: installing the software on the target machine.
  4. Patching/upgrading: upgrading the software when new versions are released (especially with security updates).

So how does OpenSolaris solve these problems?

Let's start with (1). This is the first Solaris release I am aware of that makes it so friendly to bundle the operating system with a software product that customers of the application need not even be concerned whether the server (virtual, for that matter) is going to do anything other than run that application. I may not be far from the truth here, since I am aware of many people using virtual machines that each do only one task. Suddenly the number of operating system instances increases drastically, and I foresee more and more people bundling the operating system as part of their software application. In this sense OpenSolaris is already ahead of the curve, providing tools that make the whole bundling experience easy through open source projects, including the Distribution Constructor and the new Installer project. In fact, the new Installer project even makes it easy to install not only OpenSolaris but also your own application with it, and since it is all open source it can be modified to carry your own branding, giving a sense of control over the "key task" product highlighted during installation. (Maybe call it PostgreSQL OS.) Looks like I am also covering (3).

Now, to cover (2) and (4), the repository with the pkg(5) command is ready to take on those responsibilities. The thing to realize is that there can be more than one repository. For instance, http://pkg.opensolaris.org is a repository with all the packages for OpenSolaris itself, but an ISV or open source community can create its own repository for its software application. A software application that is easy to distribute, install, and update is now so easy to create and maintain with this ecosystem that it is hard to imagine why this was not done years ago.

This new ecosystem makes it easy to cater to multiple types of software applications:

1. Controlled software application, where all installations need to have an identical stack (creating a custom distribution helps achieve that).

2. Latest-and-greatest software application model (using pkg makes this very easy).

3. Pick-and-choose model: as repositories increase in number, there will be choices of where to get packages and pre-built binaries for your needs.

Of course, I am probably skipping many other requirements of an ISV, since every ISV is different, but if dissected, the new ecosystem can probably fill most (if not all) of the needs in this new world of Solaris.


About

Jignesh Shah is a Principal Software Engineer in Application Integration Engineering at Oracle Corporation. AIE enables integration of ISV products, including Oracle products, with Unified Storage Systems. You can also follow me on my blog: http://jkshah.blogspot.com
