Thursday Mar 27, 2008
By Jignesh Shah on Mar 27, 2008
PostgreSQL Conference East 2008 will probably be the first PostgreSQL user conference that I attend, and I will also be presenting a couple of sessions. One of my sessions, PostgreSQL and Benchmarks, is on Saturday at 2:30pm in CSIC 2117, and the other, Best Practices of PostgreSQL on Solaris, is on Sunday at 1:45pm in CSIC 3117 (the schedule can change!). The conference is being held at the Center for Scientific Computation and Mathematical Modeling at the University of Maryland, College Park.
It will be good to meet in person the many names I have only read in emails but never had a face to associate with. I am not sure whether any local Sun folks will be visiting the conference, but if you are, it will be good to meet you out there.
Wednesday Feb 27, 2008
By Jignesh Shah on Feb 27, 2008
As part of Project Indiana, which is working toward a binary distribution of OpenSolaris, there is a new OpenSolaris Developer Preview 2 release available, based on Solaris Express build 79a. So why am I interested in this "yet-another-distribution"?
Currently, when people are interested in running PostgreSQL on Solaris, check out where the site points them: it points to a Solaris download. The choice is either Solaris 10 8/07 (the latest production release) or the latest Solaris Express builds (bleeding-edge technology). So yes, it's already out there, and both releases include PostgreSQL 8.2. In fact, PostgreSQL 8.3 will soon be integrated into the latest Solaris Express releases. So what's the problem?
If you ask me, the problem is the size of the download. For a lean, mean database like PostgreSQL, whose binary download is less than 10MB, telling people to download the DVD image of Solaris Express, which is typically around 4GB, is just too much for what they really need. Coming back to PostgreSQL: what portions of Solaris do you really need to run PostgreSQL at its optimum? The kernel, ZFS, DTrace, monitoring tools, and maybe some basic X Windows to use pgAdmin III. So why force people to download the whole DVD every time a new Solaris Express build comes out?
Project Indiana helps cut that down to less than 1/5th of the original. The download size of the new OpenSolaris Developer Preview 2 is about 650MB, which fits on a CD image. It takes out a lot of "overhead", simplifies a lot of scenarios, and works best when you have a brand new server and want to use the operating system primarily to run a database on it (that is the way I see it; your view may differ).
Project Indiana is opening doors for "Database Appliances" with OpenSolaris at its core.
Anyway, I took it for a test drive on my laptop. After installation, OpenSolaris Developer Preview 2 has basic GNOME and Firefox already in place. (And that's how I am writing this blog entry.) The installation did not include any version of PostgreSQL, which did not surprise me: the whole idea of Project Indiana is to pull down the additional packages a user requires from the network. Of course, that means it is time to get familiar with the new Image Packaging System, or pkg(5).
First of all, I have to search whether there are any PostgreSQL-related packages on the default server that it refers to. (Make sure to use the -r option to search the remote server. It goes without saying that you need a working internet connection before you do the following.)
# pkg search -r postgres
It is good to see that PostgreSQL 8.1 and PostgreSQL 8.2 are already available for on-demand download. (I had cheated here: I checked before executing the command that they existed at http://pkg.opensolaris.org.) (NOTE: There are plans for a PostgreSQL 8.3 download too, but it will take some time to get it out there. In the meanwhile, the Solaris binaries (i386) from postgresql.org work on OpenSolaris Developer Preview 2. See below.)
Notice that 8.1 has only two packages while PostgreSQL 8.2 has quite a few out there. I remember that 8.1 used to have jumbo packages, but patching them was a pain since the patches ended up being the whole jumbo package again, and hence PostgreSQL 8.2 was broken down into a bunch of smaller packages. The list shows the original versions and also the final patched versions. That causes me a problem here, since I am not sure which packages to install and in what order. Fortunately, pkg(5) is aware of package dependencies, which solves part of the problem, but I still have to make sure all of them get installed. I figured the best way to narrow down the list is to filter on the 5.11-0.79 stamped packages, so that I see each package name I want to install only once.
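As a rough sketch of that filtering step (the two-column format below is a simplified stand-in for illustration, not literal pkg(5) output), a grep/awk pipeline can reduce the list to the unique package names at the 5.11-0.79 build:

```shell
# Simulated pkg search output, saved to a file (format simplified)
cat > /tmp/pkglist.txt <<'EOF'
SUNWpostgr-82-server 5.11-0.75
SUNWpostgr-82-server 5.11-0.79
SUNWpostgr-82-client 5.11-0.79
SUNWpostgr-81-server 5.11-0.75
EOF
# keep only the 5.11-0.79 builds, then print each package name once
grep '5\.11-0\.79' /tmp/pkglist.txt | awk '{print $1}' | sort -u
```

In practice you would pipe the real `pkg search -r postgres` output through the same filter.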
# pkg install SUNWpostgr-82-server
DOWNLOAD PKGS FILES XFER (MB)
Completed 2/2 166/166 12.96/12.96
Install Phase 289/289
Notice that IPS also downloaded the postgr-82-libs package.
# pkg search postgres
We can continue installing one package at a time
# pkg install SUNWpostgr-82-client
DOWNLOAD PKGS FILES XFER (MB)
Completed 1/1 64/64 3.66/3.66
Install Phase 117/117
However, installing multiple packages at once seems to make more sense to me:
# pkg install SUNWpostgr-82-jdbc SUNWpostgr-82-server-data-root SUNWpostgr-82-contrib SUNWpostgr-82-devel SUNWpostgr-82-pl SUNWpostgr-82-tcl SUNWpostgr-82-docs
DOWNLOAD PKGS FILES XFER (MB)
Completed 7/7 1514/1514 12.99/12.99
Install Phase 1667/1667
# ls /usr/postgres/8.2
bin doc etc include jdbc lib man share
Anyway, we are now ready with PostgreSQL 8.2.5 on OpenSolaris Developer Preview 2, just as one would normally use it on Solaris.
For now, the way to use PostgreSQL 8.3 on OpenSolaris Developer Preview 2 is to download the Solaris binaries from postgresql.org and install them in /opt as follows:
# cd /opt
# bunzip2 < /tmp/postgresql-8.3.0-sxde.i386.tar.bz2 | tar xpf -
# ls /opt/postgres/8.3-community/
bin doc etc include jdbc lib man share
Whichever version you use, just make sure you have the right directory in your PATH environment variable. Talking about PATH, you may also want to include the new /usr/gnu/bin along with your favorite paths.
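As a minimal sketch, using the install locations from this post (pick the line matching the version you actually installed):

```shell
# IPS-installed 8.2 lives under /usr/postgres; the 8.3 tarball under /opt
PATH=/usr/postgres/8.2/bin:/usr/gnu/bin:$PATH
# PATH=/opt/postgres/8.3-community/bin:/usr/gnu/bin:$PATH   # for 8.3 instead
export PATH
```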
Tuesday Feb 12, 2008
By Jignesh Shah on Feb 12, 2008
With the acquisition of MySQL, I expect that many people might end up comparing MySQL and PostgreSQL using sysbench. It is a micro-benchmark utility that includes an oltp mode which is used quite a bit to show MySQL performance, but it can also be used with PostgreSQL. So this post is about how to configure sysbench to work with PostgreSQL. (It is primarily a note to myself, since I had to hunt around to get it configured for PostgreSQL.)
First, download the latest version of sysbench. I had downloaded sysbench-0.4.8. After gunzip/untar, I had to figure out a few steps to get the right configure invocation for it.
I am going to use the Sun Studio compiler (since PostgreSQL in my case is also built with the Sun Studio compiler), so I will need the compiler cc in my path. Also, after a couple of tries, I found that I need the right version of the PostgreSQL binaries as well, since the configure script uses pg_config to figure out the libraries and include files. So make sure your path is set up right before running the configure script. You might also need make and gmake in your path.
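A quick sanity check before running configure can save a failed build; this little helper is just my own sketch of the idea, not part of sysbench:

```shell
# fail fast if any build-time tool is missing from PATH
check_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool"; return 1; }
  done
  echo "all found"
}
# the tools this post says the sysbench build needs
check_tools cc pg_config make gmake || true
```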
Also, if you don't have MySQL already installed on your system (and in your path), you have to use --without-mysql, otherwise compilation fails: --with-mysql is the default even if you select --with-pgsql, since sysbench can support multiple databases with the same binary.
$ cd sysbench-0.4.8
$ export PATH
$ ./configure --prefix=$HOME/sysbench-0.4.8 --without-mysql --with-pgsql \
CFLAGS="-xO2 -m64" CC=cc
$ gmake install
If all goes well in your setup, the sysbench binary should now be available in the $HOME/sysbench-0.4.8/bin directory.
Now, how to run the benchmark with PostgreSQL. First of all, make sure that the PostgreSQL instance is running and that a database for the test (for example, one called dbtest) is available. (I assume the reader is already familiar with Postgres, so I am skipping those steps.)
Now use the sysbench binary to set up the table for the oltp test. You will generally need to select the size of the table (in rows) when preparing it.
Here are the commands I used (replace pguser/pgpass with the right credentials):
$ ./sysbench --test=oltp --pgsql-user=pguser --pgsql-password=pgpass \
--pgsql-db=dbtest --db-driver=pgsql --oltp-dist-type=special \
--oltp-table-size=100000 --oltp-read-only=on --num-threads=16 prepare
$ ./sysbench --test=oltp --pgsql-user=pguser --pgsql-password=pgpass \
--pgsql-db=dbtest --db-driver=pgsql --oltp-dist-type=special \
--oltp-table-size=100000 --oltp-read-only=on --num-threads=16 run
Some of these arguments may not be needed for both statements, but since I was using bash history (up/down) a lot, it is just easier to swap prepare/run and num-threads while keeping the rest the same.
However, before doing prepare again, make sure to drop the table sbtest:
$ psql dbtest -c "drop table sbtest"
Enjoy using sysbench with PostgreSQL on Solaris.
Sunday Feb 03, 2008
By Jignesh Shah on Feb 03, 2008
Solaris Express, Developer Edition 1/08, or SXDE 1/08 as it is affectionately called within Sun, is being released on Monday. There are certain new features that I think need to be highlighted, especially for the PostgreSQL community.
For starters, the versions of PostgreSQL included in SXDE are PostgreSQL 8.1.10 and PostgreSQL 8.2.5. (Unfortunately, not all security fixes made it into the release.) The big news, however, is that the PostgreSQL servers, which are pretty hidden right now, have an administration GUI: yes, pgAdmin III is now included in SXDE.
There are also a lot of new features which are now quite well integrated with PostgreSQL.
For example, take NetBeans 6.0, which is newly included in SXDE 1/08. The most touted addition in NetBeans 6.0 is the Ruby feature, but many people fail to notice that by default NetBeans now knows there is a database called PostgreSQL (along with MySQL), and it also has the PostgreSQL JDBC driver all configured to connect to a PostgreSQL database. How about that for simplifying PostgreSQL access for Java applications?
Another new feature in SXDE 1/08 is "Project WebStack". Project WebStack, or "Web Application Stack optimized for Solaris", makes the Solaris equivalent of LAMP (which some people call SAMP, AMPS, etc.) easy to use out of the box. LAMP/SAMP is traditionally the Apache web server, MySQL, and PHP. WebStack also supports PostgreSQL, and hence PostgreSQL-loving folks like me now have a web server and PostgreSQL with PHP support, along with an IDE, NetBeans, all out of the box with this new version. Glassfish, the only application server that boasts of publishing results with PostgreSQL, is also included in the release.
So that takes care of application development using PostgreSQL.
A little-known secret about SXDE is that there is nothing preventing you from actually using it as a deployment platform. Why would one do that? For one, it is free; second, it has features that seem compelling enough, features that other versions may not have and that are needed now in order to deploy.
Are there any such features in SXDE? To my mind, yes, there are a few.
My favorite deployment features are as follows:
1. Support for a CIFS server in ZFS, along with iSCSI and NFS. ZFS volumes can now be exported from Solaris in three forms: NFS, iSCSI, and CIFS (Windows). Why is that important for the PostgreSQL community? I can use my favorite system, the Sun Fire X4500, purely as mirrored storage and use a bigger system like the Sun Fire X4450 to drive the PostgreSQL engine while the database is stored on the X4500. While the data sits on mirrored drives in the X4500 via ZFS, this lets me run PostgreSQL under the operating system of my choice (Solaris for me, Linux and/or Windows for many) on the 16-core Sun Fire X4450. So now I have 24TB (after mirroring) of storage talking to my server, allowing me to take snapshots or make clones for replication irrespective of the operating system running beneath PostgreSQL. This feature alone lets you use ZFS features in Solaris without moving your PostgreSQL application to Solaris. Definitely an "award"-worthy feature.
2. Sun xVM Server: the "hypervisor" that allows other operating systems to be hosted with SXDE as the "host" operating system. I see a lot of opportunities here for deployment too. For example, another of my pet projects (which means managers get to take the allocated time away from it, since they know I will end up doing it anyway sooner or later) is to create a framework for deploying PostgreSQL-based end-user applications as pre-configured, minimized, easy-to-set-up "appliances" on top of existing xVM servers. I think this model will be very beneficial for applications built from different open source products. One example that comes to mind is a SugarCRM deployment using PostgreSQL and Glassfish. When one uses components from different communities, some configuration work always falls out, but that work need not be repeated: once everything is properly set up, theoretically all you should do is "sys-unconfig" the virtual machine, make clones of it, and deploy them on any xVM server. The first boot takes you to OS configuration, but the application setup is already done and ready to start via SMF.
So overall, SXDE 1/08 has lots of new features that help PostgreSQL users, whether they are running their database on Solaris or not.
Wednesday Jan 09, 2008
By Jignesh Shah on Jan 09, 2008
As the days go by, more and more multi-core systems are popping up everywhere. In fact, with the advent of the new 4-socket quad-core Sun Fire X4450, 16-core systems are soon becoming common out there.
However, personally, I think these systems are still underutilized relative to their true potential. Of course virtualization is one way of increasing utilization, but that is like working around the symptoms rather than fixing the real problem. Software applications have fundamentally lagged behind microprocessor innovations, and operating systems have lagged too. Solaris, however advanced an operating system, still has a lot to achieve in this area. Yes, the kernel is multi-threaded, and yes, it can easily scale to hundreds of cores, but that scaling is generally achieved by creating copies of a process (or multiple connections, or multiple threads, however you look at it) at the APPLICATION level. One area that generally falls behind is its own utility commands, for example tar, cp, compress, or pick your favorite /usr/bin command. These utility programs will generally end up using only one core or virtual CPU. Solaris does provide a framework API for multi-threaded programs, but it still surprises me that not many people are making it a priority to ask for versions of the basic utilities that can use all the resources available. I think there is a significant loss of productivity in waiting for utilities to complete while the system is practically running idle loops on the rest of the cores.
A database is an application of an operating system, and a few commercial databases have handled these challenges and adapted well to multi-core systems. Even the open source databases are not far behind, as seen by the multi-threaded nature of MySQL and the scaling of PostgreSQL on multi-core systems. However, one area where even PostgreSQL lacks multi-core support is "utilities". Load, index creation, backup, and restore are to me all utilities of a database system, and when they run there are generally a lot of eyes watching and waiting for them to complete. Now, a lot can be done by having multiple connections: break up the work and execute the bits that can run in parallel over different connections. But again, to me that looks like the buck being conveniently passed to the database's APPLICATION to avail itself of its features.
Similarly, as the buck passes from one SERVICE PROVIDER to its APPLICATION, it picks up baggage along the way, and eventually the "eyes" of the end users, waiting to see their "simple" tasks done on systems boasting 16x the compute potential, watch those tasks take a long time to finish. What is the result? Loss of productivity and efficiency, wasted power running idle threads, and so on. Users then generally "curse" and "yell" just to kill the time while waiting for mundane tasks to finish on these multi-core systems.
Now, if you look at each core as a resource (like an IT software programmer), you have 16 programmers available to crunch code (or numbers, in the case of cores). For an IT project manager, budget is what generally limits the number of programmers. But when you have already paid for and got 16 programmers, a good IT project manager will try to use them in the best possible way. After all, the buck stops at the IT project manager's desk.
The question is: what is the best possible way to utilize the 16 programmers for a set of tasks? I think a good project manager will not call all 16 programmers and tell them to do all 'n' programming tasks, asking every one of them to assign 1/n-th of their time to each task. That just creates chaos. Why? Well, the assumptions are that the "skills" of all the programmers are the same, that all will take the same time to finish a particular task, and that if all of them work on all the tasks, the shortest time to complete everything will be achieved. Any good IT project manager will tell you that all of those assumptions are wrong. Then why in the world do we assume them on multi-core systems?
So what are we missing in "software"? Yep, you got it: we need an equivalent of the IT project manager to solve the chaos of multi-core systems, one which understands the "experience" and potential of each of its compute resources (programmers). But wait, Solaris engineers are already saying we have that project manager: it is called the Solaris scheduler. Duh... then what is wrong? Think about it again from an IT project manager's viewpoint. An IT project manager takes input from an IT director or senior manager regarding task priorities, and then works with a senior IT architect to break the project up into a series of "serial" and "parallel" tasks, assuming 16 skilled programmers, with the goal of finishing the highest-priority projects first. The way most operating system schedulers currently work is just to find an available compute resource on which to continue computing the "serial" task, with no intelligence about how the task could be broken into sets of serial and parallel components. So there is a difference between a scheduler and a project manager.
But then the question arises: can one project manager be intelligent about everything in the IT department? Maybe not; maybe that's why there is a hierarchy, where our project manager banks on assistant project managers with knowledge of particular tasks to help create a plan for the tasks it does not know. Which means every application now needs to provide a "project manager assistant" to guide the operating system's project manager on how best to run that application's tasks. Currently, apart from "nice" priority bribes to the operating system scheduler, an application does not provide much feedback on how it thinks it can be executed optimally on multi-core systems.
Come to think about it, these IT Project Managers do have a role to play in solving Engineering problems.
Now the onus is on the various operating system, database, and application architects to provide a framework through which applications can feed their own project manager input to the operating system's project manager. Maybe we should call it Project "Project Manager". :-)
In the meanwhile, what we can all do is start thinking about ways to convert at least the "utilities" that we own to utilize the multi-core resources on the system. Change software to adapt to multi-core, one utility at a time.
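As a trivial illustration of the idea (the file names here are invented), even plain shell can fan independent pieces of a job out across cores using background jobs:

```shell
# create a few sample files, then compress them in parallel
# rather than serially on one core
mkdir -p /tmp/par-demo
for i in 1 2 3 4; do
  printf 'data %s\n' "$i" > "/tmp/par-demo/chunk-$i.dat"
done
for f in /tmp/par-demo/chunk-*.dat; do
  gzip -f "$f" &          # each gzip runs on whatever core is free
done
wait                       # block until every background gzip finishes
```

Real multi-threaded utilities would of course split a single large input, but the per-core idleness the post describes is already visible with a serial loop over the same files.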
Monday Oct 29, 2007
By Jignesh Shah on Oct 29, 2007
Starting with PostgreSQL 8.2, I observed in the documentation that if wal_sync_method is set to open_datasync (O_DSYNC) or open_sync (O_SYNC), then PostgreSQL will also set the O_DIRECT flag (to bypass the file system buffer cache) where O_DIRECT is supported. PostgreSQL does that by checking whether O_DIRECT is defined and enabling it accordingly.
That's great: since both O_DSYNC and O_SYNC are synchronous flags, there is no use in having the file system cache those writes (though that may not be true for reads). But certainly, by default, if it is not writing directly to the disks, it is generally not getting the best response time on the completion of the write calls. Even on Solaris, the general recommendation is that if you open files with the O_DSYNC flag, it is better to write directly to the underlying disks.
However, there is a small problem: the O_DIRECT flag is not supported on Solaris. Hence the assumption that PostgreSQL, when using wal_sync_method set to open_datasync or open_sync, is writing directly to the disks is not true on Solaris. There is a different API, directio(3C), which needs to be used on Solaris; Solaris has no other way of knowing that the application wants to bypass the file system buffer cache. (The only other alternative is for the Solaris administrator to use forcedirectio as a mount option, but that applies at the filesystem level, impacting all files on that filesystem.)
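For reference, the filesystem-wide route would look something like this in /etc/vfstab (the device names and mount point here are placeholders for illustration):

```
# /etc/vfstab entry: forcedirectio applies to the whole filesystem,
# unlike the per-file directio(3C) call discussed below
/dev/dsk/c1t0d0s6  /dev/rdsk/c1t0d0s6  /pgdata  ufs  2  yes  forcedirectio
```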
The directio(3C) API was introduced in Solaris 8, and hence applications using it should compile on Solaris 8, 9, 10 and all OpenSolaris-based distributions.
I did a quick test by modifying the BasicOpenFile function in fd.c in PostgreSQL 8.3beta1 to advise directio(fd, DIRECTIO_ON) while still using wal_sync_method=open_datasync, and saw performance improvements with the recompiled binary. Of course, my quick hack turned on DIRECTIO for all files, and that is not something we want (remember the CLOG buffer thrashing from a couple of days ago?). It looks like we need a hacker to modify the code to advise DIRECTIO_ON for the XLOG, data, and index files when open_datasync or open_sync is used as wal_sync_method and fsync is enabled.
Friday Oct 26, 2007
By Jignesh Shah on Oct 26, 2007
Regarding the CLOG reads happening in my test with PostgreSQL 8.3beta1, the community advised that thrashing of the CLOG buffers could be what was causing those multiple reads. An interesting thing I learned from the community is that the most current CLOG buffer page is actually pinned (shared), while the remaining CLOG buffers are used for caching old pages. So the recommendation was to change NUM_CLOG_BUFFERS from the default of 8 to 16 in clog.h in the source code and recompile PostgreSQL 8.3beta1.
I tried the change, and it certainly seems to eliminate almost all the reads that I was seeing on the CLOG files. The downside is that it requires re-compiling PostgreSQL. If you do not want to recompile, then my earlier workaround still works in reducing the pain on the disk.
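The edit itself is a one-liner; here is an illustrative sketch against a throwaway copy of the define (in the real 8.3beta1 tree the file is src/include/access/clog.h, and you rebuild with configure/gmake afterwards):

```shell
# throwaway copy of the relevant line from clog.h, for illustration only
printf '#define NUM_CLOG_BUFFERS 8\n' > /tmp/clog-line.h
# double the CLOG buffer count as the community suggested
sed 's/NUM_CLOG_BUFFERS 8/NUM_CLOG_BUFFERS 16/' /tmp/clog-line.h
```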
I also tried out the new wal_writer_delay setting. The default is 200ms for now. I tried 100ms and found that I got gains similar to the default. But from what I understand, I am cutting the window of potential data loss (not data corruption, unlike fsync=off) from 600ms to 300ms. This does warrant more tests at still lower values, and the community is right in asking for more testing of the new setting.
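Put together as a postgresql.conf sketch (the values are from this test; synchronous_commit is my assumption of the companion 8.3 async-commit switch that the loss-window arithmetic above implies):

```
# postgresql.conf (8.3beta1)
synchronous_commit = off    # async commit: small loss window, no corruption risk
wal_writer_delay = 100ms    # default is 200ms; the loss window scales with this
fsync = on                  # unchanged; fsync=off risks corruption, not just loss
```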
Now to hit the other bottlenecks.. Hopefully PostgreSQL 8.3 becomes the tide of change in the expensive world of databases.
Friday Oct 12, 2007
By Jignesh Shah on Oct 12, 2007
We just published a world-record price/performance result using the industry-standard TPC-H data warehousing benchmark at the 3-terabyte scale factor, using DB2 9.1 on Solaris 10 8/07 with 10 Sun Fire X4500 (aka Thumper) systems.
A few highlights of the benchmark:
- Best Price/Performance in 3TB scale factor
- First industry-standard database benchmark publication for the Sun Fire X4500
- First DB2/Solaris TPC-H benchmark after a gap of 6 years
- First DB2 on Solaris x64 for Data Warehouse type of workload ever
- First DB2 9 result on Solaris with x64 based system
- Shows why the Sun Fire X4500 is a highly cost-effective system in the role of a data warehouse appliance
The benchmark configuration used 20 dual-core AMD Opteron processors overall, making it really cost-effective in terms of database license fees. The overall system provided about 15GB/sec of simultaneous I/O, which is itself amazing, if not surprising, for the Sun Fire X4500. Though the overall disk capacity was very high, not all of it was used for the benchmark, which means plenty of data marts can be stored on the same system too.
My related blog entry on the subject is also available to get some insight into it.
Sun Fire X4500 cluster 38,672.4 QphH@3000GB, $29.39 USD $/QphH@3000GB, available 10/12/07. TPC-H, QphH, $/QphH tm of Transaction Processing Performance Council (TPC). More info www.tpc.org.
Jignesh Shah is Principal Software Engineer in Application Integration Engineering, Oracle Corporation. AIE enables integration of ISV products including Oracle with Unified Storage Systems. You can also follow me on my blog http://jkshah.blogspot.com