Friday Feb 05, 2010

Building latest PostgreSQL on OpenSolaris

I am moving my PostgreSQL on OpenSolaris realted entries to a new external blog. Since it is not part of my $dayjob anymore. Hope you update your bookmarks too.

Read  "Building latest PostgreSQL CVS Head on OpenSolaris".

Wednesday Jan 13, 2010

New Year, New Role and New Build

Happy New Year. Its a new year and I have started a new role in Applications Integration Engineering which is part of the Sun Storage 7000 - Unified Storage Systems group. AIE's main charter is to integrate ISV products with Sun Storage 7000 family. I hope to continue working with databases and other applications and specially how it interacts and integrates with the FISHworks based products. Years ago, interestingly, I don't think I would have recommended NFS to be used with any database application. But looks like it is now way more stabilized in its current form. Then there is also iSCSI. But there is yet another way to connect to these systems soon which I think is more attractive to me and maybe even other database folks at large. More about that when time is right.

 Anyway with the new role, I thought it was time to update my existing OpenSolaris (b128a) to the latest OpenSolaris build 130. I must admit this has been the first OpenSolaris upgrade which was not as smooth as expected. First things first I got hit with bug with the naming of /dev repository. I first heard about the bug from George Drapeau but even though I worked around it I could still not upgrade to the latest build.  Then I heard from Mandy about the problem that if I had ever installed from /contrib repository I could still not upgrade to the latest build with the changed /dev name. I uninstalled all the software from /contrib and crossing my fingers the pkg image-update command still failed. Of course I then realized I probably had couple of packages from the /pending repository and even the Sun /extra repository. Uninstalling all the extra software was not fun but still the darn thing did not upgrade. Finally gave up and read about this forced upgrade using -f  as follows

# pkg image-update -f

and it worked. It started downloading the updates and finally created a new boot environment with the new build.

However the reboot to the new environment just stuck at the graphical boot with the orange bar going from left to right. After 5 minutes I killed the power and rebooted and this time used "e" on the grub menu and deleted the splashfile, foreground and background lines and changed the kernel boot line from console=graphics to console=text and pressed "b" to boot using the modified grub entry. I figured out that the X server refused to start. Cutting a long story short (it actually took me almost a day) to figure a simple solution, re-move my custom /etc/X11/xorg.conf (which I was forced to create few upgrades (b111a)  ago) so the X server can use its new defaults to start without any problems.

Of course that worked till I got the login and when I entered my login information, I ended with a white screen. Arrg yet another bug. Reading the mailing list got the following solution

$ pfexec /usr/sbin/unlink /usr/lib/xorg/modules/extensions/GL
$ pfexec ln -s ../../../../../var/run/opengl/server /usr/lib/xorg/modules/extensions/GL

With the above changes, finally rebooting the desktop into fresh working build 130 of OpenSolaris and I was ready to try out the new Thunderbird 3.0 and Firefox 3.5. Of course AWN (the mac like dock) worked for most part but the dock preferences refused to start. I did file a bug and it seems that it will be fixed in b131 but the quick fix is to edit

/usr/bin/awn-manager and replace the first line




and that should allow you to see your AWN dock preferences once again.

If you ask me was it worth all the pain to upgrade to this new version. My simple answer is yes

Few thing fixed for me:

  • The new login screen is much nicer (in last few builds I could hardly read what I was typing in the login name text field on a widescreen monitor.
  • On build 128a I saw that the screen saver unlock screen was taking a long time to respond which seems to have gone away with this build.
  • I like the full text search capabilities of Thunderbird 3.0

Of course your reasons may be different then mine to upgrade and who knows build 131 might be out soon in next week or two then it probably might be a smoother upgrade if you can wait for it. (I can't.)

Monday Sep 14, 2009

Infobright Tuning on OpenSolaris/Solaris 10

Recently I was working on a project which used Infobright as the database. The version tested was 3.1.1 both on OpenSolaris as well as Solaris 10. Infobright is like a column-oriented database engine for MySQL primarily targeted towards data warehouse, data mining type of project deployments.

While everything was working as expected, one thing we did notice that as number of concurrent connections tried to query against the database we noticed that queries deteriorated fast in the sense that not much parallel benefits were being squeezed from the machine. Now this sucks! (apparently sucks is now a technical term). It sucks because the server has definitely many  cores and typically each Infobright query still can at the max peg a core. So the expectation will be typically to atleast handle concurrent queries which is close to the number of cores  (figuratively speaking though in reality it depends).

 Anyway we started digging into this problem. First we noticed that CPU cycles were heavy so IO was probably not the culprit (in this case). Using plockstat we found

# plockstat -A -p 2039    (where 2039 is the PID of mysqld server running 4 simultaneous queries)

Mutex hold 

Count     nsec Lock                         Caller 
3634393     1122`libc_malloc_lock`_Znwm+0x2b 
3626645     1047`libc_malloc_lock`_ZdlPv+0xe 
    2 536317885 0x177b878                    mysqld`_ZN7IBMutex6UnlockEv+0x12 
   12  6338626 mysqld`LOCK_open             mysqld`_Z10open_tableP3THDP13st_table_listP11st_mem_rootPbj+0x55a 
 9057     1275`libc_malloc_lock`_Znwm+0x2b 
 8493     1051`libc_malloc_lock`_ZdlPv+0xe 
 7928     1119`libc_malloc_lock`_ZdlPv+0xe 
    5   326542 0x177b878                    mysqld`_ZN7IBMutex6UnlockEv+0x12 
  683     1189`libc_malloc_lock`_Znwm+0x2b 
  564     1339`libc_malloc_lock`_Znwm+0x2b 
  564     1274`libc_malloc_lock`_Znwm+0x2b 
  564     1156`libc_malloc_lock`_ZdlPv+0xe 
   17    36292 0x1777780                    mysqld`_ZN7IBMutex6UnlockEv+0x12 
    2   246377 mysqld`rccontrol+0x18        mysqld`_ZN7IBMutex6UnlockEv+0x12 
   57     8074 mysqld`_iob+0xa8   `_ZNSo5flushEv+0x30 
  218     1479`libc_malloc_lock`_Znwm+0x2b 
    4    78172 mysqld`rccontrol+0x18        mysqld`_ZN7IBMutex6UnlockEv+0x12 
    4    75161 mysqld`rccontrol+0x18        mysqld`_ZN7IBMutex6UnlockEv+0x12 

R/W reader hold 

Count     nsec Lock                         Caller 
   44     1171 mysqld`THR_LOCK_plugin       mysqld`_Z24plugin_foreach_with_maskP3THDPFcS0_P13st_plugin_intPvEijS3_+0xa3 
   12     3144 mysqld`LOCK_grant            mysqld`_Z11check_grantP3THDmP13st_table_listjjb+0x38c 
    1    14125 0xf7aa18                     mysqld`_ZN11Query_cache21send_result_to_clientEP3THDPcj+0x536 
    1    12089 0xf762e8                     mysqld`_ZN11Query_cache21send_result_to_clientEP3THDPcj+0x536 
    2     1886 mysqld`LOCK_grant            mysqld`_Z11check_grantP3THDmP13st_table_listjjb+0x38c 
    2     1776 mysqld`LOCK_grant            mysqld`_Z11check_grantP3THDmP13st_table_listjjb+0x38c 
    1     3006 mysqld`LOCK_grant            mysqld`_Z11check_grantP3THDmP13st_table_listjjb+0x38c 
    1     2765 mysqld`LOCK_grant            mysqld`_Z11check_grantP3THDmP13st_table_listjjb+0x38c 
    1     1797 mysqld`LOCK_grant            mysqld`_Z11check_grantP3THDmP13st_table_listjjb+0x38c 
    1     1131 mysqld`THR_LOCK_plugin       mysqld`_Z24plugin_foreach_with_maskP3THDPFcS0_P13st_plugin_intPvEijS3_+0xa3 

Mutex block 

Count     nsec Lock                         Caller 
 2175 11867793`libc_malloc_lock`_ZdlPv+0xe 
 1931 12334706`libc_malloc_lock`_Znwm+0x2b 
    3 93404485`libc_malloc_lock   mysqld`my_malloc+0x32 
    1    11581`libc_malloc_lock   mysqld`_ZN11Item_stringD0Ev+0x49 
    1     1769`libc_malloc_lock`_ZnwmRKSt9nothrow_t+0x20

Now typically if you see libc_malloc_lock in a plockstat for a  multi-threaded program then it is a sign that the default malloc/free routines in libc is the culprit since the default malloc is not scalable enough for a multi-threaded program. There are alternate implementations which are more scalable than the default. Two such options which are already part of OpenSolaris, Solaris 10 are and They can be forced to be used instead of the default without recompiling the binaries by preloading anyone of them before the startup command.

In case of the 64-bit Infobright binaries we did that by modifying the startup script mysqld-ib and added the following line just before invocation of mysqld command.

LD_PRELOAD_64=/usr/lib/64/; export LD_PRELOAD_64

What we found was now the response times for each query was more in-line as it was being executed on its own. well not true entirely but you get the point. For a 4 concurrent queries we found that it had improved from like 1X to 2.5X reduction in total execution time.

Similary when we used we found the reduction more like 3X when 4 queries were executing concurrently.

LD_PRELOAD_64=/usr/lib/64/; export LD_PRELOAD_64

Definitely something to use for all Infobright installations on OpenSolaris or Solaris 10.

In a following blog post we will see other ways to tune Infobright which are not as drastic as this one but still buys some percentage of improvements. Stay tuned!!

Tuesday Jun 02, 2009

Minimal OpenSolaris 2009.06 Appliance Image for VirtualBox 2.2.4

With the release of the OpenSolaris 2009.06, I thought it is time to update the Minimal OpenSolaris 2008.11  Appliance OVF image that I had created earlier. The script has been updated to create minimal OpenSolaris 2009.06 Appliance images for VirtualBox. 

How to use the OVF image?

  • Download VirtualBox 2.2.4 and install it on your host platform.
  • Download the OpenSolaris 2009.06 App OVF image zip file and then unzip it.
  • Fire up Virtualbox GUI and  use menu item VirtualBox->File->Import Appliance to import the image (using the  OSOL200906App.ovf file ) into a new VirtualBox VM
  • Start the newly created VM and in few minutes you will be  ready to login into OpenSolaris 2009.06 kernel.The preset login information is user: root with password: opensolaris.

Comments welcome.

Friday May 29, 2009

Read Only Scalability Patch

Simon Riggs of 2nd Quadrant recently submitted a patch for testing which should improve read only scalability of Postgres. I took it for a test drive for my setup. In the first set of tests I used the same benchmark as previous ones so as to have the same reference point.

It seems changing the Number of Buffer Partitions for this workload does not have any impact. My dataset for this iGen benchmark is pretty small and should easily fit under 2GB size and hence may not be stressing the buffer partitions too much to warrant bigger number. The patch still helps to get good healthy 4-6% gain in peak values.

Thursday May 28, 2009

Postgres 8.4 Testing with new JDBC Drivers

At PGCon 2009, Jesper Pedersen talked to me about the new Binary Transfer patch which was submitted to the JDBC Driver for Postgres 8.4. I thought it will be nice to compare how the JDBC 8.4 driver compared to older 8.3 JDBC Driver. Hence I took it for a drive

The 8.4 JDBC Driver with BinaryTransfer patch seems to get to a better peak faster but since to taper off at high clients. I don't know if this benchmark was the right benchmark for it. Need more benchmarks which uses JDBC to see the performance difference with this feature.

Wednesday May 27, 2009

Postgres on OpenSolaris using 2x Quad Cores: Use FX Scheduler

During my PGCon 2009 presentation there was a question on the saw tooth nature of the workload results on the high end side of benchmark runs. To which Matthew Wilcox (from Intel) commented it could be scheduler related. I did not give it much thought at that time till today when I was trying to do some iGen runs for the JDBC Binary Transfer patch (more on that in another blog post) and also Simon's read only scalability runs . Then I realized that I was not following one of my one tuning advice for running Postgres on OpenSolaris. The advice is to  use FX Class of scheduler instead of the default TS Class on OpenSolaris . More details on various scheduler classes can be found on

Now how many times I have forgotten to do that with Postgres on OpenSolaris I have no idea. But yes it is highly recommended specially on multi-core systems to use FX scheduler class for Postgres on OpenSolaris. How much gain are we talking about? The following graph will give an indication using the default TS scheduler class Vs the FX Scheduler class using the iGen benchmark.

The gain is about 14% by just switching over to FX Class. How did I get Postgres server instance to use FX class? I cheated and put all processes of the user (with userid 236177)  in FX class using the following command line.

# priocntl -s -c FX -i uid 236177

One thing to figure out is how to make sure Postgres uses FX scheduler class out of the box on OpenSolaris so I don't keep forgetting about that minute performance tip.

Friday May 22, 2009

PGCon 2009: Performance Comparison of Postgres 8.3 Vs Postgres 8.4

On the first day of PGCon 2009 I presented on my results of my testing with Postgres 8.4beta1 vs the earlier version (8.3.7). The good news is it should not cause any regressions to existing users of 8.3.7 to upgrade and exploit the opportunity to use the new features of Postgres 8.4. 

Comments/Questions welcome.

Monday May 18, 2009

Postgres 8.4 Lock Wait Statistics Tool

While working on my upcoming presentation for PGCon 2009 on Thursday, I found that sometimes it is misleading to just take one snapshot of locks to figure the hot locks in PostgreSQL workload characterization.

So again starting from one of the DTrace scripts I arrived at pglockwait_84.d

NOTE: It only works with operating systems that support DTrace. I have only tested it on OpenSolaris as of now.

It can either be used to track to summarize all PostgreSQL backends (using '\*')  or selected one using process id using 10 second interval. It also prints time so that it can be dumped into a file for post-processing analysis. 

An example output  is show below during dbt-2 runs using PostgreSQL 8.4 beta1.

# ./pglockwait_84.d '\*' 2009 May 19 02:52:14 Lock-Id Mode Wait-Time(ms) Count Dynamic Locks Exclusive 0 5 ProcArrayLock Shared 0 37 Dynamic Locks Shared 1 52 CLogControlLock Exclusive 1 85 BufFreelistLock Exclusive 1 81 CLogControlLock Shared 1 103 ProcArrayLock Exclusive 2 112 BgWriterCommLock Exclusive 10 123 BufMappingLock Exclusive 11 636 XidGenLock Exclusive 17 2 BufMappingLock Shared 34 1566 WALInsertLock Exclusive 49 2305 LockMgrLock Exclusive 65 852 2009 May 19 02:52:24 Lock-Id Mode Wait-Time(ms) Count XidGenLock Shared 0 1 XidGenLock Exclusive 0 12 ProcArrayLock Shared 1 86 BufFreelistLock Exclusive 4 240 BgWriterCommLock Exclusive 5 213 Dynamic Locks Shared 5 157 CLogControlLock Exclusive 6 238 CLogControlLock Shared 6 384 ProcArrayLock Exclusive 57 360 Dynamic Locks Exclusive 158 7 WALInsertLock Exclusive 187 7837 LockMgrLock Exclusive 226 3251 BufMappingLock Exclusive 289 2141 BufMappingLock Shared 895 5513 2009 May 19 02:52:34 Lock-Id Mode Wait-Time(ms) Count XidGenLock Shared 0 0 Dynamic Locks Exclusive 0 6 XidGenLock Exclusive 0 5 ProcArrayLock Shared 1 76 BufFreelistLock Exclusive 3 183 BgWriterCommLock Exclusive 4 118 ProcArrayLock Exclusive 5 229 Dynamic Locks Shared 5 91 CLogControlLock Exclusive 29 198 CLogControlLock Shared 62 272 BufMappingLock Exclusive 141 1685 LockMgrLock Exclusive 206 2175 WALInsertLock Exclusive 221 5540 BufMappingLock Shared 279 4180 2009 May 19 02:52:44 Lock-Id Mode Wait-Time(ms) Count XidGenLock Shared 0 0 Dynamic Locks Exclusive 0 3 XidGenLock Exclusive 0 5 ProcArrayLock Shared 0 67 BgWriterCommLock Exclusive 1 69 BufFreelistLock Exclusive 2 148 CLogControlLock Shared 3 262 CLogControlLock Exclusive 4 199 ProcArrayLock Exclusive 47 277 WALWriteLock Exclusive 64 2 BufMappingLock Exclusive 79 1599 WALInsertLock Exclusive 151 5949 LockMgrLock Exclusive 198 2377 BufMappingLock Shared 223 4345 Dynamic Locks Shared 1568 144 \^C

It throws an output every 10 second and the time spent in acquiring the locks. For the BufMappingLock, LockMgrLock and Dynamic Locks it aggregates all of them together respectively. It's bit high on system resources if you track all Postgres backends but if you already know which one then it can be low on overhead. Hope it is useful to you too as I found it for my purpose.

Friday May 15, 2009

PostgreSQL Transactions Per Second Using Dtrace

 I modified one of Robert's dtrace scripts so that it is  useful for my purpose to measure often asked transactions per second  for random workload running on PostgreSQL.

The script is as follows:

#!/usr/sbin/dtrace -qs
	@startpersec["New"] = count();
	@commitpersec[ "Commit"] = count();
	@abort["Abort"] = count();
        printf("\*\*\*\*\*\*\*\* Transactions Per Second \*\*\*\*\*\*\*\*\*\\n");
	printf("%20s %15s\\n", "Txn Type", "Count");
	printa("%20s %@15d\\n", @startpersec);
	printa("%20s %@15d\\n", @commitpersec);
	printa("%20s %@15d\\n", @abort);

UPDATE: You can also download it pgtps.d

When you execute it you see outputs every second as follows:

# ./tps.d
\*\*\*\*\*\*\*\* Transactions Per Second \*\*\*\*\*\*\*\*\*
            Txn Type           Count
                 New             192
              Commit             192
               Abort               1

\*\*\*\*\*\*\*\* Transactions Per Second \*\*\*\*\*\*\*\*\*
            Txn Type           Count
                 New             175
              Commit             172
               Abort               0

\*\*\*\*\*\*\*\* Transactions Per Second \*\*\*\*\*\*\*\*\*
            Txn Type           Count
                 New             195
              Commit             198
               Abort               0

\*\*\*\*\*\*\*\* Transactions Per Second \*\*\*\*\*\*\*\*\*
            Txn Type           Count
                 New             183
              Commit             178
               Abort               2

How to interpret the output?

  • New mentions how many transactions started per second
  • Commit talks about how many transactions commited per second.
  • Aborts talks about transactions aborted in that second

Useful specially when some one  asks a questions that they are generally reading from a questionaire like how many transactions per second are we doing?

Where is your TPS report?


Jignesh Shah is Principal Software Engineer in Application Integration Engineering, Oracle Corporation. AIE enables integration of ISV products including Oracle with Unified Storage Systems. You can also follow me on my blog


« April 2014