Thursday Apr 15, 2010

My Last Day and Blog Post at Sun

I am close to finishing my 10th year at Sun Microsystems, Inc., now part of Oracle America, Inc. However, before I finish the year, it is time for me to move on to my new ventures.

Today is my last day at Sun/Oracle.

My new coordinates and my LinkedIn profile.

Stay in touch.

Friday Feb 05, 2010

Building latest PostgreSQL on OpenSolaris

I am moving my PostgreSQL on OpenSolaris related entries to a new external blog, since it is no longer part of my $dayjob. I hope you update your bookmarks too.

Read "Building latest PostgreSQL CVS Head on OpenSolaris".

Wednesday Jan 20, 2010

Sun rules in PeopleSoft NA Payroll Benchmarks

Sun had previously published record numbers with the PeopleSoft NA Payroll 240K benchmark using the Sun F5100 Flash Storage Array. However, Giri just announced a follow-on benchmark that beats even that number with twice the number of streams, using the Sun Enterprise M4000 server and Sun F5100 Flash Storage Array.

Read more on Giri Mandalika's blog.



Wednesday Jan 13, 2010

New Year, New Role and New Build

Happy New Year. It's a new year and I have started a new role in Applications Integration Engineering (AIE), which is part of the Sun Storage 7000 - Unified Storage Systems group. AIE's main charter is to integrate ISV products with the Sun Storage 7000 family. I hope to continue working with databases and other applications, and especially on how they interact and integrate with the FISHworks based products. Interestingly, years ago I don't think I would have recommended using NFS with any database application, but it looks like it is now far more stable in its current form. Then there is also iSCSI. But there will soon be yet another way to connect to these systems, which I think is more attractive to me and maybe even to other database folks at large. More about that when the time is right.

Anyway, with the new role, I thought it was time to update my existing OpenSolaris (b128a) to the latest OpenSolaris build 130. I must admit this has been the first OpenSolaris upgrade that was not as smooth as expected. First things first: I got hit by a bug with the naming of the /dev repository. I first heard about the bug from George Drapeau, but even though I worked around it I still could not upgrade to the latest build. Then I heard from Mandy that if I had ever installed anything from the /contrib repository I would still not be able to upgrade to the latest build with the changed /dev name. I uninstalled all the software from /contrib and, with my fingers crossed, ran the pkg image-update command, which still failed. Of course I then realized I probably had a couple of packages from the /pending repository and even the Sun /extra repository. Uninstalling all the extra software was not fun, but still the darn thing did not upgrade. Finally I gave up and read about forcing the upgrade using -f as follows:

# pkg image-update -f

and it worked. It started downloading the updates and finally created a new boot environment with the new build.

However, the reboot into the new environment got stuck at the graphical boot with the orange bar going from left to right. After 5 minutes I killed the power and rebooted, and this time pressed "e" on the grub menu, deleted the splashfile, foreground and background lines, changed the kernel boot line from console=graphics to console=text, and pressed "b" to boot using the modified grub entry. I figured out that the X server refused to start. Cutting a long story short (it actually took me almost a day to figure out), the simple solution was to remove my custom /etc/X11/xorg.conf (which I had been forced to create a few upgrades ago, at b111a) so that the X server could use its new defaults and start without any problems.

Of course that worked until I got the login screen, and when I entered my login information I ended up with a white screen. Arrg, yet another bug. Reading the mailing list turned up the following solution:

$ pfexec /usr/sbin/unlink /usr/lib/xorg/modules/extensions/GL
$ pfexec ln -s ../../../../../var/run/opengl/server /usr/lib/xorg/modules/extensions/GL

With the above changes I finally rebooted the desktop into a fresh working build 130 of OpenSolaris and was ready to try out the new Thunderbird 3.0 and Firefox 3.5. Of course AWN (the Mac-like dock) worked for the most part, but the dock preferences refused to start. I did file a bug and it seems that it will be fixed in b131, but the quick fix is to edit

/usr/bin/awn-manager and replace the first line




and that should allow you to see your AWN dock preferences once again.

If you ask me whether it was worth all the pain to upgrade to this new version, my simple answer is yes.

A few things fixed for me:

  • The new login screen is much nicer (in the last few builds I could hardly read what I was typing in the login name text field on a widescreen monitor).
  • On build 128a the screen saver unlock screen was taking a long time to respond, which seems to have gone away with this build.
  • I like the full text search capabilities of Thunderbird 3.0.

Of course your reasons to upgrade may be different than mine, and who knows, build 131 might be out in the next week or two and it might be a smoother upgrade if you can wait for it. (I can't.)

Monday Oct 12, 2009

Accelerate your Payroll Performance with F20 PCIe Card

I guess you have already heard about the Sun Storage F5100 Flash Array and its world record benchmarks.

But it's not the F5100 that I am going to talk about but its smaller sibling, the Sun Flash Accelerator F20 PCIe Card. The name is a mouthful like all Sun product names, so I will just call it "The Accelerator Card" in the remainder of this blog entry. Of course the idea is not to start with the answer and then find a problem for it. What I am going to narrate is how we saw a problem and then thought of using this answer to solve it.

Recently our group, ISV-E, was doing our standard thing of making applications run best on Sun. In this particular project, with PeopleSoft Enterprise 9.0 on an M5000 system using Sun Storage 6540, we encountered a problem: certain batch jobs were taking a long time to execute. PeopleSoft Enterprise 9.0 actually has ways to break up jobs and run them in parallel so as to use the multiple cores of a multi-processor system. Yet we could not leverage the system well enough to be satisfied. This project used Oracle Database 11g. I have got to give it to Oracle, they do have good tools. We used Oracle Enterprise Manager and saw that, for the troubled batch process, it was showing a lot of blue in its output.

Also, looking at the top objects, the tool reported which tables and indexes were troublesome and causing that amount of blue to appear in the chart. This "Blue" problem is what led us to the idea of testing the Accelerator Card in the system to see if it could help. We created a few tablespaces, spread them across the four Flash Modules on the Accelerator Card, and moved the highly active (or "hot") tables and indexes to the newly created tablespaces. What we saw was simply a huge reduction in the blue area and more green. That led to the slogan in our team:

"Go Green with the Accelerator Card !"
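The move of hot objects onto the flash-backed tablespaces could be scripted roughly as below. This is only a sketch, not the statements used in the project: the tablespace name and the table and index names in the usage note are hypothetical, and the function merely prints standard Oracle DDL for review.

```shell
# Emit Oracle DDL that relocates "hot" tables and indexes to another
# tablespace. All object names passed in are hypothetical examples.
# usage: gen_move_sql TABLESPACE table:NAME|index:NAME ...
gen_move_sql() {
    ts=$1; shift
    for obj in "$@"; do
        kind=${obj%%:*}     # text before the first colon
        name=${obj#*:}      # text after the first colon
        case "$kind" in
            table) echo "ALTER TABLE $name MOVE TABLESPACE $ts;" ;;
            index) echo "ALTER INDEX $name REBUILD TABLESPACE $ts;" ;;
            *)     echo "unknown object kind: $kind" >&2; return 1 ;;
        esac
    done
}
```

For example, `gen_move_sql FLASH_TS table:HOT_TABLE index:HOT_TABLE_IDX` prints one statement per object. Moving a table leaves its indexes unusable until they are rebuilt, so list each table before its indexes.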

The Accelerator Card not only reduced the time of this process but also of many other batch processes which had high IO components. Here is a relative comparison of how it helped (with an additional slight boost from upgrading the SPARC64 VII CPUs from 2.4GHz to 2.53GHz).

Of course the next question is what happens if you take the same thing to its bigger sibling, the Sun Storage F5100 Flash Array. Well, that's exactly what we did and, as they say, the rest is history. (Hint: Read the world records link and search for PeopleSoft.) For more information check out Vince's blog entry on PeopleSoft Enterprise Payroll 9.0 NA and also Why Sun Storage F5100 is good for PeopleSoft 9.0 NA Payroll application.

Truly, if you use Oracle and use Oracle Enterprise Manager to monitor your application performance, and you are turning blue from seeing a lot of blue area in the chart, then just remember

"Go Green with the Accelerator Card !"

Monday Sep 14, 2009

Infobright Tuning on OpenSolaris/Solaris 10

Recently I was working on a project which used Infobright as the database. The version tested was 3.1.1, on both OpenSolaris and Solaris 10. Infobright is a column-oriented database engine for MySQL, primarily targeted at data warehouse and data mining deployments.

While everything was working as expected, one thing we did notice was that as the number of concurrent connections querying the database grew, query performance deteriorated fast, in the sense that not much parallel benefit was being squeezed from the machine. Now this sucks! (Apparently "sucks" is now a technical term.) It sucks because the server definitely has many cores, and each Infobright query can typically peg at most one core. So the expectation would be to handle at least a number of concurrent queries close to the number of cores (figuratively speaking; in reality it depends).

Anyway, we started digging into this problem. First we noticed that CPU utilization was high, so IO was probably not the culprit (in this case). Using plockstat we found the following:
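One rough way to check that expectation is to time the same query at increasing concurrency levels. The helper below is a minimal sketch, not the harness used in the project. It starts N copies of a command in the background, waits for each one, and reports how many failed. The client command in the usage note is a hypothetical placeholder.

```shell
# run_concurrent N CMD [ARGS...]
# Start N copies of CMD in parallel, wait for each, and report failures.
# Returns non-zero if any copy failed.
run_concurrent() {
    n=$1; shift
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1 &      # discard query output, keep exit status
        pids="$pids $!"
        i=$((i + 1))
    done
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$n started, $failed failed"
    [ "$failed" -eq 0 ]
}
```

In bash or ksh, wrapping the call with the shell's `time` keyword, for instance `time run_concurrent 4 ./run_query.sh` (where run_query.sh is a hypothetical script that issues one query), gives the wall-clock time for 1, 2, 4 and more copies. If the machine scaled perfectly, the elapsed time would stay roughly flat until the number of copies reached the number of cores.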

# plockstat -A -p 2039    (where 2039 is the PID of mysqld server running 4 simultaneous queries)

Mutex hold 

Count     nsec Lock                         Caller 
3634393     1122`libc_malloc_lock`_Znwm+0x2b 
3626645     1047`libc_malloc_lock`_ZdlPv+0xe 
    2 536317885 0x177b878                    mysqld`_ZN7IBMutex6UnlockEv+0x12 
   12  6338626 mysqld`LOCK_open             mysqld`_Z10open_tableP3THDP13st_table_listP11st_mem_rootPbj+0x55a 
 9057     1275`libc_malloc_lock`_Znwm+0x2b 
 8493     1051`libc_malloc_lock`_ZdlPv+0xe 
 7928     1119`libc_malloc_lock`_ZdlPv+0xe 
    5   326542 0x177b878                    mysqld`_ZN7IBMutex6UnlockEv+0x12 
  683     1189`libc_malloc_lock`_Znwm+0x2b 
  564     1339`libc_malloc_lock`_Znwm+0x2b 
  564     1274`libc_malloc_lock`_Znwm+0x2b 
  564     1156`libc_malloc_lock`_ZdlPv+0xe 
   17    36292 0x1777780                    mysqld`_ZN7IBMutex6UnlockEv+0x12 
    2   246377 mysqld`rccontrol+0x18        mysqld`_ZN7IBMutex6UnlockEv+0x12 
   57     8074 mysqld`_iob+0xa8   `_ZNSo5flushEv+0x30 
  218     1479`libc_malloc_lock`_Znwm+0x2b 
    4    78172 mysqld`rccontrol+0x18        mysqld`_ZN7IBMutex6UnlockEv+0x12 
    4    75161 mysqld`rccontrol+0x18        mysqld`_ZN7IBMutex6UnlockEv+0x12 

R/W reader hold 

Count     nsec Lock                         Caller 
   44     1171 mysqld`THR_LOCK_plugin       mysqld`_Z24plugin_foreach_with_maskP3THDPFcS0_P13st_plugin_intPvEijS3_+0xa3 
   12     3144 mysqld`LOCK_grant            mysqld`_Z11check_grantP3THDmP13st_table_listjjb+0x38c 
    1    14125 0xf7aa18                     mysqld`_ZN11Query_cache21send_result_to_clientEP3THDPcj+0x536 
    1    12089 0xf762e8                     mysqld`_ZN11Query_cache21send_result_to_clientEP3THDPcj+0x536 
    2     1886 mysqld`LOCK_grant            mysqld`_Z11check_grantP3THDmP13st_table_listjjb+0x38c 
    2     1776 mysqld`LOCK_grant            mysqld`_Z11check_grantP3THDmP13st_table_listjjb+0x38c 
    1     3006 mysqld`LOCK_grant            mysqld`_Z11check_grantP3THDmP13st_table_listjjb+0x38c 
    1     2765 mysqld`LOCK_grant            mysqld`_Z11check_grantP3THDmP13st_table_listjjb+0x38c 
    1     1797 mysqld`LOCK_grant            mysqld`_Z11check_grantP3THDmP13st_table_listjjb+0x38c 
    1     1131 mysqld`THR_LOCK_plugin       mysqld`_Z24plugin_foreach_with_maskP3THDPFcS0_P13st_plugin_intPvEijS3_+0xa3 

Mutex block 

Count     nsec Lock                         Caller 
 2175 11867793`libc_malloc_lock`_ZdlPv+0xe 
 1931 12334706`libc_malloc_lock`_Znwm+0x2b 
    3 93404485`libc_malloc_lock   mysqld`my_malloc+0x32 
    1    11581`libc_malloc_lock   mysqld`_ZN11Item_stringD0Ev+0x49 
    1     1769`libc_malloc_lock`_ZnwmRKSt9nothrow_t+0x20

Now typically, if you see libc_malloc_lock in the plockstat output for a multi-threaded program, it is a sign that the default malloc/free routines in libc are the culprit, since the default malloc is not scalable enough for a multi-threaded program. There are alternative implementations which are more scalable than the default. Two such options are already part of OpenSolaris and Solaris 10. They can be forced into use instead of the default, without recompiling the binaries, by preloading either one of them before the startup command.

In the case of the 64-bit Infobright binaries, we did that by modifying the startup script mysqld-ib, adding the following line just before the invocation of the mysqld command.
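To put a number on how much time goes to one lock, the plockstat rows can be totalled per lock name. The sketch below assumes the nsec column is the average duration per event, so Count multiplied by nsec approximates the total time. The function name is made up for illustration. On Solaris, nawk or /usr/xpg4/bin/awk may be needed in place of awk for the -v option.

```shell
# Sum Count * nsec for every plockstat row that names the given lock.
# Reads plockstat output on stdin; prints the total in nanoseconds.
lock_time_ns() {
    awk -v lock="$1" '
        # Only rows that mention the lock and start with a numeric Count.
        index($0, lock) && $1 ~ /^[0-9]+$/ {
            total += $1 * $2    # awk uses the leading digits of field 2
        }
        END { printf "%.0f\n", total }
    '
}
```

Usage would be along the lines of `plockstat -A -p 2039 2>&1 | lock_time_ns libc_malloc_lock`. Under the same assumption, the two largest "Mutex block" rows above work out to roughly 25.8 and 23.8 seconds spent waiting on libc_malloc_lock.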

LD_PRELOAD_64=/usr/lib/64/; export LD_PRELOAD_64

What we found was that the response time of each query was now more in line with what it was when executed on its own. Well, not entirely true, but you get the point. For 4 concurrent queries we found that total execution time improved from the 1X baseline to about a 2.5X reduction.

Similarly, when we used the other library, we found the reduction was more like 3X when 4 queries were executing concurrently.

LD_PRELOAD_64=/usr/lib/64/; export LD_PRELOAD_64

Definitely something to use for all Infobright installations on OpenSolaris or Solaris 10.
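A startup script can guard the preload so that the variable is only set when the library is actually present. This is a minimal sketch with a made-up helper name. The candidate paths in the commented usage are placeholders to be replaced with the allocator libraries available on your system.

```shell
# Print the first candidate library that exists and is readable.
# Returns non-zero (and prints nothing) if none of them is found.
first_existing_lib() {
    for lib in "$@"; do
        if [ -r "$lib" ]; then
            echo "$lib"
            return 0
        fi
    done
    return 1
}

# In the startup script, just before mysqld is invoked (placeholder paths):
# if lib=$(first_existing_lib /path/to/allocator_a.so /path/to/allocator_b.so); then
#     LD_PRELOAD_64=$lib; export LD_PRELOAD_64
# fi
```

Listing the preferred allocator first keeps the choice in one place, and the script falls back to the default malloc when neither library is installed.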

In a following blog post we will see other ways to tune Infobright which are not as drastic as this one but still buy some percentage of improvement. Stay tuned!!

Wednesday Jul 22, 2009

iGen with PostgreSQL 8.4 on Sun Fire X4140

Recently I got access to the refreshed Sun Fire X4140 with 2 x 6-core Opterons and 36GB RAM. I had not tried PostgreSQL 8.4 since the release of the final bits, so I downloaded the Solaris 10 binaries of PostgreSQL 8.4 (64-bit) from the download site and took them for a test drive with the same iGen benchmarks that I had used earlier for my PGCon2009 presentation.

The system already had Solaris 10 5/09 installed, with a couple of SSDs and a RAID LUN for the database. I put the WAL on an internal drive with the ZFS intent log on SSDs, and the tablespaces on the RAID LUN (on an external storage array).
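A layout like this could be set up along the following lines. This is a sketch under assumptions, not the commands used for these runs: the pool name, device names, data directory and the postgres user are all hypothetical. The function only prints the commands so that they can be reviewed before being run as root.

```shell
# Print (not run) the commands for a WAL pool with a separate ZFS intent log.
# Device names and paths are hypothetical defaults; pass your own as arguments.
# usage: wal_layout_plan [WAL_DISK] [SLOG_SSD] [PGDATA]
wal_layout_plan() {
    wal_disk=${1:-c0t1d0}          # internal drive that holds the WAL pool
    slog_ssd=${2:-c0t2d0}          # SSD used as the pool's separate log device
    pgdata=${3:-/raidlun/pgdata}   # data directory on the RAID LUN
    echo "zpool create walpool $wal_disk log $slog_ssd"
    echo "zfs create walpool/pg_xlog"
    echo "chown postgres /walpool/pg_xlog"
    echo "initdb -D $pgdata -X /walpool/pg_xlog"
}
```

The `log` keyword in `zpool create` puts the ZFS intent log on the SSD, and the `-X` option of initdb places the WAL directory outside the data directory. The WAL directory has to be empty and writable by the database user when initdb runs.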

Notice the crossing of the 400K tpm boundary with PostgreSQL here using this benchmark toolkit. None of my tests have ever done that before. I consider this to be a milestone achievement with PostgreSQL, Solaris 10, Sun Fire Systems with Opterons.

Tuesday Jul 21, 2009

Olio on 6-core Opterons (Istanbul) based Sun Systems

Sun is launching systems with multi-socket 6-core Opterons (Istanbul) today. Last week I got access to a Sun Fire X4140 with 2 x 6-core Opterons and 36GB RAM. It is always great to see a 1RU system packed with so many x64 cores.

# psrinfo -vp
The physical processor has 6 virtual processors (0-5)
  x86 (chipid 0x0 AuthenticAMD family 16 model 8 step 0 clock 2600 MHz)
    Six-Core AMD Opteron(tm) Processor 8435
The physical processor has 6 virtual processors (6-11)
  x86 (chipid 0x1 AuthenticAMD family 16 model 8 step 0 clock 2600 MHz)
    Six-Core AMD Opteron(tm) Processor 8435

I decided to take the system for a test drive with Olio. Olio is a Web 2.0 toolkit consisting of a Web 2.0 event calendar application which can help stress a system. Depending on your favorite language, you can use the PHP, Ruby on Rails or Java implementation of the application. (I took the easy way out and selected Olio PHP's prebundled binary kit.)

Please don't let the small 2MB kit size fool you into thinking it will be an easy workload to test. While setting it up I figured out that generating the data population for, say, 5000 users needs at least 500GB of disk space for the content it generates. Yes, I quickly had to figure out how to get a storage array for Olio with a LUN of about 800GB.

Olio requires a web server, PHP (of course) and a database for its metadata store (it has scripts for MySQL already in the kit). The system came preconfigured with Solaris 10 5/09. I downloaded MySQL 5.4.1 beta and also the Sun WebStack kit, which has Apache httpd 2.2, PHP 5.2 (and also MySQL 5.1, which I did not use since I had already downloaded MySQL 5.4 beta). Memcached 1.2.5 is part of the WebStack download and Olio is configured to use it by default (but it can be disabled too).

Eventually everything was installed and configured on the same X4140, and using the Faban harness on another system I started executing some runs, with the file store and the meta store preconfigured to handle all the way up to 5000 concurrent users. The results are as follows:


Here are my observations/interpretations:

  • Beyond the 10-core run, I find that the system memory (36GB) is not enough to sustain more concurrent users and fully utilize the remaining cores. I would probably need RAM in the range of 48GB or more to handle more users. (PHP is not completely thread-safe and hence the web server used here spawns processes.)
  • That this 1RU system can handle more than 3200 users (with everything on the same system) with CPU cycles to spare is pretty impressive. It means you still have enough CPU to log into the system without seeing degraded performance.
  • You can actually see here that an SMP (or should it be called SMC, Scalable Multi Cores) type system helps as the initial cores are added, compared with using multiple single-core systems (as in a cloud).

In upcoming blog entries I will talk more about the individual components used.

Tuesday Jun 02, 2009

Minimal OpenSolaris 2009.06 Appliance Image for VirtualBox 2.2.4

With the release of OpenSolaris 2009.06, I thought it was time to update the Minimal OpenSolaris 2008.11 Appliance OVF image that I had created earlier. The script has been updated to create minimal OpenSolaris 2009.06 Appliance images for VirtualBox.

How to use the OVF image?

  • Download VirtualBox 2.2.4 and install it on your host platform.
  • Download the OpenSolaris 2009.06 App OVF image zip file and then unzip it.
  • Fire up the VirtualBox GUI and use the menu item VirtualBox->File->Import Appliance to import the image (using the OSOL200906App.ovf file) into a new VirtualBox VM.
  • Start the newly created VM and in a few minutes you will be ready to log in to the OpenSolaris 2009.06 kernel. The preset login information is user: root with password: opensolaris.

Comments welcome.

Friday May 29, 2009

Read Only Scalability Patch

Simon Riggs of 2ndQuadrant recently submitted a patch for testing which should improve read-only scalability of Postgres. I took it for a test drive on my setup. In the first set of tests I used the same benchmark as in previous ones so as to have the same reference point.

It seems that changing the number of buffer partitions does not have any impact for this workload. My dataset for this iGen benchmark is pretty small and should easily fit in under 2GB, and hence may not stress the buffer partitions enough to warrant a bigger number. The patch still helps to get a good, healthy 4-6% gain in peak values.


Jignesh Shah is Principal Software Engineer in Application Integration Engineering, Oracle Corporation. AIE enables integration of ISV products including Oracle with Unified Storage Systems. You can also follow me on my blog

