Monday Jun 08, 2009

:wq blog

I guess I have decided to stop here.
Looking back at my previous posts, the blog, without any recent update, looks as old and frivolous as reassurances of an economic rebound! :(
There hasn't been a personal achievement worth mentioning since I moved to the Bay Area about a year ago.
Well, yes there is.. I managed to stay (live, drive and visit tourist places) without wearing Sun Glasses !! :)
OK, it was nice blogging here.

Tuesday May 13, 2008

Controlling Threads with concurrent package

For a long time, I have been using the Runnable interface to get long-running, non-interactive work done in the background. In some previous posts, I mentioned the crude way of using ThreadGroup to achieve a pooling sort of control. It was also fun working on a task scheduler that ran tasks in a controlled set of threads. Logically, what do you need? The pool size, the Runnable targets, an infinite loop and a data structure to control the thread scheduling!! So the code looked really.. as we say in India.. 'Fundooo'! But when I look at the java.util.concurrent package, the same code looks like a floppy disk in front of a DVD. I haven't explored the package completely, so I don't want to say a Blu-ray disc.
Just for my own understanding, let me post a test program that I can compile and run anytime later to refresh my memory.
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TestConcurrent {
        public static void main(String[] args) {
                ExecutorService runner = Executors.newSingleThreadExecutor();
                ExecutorService pool = Executors.newFixedThreadPool(5);

                int style = 1;

                // Fall back to style 1 when the argument is missing or not a number
                try {
                        style = Integer.parseInt(args[0]);
                } catch (Exception e) {}

                switch (style) {

                case 1:
                        // One brand new thread per task
                        System.out.println("Running in separate threads...");
                        for (int oldstyle = 0; oldstyle < 10; oldstyle++) {
                                new Thread(new TestRunnable()).start();
                        }
                        break;

                case 2:
                        // All tasks queued onto a single worker thread
                        System.out.println("Running in a worker thread....");
                        for (int newstyle = 0; newstyle < 10; newstyle++) {
                                runner.submit(new TestRunnable());
                        }
                        break;

                case 3:
                        // Tasks shared among a pool of 5 threads
                        System.out.println("Running in a thread pool....");
                        for (int newstyle = 0; newstyle < 10; newstyle++) {
                                pool.submit(new TestRunnable());
                        }
                        break;
                }

                // Stop accepting new tasks; submitted tasks still run to the end,
                // and the JVM can exit once they are done
                runner.shutdown();
                pool.shutdown();

                System.out.println("Main is over....");
        }
}

class TestRunnable implements Runnable {

    private static int i = 0;
    private int member = 0;
    private int runtime = 0;

    public TestRunnable() {
       member = ++i;
       runtime = (new Random()).nextInt(10);
    }

    public void run() {
        System.out.println("I am instance " + member + " with runtime " + runtime + " seconds");
        try { Thread.sleep(runtime * 1000); } catch (Exception e) { System.out.println("Interrupted"); }
        System.out.println("Instance " + member + " runtime over...!");
    }
}


Usage is obvious:
$ java TestConcurrent [1|2|3]
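
One thing a test program like this glosses over is waiting for the submitted work to finish. Here is a minimal sketch of that, assuming Java 8 or later for the lambda syntax. The class name PoolShutdownSketch, the 10 ms sleep standing in for real work and the 30 second timeout are my own placeholders.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PoolShutdownSketch {

    // Runs taskCount short tasks on a fixed pool and returns how many completed.
    static int runAll(int taskCount, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger completed = new AtomicInteger();

        for (int n = 0; n < taskCount; n++) {
            pool.submit(() -> {
                try {
                    Thread.sleep(10); // placeholder for real work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                completed.incrementAndGet();
            });
        }

        pool.shutdown(); // no new tasks; queued ones still run
        try {
            // Block until the queue drains, or give up after the timeout
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("Completed tasks: " + runAll(10, 5));
    }
}
```

The pool size only limits how many tasks run at once; all of them still get their turn before awaitTermination returns.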

Thursday Jan 10, 2008

My Notes on the Migration to PostgreSQL Experience

         Recently, I was involved in changing the database implementation for one of the products. The product had been using the most popular database, and there were several reasons for the change, ranging from performance goals, maintainability and platform support requirements to licensing cost. After looking at several replacement candidates, the team narrowed the choice down to PostgreSQL. The decision eventually became very easy with the availability of PostgreSQL in Solaris 10 and its enterprise-ready features.
         I wish I had the time to carefully note the minutest details of the porting experience, but this is a set of short notes. Let me try to explain the requirements in brief. The product has a central server layer that periodically collects data from tens or sometimes hundreds of systems. The collected data needs to be processed and stored in the database for generating reports and graphs. The data retention policy is to keep rolling the data up, so that it stays around for a long duration at a granularity that gradually reduces with time. This means the freshly collected data should be the most granular, while the older data should be summarized over a period of time and purged as and when required.

Porting the code
     Migrating just the data was nearly a piece of cake with the help of a downloadable utility. After manually creating the Postgres schema, we were able to migrate the data from the older version of the product and use it for the prototype. Deciding on the data types to use is not rocket science, as there is an equivalent or better type in Postgres for nearly every data type. Keeping performance in mind, the numeric data type was seldom used as it consumes 14+ bytes, but there were not many pitfalls there.
     While porting the procedural language code to PostgreSQL functions, the team learnt that most of the code can be reused as is. However, some of the old functions don't compile, but Postgres has equivalents for them, such as COALESCE and a lot of date operators and functions. The operators and type casting with :: come in very handy.
The usage of PL/pgSQL itself is not one of the best ways of doing things if the old blocks of code were already hitting the roof of the utilization levels. But we can talk about that a little later.
     At the same time, a lot of code written in C to do 'bulk loads' into the database tables was replaced with a single 'COPY' statement. Amazing!! The COPY statement required changing the format of the source file of the 'bulk load' operation, but that was a very minor overhead. All that was required was to read the old format line by line and convert it into fields separated by a single delimiter, something easily done using a Perl script. A huge amount of code REMOVED at the cost of a small Perl script and a call to the COPY statement.
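
To give a feel for what such a conversion has to produce, here is a rough sketch in Java rather than the Perl we actually used. It writes one record in COPY's default text format: tab as the delimiter, \N for NULL and backslash escapes for tabs, newlines and backslashes inside a field. The class name CopyLineSketch and the sample values are made up for illustration; the real script and the old file format are not shown here.

```java
import java.util.Arrays;
import java.util.List;

class CopyLineSketch {

    // Escapes one field for COPY's default text format; null becomes \N.
    static String escapeField(String field) {
        if (field == null) {
            return "\\N";
        }
        StringBuilder out = new StringBuilder();
        for (char c : field.toCharArray()) {
            switch (c) {
                case '\\': out.append("\\\\"); break;
                case '\t': out.append("\\t"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Joins the fields of one record into a single tab-delimited line.
    static String toCopyLine(List<String> fields) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                line.append('\t');
            }
            line.append(escapeField(fields.get(i)));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Hypothetical record: host name, metric name, missing value, reading
        System.out.println(toCopyLine(Arrays.asList("host01", "cpu idle", null, "97.5")));
    }
}
```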

     So, the product is at a stage where the business logic is ported. It's functional and can handle prototype/dummy data. But when actual data starts flowing in, the size will go up and will test the limits of the database performance. The database design of the old implementation relied heavily on partitioning in order to scale up to several GB of data. New partitions were created dynamically, while old partitions were dropped after summarizing their data as per the retention policy. Postgres 8.x has a partitioning mechanism that, on the face of it, looks very different. But as we went on implementing it, we found it simpler to administer, because (a) the table owner can also own the partitions, eliminating the need to bring in the most privileged user; (b) the partitions are tables, making them easy to manipulate from the administration point of view; (c) partition indexes automatically become local, as indexing a partition is just like indexing a table; ... and several such reasons.
     One of the stumbling blocks we faced was that Postgres partitioning works perfectly with the help of Postgres rules for the INSERT command, but the COPY command does not follow the rules. So a way out was to
     * create partitions and rules
     * create a temp table and insert all the data into it using COPY
     * use insert into <original_table> select from <temp_table> order by <partition_field>
     The next hurdle was the pre-partitioned tables to be migrated. A migration utility will not retain the partitions easily. Hence,
* Solution A: Refer to the metadata to find out if the table has been partitioned, and get the partition info. This requires a higher privileged user.
* Solution B: Create the maximum possible number of partitions, starting backwards from the current date. Eventually, when a partition becomes old enough, it will be dropped anyway as per the design.
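
The partition-and-rule step and Solution B can be sketched as a small generator of SQL text. This is only an illustration in Java, assuming Java 8 for java.time and daily partitions on a date column; the table names samples and samples_load, the column collected_on and the naming scheme are all hypothetical, not the product's schema. The statements follow the inheritance plus rule style of partitioning described in the Postgres 8.x documentation.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

class PartitionDdlSketch {

    private static final DateTimeFormatter SUFFIX = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Builds the child table and its insert rule for one daily partition.
    static List<String> ddlForDay(String parent, String column, LocalDate day) {
        String child = parent + "_" + day.format(SUFFIX);
        String range = column + " >= DATE '" + day + "' AND "
                     + column + " < DATE '" + day.plusDays(1) + "'";
        List<String> ddl = new ArrayList<>();
        ddl.add("CREATE TABLE " + child + " (CHECK (" + range + ")) INHERITS (" + parent + ");");
        ddl.add("CREATE RULE " + child + "_insert AS ON INSERT TO " + parent
              + " WHERE (" + range + ") DO INSTEAD INSERT INTO " + child + " VALUES (NEW.*);");
        return ddl;
    }

    // Solution B: walk backwards from the newest day, one partition per day.
    static List<String> ddlBackwards(String parent, String column, LocalDate newest, int days) {
        List<String> ddl = new ArrayList<>();
        for (int back = 0; back < days; back++) {
            ddl.addAll(ddlForDay(parent, column, newest.minusDays(back)));
        }
        return ddl;
    }

    // Last step of the workaround: route staged rows through the rules.
    static String loadStatement(String parent, String staging, String column) {
        return "INSERT INTO " + parent + " SELECT * FROM " + staging + " ORDER BY " + column + ";";
    }

    public static void main(String[] args) {
        for (String sql : ddlBackwards("samples", "collected_on", LocalDate.of(2008, 1, 10), 3)) {
            System.out.println(sql);
        }
        System.out.println(loadStatement("samples", "samples_load", "collected_on"));
    }
}
```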

Postgres initial configuration
     So, now we were all set with the data and business logic ported to Postgres. The partitions are in place to improve the query performance and enable effective maintenance.
But can PL/pgSQL scale while processing a huge amount of data and perform on par with the old database?
That's when database tuning came into the picture.
     * Shared buffers adjusted to
     f(x) = (x / 3) * (1024 / 8) For 511 < x < 2049
     = 682 For x > 2048
     * Work memory adjusted to 1/4th of shared buffers
     f(x) = (x / 2) * (1024 / 16) For 511 < x < 1025
     * Maintenance work memory, effective cache size and max fsm pages set to 2 times work_mem
     * constraint_exclusion set to on. (This boosts query performance when partitioned tables are queried.)
     * A manual vacuum and analyze forced just before running the batch jobs (instead of autovacuum)

Directories for tablespaces
The idea was to have 3 directories on separate file systems, preferably on separate disks.
The first dir would have the smaller tables, more or less static in nature.
The second dir would have the medium sized tables holding the summarized or less granular data.
The third dir would have the large tables holding the most granular or non summarized data.
The indexes were placed in the second dir, which holds the medium tables.
The application data stays in yet another directory, and if the above three dirs do not share its file system, we get it as the fourth file system. This gives pg_xlog its own file system and, if configured, a different disk.

Business Logic Updates
     It seemed we were all set. But the first round of testing itself revealed we were far from it.
The batch job functions seemed to take forever, so code changes were needed. Remember, this is mostly reused code ported to PL/pgSQL. The main point is that the PL/pgSQL usage of cursors needs special treatment, especially when there are loops. The older implementation had nested loops performing singular inserts, with intermediate commit statements after a certain transaction count. PL/pgSQL does not allow that, and it's not the best approach in the first place.
A careful look at the nested loops, and we quickly figured out that one of the loops could be eliminated by replacing it with an INSERT .. SELECT. The huge bonus is that it now becomes a single transaction. We also figured out that set operations using EXCEPT don't go well with PL/pgSQL performance. After running the new query with EXPLAIN and EXPLAIN ANALYZE, we figured out the indexing changes required. In particular, a lot of function-based indexes were required. One needs an immutable function to do so in Postgres, and the resulting code is beautiful to read. Postgres indexing is very different in some cases, especially composite indexes, and, as said, function-based indexes can be used very effectively. With all that in place, the performance improved magically! Now we had a situation where the PL/pgSQL blocks were faster than the older implementation.

New Bottleneck, the conversion script.
     Remember the conversion Perl script I talked about, which gets the data into a single-delimiter format so that the COPY statement can pick it up? Well, only after PL/pgSQL started performing (better) did we come to know that the Perl script was a new bottleneck. So we converted it into a multiprocess script that forks off the conversion logic for every 0.1 M rows. Now, even the script and the PL/pgSQL blocks put together outperformed the older block.
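
The same chunking idea can be sketched in Java, with one caveat: this uses a thread pool instead of forked processes, so it is an analogue of the script and not a port of it. The class name ChunkedConvertSketch and the comma-to-tab conversion are hypothetical placeholders, and Java 8 is assumed for the lambda. In the real thing the chunk size would be 100,000 rows; the method takes it as a parameter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkedConvertSketch {

    // Placeholder conversion: comma-separated input to tab-delimited output.
    static String convertLine(String line) {
        return line.replace(',', '\t');
    }

    // Converts the lines chunk by chunk in parallel, keeping the input order.
    static List<String> convertAll(List<String> lines, int chunkSize, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<List<String>>> parts = new ArrayList<>();
            for (int start = 0; start < lines.size(); start += chunkSize) {
                final List<String> chunk =
                        lines.subList(start, Math.min(start + chunkSize, lines.size()));
                parts.add(pool.submit(() -> {
                    List<String> out = new ArrayList<>(chunk.size());
                    for (String line : chunk) {
                        out.add(convertLine(line));
                    }
                    return out;
                }));
            }
            // Collect results in submission order so the rows stay in sequence
            List<String> result = new ArrayList<>(lines.size());
            for (Future<List<String>> part : parts) {
                result.addAll(part.get());
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while converting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("conversion failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting the futures in the order they were submitted keeps the output in the same sequence as the input, no matter which chunk finishes first.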

         While obviously a lot more can be done to make it perform better, the short experience was good enough to give me a feel for the strengths of PostgreSQL. Although the Postgres emblem represents 'The elephant never forgets', I think it should be the elephant with tremendous power and strength, but friendly and useful like our Indian elephants. :) I would highly recommend using PostgreSQL for your applications.

Tuesday Dec 18, 2007

IEC Sports Day 2007

The one and a half weeks of FUN are now over. The Sun Microsystems India Engineering Center Annual Sports Days concluded with a prize distribution this afternoon. The winners, especially the team event winners, looked visibly pleased with the trophies. Obviously that was due to their win or their run up to the finals, but the trophies also looked large enough to make it special for them.
         The well-publicized event, although announced a little late, was open for registration for one and a half days. And boy, there were 550+ e-mails on the alias when I last checked, out of which 400+ were registration mails.
Agreed that many sent more than one mail as per the process, enrolling for one sport each, but a team game registration was a single mail per 6 to 8 member team. I must mention that cricket alone had 33 registered teams with no common members, which means we had ~200 distinct cricket players with at least 33 girls, ready to face the tennis ball darted at them from 22 yards! 200+ for one sport alone. That was quite amazing!!!
         I was tempted to put it under the sports category in continuation of my previous post.
Some of the matches proved to be mismatches, but the enhanced rules or the format of some of the games gave us very closely contested battles in more than 80% of the cases. The amateurs were pitted against experts, but the rules were skewed to help them baffle the experts. The favorites were suddenly seen running out of ideas when the over-enthusiasts started playing the games their own way, in their own style. It was Fun!!
         Unfortunately, the last day was ruined by unexpected rain. The cricket semis and finals, saved for the last day, had to be reduced to a bowl-out. Sure, there were other ways of deciding the winner when the conditions were totally unplayable. But we thought we would rather see the teams PLAY and win. Athletics and other outdoor games had to be canceled. Football really saved the day for the sports lovers. The mud and the small to mid size pools of water right in the middle of the field did not stop the finalists and the referee. The indoor games, badminton, TT, chess, carrom and foosball, were of course unaffected.
         Not everything went as per the plan. There were moments of conflict. Many of us went through frustration and disappointment. But in the end, all of us walked away with more friends and surely everyone had a lot of FUN.. !!

Saturday Mar 31, 2007

Solaris, my hometown and the financial capital of India..

Very eventful...
That's how I can summarize my short trip to Pune, aka the Oxford of India, and the financial capital of India, Mumbai.
I presented three topics in each.
. Solaris ( overview and latest features )
. Extreme Observability with DTrace
. Solaris Containers and Container Management.
So just to fulfil my innermost desire, I made them get down at the food mall just before Lonavala, to have arguably the best street side food “Vada Pav”.


Thursday Mar 08, 2007

The Black Boxes have started goin' places

The wonderfully simple idea, Project Blackbox: Datacenter in a Container, is becoming more and more realistic these days. There are customers who have reportedly signed up for early access to the first few shipments of the containers, fully loaded with the latest technology servers. The idea also made it to the list of crackpot tech ideas.

Tuesday Feb 27, 2007

Recent clicks

My clicks around Sun blogs recently opened these interesting things


Tuesday Feb 06, 2007

Loot lo offer: Register your Solaris 10 box and get a gift

Do you manage Solaris boxes ?
Do you plan to go buy some apparel this weekend ? ;)
Read on ..


Friday Nov 24, 2006

OpenSolaris, IIT Kanpur and glimpses of North India

I had an opportunity to share my views on OpenSolaris with the IIT K students, as a guest lecturer. ... The by-products of my IIT K visit were some rare and, most of the time, unusually joyous moments. ...

Thursday Sep 21, 2006

ThreadGroup added to the fun with Sun Fire T1000

    I modified my Java program a little to control the number of threads, and I see very interesting results!!

Monday Sep 11, 2006

Fun with multi threaded java program on T1

Definitely not a benchmark testing kind of program, but a small Java program to check the T1 performance.

Friday May 05, 2006

CPU Caps: Control those CPU Pigs

    I am thrilled to say that I am probably one of those few lucky souls to get an early experience of the latest and arguably coolest resource control mechanism on Solaris: CPU Caps, released a month back by Andrei on OpenSolaris.
At first glance, it's a simple equation: 1 CPU cap = 1/100th of one CPU. And it gels well with Solaris Containers. Let's check out how CPU caps and containers work together.


Thursday Apr 20, 2006

CEC visit

Made a quick visit to the CEC to make sure things are set up for the evening preso and demo.

Tuesday Apr 18, 2006

Hi netbeans

Let me confess here that I should have done this earlier, but somehow I am very, very comfortable with the vi editor.


