Thursday Jan 10, 2008

My Notes on the Migration to PostgreSQL Experience

         Recently, I was involved in changing the database implementation for one of the products. The product had been using the most popular database and there were several reasons ranging from performance goals, maintainability, platform support requirements to licensing cost. After taking a look at several replacement candidates, the team narrowed down to PostgreSQL. The decision eventually became very easy with the availability of PostgreSQL in Solaris 10 and the enterprise ready features of it.
         I wish I had the time to carefully note the minutest detail of the porting experience. But this is a set of short notes. Let me try to explain the requirements in brief. The product has a central server layer that collects data from tens or sometimes hundreds of systems periodically. The collected data needs to be processed and stored in the database for generating reports and graphs. The data retainment policy is to keep on rolling it up so that the data stays over a long duration at a gradually reducing granularity level with time. Which means the freshly collected data should be the most granular while the older data should be summarized over a period of time and purged out as and when required.

Porting the code
     Migrating just the data was nearly a piece of cake with the help of a downloadable utility. After manually creating the Postgres schema, we were able to migrate the data from the older version of the product and use it for the prototype. Deciding on the datatypes to be used is not a rocket science as there is an equivalent or better in postgres for nearly every data type. Keeping performance in mind, the numeric datatype was seldom used as it consumes 14+ bytes, but not many pitfalls there.
     While porting the procedures procedural language code to the PostgreSQL functions, the team learnt that most of the code can be reused as is. However, some of the functions don't compile but have various equivalent functions such as COALESCE and a lot of date operators, functions. The operators and type casting with :: comes very handy.
The usage of PL/psSQL itself is not one of the best ways of doing things if the old blocks of code were already hitting the roofs of the utilization levels. But we can talk about it a little later.
     At the same time, a lot of code written in C to do 'bulk loads' into the database tables was replaced with a single 'COPY' statement. Amazing !! The Copy statement required changing the format of the source file of the 'bulk load' operation but that was a very very minor overhead. All that was required was to read the old format line by line and convert it into a single delimiter separated fields, something easily done using a perl script. A huge amount of code REMOVED at a cost of a small perl script and call to COPY statement.

     So, the product is in a stage where the business logic is ported. It's functional and can handle prototype/dummy data. But when actual data starts flowing in, the size will go up and will test the limits of the database performance. The database design of the old implementation highly utilized the partitioning techniques in order to scale upto several Gbs of data. New partitions were being created dynamically at the same time old partitions dropped after summarizing their data as per the retainment policy. Postgres 8.x has partitioning mechanism that, at the face of it, looks very different. But as and when we went on implementing it, we found it simpler to administer. For (a) the table owner can also own the partitions, eliminating a need to bring in the most privileged user. (b) The partitions are tables, making them easy to manipulate from the administration point of view. (c) The partition indexing automatically becomes local as it's just like indexing a table. ... and several such reasons.
     One of the stumbling blocks we faces was that the Postgres partitioning works perfectly with the help of Postgres rules for insert command. But the Copy command does not follow the rules. So a way out was to
     \* create partitions and rules
     \* create a temp table and insert all the data into it using COPY
     \* use insert into < original_table > select from < temp table > order by < partition field >
     Next hurdle was, the pre-partitioned tables to be migrated. A migration utility will not retain the partitions easily. Hence,
\* Solution A: Refer to the metadata to find out if the table has been partitioned, and get the partition info. This requires a higher privileged user.
\* Soluiton B: Create the max possible partition starting backwards from the current date. Eventually when it becomes old enough, it will be dropped anyway as per the design.

Postgres initial configuration
     So, now all set with the data and business logic ported to Postgres. The partitions are in place to improve the query performance and enable effective maintenance.
But can the PL/pgSQL scale while processing huge amount of data and give at par performance as compared to the old database ?
That's when database tuning came in picture.
     \* Shared buffers adjusted to
     f(x) = (x / 3) \* (1024 / 8) For 511 < x < 2049
     = 682 For x > 2048
     \* Work memory adjusted to 1/4 th of Shared Buffers
     f(x) = (x / 2) \* (1024 / 16) For 511 < x < 1025
     \* Maintenance work memory, effective cache size and max fsm pages set to 2 times work_mem
     \* constraint_exclusion set to on. (This will boost up query performance when partitioned tables are queried.)
     \* A manual vacuum and analyze forced just before running the batch jobs ( instead of autovacuum )

Directories for tablespaces
The idea was to have 3 directories on separate file systems and preferably on separate disks
The first dir would have the smaller tables, more or less static in nature
The second dir would have the medium sized tables holding the summarized or less granular data.
The third dir would have the large tables holding the most granular or non summarized data.
The indexes placed in the second dir holding the medium tables.
The application data stays in a yet another directory and if the above three dirs do not use the same filesystem, we get it as the forth file system. This will give the pg_xlog it's own filesystem and if configured, a different disk.

Business Logic Updates
     Seems we are all set. But the first round of testing itself revealed we are far from it.
The batch job functions seemed to take forever. So, it needed code changes. Remember, it's nearly a reused and ported code in PL/pgSQL. The main point is PL/pgSQL usage of cursors needs special treatment. Especially when there are loops. The older implementation had nested loops performing singular inserts. There were intermediate commit statements after a certain transaction count. PL/pgSQL does not allow it. It's not the best approach in the first place.
A careful look at the nested loops, and we quickly figured out that one of the loop could be eliminated by replacing it with an INSERT .. SELECT. The huge bonus we get is, now it becomes a single transaction. Also, figured out that intersections using EXCEPT don't go well with the PL/pgSQL performance. After running the new query with explain and explain analyze, figured out the indexing changes required. In particular, a lot of function based indexes were required. One needs immutable function to do so in postgres, and it's beautiful to read the code. The postgres indexing is very different in some cases especially composite indexes and as said function based indexes can be used very effectively. With all that in place, the performance improved magically ! Now we had a situation where the PL/pgSQL blocks were faster than the older implementation.

New Bottleneck, the conversion script.
     Remember the conversion perl script I talked about to get the data in a single delimiter format, so that COPY statement can pick it ? Well, Only after PL/pgSQL started performing (better) we came to know that the perl script is a new bottleneck. So converted it into a multiprocess script that forks off the conversion logic for every 0.1 M rows. Now, even the script and the PL/pgSQL blocks put together outperformed the older block.

         While, obviously a lot more can be done to make it perform better, the short experience was good enough to give me a feel of the strengths of PostgreSQL. The Postgres emblem although represents 'The elephant never forgets', I think it should be the elephant with tremendous power and strength but friendly and useful like our Indian Elephants. :) Would highly recommend using PostgreSQL for your applications.

Friday Dec 14, 2007

Solaris Containers came to the rescue

One of my friends leading a quality team responsible for performance/stress testing of a network intensive application, had something interesting to discuss with me few months back.
The application has multiple instances of a component communicating to a central server over the network. In a customer environment, this piece of the software has one instance on one OS instance. This one does a lot of snmp talk back n forth with the server. The test case was to have ~1500 instances talking to a single server at a time. His team has a simulation script that runs 50 to 100 instances on one box. He had used 15 to 30 Solaris 8 and 9 hosts some 4 yrs back to successfully test it. Amazing effort ! Because those many instances are good enough to drive the system crazy. And managing 15 to 30 such system must have been SOMEthing!
The current situation was that he had a lab with reduced resources, a smaller team and very few days to repeat it with the latest version of the same s/w. So I gave him the obvious suggestion ! Virtualize the host(s) to get the magic set of 15 to 30 os instances. And more obvious choice of virtualization technology in this case, Solaris Containers. All the team needed was multiple OS instances of the same OS flair. Within this constraint, containers are extremely lightweight. The team had a 24 vCPU 48 G RAM box that was selected to host them.
So it all looked great on the paper. One of his team members configured some 8 zones to start with and started instantiating 100 per zone. The moment the instances in just 2 of them started talking over the network, the system refused to respond to any of the terminals connected. Although there were s/w partitions in the form of zones, all were sharing a NIC.
A team meeting with a few glum faces and few 'I knew it' remarks... And here is when the exclusive IP stack support added to containers came to the rescue !!! With additional NICs on board, and an upgrade to Solaris 10 Update 4, the zones were reconfigured to have exclusive ip type. And it worked !! 30 boxes down to a single box. So easy to manage, fast boot and shutdown, nearly instantaneous replication with cloning to get 16 Containers running 100 instances each. Cool Stuff !!!




« April 2014