System performance issues? Check the I/O
By Amir Javanshir on Nov 03, 2009
Have you ever had a performance problem and not known where to start looking? Well, who hasn't? In such a situation, it is important to monitor the system properly and resist the immediate urge to blame the processing power --the storage subsystem is often overlooked. Now that I have spoiled the suspense by giving away the answer, give me 5 minutes to illustrate this with a typical example. Why typical? Because this is a case I meet quite often with startup companies, who rightly concentrate on their core business rather than on their IT infrastructure --that's what we are here for.
One of our partners, Squid Solutions, recently reported a performance problem with an Oracle database; they had engaged our team as part of the Sun Startup Essentials support program. Squid Solutions makes software called Nautilus that performs intensive analytics on database systems, whatever their kind or size. They call Nautilus an SQL Knowledge Engine because it models data and business knowledge, and then automatically generates SQL code to execute the data processing tasks. Nautilus is sold as a service performed by Squid's engineers as Customer Intelligence projects.
So, Squid Solutions had purchased a brand-new Sun Fire X2250 server --quad-core Intel Xeon with 4 GB of RAM, running Solaris 10-- and was experiencing poor performance --much lower than on their previous-generation system-- when executing a read-and-write intensive workload on the database. When I first logged on to the server, something immediately caught my eye: the server used a single hard drive for all of the following: the operating system, the swap space, the Oracle binaries, the Oracle redo logs, and the Oracle table spaces and indexes.
This basically means that you will not be able to get more transactions from your database than the number of IOPS the hard drive can handle, since a transaction is not fully committed until it is written to the redo log file. In the best case, we are talking about 200 IOPS for a single drive! Consider also that this number must be shared between the writes to the redo logs and the reads and writes to the tables and indexes. It therefore seems pretty safe to assume that the bottleneck is the I/O subsystem. But we still need to verify this by monitoring the system.
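As a back-of-the-envelope check, the commit-rate ceiling is easy to compute. The numbers below (200 IOPS per drive, half of them left over for redo writes) are illustrative assumptions, not measurements from this system:

```shell
# Rough upper bound on commits/sec when redo writes share a single drive.
DISK_IOPS=200    # assumed IOPS ceiling for one rotational drive
REDO_SHARE=50    # hypothetical: only half the IOPS remain for redo writes
echo "Max commits/sec: $((DISK_IOPS * REDO_SHARE / 100))"
```

With every commit forcing at least one synchronous redo write, the database cannot sustain more than about a hundred transactions per second on this layout, no matter how fast the CPUs are.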
Monitoring the System
We ran the Squid Solutions test case on the Oracle database for 10 minutes and used the dim_STAT tool to collect and analyze all the necessary system statistics: CPU, RAM and I/O. dim_STAT is a great performance monitoring and analysis tool developed by our colleague Dimitri Kravtchuk at the Sun Solution Center in Paris: see Solaris Performance Primer: dimSTAT, dimSTAT by examples and Customizing db_STRESS for some detailed blogging on dim_STAT from our team.
The CPUs are hardly used; they are constantly over 80% idle throughout the test. Also the run queue is always equal to zero, which means that we never have a thread waiting for a CPU. This clearly shows that there is no CPU issue.
The scan rate is constantly equal to zero; in case of memory shortage this rate can get very high --over 200 for instance. We also see that throughout the test there is more than 700 MB of free physical memory. There is therefore no memory issue.
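These checks can also be scripted against raw vmstat output. The sample interval below is fabricated to mirror the figures in the text (zero run queue, zero scan rate, plenty of free memory and idle CPU); on a live system you would pipe vmstat itself instead of reading a file:

```shell
# Hypothetical single interval of Solaris `vmstat 5` output.
cat <<'EOF' > sample_vmstat.txt
 r b w   swap   free  re  mf pi po fr de sr s0 -- -- --   in   sy   cs us sy id
 0 0 0 734000 712345   5  20  0  0  0  0  0  0  0  0  0  400  900  500  8  4 88
EOF
# Column 1 is the run queue, column 12 the page scan rate, the last column CPU idle.
awk 'NR > 1 { print "runq=" $1, "scanrate=" $12, "idle=" $NF "%" }' sample_vmstat.txt
```

A run queue and scan rate of zero, with CPU idle near 90%, is exactly the "no CPU issue, no memory issue" picture described above.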
Finally, the I/O analysis reveals the problem: the disk is clearly the bottleneck here. Throughout the test, the single disk is more than 60% busy, and this figure even goes up to 85% at the end of the test. Also, the wait percentage, which should be close to zero, is around 10% and even peaks at 20% once in a while. This confirms our suspicion that the I/O subsystem is saturated; it needs to be upgraded in order to get acceptable performance from Oracle.
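A quick way to spot such a saturated disk without graphs is to filter iostat output. The sample below is fabricated to match the busy and wait figures reported above, with only the relevant columns kept, and the device name c0t0d0 is just a placeholder; on a live system you would pipe `iostat -xn 5` instead:

```shell
# Hypothetical reduced `iostat -xn` interval.
cat <<'EOF' > sample_iostat.txt
r/s w/s %w %b device
12.0 180.3 10 62 c0t0d0
EOF
# Flag any device that is more than 60% busy or shows non-zero wait time.
awk 'NR > 1 && ($4 > 60 || $3 > 0) { print $5 ": busy=" $4 "%, wait=" $3 "%" }' sample_iostat.txt
```

Any device this filter prints is a candidate bottleneck worth a closer look.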
Better performance can be achieved by isolating Oracle's redo logs on a separate drive. By doing so, we no longer have to share the IOPS with other reads or writes. But it will not be enough; we will still be limited by the maximum number of IOPS a single drive can deliver. The industry solution to this problem is RAID storage; by striping the Oracle redo logs over multiple disks, you get roughly 200 IOPS times the number of drives.
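Under the simplifying assumption that IOPS scale linearly with stripe width (and keeping the 200 IOPS-per-drive estimate from earlier), the gain looks like this:

```shell
PER_DRIVE_IOPS=200   # assumed per-drive ceiling, as above
for n in 1 2 4 8; do
  echo "$n drive(s): ~$((PER_DRIVE_IOPS * n)) IOPS"
done
```

In practice controller overhead and uneven access patterns eat into the linear scaling, but the order of magnitude holds for sequential redo-log writes.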
A budget configuration could consist of a Sun Storage J4200, a simple JBOD array, combined with a SAS RAID HBA in your server to compensate for the JBOD's lack of a hardware RAID controller --and thus of cache memory. This adapter offers up to 256 MB of DDR2 cache memory. The cache is important for performance because Oracle does not have to wait until the redo log is actually written to the hard drive. And writing to RAM is roughly 1000x faster than writing to disk! The RAID controller will guarantee that what it has in cache will eventually be written to disk.
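The 1000x figure is just the ratio of typical write latencies; the two values below are order-of-magnitude assumptions (about 10 ms for a rotational-disk write, about 10 µs for a controller DRAM cache write), not measurements:

```shell
DISK_WRITE_US=10000   # ~10 ms rotational-disk write, in microseconds (assumed)
CACHE_WRITE_US=10     # ~10 µs controller-cache (DRAM) write (assumed)
echo "Speedup: ~$((DISK_WRITE_US / CACHE_WRITE_US))x"
```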
A better option for higher performance, reliability and flexibility is an external storage array with an integrated RAID controller, such as the Sun StorageTek 2510. It is an entry-level SAS storage array that requires a SAS controller on the server connecting to it. The array can host up to 48 SAS or SATA drives, and can be driven by one or two SAS RAID controllers. Each controller has 512 MB of cache, 2 to 4 times more than the previous solution.
Would you like to monitor your own system to check where the bottleneck is? As mentioned earlier, I used a tool called dim_STAT to collect and analyze all the necessary system statistics. You can download and use it for free on Solaris SPARC, Solaris x86 and Linux. Features of dim_STAT include: a history of system statistics saved in a MySQL database; the ability to collect many system statistics at once and analyze them later; neat graphical output allowing a much easier analysis; the capability to collect from multiple servers and compare them; etc.
That said, you can already get useful statistics from your Solaris system out of the box, without installing dim_STAT --dim_STAT in fact fully relies on the Solaris native tools. The three most used are:
- vmstat for virtual memory and CPU
- iostat for I/O and CPU utilization
- netstat for network
If you look closely at the graphics in the section above, you will see that vmstat was in fact used for the CPU and memory statistics, and iostat for the disk statistics.
Oh, by the way, I almost forgot! So why was our partner's old server doing better than the new one? Because it had two disks, and the Oracle redo logs and data files were on separate drives.
If there is any conclusion to draw from this:
- Don't throw all your money at CPU and RAM: the I/O subsystem may be where you can dramatically impact performance (for better or worse).
- Analyze your system with the standard Solaris system statistics tools to get an instant high-level picture of your application's behavior --DTrace and sar are then the tools that will allow you to drill down.