Friday Jan 04, 2008

Organizational stove-pipes complicate database storage configurations.

IT organizations at large companies are complex entities where people are partitioned by function. There are SAN people, system administrators, Database administrators, and Developers. While it is good to specialize by function there seems to be a mis-match when each organization optimizes their internal operations. Let me walk you though the a common situation where the SAN administrators and system administrators each try to optimize performance without consideration to overall picture.

The setup

  • DBA requests storage for new application. They are expecting filesystem(s) or RAW luns will be presented for ASM use.
  • System's administrators request luns from the Storage administrators to fulfill the request.
  • Storage administrators supply the luns.

Systems Administrators

Their job is to make sure the performance of the supplied luns map cleanly to the Database environment. For years System Administrators have been using SW volume management/Raid to improve performance. So, naturally, they request a large number of luns (say 128) from the Storage administrators so they can stripe. Past experimentation has shown that a 32k stripe width was best.

Storage Administrators

The Storage people take care of large Hitachi or EMC boxes. Their job is to supply luns to applications and make sure their "san-box" performs well. They gladly supply the luns to the Systems administrators, but to ensure performance of the SAN box, they must prevent the fiber from "resets". The maximum number of requests on a fiber is 256 requests. So, no problem, they have the system administrators adjust the "sd_max_throttle" parameter so the OS will queue events and not cause resets. The rule of thumb is to set it to:

       sd_max_throttle = 256/#luns = 256/128 = 2

Putting it all together

So, now the system administrator takes these 128 luns and creates four file systems by striping 32 luns together each with a 32k stripe width using SVM. Since this is a SAN, there are multiple connections from the host to the SAN in this case there are 4 connections. MPxIO is used to round-robin IO requests to the 4 connections to the SAN to balance load and allow for fail-over in case of an HBA failure.

This environment is turned over to the DBA who finds the performance is less than stellar.


The DBA is running 10 jobs that result in queries which full scan 10 tables. These queries request 1MB per IO. Now a stripe width of 32k breaks down the 1MB IO into 32 equal pieces... and since there are 10 concurrent jobs that equates to 32\*10 or 320 concurrent request for IO. Finally, these 320 request, are routed down one of the four channels so that would be 320/4 or 80 requests per channel. Are you beginning to see the problem?

Given the "sd_max_throttle" setting of 2, the OS will allow 2 outstanding requests at a time. If you look at the array, the performance will look great... so it must be an OS problem :)

The Fix

This issue was solved in multiple phases.
  • Quick Fix: Simply increase the "sd_max_throttle" >= 80. This will prevent queuing at the driver level.
  • Increased stripe width. Use an SVM stripe width of 1MB or greater. This will reduce the number of IO being broken down by the SW volume manager.
  • Optimal Solution. Eliminate SW striping all together and build larger luns within the SAN box.


Storage issues often involve multiple layers of HW, SW, people, and organizations. To architect a well thought out solution, all aspects must be taken into consideration. Get everyone talking and sharing information so that your organizational stove-pipes don't cripple application performance.

Friday Aug 04, 2006

High "times()" syscall count with Oracle processes

"Why does Oracle call times() so often? Is something broken? When using truss or dtrace to profile Oracle shadow processes, one often sees a lot of calls to "times". Sysadmins often approach me with this query.

root@catscratchb> truss -cp 7700
syscall               seconds   calls  errors
read                     .002     120
write                    .008     210
times                    .053   10810
semctl                   .000      17
semop                    .000       8
semtimedop               .000       9
mmap                     .003      68
munmap                   .003       5
yield                    .002     231
pread                    .150    2002
kaio                     .003      68
kaio                     .001      68
                     --------  ------   ----
sys totals:              .230   13616      0
usr time:               1.127
elapsed:               22.810

At first glance it would seem alarming to have so many times() calls, but how much does this really effect performance? This question can best be answered by looking at the overall "elapsed" and "cpu" time. Below is output from the "procsystime" tool included in the Dtrace toolkit.

root@catscratchb> ./procsystime -Teco -p 7700
Hit Ctrl-C to stop sampling...
Elapsed Times for PID 7700,
         SYSCALL          TIME (ns)
            mmap           17615703
           write           21187750
          munmap           21671772
           times           90733199       <<== Only 0.28% of elapsed time
          semsys          188622081
            read          226475874
           yield          522057977
           pread        31204749076
          TOTAL:        32293113432

CPU Times for PID 7700,
         SYSCALL          TIME (ns)
          semsys            1346101
           yield            3283406
            read            7511421
            mmap           16701455
           write           19616610
          munmap           21576890
           times           33477300         <<== 10.6% of CPU time for the times syscall
           pread          211710238
          TOTAL:          315223421

Syscall Counts for PID 7700,
         SYSCALL              COUNT
          munmap                 17
          semsys                 84
            read                349
            mmap                350
           yield                381
           write                540
           pread               3921
           times              24985    <<== 81.6% of syscalls.
          TOTAL:              30627

According to the profile above, the times() syscall accounts for only 0.28% of the overall response time. It does use 10.6% of sys CPU. The usr/sys CPU percentages are "83/17" for this application. So, using the 17% for system CPU we can calculate the overall amount of CPU for the times() syscall: 100\*(.17\*.106)= 1.8%.

Oracle uses the times() syscall to keep track of timed performance statistics. Timed statistics can be enabled/disabled by setting the init.ora parameter "TIMED_STATISTICS=TRUE". In fact, it is an \*old\* benchmark trick to disable TIMED_STATISTICS after all tuning has been done. This is usually good for another 2% in overall throughput. In a production environment, it is NOT advisable to ever disable TIMED_STATISTICS. These statistics are extremely important to monitor and maintain application performance. I would argue that disabling timed statistics would actually hurt performance in the long run.

This blog discusses performance topics as running on Sun servers. The main focus is in database performance and architecture but other topics can and will creep in.


« October 2016

No bookmarks in folder


No bookmarks in folder