Organizational stove-pipes complicate database storage configurations.

IT organizations at large companies are complex entities where people are partitioned by function: there are SAN administrators, system administrators, database administrators, and developers. While it is good to specialize by function, there is often a mismatch when each group optimizes its own internal operations. Let me walk you through a common situation in which the SAN administrators and the system administrators each try to optimize performance without considering the overall picture.

The setup

  • The DBA requests storage for a new application, expecting either file systems or raw LUNs (for ASM use) to be presented.
  • The system administrators request LUNs from the storage administrators to fulfill the request.
  • The storage administrators supply the LUNs.

System Administrators

Their job is to make sure the supplied LUNs perform well for the database environment. For years, system administrators have used software volume management (software RAID) to improve performance. So, naturally, they request a large number of LUNs (say 128) from the storage administrators so they can stripe across them. Past experimentation has shown that a 32k stripe width was best.

Storage Administrators

The storage people take care of large Hitachi or EMC boxes. Their job is to supply LUNs to applications and make sure their "SAN box" performs well. They gladly supply the LUNs to the system administrators, but to protect the performance of the SAN box they must prevent resets on the fiber. The maximum number of outstanding requests on a fiber is 256. So, no problem: they have the system administrators adjust the "sd_max_throttle" parameter so that the OS queues excess requests instead of causing resets. The rule of thumb is to set it to:

       sd_max_throttle = 256/#luns = 256/128 = 2
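The rule of thumb simply divides the fiber's request limit evenly among the LUNs. Here is a minimal Python sketch of that arithmetic, assuming the 256-request limit quoted above; `throttle_rule_of_thumb` is a hypothetical name, and the floor of 1 is an added assumption so the value never rounds down to zero:

```python
# Hypothetical helper reproducing the post's rule of thumb:
#   sd_max_throttle = (max outstanding requests on the fiber) / (number of LUNs)
# The 256 limit is the figure quoted in the post; here it is just an input,
# not a value read from any real array or driver.

def throttle_rule_of_thumb(fiber_queue_limit: int, num_luns: int) -> int:
    """Per-LUN throttle that keeps total outstanding requests at or below
    the fiber's limit (assumed floor of 1 so it never becomes 0)."""
    if num_luns <= 0:
        raise ValueError("num_luns must be positive")
    return max(1, fiber_queue_limit // num_luns)


if __name__ == "__main__":
    throttle = throttle_rule_of_thumb(256, 128)
    print(f"sd_max_throttle = 256 / 128 = {throttle}")
    # Worst case: every LUN has `throttle` requests outstanding at once.
    print(f"worst-case outstanding = {throttle * 128}")
```

With 128 LUNs this returns 2, matching the calculation above. The more LUNs the system administrators ask for, the smaller each LUN's share of the queue becomes.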


Putting it all together

So now the system administrator takes these 128 LUNs and uses SVM to create four file systems, each striped across 32 LUNs with a 32k stripe width. Since this is a SAN, there are multiple connections from the host to the SAN; in this case, there are four. MPxIO is used to round-robin IO requests across the four connections, balancing the load and allowing for failover in case of an HBA failure.
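To see why one large IO fans out across every LUN in the stripe, here is a simplified RAID-0 mapping in Python. It models generic striping under stated assumptions (equal-sized columns, stripe starting at logical offset 0) rather than SVM's actual code, and `stripe_map` and `split_io` are hypothetical names. The stripe width in this post corresponds to what SVM calls the interlace value.

```python
# Simplified RAID-0 (striping) model: an illustration of the general technique,
# not SVM's implementation. Assumes equal-sized columns and a stripe that
# starts at logical offset 0.

KB = 1024


def stripe_map(offset: int, interlace: int, columns: int) -> tuple[int, int]:
    """Map a logical byte offset to (column index, byte offset within that column)."""
    unit = offset // interlace      # which stripe unit the byte falls in
    column = unit % columns         # units are dealt round-robin across columns
    row = unit // columns           # full stripes that precede this unit
    return column, row * interlace + offset % interlace


def split_io(offset: int, size: int, interlace: int, columns: int) -> list[tuple[int, int, int]]:
    """Split one logical IO into per-column pieces: (column, column_offset, length)."""
    pieces = []
    end = offset + size
    while offset < end:
        column, col_off = stripe_map(offset, interlace, columns)
        # A piece runs to the end of the current stripe unit or the end of the IO.
        length = min(interlace - offset % interlace, end - offset)
        pieces.append((column, col_off, length))
        offset += length
    return pieces


if __name__ == "__main__":
    pieces = split_io(offset=0, size=1024 * KB, interlace=32 * KB, columns=32)
    print(len(pieces))                                           # 32
    print(sorted({c for c, _, _ in pieces}) == list(range(32)))  # True: every LUN is touched
```

Because consecutive 32k units are dealt to consecutive columns, a 1MB read starting on a stripe boundary becomes 32 separate 32k requests, one per LUN.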

This environment is turned over to the DBA who finds the performance is less than stellar.

Analysis

The DBA is running 10 jobs whose queries full-scan 10 tables. These queries request 1MB per IO. A stripe width of 32k breaks each 1MB IO into 32 equal pieces (1MB / 32k = 32), and since there are 10 concurrent jobs, that equates to 32 × 10, or 320 concurrent IO requests. Finally, these 320 requests are spread across the four channels, which works out to 320 / 4, or 80 requests per channel. Are you beginning to see the problem?
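The arithmetic above can be reproduced in a few lines of Python. This is only a sketch of the post's back-of-the-envelope model; it assumes every IO is stripe-aligned and that MPxIO spreads requests perfectly evenly across the channels, and `requests_per_channel` is a hypothetical helper:

```python
KB = 1024
MB = 1024 * KB


def requests_per_channel(io_size: int, stripe_width: int, jobs: int, channels: int) -> tuple[int, int, float]:
    """Return (pieces per IO, total concurrent pieces, pieces per channel),
    assuming stripe-aligned IOs and an even spread across channels."""
    pieces_per_io = (io_size + stripe_width - 1) // stripe_width  # ceiling: 1MB / 32k = 32
    total = pieces_per_io * jobs                                  # 32 x 10 = 320
    per_channel = total / channels                                # 320 / 4 = 80.0
    return pieces_per_io, total, per_channel


if __name__ == "__main__":
    print(requests_per_channel(1 * MB, 32 * KB, jobs=10, channels=4))  # (32, 320, 80.0)
```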

Given the "sd_max_throttle" setting of 2, the OS allows only 2 outstanding requests per LUN at a time; everything else waits in the driver queue. If you look at the array, the performance will look great, since it is never asked to do much... so it must be an OS problem :)
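A crude way to see the cost: if each round of outstanding requests takes the same time, the number of rounds needed to drain a backlog is roughly the backlog divided by the throttle. The Python sketch below assumes that equal-service-time model (a simplification, not how the sd driver really schedules IO) and uses the post's figures of 80 pending requests and a throttle of 2; `service_rounds` is a hypothetical name.

```python
from collections import deque


def service_rounds(pending: int, throttle: int) -> int:
    """Rounds needed to drain `pending` requests when at most `throttle`
    may be outstanding at once.

    Simplified model (an assumption): every outstanding request completes
    in one equal-length round; anything beyond the throttle waits in the
    driver's queue.
    """
    if throttle < 1:
        raise ValueError("throttle must be >= 1")
    queue = deque(range(pending))  # requests waiting in the driver
    rounds = 0
    while queue:
        # Issue up to `throttle` requests to the device for this round.
        for _ in range(min(throttle, len(queue))):
            queue.popleft()
        rounds += 1
    return rounds


if __name__ == "__main__":
    print(service_rounds(80, 2))   # 40
    print(service_rounds(80, 80))  # 1
```

In this model the array only ever sees two requests at a time, which is why its own statistics look healthy while the host waits.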

The Fix

This issue was solved in multiple phases:
  • Quick fix: simply increase "sd_max_throttle" to 80 or more. This prevents queuing at the driver level.
  • Increased stripe width: use an SVM stripe width of 1MB or greater. This reduces the number of pieces each IO is broken into by the software volume manager.
  • Optimal solution: eliminate software striping altogether and build larger LUNs within the SAN box.
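The effect of these fixes on the post's workload can be compared with the same back-of-the-envelope model. A Python sketch, assuming stripe-aligned 1MB IOs, 10 jobs, and even round-robin across four channels; `per_channel_demand` is hypothetical, and `None` stands for "no software striping":

```python
from typing import Optional

KB = 1024
MB = 1024 * KB


def per_channel_demand(io_size: int, stripe_width: Optional[int], jobs: int, channels: int) -> float:
    """Concurrent requests per channel for `jobs` concurrent IOs of `io_size` bytes.

    Assumes stripe-aligned IOs and even round-robin across channels.
    stripe_width=None models "no software striping": each IO reaches one
    large array LUN as a single request.
    """
    if stripe_width is None:
        pieces = 1
    else:
        pieces = (io_size + stripe_width - 1) // stripe_width  # ceiling division
    return pieces * jobs / channels


if __name__ == "__main__":
    configs = {
        "32k SVM stripe (original)": 32 * KB,
        "1MB SVM stripe": 1 * MB,
        "no software striping": None,
    }
    for name, width in configs.items():
        demand = per_channel_demand(1 * MB, width, jobs=10, channels=4)
        print(f"{name}: {demand} requests per channel")
```

In this simplified model the 1MB stripe and the no-striping case produce the same request count; the benefit of moving striping into the array is not something this arithmetic captures.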

Summary

Storage issues often involve multiple layers of hardware, software, people, and organizations. To architect a well-thought-out solution, all aspects must be taken into consideration. Get everyone talking and sharing information so that your organizational stove-pipes don't cripple application performance.
Comments:

Hi

You advise reducing software striping, but we have an HP XP12000 (= Hitachi) that doesn't stripe well on the storage side, and HP advises us to do striping at the OS level!

Because we are now migrating from TruCluster to Red Hat Cluster or Oracle RAC, we face exactly the problem you describe. It's very difficult to find the right setup.

Mike

Posted by Michael Hagmann on January 14, 2008 at 07:12 AM PST

It is all a matter of trade-offs. Striping in the array would be best, but if you are using Oracle RAC, then I guess you are using ASM... especially with RAC. Striping on the OS side with ASM is not a real problem: ASM has knowledge of the application and aligns blocks so that small IOs won't be split into multiple chunks. Large IOs (1MB) might still be split. You can always increase the stripe width of ASM to further limit this, but it should not be too much of a problem either way.
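The alignment point can be checked with simple arithmetic: a striping layer splits an IO once for every stripe-unit boundary it crosses. A hypothetical Python helper, assuming a fixed stripe width and ignoring how ASM actually allocates its extents:

```python
KB = 1024
MB = 1024 * KB


def pieces(offset: int, size: int, stripe_width: int) -> int:
    """How many stripe units an IO of `size` bytes starting at `offset`
    touches, i.e. how many pieces a striping layer splits it into."""
    if size <= 0:
        raise ValueError("size must be positive")
    first_unit = offset // stripe_width
    last_unit = (offset + size - 1) // stripe_width
    return last_unit - first_unit + 1


if __name__ == "__main__":
    width = 1 * MB                                 # hypothetical 1MB stripe width
    print(pieces(0, 8 * KB, width))                # 1: aligned small IO stays whole
    print(pieces(width - 4 * KB, 8 * KB, width))   # 2: straddles a boundary, so it is split
    print(pieces(0, 1 * MB, width))                # 1: aligned 1MB IO fits one unit
    print(pieces(512 * KB, 1 * MB, width))         # 2: misaligned 1MB IO is split
```

A wider stripe makes boundaries rarer, which is why increasing it further limits splitting.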

Posted by Glenn Fawcett on January 14, 2008 at 08:16 AM PST

Our problem is that we use Oracle RAC for SAP, and today ASM is not supported for SAP. Clearly, ASM would be the best solution for Oracle RAC. Do you have any experience with btrfs? This file system also looks very interesting; it seems very close to AdvFS from Tru64.

Posted by Michael Hagmann on January 14, 2008 at 08:29 PM PST

About

This blog discusses performance topics related to Sun servers. The main focus is on database performance and architecture, but other topics can and will creep in.
