Optimizing Oracle Schema Analyze with CMT based servers

A common observation regarding Niagara based servers is that system maintenance or database administration tasks can run slower than previous generations of Sun servers. While single-threaded performance may be less, these maintenance tasks are often able to be parallelized, especially using a database engine as mature as Oracle. Take for instance the task of gathering schema statistics. Oracle offers many options on how to gather schema statistics, but there are a few ways to reduce overall gather statistics time:
  • Increased Parallelism
  • Reduced Sample Size
  • Concurrency
Oracle has written many articles in metalink which discuss sample size and the various virtues. There have also been many volumes written on optimizing the Oracle cost based optimizer (CBO). Jonathan Lewis of who is a member of the famous Oaktable network has written books and multiple white papers on the topic. You can read these for insight into the Oracle CBO. While a reasonable sample size or the use of the "DBMS_STATS.AUTO_SAMPLE_SIZE" can seriously reduce the gather statistics times, I will leave that up to you to choose the sample size the produces the best plans.

Results

The following graph shows the total run time in seconds of a "GATHER_SCHEMA_STATS" operations at various levels of parallelism and sample size on a simple schema of 130GB. All tests were run on a Maramba T5240 with a 6140 array and two channels.

GATHER_SCHEMA_STATS parallelism and sample_size


Note that if higher levels of sampling are required, parallelism can help to significantly reduce the overall runtime of the GATHER_SCHEMA_STATS operation. Of course a smaller sample size can be employed as well.

GATHER_SCHEMA_STATS options

SQL> connect / as sysdba

-- Example with 10 percent with parallel degree 32
--
SQL> EXECUTE SYS.DBMS_STATS.GATHER_SCHEMA_STATS (OWNNAME=>'GLENNF', 
     ESTIMATE_PERCENT=>10, 
     DEGREE=>32, 
     CASCADE=>TRUE);

-- Example with AUTO_SAMPLE_SIZE and parallel degree 32
--
SQL> EXECUTE SYS.DBMS_STATS.GATHER_SCHEMA_STATS (OWNNAME=>'GLENNF', 
     ESTIMATE_PERCENT=>DBMS_STATS.AUTO_SAMPLE_SIZE, 
     DEGREE=>32, 
     CASCADE=>TRUE);

Note that you must have "parallel_max_servers" set to at least the level of parallelism desired for the GATHER_SCHEMA_STATS operation. I typically set it higher to allow for other parallel operations to get servers.

        SQL> alter system set parallel_max_servers = 128;

Finally, you can easily run a schema collect on multiple schema's concurrently and in parallel by issuing GATHER_SCHEMA_STATS from multiple sessions and ensuring the level of parallelism is set high enough to accommodate.

Configuration

  • T5240 - "Maramba" CMT based server
    • 2 x UltraSPARC T2 Plus @ 1.4GHz
    • 128GB memory
    • 2 x 1GB fiber channel HBA
    • 1 x 6140 Storage array with 1 lun per channel.
  • Software
    • Solaris 10 Update 5
    • Oracle 10.2.0.3
    • CoolTools
  • Schema
      SQL> Connected.
      SQL> SQL> SQL> SQL> SQL> SQL> SQL>   2    3    4  
      OWNER	 TABLE_NAME	NUM_ROWS       MB
      -------- ------------ ---------- --------
      GLENNF	 B2	       239826150    38560
      GLENNF	 B1	       237390000    32110
      GLENNF	 S2		 4706245      750
      GLENNF	 S4		 4700995      750
      GLENNF	 S5		 4699955      750
      GLENNF	 S7		 4698450      750
      GLENNF	 S8		 4706435      750
      GLENNF	 S9		 4707445      750
      GLENNF	 S10		 4700905      750
      GLENNF	 S3		 4706375      750
      GLENNF	 GTEST		 4706170      750
      
      OWNER	 TABLE_NAME	NUM_ROWS       MB
      -------- ------------ ---------- --------
      GLENNF	 S6		 4700980      750
      GLENNF	 S1		 4705905      710
      HAYDEN	 HTEST		 4723031      750
      
      14 rows selected.
      
      SQL>   2    3    4  
      OWNER	 INDEX_NAME	NUM_ROWS       MB
      -------- ------------ ---------- --------
      GLENNF	 B1_I2	       244841720    11623
      GLENNF	 B2_I2	       239784800    11451
      GLENNF	 B1_I1	       248169793     8926
      GLENNF	 B2_I1	       241690170     8589
      GLENNF	 S6_I2		 4790380      229
      GLENNF	 S3_I2		 4760090      227
      GLENNF	 S2_I2		 4693120      226
      GLENNF	 S5_I2		 4688230      224
      GLENNF	 S8_I2		 4665695      223
      GLENNF	 S4_I2		 4503180      216
      GLENNF	 S1_I2		 4524730      216
      
      OWNER	 INDEX_NAME	NUM_ROWS       MB
      -------- ------------ ---------- --------
      GLENNF	 S9_I2		 4389080      211
      GLENNF	 S10_I2 	 4364885      209
      GLENNF	 S7_I2		 4357240      208
      GLENNF	 S2_I1		 4972635      177
      GLENNF	 S3_I1		 4849660      174
      GLENNF	 S6_I1		 4830895      174
      GLENNF	 S9_I1		 4775830      171
      GLENNF	 S7_I1		 4772975      169
      GLENNF	 S5_I1		 4648410      168
      GLENNF	 GTEST_C1	 4686790      167
      GLENNF	 S1_I1		 4661605      166
      
      OWNER	 INDEX_NAME	NUM_ROWS       MB
      -------- ------------ ---------- --------
      GLENNF	 S4_I1		 4626965      166
      GLENNF	 S10_I1 	 4605100      164
      GLENNF	 S8_I1		 4590735      163
      
      25 rows selected.
      

Comments:

are you setting the degree to 32 and parallel_max_servers to 128 based on 1/4 of total hw threads and an oracle server per thread?

So on a T2000, parallel_max_servers would be set to 32 and degree set to 8?

Posted by R.P. Aditya on May 12, 2008 at 11:19 AM PDT #

This was just an example, so don't read anything special into these numbers.

In the past we would typically set parallel_max_servers ~= 4x the number of \*cores\*. With CMT based systems, I would not go so far maybe 2x the number of threads. It really depends on how much IO bandwidth is available. As far as the level of parallelism per operation, that would depend on what else is expected to run. If there is nothing else running on the system and there is ample IO bandwidth, go ahead and crank up the parallelism. Do note that once you saturate IO, additional parallelism might actually hurt run times.

Posted by Glenn Fawcett on May 13, 2008 at 02:35 AM PDT #

Is there an reason the chart is starting with degree=5 instead of 1?

Posted by Bernd Eckenfels on May 14, 2008 at 05:36 AM PDT #

I didn't do runs with parallel degree=1... I ran out of time :)

Posted by Glenn Fawcett on May 14, 2008 at 05:49 AM PDT #

Post a Comment:
Comments are closed for this entry.
About

This blog discusses performance topics as running on Sun servers. The main focus is in database performance and architecture but other topics can and will creep in.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today
News

No bookmarks in folder

Blogroll

No bookmarks in folder