We are facing severe performance issues on our server running Oracle Database 10g Release 2 (with 2 CPUs and 12 GB RAM). Automatic Workload Repository reports state that we have to tune our SQL statements, but application-level tuning is not an option, because we do not have the resources. Please refer to the top part of the Automatic Workload Repository report and suggest a way to improve performance (any way except SQL tuning). We are at our wit’s end.
Note: Due to its size, I’m not including the submitted report from the Oracle Database Automatic Workload Repository tool. If you are interested in reviewing the report, it is available online at bit.ly/zN0vK3.
My initial answer to this question follows. The people who posted the question sent a follow-up comment, and I’ve also included that here. It is always nice to get feedback (good, bad, or indifferent) to see how things are going. This time, the feedback was good and reinforced the points I made below.
The questioners wrote, “. . . application-level tuning is not an option” and that they do not have the resources to do it, but I maintain that they have to do it. “Tuning a database” will deliver single- or even double-digit percentage decreases in application runtime, but these performance issues need orders-of-magnitude application runtime improvement. There is no magic here. If getting the response you wanted didn’t require looking at the application and its overall design and implementation, no one would ever look at the application, its design, or its implementation.
If you want orders-of-magnitude decreases in runtimes, you will be looking at the application and its design and implementation. If you want things to run 10 percent faster, you might (and I stress the word might) get lucky and be able to tune the database without touching the application. It is exceedingly rare—and getting rarer as the database becomes more self-tuning.
The question ends, “. . . suggest a way to improve performance (any way except SQL tuning). We are at our wit’s end.” I can imagine that you are at your wit’s end. A request like this is sort of like being asked to go from Europe to America on foot, blindfolded, in one day. Tuning an application without being able to touch the application is an impossible job. I suggest that you start by rejecting that tuning proposition and realizing that you are almost certainly going to have to touch the application.
In looking at your Automatic Workload Repository report, I see that the longest-running events at the system level are latch-related: cache buffers chains and library cache. Additionally, your CPU time was way up there. Concurrency-based waits are caused by one thing and one thing only: having many concurrently active sessions. If you had fewer concurrently active sessions, you would by definition have fewer concurrency-based waits (fewer people contending with each other). I see that you had 134 sessions in the database running on a total of 4 CPU cores. Because it is not really possible for 4 CPU cores to allow 134 sessions to be concurrently active, I recommend that you decrease the number of concurrent sessions by reducing the size of your connection pool—radically. Cut your connection pool to something like 32. It is just not possible for 4 cores to run 134 things at the same time; 4 cores can run only 4 things at exactly the same time. If you reduce the concurrent user load, you’ll see those waits go down, response time go down, and your transaction rates improve. In short, you’ll find out that you can actually do more work with fewer sessions than you can with more.
I know that this fewer-does-more suggestion sounds counterintuitive, but I encourage you to watch this narrated Real World Performance video. In this video, you’ll see what happens in a test of a 12-core machine running transactions and decreasing the size of a connection pool from an unreasonable number (in the thousands) to a reasonable number: 96. At 96 connections on this particular machine, the machine was able to do 150 percent the number of transactions per second and took the response time for these transactions from ~100 milliseconds down to ~5 milliseconds.
Short of reducing your connection pool size (and therefore changing the way the application is using the database by queuing in the middle-tier connection pool instead of queuing hundreds of active sessions in the database), you would have to change your queries to make them request cache buffers chains latches less often. In short: tune the queries and the algorithms in the application. There is literally no magic here. Tweaking things at the system level might not be an option. Touching the application might have to be an option.
That was my original answer, and shortly after I posted it, the people who’d posed the question wrote back:
Thanks a lot for your response. . . . We reduced the connection pool, and we are already seeing better performance!
That drives home the fact that managing resources contributes hugely to your runtime performance. Far too often, I see databases with hundreds or thousands of concurrently connected sessions. Unless you have hundreds or thousands of CPUs, this just isn’t reasonable in a three-tier architecture. You have connection pools in the middle tier, and you should be using them to limit the amount of concurrent workload you ask the database to do. It comes down to simple math in the end: if you have N cores, you can truly have only N things concurrently “active.” If some of the things that are trying to be active are blocked, waiting for I/O, you might be able to have some number of active sessions larger than the number of cores on your machine, but the multiple would be in the single digits. For example, in the above-referenced video, the demo ran 8 connections × 12 cores = 96 total connections. In many cases, I see people running 20, 30, 50, or more (many more, in some cases) connections per core. The only thing that can happen if you do that: your machine will end up in a state requiring you to pull the plug to regain control of it.
Watch your connections! Keep them reasonable. It makes a huge difference.Cursor Versus Ref Cursor
In an interview for an Oracle PL/SQL developer job position, the interviewer asked what the difference between a cursor and a ref cursor was and when I would appropriately use each of these. Could you please tell me how I could have effectively answered this question?
Technically, under the covers, at the most basic level, they are the same. They are both cursors and can be processed in the same fashion. In the following answer, I’ll discuss only ref cursors and “regular” PL/SQL cursors (not DBMS_SQL cursors, for example). In short, the salient differences between regular cursors and ref cursors are as follows:
A ref cursor can be dynamically opened and is defined at runtime; a regular cursor is static and defined at compile time.
A ref cursor can be passed to another PL/SQL routine or returned to a client. A regular cursor must be directly addressed (not passed) and cannot be returned to a client application.
A regular cursor will be cached “open” by PL/SQL; a ref cursor cannot be. In other words, using a ref cursor incurs a parsing penalty.
A regular cursor can implicitly array-fetch 100 rows at a time. A ref cursor must use explicit array fetching. In other words, a regular cursor can more efficiently retrieve data by using significantly less code.
A regular cursor can be defined outside of a procedure or a function as a global package variable. A ref cursor cannot be; it must be local in scope to a block of PL/SQL code.
Let’s look at these points one by one. First, a ref cursor can be dynamically opened and defined at runtime, in contrast to a regular cursor, which is static and defined at compile time. Listing 1 creates a ref cursor and a regular cursor.
Code Listing 1: Creating a ref cursor and a regular cursor
declare type rc is ref cursor; cursor c is select * from dual; l_cursor rc; begin -- here we have a ref cursor whose definition is determined at runtime. if ( to_char(sysdate,'dd') = 30 ) then open l_cursor for 'select * from emp'; elsif ( to_char(sysdate,'dd') = 29 ) then open l_cursor for select * from dept; else open l_cursor for select * from dual; end if; -- Here we have a regular cursor. Its definition was decided -- at compile time and cannot change open c; end; /
Given the block of code in Listing 1, you see perhaps the most salient difference between cursors: no matter how many times you run that block, cursor C will always be SELECT * FROM DUAL. The ref cursor (L_CURSOR), however, can be anything. Another thing that block of code demonstrates is that a ref cursor can be opened with either a query you construct at runtime or a query that is predetermined at compile time. The statement
open l_cursor for 'select * from emp';
demonstrates that the query defining the ref cursor L_CURSOR can be truly dynamic—constructed entirely at runtime. You could replace 'select * from emp' with any string—any PL/SQL variable of a string type—and be able to open the ref cursor at runtime.
The other two open calls for L_CURSOR in Listing 1 demonstrate that the query associated with the ref cursor can be dynamic at runtime. Depending on what day of the month it is, the ref cursor opens the static query SELECT * FROM DEPT or SELECT * FROM DUAL. The ref cursor is not bound to a fixed definition at compile time as the regular cursor C is. No matter how many times you run the code, the query associated with cursor C will be static and constant.
Now, looking at the second point, a ref cursor can be passed to another PL/SQL routine or returned to a client, whereas a regular cursor must be directly addressed (not passed) and cannot be returned to a client application. You can easily see what this means for a ref cursor. The following is a procedure that accepts a ref cursor as input, fetches a row from it, and prints out the row.
create or replace procedure p( l_cursor in out sys_refcursor ) is l_rec all_users%rowtype; begin fetch l_cursor into l_rec; if (sql%notfound) then return; end if; dbms_output.put_line ( 'data in procedure = ' || l_rec.username ); end; /
Then, using SQL*Plus, I’ll define a ref cursor host variable in the “program”:
SQL> variable x refcursor
And last I’ll run the code in Listing 2, which will dynamically open a ref cursor, pass it to a PL/SQL subroutine, and then return it to the client program for further processing.
Code Listing 2: Opening, passing, and returning a ref cursor
SQL> declare 2 l_cursor sys_refcursor; 3 l_rec all_users%rowtype; 4 l_query varchar2(300) := 5 'select * 6 from all_users 7 where rownum <= 5'; 8 begin 9 open l_cursor for l_query; 10 p( l_cursor ); 11 :x := l_cursor; 12 end; 13 / data in procedure = SYS PL/SQL procedure successfully completed. SQL> print x USERNAME USER_ID CREATED ————————————— —————————— ————————————— SYSTEM 5 05-SEP-10 OUTLN 9 05-SEP-10 DIP 14 05-SEP-10 ORACLE_OCM 21 05-SEP-10
As you can see, the first row was fetched by the PL/SQL routine and printed out, and the remaining four rows were fetched and printed by the client program, not PL/SQL.
Those two differentiators show the flexibility of a ref cursor over a regular cursor, so that might beg the question “Why use regular cursors at all?” The answer is twofold: performance and ease of programming. Let’s look at performance first, starting with the facts that a regular cursor will be cached open by PL/SQL and that a ref cursor cannot and will not be. In other words, a parsing penalty is involved in using a ref cursor. Again, I can observe this easily by coding a simple routine that will open a regular cursor, fetch from it, close it, and then do the same to a ref cursor. Here is a routine to do that:
SQL> create or replace 2 procedure p 3 is 4 l_cursor sys_refcursor; 5 l_rec dual%rowtype; 6 cursor c is select * from dual d1; 7 begin 8 open c; 9 fetch c into l_rec; 10 close c; 11 12 open l_cursor for select * from dual d2; 13 fetch l_cursor into l_rec; 14 close l_cursor; 15 end; 16 / Procedure created.
Now, if I run this procedure in SQL*Plus 10 times, using “exec p;” with SQL tracing enabled, my TKPROF report will show the following:
SELECT * FROM DUAL D1 call count ————————— ———————— Parse 1 Execute 10 Fetch 10 ————————— ———————— total 21 SELECT * FROM DUAL D2 call count ————————— ———————— Parse 10 Execute 10 Fetch 10 ————————— ———————— total 30
Note how both SQL statements were executed 10 times apiece, which was expected, because I ran the code 10 times, but the query associated with the ref cursor was also parsed 10 times, as compared to the regular cursor’s query, which was parsed only once. In PL/SQL, regular cursors are cached in an open state in the hope that you’ll execute them again. PL/SQL can then skip the entire parse process, which can lead to a significant reduction in CPU utilization. (If you don’t want to take my word for it, watch this video and be convinced: bit.ly/zPMeVw.) This PL/SQL cursor cache is controlled by the SESSION_CACHED_CURSORS database initialization parameter and is 100 percent transparent to you and completely nonintrusive. By that, I mean that PL/SQL’s caching of cursors open will not affect your programming at all. If the number of open cursors (not PL/SQL cached cursors) needs to exceed the number of currently free open cursor slots, PL/SQL will silently start closing the cached open cursors. PL/SQL is not taking a chunk of open cursor slots away from you—it just transparently uses the ones you are not currently using.
The next performance-related advantage of a regular cursor is also an ease-of-use feature. A regular cursor can implicitly array-fetch 100 rows at a time, whereas a ref cursor must use explicit array fetching. In other words, a regular cursor can more efficiently retrieve data by using significantly less code. For example, if you were to copy the ALL_OBJECTS view into a table T and run the code in Listing 3 with tracing enabled, you would find something similar to that listing’s TKPROF report.
Code Listing 3: Report comparing performance of regular and ref cursors
SQL> declare 2 l_rec t%rowtype; 3 l_cursor sys_refcursor; 4 begin 5 for x in 6 ( select * from t regular ) 7 loop 8 null; 9 end loop; 10 11 open l_cursor 12 for 13 select * 14 from t refcur; 15 16 loop 17 fetch l_cursor 18 into l_rec; 19 exit when 20 l_cursor%notfound; 21 end loop; 22 close l_cursor; 23 end; 24 / PL/SQL procedure successfully completed. TKPROF report SELECT * FROM T REGULAR call count cpu elapsed disk query current rows ————————— ——————— ————— ———————— ———————— ———————— ———————— —————— Parse 1 0.00 0.00 0 0 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 722 0.23 0.23 0 1748 0 72198 ————————— ——————— ————— ———————— ———————— ———————— ———————— —————— total 724 0.23 0.23 0 1748 0 72198 SELECT * FROM T REFCUR call count cpu elapsed disk query current rows ————————— ——————— ————— ———————— ———————— ———————— ———————— —————— Parse 1 0.00 0.00 0 0 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 72199 0.40 0.42 0 72203 0 72198 ————————— ——————— ————— ———————— ———————— ———————— ———————— —————— total 72201 0.40 0.42 0 72203 0 72198
Note how many times the regular cursor was fetched from—722—about a 100th of the number of rows. PL/SQL uses an implicit array fetch of 100 rows for that implicit, regular cursor. For the ref cursor, you can see that the number of fetches is one more than the number of rows (the fetches must go one past the last row to verify that the last row has been fetched). That materially affects the CPU utilization of the query and may also affect the number of I/Os (the number in the query column in the TKPROF report) performed, as it did in this example. (If you are curious about why I/O was reduced, go to bit.ly/ww0Wkl, which shows why with examples.)
Last, consider that a regular cursor can be defined outside of a procedure or function as a global package variable. A ref cursor cannot be—it must be local in scope to a block of PL/SQL code. I consider this to be neither an advantage nor a disadvantage for either type of cursor. I am not a huge fan of global variables to begin with—I consider them to be a bad practice in general—so I personally never really take advantage of a regular cursor’s ability to be global. To show what this particular cursor difference means, I need just a small snippet of code:
SQL> create or replace package my_pkg 2 as 3 cursor global_cursor is select * from dual; 4 end; 5 / Package created. SQL> create or replace package my_pkg 2 as 3 global_cursor sys_refcursor; 4 end; 5 / Warning: Package created with compilation errors. SQL> show err Errors for PACKAGE MY_PKG: LINE/COL ERROR ———————— ———————————————————————————— 3/16 PL/SQL: Declaration ignored 3/16 PLS-00994: Cursor Variables cannot be declared as part of a package
See how a package with a regular cursor defined outside of a procedure or a function compiles successfully but a package with a ref cursor defined that way will not compile.Optimal Ordering
Is there an optimal order for creating columns in a table? The DBA at my shop enforces a standard of putting VARCHAR2 columns last, and the likelihood of nulls is not considered. Is there a real-world benefit to this order?
Here’s a little background first. A row is stored in a manner similar to this in the database block:
[null flag][length][data][null flag][length][data]. . . .
In order to get to the third column in a table, the database must parse through the first two columns (not pointers—the database must read the row byte by byte)— even if the columns are numbers, dates, whatever. The columns will be stored as a length field followed by the data, and the database must parse through them to get to the next column.So, in general you should put the most frequently accessed columns first for the best performance.
However, there is something to be said for putting the column(s) most likely to be null last, because they will consume zero bytes. If the database hits the end of a row before finding the Nth column, you know that that column is NULL or the default column value (for not-null columns added to the table with fast add in Oracle Database 11g).
However, there is a convention to put primary key columns first in all cases.
However . . . there are too many howevers! And in general, it really isn’t going to matter too much. The bit about putting the most frequently accessed columns first would have the most weight in a data warehouse in which you are scanning over many millions of rows to build the answer (the repeated overhead of parsing over a lot of columns that are not interesting would add up). In an online transaction processing (OLTP) system, where you are processing tens of rows, though, it wouldn’t be as big a deal.
In a strange case involving dozens of columns that are almost always null, putting them last in a table with lots and lots of rows could save you quite a few bytes, but the table would have to be large and the columns would have to be all null and frequently all null.
So, my recommendation is to put the columns in the order that makes the most sense from an aesthetic and design perspective.
As for the DBA’s rule above (VARCHAR2 columns last and the likelihood of nulls not considered), it would do nothing for performance or manageability. In many cases, it could negatively affect performance. Just remember that pretty much all database data is stored in a stringlike fashion:
A string will have a null flag and a length byte or length bytes (<=250 characters [1 byte], >250 characters [2 bytes]) followed by the data.
A number will have a null flag and a length byte or length bytes (numbers are 0–22 bytes in length) followed by a varying amount of data.
A binary_float will consume 5 bytes—a leading null flag and a length byte followed by 4 bytes of floating-point data.
A binary_double will consume 9 bytes—a leading null flag and a length byte followed by 8 bytes of floating-point data.
The database must read the length byte to figure out how many bytes of that row constitute that column, read over that column’s data to get to the next length byte, and so on. You might want columns you frequently access to be first in a table, because it will take less CPU to access them. But you’d need to be accessing a lot of rows for this to be true!
Suppose you have a table with 1,000 columns and then you query the first column and the last column and compare the TKPROF reports for each. Listing 4 creates the table and shows the reports for the queries.
Code Listing 4: 1,000-column table created, queried, and reported
declare l_create long := 'create table t ( c1 number'; begin for i in 2 .. 1000 loop l_create := l_create || ',c'||i||' number default ' || i; end loop; execute immediate l_create || ')'; end; / insert into t (c1, c1000 ) select rownum, rownum from all_objects; exec dbms_stats.gather_table_stats( user, 'T' ); SELECT C1 FROM T call count cpu elapsed disk query current rows ————————— ——————— ————— ———————— ———————— ———————— ———————— —————— Parse 1 0.00 0.00 0 0 0 0 Execute 10 0.00 0.00 0 0 0 0 Fetch 7230 6.41 15.72 414610 420920 0 722790 ————————— ——————— ————— ———————— ———————— ———————— ———————— —————— total 7241 6.41 15.72 414610 420920 0 722790 SELECT C1000 FROM T call count cpu elapsed disk query current rows ————————— ——————— ————— ———————— ———————— ———————— ———————— —————— Parse 1 0.00 0.00 0 0 0 0 Execute 10 0.00 0.00 0 0 0 0 Fetch 7230 8.66 17.93 421260 3304860 0 722790 ————————— ——————— ————— ———————— ———————— ———————— ———————— —————— total 7241 8.66 17.94 421260 3304860 0 722790
In this case, the CPU overhead was partially from the parsing of 1,000 columns and the chasing down of the chained-row piece (because any row with more than 254 columns will be stored in multiple pieces). You can see this row-chaining side effect in the report in the increased query column values, which resulted from the number of buffer gets processed.
If you change the 1,000 columns in Listing 4 to 250 columns to avoid having to chase down the chained rows, you might see the result in Listing 5.
Code Listing 5: Report for table with only 250 columns
SELECT C1 FROM T call count cpu elapsed disk query current rows ————————— ——————— ————— ———————— ———————— ———————— ———————— —————— Parse 1 0.00 0.00 0 0 0 0 Execute 10 0.00 0.00 0 0 0 0 Fetch 7230 0.62 0.62 1117 94520 0 722790 ————————— ——————— ————— ———————— ———————— ———————— ———————— —————— total 7241 0.62 0.62 1117 94520 0 722790 SELECT C250 FROM T call count cpu elapsed disk query current rows ————————— ——————— ————— ———————— ———————— ———————— ———————— —————— Parse 1 0.00 0.00 0 0 0 0 Execute 10 0.00 0.00 0 0 0 0 Fetch 7230 0.96 0.97 7 94520 0 722790 ————————— ——————— ————— ———————— ———————— ———————— ———————— —————— total 7241 0.96 0.97 7 94520 0 722790
So, the farther down the CREATE list a column is, the longer it will take to retrieve that column. Having a rule that says, “The longer a column is, the farther down the create table statement it must be” doesn’t seem to make sense. There are arguments for putting nullable columns last (for space savings) and for putting infrequently accessed columns last (for performance) but none that I’m aware of for putting longer columns last.
DOWNLOAD Oracle Database 11g Release 2
Photography by Dmitri Popov, Unsplash