December 30, 2019
Almost every program Oracle Database developers write includes PL/SQL as well as SQL statements. PL/SQL statements are run by the PL/SQL statement executor, and SQL statements are run by the SQL statement executor. When the PL/SQL runtime engine encounters a SQL statement, it stops and passes the SQL statement over to the SQL engine. The SQL engine executes the SQL statement and returns information back to the PL/SQL engine. This transfer of control is called a context switch, and each one of these switches incurs overhead that slows down the overall performance of your programs.
This article kicks off a multipart series that drills deep into the bulk-processing features provided in PL/SQL to minimize the overhead of context switches when executing SQL statements from PL/SQL blocks. The material in this article is drawn from a tutorial I created on LiveSQL, and the article contains two types of exercises:
Fill in the blanks. I provide a lot of the code, and you fill in the missing part(s). This way, you can concentrate on the bulk-processing-related syntax.
Write all the code. This is an option for the most dedicated readers! You write the entire solution to the challenge. Run it in your own database or on LiveSQL.
All examples and exercises are based on the traditional Human Resources schema, containing the employees and departments tables. This schema is available on GitHub (link at the end of the article). I encourage you to read this article (and even do the exercises here) and also check out the tutorial and take full advantage of LiveSQL, where you can easily and instantly execute all code you write.
Now let’s look at a concrete example to explore context switches more thoroughly and identify the reasons why the bulk-processing features of PL/SQL—
BULK COLLECT and
FORALL—can have such a dramatic impact on performance. Suppose my manager asked me to write a PL/SQL procedure that accepts a department ID and a salary percentage increase and gives everyone in that department a specified-percentage raise. Taking advantage of PL/SQL’s elegant cursor
FOR loop and the ability to call SQL statements natively in PL/SQL, I can implement this requirement easily:
CREATE OR REPLACE PROCEDURE increase_salary (
   department_id_in   IN employees.department_id%TYPE,
   increase_pct_in    IN NUMBER)
IS
BEGIN
   FOR employee_rec
      IN (SELECT employee_id
            FROM employees
           WHERE department_id = increase_salary.department_id_in)
   LOOP
      UPDATE employees emp
         SET emp.salary =
                emp.salary + emp.salary * increase_salary.increase_pct_in
       WHERE emp.employee_id = employee_rec.employee_id;

      DBMS_OUTPUT.PUT_LINE ('Updated ' || SQL%ROWCOUNT);
   END LOOP;
END increase_salary;
Suppose there are 10,000 employees in department 50. When I execute this block....
BEGIN
   increase_salary (50, .10);
   ROLLBACK;  -- to leave the table in its original state
END;
....the PL/SQL engine will “switch” over to the SQL engine 10,000 times, once for each row being updated. Tom Kyte, of Ask TOM fame, refers to row-by-row switching like this as “slow-by-slow processing,” and it is definitely something to be avoided.
Generally, the way to improve performance over row-by-row context switching is to not perform row-by-row data manipulation language (DML) operations. This can be accomplished in one of two ways:
Implement the functionality in “pure” SQL—no PL/SQL loop
Use the bulk-processing features of PL/SQL
If you can change your implementation to avoid a loop and instead simply execute a single DML statement, you should do so. For example, I can rewrite the increase_salary procedure as a single UPDATE statement:
CREATE OR REPLACE PROCEDURE increase_salary (
   department_id_in   IN employees.department_id%TYPE,
   increase_pct_in    IN NUMBER)
IS
BEGIN
   UPDATE employees emp
      SET emp.salary =
             emp.salary + emp.salary * increase_salary.increase_pct_in
    WHERE emp.department_id = increase_salary.department_id_in;
END increase_salary;
Of course, it is not always this easy. You might be doing some very complex processing of each row before doing the
DELETE that would be hard to do in SQL. You might need to do error management that’s more nuanced than “all or nothing” SQL will allow. Or you might simply not have sufficient knowledge of SQL to do what’s needed.
LiveSQL offers 24/7 free access to the latest version of Oracle Database; a SQL “scratchpad” where you can write SQL and PL/SQL; and a library of hundreds of scripts and tutorials.
In an ideal world, you would stop programming and take an advanced SQL class (the Oracle Dev Gym offers a free one on SQL analytic functions). In the real world, however, you need to get the program up and running ASAP.
Whatever your situation, the bulk-processing features of PL/SQL offer a straightforward solution—although there will be a lot to consider as you implement your conversion to BULK COLLECT and FORALL.
Let’s first take a look at
BULK COLLECT, which improves the performance of multirow querying and is relatively simple. In my next article, I’ll move on to
FORALL, which is used to execute the same nonquery DML statement repeatedly, with different bind variables. That feature has a lot more moving parts and issues you need to take into account (which should come as no surprise, because you are changing data, not simply querying it).
To take advantage of bulk processing for queries, you simply put BULK COLLECT before the INTO keyword of your fetch operation and then provide one or more collections after the INTO keyword.
Here are some things to know about how
BULK COLLECT works:
It can be used with all three types of collections: associative arrays, nested tables, and VARRAYs.
You can fetch into individual collections (one for each expression in the
SELECT list) or a single collection of records.
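Here is a minimal sketch of the first option, fetching two columns into two separate collections, one per expression in the SELECT list (assuming the standard HR employees table):

```sql
DECLARE
   TYPE ids_t IS TABLE OF employees.employee_id%TYPE;
   TYPE names_t IS TABLE OF employees.last_name%TYPE;

   l_ids     ids_t;
   l_names   names_t;
BEGIN
   -- One collection per expression in the SELECT list
   SELECT employee_id, last_name
     BULK COLLECT INTO l_ids, l_names
     FROM employees
    WHERE department_id = 50;

   -- Both collections are filled in step: element N of l_ids
   -- and element N of l_names come from the same row
   DBMS_OUTPUT.PUT_LINE (l_ids.COUNT || ' IDs, ' || l_names.COUNT || ' names');
END;
```

A single collection of records is usually easier to manage than several parallel collections, but the parallel form can be handy when you need to pass the columns separately to other programs.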
The collection is always populated densely, starting from index value 1.
If no rows are fetched, the collection will be emptied of all elements.
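The following sketch demonstrates both of those last two behaviors (it assumes no employee has a department_id of -1, so the first fetch returns no rows):

```sql
DECLARE
   TYPE ids_t IS TABLE OF employees.employee_id%TYPE;

   l_ids   ids_t := ids_t (1, 2, 3);   -- starts out with three elements
BEGIN
   SELECT employee_id
     BULK COLLECT INTO l_ids
     FROM employees
    WHERE department_id = -1;          -- assumption: matches no rows

   -- The fetch found nothing, so the collection is now empty
   DBMS_OUTPUT.PUT_LINE (l_ids.COUNT);

   SELECT employee_id
     BULK COLLECT INTO l_ids
     FROM employees;

   -- A successful fetch fills the collection densely from index 1
   DBMS_OUTPUT.PUT_LINE (l_ids.FIRST || ' .. ' || l_ids.LAST);
END;
```

Note that no NO_DATA_FOUND exception is raised when the query returns no rows; you must check COUNT to detect an empty fetch.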
You can use
BULK COLLECT in all of these forms:
SELECT column(s) BULK COLLECT INTO collection(s)
FETCH cursor BULK COLLECT INTO collection(s)
EXECUTE IMMEDIATE query_string BULK COLLECT INTO collection(s)
Here’s a block of code that fetches all rows in the employees table with a single context switch and loads the data into a collection of records that are based on the table.
DECLARE
   TYPE employee_info_t IS TABLE OF employees%ROWTYPE;

   l_employees   employee_info_t;
BEGIN
   SELECT *
     BULK COLLECT INTO l_employees
     FROM employees
    WHERE department_id = 50;

   DBMS_OUTPUT.PUT_LINE (l_employees.COUNT);
END;
If you do not want to retrieve all the columns in a table, create your own user-defined record type and use that to define your collection. All you have to do is make sure the list of expressions in the
SELECT statement matches the record type’s fields.
DECLARE
   TYPE two_cols_rt IS RECORD (
      employee_id   employees.employee_id%TYPE,
      salary        employees.salary%TYPE);

   TYPE employee_info_t IS TABLE OF two_cols_rt;

   l_employees   employee_info_t;
BEGIN
   SELECT employee_id, salary
     BULK COLLECT INTO l_employees
     FROM employees
    WHERE department_id = 50;

   DBMS_OUTPUT.PUT_LINE (l_employees.COUNT);
END;
You can avoid the nuisance of declaring a record type to serve as the type for the collection by using a “template cursor.” This cursor should have the same
SELECT list as the
BULK COLLECT query. You can, however, leave off the
WHERE clause and anything else after the
FROM clause, because that code will never be used for anything but a
%ROWTYPE declaration. Here’s an example:
DECLARE
   CURSOR employee_info_c IS
      SELECT employee_id, salary FROM employees;

   TYPE employee_info_t IS TABLE OF employee_info_c%ROWTYPE;

   l_employees   employee_info_t;
BEGIN
   SELECT employee_id, salary
     BULK COLLECT INTO l_employees
     FROM employees
    WHERE department_id = 10;
END;
In the block below, replace the
#FINISH# tags with code so that the last names of all employees in department 50 are displayed.
DECLARE
   #FINISH#

   l_names   names_t;
BEGIN
   SELECT last_name
     #FINISH#
     FROM employees
    WHERE department_id = 50
    ORDER BY last_name;
END;
Write a stored procedure that accepts a department ID, uses
BULK COLLECT to retrieve all employees in that department, and displays their first names and salaries. Then write an anonymous block to run that procedure for department 100.
As with almost all other types of variables and constants you use in your code, collections consume Program Global Area (PGA) memory. If your collection gets too large, your users might encounter an error. To see this happen, run the code below. (Note:
varchar2a is a collection type of strings defined in the DBMS_SQL package.)
DECLARE
   l_strings   DBMS_SQL.varchar2a;
BEGIN
   FOR indx IN 1 .. 2 ** 31 - 1
   LOOP
      l_strings (indx) := RPAD ('abc', 32767, 'def');
   END LOOP;
END;
With BULK COLLECT, you could attempt to retrieve too many rows in one context switch and run out of PGA memory. To help you avoid such errors, Oracle Database offers a
LIMIT clause for
BULK COLLECT. Indeed, when using
BULK COLLECT, it is recommended that you never, or at least rarely, use an “unlimited”
BULK COLLECT, which is what you get with a
SELECT BULK COLLECT INTO (an implicit query)—and what you saw in the examples above.
Instead, declare a cursor (or a cursor variable); open that cursor; and then, in a loop, retrieve N number of rows with each fetch.
In the block below, I set my fetch limit to just 10 rows to demonstrate how this feature works. You will likely never want to set the limit to less than 100—this topic is explored further below.
DECLARE
   c_limit   PLS_INTEGER := 10;

   CURSOR employees_cur IS
      SELECT employee_id
        FROM employees
       WHERE department_id = 50;

   TYPE employee_ids_t IS TABLE OF employees.employee_id%TYPE;

   l_employee_ids   employee_ids_t;
BEGIN
   OPEN employees_cur;

   LOOP
      FETCH employees_cur BULK COLLECT INTO l_employee_ids LIMIT c_limit;

      DBMS_OUTPUT.PUT_LINE (l_employee_ids.COUNT || ' fetched');

      EXIT WHEN l_employee_ids.COUNT = 0;
   END LOOP;

   CLOSE employees_cur;
END;
One thing to watch out for when switching to BULK COLLECT in a loop is carrying over the loop-termination logic you would use for single-row fetching. I demonstrate this issue below, but first, a reminder: There are 107 rows in the employees table.
SELECT COUNT(*) FROM employees
Here’s the common way to terminate a loop in which you fetch row-by-row from an explicit cursor:
DECLARE
   CURSOR emps_c IS SELECT * FROM employees;

   l_emp     emps_c%ROWTYPE;
   l_count   INTEGER := 0;
BEGIN
   OPEN emps_c;

   LOOP
      FETCH emps_c INTO l_emp;
      EXIT WHEN emps_c%NOTFOUND;

      DBMS_OUTPUT.put_line (l_emp.employee_id);
      l_count := l_count + 1;
   END LOOP;

   DBMS_OUTPUT.put_line ('Total rows fetched: ' || l_count);
END;
In other words: Fetch a row, and stop if the cursor has retrieved all rows. Now let’s switch to using
BULK COLLECT and
LIMIT, fetching 10 rows at a time, using the same approach to exiting the loop.
DECLARE
   CURSOR emps_c IS SELECT * FROM employees;

   TYPE emps_t IS TABLE OF emps_c%ROWTYPE;

   l_emps    emps_t;
   l_count   INTEGER := 0;
BEGIN
   OPEN emps_c;

   LOOP
      FETCH emps_c BULK COLLECT INTO l_emps LIMIT 10;
      EXIT WHEN emps_c%NOTFOUND;

      DBMS_OUTPUT.put_line (l_emps.COUNT);
      l_count := l_count + l_emps.COUNT;
   END LOOP;

   DBMS_OUTPUT.put_line ('Total rows fetched: ' || l_count);
END;
Wait, what? Is that right? Do I see
Total rows fetched: 100? Yes, I do. And therein lies the trap. You cannot continue to use the same
EXIT WHEN statement in the same place in your loop when you switch to
BULK COLLECT with LIMIT.
The very last fetch performed retrieved the last seven rows but also exhausted the cursor. So the cursor's %NOTFOUND attribute returns TRUE, but the collection still has those seven elements in it.
To terminate a loop by using BULK COLLECT with LIMIT, you should either exit when the collection comes back empty (EXIT WHEN l_emps.COUNT = 0) or move the EXIT WHEN emps_c%NOTFOUND statement to the bottom of the loop body, after the fetched rows have been processed.
These two approaches are shown below.
DECLARE
   CURSOR emps_c IS SELECT * FROM employees;

   TYPE emps_t IS TABLE OF emps_c%ROWTYPE;

   l_emps    emps_t;
   l_count   INTEGER := 0;
BEGIN
   OPEN emps_c;

   LOOP
      FETCH emps_c BULK COLLECT INTO l_emps LIMIT 10;

      l_count := l_count + l_emps.COUNT;

      EXIT WHEN emps_c%NOTFOUND;
   END LOOP;

   DBMS_OUTPUT.put_line ('Total rows fetched: ' || l_count);
END;

DECLARE
   CURSOR emps_c IS SELECT * FROM employees;

   TYPE emps_t IS TABLE OF emps_c%ROWTYPE;

   l_emps    emps_t;
   l_count   INTEGER := 0;
BEGIN
   OPEN emps_c;

   LOOP
      FETCH emps_c BULK COLLECT INTO l_emps LIMIT 10;

      EXIT WHEN l_emps.COUNT = 0;

      l_count := l_count + l_emps.COUNT;
   END LOOP;

   DBMS_OUTPUT.put_line ('Total rows fetched: ' || l_count);

   CLOSE emps_c;
END;
The employees table has 107 rows. In the block below, replace the
#FINISH# tags with code so that when the block is executed, the following text is displayed:
Rows fetched: 25
Rows fetched: 25
Rows fetched: 25
Rows fetched: 25
Rows fetched: 7
Rows fetched: 0

DECLARE
   CURSOR ids_c IS SELECT employee_id FROM employees;

   #FINISH#

   l_count   INTEGER;
BEGIN
   OPEN ids_c;

   LOOP
      FETCH ids_c #FINISH#;

      DBMS_OUTPUT.put_line ('Rows fetched: ' || l_count);

      EXIT WHEN #FINISH#;
   END LOOP;

   CLOSE ids_c;
END;
Write an anonymous block that (using
BULK COLLECT) fetches only the last name and the salary from the employees table 5 rows at a time and then displays that information. Make sure 107 names and salaries are shown!
When should you convert a nonbulk query to one using
BULK COLLECT? More specifically, should you convert a cursor
FOR loop to an explicit cursor and
FETCH BULK COLLECT with
LIMIT? Here are some things to keep in mind:
As long as your PL/SQL optimization level is set to 2 (the default) or higher, the compiler will automatically optimize cursor FOR loops to retrieve 100 rows with each fetch. You cannot modify this number.
If your cursor
FOR loop is “read only” (it does not execute nonquery DML), you can probably leave it as is. That is, fetching 100 rows with each fetch will usually give you sufficient improvements in performance over row-by-row fetching.
Only cursor FOR loops are optimized this way, so if you have a simple loop or a WHILE loop that fetches individual rows, you should convert it to use BULK COLLECT.
If you are fetching a very large number of rows, such as might happen with data warehouse processing or a nightly batch process, you should experiment with larger
LIMIT values to see what kind of bang for the buck you will get.
If your cursor
FOR loop (or any other kind of loop, for that matter) contains one or more nonquery DML statements (INSERT, UPDATE, DELETE, or MERGE), you should convert to BULK COLLECT and FORALL.
Run the following code to see how optimization affects cursor
FOR loop performance.
CREATE OR REPLACE PROCEDURE test_cursor_performance (approach IN VARCHAR2)
IS
   CURSOR cur IS
      SELECT * FROM all_source WHERE ROWNUM < 100001;

   one_row       cur%ROWTYPE;

   TYPE t IS TABLE OF cur%ROWTYPE INDEX BY PLS_INTEGER;

   many_rows     t;
   last_timing   NUMBER;
   cntr          NUMBER := 0;

   PROCEDURE start_timer IS
   BEGIN
      last_timing := DBMS_UTILITY.get_cpu_time;
   END;

   PROCEDURE show_elapsed_time (message_in IN VARCHAR2 := NULL) IS
   BEGIN
      DBMS_OUTPUT.put_line (
            '"'
         || message_in
         || '" completed in: '
         || TO_CHAR (
               ROUND ((DBMS_UTILITY.get_cpu_time - last_timing) / 100, 2)));
   END;
BEGIN
   start_timer;

   CASE approach
      WHEN 'implicit cursor for loop'
      THEN
         FOR j IN cur
         LOOP
            cntr := cntr + 1;
         END LOOP;

         DBMS_OUTPUT.put_line (cntr);
      WHEN 'explicit open, fetch, close'
      THEN
         OPEN cur;

         LOOP
            FETCH cur INTO one_row;
            EXIT WHEN cur%NOTFOUND;
            cntr := cntr + 1;
         END LOOP;

         DBMS_OUTPUT.put_line (cntr);
         CLOSE cur;
      WHEN 'bulk fetch'
      THEN
         OPEN cur;

         LOOP
            FETCH cur BULK COLLECT INTO many_rows LIMIT 100;
            EXIT WHEN many_rows.COUNT () = 0;

            FOR indx IN 1 .. many_rows.COUNT
            LOOP
               cntr := cntr + 1;
            END LOOP;
         END LOOP;

         DBMS_OUTPUT.put_line (cntr);
         CLOSE cur;
   END CASE;

   show_elapsed_time (approach);
END test_cursor_performance;
/

/* Try different approaches with optimization disabled. */

ALTER PROCEDURE test_cursor_performance COMPILE plsql_optimize_level = 0
/

BEGIN
   DBMS_OUTPUT.put_line ('No optimization...');
   test_cursor_performance ('implicit cursor for loop');
   test_cursor_performance ('explicit open, fetch, close');
   test_cursor_performance ('bulk fetch');
END;
/

/* Try different approaches with default optimization. */

ALTER PROCEDURE test_cursor_performance COMPILE plsql_optimize_level = 2
/

BEGIN
   DBMS_OUTPUT.put_line ('Default optimization...');
   test_cursor_performance ('implicit cursor for loop');
   test_cursor_performance ('explicit open, fetch, close');
   test_cursor_performance ('bulk fetch');
END;
/
This exercise has two parts (and for it, assume that the employees table has one million rows, with data distributed equally among departments): (1) Write an anonymous block that contains a cursor
FOR loop that does not need to be converted to using
BULK COLLECT, and (2) write an anonymous block that contains a cursor
FOR loop that does need to use
BULK COLLECT (assume that it cannot be rewritten in “pure” SQL).
The RETURNING clause is a wonderful thing. If you are inserting, updating, or deleting data and you need to get some information (such as the primary key of the newly inserted row) back after the statement completes,
RETURNING is the thing for you! Here’s an example:
CREATE TABLE t (
   id   NUMBER GENERATED ALWAYS AS IDENTITY,
   n    NUMBER)
/

DECLARE
   l_id   t.id%TYPE;
BEGIN
   INSERT INTO t (n)
        VALUES (100)
     RETURNING id INTO l_id;

   DBMS_OUTPUT.PUT_LINE (l_id);
END;
/
Suppose, however, that I am changing more than one row. Can I use
RETURNING then? Let’s see....
DECLARE
   l_id   employees.employee_id%TYPE;
BEGIN
   UPDATE employees
      SET last_name = UPPER (last_name)
   RETURNING employee_id INTO l_id;

   ROLLBACK;
END;
/

ORA-01422: exact fetch returns more than requested number of rows
But wait, that’s the sort of error you can get with a
SELECT-INTO that returns more than one row. Why is it showing up here?
Because the RETURNING clause is essentially translated into a
SELECT-INTO, it gets one value and stuffs it into
l_id. But in this case, the
UPDATE statement is returning many IDs. How do you get this to work?
BULK COLLECT to the rescue! I need to take multiple values and put them into something. What could that be? How about a collection? So, yes, if you are changing one or more rows, change
INTO to BULK COLLECT INTO and provide a collection to hold the values.
DECLARE
   TYPE ids_t IS TABLE OF employees.employee_id%TYPE;

   l_ids   ids_t;
BEGIN
   UPDATE employees
      SET last_name = UPPER (last_name)
    WHERE department_id = 50
   RETURNING employee_id
        BULK COLLECT INTO l_ids;

   FOR indx IN 1 .. l_ids.COUNT
   LOOP
      DBMS_OUTPUT.PUT_LINE (l_ids (indx));
   END LOOP;

   ROLLBACK;
END;
/
Now RETURNING works like a charm. Thanks, BULK COLLECT!
In the block below, replace the
#FINISH# tag with code so that
Deleted = 3 is displayed after execution.
DECLARE
   TYPE ids_t IS TABLE OF employees.employee_id%TYPE;

   l_ids   ids_t;
BEGIN
   DELETE FROM employees
         WHERE salary > 15000
   #FINISH#

   DBMS_OUTPUT.PUT_LINE ('Deleted = ' || l_ids.COUNT);

   ROLLBACK;
END;
/
Write an anonymous block that deletes all the rows in the employees table for department 50 and returns all the employee IDs and the last names in deleted rows. Then display those values, using
DBMS_OUTPUT.PUT_LINE. Finally, you might want to roll back. That will make it easier to test your code—and continue with the tutorial.
You can also use
BULK COLLECT with native dynamic SQL queries that might return more than one row. As with
FETCH-INTO, just stick
BULK COLLECT before the
INTO and provide a collection (or multiple collections) to hold the multiple values returned.
DECLARE
   TYPE ids_t IS TABLE OF employees.employee_id%TYPE;

   l_ids   ids_t;
BEGIN
   EXECUTE IMMEDIATE
      'SELECT employee_id FROM employees WHERE department_id = :dept_id'
      BULK COLLECT INTO l_ids
      USING 50;

   FOR indx IN 1 .. l_ids.COUNT
   LOOP
      DBMS_OUTPUT.put_line (l_ids (indx));
   END LOOP;
END;
You can even get fancy and use
BULK COLLECT in the
RETURNING clause of a dynamic DML statement:
DECLARE
   TYPE ids_t IS TABLE OF employees.employee_id%TYPE;

   l_ids   ids_t;
BEGIN
   EXECUTE IMMEDIATE
      'UPDATE employees SET last_name = UPPER (last_name)
        WHERE department_id = 100
        RETURNING employee_id INTO :ids'
      RETURNING BULK COLLECT INTO l_ids;

   FOR indx IN 1 .. l_ids.COUNT
   LOOP
      DBMS_OUTPUT.PUT_LINE (l_ids (indx));
   END LOOP;
END;
Write the rest of the procedure whose signature is shown below. Use
BULK COLLECT to fetch all the last names from employees identified by that
WHERE clause and return the collection. Then write an anonymous block to test your procedure: pass different
WHERE clauses and display the names retrieved.
PROCEDURE get_names (
   where_in    IN     VARCHAR2,
   names_out      OUT DBMS_SQL.VARCHAR2_TABLE)
So now you know: You can use
BULK COLLECT to improve the performance of multirow queries, returning multiple rows with each fetch. Unfortunately, this feature is probably not enough to turn you into a hero. That’s because when it comes to row-by-row processing, the real performance pain comes not from fetching data but, rather, from changing data through inserts, updates, and deletes.
Consequently, the next article moves on to cover the
FORALL statement, which dramatically reduces the number of context switches needed to execute the same nonquery DML statement multiple times (within a loop of some sort).
Changing data is more complicated in several ways than querying data, so I’ll cover
FORALL in multiple articles.
In the meantime, check out the tutorial.
TAKE the “Bulk Processing with PL/SQL” tutorial.
READ more about BULK COLLECT.
WORK OUT at the Oracle Dev Gym.
INSTALL the Human Resources schema.
Illustration by Wes Rowell