Subscribe

Share

Application Development

Solving the Row-by-Row Problem

Use bulk-processing features in PL/SQL to minimize context switching and speed program performance.

By Steven Feuerstein

December 30, 2019

Almost every program Oracle Database developers write includes PL/SQL as well as SQL statements. PL/SQL statements are run by the PL/SQL statement executor, and SQL statements are run by the SQL statement executor. When the PL/SQL runtime engine encounters a SQL statement, it stops and passes the SQL statement over to the SQL engine. The SQL engine executes the SQL statement and returns information back to the PL/SQL engine. This transfer of control is called a context switch, and each one of these switches incurs overhead that slows down the overall performance of your programs.

This article kicks off a multipart series that drills deep into the bulk-processing features provided in PL/SQL to minimize the overhead of context switches when executing SQL statements from PL/SQL blocks. The material in this article is drawn from a tutorial I created on LiveSQL, and the article contains two types of exercises:

  • Fill in the blanks. I provide a lot of the code, and you fill in the missing part(s). This way, you can concentrate on the bulk-processing-related syntax.

  • Write all the code. This is an option for the most dedicated readers! You write the entire solution to the challenge. Run it in your own database or on LiveSQL.

All examples and exercises are based on the traditional Human Resources schema, containing the employees and departments tables. This schema is available on Github (link at the end of the article). I encourage you to read this article (and even do the exercises here) and also check out the tutorial and take full advantage of LiveSQL, where you can easily and instantly execute all code you write.

The Cost of Context-Switching

Now let’s look at a concrete example to explore context switches more thoroughly and identify the reasons why the bulk-processing features of PL/SQL—BULK COLLECT and FORALL—can have such a dramatic impact on performance. Suppose my manager asked me to write a PL/SQL procedure that accepts a department ID and a salary percentage increase and gives everyone in that department a specified-percentage raise. Taking advantage of PL/SQL’s elegant cursor FOR loop and the ability to call SQL statements natively in PL/SQL, I can implement this requirement easily:

CREATE OR REPLACE PROCEDURE increase_salary (
   department_id_in   IN employees.department_id%TYPE,
   increase_pct_in    IN NUMBER)
IS
BEGIN
   FOR employee_rec
      IN (SELECT employee_id
            FROM employees
           WHERE department_id =
                    increase_salary.department_id_in)
   LOOP
      UPDATE employees emp
         SET emp.salary = emp.salary + 
             emp.salary * increase_salary.increase_pct_in
       WHERE emp.employee_id = employee_rec.employee_id;
      DBMS_OUTPUT.PUT_LINE ('Updated ' || SQL%ROWCOUNT);
   END LOOP;
END increase_salary;

Suppose there are 10,000 employees in department 15. When I execute this block....

BEGIN
   increase_salary (50, .10);
   ROLLBACK; -- to leave the table in its original state
END;

....the PL/SQL engine will “switch” over to the SQL engine 10,000 times, once for each row being updated. Tom Kyte, of Ask TOM fame, refers to row-by-row switching like this as “slow-by-slow processing,” and it is definitely something to be avoided.

Possible Solutions

Generally, the way to improve performance over row-by-row context switching is to not perform row-by-row data manipulation language (DML) operations. This can be accomplished in one of two ways:

  • Implement the functionality in “pure” SQL—no PL/SQL loop

  • Use the bulk-processing features of PL/SQL

If you can change your implementation to avoid a loop and instead simply execute a single DML statement, you should do so. For example, I can do this with the increase_salary procedure:

CREATE OR REPLACE PROCEDURE increase_salary (
   department_id_in   IN employees.department_id%TYPE,
   increase_pct_in    IN NUMBER)
IS
BEGIN
   UPDATE employees emp
      SET emp.salary =
               emp.salary
             + emp.salary * increase_salary.increase_pct_in
    WHERE emp.department_id = 
             increase_salary.department_id_in;
END increase_salary;

Of course, it is not always this easy. You might be doing some very complex processing of each row before doing the INSERT, UPDATE, or DELETE that would be hard to do in SQL. You might need to do error management that’s more nuanced than “all or nothing” SQL will allow. Or you might simply not have sufficient knowledge of SQL to do what’s needed.

LiveSQL

LiveSQL offers 24/7 free access to the latest version of Oracle Database; a SQL “scratchpad” where you can write SQL and PL/SQL; and a library of hundreds of scripts and tutorials.

In an ideal world, you would stop programming and take an advanced SQL class (the Oracle Dev Gym offers a free one on SQL analytic functions). In the real world, however, you need to get the program up and running ASAP.

Whatever your situation, the bulk-processing features of PL/SQL offer a straightforward solution—although there will be a lot to consider as you implement your conversion to BULK COLLECT and FORALL.

Let’s first take a look at BULK COLLECT, which improves the performance of multirow querying and is relatively simple. In my next article, I’ll move on to FORALL, which is used to execute the same nonquery DML statement repeatedly, with different bind variables. That feature has a lot more moving parts and issues you need to take into account (which should come as no surprise, because you are changing data, not simply querying it).

Introduction to BULK COLLECT

To take advantage of bulk processing for queries, you simply put BULK COLLECT before the INTO keyword of your fetch operation and then provide one or more collections after the INTO keyword.

Here are some things to know about how BULK COLLECT works:

  • It can be used with all three types of collections: associative arrays, nested tables, and VARRAYs.

  • You can fetch into individual collections (one for each expression in the SELECT list) or a single collection of records.

  • The collection is always populated densely, starting from index value 1.

  • If no rows are fetched, the collection will be emptied of all elements.

You can use BULK COLLECT in all of these forms:

  • SELECT column(s) BULK COLLECT INTO collection(s)

  • FETCH cursor BULK COLLECT INTO collection(s)

  • EXECUTE IMMEDIATE query_string BULK COLLECT INTO collection(s)

Here’s a block of code that fetches all rows in the employees table with a single context switch and loads the data into a collection of records that are based on the table.

DECLARE
   TYPE employee_info_t IS TABLE OF employees%ROWTYPE;
   l_employees   employee_info_t;
BEGIN
   SELECT *
     BULK COLLECT INTO l_employees
     FROM employees
    WHERE department_id = 50;
   DBMS_OUTPUT.PUT_LINE (l_employees.COUNT);
END;

If you do not want to retrieve all the columns in a table, create your own user-defined record type and use that to define your collection. All you have to do is make sure the list of expressions in the SELECT statement matches the record type’s fields.

DECLARE
   TYPE two_cols_rt IS RECORD (
      employee_id   employees.employee_id%TYPE,
      salary        employees.salary%TYPE
   );
   TYPE employee_info_t IS TABLE OF two_cols_rt;
   l_employees   employee_info_t;
BEGIN
   SELECT employee_id, salary
     BULK COLLECT INTO l_employees
     FROM employees
    WHERE department_id = 50;
   DBMS_OUTPUT.PUT_LINE (l_employees.COUNT);
END;

Quick Tip

You can avoid the nuisance of declaring a record type to serve as the type for the collection by using a “template cursor.” This cursor should have the same SELECT list as the BULK COLLECT query. You can, however, leave off the WHERE clause and anything else after the FROM clause, because that code will never be used for anything but a %ROWTYPE declaration. Here’s an example:

DECLARE
   CURSOR employee_info_c IS
      SELECT employee_id, salary 
        FROM employees;
   TYPE employee_info_t IS TABLE OF employee_info_c%ROWTYPE;
   l_employees   employee_info_t;
BEGIN
   SELECT employee_id, salary
     BULK COLLECT INTO l_employees
     FROM employees
    WHERE department_id = 10;
END;

Fill in the Blanks

In the block below, replace the #FINISH# tags with code so that the last names of all employees in department 50 are displayed.

DECLARE
   #FINISH#
   l_names   names_t;
BEGIN
   SELECT last_name
     #FINISH#
     FROM employees
    WHERE department_id = 50
    ORDER BY last_name;
END;

Exercise

Write a stored procedure that accepts a department ID, uses BULK COLLECT to retrieve all employees in that department, and displays their first names and salaries. Then write an anonymous block to run that procedure for department 100.

Managing PGA Memory with the LIMIT Clause

As with almost all other types of variables and constants you use in your code, collections consume Program Global Area (PGA) memory. If your collection gets too large, your users might encounter an error. To see this happen, run the code below. (Note: varchar2a is a collection type of strings defined in the DBMS_SQL package).

DECLARE
   l_strings   DBMS_SQL.varchar2a;
BEGIN
   FOR indx IN 1 .. 2 ** 31 – 1
   LOOP
      l_strings (indx) := RPAD ('abc', 32767, 'def');
   END LOOP;
END;

When using BULK COLLECT, you could attempt to retrieve too many rows in one context switch and run out of PGA memory. To help you avoid such errors, Oracle Database offers a LIMIT clause for BULK COLLECT. Indeed, when using BULK COLLECT, it is recommended that you never, or at least rarely, use an “unlimited” BULK COLLECT, which is what you get with a SELECT BULK COLLECT INTO (an implicit query)—and what you saw in the previous module.

Instead, declare a cursor (or a cursor variable); open that cursor; and then, in a loop, retrieve N number of rows with each fetch.

In the block below, I set my fetch limit to just 10 rows to demonstrate how this feature works. You will likely never want to set the limit to less than 100—this topic is explored further below.

DECLARE
   c_limit PLS_INTEGER := 10;
   CURSOR employees_cur
   IS
      SELECT employee_id
        FROM employees
       WHERE department_id = 50;
   TYPE employee_ids_t IS TABLE OF 
      employees.employee_id%TYPE;
   l_employee_ids   employee_ids_t;
BEGIN
   OPEN employees_cur;
   LOOP
      FETCH employees_cur
      BULK COLLECT INTO l_employee_ids
      LIMIT c_limit;
      DBMS_OUTPUT.PUT_LINE (l_employee_ids.COUNT || ' fetched');
      EXIT WHEN l_employee_ids.COUNT = 0;
   END LOOP;
END;

One thing to watch out for when switching to LIMIT with BULK COLLECT (in a loop) is following the same pattern for single-row fetching in a loop. I demonstrate this issue below, but first, a reminder: There are 107 rows in the employees table.

SELECT COUNT(*) FROM employees

Here’s the common way to terminate a loop in which you fetch row-by-row from an explicit cursor:

DECLARE
   CURSOR emps_c IS SELECT * FROM employees;
   l_emp   emps_c%ROWTYPE;
   l_count INTEGER := 0;
BEGIN
   OPEN emps_c;
   LOOP
      FETCH emps_c INTO l_emp;
      EXIT WHEN emps_c%NOTFOUND;
      DBMS_OUTPUT.put_line (l_emp.employee_id);
      l_count := l_count + 1;
   END LOOP;
   DBMS_OUTPUT.put_line ('Total rows fetched: ' || l_count);
END;

In other words: Fetch a row, and stop if the cursor has retrieved all rows. Now let’s switch to using BULK COLLECT and LIMIT, fetching 10 rows at a time, using the same approach to exiting the loop.

DECLARE
   CURSOR emps_c IS SELECT * FROM employees;
   TYPE emps_t IS TABLE OF emps_c%ROwTYPE;
   l_emps   emps_t;
   l_count INTEGER := 0;
BEGIN
   OPEN emps_c;
   LOOP
      FETCH emps_c BULK COLLECT INTO l_emps LIMIT 10;
      EXIT WHEN emps_c%NOTFOUND;
      DBMS_OUTPUT.put_line (l_emps.COUNT);
      l_count := l_count + l_emps.COUNT;
   END LOOP;
   DBMS_OUTPUT.put_line ('Total rows fetched: ' || l_count);
END;

Wait, what? Is that right? Do I see Total rows fetched: 100? Yes, I do. And therein lies the trap. You cannot continue to use the same EXIT WHEN statement in the same place in your loop when you switch to BULK COLLECT with LIMIT.

The very last fetch performed retrieved the last seven rows but also exhausted the cursor. So the %NOTFOUND returns TRUE, but the collection has those seven elements in it.

To terminate a loop by using BULK COLLECT with LIMIT, you should either

  • Move EXIT WHEN to the bottom of the loop body.

or

  • Ignore the cursor and check the collection. When the collection is empty, terminate the loop.

These two approaches are shown below.

DECLARE
   CURSOR emps_c IS SELECT * FROM employees;
   TYPE emps_t IS TABLE OF emps_c%ROwTYPE;
   l_emps   emps_t;
   l_count INTEGER := 0;
BEGIN
   OPEN emps_c;
   LOOP
      FETCH emps_c BULK COLLECT INTO l_emps LIMIT 10;
      l_count := l_count + l_emps.COUNT;
      EXIT WHEN emps_c%NOTFOUND;
   END LOOP;
   DBMS_OUTPUT.put_line ('Total rows fetched: ' || l_count);
END;
DECLARE
   CURSOR emps_c IS SELECT * FROM employees;
   TYPE emps_t IS TABLE OF emps_c%ROwTYPE;
   l_emps   emps_t;
   l_count INTEGER := 0;
BEGIN
   OPEN emps_c;
   LOOP
      FETCH emps_c BULK COLLECT INTO l_emps LIMIT 10;
      EXIT WHEN l_emps.COUNT = 0;
      l_count := l_count + l_emps.COUNT;
   END LOOP;
   DBMS_OUTPUT.put_line ('Total rows fetched: ' || l_count);
   CLOSE emps_c;
END;

Fill in the Blanks

The employees table has 107 rows. In the block below, replace the #FINISH# tags with code so that when the block is executed, the following text is displayed:

Rows fetched 25
Rows fetched 25
Rows fetched 25
Rows fetched 25
Rows fetched 7
Rows fetched 0
DECLARE
   CURSOR ids_c IS SELECT employee_id FROM employees;
   #FINISH#
   l_count INTEGER;
BEGIN
   OPEN ids_c;
   LOOP
      FETCH ids_c #FINISH#;
      DBMS_OUTPUT.put_line ('Rows fetched: ' || l_count);
      EXIT WHEN #FINISH#;
   END LOOP;
   CLOSE ids_c;
END;

Exercise

Write an anonymous block that (using BULK COLLECT) fetches only the last name and the salary from the employees table 5 rows at a time and then displays that information. Make sure 107 names and salaries are shown!

Cursor FOR Loops and BULK COLLECT

When should you convert a nonbulk query to one using BULK COLLECT? More specifically, should you convert a cursor FOR loop to an explicit cursor and FETCH BULK COLLECT with LIMIT? Here are some things to keep in mind:

As long as your PL/SQL optimization level is set to 2 (the default) or higher, the compiler will automatically optimize cursor FOR loops to retrieve 100 rows with each fetch. You cannot modify this number.

If your cursor FOR loop is “read only” (it does not execute nonquery DML), you can probably leave it as is. That is, fetching 100 rows with each fetch will usually give you sufficient improvements in performance over row-by-row fetching.

Only cursor FOR loops are optimized this way, so if you have a simple or WHILE loop that fetches individual rows, you should convert to BULK COLLECT—with LIMIT!

If you are fetching a very large number of rows, such as might happen with data warehouse processing or a nightly batch process, you should experiment with larger LIMIT values to see what kind of bang for the buck you will get.

If your cursor FOR loop (or any other kind of loop, for that matter) contains one or more nonquery DML statements (INSERT, UPDATE, DELETE, MERGE), you should convert to BULK COLLECT and FORALL.

Run the following code to see how optimization affects cursor FOR loop performance.

CREATE OR REPLACE PROCEDURE 
test_cursor_performance (approach IN VARCHAR2)
IS
   CURSOR cur IS
      SELECT * FROM all_source WHERE ROWNUM < 100001;
   one_row cur%ROWTYPE;

   TYPE t IS TABLE OF cur%ROWTYPE INDEX BY PLS_INTEGER;

   many_rows     t;
   last_timing   NUMBER;
   cntr number := 0;
   PROCEDURE start_timer
   IS
   BEGIN
      last_timing := DBMS_UTILITY.get_cpu_time;
   END;
   PROCEDURE show_elapsed_time (message_in IN VARCHAR2 := NULL)
   IS
   BEGIN
     DBMS_OUTPUT.put_line (
          '"'
       || message_in
       || '" completed in: '
       || TO_CHAR (
           ROUND ((DBMS_UTILITY.get_cpu_time - last_timing)/100, 2)));
   END;
BEGIN
   start_timer;
   CASE approach
      WHEN 'implicit cursor for loop'
      THEN
         FOR j IN cur
         LOOP
            cntr := cntr + 1;
         END LOOP;
         DBMS_OUTPUT.put_line (cntr);
      WHEN 'explicit open, fetch, close'
      THEN
         OPEN cur;
         LOOP
            FETCH cur INTO one_row;
            EXIT WHEN cur%NOTFOUND;
            cntr := cntr + 1;
         END LOOP;
         DBMS_OUTPUT.put_line (cntr);
         CLOSE cur;
      WHEN 'bulk fetch'
      THEN
         OPEN cur;
         LOOP
            FETCH cur BULK COLLECT INTO many_rows LIMIT 100;
            EXIT WHEN many_rows.COUNT () = 0;
            FOR indx IN 1 .. many_rows.COUNT
            Loop
               cntr := cntr + 1;
            end loop;
         END LOOP;
         DBMS_OUTPUT.put_line (cntr);
         CLOSE cur;
   END CASE;
   show_elapsed_time (approach);
END test_cursor_performance;
/
/* Try different approaches with optimization disabled. */
ALTER PROCEDURE test_cursor_performance
COMPILE plsql_optimize_level=0
/
BEGIN
   dbms_output.put_line ('No optimization...');   
   test_cursor_performance ('implicit cursor for loop');
   test_cursor_performance ('explicit open, fetch, close');
   test_cursor_performance ('bulk fetch');
END;
/
/* Try different approaches with default optimization. */
ALTER PROCEDURE test_cursor_performance
COMPILE plsql_optimize_level=2
/
BEGIN
   DBMS_OUTPUT.put_line ('Default optimization...');
   test_cursor_performance ('implicit cursor for loop');
   test_cursor_performance ('explicit open, fetch, close');
   test_cursor_performance ('bulk fetch');
END;
/

Exercise

This exercise has two parts (and for it, assume that the employees table has one million rows, with data distributed equally among departments): (1) Write an anonymous block that contains a cursor FOR loop that does not need to be converted to using BULK COLLECT, and (2) write an anonymous block that contains a cursor FOR loop that does need to use BULK COLLECT (assume that it cannot be rewritten in “pure” SQL).

RETURNING and BULK COLLECT

The RETURNING clause is a wonderful thing. If you are inserting, updating, or deleting data and you need to get some information (such as the primary key of the newly inserted row) back after the statement completes, RETURNING is the thing for you! Here’s an example:

CREATE TABLE t (
   id NUMBER GENERATED ALWAYS AS IDENTITY,
   n NUMBER)
/
DECLARE
   l_id t.id%TYPE;
BEGIN
   INSERT INTO t (n) VALUES (100)
      RETURNING id INTO l_id;
   DBMS_OUTPUT.PUT_LINE (l_id);
END;
/

Suppose, however, that I am changing more than one row. Can I use RETURNING then? Let’s see....

DECLARE
   l_id employees.employee_id%TYPE;
BEGIN
   UPDATE employees
      SET last_name = UPPER (last_name)
      RETURNING employee_id INTO l_id;
   ROLLBACK;
END;
/ORA-01422: exact fetch returns more than requested number of rows

Oh no!

But wait, that’s the sort of error you can get with a SELECT-INTO that returns more than one row. Why is it showing up here?

Because the RETURNING clause is essentially translated into a SELECT-INTO, it gets one value and stuffs it into l_id. But in this case, the UPDATE statement is returning many IDs. How do you get this to work?

BULK COLLECT to the rescue! I need to take multiple values and put them into something. What could that be? How about a collection? So, yes, if you are changing one or more rows, change INTO to BULK COLLECT INTO and provide a collection to hold the values.

DECLARE
   TYPE ids_t IS TABLE OF employees.employee_id%TYPE;
   l_ids ids_t;
BEGIN
   UPDATE employees
      SET last_name = UPPER (last_name)
    WHERE department_id = 50
      RETURNING employee_id BULK COLLECT INTO l_ids;
   FOR indx IN 1 .. l_ids.COUNT 
   LOOP
      DBMS_OUTPUT.PUT_LINE (l_ids (indx));
   END LOOP;
   ROLLBACK;
END;
/

Now RETURNING works like a charm. Thanks, BULK COLLECT!

Fill in the Blanks

In the block below, replace the #FINISH# tag with code so that Deleted = 3 is displayed after execution.

DECLARE
   TYPE ids_t IS TABLE OF employees.employee_id%TYPE;
   l_ids ids_t;
BEGIN
   DELETE FROM employees
    WHERE salary > 15000
    #FINISH#
   DBMS_OUTPUT.PUT_LINE ('Deleted = ' || l_ids.COUNT);
   ROLLBACK;
END;
/

Exercise

Write an anonymous block that deletes all the rows in the employees table for department 50 and returns all the employee IDs and the last names in deleted rows. Then display those values, using DBMS_OUTPUT.PUT_LINE. Finally, you might want to roll back. That will make it easier to test your code—and continue with the tutorial.

Dynamic SQL and BULK COLLECT

You can also use BULK COLLECT with native dynamic SQL queries that might return more than one row. As with SELECT-INTO and FETCH-INTO, just stick BULK COLLECT before the INTO and provide a collection (or multiple collections) to hold the multiple values returned.

DECLARE
  TYPE ids_t IS TABLE OF employees.employee_id%TYPE;
  l_ids   ids_t;
BEGIN
  EXECUTE IMMEDIATE
    'SELECT employee_id FROM employees WHERE department_id = :dept_id'
    BULK COLLECT INTO l_ids
    USING 50;
  FOR indx IN 1 .. l_ids.COUNT
  LOOP
    DBMS_OUTPUT.put_line (l_ids (indx));
  END LOOP;
END;

You can even get fancy and use BULK COLLECT in the RETURNING clause of a dynamic DML statement:

DECLARE
   TYPE ids_t IS TABLE OF employees.employee_id%TYPE;
   l_ids ids_t;
BEGIN
   EXECUTE IMMEDIATE
      'UPDATE employees SET last_name = UPPER (last_name)
          WHERE department_id = 100
          RETURNING employee_id INTO :ids'
   RETURNING BULK COLLECT INTO l_ids;   
   FOR indx IN 1 .. l_ids.COUNT 
   LOOP
      DBMS_OUTPUT.PUT_LINE (l_ids (indx));
   END LOOP;
END;

Exercise

Write the rest of the procedure whose signature is shown below. Use BULK COLLECT to fetch all the last names from employees identified by that WHERE clause and return the collection. Then write an anonymous block to test your procedure: pass different WHERE clauses and display the names retrieved.

PROCEDURE get_names (
   where_in IN VARCHAR2,
   names_out OUT DBMS_SQL.VARCHAR2_TABLE)

Faster Queries, but…

So now you know: You can use BULK COLLECT to improve the performance of multirow queries, returning multiple rows with each fetch. Unfortunately, this feature is probably not enough to turn you into a hero. That’s because when it comes to row-by-row processing, the real performance pain comes not from fetching data but, rather, from changing data through inserts, updates, and deletes.

Consequently, the next article moves on to cover the FORALL statement, which dramatically reduces the number of context switches needed to execute the same nonquery DML statement multiple times (within a loop of some sort).

Changing data is more complicated in several ways than querying data, so I’ll cover FORALL in multiple articles.

In the meantime, check out the tutorial.

Next Steps

TAKE the “Bulk Processing with PL/SQL” tutorial.

READ more about BULK COLLECT.

WORK OUT at the Oracle Dev Gym.

INSTALL the Human Resources schema.

Illustration by Wes Rowell