Subscribe

Share

Database, SQL and PL/SQL

The Cost of Data Retrieval

Part 9 in a second series on the basics of the relational database and SQL

By Melanie Caffrey

May/June 2017

This article is the ninth in a series that helps you build on the fundamentals you learned in the 12-part SQL 101 series in Oracle Magazine. The previous Beyond SQL 101 article, “Rapid Retrieval of Rows in Small Data Sets,” introduced B-tree indexes, both single-column and composite. You learned how they can be used to increase performance when retrieving rows. You discovered how to create them and what kind of relationship they have with constraints. You also discovered how to alter them, how to drop them, and when they cannot be used. Finally, you explored a few guidelines for when B-tree indexes are most useful, when they are not useful, and when a function-based index can be helpful.

In this article, you will

  • Learn how to create a bitmap index
  • Learn about SQL statement processing operations and parsing
  • See Oracle Database’s Oracle Optimizer program in action
  • Discover database statistics
  • Examine execution paths and explain plans

To try out the examples in this series, you need access to an Oracle Database instance. If necessary, download and install an Oracle Database edition for your operating system. I recommend installing Oracle Database, Enterprise Edition Release 2 (12.2.0.1.0). If you install the Oracle Database software, choose the installation option that enables you to create and configure a database. A new database, including sample user accounts and their associated schemas, will be created for you. (Note that SQL_201 is the user account to use for the examples in this series; it’s also the schema in which you’ll create database tables and other objects.) When the installation process prompts you to specify schema passwords, enter and confirm passwords for the SYS and SYSTEM users and make a note of them.

Finally—whether you installed the database software from scratch or have access to an existing Oracle Database instance—download, unzip, and execute the SQL script to create the tables for the SQL_201 schema that are used for this article’s examples. (View the script in a text editor for execution instructions.)

Retrieval Versus Revision

Recall that the previous Beyond SQL 101 article, “Rapid Retrieval of Rows in Small Data Sets,” explained that one of the primary goals of index creation is to boost data retrieval performance. Additionally, you learned that B-tree indexes can be useful when your goal is to retrieve a small number of rows, using an index on column data that is highly selective. However, recall also that for each change of a value in an indexed column, any corresponding indexes must also be changed. Therefore, having indexes in place can potentially slow down data manipulation, because for every DELETE, UPDATE, and INSERT command issued that affects an indexed column, a DELETE, UPDATE, and INSERT action must similarly take place against corresponding indexes to keep them in sync with the changed column values.

Data warehouses are types of databases in which the design and performance goals for data manipulation are different from those of a transactional database. In data warehouses, data loads take place in bulk and at certain intervals, and the primary goal of a data warehouse is to aid users in querying and analyzing data. To help users accomplish this goal, Oracle Database supports bitmap indexes, which are typically used on columns that have few distinct values relative to the number of rows in the table and, therefore, have low selectivity. For example, the EMPLOYEE table currently has only two distinct values for the virtual column WAGE_INCREASE_WORTHINESS: “No Raise Yet” and “Cost of Living Increase Eligible.” If you have a data warehouse in which the users execute queries with a frequently used WHERE clause condition, including a column with few distinct values relative to the total number of rows in the table, this column may be a viable choice for a bitmap index. The structure of a bitmap index looks similar to Figure 1.


o37sql201-f1
Figure 1. Structure of a bitmap index

Figure 1 shows the EMPLOYEE table with a bitmap index on the WAGE_INCREASE_WORTHINESS column. The bitmap index translates the distinct values for the WAGE_INCREASE_WORTHINESS column of individual employees. In this simplified example, the employees Frances Newton (EMPLOYEE_ID=37) and Theresa Wong (EMPLOYEE_ID=6572) each have the value “Cost of Living Increase Eligible” for the WAGE_INCREASE_WORTHINESS column, which turns the bitmap index bit on to 1. The WAGE_INCREASE_WORTHINESS = No Raise Yet and WAGE_INCREASE_WORTHINESS = Unknown values for each of these employees have their index bits turned off to 0, indicating that these values are not true for those employee records. The employee Thomas Jeffrey (EMPLOYEE_ID=6571) has his index bit turned on to 1 for WAGE_INCREASE_WORTHINESS = No Raise Yet, and all other bits in his record are turned off to 0. The SQL in Listing 1 demonstrates the creation of a bitmap index on the WAGE_INCREASE_WORTHINESS column of the EMPLOYEE table.

Code Listing 1: Creating a bitmap index on the WAGE_INCREASE_WORTHINESS column

SQL> set lines 10000
SQL> CREATE BITMAP INDEX employee_wage_inc_worth_bmi
  2  ON employee(wage_increase_worthiness);
Index created.

Note that whereas bitmap indexes can be useful in data warehouses, they are not suitable for transactional tables that experience heavy data manipulation activity by many users. Recall that changes to indexed column values require similar changes to the corresponding index values, so any changes to a bitmap index may significantly slow down transactions. However, if you have multiple single-column bitmap indexes in a data warehouse and you want to retrieve rows based on certain AND and/or OR WHERE clause conditions, bitmap indexes can boost performance. Bitmap indexes are able to quickly compare and merge the bit settings that satisfy the WHERE clause conditions and return results without the need to access the large data warehouse table(s). Similarly, if your WHERE clause condition consists of a single-column predicate on a bitmap-indexed column, the bitmap index will be most useful when you are also executing a COUNT aggregate function in your SQL statement, because the index obtains and aggregates the ON bits and returns the result without having to access the underlying table. Additionally, note that although bitmap indexes typically require less storage space than B-tree indexes, they are available only with the Enterprise Edition of Oracle Database.

Making a Statement

Indexes can improve the performance of SQL statements returning result sets, but before a SQL statement returns a result set, Oracle Database performs several steps and operations that are transparent to the user. First, Oracle Database creates a cursor, an area in memory where the database stores the SQL statement. Next, it parses the SQL statement. Parsing involves several steps:

  • Check the syntax validity. This step ensures that the SQL statement has been written correctly.
  • Check that the table names and column names included in the SQL statement exist.
  • Check whether the user has permissions to access the requested tables and columns. (Permissions and grants will be discussed in a future article in this series.)

Parsing is an expensive operation that requires time and resources. Therefore, for repeatedly executing similar statements, there are ways to reduce or eliminate parsing.

Oracle Database maintains a cache of recently executed SQL statements, along with their execution plans. When a new SQL statement is executed, if an identical statement has been previously parsed, the statement doesn’t need to be parsed again.

Bind variables, placeholders for values, are very useful in reducing the number of necessary parsing actions. These placeholders are substituted by different values when a SQL statement is parsed. If bind variables are used, they are associated with and substituted for the appropriate statement values. Once the bind variable substitution takes place, the SQL statement is executed.

For any SQL statement query, a cursor is created; the statement is parsed; bind variables are substituted; the statement is executed; results are fetched; and after all results are fetched, the cursor is closed. Figure 2 illustrates an overview of SQL statement processing operations.

o37sql201-f2
Figure 2. SQL statement processing operations

Statistically Selective

Part of the parsing operation is the determination of the execution plan, the method by which Oracle Database intends to execute the SQL statement. To determine the best way to execute a SQL statement, Oracle Database employs a program called the Oracle Optimizer, which examines each SQL statement and determines the most optimal sequence of steps necessary to execute it. This sequence of steps is the execution plan.

When determining the best plan, the Oracle Optimizer considers statistics stored in the Oracle Database data dictionary. Recall from a previous Beyond SQL 101 article, “Defining, Constraining, and Manipulating Your Entities,” that the data dictionary is an internally owned and controlled set of tables and views that contain information about your database’s metadata.

Statistics include, among other pieces of information, values such as the number of rows in a table and the number of distinct values for each column in a table. The statistics are used by the Oracle Optimizer to calculate the estimated cost of various execution options to help determine the lowest-cost (and therefore, best) execution plan. Automatic statistics gathering may or may not be enabled. You can use Oracle Enterprise Manager to enable or disable statistics gathering, check for missing or stale statistics, and populate the data dictionary with new or updated statistics when necessary.

The query in Listing 2 checks when tables owned by your schema last had statistics gathered on them. The query in Listing 3 shows when indexes owned by your schema last had statistics gathered on them.

Code Listing 2: Determining when statistics were last gathered for your schema’s tables

SQL> select table_name, num_rows, to_char(last_analyzed, 
'DD-MON-YYYY HH24:MI:SS') last_analyzed
  2    from user_tables
  3  order by table_name;

TABLE_NAME                       NUM_ROWS LAST_ANALYZED
—————————————————————————————— —————————— ————————————————————
ANNUAL_REVIEW                          15 05-JUL-2016 01:00:23
DEPARTMENT                              4 05-JUL-2016 01:00:24
EMPLOYEE                               16 04-JAN-2017 01:00:32
EMPLOYEE_CTAS                           2 11-SEP-2016 16:21:22
EMPLOYEE_EXTRA                          2 26-OCT-2016 01:00:32
EMPLOYEE_IDENTITY                      16 04-JAN-2017 01:00:32
6 rows selected.

Code Listing 3: Determining when statistics were last gathered for your schema’s indexes

SQL> select table_name, index_name, num_rows, distinct_keys, 
to_char(last_analyzed, 'DD-MON-YYYY HH24:MI:SS') last_analyzed
  2    from user_indexes
  3  order by table_name, index_name;

TABLE_NAME     INDEX_NAME                  NUM_ROWS ...KEYS LAST_ANALYZED
—————————————— ——————————————————————————— ———————— ——————— ————————————————————
DEPARTMENT     DEPARTMENT_NAME_LOCATION_UK        4       4 10-SEP-2016 21:05:26
DEPARTMENT     DEPARTMENT_PK                      4       4 05-JUL-2016 15:45:44
EMPLOYEE       DEP_WAGE_INCREASE_I               16       7 15-JAN-2017 16:39:23
EMPLOYEE       EMPLOYEE_PK                       16      16 15-JAN-2017 17:56:41
EMPLOYEE       EMPLOYEE_WAGE_INC_WORTH_BMI        2       2 06-MAR-2017 19:21:37
EMPLOYEE       EMP_DEPT_FK                       16       4 15-JAN-2017 18:39:09
EMPLOYEE       EMP_HIRE_DATE_I                   16      13 11-JAN-2017 21:53:14
EMPLOYEE_EXTRA SYS_C0010551                       2       2 26-OCT-2016 01:00:32
8 rows selected.
A Plan of Execution

Using statistics, the Oracle Optimizer creates an execution plan that outlines each step Oracle Database executes to process a SQL statement. Oracle Database reads the execution plan and executes each step, starting with the most indented step and ending with the least indented step. If two or more steps have the same level of indentation, the steps are executed, in order, from top to bottom. Consider the execution plans, sometimes called explain plans, in Listing 4.

Code Listing 4: Creating and displaying execution plans

SQL> EXPLAIN PLAN FOR
  2     select employee_id, last_name, first_name
  3       from employee
  4      where wage_increase_worthiness = 'No Raise Yet' 
  5     order by last_name, first_name;

Explained.

SQL> select *
  2    from TABLE(DBMS_XPLAN.DISPLAY);

PLAN_TABLE_OUTPUT
———————————————————————————————————————————————————————————————————————————
Plan hash value: 1993616227
—————————————————————————————————————————————————————————————————————————————
| Id | Operation          | Name     | Rows | Bytes | Cost (%CPU)| Time     |
—————————————————————————————————————————————————————————————————————————————
|  0 | SELECT STATEMENT   |          |    5 |   230 |     4  (25)| 00:00:01 |
|  1 |  SORT ORDER BY     |          |    5 |   230 |     4  (25)| 00:00:01 |
|* 2 |   TABLE ACCESS FULL| EMPLOYEE |    5 |   230 |     3   (0)| 00:00:01 |
—————————————————————————————————————————————————————————————————————————————
Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
———————————————————————————————————————————————————————————————————————————
   2 - filter("WAGE_INCREASE_WORTHINESS"='No Raise Yet')
14 rows selected.

SQL> EXPLAIN PLAN FOR
  2     select employee_id, last_name, first_name
  3      from employee
  4     where wage_increase_worthiness = 'Cost of Living Increase Eligible'
  5     order by last_name, first_name;

Explained.

SQL> select *
  2    from TABLE(DBMS_XPLAN.DISPLAY);

PLAN_TABLE_OUTPUT
————————————————————————————————————————————————————————————————————————————
Plan hash value: 1993616227
—————————————————————————————————————————————————————————————————————————————
| Id | Operation          | Name     | Rows | Bytes | Cost (%CPU)| Time     |
—————————————————————————————————————————————————————————————————————————————
|  0 | SELECT STATEMENT   |          |   11 |   506 |    4   (25)| 00:00:01 |
|  1 |  SORT ORDER BY     |          |   11 |   506 |    4   (25)| 00:00:01 |
|* 2 |   TABLE ACCESS FULL| EMPLOYEE |   11 |   506 |    3    (0)| 00:00:01 |
—————————————————————————————————————————————————————————————————————————————

Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
————————————————————————————————————————————————————————————————————————————
   2 - filter("WAGE_INCREASE_WORTHINESS"='Cost of Living Increase Eligible')
15 rows selected.

The execution plans in Listing 4 are created by use of the EXPLAIN FOR syntax. A query to the DISPLAY function in the built-in DBMS_XPLAN PL/SQL package obtains each execution plan. (An explanation regarding PL/SQL or the TABLE function that encapsulates the call to DBMS_XPLAN is beyond the scope of this article. Both the TABLE function and an introduction to PL/SQL will be addressed in subsequent articles in this series.) Note that the ID column identifies only the step number, much like a step name; it does not denote the execution order of the steps. The first step performed is a full table scan of the EMPLOYEE table to retrieve all records. The next step sorts the returned records according to the ORDER BY clause, and the final step of the first execution plan in Listing 4 returns all rows that satisfy the WHERE clause condition predicate

2 - filter("WAGE_INCREASE_WORTHINESS"='No Raise Yet')

The ROWS column of the execution plan displays approximately how many rows the Oracle Optimizer expects to process at each step. Recall from the previous Beyond SQL 101 article, “Rapid Retrieval of Rows in Small Data Sets,” that if your query warrants the return of a large number of row values from your table, it might be more efficient to scan the entire table to return the row values than to look up and obtain row values from an index. Because the EMPLOYEE table is currently composed of only 16 records, the first execution plan in Listing 4’s approximation of 5 records returned comprises more than a quarter of the records in the entire table. For comparison purposes, the second execution plan in Listing 4 is generated for the same SQL statement, but for the WHERE clause condition predicate

2 - filter("WAGE_INCREASE_WORTHINESS"='Cost of Living 
Increase Eligible')

And although the ROWS column value differs between the two execution plans, the execution plan steps are identical.

Listing 5 outlines an execution plan that shows an index access included as an execution step. In the first step executed, step 3, INDEX RANGE SCAN searches the EMP_HIRE_DATE_I index for the accounting department record and retrieves the DEPARTMENT_ID value for the approximate number of rows that meet the WHERE clause condition predicate

3 - access("HIRE_DATE">TO_DATE(' 2016-09-01 00:00:00', 
'syyyy-mm-dd hh24:mi:ss'))

Note that a predicate line that states that it is an “access” performs an index key lookup whereas a predicate line that states that it is a “filter,” such as both predicates in Listing 4, is simply a WHERE clause condition used to filter the returned results.

The next step in Listing 5, step 2, accesses the EMPLOYEE table with the ROWID value(s) retrieved from the EMP_HIRE_DATE_I index. Step 1 sorts the returned records according to the ORDER BY clause, and the final step, step 0, returns all rows that satisfy the WHERE clause condition.

Code Listing 5: An execution plan that uses an index key lookup

SQL> EXPLAIN PLAN FOR
  2  select last_name, first_name
  3    from employee
  4  where hire_date > to_date('01-SEP-2016', 'DD-MON-YYYY')
  5  order by last_name, first_name;
Explained.

SQL> select *
  2    from TABLE(DBMS_XPLAN.DISPLAY);

PLAN_TABLE_OUTPUT
——————————————————————————————————————————————————————————————————————————————
Plan hash value: 1409198325
———————————————————————————————————————————————————————————————————————————————
| Id | Operation        | Name           | Rows | Bytes | Cost (%CPU)| Time   |
———————————————————————————————————————————————————————————————————————————————
|  0 | SELECT STATEMENT |                |    1 |    23 |    3  (34) |00:00:01|
|  1 |  SORT ORDER BY   |                |    1 |    23 |    3  (34) |00:00:01|
|  2 |   TABLE ACCESS BY INDEX ROWID BATCHED| EMPLOYEE| 1|23|2   (0) |00:00:01|
|* 3 |    INDEX RANGE SCAN | EMP_HIRE_DATE_I |    1 |   |    1   (0) |00:00:01|
———————————————————————————————————————————————————————————————————————————————
PLAN_TABLE_OUTPUT
—————————————————————————————————————————————————————————————————————————————
Predicate Information (identified by operation id):
———————————————————————————————————————————————————
   3 - access("HIRE_DATE">TO_DATE(' 2016-09-01 00:00:00', 
                                  'syyyy-mm-dd hh24:mi:ss'))
15 rows selected.

Listing 6 demonstrates an execution plan that uses the EMPLOYEE_WAGE_INC_WORTH_BMI bitmap index. Note that because the query uses a single predicate on the bitmap-indexed column and is performing the COUNT aggregate function, the answer can be obtained from the bitmap index alone and a table access is, therefore, not necessary.

Code Listing 6: An execution plan that uses a bitmap index due to the COUNT aggregate function

SQL> EXPLAIN PLAN FOR
  2  select COUNT(employee_id) employees
  3    from employee
  4   where wage_increase_worthiness = 'No Raise Yet';
Explained.

SQL> select *
  2    from TABLE(DBMS_XPLAN.DISPLAY);

PLAN_TABLE_OUTPUT
—————————————————————————————————————————————————————————————————————————
Plan hash value: 1431199220
———————————————————————————————————————————————————————————————————————————
| Id | Operation        | Name           | Rows| Bytes|Cost(%CPU)|Time    |
———————————————————————————————————————————————————————————————————————————
|  0 | SELECT STATEMENT |                |    1|    26|   1   (0)|00:00:01|
|  1 |  SORT AGGREGATE  |                |    1|    26|          |        |
|  2 |   BITMAP CONVERSION COUNT  |      |    6|   156|   1   (0)|00:00:01|
|* 3 |    BITMAP INDEX FAST FULL SCAN| EMPLOYEE_WAGE_INC_WORTH_BMI| | | | |
———————————————————————————————————————————————————————————————————————————
PLAN_TABLE_OUTPUT
——————————————————————————————————————————————————————————————————————————
Predicate Information (identified by operation id):
———————————————————————————————————————————————————
   3 - filter("WAGE_INCREASE_WORTHINESS"='No Raise Yet')
15 rows selected.
Summary

This article introduced you to bitmap indexes. You discovered how to create them and when they might be an appropriate index choice. You also learned how bitmap indexes should not be used in a database with regular transaction activity but instead in a data warehouse with occasional bulk loads of data. You were introduced to the Oracle Optimizer, SQL statement processing operations, and parsing. You learned what a bind variable is and how it can help reduce parsing activity. Last, you learned about statistics and the creation and display of execution plans. In the next article in this series, you’ll learn more about the Oracle data dictionary and be introduced to SQL statement scripting.

Next Steps

READ

DOWNLOAD

LEARN more about relational database design and concepts.

TRY Oracle Database Cloud services.

 

Photography by Scott Webb, Unsplash