By Connor McDonald
Two or more features you utilize to build your database applications might seem totally unrelated, and thus you might not even consider that one may have a detrimental impact on the other. But many a developer has been lulled into a false sense of security by assuming a functional separation rather than validating it via integration testing. I’ve been traveling a lot recently, and recent observations I’ve made at airports serve as a nice metaphor for reinforcing the importance of integration-testing your applications.
It’s common now for many airlines to charge a fee for each checked bag and no fee for carry-on bags. Passengers who carry all their bags on, therefore, save money and time getting out of the airport, because there’s no more waiting at the baggage carousel. And for passengers with checked bags, their additional investment means that it takes less time to collect their bags, because there are fewer bags to deliver from the aircraft to the carousel. Another change, or perhaps innovation, I’ve seen more recently is a move to achieve faster boarding via more rigid definition and control of “boarding zones” before flights, with passengers for different zones queuing in different waiting areas before a flight starts boarding.
In isolation, both of these air travel innovations would appear to be sensible and beneficial for all concerned. However, the combination of these strategies has created a problem—one I observed every time I flew with an airline over the past few months. To avoid paying the extra fee for checked luggage, passengers would bring with them the maximum allowed amount of carry-on luggage, a totally understandable position to take. Why pay more if you can avoid it? The consequence was that the aircraft cabin could not accommodate the volume of luggage, so once the cabin luggage space was filled, all remaining baggage had to be checked into the cargo hold anyway. In trying to avoid this, passengers would begin queuing in their respective boarding zones as early as possible. I often saw these queues start well over an hour before a flight was due to depart, but airport departure gates are not designed for this eventuality. Gates’ walkways were totally clogged with queuing passengers fervently hoping to avoid having to check their luggage.
Two innovations, lower-cost flights and streamlined boarding, are seemingly unrelated, yet together they have created congestion chaos at airport boarding gates, and not just at the boarding gates of the airline concerned. Even other airlines that do not adopt the newer boarding practice are now experiencing issues with late-boarding passengers, because their own boarding gates are rendered nearly inaccessible by large queues in the walkways.
Returning to the Database
Here is a similar example, this time with database innovations. Consider an application built on a simple CUSTOMERS table structure.
SQL> CREATE TABLE CUSTOMERS 2 ( 3 COUNTRY VARCHAR2(128), 4 CREATED DATE, 5 CUST_NAME VARCHAR2(150) 6 ); Table created. SQL> create index cust_ix on customer ( cust_name ); Index created.
The application contains the following query for selecting customers with a specific surname. After the table has been populated with the expected data volumes, the code can be tested with a known customer name:
SQL> select * 2 from customers 3 where cust_name = 'ADAMS'; COUNTRY CREATED CUST_NAME ———————————— ————————— —————————————— AUS 08-NOV-16 ADAMS
The query returns an expected row and is using an appropriate index on the CUST_NAME column to find rows efficiently. All looks as it should until—when a subsequent query is being run—the realization that there might be an underlying issue:
SQL> select * 2 from customers 3 where upper(cust_name) = 'ADAMS'; COUNTRY CREATED CUST_NAME ———————————— ————————— —————————————— AUS 07-NOV-16 Adams AUS 08-NOV-16 ADAMS AUS 09-NOV-16 adams
This second query suggests a possible flaw in application logic with respect to the database design. Due to the way the application has been coded, it’s possible that incomplete results are being returned to the consumers of that first query, because the rows with the surname ADAMS in mixed case are being dropped. If case insensitivity is indeed the desired goal for this application, a potentially large refactoring exercise is going to be required, because the application code will need to be revisited to recast any SQL statements with the predicate CUST_NAME = <value> to include the UPPER expression applied to the CUST_NAME column.
Unintended Side Effects
Not only do the two queries to the CUSTOMER table return different results but they also have differing performance characteristics, as you can see in the execution plans for each:
SQL> select * 2 from customers 3 where cust_name = 'ADAMS'; ————————————————————————————————————————————————————————————————— | Id | Operation | Name | Rows | ————————————————————————————————————————————————————————————————— | 0 | SELECT STATEMENT | | 1 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMERS | 1 | |* 2 | INDEX RANGE SCAN | CUST_IX | 1 | ————————————————————————————————————————————————————————————————— SQL> select * 2 from customers 3 where upper(cust_name) = 'ADAMS'; ——————————————————————————————————————————————————————— | Id | Operation | Name | Rows | Bytes | ——————————————————————————————————————————————————————— | 0 | SELECT STATEMENT | | 1 | 152 | |* 1 | TABLE ACCESS FULL| CUSTOMERS | 1 | 152 | ———————————————————————————————————————————————————————
A side effect of wanting case-insensitive data is that refactoring the code to include UPPER creates a performance issue. Use of an expression prohibits the use of the existing index, and application performance is degraded. A new index is required:
SQL> create index cust_ix2 2 on customers ( upper(cust_name) ); Index created.
The ability to create indexes on expressions has existed since Oracle8i Database and is commonly used to implement case-insensitive queries. But unless the refactoring of the application code can be done en masse, it is likely that both indexes will need to coexist until all the code can be revisited. Although it is trivial to add an index to a table, you should never add indexes without understanding the repercussions of doing so. Each index
Obviously, in an application that may already have hundreds of indexes across many tables, there may be the perception that “just” one more index will hardly have an impact. But similarly, an airline thought that “lower-cost flights” would not create airport boarding gate chaos! The unintended side effects are not limited to just requiring a new index. Once that new index is created, consider what happens when a DBA attempts to perform an online space reclamation within the CUSTOMERS table.
SQL> alter table customers shrink space; * ERROR at line 1: ORA-10631: SHRINK clause should not be specified for this object
As in the airline metaphor, two seemingly unrelated features (function-based indexes and TABLE SHRINK SPACE) clash when combined. In this case, the side effect is a restriction on the SHRINK SPACE functionality: it cannot be performed if a function-based index is in place. Whenever you are using any feature in Oracle Database, it is critical that you perform integration testing across your entire application, including the maintenance activities of the DBA.
Column Collation in Oracle Database 12c Release 2
Integration testing would have revealed the conflict between function-based indexes and SHRINK SPACE, but that does not solve the underlying question: how to handle mixed-case data in a performant way without resorting to additional indexes or a large refactoring effort. Oracle Database 12c Release 2 includes the new column-level collation feature to help developers tackle this issue. With column-level collation, it is now possible to specify table columns as case-insensitive.
Returning to the example, the CUSTOMERS table is now re-created with the CUST_NAME column using the new COLLATE clause and with a single index on the CUST_NAME column.
SQL> CREATE TABLE CUSTOMERS 2 ( 3 COUNTRY VARCHAR2(128), 4 CREATED DATE, 5 CUST_NAME VARCHAR2(150) COLLATE BINARY_CI 6 ); Table created. SQL> create index cust_ix on customer ( cust_name ); Index created.
The BINARY_CI specification (CI = case insensitive) indicates that case is not a differentiator between values in the CUST_NAME column. Note that the index is on the CUST_NAME column only. The basic query on ADAMS customer names now retrieves all relevant rows independent of case and can take advantage of the standard index on the CUST_NAME column.
SQL> select * from customer 2 where cust_name = 'ADAMS'; ID CUST_NAME —————————— ———————————————————— 37 ADAMS 237 Adams 447 adams ———————————————————————————————————————————————————————————————— | Id | Operation | Name | Rows | ———————————————————————————————————————————————————————————————— | 0 | SELECT STATEMENT | | 1 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMER | 1 | |* 2 | INDEX RANGE SCAN | CUST_IX | 1 | ————————————————————————————————————————————————————————————————
Note that this is a true collation feature, not just a means of obtaining case-insensitive data. For example, aggregation of the data also ignores the case differences between the values:
SQL> select cust_name, count(*) 2 from customer 3 where cust_name = 'ADAMS' 4 group by cust_name; CUST_NAME COUNT(*) ——————————— ————————— ADAMS 3
A variant on the case insensitivity of data is the handling of accented data. The collation BINARY_AI (AI = accent insensitive) can be specified to also include accented data in the collation. The CUSTOMERS table will be re-created with the CUST_NAME column specified as accent insensitive:
SQL> CREATE TABLE CUSTOMER ( 2 ID INT, 3 CUST_NAME VARCHAR2(40) COLLATE BINARY_AI 4 ); Table created. SQL> create index cust_ix on customer ( cust_name ); Index created.
After sample data is added to the table, note the effect of BINARY_AI. A fourth row is retrieved, and the third character in the customer name is ă (a with a breve).
SQL> select * from customer 2 where cust_name = 'ADAMS'; ID CUST_NAME —————————— ———————————————————— 37 ADAMS 237 Adams 447 adams 647 adăms
Checking for Side Effects
The new collation feature allows for case-insensitive queries, without the need to add expressions to existing queries. Digging into the details, however, reveals one lingering side effect. When the index on CUST_NAME is created, the collation of BINARY_CI on the column makes an adjustment to the underlying definition of the index. Although the data definition language (DDL) text specified the creation of an index on column CUST_NAME, the actual column definition is something else. A query of the data dictionary shows the index definition:
SQL> select column_expression 2 from user_ind_expressions 3 where index_name = 'CUST_IX'; COLUMN_EXPRESSION ————————————————————————————————————————————— NLSSORT("CUST_NAME",'nls_sort=''BINARY_CI''')
An expression is used to define the index, and hence this is still a function-based index, so there will be limitations on the use of SHRINK SPACE.
To use the new collation features in Oracle Database 12c Release 2, the database must have been modified to use a feature that became available in Oracle Database 12c Release 1: larger data type sizes for string data. MAX_STRING_SIZE must be changed to EXTENDED from the default of STANDARD to enable VARCHAR2 columns to be sized up to 32,767 bytes instead of the default, 4,000.
If the change to the database string size has not been made, attempts to use the new collation features will result in an error, such as
SQL> alter table CUSTOMER default collation binary_ai; * ERROR at line 1: ORA-43929: Collation cannot be specified if parameter MAX_STRING_SIZE=STANDARD
The steps for activating extended strings in the database are documented in Oracle Database Reference. Consult Oracle Database Globalization Support Guide for more information on the new collation features in Oracle Database 12c Release 2.
With every release of Oracle Database, there are a myriad of new features available to developers and database administrators alike. Just like the features you build into your applications, the database features you exploit should always be tested extensively to ensure that you do not run into any unintended side effects. In this article’s particular example of case insensitivity, the collation features in Oracle Database 12c Release 2 can help reduce the effort of code refactoring or the need for additional indexes. However, always consult the documentation to get a good understanding of the feature and perform thorough integration tests to make sure you get the maximum benefit from it.
LEARN more about Oracle Database 12c Release 2.
DOWNLOAD Oracle Database 12c Release 2.
Photography by Getty Images/iStockphoto
Connor McDonald is a former Oracle ACE Director who has joined the Oracle Developer Advocates Team. Over the past 25 years, he has worked with systems in Australia, the United Kingdom, Southeast Asia, Western Europe, and the United States. He has co-authored three books and has been a popular speaker at Oracle conferences around the world, specializing in topics regarding the database engine and PL/SQL.