My first programming job was working with COBOL in a mainframe environment in the 1980s. In those days, dealing with the case of strings was a simple affair. All of our company’s code was in upper case; all of the screens we built presented their information in upper case; and all of the reports we dispatched to the noisy impact printers in the server room were printed in upper case. It was a simple and easy-to-implement strategy. Upper case was the only case.
As the years went on, this rudimentary approach needed to evolve as the technologies and architectures upon which we built solutions demanded a more natural feel to the display of textual data. Nowadays, users expect their content to resemble the grammatical norms of the modern world, and that includes the representation of names and proper nouns within the database and the applications sitting on that data. There are two conflicting requirements: allowing data entry in an application to be flexible in terms of the use of case and presenting that data in a consistent fashion when rendering it for output to the application.
Applications can enforce the validation of data, but an over-reliance on front-end applications enforcing data validation can reveal inconsistencies with the case of text, as shown in the following:
SQL> select surname 2 from names; SURNAME ——————— jones brown SMITH
The stored data can be patched for correctness, but that obviously does not address the root cause of how the inconsistency was permitted and whether it should be permitted. As a consequence, a commonly adopted compromise is to correct the data as it is read back from the database, as shown in the following:
SQL> select initcap(surname) 2 from names; INITCAP(SURNAME) ———————————————— Jones Brown Smith
In many cases this is a sufficient solution, but there are some names (including that of this article’s author) that are not correctly normalized by the INITCAP function. The following shows what happens when INITCAP encounters a name such as McDonald:
SQL> select initcap(surname) 2 from names; INITCAP(SURNAME) ———————————————— Jones Brown Smith O'Brien Mcdonald
The INITCAP function is unaware of the various idiosyncrasies associated with names and does not capitalize the D in McDonald. Similarly, the same functionality of INITCAP that capitalizes the first letter following an apostrophe and lets it correctly handle names such as O’Brien creates an issue when INITCAP is used to normalize other proper nouns that may contain apostrophes. The following INITCAP example demonstrates possibly inappropriate formatting of possessive names, such as some common company names:
SQL> select initcap(organization_name) 2 from companies; INITCAP(ORGANIZATION_NAME) —————————————————————————— Oracle Corporation McDonald'S Restaurant Jackson'S SuppliesExtending Functionality with PL/SQL
If INITCAP does correctly meet your requirements, you can extend the existing implementation of INITCAP with your own version. Listing 1 presents a PL/SQL function example that caters to boundary conditions in the data, while falling back onto the existing INITCAP implementation for normal cases.
Code Listing 1 INITCAP extensions added via a PL/SQL wrapper
SQL> create or replace 2 function MY_INITCAP(p_string varchar2) return varchar2 is 3 l_string varchar2(1000) := p_string; 4 begin 5 if lower(l_string) like 'mac%' then 6 l_string := 'Mac'||initcap(substr(l_string,4)); 7 elsif lower(l_string) like 'mc%' then 8 l_string := 'Mc'||initcap(substr(l_string,3)); 9 end if; 10 11 if l_string like '''%' then 12 null; 13 else 14 if not regexp_like(l_string,'(Mac[A-Z]|Mc[A-Z])') then 15 l_string := initcap(l_string); 16 end if; 17 if l_string like '_''S%' then 18 null; 19 else 20 l_string := replace(l_string,'''S','''s'); 21 end if; 22 end if; 23 24 return l_string; 25 end; 26 / Function created. SQL> select my_initcap(surname) 2 from names; MY_INITCAP(SURNAME) ——————————————————— Jones Brown Smith O'Brien McDonald SQL> select my_initcap(organization_name) 2 from companies; MY_INITCAP(ORGANIZATION_NAME) ————————————————————————————— Oracle Corporation McDonald's Restaurant Jackson's Supplies
Building a function like MY_INITCAP that is bulletproof is not a trivial undertaking. While all of the data presented so far was correctly normalized, it is easy to find additional boundary cases that cause the function to fail, as shown in the following code:
SQL> select my_initcap(problem_data) 2 from potential_problems; MY_INITCAP(PROBLEM_DATA) ———————————————————————— Mcdonald's Restaurant MacY's
In this case, the combination of two boundary conditions, McDonald and a trailing S, created problems, and the rendering of Macy’s demonstrates that a blanket rule of intercepting strings that are prefixed with Mc or Mac is too simplistic an approach.Performance
You could sample more of your existing data and continue to enhance and extend my MY_INITCAP function to improve the successful conversion ratio, but as the function grows in size and complexity, the potential of the function to have an impact on query performance equally grows. The Oracle Database built-in INITCAP function, being part of the Oracle Database kernel, is accessible directly from the SQL engine, whereas my own improved and extended MY_INITCAP function is a PL/SQL function, and calling PL/SQL functions from SQL statements incurs some overhead. The overhead of a single execution of the function is negligible, but the overhead increases significantly if high execution counts are part of an application’s requirements. The following code tests the performance of the built-in INITCAP function versus my PL/SQL version when each is called 2 million times from a SQL statement.
SQL> set timing on SQL> select count(initcap(object_name)) from two_million_names; COUNT(INITCAP(OBJECT_NAME)) ——————————————————————————— 2000000 1 row selected. Elapsed: 00:00:00.54 SQL> select count(my_initcap(object_name)) from two_million_names; COUNT(MY_INITCAP(OBJECT_NAME)) —————————————————————————————— 2000000 1 row selected. Elapsed: 00:00:11.24
Developers, testers, and DBAs alike can become overzealous when interpreting performance tests like these, dismissing the viability of PL/SQL functions as “too slow.” Consequently, blanket rules such as “no PL/SQL calls from SQL” are sometimes created in coding standards within enterprises. And while it is true that some of the functionality of the MY_INITCAP function can be engineered into a pure SQL solution as shown in the following code, that would sacrifice the functionality, maintainability, and modularity of my PL/SQL solution.
SQL> select 2 case 3 when lower(surname) like 'mac%' 4 then 'Mac'||initcap(substr(surname,4)) 5 when lower(surname) like 'mc%' 6 then 'Mc'||initcap(substr(surname,3)) 7 when surname like '''%' then surname 8 when initcap(surname) like '_''S%' then surname 9 else replace(initcap(surname),'''S','''s') 10 end my_initcap 11 from names; MY_INITCAP ——————————— Jones Brown Smith McDonald
The performance of code is only one facet of its suitability as a solution to an application requirement.Better Integration with Oracle Database 12c
In Oracle Database 12c, the distinction between SQL and PL/SQL is reduced, allowing for a new “class” of PL/SQL functions—functions bound solely to the SQL statement within which they are intended to be executed. Continuing with my MY_INITCAP example, if the PL/SQL function I created exists solely to be called within a single SQL statement, Oracle Database 12c allows me to embed that function directly within the SQL statement itself and not introduce clutter into the list of compiled PL/SQL objects stored within the database. Whereas my pure SQL solution required convoluted CASE statements, embedding a PL/SQL function within a SQL statement retains the modularity and flexibility of PL/SQL without sacrificing the functionality or maintainability of the code, as shown in the following code:
SQL> WITH 2 function my_initcap(p_string varchar2) return varchar2 is 3 l_string varchar2(1000) := p_string; 4 begin 5 if lower(l_string) like 'mac%' then 6 l_string := 'Mac'||initcap(substr(l_string,4)); 7 elsif lower(l_string) like 'mc%' then 8 l_string := 'Mc'||initcap(substr(l_string,3)); 9 end if; ... 23 24 return l_string; 25 end; 26 select my_initcap(surname) 27 from names; MY_INITCAP(SURNAME) ——————————————————— Jones Brown Smith McDonald
The PL/SQL embedded within the SQL statement is not limited to a single function. Multiple functions and/or procedures are permitted, although the PL/SQL called from the final SQL statement must be a function returning a value. The following code shows how MY_INITCAP has been segmented to improve code readability by adding a routine, IS_SCOTTISH, to handle the Mac and Mc prefixes:
SQL> WITH 2 function is_scottish(p_string varchar2) return boolean is 3 begin 4 return regexp_like(p_string,'(Mac[A-Z]|Mc[A-Z])'); 5 end; 6 function my_initcap(p_string varchar2) return varchar2 is 7 l_string varchar2(1000) := p_string; 8 begin 9 if is_scottish(l_string) then ... 28 end; 29 select my_initcap(surname) 30 from names;
You can also test whether calling embedded PL/SQL from within SQL yields any performance improvements. To do this, I will create a very simple standalone function within the database that returns the value 1. The following code shows how long it takes to call that function 1 million times from a SQL statement:
SQL> create or replace 2 function F return number is 3 begin 4 return 1; 5 end; 6 / Function created. SQL> select sum(f) 2 from 3 ( select level from dual 4 connect by level <= 1000 ), 5 ( select level from dual 6 connect by level <= 1000 ); Elapsed: 00:00:02.04
I can then compare this result to the result of completing the identical ask with the PL/SQL function defined within the SQL statement:
SQL> with 2 function f1 return number is 3 begin 4 return 1; 5 end; 6 select sum(f1) 7 from 8 ( select level from dual 9 connect by level <= 1000 ), 10 ( select level from dual 11 connect by level <= 1000 ) 12 / Elapsed: 00:00:00.42
In Oracle Database 12c, defining PL/SQL functions directly within the SQL statement from which they are executed improves the speed of switching between the SQL engine and the called PL/SQL function. You no longer need to complicate your code by converting it to pure SQL syntax, and you retain the inherent modularity and maintainability of the PL/SQL language.
But what about existing production applications that call PL/SQL functions from SQL statements? Do they need refactoring to migrate each function to a WITH clause for each SQL statement? No. But if you have existing PL/SQL functions that augment the capabilities of your application’s SQL statements, you can take advantage of the performance improvement. In Oracle Database 12c, a new user-defined function (UDF) pragma is available to signify to the PL/SQL compiler that a function will be used primarily from the SQL engine.
The following code repeats the performance test with my simple function, this time with the UDF pragma added:
SQL> create or replace 2 function F return number is 3 pragma udf; 4 begin 5 return 1; 6 end; 7 / SQL> select sum(f) 2 from 3 ( select level from dual 4 connect by level <= 1000 ), 5 ( select level from dual 6 connect by level <= 1000 ) 7 ; Elapsed: 00:00:00.39
The performance is equivalent to running the function in the WITH clause of a SQL statement.
Conclusion The close functional integration between the PL/SQL and SQL engines has always been a great boon for developers building data-centric applications. However, there has been some resistance to realizing the benefits of the modular, maintainable code achievable with PL/SQL because of performance concerns. With user-defined functions in Oracle Database 12
c, developers can now attain those benefits as well as superior application and database performance.
DOWNLOAD Oracle Database 12c.
Photography by Ricardo Gomez, Unsplash