What is the best way to write generic procedures to work on strings and/or lists of strings? For example, what is the best way to write a function to parse a delimited string into a collection, with a specific separator? I assume that function parameters should be declared as VARCHAR2(with no length) and "internal" strings should be declared as VARCHAR2(32767). What is the best way of representing a list of strings in the most generic way possible? Also, is there a good way to extend the code to support CLOBs (character large objects) as well, without duplicating the code (to cover every possible combination of parameter types)?
Sounds as if you like to write generic, reusable code. So do I. It is intellectually stimulating and can save me lots of time, because I can avoid writing the same things over and over again.
On the other hand, it is also important to recognize that a generic program is usually more complex and harder to test than more-specific variations on an algorithm. Working out requirements for something that should be truly generic is never as easy as it seems at first glance. Furthermore, we often venture unknowingly into the dangerous terrain of overdesign. "Gee, why don't I make this program really flexible by adding five other parameters to modify the default behavior of the program?"
Before I answer your questions, I'd like to offer some high-level advice on your next generic utility creation adventure:
Don't overdesign. Be very practical about what you need now and what you can clearly see you will need in the near future. Don't daydream about possible requirements as you design your program.
Don't overcode. Make sure that you hew closely to the design you have settled on. Again, when you are writing generic code, you will find yourself thinking about more and more possibilities, even as you write the program.
Cut your losses. You might find as you start to build your generic program that it is tougher than you expected. Listen to your intuition. If the little voice inside your head is saying, "Yes, I know I can do this . . . but is it really all that important?" you should think seriously about abandoning your generic approach and instead write a more specific program from scratch that meets your current requirements.
Having said all that, your question covers three areas:
1. How should you declare "internal" strings—local variables inside your generic functions? When declaring a local variable of type VARCHAR2, you must specify a maximum length. The question then becomes how best to do this. Should you declare it as VARCHAR2(32767), because that is the largest possible value? Will that use excessive amounts of memory, if only small string values are manipulated?
2. What is the best way to declare and work with lists of strings?
3. How can you best write programs to work with both VARCHAR2 variables and CLOBs?
Here are my answers, followed by examples and additional explanations:
1. The best way to declare your local variables is to define a subtype that is based on the biggest possible string size in a shared utility package. All of your string functions can then reference that subtype when declaring local variables. If you create a single package containing all your string functions, you can declare the subtype in that package. And you will not use up any more memory than needed, because Oracle Database allocates memory as needed for string variables declared with a length of 32,767 characters.
2. To work with a list of strings, you need to declare the list based on an already-defined collection type. You can use a type offered by Oracle Database, such as DBMS_SQL.VARCHAR2S or DBMS_SQL.VARCHAR2A. You can also define your own type and then reference that type.
3. You should use CLOBs only when you need to manipulate strings longer than 32,767 bytes or characters in length. Assuming, then, that you are running Oracle9i Database Release 2, you should generally be able to use the same code for many of your functions, because the newer versions of Oracle Database allow you to execute built-in string functions natively against CLOBs, as well as VARCHAR2 string variables.
Let's take a look at each of these answers in more detail.
Avoiding hard-coded VARCHAR2 constraints. A program I often use to demonstrate various best-practices techniques is a simple variant on SUBSTR, which I call BETWNSTR. The built-in SUBSTR function takes a string, the starting location, and the number of characters, and returns that portion of the string. You can get more information about SUBSTR from the Oracle Database SQL Reference.
SUBSTR is dandy, but I often have the starting and ending locations in a string and want the substring between those two positions. To use SUBSTR, I must calculate the number of characters between start and end. I find that I always forget the relevant formula (end - start + 1). So I put together an itty-bitty function to remember the formula, as follows:
FUNCTION betwnstr ( string_in IN VARCHAR2 , start_in IN PLS_INTEGER , end_in IN PLS_INTEGER ) RETURN VARCHAR2 IS l_return VARCHAR2(32767); BEGIN l_return := SUBSTR ( string_in, start_in, end_in - start_in + 1 ); RETURN l_return; END betwnstr;
Here are two notes to make about this implementation:
I have declared a local variable, but I don't really need it here. It is present to demonstrate the best practice of how to best declare such variables.
This is a simplified implementation of BETWNSTR to demonstrate best practices. It does not handle all the cases a real "between string" utility should.
Code Listing 1: string_pkg with BETWNSTR
CREATE OR REPLACE PACKAGE string_pkg IS SUBTYPE maxvarchar2_t IS VARCHAR2 ( 32767 ); FUNCTION betwnstr ( string_in IN VARCHAR2 , start_in IN PLS_INTEGER , end_in IN PLS_INTEGER ) RETURN VARCHAR2; END string_pkg; / CREATE OR REPLACE PACKAGE BODY string_pkg IS FUNCTION betwnstr ( string_in IN VARCHAR2 , start_in IN PLS_INTEGER , end_in IN PLS_INTEGER ) RETURN VARCHAR2 IS l_return maxvarchar2_t; BEGIN l_return := SUBSTR ( string_in , start_in , end_in - start_in + 1 ); RETURN l_return; END betwnstr; END string_pkg; /
Now I am going to move the BETWNSTR function into a package and move the hard-coded local variable declaration, as shown in Listing 1. Notice that I have now declared the following subtype:
SUBTYPE maxvarchar2_t IS VARCHAR2 ( 32767 );
I use that subtype in my declaration of the local variable.
It is true that I do still have a hard-coding of the number 32767 in my package, but it appears only once and serves as the single point of definition for the new, biggest VARCHAR2-allowed datatype.
If Oracle Database increases the maximum size for a VARCHAR2, I can make a change in one place and recompile string_pkg and all programs that use this package, and then my application will be able to immediately take advantage of the expanded datatype.
Work with lists of strings. As noted above, to work with a list of strings you need to declare the list based on an already-defined collection type. You can use a type offered by Oracle Database, such as DBMS_SQL.VARCHAR2S (maximum of 255 characters per string) or DBMS_SQL.VARCHAR2A (maximum of 32,767 characters per string, introduced in Oracle9i Database Release 2). You can also define your own type, again perhaps in your centralized string package, and then reference that type.
Suppose I want to add a function to my string package that accepts a list of elements separated by a delimiter (a comma-delimited list, or a string such as that returned by DBMS_UTILITY.FORMAT_CALL_STACK, delimited by new-line characters) and returns a collection whose elements are those strings between delimiters. I can take one of two approaches:
1. Use a collection type defined by Oracle Database and available automatically to all programmers, such as those found in the DBMS_SQL package. Here is an example of the function header for this approach:
CREATE OR REPLACE PACKAGE string_pkg IS SUBTYPE maxvarchar2_t IS VARCHAR2 ( 32767 ); FUNCTION betwnstr ( string_in IN VARCHAR2 , start_in IN PLS_INTEGER , end_in IN PLS_INTEGER ) RETURN VARCHAR2; FUNCTION list_to_collection ( string_in IN VARCHAR2 , delim_in IN VARCHAR2 DEFAULT ' , ' ) RETURN DBMS_SQL.varchar2a; END string_pkg; /
2. Alternatively, I define my own collection type in the package and then use that in the function interface:
CREATE OR REPLACE PACKAGE string_pkg IS SUBTYPE maxvarchar2_t IS VARCHAR2 ( 32767 ); TYPE maxvarchar2_aat IS TABLE OF maxvarchar2_t INDEX BY PLS_INTEGER; FUNCTION betwnstr ( string_in IN VARCHAR2 , start_in IN PLS_INTEGER , end_in IN PLS_INTEGER ) RETURN VARCHAR2; FUNCTION list_to_collection ( string_in IN VARCHAR2 , delim_in IN VARCHAR2 DEFAULT ' , ' ) RETURN maxvarchar2_aat; END string_pkg; /
The first option (using DBMS_SQL.VARCHAR2A) saves some typing and can be used with programs that are already using collections based on the DBMS_SQL type. The downside is that it introduces an external dependency in this package. Because the DBMS_SQL package is present in all versions of Oracle Database, it's hard to get too worried about this external dependency.
Generally, though, you are better off avoiding such dependencies or at least minimizing the number of dependencies in a program. The second option (a collection type defined directly within the package specification) allows you to define a completely self-contained utility.
Functions for VARCHAR2s and CLOBs. In Oracle9i Database Release 2 and above, you can in many circumstances treat VARCHAR2 and CLOB interchangeably. Visit the "PL/SQL Semantics for LOBs" section in the Oracle Database Application Developer's Guide—Large Objects, to get more details about how Oracle Database will implicitly convert between VARCHAR2 and CLOB, and how you can apply many of the standard VARCHAR2 built-in functions directly to CLOBs.
Because CLOBs can be much larger than VARCHAR2 strings, you can think of a VARCHAR2 variable as being (logically) a subtype of CLOB. So a function that works with CLOBs should also work with VARCHAR2s.
One might then be tempted to say, "Let's just use CLOBs as the standard datatype for string functions." I could then build a VARCHAR2 "skin" that simply invokes the CLOB-based algorithm for its implementation. Given the performance characteristics of CLOB, however, it is recommended that you use CLOB only when you're sure that VARCHAR2 (32767) is not big enough.
So from a performance standpoint, I will want to implement the CLOB variation of list_to_collection, one that accepts a CLOB and returns a collection of CLOBs, with a copy-and-paste operation, simply replacing the VARCHAR2 declarations with CLOBs wherever appropriate.
And until the performance characteristics of CLOBs improve, you will generally want to create separate programs for this datatype.
Avoiding the hard-coding of the 32,767 literal. Now, I have a problem with the above program: I have hard-coded the 32767 maximum length into my call to SUBSTR. I could get around this problem by adding a constant to the package specification like this:
CREATE OR REPLACE PACKAGE string_pkg IS c_max_varchar2_len CONSTANT PLS_INTEGER := 32767; SUBTYPE maxvarchar2_t IS VARCHAR2 (32767); I can then use this constant in my call to SUBSTR: 16 FOR clob_index IN 1 .. l_clobs.COUNT 17 LOOP 18 l_return ( clob_index ) := 19 SUBSTR ( l_clobs ( clob_index ) 20 , 1 21 , c_max_varchar2_len ); 22 END LOOP;
This is better, but I still have multiple occurrences of the 32767 literal in my package specification.
If I am properly obsessed with avoiding repetition of such values in my code, I could take things a step further and take advantage of conditional compilation as follows:
ALTER SESSION SET plsql_ccflags = 'max_varchar2_length:32767' / CREATE OR REPLACE PACKAGE string_pkg IS SUBTYPE maxvarchar2_t IS VARCHAR2 ( $$max_varchar2_length ); and then my function implementation would look like this: 16 FOR clob_index IN 1 .. l_clobs.COUNT 17 LOOP 18 l_return ( clob_index ) := 19 SUBSTR ( l_clobs ( clob_index ) 20 , 1 21 , $$max_varchar2_length ); 22 END LOOP;
For more information on conditional compilation, check out:
Conclusions. We should all pay attention to the opportunities to create reusable programs, especially handy string functions. In the process of creating these generic utilities, we should do everything we can to make the code easy to maintain (avoid hard-codings and redundant algorithms), while still ensuring that the code is efficient enough to be useful in a production environment.
Code Listing 2:string_fun Package with full BETWNSTR implementation
The following script was created by Bryn Llewellyn, PL/SQL Product Manager, to compare VARCHAR2 and CLOB performance.
Code Listing 3: Script to compare VARCHAR2 and CLOB performance
Using the CLOB implement as the foundation algorithm. Recognizing that CLOBs are slower than VARCHAR2s, I thought I would still show you how you could implement the CLOB version of "list to collection" as the foundation algorithm and then implement the VARCHAR2 version on top of that. It is a useful technique for avoiding duplication of application logic, one that you should be entirely comfortable applying to your own situations. Of course, avoiding code redundancy usually cannot trump performance bottlenecks.
Here we go...
I will add a CLOB variation of list_to_collection to my package, one that accepts a CLOB and returns a collection of CLOBs, each element of which contains a delimited item from the incoming CLOB.
My string package specification now needs another collection type and function header:
CREATE OR REPLACE PACKAGE string_pkg IS ... previous VARCHAR2 elements ... TYPE clob_aat IS TABLE OF CLOB INDEX BY PLS_INTEGER; FUNCTION cloblist_to_collection ( string_in IN CLOB , delim_in IN VARCHAR2 DEFAULT ',' ) RETURN clob_aat; END string_pkg; /
Now, I could implement this function with a copy-and-paste operation, but I would really much rather have just one instance of the parsing algorithm used in the list_to_collection function. So here is what I will do:
1. Create cloblist_to_collection by copying list_to_collection.
2. Change all relevant declarations from VARCHAR2 to the CLOB types (individual variables and collection types).
3. Replace the algorithm inside list_to_collection with a call to cloblist_to_collection.
Let's take a look. Here is the header and declaration section of cloblist_to_collection:
FUNCTION cloblist_to_collection ( string_in IN CLOB , delim_in IN VARCHAR2 DEFAULT ',' ) RETURN clob_aat IS l_loc PLS_INTEGER; l_row PLS_INTEGER := 1; l_startloc PLS_INTEGER := 1; l_return clob_aat;
Notice that I have changed the string_in datatype to CLOB, and return the clob_aat collection type instead of maxvarchar2_aat. The body of this function remains unchanged — it looks exactly the same as it did when it implemented list_to_collection.
Now I must "reinvent" the list_to_collection function. Here is my implementation, with explanation below:
1 FUNCTION list_to_collection ( 2 string_in IN VARCHAR2 3 , delim_in IN VARCHAR2 DEFAULT ',' 4 ) 5 RETURN maxvarchar2_aat 6 IS 7 l_clobs clob_aat; 8 l_return maxvarchar2_aat; 9 BEGIN 10 -- Parse the string as a CLOB. 11 l_clobs := cloblist_to_collection ( 12 TO_CLOB ( string_in ), delim_in ); 13 14 -- Copy the individual items to the string collection. 15 -- Use SUBSTR to avoid VALUE_ERROR exceptions. 16 FOR clob_index IN 1 .. l_clobs.COUNT 17 LOOP 18 l_return ( clob_index ) := 19 SUBSTR ( l_clobs ( clob_index ) 20 , 1 21 , 32767); 22 END LOOP; 23 24 RETURN l_return; 25* END list_to_collection;
|7-8||Declare a CLOB collection for use in the call to cloblist_to_collection, and another collection to be returned by the function.|
|11-12||Convert the VARCHAR2 string to CLOB and then pass it to the cloblist_to_collection to do the parsing.|
|16-22||Copy the contents of the CLOB collection to the "maximum VARCHAR2" collection. Use SUBSTR to take only the first 32,767 characters of each CLOB, to avoid raising VALUE_ERROR.
Note: instead of truncating and possibly losing string data with SUBSTR, you might want to adjust the algorithm to break up individual items that are longer than 32767 characters and pass them back as separate items in the "maximum VARCHAR2" collection.
What is the overhead involved in having DBMS_OUTPUT calls in production code where serveroutput is turned off? I've read about how to selectively call or compile code, but I can't find anything that says, "Yes, there is considerable overhead in making DBMS_OUTPUT calls; conditionally skip them where possible" or "No, there is minimal impact; don't go to the effort of trying to conditionally make the calls."
What a fine and very practical question you ask.
Before I answer it, I'll briefly describe how DBMS_OUTPUT works and provide some information about tracing.
How DBMS_OUTPUT works. The DBMS_OUTPUT package provides a procedure (PUT_LINE) that writes text out to a buffer (an array in memory) if output has been enabled. Host environments like SQL*Plus can then call the DBMS_OUTPUT.GET_LINES procedure to retrieve the contents of the buffer and display those strings through some kind of visual interface. DBMS_OUTPUT is, therefore, only useful when executing code from an environment that can/will display output from this built-in package.
DBMS_OUTPUT is certainly the simplest and most direct means of seeing what is happening inside your PL/SQL subprogram. The need for the host environment to extract data from the buffer, however, limits the usefulness of DBMS_OUTPUT as a tracing mechanism.
Tracing issues and requirements. Tracing generally refers to implementing a mechanism in your code so that you can trace or watch what is going on in your code. Tracing is different, by the way, from debugging. When you trace, you run your application and gather information about its real-time behavior. Afterwards, you examine the trace to identify bugs. When you debug, you step through the code line by line and examine the goings-on in your code. This is a crucial technique for identifying the cause of a bug.
There are a variety of tracing scenarios:
Development tracing. I want to trace the execution of my program during development, to understand, test, and debug the code thoroughly. I neither want nor need tracing in my production code.
Production tracing. I need to include optional tracing in my production application. If a user reports a problem, I can then turn on tracing within production, gather the necessary information, and then perform my analysis.
Tracing flexibility. In some simple cases, I can rely solely on DBMS_OUTPUT to display tracing data to the screen. In many other situations, however, I will want to use a different repository for my trace. For example, suppose my program runs for hours. I want to see the trace information while the program continues to execute. So I will write information out to a table using an autonomous transaction procedure. I can then see the output immediately from another session.
As with every other aspect of your application design, you should think through the kinds of tracing you need and the repositories in which the trace information must be written before you begin writing your application.Test case
|Length of String||Number of Calls||Elapsed Time
(hundredths of seconds)
|Call to DBMS_OUTPUT.PUT_LINE Removed||N/A||1000000||11|
Testing. To answer your question, I put together a script—located in Listing 4 —to test the overhead of calling DBMS_OUTPUT.PUT_LINE. I explain it and the results in a bit more detail below, but I will first offer my conclusions to you:
1. The impact on application performance is (here comes one of those classically infuriating answers) application-specific . That is, a call to DBMS_OUTPUT.PUT_LINE all by itself doesn't incur very much overhead. (For example, over 100,000 invocations of my test code, calling DBMS_OUTPUT.PUT_LINE and passing NULL, took .61 seconds. Removing the call to this built-in reduced execution time to .11 seconds.) What can take a lot of time is evaluating the value that you pass to the single string parameter of DBMS_OUTPUT.PUT_LINE. More on this below.
2. The more important question when examining the issue of tracing in one's code is, "What is the best way to do it, from the standpoint of readability and maintainability of code?" (Hey, this is, after all, a best-practices column!) Generally, one's code is more readable when it is not littered with lots of calls to trace programs.
The results are in. Having said all that, let's now take a look at the results of my analysis of the overhead of calling DBMS_OUTPUT.PUT_LINE. I will then conclude with more best-practice- oriented comments.
To construct the script to analyze overhead in DBMS_OUTPUT.PUT_LINE in Listing 4, here is the approach I took:
I used consecutive calls to DBMS_UTILITY.GET_CPU_TIME to compute elapsed time down to the hundredth of a second (you can substitute this program with DBMS_UTILITY.GET_TIME if you are not yet running Oracle Database 10g).
I recorded the results in a database table, because DBMS_OUTPUT must be disabled for at least part of this script and is, in any case, the focus of our analysis and should not be used to record and display results.
For most of the tests, I passed a string that requires some effort to construct each time the built-in is called. Specifically, I used RPAD to create a string of a length specified at the time the test is run, and concatenated to that a date and a Boolean. This may be more work than is usually performed when you call DBMS_OUTPUT.PUT_LINE, but it is surely not outside the realm of normal tracing activity.
I called DBMS_OUTPUT.PUT_LINE with the following variations: output enabled, output disabled, pass just a NULL, and don't call DBMS_OUTPUT.PUT_LINE at all.
Table 1 shows what I found after running this script. As you can see, there is certainly overhead associated with calling DBMS_OUTPUT.PUT_LINE, and the scale of that overhead depends very much on the amount of work it takes to construct and pass the string to DBMS_OUTPUT.PUT_LINE. Is this overhead "considerable" or "minimal"? I cannot answer that question for you—it depends on your application.
Tracing recommendations. I offer the following tracing recommendations:
Never call DBMS_OUTPUT.PUT_LINE directly in your application code. Create a wrapper that accepts the application data you want to see in your trace and then calls DBMS_OUTPUT.PUT_LINE. Why do I suggest this? Prior to Oracle Database 10g, DBMS_OUTPUT.PUT_LINE will raise an error if you try to display strings with more than 255 characters (that limit has now been raised to 32K). Also, you may want to send trace information to a file or table rather than the screen, and a direct call to DBMS_OUTPUT.PUT_LINE forces you to sacrifice that flexibility.
If you are running Oracle Database 10g Release 2, take advantage of conditional compilation so that you can easily "compile out " the trace calls when your program goes into production (unless you need production-time tracing, of course). Now, one issue I believe you will currently encounter when using lots of conditional compilation statements in your code ($IF, $ENDIF, $ERROR, $$ ccflagname , and so on) is that the code is harder to read. I expect, however, that the various PL/SQL editors, such as Toad, SQL Navigator, PL/SQL Developer, and—the latest entry—Oracle SQL Developer, will soon offer toggles to hide this logic as you are editing and reviewing your code, greatly improving that situation.
If you are not yet running Oracle Database 10g Release 2, also consider enclosing trace calls inside your own conditional statement, which might look like this:
IF trace_pkg.trace_enabled THEN trace_pkg.trace (l_my_data); END IF;
With this approach, the argument(s) passed to the trace mechanism will be evaluated only if tracing is enabled. You can, therefore, keep to an absolute minimum the overhead your application will experience when tracing is disabled.
LEARN more about conditional compilation
DOWNLOAD Oracle Database 10g
Photography by Scott Webb, Unsplash