By Connor McDonald
November 18, 2019
Like many other people, I suspect, I occasionally daydream about an unexpected windfall.
“...17...37...12....25...and the final number is 28!” blurts out the announcer on the television, and suddenly I am staring at the scrunched-up scrap of paper in my hand and exclaiming, “Oh my goodness...,” along with a selection of other slightly more profane exclamations. “I’ve won the lottery!”
In the blink of an eye, I have countless millions of dollars, and although my financial security and that of my loved ones should be paramount, I am already nurturing a lengthy shopping list of material goods that are equal parts indulgent and unnecessary. My weakness is cars, so in this daydream, I would pop open my 12-car garage each morning, jump into one of the luxury sports cars I have procured, and blast off down the street in an exciting but illegal plume of rear tire smoke. Typically, it is at somewhere around this point that both my custom-paint-job Lamborghini and my countless millions fade back into the realm of fiction as I return to the reality of staring at my losing lottery ticket.
Imagine for a moment that you did get to drive that Lamborghini every day and experience the adrenaline rush of having all that performance at your fingertips—but every day when you pop open that fictional garage, you find the Lamborghini in pieces. The unbridled performance is still there, but first you need to reattach the doors, hoist the car up onto a lift, bolt the wheels back on, and refit the car seats. Only then does that performance finally become available to you.
SQL processing in Oracle Autonomous Database is akin to owning that Lamborghini. The levels of performance there can be mind-blowing, but doing SQL processing poorly in any database is akin to having to build that high-performance car from scratch every morning before you get to drive it. A blazing-fast two-minute commute to the office is not such a fantastic experience if it is preceded by an hour of assembly. And because data access by SQL is at the heart of every database application, ensuring that SQL processing is efficient is critical to achieving Lamborghini-like levels of database performance. To obtain efficiency in SQL processing, the first step is understanding exactly what I mean by SQL processing.
Most computer languages learned in college or used in the workplace have three common phases:
The code is compiled to ensure that it is valid. Compilation errors are corrected until eventually a working program is produced.
The compiled code is executed.
The execution produces an outcome; for example, data is created or some action takes place.
In that respect, SQL is like any other coding language. When a SQL statement is passed to the database, it follows the same three phases:
The SQL statement is compiled. Unlike with other languages, this compilation is normally performed on the fly as the application requests the statement’s execution, but compilation nevertheless must occur to ensure that a SQL statement is correct.
The SQL statement is executed. As with other coding languages, this is the actual work phase of the SQL.
The SQL statement produces an outcome. For example, records in a table are updated, or—in the case of a query—data is presented back to the calling application.
The overwhelming majority of the information about SQL performance in white papers, blog posts, videos, and the standard documentation is strongly focused on the execution phase. Discussion centers on the optimizer plan, what indexes are available, how the data is structured, and so forth. Tuning the SQL execution phase is seen as a complex task requiring high levels of expertise and a considerable time investment for both the developer and the DBA. Thankfully, enhancements in Oracle Autonomous Database are reducing that effort. Features such as automatic indexes, real-time optimizer statistics, and automation of parallelization are helping tackle SQL execution performance concerns automatically—without human intervention. SQL execution performance is not a fully solved problem by any means, but the autonomous database itself is a far more proactive participant than prior databases in supporting successful applications.
However, because the compilation phase of SQL processing is typically invisible to the application (unless the SQL statement is invalid and crashes the application), it is often overlooked as a critical component of SQL performance. The wealth of literature about SQL execution performance and the advancements in the autonomous database all come into play only once a SQL statement has been compiled.
Expanding on the three phases of SQL processing: To process a SQL statement, Oracle Database conducts a more complete sequence of steps.
Syntax check. A syntax check verifies that the SQL to be processed could conceivably be valid in any Oracle Database instance.¹ For example, this SQL statement

SQL> select * from emp were empno = 10;
select * from emp were empno = 10
                  *
ERROR at line 1:
ORA-00933: SQL command not properly ended

cannot be executed in any SQL-based database, due to the misspelling of the WHERE keyword. No set of circumstances could ever make this SQL statement runnable in any database. However, the converse is not true: The syntactical correctness of a SQL statement does not guarantee that it can be run in a particular Oracle Database instance.

¹ Other activities, such as query transformation, also occur, and it is worth noting that they add even more to the cost of compiling a SQL statement. But they are outside the scope of the SQL processing discussed in this article.
Semantic check. The SQL statement
select * from emp where empnoo = 10
is syntactically correct, which means there is potentially a database within which this statement could successfully run. The semantic check establishes whether the statement is valid on this database—that is, the one in which it is being processed. In this case, the check fails:
SQL> select * from emp where empnoo = 10;
select * from emp where empnoo = 10
                        *
ERROR at line 1:
ORA-00904: "EMPNOO": invalid identifier
The misspelling of the EMPNO column name means that the SQL statement does not pass semantic checking. Even if a SQL statement does contain the correct column and table names, and any other object references are also valid entries in the data dictionary, it may still fail semantic checking, because security privileges must also be taken into consideration. The failure of the query in the example
SQL> select * from emp;
select * from emp
       *
ERROR at line 1:
ORA-00942: table or view does not exist
is not necessarily an indication that the EMP table does not exist. It might mean instead that the user wanting to execute this SQL statement does not have the necessary security privileges to see the data in the EMP table.
Optimization. If a SQL statement is both syntactically and semantically valid, it can indeed be executed in this database. But the database must then determine the best means of execution, where defining “best” means deriving the mechanism via which execution will complete in the shortest response time for the user or the application issuing the statement. This is perhaps the true “secret sauce” of the entire compilation phase, where the myriad potential execution permutations must be assessed and their costs evaluated by the optimizer to determine the shortest execution time, using statistics about the data contained in the data dictionary. Countless pages and books have been authored about the power and intricacies of the optimizer in Oracle Database.
Row source generation. Once the best execution option—the optimizer “plan”—has been determined, this plan is converted into a more tangible set of operations for the database to perform. For example, an optimizer plan may make use of an index, but the row source generation takes that plan and produces a set of instructions related to that index: “Walk down the index blocks, use the key and rowid from the index structure to probe the table blocks, etc., etc.” The row source builds the true set of instructions that will ultimately govern the execution of the SQL statement.
Execution. Finally, the SQL statement is ready to do the work it was requested to do. The SQL engine takes the row source information as input and commences to read the desired information from the database and/or write to it.
Process results. If the SQL statement is a SELECT or contains a RETURNING clause, the calling application will most probably consume the output from that query.
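The sequence of steps above can be sketched as a toy pipeline. This is purely illustrative: the class name, the regex, and the one-list "data dictionary" are my own inventions, and the real engine is vastly more elaborate, but it shows how a statement can fail at the syntax stage, fail at the semantic stage, or proceed to an optimizer plan.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A toy walk-through of the compilation steps described above. The "data
// dictionary" is just a list of known table names used for the semantic check;
// column-level checks, privileges, and real optimization are omitted.
public class ParsePipeline {
    private static final Pattern SELECT_SHAPE =
        Pattern.compile("select\\s+.+\\s+from\\s+(\\w+)(\\s+where\\s+.+)?");

    public static String compile(String sql, List<String> knownTables) {
        Matcher m = SELECT_SHAPE.matcher(sql.trim().toLowerCase());
        if (!m.matches()) {
            return "ORA-00933: SQL command not properly ended";  // syntax check
        }
        if (!knownTables.contains(m.group(1))) {
            return "ORA-00942: table or view does not exist";    // semantic check
        }
        return "PLAN: full scan of " + m.group(1);               // "optimization"
    }
}
```

In this sketch, `compile("select * from emp were empno = 10", List.of("emp"))` fails the syntax check, while the same statement with WHERE spelled correctly passes both checks and yields a plan.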
It is apparent from the above sequence of steps that a lot of compilation work occurs in the database engine before the SQL statement can be executed by the database. Compiling a SQL statement is typically referred to as parsing a SQL statement, and any work the database is doing that is not directly focused on the execution phase of a SQL statement is—from the application’s or user’s perspective—a waste of database resources. It might even be the dominant component of resource consumption, as you’ll see shortly, so the challenge for the database is how to reduce the cost of parsing.
If an application needs to execute 10,000 SQL statements during normal operation, a reasonable assumption is that those 10,000 SQL statements must complete the aforementioned steps before they can be executed. It seems that parsing is just an unavoidable cost of running SQL statements in a relational database system. For some other relational database products, this is true, but two enhancements exist in Oracle Database that provide opportunities to reduce the amount of parsing that is performed without reducing the associated number of executions of SQL statements.
The library cache. The first opportunity is based on the observation that an application might run the same SQL statements multiple times. Dozens of users might be using the same function, such as “Show all sales for today.” If a SQL statement has been parsed once and proven to be valid, then as long as the underlying database structures that the SQL statement references and the security privileges of the database schema executing the SQL statement have not been altered, subsequent executions are implicitly valid. Oracle Database includes a memory store called the library cache, and each time a SQL statement is parsed, the database performs an initial check on the cache to see if that SQL statement has already been parsed. If the statement is new, a complete parse—also known as a “hard parse”—must be performed, but if the statement is already present in the library cache, the expensive activities of a hard parse can be avoided and the information in the library cache from the previous hard parse can be reused.
Figure 1 shows a representation of how the library cache might support the reuse of the information from a previous hard parse of the same SQL statement. There are 11 SQL statement executions, but two parses are avoided due to repeated executions of the same statement.
Figure 1: Statements processed top to bottom, with two parses avoided
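The library cache behavior can be modeled as a simple map keyed by the SQL text. This is an illustration only, not Oracle's actual implementation: a textually identical statement finds the previous parse result and skips the expensive hard parse.

```java
import java.util.HashMap;
import java.util.Map;

// A toy model of the library cache: parsed statements are keyed by their
// exact SQL text, so a repeated statement reuses the earlier hard parse.
public class LibraryCacheDemo {
    private final Map<String, String> cache = new HashMap<>();
    private int hardParses = 0;

    // Returns a stand-in "plan" for the statement, hard-parsing only on a miss.
    public String parse(String sql) {
        String plan = cache.get(sql);
        if (plan == null) {
            hardParses++;                 // cache miss: full (hard) parse
            plan = "PLAN(" + sql + ")";   // stand-in for real optimizer output
            cache.put(sql, plan);
        }
        return plan;                      // cache hit: soft parse, plan reused
    }

    public int hardParseCount() { return hardParses; }
}
```

Running the 11 statements of Figure 1 through such a model would record only 9 hard parses, exactly because two of the statements are textual repeats.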
Savvy readers will be quick to point out that, given the variety of SQL statements present within an application and the need of individual application users to query data that is relevant to them, the probability of the same SQL statement’s being already present in the library cache seems extremely low, and this is true. However, recall what the task of parsing is: to ensure that the SQL is a valid statement for the database it is being executed on and to determine a suitable optimizer plan to execute the statement as efficiently as possible. From that perspective, two SQL statements that are textually different could be considered “nearly” the same. For example, consider these two SQL statements:
select surname, firstname from emp where empno = 123
select surname, firstname from emp where empno = 456
Assuming that these statements are both parsed from the same database schema and are thus both referring to the same definition of the table named
EMP, the hard parse effort for the first statement is sufficient to indicate that the second statement must also be valid. Also, assuming that the
EMPNO column is the primary key for the
EMP table, the likely optimizer plan for the first statement is a unique index primary key lookup for the provided
EMPNO value, and this too is sufficient to indicate that the second statement will have the same optimizer plan. These two SQL statements can be considered equivalent if the difference in the literal values is ignored.
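That equivalence can be demonstrated mechanically. The crude regex below is my own and is nothing like Oracle's real literal-replacement logic (which the CURSOR_SHARING=FORCE setting provides), but it shows the principle: replacing each numeric literal with a placeholder makes the two statements textually identical.

```java
import java.util.regex.Pattern;

// Illustrative literal normalizer: once numeric literals are replaced with a
// placeholder, statements that differ only in those literals become the same
// text and could share a single parsed representation.
public class LiteralNormalizer {
    private static final Pattern NUMERIC_LITERAL = Pattern.compile("\\b\\d+\\b");

    public static String normalize(String sql) {
        return NUMERIC_LITERAL.matcher(sql).replaceAll(":b");
    }
}
```

Both statements above normalize to `select surname, firstname from emp where empno = :b`.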
Bind variables. The second enhancement that facilitates the reduction of hard parsing is the concept of binding. If two SQL statements differ only by the values within a literal, Oracle Database enables parsing to be performed with a placeholder that is representative of any potential value that might be provided. The above query of the
EMP table can be tackled now as
select surname, firstname from emp where empno = <unknown value>
Then the information gleaned from the single hard parse of this new statement can be used for both executions of the statement shown earlier, where
<unknown value> is 123 and 456, respectively. The terminology used for the unknown value is typically “bind variable” or “placeholder.” The syntax for specifying a bind variable differs from language to language, but the usage pattern is the same. The parse phase is separated from the execution phase to enable differing values to be used for multiple executions.
Here’s a Java example:
PreparedStatement pstmt = con.prepareStatement("select pk from parse_demo where pk = ?");
Here’s a PL/SQL example:
dbms_sql.parse(c, 'select pk from parse_demo where pk = :bindvar', dbms_sql.native);
Returning to the set of SQL statements in Figure 1, Figure 2 shows how the use of bind variables demonstrates a greater likelihood of SQL statement reuse, which will yield an equivalent reduction in the amount of parsing to be performed.
Figure 2: Statements with bind variables, now avoiding five parses instead of two
To benchmark the cost of parsing and hence observe the potential performance benefits of avoiding parse overheads, I prepared two very simple Java routines. Figure 3 shows a routine that uses the string concatenation operator (+) to construct a SQL statement and therefore will present 10,000 unique SQL statements to the database. The routine in Figure 4 performs the same 10,000 executions, but it separates the work into two parts: a prepareStatement call to parse a single SQL statement with a bind variable for the predicate on the PK column, and then a pair of setInt and executeQuery calls to execute the SQL statement for 10,000 different values of that bind variable.
Figure 3: 10,000 SQL statements differing only by a literal value
Figure 4: 10,000 SQL statements now treated as the same statement, utilizing a bind variable
I ran the Figure 3 and Figure 4 routines on a virtual machine on Oracle Cloud Infrastructure’s Always Free Autonomous Database, with the following results:
[opc@cmcdonald ~]$ java ParseLiterals
Running 10,000 iterations...
Elapsed: 10932ms

[opc@cmcdonald ~]$ java ParseBinds
Running 10,000 iterations...
Elapsed: 909ms
The 12x performance improvement with bind variables is very impressive, but it also reveals an important detail about the cost of parsing. The second benchmark implies that the true execution cost—that is, the work to actually retrieve the requested row of data—is 909 ms, or slightly less, given the small overheads in the Java language execution itself. This means that in the first benchmark, 10 of the 11 seconds was entirely parsing effort—not an uncommon result. For simple SQL statements such as a primary key lookup of data, the parsing cost can often dramatically outweigh the cost of doing the actual work required of the SQL statement!
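The arithmetic behind that claim can be made explicit (a sketch; the class and method names are mine): the bind-variable run approximates pure execution cost, so the difference between the two runs, spread over 10,000 statements, estimates the per-statement parse cost.

```java
// Back-of-the-envelope split of the benchmark timings quoted above.
public class ParseCostEstimate {
    // Difference between the literal-only run and the bind run, amortized
    // over the number of statements, gives the parse cost per statement.
    public static double perStatementParseMs(long literalRunMs, long bindRunMs, int statements) {
        return (literalRunMs - bindRunMs) / (double) statements;
    }
}
```

Plugging in the measured values, `perStatementParseMs(10932, 909, 10000)` is roughly 1 ms of parse effort per statement, stacked on top of well under 0.1 ms of actual execution.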
A common rebuttal I encounter to the benefits of using bind variables is that parsing looks fast enough not to matter, and that rebuttal is actually a compliment to the incredible performance of Oracle Database. The first benchmark, although it takes 12 times as long as the second one, is still producing nearly 1,000 SQL statement executions per second. Even with the syntax checking, the semantic checking, and the evaluating of the full set of optimizer plans, the database engine can complete 1,000 parses per second. Developers often tell me that this is more than enough for their requirements. Many applications will never need 1,000 SQL statement executions per minute, let alone per second.
But applications do not perform primary key lookups only. The power of the relational database system is that datasources can be joined, aggregated, and filtered, all with the simplicity of a single SQL statement. Whether an application is submitting handcrafted or automatically generated SQL statements to the database, the parsing costs increase as the complexity of those SQL statements increases. The impact of parsing in such situations can be much larger. Figure 5 shows some Java code in which the SQL statement being parsed is a join between two data dictionary views, each of which, in turn, contains references to more views. Concatenation of the literal value means that each execution in the loop will be a unique SQL statement that needs to be fully parsed.
Figure 5: Parsing for more-realistic application SQL requirements
For the Java code and concatenation in Figure 5, the argument to the program is the number of iterations of the SQL statement to perform.
[opc@cmcdonald ~]$ java ParseComplex 10
Running 10 iterations...
Elapsed: 3137ms
This program achieves only three executions per second, and this rate does not get better with repetition. The penalty scales linearly as the number of iterations is increased.
[opc@cmcdonald ~]$ java ParseComplex 20
Running 20 iterations...
Elapsed: 6219ms
Compare this with the bind variable alternative, where the separation of the code into prepareStatement and executeQuery phases ensures that repeated SQL executions are found in the library cache.
[opc@cmcdonald ~]$ java ParseComplexWithBind 20
Running 20 iterations...
Elapsed: 302ms
You have already seen the performance benefits of binding, but note how much better the solution now scales.
[opc@cmcdonald ~]$ java ParseComplexWithBind 200
Running 200 iterations...
Elapsed: 195ms
It might seem counterintuitive that the 200 iterations are faster than the 20 iterations, but the 200 iterations benefit from the SQL statement information already being present in the library cache from the previous benchmark of 20 iterations. And this solution continues to scale spectacularly well.
[opc@cmcdonald ~]$ java ParseComplexWithBind 2000
Running 2000 iterations...
Elapsed: 414ms

[opc@cmcdonald ~]$ java ParseComplexWithBind 20000
Running 20000 iterations...
Elapsed: 3014ms
When customers invite me to visit to talk about hardware capacity planning, the first thing I often ask them is to see performance statistics about their parsing. It is sometimes possible to entirely avoid a hardware upgrade just by ensuring that SQL statement processing is being done in a manner that takes advantage of the benefits of the library cache in Oracle Database.
Oracle Autonomous Database is indeed a Lamborghini in the database world. It offers incredible levels of performance without the traditional management overhead that used to be associated with a finely tuned Oracle Database instance. However, those benefits are still bound to developers’ having a solid understanding of how SQL is processed in Oracle Database. Parsing a SQL statement is a CPU-intensive activity, and failure to take advantage of the library cache and binding in high-frequency SQL execution applications can result in excessive resource consumption and poor response times.
Yes, the performance impact of SQL parsing is a critical factor in the success of applications running on Oracle Database, but in the next article, I’ll discuss an issue closely related to SQL parsing that might be much more important: Poor SQL parsing can be indicative of a security issue so severe that your entire enterprise could be at risk from hackers.
LEARN more about SQL processing.
DOWNLOAD Oracle Database 19c.
Illustration by Wes Rowell