Database, SQL and PL/SQL

Assume the Best; Plan for the Worst

Here’s a technique for delivering better performance through optimistic programming.

By Connor McDonald

January/February 2016

Beverly works for an IT consultancy that is building an application for a local shopping outlet. It is a fairly straightforward point-of-sale application that records sales transactions as customers move through the checkout aisle. The specification provided to Beverly is equally straightforward:

  • Before each transaction is recorded:
    • Check the shopping outlet ID to make sure it is valid
    • Check the product ID for the item being purchased to make sure it is valid and currently marked as active
    • Check the checkout aisle number to make sure it is valid
    • Make sure the sale amount is less than an alerting threshold set for each outlet
  • Log the transaction.
  • Update the outlet summary table, which shows sales per checkout per day.

Figure 1 shows the tables on which Beverly will be building her application.

Figure 1. Beverly’s data model
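Because the diagram itself is not reproduced in this text version, here is an approximate reconstruction of the data model, inferred from the column references in the listings that follow. This is not the article's actual DDL; column types and the SHOP_OUTLET_STATS key are assumptions. The Python wrapper merely executes the DDL against an in-memory SQLite database as a sanity check that it is well formed.

```python
import sqlite3

# Inferred schema (assumption, reconstructed from the code listings).
ddl = """
create table SHOP_OUTLET (
  SHOP_OUTLET_ID       number primary key,
  CHECKOUT_NUM_MIN     number,
  CHECKOUT_NUM_MAX     number,
  SALE_ALERT_THRESHOLD number
);
create table PRODUCT (
  PRODUCT_ID  number primary key,
  ACTIVE_IND  varchar(1)
);
create table TRANSACTIONS (
  TXN_TIMESTAMP   timestamp,
  SHOP_OUTLET_ID  number,
  PRODUCT_ID      number,
  CHECKOUT_NUM    number,
  SALES_AMOUNT    number
);
create table SHOP_OUTLET_STATS (
  SHOP_OUTLET_ID  number,
  PRODUCT_ID      number,
  CHECKOUT_NUM    number,
  OPERATING_DATE  date,
  SALES_TOTAL     number,
  primary key (SHOP_OUTLET_ID, PRODUCT_ID, CHECKOUT_NUM, OPERATING_DATE)
);
"""

# Run the DDL in memory purely to confirm it parses and creates all tables.
conn = sqlite3.connect(":memory:")
conn.executescript(ddl)
tables = {r[0] for r in conn.execute(
    "select name from sqlite_master where type = 'table'")}
```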

Translation of the application specification into PL/SQL code is a simple affair, and Beverly produces a first cut at a PL/SQL procedure to perform all the necessary work, as shown in Listing 1.

Code Listing 1: Beverly’s first pass at the LOG_TRANSACTION procedure

create or replace
procedure LOG_TRANSACTION(
  p_shop_outlet_id  transactions.shop_outlet_id%type,
  p_product_id      transactions.product_id%type,
  p_checkout_num    transactions.checkout_num%type,
  p_sales_amount    transactions.sales_amount%type ) is
  l_validation_check  int;
  l_operating_date    shop_outlet_stats.operating_date%type;
begin
  -- check the shop outlet ID to make sure it is valid
  begin
    select 1
    into   l_validation_check
    from   SHOP_OUTLET
    where  SHOP_OUTLET_ID = p_shop_outlet_id;
  exception
    when no_data_found then
      raise_application_error(-20000,
        'Invalid shop outlet ID');
  end;

  -- check the product ID to make sure it is valid and active
  begin
    select 1
    into   l_validation_check
    from   PRODUCT
    where  PRODUCT_ID = p_product_id
    and    ACTIVE_IND = 'Y';
  exception
    when no_data_found then
      raise_application_error(-20000,
        'Invalid product ID or product is not currently active');
  end;

  -- check the aisle number
  begin
    select 1
    into   l_validation_check
    from   SHOP_OUTLET
    where  SHOP_OUTLET_ID = p_shop_outlet_id
    and    p_checkout_num between CHECKOUT_NUM_MIN and CHECKOUT_NUM_MAX;
  exception
    when no_data_found then
      raise_application_error(-20000,
        'Invalid checkout aisle number');
  end;

  -- check the sale threshold
  begin
    select 1
    into   l_validation_check
    from   SHOP_OUTLET
    where  SHOP_OUTLET_ID = p_shop_outlet_id
    and    SALE_ALERT_THRESHOLD >= p_sales_amount;
  exception
    when no_data_found then
      raise_application_error(-20000,
        'Sales threshold exceeded, manager assistance required');
  end;

  -- Then log the transaction, and remember the current date
  insert into TRANSACTIONS
    ( txn_timestamp, shop_outlet_id, product_id, checkout_num, sales_amount )
  values
    ( current_timestamp, p_shop_outlet_id, p_product_id,
      p_checkout_num, p_sales_amount )
  returning trunc(txn_timestamp) into l_operating_date;

  -- Update the outlet summary table showing sales per checkout per day.
  merge into SHOP_OUTLET_STATS stats
  using ( select p_shop_outlet_id  SHOP_OUTLET_ID,
                 p_checkout_num    CHECKOUT_NUM,
                 l_operating_date  OPERATING_DATE,
                 p_sales_amount    SALES_AMOUNT,
                 p_product_id      PRODUCT_ID
          from   dual ) new_txn
  on (  stats.SHOP_OUTLET_ID = new_txn.SHOP_OUTLET_ID
    and stats.CHECKOUT_NUM   = new_txn.CHECKOUT_NUM
    and stats.OPERATING_DATE = new_txn.OPERATING_DATE
    and stats.PRODUCT_ID     = new_txn.PRODUCT_ID )
  when matched then
    update set stats.sales_total = stats.sales_total + new_txn.sales_amount
  when not matched then
    insert ( stats.shop_outlet_id, stats.product_id, stats.checkout_num,
             stats.operating_date, stats.sales_total )
    values ( new_txn.shop_outlet_id, new_txn.product_id, new_txn.checkout_num,
             new_txn.operating_date, new_txn.sales_amount );
end;
/

The application will be used for only a few shopping outlets, and its performance will probably be perfectly acceptable in its current form, but Beverly takes a lot of pride in her work, so she looks for some optimizations. Immediately, she can see some redundant SQL calls to the SHOP_OUTLET table that she can consolidate. Having tested the functional correctness of version 1 of her LOG_TRANSACTION procedure, she optimizes the performance of the code in version 2, reducing the number of SQL calls in the procedure to only four, as shown in Listing 2.
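The consolidation Beverly applies is not Oracle-specific. As a minimal sketch of the same idea in Python against SQLite (hypothetical, simplified schema; `validate` and its table are illustrative names, not from the article): one round trip fetches the entire outlet row, and the remaining rules are then checked in memory instead of with further queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table shop_outlet (
        shop_outlet_id       integer primary key,
        checkout_num_min     integer,
        checkout_num_max     integer,
        sale_alert_threshold real )""")
conn.execute("insert into shop_outlet values (1, 1, 10, 500.0)")

def validate(outlet_id, checkout_num, sales_amount):
    # One round trip to the database fetches everything needed...
    row = conn.execute(
        "select checkout_num_min, checkout_num_max, sale_alert_threshold "
        "from shop_outlet where shop_outlet_id = ?", (outlet_id,)).fetchone()
    if row is None:
        raise ValueError("Invalid shop outlet ID")
    cmin, cmax, threshold = row
    # ...and the remaining checks run on data already in hand, with no SQL.
    if not (cmin <= checkout_num <= cmax):
        raise ValueError("Invalid checkout aisle number")
    if sales_amount > threshold:
        raise ValueError("Sales threshold exceeded, "
                         "manager assistance required")

validate(1, 3, 120.0)  # valid outlet, aisle, and amount: no exception
```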

Code Listing 2: Beverly’s second pass at the LOG_TRANSACTION procedure, with optimizations

create or replace
procedure LOG_TRANSACTION(
  p_shop_outlet_id  transactions.shop_outlet_id%type,
  p_product_id      transactions.product_id%type,
  p_checkout_num    transactions.checkout_num%type,
  p_sales_amount    transactions.sales_amount%type ) is
  l_shop_outlet     shop_outlet%rowtype;
  l_product_exists  int;
  l_operating_date  shop_outlet_stats.operating_date%type;
begin
  -- check the shop outlet ID to make sure it is valid, fetching the
  -- whole row so the remaining outlet checks need no further SQL
  begin
    select *
    into   l_shop_outlet
    from   SHOP_OUTLET
    where  SHOP_OUTLET_ID = p_shop_outlet_id;
  exception
    when no_data_found then
      raise_application_error(-20000,
        'Invalid shop outlet ID');
  end;

  if p_checkout_num not between l_shop_outlet.checkout_num_min
                            and l_shop_outlet.checkout_num_max
  then
    raise_application_error(-20000,
      'Invalid checkout aisle number');
  end if;

  if p_sales_amount > l_shop_outlet.sale_alert_threshold then
    raise_application_error(-20000,
      'Sales threshold exceeded, manager assistance required');
  end if;

  -- check the product ID to make sure it is valid and active
  begin
    select 1
    into   l_product_exists
    from   PRODUCT
    where  PRODUCT_ID = p_product_id
    and    ACTIVE_IND = 'Y';
  exception
    when no_data_found then
      raise_application_error(-20000,
        'Invalid product ID or product is not currently active');
  end;

  -- Then log the transaction
  insert into TRANSACTIONS
    ( txn_timestamp, shop_outlet_id, product_id, checkout_num, sales_amount )
  values
    ( current_timestamp, p_shop_outlet_id, p_product_id,
      p_checkout_num, p_sales_amount )
  returning trunc(txn_timestamp) into l_operating_date;

  -- Update the outlet summary table showing sales per checkout per day.
  merge into SHOP_OUTLET_STATS stats
  using ( select p_shop_outlet_id  SHOP_OUTLET_ID,
                 p_checkout_num    CHECKOUT_NUM,
                 l_operating_date  OPERATING_DATE,
                 p_sales_amount    SALES_AMOUNT,
                 p_product_id      PRODUCT_ID
          from   dual ) new_txn
  on (  stats.SHOP_OUTLET_ID = new_txn.SHOP_OUTLET_ID
    and stats.CHECKOUT_NUM   = new_txn.CHECKOUT_NUM
    and stats.OPERATING_DATE = new_txn.OPERATING_DATE
    and stats.PRODUCT_ID     = new_txn.PRODUCT_ID )
  when matched then
    update set stats.sales_total = stats.sales_total + new_txn.sales_amount
  when not matched then
    insert ( stats.shop_outlet_id, stats.product_id, stats.checkout_num,
             stats.operating_date, stats.sales_total )
    values ( new_txn.shop_outlet_id, new_txn.product_id, new_txn.checkout_num,
             new_txn.operating_date, new_txn.sales_amount );
end;
/

Beverly is very happy with the result. She has implemented the required logic with only four SQL statements, with the SELECT and MERGE statements using unique indexes where appropriate. This is the version that eventually is delivered into production, and the application is a great success.

In fact, it is so successful that a large retail organization has started to take an interest in the application for its own network of shopping outlets. And within a few months, that organization does indeed purchase the application Beverly wrote, and it will be used nationwide.

Beverly is pretty excited. The application she wrote is now going to be used on a much larger scale. It will be handling hundreds of shopping outlets across the country, each with dozens of checkout aisles, which will equate to thousands of transactions per second. And she is confident that she will now reap the rewards of her early discipline: minimizing the SQL statements issued and ensuring that each statement was efficient.


New Challenges at Scale

And then the bad news comes in. Even with Beverly’s careful attention to performance, when the application is scaled up to the extreme levels required by the large retailer, the database server is struggling to cope. There are insufficient resources to process the volume of transactions, and the retailer’s management has asked for a crisis meeting with Beverly’s manager to discuss an emergency purchase of additional hardware capacity. Beverly’s manager asks her to see if anything can be done to avoid this, because the hardware purchase would damage the relationship with the large retailer. Beverly’s manager tells her he has faith in her analysis skills and that he will back her judgment as to whether any further optimizations are possible. If additional optimizations are not possible, he will recommend the additional hardware.

Beverly has always appreciated the level of trust her manager places in her. Even in those rare situations when—like all other developers—she makes an error and delivers a bug, her manager has always supported her to facilitate a quick remediation.

And that gives Beverly an idea. She can do the same with her application. She will assume the best about the data. Why spend all that time validating all the data elements before she logs the transaction, when surely for the vast majority of the time, the data is going to be valid anyway? Of course, analogizing to the support her manager provides when her code has a bug, Beverly knows she will need to provide that “support” for the data when it is not valid. Assuming the validity of the data is one thing, but allowing invalid data to pollute the database is something entirely different!

With the mindset of “Assume the data is valid, but make sure invalid data is rejected,” Beverly builds version 3 of her LOG_TRANSACTION procedure, shown in Listing 3.

Code Listing 3: Beverly’s “assume data is valid” pass at the LOG_TRANSACTION procedure

create or replace
procedure LOG_TRANSACTION(
  p_shop_outlet_id  transactions.shop_outlet_id%type,
  p_product_id      transactions.product_id%type,
  p_checkout_num    transactions.checkout_num%type,
  p_sales_amount    transactions.sales_amount%type ) is
  l_shop_outlet     shop_outlet%rowtype;
  l_product_exists  int;
  l_operating_date  shop_outlet_stats.operating_date%type;
begin
  -- Log the transaction
  insert into TRANSACTIONS
    ( txn_timestamp, shop_outlet_id, product_id, checkout_num, sales_amount )
  values
    ( current_timestamp, p_shop_outlet_id, p_product_id,
      p_checkout_num, p_sales_amount )
  returning trunc(txn_timestamp) into l_operating_date;

  -- Update the outlet summary table showing sales per checkout per day.
  -- The joins mean this update can succeed only when all the data is valid.
  update
    ( select stats.sales_total
      from   SHOP_OUTLET_STATS stats,
             SHOP_OUTLET shop,
             PRODUCT prod
      where  stats.SHOP_OUTLET_ID = p_shop_outlet_id
      and    stats.CHECKOUT_NUM   = p_checkout_num
      and    stats.OPERATING_DATE = l_operating_date
      and    stats.SHOP_OUTLET_ID = shop.shop_outlet_id
      and    stats.CHECKOUT_NUM between shop.checkout_num_min
                                    and shop.checkout_num_max
      and    stats.PRODUCT_ID     = p_product_id
      and    stats.PRODUCT_ID     = prod.product_id
      and    shop.sale_alert_threshold >= p_sales_amount
      and    prod.active_ind = 'Y' ) t
  set t.sales_total = t.sales_total + p_sales_amount;

  if sql%notfound then
    -- check the shop outlet ID to make sure it is valid
    begin
      select *
      into   l_shop_outlet
      from   SHOP_OUTLET
      where  SHOP_OUTLET_ID = p_shop_outlet_id;
    exception
      when no_data_found then
        raise_application_error(-20000,
          'Invalid shop outlet ID');
    end;

    if p_checkout_num not between l_shop_outlet.checkout_num_min
                              and l_shop_outlet.checkout_num_max
    then
      raise_application_error(-20000,
        'Invalid checkout aisle number');
    end if;

    if p_sales_amount > l_shop_outlet.sale_alert_threshold then
      raise_application_error(-20000,
        'Sales threshold exceeded, manager assistance required');
    end if;

    -- check the product ID to make sure it is valid and active
    begin
      select 1
      into   l_product_exists
      from   PRODUCT
      where  PRODUCT_ID = p_product_id
      and    ACTIVE_IND = 'Y';
    exception
      when no_data_found then
        raise_application_error(-20000,
          'Invalid product ID or product is not currently active');
    end;

    -- all data valid: this is the first sale for this key today
    insert into SHOP_OUTLET_STATS
      ( shop_outlet_id, product_id, checkout_num, operating_date, sales_total )
    values
      ( p_shop_outlet_id, p_product_id, p_checkout_num,
        l_operating_date, p_sales_amount );
  end if;
end;
/

In this latest version of the procedure, Beverly inserts a transaction without any preliminary validation and then performs an update against an inline view to record the summary statistics. With this approach, if all the data is valid, only two SQL statements will be issued. But she also has crafted her update in such a way that it will enable her to detect problems with the data. The update statement may fail to find a row to update for one of the following reasons:


  1. One of the parameters passed (shop outlet, product, checkout aisle, sales amount) is invalid.
  2. The parameters are indeed valid, but in the case of the first occurrence of a sale for the current day, a row needs to be inserted rather than updated in the SHOP_OUTLET_STATS table.

Beverly can take care of #1 by using the same validation she used in the previous procedure version, with the important difference being that these checks will be run only when the data is genuinely invalid (which should be almost never) or for the first sale of the day. Case #2 needs no extra work: once the checks pass, the fallback code simply inserts the first summary row for that day. This is a dramatic reduction in the number of times the checks will need to be run.
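The optimistic pattern itself is not Oracle-specific. Here is a minimal sketch of the idea in Python against SQLite (hypothetical, simplified schema; the product check is omitted for brevity): attempt the summary update first, constrained so it can succeed only for valid data, and fall back to explicit validation plus an insert only when no row was updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table shop_outlet (
        shop_outlet_id integer primary key,
        checkout_num_min integer, checkout_num_max integer,
        sale_alert_threshold real);
    insert into shop_outlet values (1, 1, 10, 500.0);
    create table shop_outlet_stats (
        shop_outlet_id integer, checkout_num integer,
        operating_date text, sales_total real,
        primary key (shop_outlet_id, checkout_num, operating_date));
""")

def log_sale(outlet_id, checkout_num, day, amount):
    # Optimistic path: assume the data is valid and a summary row exists.
    # The correlated EXISTS means the update can only succeed when the
    # aisle number and sale amount are within the outlet's limits.
    cur = conn.execute("""
        update shop_outlet_stats
           set sales_total = sales_total + ?
         where shop_outlet_id = ? and checkout_num = ? and operating_date = ?
           and exists (select 1 from shop_outlet s
                        where s.shop_outlet_id = shop_outlet_stats.shop_outlet_id
                          and ? between s.checkout_num_min and s.checkout_num_max
                          and s.sale_alert_threshold >= ?)""",
        (amount, outlet_id, checkout_num, day, checkout_num, amount))
    if cur.rowcount == 0:
        # Fallback path: either the data is invalid, or this is the
        # first sale of the day. Validate explicitly, then insert.
        row = conn.execute(
            "select checkout_num_min, checkout_num_max, sale_alert_threshold "
            "from shop_outlet where shop_outlet_id = ?",
            (outlet_id,)).fetchone()
        if row is None:
            raise ValueError("Invalid shop outlet ID")
        if not (row[0] <= checkout_num <= row[1]):
            raise ValueError("Invalid checkout aisle number")
        if amount > row[2]:
            raise ValueError("Sales threshold exceeded")
        conn.execute("insert into shop_outlet_stats values (?, ?, ?, ?)",
                     (outlet_id, checkout_num, day, amount))

log_sale(1, 3, "2016-01-01", 100.0)  # first sale of the day: insert path
log_sale(1, 3, "2016-01-01", 50.0)   # subsequent sale: single-update path
```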

Once she completes her functional checks of the new version of the LOG_TRANSACTION procedure, Beverly tests the performance benefits she has achieved. She fires up a simple test using the first optimized procedure and the “assume data is valid” procedure, each time against an empty TRANSACTIONS table.

Here’s Beverly’s result for the first optimized procedure:

SQL> set timing on
SQL>
SQL> begin
  2    for i in 1 .. 100000 loop
  3      LOG_TRANSACTION(
  4        1,
  5        mod(i,2)+1,
  6        mod(i,3)+1,
  7        i/100);
  8    end loop;
  9  end;
 10  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:13.03

Here’s Beverly’s result for the “assume data is valid” procedure:

SQL> begin
  2    for i in 1 .. 100000 loop
  3      LOG_TRANSACTION(
  4        1,
  5        mod(i,2)+1,
  6        mod(i,3)+1,
  7        i/100);
  8    end loop;
  9  end;
 10  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:08.40

Beverly reports back to her manager that she has unearthed a roughly 35 percent savings in execution time in her code. Thanks to her work, the retailer’s additional hardware purchase is no longer required.
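A quick check of the arithmetic behind that figure, using the two elapsed times from the test runs:

```python
# 13.03 s for the validate-first version, 8.40 s for the optimistic one.
before, after = 13.03, 8.40
saving = (before - after) / before
print(f"{saving:.1%}")  # prints "35.5%"
```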


Conclusion

The natural way of thinking about populating a table with data is to always validate the data first and only then allow it to be logged. However, with most applications, you can reasonably expect that data presented to the database has been through at least some cursory checks. Although you should never relax your controls to the extent that invalid data is allowed to sneak through, you can take advantage of an optimistic approach to the quality of the data to get performance benefits for your applications.


