When you start writing SQL, you’re likely to hear “SQL is a set-based language” and advice like “avoid row-by-row processing”.
But what do these mean?!
I struggled to understand these terms when learning SQL. I discovered their meaning early in my career - the hard way!
I was helping build a factory scheduling and planning system. We were onsite to install it at the latest plant, busy patching bugs and adding missing functionality.
During the rollout I was adding a feature to mark a production process as complete. This needed to update several rows.
Only a few months into the job I was still getting to grips with SQL. I didn’t understand how the database would know which rows to change.
So I wrote a process to loop through the data, using the table’s primary key to change one row on each iteration:
for ... loop update some_table set col = ... where primary_key = loop_var; end loop;
This is what row-by-row processing means. Running SQL statements many times, reading or writing one row each time. If you want sloooooow database code, this is the best way to do it!
But in this case the problem was even worse. Somehow I’d managed to create an infinite loop.
To keep the factory running I needed to get the changes live. So with minimal testing I deployed the infinite loop to production.
Shortly after the CPUs on the database server maxed out. The loop had started and brought the system to its knees.
I realized my mistake fast.
I hastily added an exit condition so the loop ended after changing the data. We killed the runaway process, pushed the new code to production, and all was back to normal.
While this solved the immediate problem, it took me a while to realize there’s a much better way to code this process.
Remove the loop!
An update
changes all the rows it selects. Instead of a loop, I should have added a where
clause to find all the data to change:
update some_table set col = ... where rows_to_change = 'Y';
Calling this runs the update
once. The database changes all the rows identified by the where
clause in one go. The optimizer figures out the fastest way to do this.
This is what “SQL is a set-based language” refers to. Every statement should get or change all the rows in one execution. You tell the database what to do, it figures out how to do it. This is key for efficient SQL.
So remember; if you want slow database code:
To write fast SQL, get or change all the rows in one go. Here are some pointers on how to do this for different types of statements:
where
clause for queries should find all the rows you need to fetchinsert into … select …
Switching to set-based thinking takes practice. To help you with this take our free SQL quizzes on Oracle Dev Gym.
You'll also need to rethink some tasks that are obvious with procedural coding.
For example, how do you continue processing if one row throws an error? How do you get the values of updated or deleted rows? What if you need to mix procedural processes like sending email with set-based SQL?
Oracle Database has tools to help you address these objectives:
Using these allows you to keep the performance advantages of set processing while keeping the error handling benefits of procedural processing.
But most importantly of all:
Ensure you test code before pushing it to production!
Want to learn SQL? Take our free SQL beginner’s course Databases for Developers: Foundations.
Chris Saxon is an Oracle Developer Advocate for SQL. His job is to help you get the best out of the Oracle Database and have fun with SQL!
To help you with this he blogs at All Things SQL. He also creates videos combining SQL and magic on YouTube at the The Magic of SQL.
If you have questions about working with Oracle Database technology, please reach out to him. You can do this via Twitter or on Ask Tom.
Previous Post