Built-in sorting optimizations to support analytical SQL
By KLaker on Mar 24, 2014
One of the proof points that I often make for using analytical SQL over more sophisticated SQL-based methods is that we have included specific optimizations within the database engine to support our analytical functions. In this blog post I am going to briefly talk about how the database optimizes the number of sorts that occur when using analytical SQL.
Sort Optimization 1: Ordering Groups
Many of analytical functions include PARTITION BY and/or an ORDER BY clause both of which by definition implies that an ordering process is going to be required. As each function can have its own PARTITION BY-ORDER BY clause this can create situations where lot of different sorts are needed. For example, if we have a SQL statement that included the following:
Rank() Over (Partition by (x) Order by (w)) Sum(a) Over (Partition by (w,x) Order by (z))Ntile() Over (Partition by (x) Order by (y)) Sum(b) Over (Partition by (x,y) Order by (z))
this could involve four different sort processes to take into account the use of both PARTITION BY and ORDER BY clauses across the four functions. Performing four separate sort processes on a data set could add a tremendous overhead (depending on the size of the data set). Therefore, we have taken two specific steps to optimize the sorting process.
The first step is create the notion of "Ordering Groups". This optimizations looks for ways to group together sets of analytic functions which can be evaluated with a single sort. The objective is to construct a minimal set of ordering groups which in turn minimizes the number of sorts. In the example above we would create two ordering groups as follows:
This allows us to reduce the original list of sorts down from 4 to just 2.
Sort Optimization 2: Eliminating Sorts
We can further reduce the number sorts that need to be performed by carefully scheduling the execution so that:
- Ordering groups with sorts corresponding to that in the GROUP BY execute first (immediately after the GROUP BY)
- Ordering groups with sorts corresponding to that in the ORDER BY execute last (immediately before the ORDER BY)
In addition, we can also eliminate sorts when an index or join method (sort-merge) makes sorting unnecessary.
Optimization 3 : RANK Predicates
Where a SQL statement includes RANK() functions there are additional optimizations that kick-in. Instead of sorting all the data, adding the RANK and then applying the predicate, the RANK predicate is evaluated as part of the sort process. The net result is that fewer records are actually sorted, resulting in more efficient execution.
Overall, these three optimizations ensure that as few sorts as possible are performed when you include SQL analytical functions as part of your SQL statements.