Subscribe

Share

DBA

Old Dog, New Tricks, Part 2

Here’s a new SQL syntax for hierarchy processing.

By Connor McDonald

September/October 2018

In my previous article, I described ways database professionals have historically been working with hierarchy structures inside Oracle Database. The CONNECT BY and START WITH clauses have been around since Oracle Database version 2, and using them is the most common way of coding hierarchical SQL processing. However, ANSI-standard SQL does not recognize CONNECT BY and START WITH—these are exclusively Oracle-specific functions. The ANSI standard employs an alternative mechanism for hierarchical data relationships, but before I tackle that, I’ll set the foundation by returning to some basic principles of navigating a hierarchy.

First Principles

To consider hierarchy processing from the standpoint of first principles, I’ll take a simple hierarchical query against the EMP table with the output left-padded to demonstrate the depth of employees in the hierarchy, as shown in Listing 1.

Listing 1: A simple hierarchical query

SQL> select rpad(' ',level)||ename, 
  2  from emp
  3  start with mgr is null
  4  connect by prior empno = mgr

ENAME                      
———————————————————————————
 KING
   BLAKE
     JAMES
     ALLEN
     WARD
   CLARK
     MILLER

In this query, I start with employees who have no manager (MGR is NULL) and then apply a CONNECT BY clause to traverse the hierarchy. But what if no explicit hierarchical syntax was possible? How would a hierarchy be traversed with standard SQL commands? To achieve this, I can break the query into its constituent parts.

First I need to extract the rows where the manager is not specified, as shown in Listing 2.

Listing 2: First level of the hierarchy

SQL> select empno, ename, mgr
  2  from   emp
  3  where  mgr is null;

     EMPNO ENAME                       MGR
—————————— ———————————————————— ——————————
      7839 KING

I have now discovered the first, or top, level of the hierarchy. Now I must find all employees whose manager is KING. I could construct a new query that explicitly references WHERE MGR = ‘ KING’, but in general what I need is the output from the query in Listing 2 to become an input into a query to derive the next level of the hierarchy. I can do that with a nonhierarchical query by using a standard join. In Listing 3, I nest the initial query locating those rows where the manager is null as an inline view into a wrapping query that joins the MGR column to the EMPNO column to get the employees in the next level in the hierarchy. This yields the three employees—JONES, BLAKE, and CLARK—who report to KING.

Listing 3: Second level of the hierarchy via a join

SQL> select e2.empno, e2.ename, e2.mgr
  2  from   emp e2,
  3         ( select empno, mgr
  4           from   emp
  5           where  mgr is null ) inner
  6  where  e2.mgr = inner.empno;

     EMPNO ENAME                       MGR
—————————— ———————————————————— ——————————
      7566 JONES                      7839
      7698 BLAKE                      7839
      7782 CLARK                      7839

Now the second level in the hierarchy has been established, and it returns a set of rows that is the input for deriving the third level of the hierarchy. I take the query from Listing 3, nest it again as an inline view into a wrapper query, and join MGR to the EMPNO column, as shown in Listing 4.

Listing 4: Third level of the hierarchy via a join

SQL> select e3.empno, e3.ename, e3.mgr
  2  from   emp e3,
  3         ( select e2.empno, e2.ename, e2.mgr
  4           from   emp e2,
  5                  ( select empno, mgr
  6                    from   emp
  7                    where  mgr is null ) inner
  8           where  e2.mgr = inner.empno ) inner
  9  where  e3.mgr = inner.empno;

     EMPNO ENAME                       MGR
—————————— ———————————————————— ——————————
      7902 FORD                       7566
      7788 SCOTT                      7566
      7900 JAMES                      7698
      7844 TURNER                     7698
      7654 MARTIN                     7698
      7521 WARD                       7698
      7934 MILLER                     7782

Finally, because there are four levels in the employee hierarchy, I can nest the query from Listing 4 to reveal the people who sit at the very bottom layer of the hierarchy, as shown in Listing 5.

Listing 5: Fourth (and last) level of the hierarchy via a join

SQL> select e4.empno, e4.ename, e4.mgr
  2  from   emp e4,
  3         (  select e3.empno, e3.ename, e3.mgr
  4            from   emp e3,
  5                  ( select e2.empno, e2.ename, e2.mgr
  6                    from   emp e2,
  7                           ( select empno, mgr
  8                             from   emp
  9                             where  mgr is null ) inner
 10                    where  e2.mgr = inner.empno ) inner
 11           where  e3.mgr = inner.empno ) inner
 12  where  e4.mgr = inner.empno;

     EMPNO ENAME                       MGR
—————————— ———————————————————— ——————————
      7876 ADAMS                      7788

Obviously, this approach is untenable in general. The number of levels in the hierarchy is never known in advance, so it is never known how many nested inline views are required to entirely traverse the hierarchy. However, this approach does reveal that processing a hierarchy via SQL is a recursive operation, in that a set of output data from one query becomes the input into a similar wrapping query. This leads naturally to the new form of hierarchy processing that is available from Oracle Database 11g Release 2 onward—the ANSI-standard SQL approach of using recursive common table expressions (CTEs).

Simpler SQL with WITH

Before I describe recursive CTEs, it is worth covering the simpler case of nonrecursive CTEs.

First a virtual table consisting of a query expression can be defined with the WITH clause, enabling that table to be referenced throughout the remainder of the query. Listing 6 shows an example of the WITH clause that refers to a virtual table named LAST_HIRE, which is the query expression for deriving the most recent hiring date per department in the EMP table.

Listing 6: Basic usage of the WITH clause

SQL> WITH last_hire AS
  2  ( select deptno, max(hiredate)
  3    from   emp
  4    group by deptno
  5  )
  6  select * from last_hire;

    DEPTNO MAX(HIRED
—————————— —————————
        30 03-DEC-81
        20 12-JAN-83
        10 23-JAN-82

At first glance, the WITH clause may seem like merely a waste of keystrokes, but its true benefit can be seen when the functional requirements of a query become more complex. Consider this business requirement given to a developer—

“Find the total salary paid by each department, get the average of these totals, and list those departments above that average.”

—which needs to be expressed with a SQL statement. Rather than tackling the entire task holistically, developers using the WITH clause can tackle the task in smaller, more manageable pieces, much as they would in their 3GL application code. I can start with the first part of the requirement—

“Find the total salary paid by each department”

—and construct an appropriate query for just that component, as shown in Listing 7.

Listing 7: Part 1, finding the total

SQL> SELECT dname, SUM(sal) dept_sal
  2  FROM   emp e, dept d
  3  WHERE  e.deptno = d.deptno
  4  GROUP  BY dname;

Using a WITH statement, I can now assume the existence of the Listing 7 query as if it were a table in the database. Given a virtual table containing that data (dept_salaries), I can then construct a query to handle the second part of the requirement—

“get the average of these totals”

—as shown in Listing 8.

Listing 8: Part 2, getting the average

SQL>  WITH dept_salaries AS (
  2        SELECT dname, SUM(sal) dept_sal
  3        FROM emp e, dept d
  4        WHERE e.deptno = d.deptno
  5        GROUP BY dname),
  6   SELECT AVG(dept_sal) avsal
  7   FROM  dept_salaries;

The final part of the requirement—

“list those departments above that average”

—implies that the Listing 8 result containing average departmental salaries needs to be referenced in a subsequent SQL statement, so it makes sense to also assign that query expression a virtual table name (avg_sal), as shown in Listing 9.

Listing 9: Creating another virtual table name

SQL>  WITH dept_salaries AS (
  2        SELECT dname, SUM(sal) dept_sal
  3        FROM emp e, dept d
  4        WHERE e.deptno = d.deptno
  5        GROUP BY dname),
  6     avg_sal AS ( SELECT AVG(dept_sal) avsal
  7                  FROM dept_salaries);

With the two query expressions aliased, I can complete the three-step task by joining the two virtual tables to find the appropriate departments, as shown in Listing 10.

Listing 10: Part 3, listing the departments

SQL>  WITH dept_salaries AS (
  2        SELECT dname, SUM(sal) dept_sal
  3        FROM emp e, dept d
  4        WHERE e.deptno = d.deptno
  5        GROUP BY dname),
  6     avg_sal AS ( SELECT AVG(dept_sal) avsal
  7                  FROM dept_salaries)
  8   SELECT * FROM dept_salaries d, avg_sal a
  9   WHERE d.dept_sal > a.avsal
 10   ORDER BY d.dname;

Building SQL in piecemeal fashion with CTEs is a great way of solving complicated query problems. It also assists in the maintenance of the SQL, because it reveals the thought process by which the original author of the SQL statement tackled the task.

Recursive CTEs

Oracle Database 11g Release 2 and later releases support an extension to CTEs. The WITH syntax can also be recursive—that is, a table you are defining with a query expression using the WITH clause can contain references to the table currently being defined. Earlier I described how navigating hierarchical data can be tackled as a series of self-referencing queries—recursive CTEs are a natural solution for this task. To demonstrate the structure of the new hierarchical syntax, I’ll return to the first principles I used for navigating the employer hierarchy via its component parts.

Listing 11 shows the definition of a table called EACH_LEVEL, the name indicating that it will navigate each level of the hierarchy. The definition is a UNION ALL of two queries. The first part of the UNION ALL is the entry point into the hierarchy, equivalent to START WITH in the conventional hierarchical SQL syntax style, retrieving all employees who have no manager. The second query following the UNION ALL is where the recursive nature of the WITH statement becomes apparent. I am querying the EMP table and then joining back to the EACH_LEVEL table, even though I am currently defining the EACH_LEVEL table. The recursive nature demonstrates a concept similar to when I was repeatedly creating and running inline queries. In this case, EACH_LEVEL is referring back to itself, so each time I encounter the recursive part of the definition, I navigate recursively through the structure, drilling deeper into the hierarchy.

Listing 11: The recursive CTE

SQL> with EACH_LEVEL (empno, name, mgr) as
  2  ( --
  3    -- start with
  4    --
  5    select empno, ename, mgr
  6    from   emp
  7    where  mgr is null
  8    --
  9    -- connect by
 10    --
 11    union all
 12    select emp.empno, emp.ename, emp.mgr
 13    from   emp, EACH_LEVEL
 14    where  emp.mgr = each_level.empno
 15  )
 16  select *
 17  from   each_level;

     EMPNO NAME                   MGR
—————————— ——————————————— ——————————
      7839 KING                      
      7566 JONES                 7839
      7698 BLAKE                 7839
      7782 CLARK                 7839
      7499 ALLEN                 7698
      7521 WARD                  7698
      7654 MARTIN                7698
      7788 SCOTT                 7566
      7844 TURNER                7698
      7900 JAMES                 7698
      7902 FORD                  7566
      7934 MILLER                7782
      7369 SMITH                 7902
      7876 ADAMS                 7788

The two sides of the UNION ALL are analogous to two clauses in the conventional hierarchical syntax, in that an entry point into the hierarchy is defined, followed by the definition of how to use CONNECT BY or recursively traverse the hierarchy. It is the developer who now takes control of the conventional pseudofunctions available in the conventional hierarchical syntax. For example, if I need to print out the depth of the hierarchy as I would typically do with the LEVEL pseudofunction, I can add a column to increment the level each time I recursively pass through EACH_LEVEL. Listing 12 shows this in action. The upper part of the UNION ALL must by definition be level = 1, because it is the entry point, and as the recursive statement under the UNION ALL is executed, the level increments by 1 each time I recursively navigate through the hierarchy.

Listing 12: Incrementing the level recursively

SQL> with each_level (empno, name, mgr, rlevel) as
  2  ( select empno, ename, mgr, 1 rlevel
  3    from   emp
  4    where  mgr is null
  5    union all
  6    select emp.empno, emp.ename, emp.mgr, rlevel+1
  7    from   emp, each_level
  8    where  emp.mgr = each_level.empno
  9  )
 10  select * from each_level;

     EMPNO NAME                   MGR     RLEVEL
—————————— ——————————————— —————————— ——————————
      7839 KING                                1
      7566 JONES                 7839          2
      7698 BLAKE                 7839          2
      7782 CLARK                 7839          2
      7499 ALLEN                 7698          3
      ...

Unfortunately, I cannot use the name LEVEL for my virtual table column, because it is reserved for its built-in predecessor. Similarly, to access the SYS_CONNECT_BY_PATH function to concatenate children in their hierarchy with their respective parent rows, I can build it myself by using the standard concatenation operators in the recursive definition. At the lowest level of the hierarchy, an employee name concatenation is just the employee name itself, but within the recursive definition under the UNION ALL, I take the existing concatenation of employee names so far and append the employee name I’m currently navigating. This builds the concatenated list of employee names just as the SYS_CONNECT_BY_PATH function does, as demonstrated in Listing 13.

Listing 13: Building the concatenated list of employee names

SQL> with each_level (empno, name) as
  2  ( select empno, ename from  emp
  3    where  mgr is null
  4    union all
  5    select e.empno,
  6           each_level.name||'-'||e.ename
  7    from   emp e, each_level
  8    where  e.mgr = each_level.empno
  9  )
 10  select empno, name from each_level;

     EMPNO NAME
—————————— ————————————————————————————————
      7839 KING
      7566 KING-JONES
      7698 KING-BLAKE
      7782 KING-CLARK
      7499 KING-BLAKE-ALLEN
      7521 KING-BLAKE-WARD
       ...

Extensions to the Syntax

In the same way that, by default, a conventional syntax hierarchical query will fail if a cycle is detected, a recursive WITH definition will also fail if a cycle is detected, albeit with a different error code. Listing 14 shows the recursive WITH definition attempt and error.

Listing 14: Recursive WITH definition detecting cycle and returning error message

SQL> with each_level (empno, name, mgr) as
  2  ( select empno, ename, mgr
  3    from   emp
  4    where  ename = 'KING'
  5    union all
  6    select emp.empno, emp.ename, emp.mgr
  7    from   emp, each_level
  8    where  emp.mgr = each_level.empno
  9  )
 10  select *
 11  from   each_level;

ERROR:
ORA-32044: cycle detected while executing recursive WITH query

You can detect cyclic relationships and avoid errors by extending the definition with the CYCLE clause, as shown in Listing 15. The behavior is slightly different from the way cycles are handled in the conventional hierarchical syntax, in that cyclic rows are still presented in the result set but a new column is added with a single character flag indicating whether this row is a member of a cyclic relationship.

Listing 15: Recursive WITH, cycle, CYCLE clause, and new column

SQL> with each_level (empno, name, mgr) as
  2  ( select empno, ename, mgr from emp
  3    where  ename = 'KING'
  4    union all
  5    select emp.empno, emp.ename, emp.mgr
  6    from   emp, each_level
  7    where  emp.mgr = each_level.empno )
  8  CYCLE mgr SET is_cycle TO 'Y' DEFAULT 'N'
  9  select * from each_level;

     EMPNO NAME              MGR IS_CYCLE
—————————— —————————— —————————— ——————————
      7839 KING             7499 N
      7566 JONES            7839 N
      7521 WARD             7698 N
      7839 KING             7499 Y
      7876 ADAMS            7788 N
      ...

The recursive CTE syntax allows for some additional facilities in terms of how the hierarchy should be traversed. You can navigate any hierarchy either by taking “slices” in a top-down approach or by simply working down along a branch until all the leaves are exhausted and then working back up through the branches in the same way someone might navigate through a maze. Each mechanism is available to developers using the SEARCH clause in the recursive CTE definition. I can search breadth-first or depth-first, as shown in Listing 16, with the ordering of the results achieved by the SET clause, which nominates a new column that yields a sequence number defining the order in which the rows should be presented.

Listing 16: Recursive CTE, SEARCH, and SET clause

SQL> with each_level (empno, name, hiredate, mgr) as
  2  ( select empno, ename, hiredate, mgr from emp
  3    where  ename = 'KING'
  4    union all
  5    select e.empno,
  6      each_level.name||'-'||e.ename, e.hiredate, e.mgr
  7    from   emp e, each_level
  8    where  e.mgr = each_level.empno )
  9  SEARCH BREADTH FIRST BY HIREDATE SET IDX
 10  select name, hiredate, idx  from each_level;

NAME                             HIREDATE         IDX
———————————————————————————————— ————————— ——————————
KING                             17-NOV-81          1
KING-JONES                       02-APR-81          2
KING-BLAKE                       01-MAY-81          3
KING-CLARK                       09-JUN-81          4
KING-BLAKE-ALLEN                 20-FEB-81          5
KING-BLAKE-WARD                  22-FEB-81          6
...
KING-JONES-FORD-SMITH            17-DEC-80         13
KING-JONES-SCOTT-ADAMS           23-MAY-87         14

SQL> with each_level (empno, name, hiredate, mgr) as
  2  ( select empno, ename, hiredate, mgr from emp
  3    where  ename = 'KING'
  4    union all
  5    select e.empno,
  6      each_level.name||'-'||e.ename, e.hiredate, e.mgr
  7    from   emp e, each_level
  8    where  e.mgr = each_level.empno )
  9  SEARCH DEPTH FIRST BY HIREDATE SET IDX
 10  select name, hiredate, idx  from each_level;

NAME                             HIREDATE         IDX
———————————————————————————————— ————————— ——————————
KING                             17-NOV-81          1
KING-JONES                       02-APR-81          2
KING-JONES-FORD                  03-DEC-81          3
KING-JONES-FORD-SMITH            17-DEC-80          4
KING-JONES-SCOTT                 19-APR-87          5
KING-JONES-SCOTT-ADAMS           23-MAY-87          6
KING-BLAKE                       01-MAY-81          7
KING-BLAKE-ALLEN                 20-FEB-81          8
...

Alternative Use Cases

The ability to run operations recursively via a single SQL statement gives rise to the ability to do recursive problem solving even when the data is not necessarily hierarchical in nature. If I need to iterate through some data an unknown number of times, recursion is a very effective tool for achieving this. Here is an example of using recursive CTEs when it does not immediately appear to be a hierarchical problem. I have a table called MESSAGES that includes some sentences containing names of my fellow developer advocates and myself. I would like to replace those names with the other people’s respective Twitter handles. To do that, I have a lookup table called TWITTER_HANDLES with the terms I will replace and the appropriate Twitter handles to replace them, as shown in Listing 17.

Listing 17: Looking up Twitter handles

SQL> select * from messages;

TXT
————————————————————————————————————————————————————————————————————
I caught up with Connor and Maria Colgan today. They have taken over 
AskTOM for Oracle Developers

SQL> select * from twitter_handles;

  ID  TERM                        HANDLE
————  ——————————————————————————  ———————————————
   1  Connor McDonald             @connor_mc_d
   2  Connor                      @connor_mc_d
   3  Maria Colgan                @sqlmaria
   4  Oracle Developers           @otndev
   5  Oracle                      @oracle
   6  AskTOM                      @oracleasktom

This means that for each of the rows in TWITTER_HANDLES, I need to continually iterate through all the rows of data in my MESSAGES table, each time replacing a found term with its appropriate Twitter handle until all the terms have been exhausted, as shown in Listing 18.

Listing 18: Selecting tweets and replacing names with Twitter handles

SQL> with
  2    tweetised(ind,tweet_txt)  as
  3  (
  4    select 1 ind, txt tweet_txt
  5    from   messages
  6    union all
  7    select ind+1, replace(tweet_txt,term,handle)
  8    from   tweetised, twitter_handles
  9    where  ind = id
 10  )
 11  select * from tweetised;

      IND  TWEET_TXT
—————————  —————————————————————————————————————————————————————————
        1  I caught up with Connor and Maria Colgan today. They have 
           taken over AskTOM for...

        2  I caught up with Connor and Maria Colgan today. They have
           taken over AskTOM for...

        3  I caught up with @connor_mc_d and Maria Colgan today. 
           They have taken over AskTOM for...

        4  I caught up with @connor_mc_d and @sqlmaria today. They 
           have taken over AskTOM for ...

        5  I caught up with @connor_mc_d and @sqlmaria today. They 
           have taken over AskTOM for @otndev

        6  I caught up with @connor_mc_d and @sqlmaria today. They 
           have taken over AskTOM for @otndev

        7  I caught up with @connor_mc_d and @sqlmaria today. They 
           have taken over @oracleasktom ...

I am generating multiple rows of output for each single row of data in my messages table, with each new row containing an additional term replaced with its associated Twitter handle. Hence, simply by picking up the very last row, using a FETCH FIRST clause, I will display the result I need. I recursively run a REPLACE operation, using the recursive nature of the WITH statement, as shown in Listing 19.

Listing 19: Picking the last row with FETCH FIRST

SQL> with
  2  tweetised(ind,tweet_txt)  as
  3  (
  4    select 1 ind, txt tweet_txt
  5    from   messages
  6    union all
  7    select ind+1, replace(tweet_txt,term,handle)
  8    from   tweetised, twitter_handles
  9    where  ind = id
 10  )
 11  select * from tweetised
 12  order by ind desc
 13  fetch first 1 row only;

       IND  TWEET_TXT
——————————  ——————————————————————————————————————————————————————————
         7  I caught up with @connor_mc_d and @sqlmaria today. They...

Summary

Oracle Database continues to support the conventional START WITH and CONNECT BY syntax, but don’t be afraid to expand your SQL toolkit by utilizing recursive common table expressions to achieve the same results. As shown in this article, you might also find uses for these expressions beyond standard hierarchical relationships.

Next Steps

LEARN more about hierarchical SQL processing.

DOWNLOAD Oracle Database 12c.

TRY Oracle Database cloud services.

Illustration by Wes Rowell