By Connor McDonald
In my previous article, I described ways database professionals have historically been working with hierarchy structures inside Oracle Database. The CONNECT BY and START WITH clauses have been around since Oracle Database version 2, and using them is the most common way of coding hierarchical SQL processing. However, ANSI-standard SQL does not recognize CONNECT BY and START WITH—these are exclusively Oracle-specific functions. The ANSI standard employs an alternative mechanism for hierarchical data relationships, but before I tackle that, I’ll set the foundation by returning to some basic principles of navigating a hierarchy.
To consider hierarchy processing from the standpoint of first principles, I’ll take a simple hierarchical query against the EMP table with the output left-padded to demonstrate the depth of employees in the hierarchy, as shown in Listing 1.
Listing 1: A simple hierarchical query
SQL> select rpad(' ',level)||ename, 2 from emp 3 start with mgr is null 4 connect by prior empno = mgr ENAME ——————————————————————————— KING BLAKE JAMES ALLEN WARD CLARK MILLER
In this query, I start with employees who have no manager (MGR is NULL) and then apply a CONNECT BY clause to traverse the hierarchy. But what if no explicit hierarchical syntax was possible? How would a hierarchy be traversed with standard SQL commands? To achieve this, I can break the query into its constituent parts.
First I need to extract the rows where the manager is not specified, as shown in Listing 2.
Listing 2: First level of the hierarchy
SQL> select empno, ename, mgr 2 from emp 3 where mgr is null; EMPNO ENAME MGR —————————— ———————————————————— —————————— 7839 KING
I have now discovered the first, or top, level of the hierarchy. Now I must find all employees whose manager is KING. I could construct a new query that explicitly references WHERE MGR = ‘ KING’, but in general what I need is the output from the query in Listing 2 to become an input into a query to derive the next level of the hierarchy. I can do that with a nonhierarchical query by using a standard join. In Listing 3, I nest the initial query locating those rows where the manager is null as an inline view into a wrapping query that joins the MGR column to the EMPNO column to get the employees in the next level in the hierarchy. This yields the three employees—JONES, BLAKE, and CLARK—who report to KING.
Listing 3: Second level of the hierarchy via a join
SQL> select e2.empno, e2.ename, e2.mgr 2 from emp e2, 3 ( select empno, mgr 4 from emp 5 where mgr is null ) inner 6 where e2.mgr = inner.empno; EMPNO ENAME MGR —————————— ———————————————————— —————————— 7566 JONES 7839 7698 BLAKE 7839 7782 CLARK 7839
Now the second level in the hierarchy has been established, and it returns a set of rows that is the input for deriving the third level of the hierarchy. I take the query from Listing 3, nest it again as an inline view into a wrapper query, and join MGR to the EMPNO column, as shown in Listing 4.
Listing 4: Third level of the hierarchy via a join
SQL> select e3.empno, e3.ename, e3.mgr 2 from emp e3, 3 ( select e2.empno, e2.ename, e2.mgr 4 from emp e2, 5 ( select empno, mgr 6 from emp 7 where mgr is null ) inner 8 where e2.mgr = inner.empno ) inner 9 where e3.mgr = inner.empno; EMPNO ENAME MGR —————————— ———————————————————— —————————— 7902 FORD 7566 7788 SCOTT 7566 7900 JAMES 7698 7844 TURNER 7698 7654 MARTIN 7698 7521 WARD 7698 7934 MILLER 7782
Finally, because there are four levels in the employee hierarchy, I can nest the query from Listing 4 to reveal the people who sit at the very bottom layer of the hierarchy, as shown in Listing 5.
Listing 5: Fourth (and last) level of the hierarchy via a join
SQL> select e4.empno, e4.ename, e4.mgr 2 from emp e4, 3 ( select e3.empno, e3.ename, e3.mgr 4 from emp e3, 5 ( select e2.empno, e2.ename, e2.mgr 6 from emp e2, 7 ( select empno, mgr 8 from emp 9 where mgr is null ) inner 10 where e2.mgr = inner.empno ) inner 11 where e3.mgr = inner.empno ) inner 12 where e4.mgr = inner.empno; EMPNO ENAME MGR —————————— ———————————————————— —————————— 7876 ADAMS 7788
Obviously, this approach is untenable in general. The number of levels in the hierarchy is never known in advance, so it is never known how many nested inline views are required to entirely traverse the hierarchy. However, this approach does reveal that processing a hierarchy via SQL is a recursive operation, in that a set of output data from one query becomes the input into a similar wrapping query. This leads naturally to the new form of hierarchy processing that is available from Oracle Database 11g Release 2 onward—the ANSI-standard SQL approach of using recursive common table expressions (CTEs).
Simpler SQL with WITH
Before I describe recursive CTEs, it is worth covering the simpler case of nonrecursive CTEs.
First a virtual table consisting of a query expression can be defined with the WITH clause, enabling that table to be referenced throughout the remainder of the query. Listing 6 shows an example of the WITH clause that refers to a virtual table named LAST_HIRE, which is the query expression for deriving the most recent hiring date per department in the EMP table.
Listing 6: Basic usage of the WITH clause
SQL> WITH last_hire AS 2 ( select deptno, max(hiredate) 3 from emp 4 group by deptno 5 ) 6 select * from last_hire; DEPTNO MAX(HIRED —————————— ————————— 30 03-DEC-81 20 12-JAN-83 10 23-JAN-82
At first glance, the WITH clause may seem like merely a waste of keystrokes, but its true benefit can be seen when the functional requirements of a query become more complex. Consider this business requirement given to a developer—
“Find the total salary paid by each department, get the average of these totals, and list those departments above that average.”
—which needs to be expressed with a SQL statement. Rather than tackling the entire task holistically, developers using the WITH clause can tackle the task in smaller, more manageable pieces, much as they would in their 3GL application code. I can start with the first part of the requirement—
“Find the total salary paid by each department”
—and construct an appropriate query for just that component, as shown in Listing 7.
Listing 7: Part 1, finding the total
SQL> SELECT dname, SUM(sal) dept_sal 2 FROM emp e, dept d 3 WHERE e.deptno = d.deptno 4 GROUP BY dname;
Using a WITH statement, I can now assume the existence of the Listing 7 query as if it were a table in the database. Given a virtual table containing that data (dept_salaries), I can then construct a query to handle the second part of the requirement—
“get the average of these totals”
—as shown in Listing 8.
Listing 8: Part 2, getting the average
SQL> WITH dept_salaries AS ( 2 SELECT dname, SUM(sal) dept_sal 3 FROM emp e, dept d 4 WHERE e.deptno = d.deptno 5 GROUP BY dname), 6 SELECT AVG(dept_sal) avsal 7 FROM dept_salaries;
The final part of the requirement—
“list those departments above that average”
—implies that the Listing 8 result containing average departmental salaries needs to be referenced in a subsequent SQL statement, so it makes sense to also assign that query expression a virtual table name (avg_sal), as shown in Listing 9.
Listing 9: Creating another virtual table name
SQL> WITH dept_salaries AS ( 2 SELECT dname, SUM(sal) dept_sal 3 FROM emp e, dept d 4 WHERE e.deptno = d.deptno 5 GROUP BY dname), 6 avg_sal AS ( SELECT AVG(dept_sal) avsal 7 FROM dept_salaries);
With the two query expressions aliased, I can complete the three-step task by joining the two virtual tables to find the appropriate departments, as shown in Listing 10.
Listing 10: Part 3, listing the departments
SQL> WITH dept_salaries AS ( 2 SELECT dname, SUM(sal) dept_sal 3 FROM emp e, dept d 4 WHERE e.deptno = d.deptno 5 GROUP BY dname), 6 avg_sal AS ( SELECT AVG(dept_sal) avsal 7 FROM dept_salaries) 8 SELECT * FROM dept_salaries d, avg_sal a 9 WHERE d.dept_sal > a.avsal 10 ORDER BY d.dname;
Building SQL in piecemeal fashion with CTEs is a great way of solving complicated query problems. It also assists in the maintenance of the SQL, because it reveals the thought process by which the original author of the SQL statement tackled the task.
Oracle Database 11g Release 2 and later releases support an extension to CTEs. The WITH syntax can also be recursive—that is, a table you are defining with a query expression using the WITH clause can contain references to the table currently being defined. Earlier I described how navigating hierarchical data can be tackled as a series of self-referencing queries—recursive CTEs are a natural solution for this task. To demonstrate the structure of the new hierarchical syntax, I’ll return to the first principles I used for navigating the employer hierarchy via its component parts.
Listing 11 shows the definition of a table called EACH_LEVEL, the name indicating that it will navigate each level of the hierarchy. The definition is a UNION ALL of two queries. The first part of the UNION ALL is the entry point into the hierarchy, equivalent to START WITH in the conventional hierarchical SQL syntax style, retrieving all employees who have no manager. The second query following the UNION ALL is where the recursive nature of the WITH statement becomes apparent. I am querying the EMP table and then joining back to the EACH_LEVEL table, even though I am currently defining the EACH_LEVEL table. The recursive nature demonstrates a concept similar to when I was repeatedly creating and running inline queries. In this case, EACH_LEVEL is referring back to itself, so each time I encounter the recursive part of the definition, I navigate recursively through the structure, drilling deeper into the hierarchy.
Listing 11: The recursive CTE
SQL> with EACH_LEVEL (empno, name, mgr) as 2 ( -- 3 -- start with 4 -- 5 select empno, ename, mgr 6 from emp 7 where mgr is null 8 -- 9 -- connect by 10 -- 11 union all 12 select emp.empno, emp.ename, emp.mgr 13 from emp, EACH_LEVEL 14 where emp.mgr = each_level.empno 15 ) 16 select * 17 from each_level; EMPNO NAME MGR —————————— ——————————————— —————————— 7839 KING 7566 JONES 7839 7698 BLAKE 7839 7782 CLARK 7839 7499 ALLEN 7698 7521 WARD 7698 7654 MARTIN 7698 7788 SCOTT 7566 7844 TURNER 7698 7900 JAMES 7698 7902 FORD 7566 7934 MILLER 7782 7369 SMITH 7902 7876 ADAMS 7788
The two sides of the UNION ALL are analogous to two clauses in the conventional hierarchical syntax, in that an entry point into the hierarchy is defined, followed by the definition of how to use CONNECT BY or recursively traverse the hierarchy. It is the developer who now takes control of the conventional pseudofunctions available in the conventional hierarchical syntax. For example, if I need to print out the depth of the hierarchy as I would typically do with the LEVEL pseudofunction, I can add a column to increment the level each time I recursively pass through EACH_LEVEL. Listing 12 shows this in action. The upper part of the UNION ALL must by definition be level = 1, because it is the entry point, and as the recursive statement under the UNION ALL is executed, the level increments by 1 each time I recursively navigate through the hierarchy.
Listing 12: Incrementing the level recursively
SQL> with each_level (empno, name, mgr, rlevel) as 2 ( select empno, ename, mgr, 1 rlevel 3 from emp 4 where mgr is null 5 union all 6 select emp.empno, emp.ename, emp.mgr, rlevel+1 7 from emp, each_level 8 where emp.mgr = each_level.empno 9 ) 10 select * from each_level; EMPNO NAME MGR RLEVEL —————————— ——————————————— —————————— —————————— 7839 KING 1 7566 JONES 7839 2 7698 BLAKE 7839 2 7782 CLARK 7839 2 7499 ALLEN 7698 3 ...
Unfortunately, I cannot use the name LEVEL for my virtual table column, because it is reserved for its built-in predecessor. Similarly, to access the SYS_CONNECT_BY_PATH function to concatenate children in their hierarchy with their respective parent rows, I can build it myself by using the standard concatenation operators in the recursive definition. At the lowest level of the hierarchy, an employee name concatenation is just the employee name itself, but within the recursive definition under the UNION ALL, I take the existing concatenation of employee names so far and append the employee name I’m currently navigating. This builds the concatenated list of employee names just as the SYS_CONNECT_BY_PATH function does, as demonstrated in Listing 13.
Listing 13: Building the concatenated list of employee names
SQL> with each_level (empno, name) as 2 ( select empno, ename from emp 3 where mgr is null 4 union all 5 select e.empno, 6 each_level.name||'-'||e.ename 7 from emp e, each_level 8 where e.mgr = each_level.empno 9 ) 10 select empno, name from each_level; EMPNO NAME —————————— ———————————————————————————————— 7839 KING 7566 KING-JONES 7698 KING-BLAKE 7782 KING-CLARK 7499 KING-BLAKE-ALLEN 7521 KING-BLAKE-WARD ...
Extensions to the Syntax
In the same way that, by default, a conventional syntax hierarchical query will fail if a cycle is detected, a recursive WITH definition will also fail if a cycle is detected, albeit with a different error code. Listing 14 shows the recursive WITH definition attempt and error.
Listing 14: Recursive WITH definition detecting cycle and returning error message
SQL> with each_level (empno, name, mgr) as 2 ( select empno, ename, mgr 3 from emp 4 where ename = 'KING' 5 union all 6 select emp.empno, emp.ename, emp.mgr 7 from emp, each_level 8 where emp.mgr = each_level.empno 9 ) 10 select * 11 from each_level; ERROR: ORA-32044: cycle detected while executing recursive WITH query
You can detect cyclic relationships and avoid errors by extending the definition with the CYCLE clause, as shown in Listing 15. The behavior is slightly different from the way cycles are handled in the conventional hierarchical syntax, in that cyclic rows are still presented in the result set but a new column is added with a single character flag indicating whether this row is a member of a cyclic relationship.
Listing 15: Recursive WITH, cycle, CYCLE clause, and new column
SQL> with each_level (empno, name, mgr) as 2 ( select empno, ename, mgr from emp 3 where ename = 'KING' 4 union all 5 select emp.empno, emp.ename, emp.mgr 6 from emp, each_level 7 where emp.mgr = each_level.empno ) 8 CYCLE mgr SET is_cycle TO 'Y' DEFAULT 'N' 9 select * from each_level; EMPNO NAME MGR IS_CYCLE —————————— —————————— —————————— —————————— 7839 KING 7499 N 7566 JONES 7839 N 7521 WARD 7698 N 7839 KING 7499 Y 7876 ADAMS 7788 N ...
The recursive CTE syntax allows for some additional facilities in terms of how the hierarchy should be traversed. You can navigate any hierarchy either by taking “slices” in a top-down approach or by simply working down along a branch until all the leaves are exhausted and then working back up through the branches in the same way someone might navigate through a maze. Each mechanism is available to developers using the SEARCH clause in the recursive CTE definition. I can search breadth-first or depth-first, as shown in Listing 16, with the ordering of the results achieved by the SET clause, which nominates a new column that yields a sequence number defining the order in which the rows should be presented.
Listing 16: Recursive CTE, SEARCH, and SET clause
SQL> with each_level (empno, name, hiredate, mgr) as 2 ( select empno, ename, hiredate, mgr from emp 3 where ename = 'KING' 4 union all 5 select e.empno, 6 each_level.name||'-'||e.ename, e.hiredate, e.mgr 7 from emp e, each_level 8 where e.mgr = each_level.empno ) 9 SEARCH BREADTH FIRST BY HIREDATE SET IDX 10 select name, hiredate, idx from each_level; NAME HIREDATE IDX ———————————————————————————————— ————————— —————————— KING 17-NOV-81 1 KING-JONES 02-APR-81 2 KING-BLAKE 01-MAY-81 3 KING-CLARK 09-JUN-81 4 KING-BLAKE-ALLEN 20-FEB-81 5 KING-BLAKE-WARD 22-FEB-81 6 ... KING-JONES-FORD-SMITH 17-DEC-80 13 KING-JONES-SCOTT-ADAMS 23-MAY-87 14 SQL> with each_level (empno, name, hiredate, mgr) as 2 ( select empno, ename, hiredate, mgr from emp 3 where ename = 'KING' 4 union all 5 select e.empno, 6 each_level.name||'-'||e.ename, e.hiredate, e.mgr 7 from emp e, each_level 8 where e.mgr = each_level.empno ) 9 SEARCH DEPTH FIRST BY HIREDATE SET IDX 10 select name, hiredate, idx from each_level; NAME HIREDATE IDX ———————————————————————————————— ————————— —————————— KING 17-NOV-81 1 KING-JONES 02-APR-81 2 KING-JONES-FORD 03-DEC-81 3 KING-JONES-FORD-SMITH 17-DEC-80 4 KING-JONES-SCOTT 19-APR-87 5 KING-JONES-SCOTT-ADAMS 23-MAY-87 6 KING-BLAKE 01-MAY-81 7 KING-BLAKE-ALLEN 20-FEB-81 8 ...
Alternative Use Cases
The ability to run operations recursively via a single SQL statement gives rise to the ability to do recursive problem solving even when the data is not necessarily hierarchical in nature. If I need to iterate through some data an unknown number of times, recursion is a very effective tool for achieving this. Here is an example of using recursive CTEs when it does not immediately appear to be a hierarchical problem. I have a table called MESSAGES that includes some sentences containing names of my fellow developer advocates and myself. I would like to replace those names with the other people’s respective Twitter handles. To do that, I have a lookup table called TWITTER_HANDLES with the terms I will replace and the appropriate Twitter handles to replace them, as shown in Listing 17.
Listing 17: Looking up Twitter handles
SQL> select * from messages; TXT ———————————————————————————————————————————————————————————————————— I caught up with Connor and Maria Colgan today. They have taken over AskTOM for Oracle Developers SQL> select * from twitter_handles; ID TERM HANDLE ———— —————————————————————————— ——————————————— 1 Connor McDonald @connor_mc_d 2 Connor @connor_mc_d 3 Maria Colgan @sqlmaria 4 Oracle Developers @otndev 5 Oracle @oracle 6 AskTOM @oracleasktom
This means that for each of the rows in TWITTER_HANDLES, I need to continually iterate through all the rows of data in my MESSAGES table, each time replacing a found term with its appropriate Twitter handle until all the terms have been exhausted, as shown in Listing 18.
Listing 18: Selecting tweets and replacing names with Twitter handles
SQL> with 2 tweetised(ind,tweet_txt) as 3 ( 4 select 1 ind, txt tweet_txt 5 from messages 6 union all 7 select ind+1, replace(tweet_txt,term,handle) 8 from tweetised, twitter_handles 9 where ind = id 10 ) 11 select * from tweetised; IND TWEET_TXT ————————— ————————————————————————————————————————————————————————— 1 I caught up with Connor and Maria Colgan today. They have taken over AskTOM for... 2 I caught up with Connor and Maria Colgan today. They have taken over AskTOM for... 3 I caught up with @connor_mc_d and Maria Colgan today. They have taken over AskTOM for... 4 I caught up with @connor_mc_d and @sqlmaria today. They have taken over AskTOM for ... 5 I caught up with @connor_mc_d and @sqlmaria today. They have taken over AskTOM for @otndev 6 I caught up with @connor_mc_d and @sqlmaria today. They have taken over AskTOM for @otndev 7 I caught up with @connor_mc_d and @sqlmaria today. They have taken over @oracleasktom ...
I am generating multiple rows of output for each single row of data in my messages table, with each new row containing an additional term replaced with its associated Twitter handle. Hence, simply by picking up the very last row, using a FETCH FIRST clause, I will display the result I need. I recursively run a REPLACE operation, using the recursive nature of the WITH statement, as shown in Listing 19.
Listing 19: Picking the last row with FETCH FIRST
SQL> with 2 tweetised(ind,tweet_txt) as 3 ( 4 select 1 ind, txt tweet_txt 5 from messages 6 union all 7 select ind+1, replace(tweet_txt,term,handle) 8 from tweetised, twitter_handles 9 where ind = id 10 ) 11 select * from tweetised 12 order by ind desc 13 fetch first 1 row only; IND TWEET_TXT —————————— —————————————————————————————————————————————————————————— 7 I caught up with @connor_mc_d and @sqlmaria today. They...
Oracle Database continues to support the conventional START WITH and CONNECT BY syntax, but don’t be afraid to expand your SQL toolkit by utilizing recursive common table expressions to achieve the same results. As shown in this article, you might also find uses for these expressions beyond standard hierarchical relationships.
LEARN more about hierarchical SQL processing.
DOWNLOAD Oracle Database 12c.
Illustration by Wes Rowell
Connor McDonald is a former Oracle ACE Director who has joined the Oracle Developer Advocates Team. Over the past 25 years, he has worked with systems in Australia, the United Kingdom, Southeast Asia, Western Europe, and the United States. He has co-authored three books and has been a popular speaker at Oracle conferences around the world, specializing in topics regarding the database engine and PL/SQL.