Pivotal Access to Your Data: Analytic Functions, Concluded

June 18, 2013 | 12 minute read
Text Size 100%:

Part 12 in a series on the basics of the relational database and SQL

By Melanie Caffrey

July/August 2013

Part 11 in this series, “Leading Ranks and Lagging Percentages: Analytic Functions, Continued” (Oracle Magazine, May/June 2013), continued the discussion of analytic functions that began in Part 10. It demonstrated analytic functions that enable you to obtain results for top-N queries, evaluate data comparisons, and calculate percentages within a group, among other actions. This article wraps up the series’ coverage of analytic functions by showing

  • How you can get a new perspective on your results with pivot queries that convert column data into row data or row data into column data

  • How to employ an inline view to use an analytic function in a WHERE clause

To try out the examples in this series, you need access to an Oracle Database instance. If necessary, download and install an Oracle Database edition for your operating system. I recommend installing Oracle Database, Express Edition 11g Release 2. If you install the Oracle Database software, choose the installation option that enables you to create and configure a database. A new database, including sample user accounts and their associated schemas, will be created for you. (Note that SQL_101 is the user account to use for the examples in this series; it’s also the schema in which you’ll create database tables and other objects.) When the installation process prompts you to specify schema passwords, enter and confirm passwords for SYS and SYSTEM and make a note of them.

Finally—whether you installed the database software from scratch or have access to an existing Oracle Database instance—download, unzip, and execute the SQL script to create the tables for the SQL_101 schema that are required for this article’s examples. (View the script in a text editor for execution instructions.)

Turning Your Data on Its Side

A common business reporting requirement is that data in a column be displayed horizontally rather than vertically, for better readability. For example, compare the result set in Listing 1 to the one in Listing 2. The query in Listing 1 lists all employees alongside their respective department IDs, sorted by department and employee name. The data returned in Listing 2 is the same as that returned in Listing 1, but it is displayed differently.

Code Listing 1: Obtaining a list of employees, sorted by department and employee name

SQL> set feedback on
SQL> set lines 32000
SQL> select department_id, last_name, first_name
  2    from employee
  3  order by department_id, last_name, first_name;
——————————————  —————————————   ———————————
           10   Dovichi         Lori
           10   Eckhardt        Emily
           10   Friedli         Roger
           10   James           Betsy
           10   Michaels        Matthew
           10   Newton          Donald
           20   leblanc         mark
           20   peterson        michael
           30   Jeffrey         Thomas
           30   Wong            Theresa
                Newton          Frances

Code Listing 2: Employees, sorted by department and name, displaying one row per department

SQL> select department_id,
  2         LISTAGG(first_name||' '||last_name, ', ')
  3         WITHIN GROUP
  4         (order by last_name, first_name) employees
  5    from employee
  6  group by department_id
  7  order by department_id;
—————————————  —————————————————————————————————————————————————————————
           10  Lori Dovichi, Emily Eckhardt, Roger Friedli, Betsy James, 
               Matthew Michaels, Donald Newton
           20  mark leblanc, michael peterson
           30  Thomas Jeffrey, Theresa Wong
               Frances Newton
4 rows selected.

Listing 2 uses the LISTAGG function (introduced in Oracle Database 11g) to construct a comma-delimited list of employees per department, thereby pivoting the more traditionally displayed result set in Listing 1. You can use LISTAGG as a single-group aggregate function, a multigroup aggregate function, or an analytic function.

When LISTAGG is invoked as a single-group aggregate function, it operates on all rows that satisfy any WHERE clause condition and—like all other single-group aggregate functions—returns a single output row. The example in Listing 2 demonstrates the use of LISTAGG as a multigroup aggregate function, returning a row for each group defined by the GROUP BY clause.

The syntax for the LISTAGG function is

LISTAGG ( column | expression, 
delimiter ) WITHIN GROUP (ORDER BY column | expression)

LISTAGG performs as an analytic function if you add an OVER clause:

OVER (PARTITION BY column | expression)

The column or expression to be aggregated, the WITHIN GROUP keywords, and the ORDER BY clause that immediately follows the WITHIN GROUP keywords (that is, the sort that takes place within the grouping) are mandatory in all three LISTAGG use cases.

The query in Listing 3 uses LISTAGG as an analytic function. It obtains a list of salaries, from highest to lowest, per department. Alongside each salary value is the name, in last-name/first-name alphabetical order, of the employee who earns that salary value. In addition, every employee for the current row’s listed department is returned in salary order, from highest to lowest, and in last-name/first-name alphabetical order.

Code Listing 3: Invoking the LISTAGG function as an analytic function

SQL> select department_id, salary, first_name||’ ‘||last_name earned_by,
  2         listagg(first_name||’ ‘||last_name, ‘, ‘)
  3         within group
  4         (order by salary desc nulls last, last_name, first_name)
  5       over (partition by department_id) employees
  6    from employee
  7   order by department_id, salary desc nulls last, last_name, first_name;
—————————————     ——————  ——————————————    ———————————————————————————————
           10     100000  Emily Eckhardt    Emily Eckhardt, Donald Newton,
                                            Matthew Michaels, Roger Friedli,
                                            Betsy James, Lori Dovichi
           10      70000  Matthew Michaels  Emily Eckhardt, Donald Newton,
                                            Matthew Michaels, Roger Friedli,
                                            Betsy James, Lori Dovichi
           10      60000  Roger Friedli     Emily Eckhardt, Donald Newton,
                                            Matthew Michaels, Roger Friedli,
                                            Betsy James, Lori Dovichi
           10      60000  Betsy James       Emily Eckhardt, Donald Newton,
                                            Matthew Michaels, Roger Friedli,
                                            Betsy James, Lori Dovichi
           10             Lori Dovichi      Emily Eckhardt, Donald Newton,
                                            Matthew Michaels, Roger Friedli,
                                            Betsy James, Lori Dovichi
           20      90000  michael peterson  michael peterson, mark leblanc
           20      65000  mark leblanc      michael peterson, mark leblanc
           30     300000  Thomas Jeffrey    Thomas Jeffrey, Theresa Wong
           30      70000  Theresa Wong      Thomas Jeffrey, Theresa Wong
                   75000  Frances Newton    Frances Newton
11 rows selected.
Twisting and Turning into Fewer and Wider

A PIVOT clause enables you to turn rows into columns and present your data in a cross-tabular format. The syntax of the PIVOT clause is

SELECT … FROM … PIVOT ( aggregate-
function column | expression ) 
        FOR column | expression to be pivoted  IN (value1, … valueN)
          ) AS alias

Compare the result sets in Listings 4 and 5. The query in Listing 4 displays a summary of each department’s total employee salary amount in a cross-tabular report. The query in Listing 5 returns the same departmental salary summaries as those in Listing 4 but in columnar format, which your users might consider less readable.

Code Listing 4: Using the PIVOT function to obtain cross-tabular results

SQL> select *
  2    from (select department_id, salary
  3            from employee) total_department_sals
  4     PIVOT (SUM(salary)
  5       FOR department_id IN (10 AS Accounting, 20 AS Payroll, 30 AS IT,
  6                             NULL AS Unassigned_Department));
——————————     ———————  ——————————  —————————————————————
    370000      155000      370000                  75000
1 row selected.

Code Listing 5: Traditional columnar display of summarized salaries, grouped by department

SQL> select department_id, sum(salary)
  2    from employee
  3  group by department_id
  4  order by department_id nulls last;
—————————————  ———————————
           10       370000
           20       155000
           30       370000
4 rows selected.

The query in Listing 6 demonstrates that it’s possible to pivot on more than one column. The results from this query display the sum total of salaries per department, but only for employees who were hired in a particular year. You can also pivot on and display multiple aggregate values, as Listing 7 demonstrates. The query in Listing 7 obtains the sum of all salaries, alongside the latest date of hire for an employee, per department.

Code Listing 6: Displaying the sum total salaries of employees per department for a particular year

SQL> select *
  2    from (select department_id, 
  3          to_char(trunc(hire_date, 'YYYY'), 'YYYY') hire_date, salary
  4            from employee)
  5   PIVOT (SUM(salary)
  6     FOR (department_id, hire_date) IN
  7           ((10, '2007') AS Accounting_2007,
  8            (20, '2008') AS Payroll_2008,
  9            (30, '2010') AS IT_2010
 10           )
 11         );
ACCOUNTING_2007  PAYROLL_2008     IT_2010
———————————————  ————————————     ————————
         190000         90000      370000  
1 row selected.

Code Listing 7: Pivoting on and displaying multiple aggregate columns

SQL> select *
  2    from (select department_id, hire_date, salary
  3            from employee)
  4     PIVOT (SUM(salary) AS sals,
  5            MAX(hire_date) AS latest_hire
  6     FOR department_id IN (10, 20, 30, NULL));
——————— ————————— ——————— ————————— ——————— —————————— ————————— —————————
370000  07-JUL-11 155000  06-MAR-09 370000  27-FEB-10  75000     14-SEP-05
1 row selected.

When you use multiple aggregate functions, it’s advisable to supply an alias for each of them. The resultant column headings are a concatenation of the pivot values (or pivot aliases), an underscore, and (if you’ve supplied them) the aliases of the aggregate functions. For example, some of Listing 7’s columns are 10_SALS and 10_LATEST. Note that the columns for the latest hire dates per department, such as 10_LATEST, are actually columns using the LATEST_HIRE alias. When you prepend the LATEST_HIRE alias with the department ID, the query should return a column that reads, for example, 10_LATEST_HIRE.

However, with SQL*Plus, the column heading displayed for a column with a DATE datatype is never longer than the default format for the value returned. The HIRE_DATE column’s format is DD-MON-RR, so only the first nine characters of the heading are displayed. To display a full heading, such as 10_LATEST_HIRE, consider using TO_CHAR to apply a date format mask to the column.

If you don’t supply an alias for your aggregate functions, you might get an error message, as shown in the example in Listing 8. Because neither of the aggregate functions in Listing 8 is aliased, the PIVOT operator doesn’t know to which one to apply the column heading for the pivot value (in this case, the DEPARTMENT_ID value). As a result, the PIVOT operator can’t simply use its default column headings and the query fails with a “column ambiguously defined” error message. Avoid this error by creating an alias for each aggregate function; don’t rely solely on the default column headings that result from use of the PIVOT operation.

Code Listing 8: A “column ambiguously defined” error occurs

SQL> select *
  2    from (select department_id, hire_date, salary
  3             from employee)
  4      PIVOT (SUM(salary),
  5             MAX(hire_date)
  6      FOR department_id IN (10, 20, 30, NULL));
select *
ERROR at line 1:
ORA-00918: column ambiguously defined
A Horizontal View of the Vertical

Just as you might have a reporting need to turn rows into columns, you might also need to turn columns into rows. You’ve seen one way to do this with the LISTAGG function. You can also use the UNPIVOT operator for this purpose. Note that the UNPIVOT operator does not reverse an action performed with the PIVOT operator. Rather, it works on data that is already stored as pivoted.

Consider the CREATE TABLE statement in Listing 9. It creates a table with pivoted data, using a query similar to the one in Listing 7. Now you can query this data by using the UNPIVOT operator, as Listing 10 illustrates. Compare the values returned from the query in Listing 10 with the values returned from the query in Listing 7. As you can see, they are the same but are displayed differently.

Code Listing 9: Creating a table with pivoted data

SQL> CREATE TABLE pivoted_emp_data AS
  2  select *
  3    from (select department_id, hire_date, salary
  4            from employee)
  5    PIVOT (SUM(salary) sum_sals,
  6           MAX(hire_date) latest_hire
  7    FOR department_id IN (10 AS Acc, 20 AS Pay, 30 AS IT, NULL));
Table created.

Code Listing 10: Using the UNPIVOT operator to turn rows into columns

SQL> select hire_date, salary
  2    from pivoted_emp_data
  4        ((hire_date, salary)
  5           FOR department_id IN (
  6           (acc_latest_hire, acc_sum_sals) AS 'Accounting',
  7           (pay_latest_hire, pay_sum_sals) AS 'Payroll',
  8           (it_latest_hire, it_sum_sals) AS 'IT',
  9           (null_latest_hire, null_sum_sals) AS 'Unassigned'
 10       ))
 11  order by hire_date, salary;
——————————     ——————
14-SEP-05       75000
06-MAR-09      155000
27-FEB-10      370000
07-JUL-11      370000
4 rows selected.

The results in Listing 7 are returned as one long record (row), with each HIRE_DATE and SALARY combination, pivoted by department, displayed side by side. In contrast, each of these combinations is returned as a separate and distinct row from the query in Listing 10, with the HIRE_DATE and SALARY values displayed in separate columns. Note that the query in Listing 10 unpivots and returns value pairs of different datatypes. HIRE_DATE uses the DATE datatype, and SALARY uses the NUMBER datatype, so the alias you use for these value pairs must be enclosed in single quotation marks. If it is not, you might get an error message like the one shown in Listing 11.

Code Listing 11: Using aliases for value pairs of different datatypes

SQL> select hire_date, salary
  2    from pivoted_emp_data
  4        ((hire_date, salary)
  5           FOR department_id IN (
  6           (acc_latest_hire, acc_sum_sals) AS Accounting,
  7           (pay_latest_hire, pay_sum_sals) AS Payroll,
  8           (it_latest_hire, it_sum_sals) AS IT,
  9           (null_latest_hire, null_sum_sals) AS Unassigned
 10       ))
 11   order by hire_date, salary;
         (acc_latest_hire, acc_sum_sals) AS Accounting,
ERROR at line 6:
ORA-56901: non-constant expression is not allowed for pivot|unpivot values
When and How to Predicate Analytically

Other than the final ORDER BY clause, analytic functions are the last set of operations performed in a query. Because they can appear only in the SELECT list or the ORDER BY clause, you cannot use them directly in any predicates, including in a WHERE or HAVING clause. If you need to select from a result set based on the outcome of applying an analytic function, you can use an inline view. An inline view is a SELECT statement in the FROM clause of another SELECT statement. It acts as a TABLE (otherwise known as a FROM) clause. You have already seen examples of inline view capability in this article in Listings 4, 6, 7, 8, and 9.

Code Listing 12: Using an inline view to enable use of an analytic function as a predicate

SQL> select *
  2    from (select department_id, last_name||', '||first_name, salary,
  3                 dense_rank() over (partition by department_id
  4                                    order by salary desc nulls last) d_rank
  5            from employee)
  6   where d_rank < 3
  7  order by department_id, salary desc nulls last;
—————————————  ——————————————————————————————————————    ——————     ——————
           10  Eckhardt, Emily                           100000          1
           10  Newton, Donald                             80000          2
           20  peterson, michael                          90000          1
           20  leblanc, mark                              65000          2
           30  Jeffrey, Thomas                           300000          1
           30  Wong, Theresa                              70000          2
               Newton, Frances                            75000          1
7 rows selected.

Suppose you want to use an analytic function to obtain the top two salary earners by department. As Listing 12 illustrates, you can place the analytic function operation in an inline view and alias it. The alias provided to the inline view in Listing 12 is D_RANK (this is named for the result of applying the DENSE_RANK analytic function). The query in the inline view (the inner query) must be resolved before it can be used by the query that encompasses it (the outer query). After the inline view completes, the outer query can use its result in a predicate. The predicate clause in the outer query for Listing 12 is

WHERE d_rank < 3
Striving to Perform Well

Although analytic functions help you form a more elegant and less convoluted SQL solution to a reporting requirement, they are not a replacement for writing good code. Your goal should be to constantly and consistently write good SQL that’s easy to maintain and that will perform well over time. It’s all too easy to abuse SQL techniques that make processes easier. Used incorrectly, any SQL technique can be written poorly and become a system inhibitor.

In particular, sorting and sifting data can exhaust system resources. (The query in Listing 13, for example, includes three potential sort operations.) This shouldn’t necessarily deter you from using analytic functions, but keep in mind that you can write a query that brings a system to its knees just as easily as you can write one that provides you with an efficient, elegant, and easy-to-maintain solution. With the power of analytic functions comes responsibility.

Code Listing 13: Query with analytic functions that may cause system performance problems

SQL> select first_name||' '||last_name, department_id, hire_date,
  2    sum(salary) over (order by department_id, 
  3    first_name||' '||last_name) sum_dept_emp,
  4    avg(salary) over (order by hire_date, department_id) avg_dept_hire_dt
  5    from employee
  6   order by department_id, hire_date, first_name||' '||last_name;
————————————————————————— ——————————— ————————— ———————————— ————————————————
Emily Eckhardt                     10 07-JUL-04       240000           100000
Donald Newton                      10 24-SEP-06       140000            85000
Betsy James                        10 16-MAY-07        60000       74166.6667
Matthew Michaels                   10 16-MAY-07       310000       74166.6667
Roger Friedli                      10 16-MAY-07       370000       74166.6667
Lori Dovichi                       10 07-JUL-11       240000            97000
michael peterson                   20 03-NOV-08       525000       76428.5714
mark leblanc                       20 06-MAR-09       435000            75000
Thomas Jeffrey                     30 27-FEB-10       895000           100000
Theresa Wong                       30 27-FEB-10       595000            97000
Frances Newton                        14-SEP-05       970000            87500
11 rows selected.

This article concludes the discussion of analytic functions introduced in Part 10 and continued in Part 11 of this series. It demonstrates how LISTAGG, PIVOT, and UNPIVOT can be used to manipulate the way your data is displayed. You’ve seen how to turn columns into rows and rows into columns and how such views differ from each other. You’ve learned how to take individual data items and return them as delimited lists for more-readable reports. You’re also now aware of the specific caveats that apply when you use these functionalities.

Last but not least, you’re aware now that queries that use analytic functions can—if you’re not careful—consume many, if not all, of your system resources. In all cases, analytic functions greatly reduce the need to write complicated SQL to obtain the same results. But they are tools to be wielded with equal parts of enthusiasm and caution. Review the documentation for more details.

This article concludes the SQL 101 series. You’ve learned basic relational database concepts and many SQL coding constructs, but the series has given you only a glimpse into what Oracle SQL has to offer you. Be sure to continue to read the documentation and try existing and new Oracle Database features. Thank you for being readers of Oracle Magazine and of the SQL 101 series. As you continue writing SQL, my hope is that you enjoy it as much as I do.

Next Steps

 READ SQL 101, Parts 1–11

READ more about
relational database design and concepts
 Oracle Database Concepts 11g Release 2 (11.2)
 Oracle Database SQL Language Reference 11g Release 1 (11.1)
 Oracle Database Data Warehousing Guide 11g Release 2 (11.2)

 DOWNLOAD the sample script for this article

Photography by Meric Dagli, Unsplash

Melanie Caffrey

Melanie Caffrey is a senior development manager at Oracle. She is a coauthor of Beginning Oracle SQL for Oracle Database 12c (Apress, 2014), Expert PL/SQL Practices for Oracle Developers and DBAs (Apress, 2011), and Expert Oracle Practices: Oracle Database Administration from the Oak Table (Apress, 2010).

Previous Post

Leading Ranks and Lagging Percentages: Analytic Functions, Continued

Melanie Caffrey | 13 min read

Next Post

Sophisticated Call Stack Analysis

Steven Feuerstein | 10 min read