Subscribe

Share

Application Development

How to Dynamically Change the Columns in a SQL Query

Use polymorphic table functions to add columns to and remove columns from your result sets.

By Chris Saxon

August 23, 2019

You slip into your Monday morning team meeting just in time to hear your boss’s next “brilliant” idea: You are going to build a generic comma-separated values (CSV) parser! It must be able to take any CSV file as input and split out the results using SQL. Each record and field in the file will become a new row and column in the output. And you need to build the parser by this Friday!

Your mind starts racing. You’ll have to parse the file to read the header, see how many fields it returns, and then construct a SQL query with the same number of columns in the SELECT statement.

Sounds like you’ll need some dynamic SQL. This could be a very long week.

Luckily you’re using Oracle Database 18c, so there's another option available. It’s one of the coolest new SQL extensions: polymorphic table functions (PTFs).

What Are Polymorphic Table Functions?

Oracle Database has included table functions for a long time. You can use them to generate rows in your result set.

But what if you want to change the columns at runtime based on input parameters? You have to fall back on some messy dynamic SQL or arcane custom aggregate functions.

PTFs offer a better solution. With PTFs you can, at runtime, add or remove columns from a table’s result set, based on values you supply.

Let’s find out how.

How Do I Create a Polymorphic Table Function?

A PTF includes two key components:

  • A package, which defines the columns that will (or won’t) appear in the output and row values for any new columns

  • The function you’ll call to run the PTF, which can be a standalone function or a function nested within the package

The package contains the PTF’s implementation. It also includes two key parts:

  • A describe function tells the database which columns to include in the output.

  • A fetch_rows procedure assigns values to each row for these new columns.

Let’s start with a simple example. I’ll pass a list of columns to remove from the result set and another list of columns to add to the output.

If you want to follow along, you can find the code to build and run this polymorphic table function on Oracle Live SQL.

The Polymorphic Table Function

The following code creates a standalone PTF:

create or replace function hide_existing_add_new_cols (
  tab       table, 
  add_cols  columns, 
  hide_cols columns
) return table pipelined 
  row polymorphic 
  using hide_and_add_cols_pkg;
/ 

This code includes some new syntax:

  • tab table specifies which table the PTF operates on. Every PTF must have exactly one table parameter.

  • add_cols columns and hide_cols columns list the column identifiers to add and remove, respectively, from the result set. These parameters use the new pseudo-operator, columns. They allow you to pass one or more column identifiers to the PTF.

  • row polymorphic means the PTF processes rows independently. You can also define table semantics PTFs, which allow you to operate on a set of rows. As you’ll see later, these parameters also enable powerful extensions when you’re executing your functions.

  • using hide_and_add_cols_pkg binds the PTF to the hide_and_add_cols_pkg package.

As you can see in the last full line of code for the function, to implement this PTF, I need to create the hide_and_add_cols_pkg package.

The following is the header for this package:

create or replace package hide_and_add_cols_pkg as

  function describe (
    tab        in out dbms_tf.table_t,
    add_cols   dbms_tf.columns_t,
    hide_cols  dbms_tf.columns_t
  ) return dbms_tf.describe_t;

  procedure fetch_rows;

end hide_and_add_cols_pkg; 
/

The describe function must have the same parameters as the PTF itself, with one subtle difference. You must replace the TABLE argument with the DBMS_TF.table_t type and any COLUMNS arguments with DBMS_TF.column_t. The describe function must also have a return type of DBMS_TF.describe_t.

The fetch_rows procedure must also have the same parameters as the PTF, minus any TABLE or COLUMNS arguments. In this example, the PTF includes only arguments of type TABLE or COLUMNS, so fetch_rows has no parameters.

So what do you do inside these program units?

The describe Function

The describe function defines the shape of the result set. That is, it returns the names and data types of the columns to display in the query results. To get hide_existing_add_new_cols to do what it describes, there are two things you need to do:

  • Hide any existing columns listed on the columns-to-suppress list

  • Add the list of new columns

You can do these things with the following describe function in the body of the hide_and_add_cols_pkg package:

  function describe (
    tab        in out dbms_tf.table_t,
    add_cols   dbms_tf.columns_t,
    hide_cols  dbms_tf.columns_t
  ) return dbms_tf.describe_t as
    new_cols dbms_tf.columns_new_t;
    col_list dbms_tf.columns_t := add_cols;
  begin
    
    for i in 1 .. tab.column.count loop            
      if tab.column(i).description.name member of hide_cols then
        tab.column(i).for_read := false;
        tab.column(i).pass_through := false;
      end if;
    end loop;                                     
  
    for i in 1 .. col_list.count loop

      new_cols(i) := dbms_tf.column_metadata_t (
        name => col_list(i), 
        type => dbms_tf.type_number
      );

    end loop;                                 

    return dbms_tf.describe_t ( 
      new_columns => new_cols
    );

  end describe;

Lines 10–15 of the describe function remove any of the table’s columns that are in the hide_cols list. The code sets the PASS_THROUGH and FOR_READ properties of the table’s columns.

The properties work as follows:

  • PASS_THROUGH determines whether a column appears in the output.

  • FOR_READ determines whether the fetch_rows procedure can reference the column values.

You can set these values independently. So, for example, you can set the PASS_THROUGH column value to TRUE and the FOR_READ column value to FALSE and vice versa. This PTF removes columns from the results, and it doesn’t need those column values to define new columns, so you can set both properties to FALSE.

Note that the table parameter also includes an array of its columns, so you can loop through these as shown in line 10. Line 11 checks whether the name of any existing column in the table matches a column in the list to exclude. If a column name matches, lines 12 and 13 set the PASS_THROUGH and FOR_READ properties for it to FALSE.

Next, lines 17–24 define the new columns. The code loops through the array of new_cols, and for each value, it uses DBMS_TF.column_metadata_t to define a new column of type NUMBER. Finally, lines 26–28 return the new column array to the client. This adds the new columns to the query’s output.

Now that I have defined the new columns, it’s time to set their values.

The fetch_rows Procedure

To keep it simple, the fetch_rows procedure will assign a value to each new column according to the following formula:

ROW NUMBER * NEW COLUMN POSITION

So for the first column, the values for rows 1, 2, and 3 will be 1, 2, and 3, respectively. The second column will have the values 2, 4, and 6; the third column will have the values 3, 6, and 9; and so on.

To set these values, I need to loop through the rows and new columns in the result set. I can access these by fetching the current environment, which returns a record that includes two column arrays: a GET_COLUMNS list and a PUT_COLUMNS list.

The columns in the GET_COLUMNS list are the existing columns in the table for which the FOR_READ property is set to TRUE. The columns in the PUT_COLUMNS list include the new columns I defined in the describe function.

The following code sets the new column values:

  procedure fetch_rows 
  as
    env       dbms_tf.env_t;
    col       dbms_tf.tab_number_t;
    last_row  pls_integer := 0;
  begin
    
    env := dbms_tf.get_env();

    for cols in 1 .. env.put_columns.count loop   
    
      dbms_tf.xstore_get('col' || cols, last_row); 

      for rws in 1 .. env.row_count loop
        col ( rws ) := ( rws + last_row ) * cols;
      end loop;
     
      dbms_tf.put_col ( cols, col );

      dbms_tf.xstore_set('col' || cols, last_row + env.row_count);
      
    end loop;                                    

end fetch_rows;

Line 8 gets the compilation state of the PTF by calling DBMS_TF.get_env(). The env variable holds the current environment.

I can now use this to loop through the PUT columns—those added to the result set—and define a value for each row. Lines 10–22 iterate through these columns. Lines 14–16 loop through the current rows, building up an array of values. To return these to the client, line 18 calls DBMS_TF.put_col. This assigns the values in the array to the new column in position N. The column position matches the order in which they were added in the describe function.

Now you might be wondering this: What’s the purpose of the DBMS_TF.xstore* calls?

PTFs process rows in batches, and each new batch runs fetch_rows again. So one call to a PTF may execute fetch_rows many times.

Each call to DBMS_TF.get_env returns the number of rows in that batch—not the total number of rows processed. If the row loop set the column value to ROW NUMBER * COLUMN NUMBER, PTFs processing many rows will reset the counter part way through your results. It’s unlikely you want this to happen.

To avoid this, you can save the state in the execution store. This is a series of key-value pairs that you access with the DBMS_TF.xstore* procedures.

Before assigning the row values, I get the number of the last row processed, which happens on line 12. DBMS_TF.xstore_get looks for the key passed as the first argument. The second argument is an in-out parameter. If the key doesn’t exist, DBMS_TF.xstore_get returns the value passed in for this second parameter.

So DBMS_TF.xstore_get('col' || cols, last_row) searches for the keys col1, col2, and so on. For the first set of rows, these keys don’t exist, so the code returns the initialized value for last_row: which is zero. Otherwise, it returns the position of the last row processed.

Then, after processing all the rows in the current batch, I need to update the execution store to hold the position of the current row in the results, as shown on line 20. Calling DBMS_TF.xstore_set overwrites the current value for any keys that exist.

So that builds the PTF. Now how do you use it?

Calling Polymorphic Table Functions

To use the PTF, simply call it in the FROM clause of your query, passing the name of the table you’re operating on and the lists of columns to add and remove.

To replace the DUMMY column in the DUAL table with one named C1, I run the following:

select * 
from   hide_existing_add_new_cols ( 
  dual, columns ( c1 ), columns ( dummy )
); 
C1   
    1

I can also pass in a subquery to the PTF by defining a common-table expression (CTE):

with rws as ( 
  select dummy from dual
  connect by level <= 4
)
  select * 
  from   hide_existing_add_new_cols ( 
    rws, columns ( c1, c2, c3, c4 ), columns ( not_here )
  );

DUMMY     C1    C2    C3    C4   
X          1     2     3     4 
X          2     4     6     8 
X          3     6     9    12 
X          4     8    12    16

This demonstrates a key feature of SQL: Everything is a table. You can pass views or named CTEs as table inputs to a PTF.

But you can’t pass a subquery directly. The following code will fail:

select * 
from   hide_existing_add_new_cols ( 
  ( select dummy from dual ), 
  columns ( c1, ), 
  columns ( dummy )
);

SQL Error: ORA-00904: : invalid identifier

Table Semantics Versus Row Semantics

To implement this PTF, you need to know only the position of the row in the result set—not the values from other rows. While row semantics are good enough for this PTF implementation, if you’re writing a PTF that calculates results across rows—say, aggregations—you need to use table semantics.

Switching to table semantics PTFs also enables you to define the order in which you process the dataset. So let’s re-create the PTF with table semantics:

create or replace function hide_existing_add_new_cols (
  tab       table, 
  add_cols  columns, 
  hide_cols columns
) return table pipelined 
  table polymorphic 
  using hide_and_add_cols_pkg;
/

This enables you to use the PARTITION BY and ORDER BY clauses when you call the PTF.

To demonstrate this, let’s create a table storing the integers 1 to 6, identifying whether they’re even and including their negative values:

create table integers as
  select level number_value,
         case mod ( level, 2 ) 
           when 0 then 'Y'
           when 1 then 'N'
         end is_even,
         -level negated_value
  from   dual
  connect by level <= 6;

Now I can define the order in which the PTF processes the rows. To do this, I add the ORDER BY clause after the table:

select * 
from   hide_existing_add_new_cols ( 
  integers
    order by negated_value, 
  columns ( c1, c2 ), 
  columns ( is_even )
)
order  by number_value;

NUMBER_VALUE    NEGATED_VALUE    C1    C2   
           1               -1     6    12 
           2               -2     5    10 
           3               -3     4     8 
           4               -4     3     6 
           5               -5     2     4 
           6               -6     1     2

This sorts the input to the PTF by the negated values, so you have descending counters relative to the positive values. Also note that the final ORDER BY for the query is different from the input for the PTF.

As with analytic functions, the PARTITION BY clause carves the input dataset into groups for each value in the columns, so you can have separate counters for the odd and even values by using PARTITION BY is_even:

select * 
from   hide_existing_add_new_cols ( 
  integers
    partition by is_even
    order by number_value, 
  columns ( c1, c2 ), 
  columns ( negated_value )
)
order by is_even, number_value;

   NUMBER_VALUE    IS_EVEN    C1    C2   
              1    N           1     2 
              3    N           2     4 
              5    N           3     6 
              2    Y           1     2 
              4    Y           2     4 
              6    Y           3     6

Partitioning and sorting your dataset this way uses standard functionality. All you need to do is define the PTF with table semantics.

What’s the Point of All This Nonsense?

Clearly PTFs are a powerful feature. With them you change the shape of a result set at runtime.

But you may be wondering: Why bother building one to add row counters? This is much easier to do with the analytic function ROW_NUMBER.

True; my example is a little contrived, but it demonstrates the key features of PTFs. Namely, it demonstrates how to

  • Remove existing columns from a result set in the describe function

  • Add new columns in this function

  • Assign values to new columns in the fetch_rows procedure

  • Use table semantics PTFs to split and sort the input rows

  • Manage execution state using DBMS_TF.xstore* procedures

But what are the real-world use cases for PTFs?

Let’s return to your original requirement: to read a CSV file that has an unknown number of fields and split the values out into separate columns. Doing this without PTFs requires complex dynamic SQL.

But with PTFs in place, you could read the first row of the CSV file and pass it as an argument to the function, then use it to define the columns in your results in the describe function, and split the input rows into the columns in the fetch_rows procedure. Here’s an example of a dynamic CSV-to-column splitter PTF.

Other cases where PTFs may be appropriate include the following:

  • Dynamic PIVOT and UNPIVOT operations.

  • Top-N Plus queries. (Use PTFs in place of a traditional Top-N query, but with an extra summary row, giving totals for all the remaining rows in the dataset.)

  • Executing R scripts against a table, returning the results of the R script.

Summary

I suspect there will be relatively few general-purpose PTF use cases. But I’m also sure many developers will find more weird and wacky ways of using them!

What do you think of PTFs: awesome or awful? Can you come up with other general-purpose use cases? If you have any comments or questions about PTFs, reach out to Chris Saxon on Twitter.

Next Steps

READ an overview of polymorphic table functions.

WORK with the code for this article on Oracle Live SQL.

WORK with a CSV-to-columns converter on Oracle Live SQL.

Illustration by Wes Rowell