Finding the Previous Row in a Different Group Using SQL

January 6, 2016 | 10 minute read
Chris Saxon
Developer Advocate
Text Size 100%:

Imagine you have a table with a series of transactions. These are classified into credits or debits:

-------- ----------------- ----------   --------
       6 01-JAN-2015 00:00 CREDIT             40
       7 01-JAN-2015 00:00 CREDIT             90
       8 01-JAN-2015 00:00 DEBIT              80
       5 02-JAN-2015 00:00 CREDIT            200
       4 03-JAN-2015 00:00 DEBIT              99
       3 04-JAN-2015 00:00 CREDIT             50
       2 05-JAN-2015 00:00 DEBIT             200
       1 06-JAN-2015 00:00 DEBIT             100

Your boss has asked you to do the following:

"I want a report showing the ID of the previous transaction of the other type, according to the transaction date. Now!"

So for credits, you need the previous debit trans_id and vice-versa. In the case of ties (transactions at the same time), simply take the highest transaction id. So the output you want looks like:

-------- -------------- ------------- ------ ------------------
       6 01-JAN-2015    CREDIT            40                   
       7 01-JAN-2015    CREDIT            90                   
       8 01-JAN-2015    DEBIT             80                  7
       5 02-JAN-2015    CREDIT           200                  8
       4 03-JAN-2015    DEBIT             99                  5
       3 04-JAN-2015    CREDIT            50                  4
       2 05-JAN-2015    DEBIT            200                  3
       1 06-JAN-2015    DEBIT            100                  3

You need to write the SQL to do this.


Let's look at three different approaches:


First, let's analyze the problem.

You need to find values from previous rows. The obvious way to do this is with lag. Using this as-is doesn't do what you need though. A basic order by trans_datetime returns the value from the immediately preceding row. You need to skip back N rows.

The second parameter for lag is the offset. You can use this to provide the value for N. But the value for N is variable. You don't know how many credits (or debits) there are in the current series. Calculating the value for N is tricky.

Using partition by to group the results by trans_type doesn't help either. This just returns the previous ID of the same type.

So how can you do it?

You need a two-step process. First, check the type of the current transaction. If it's credit, then you want the previous row that was a debit (and the reverse). You can use a case expression to do this.

Next you need to find the trans_id value for this. Lag's out, so what can you use?

The last_value function enables you to look backwards in your results. This returns the final value according to the sorting and windowing you specify. You're looking for the last transaction of the other type. This looks like it fits.

Now put this all together. If your current transaction is credit, you want the last_value of debit transactions (and vice-versa). You can do this as follows:

    when trans_type = 'CREDIT' then
      last_value( case when trans_type = 'DEBIT' then trans_id end )
        over( order by trans_datetime, trans_id )
      last_value( case when trans_type = 'CREDIT' then trans_id end )
        over( order by trans_datetime, trans_id )

By default last_value considers all values, including null. It also has a windowing clause of:

  range between unbounded preceding and current row 

This isn't what you want. The case expression inside the last_value maps the current row to null. This is the final value in the ordered set (so far). So all the values for this are null. To overcome this, you need to skip back to the first non-null entry. You do this by adding the "ignore nulls" clause.

The complete query is then:

  select t.*,
           when trans_type = 'CREDIT' then
             last_value( case when trans_type = 'DEBIT' then trans_id end ) 
               ignore nulls over( order by trans_datetime, trans_id )
             last_value( case when trans_type = 'CREDIT' then trans_id end ) 
               ignore nulls over( order by trans_datetime, trans_id )
         end prev_id_other_trans_type
  from   transactions t
  order  by trans_datetime, trans_id;

This does the job. Yet it takes a bit of thinking to understand the SQL.

Let's look at other ways to see if they are easier to understand.

Model Clause

Ny Nordisk mode, Catwalk by Benjamin Suomela (CC-BY-2.5-DK)


No, not that kind of model!

In 10g, Oracle introduced the model clause. This enables you to create columns using complex formulas. This gives spreadsheet-like abilities in SQL!

You can use it here to do the logic above.

To do this, you need to supply three parts for the model. Dimension by states how you're going to uniquely identify each cell. Measures lists the columns you're returning. Rules tells Oracle how to create the values for the previous ID.

Dimension By

You need to supply this to tell Oracle how to identify each row and column in the results. i.e. a unique identifier.

Row_number(), ordered by transaction_datetime, trans_id does this. To help with switching credits to debits and the reverse, you need a bit more. So also put trans_type in the dimension by, giving:

  dimension by (  
    trans_type tp,   
    row_number() over (order by trans_datetime, trans_id) rn  


Here you just need all the columns in the table and the column we're creating, prev_id_other_type. You need to give this an initial value. It's a number, so zero works fine:

  measures ( trans_datetime, trans_type, trans_id, 0 prev_id_other_type )  


This is where the fun starts. You want to assign the last debit to credits and vice-versa. First, you need to state which cells you're providing values for. This is where the columns you defined in the dimension by come in.

To aid understanding, you can do this in two separate rules. One for CREDIT and one for DEBIT. So you have two assignment targets:

  prev_id_other_type['CREDIT', any] = <expression>,
  prev_id_other_type['DEBIT', any] = <expression>

The any keyword means you want to calculate the values for all the rows.

Now you need to figure out what goes in the expression. In words, this is:

"The most recent prior trans_id of the other type. If there are two transactions at the same time, take the one with the greatest trans_id."

You need the greatest trans_id, which is a clue to use max(). Next you need to state which rows you want the get this max from. For credits, this is the debits. You want to consider all these previous rows. You can do this with the slightly cryptic expression:

  rn <= cv()

CV() returns the current value of the dimension. This is rn. So you're saying look at all rows where the row number is less than or equal to the current. Remember that you defined rn as the order to look at rows (trans_datetime, trans_id). So gives you:

  prev_id_other_type['CREDIT', any] = max(trans_id) ['DEBIT', rn <= cv()]

You can do a similar thing for debits by switching CREDIT and DEBIT in the above.

But this isn't quite right. Transaction dates and ids are in different orders. So the previous row ordered by trans_id isn't necessarily the previous row ordered by trans_datetime.

To overcome this, you can use the last analytic function.

This enables you to return the trans_id of the rows ordered by date. You use it as follows:

  max(trans_id) keep ( dense_rank last order by trans_datetime, trans_id )

This gives a complete solution of:

  select * 
  from   transactions t  
    dimension by (   
      trans_type tp,   
      row_number() over (order by trans_datetime, trans_id) rn  
    measures ( trans_datetime, trans_type, trans_id, 0 prev_id_other_type )  
    rules (  
      prev_id_other_type['CREDIT', any] = 
        max(trans_id) keep (
          dense_rank last order by trans_datetime, trans_id
        ) ['DEBIT', rn <= cv()],
      prev_id_other_type['DEBIT', any] = 
        max(trans_id) keep (
          dense_rank last order by trans_datetime, trans_id
        ) ['CREDIT', rn <= cv()]
  order by trans_datetime, trans_id;

Again, this works. While I think it makes it more obvious you're returning credit ids for debits and vice-versa, exactly what this does is still unclear.

So let's look at the final method, pattern matching.

Pattern Matching (Match_Recognize)


12c brought the match_recognize clause. This enables you to look for patterns in your data using regular expressions.

As with the model clause you need to supply several parts. Let's take a bottom-up approach to using this.

First, you need to define the variables. You want one for each transaction type. This is as easy as stating which trans_type they are equal to:

     c as trans_type = 'CREDIT',
     d as trans_type = 'DEBIT' 

Next, you need to define the pattern you're looking for. You need any sequence of one or more occurrences of either variable. Here you need a regular expression. The | symbol is the choice operator. This matches the expression either side of it. So you want ( c | d ).

You want to match at least one of these. So place the + symbol after it. This gives the following pattern:

  pattern( ( c | d )+ )

You need to return all the rows in the table. So specify

  all rows per match

To finish, state which order you need to process the rows and how to calculate the prev_id_other_type column. As before, the order is simply trans_datetime, trans_id.

As with the model clause, you use the measures clause to define the new column. To "switch over" the transaction types, use the classifier function. This gives the name of the variable that matches the current row. So if a row's trans_type = 'CREDIT', this is C and its D for debits.

To get the previous trans_id of the other type, simply decode classifier. Use this to return the trans_id of the other variable:

  decode(classifier(), 'D', c.trans_id, d.trans_id)

This works because if the current row is a debit, c.trans_id keeps the value from the last credit transaction.

So your final pattern matching SQL is:

  select * 
  from   transactions
    match_recognize (
      order by trans_datetime, trans_id
      measures decode(classifier(), 'D', c.trans_id, d.trans_id) prev_id_other_trans_type
      all rows per match
      pattern( (c|d)+ )
        c as trans_type = 'CREDIT',
        d as trans_type = 'DEBIT' 
  order  by trans_datetime, trans_id;

Phew! That was hard work. You now have three different methods to choose from.

How do you decide which to use?

All need a bit of thinking to understand. Picking one over the other based on clarity comes down to personal preferences. In all cases, good commenting is essential.

Let's take an objective approach. Which is fastest?


All three methods only access the table once. So it comes down to how efficiently each technique can process the data.

To test this, I've loaded the table with 10,000 rows. I ran it on an Oracle Cloud Database (Extreme Performance Service) on version

Let's look at the times for three executions of each.



Run 1: Elapsed: 00:00:15.86
Run 2: Elapsed: 00:00:13.10
Run 3: Elapsed: 00:00:12.84

Mean: 13.93 seconds


Run 1: Elapsed: 00:00:38.73
Run 2: Elapsed: 00:00:35.80
Run 3: Elapsed: 00:00:40.88

Mean: 38.47 seconds

Match Recognize

Run 1: Elapsed: 00:00:12.13
Run 2: Elapsed: 00:00:12.30
Run 3: Elapsed: 00:00:12.21

Mean: 12.21 seconds

The model clause is the clear loser here. It takes around three times longer than the other two approaches. Pattern matching has a slight edge on standard analytics. So if you're on 12c, you may want to go for this method.

Want to try this out for yourself? Have a go on LiveSQL.

What do you think? Which approach do you like best? Are there any better ways to do this?

Please let us know in the comments!

Thanks to Stew Ashton and Rajeshwaran Jeyabal for providing the pattern matching and analytic solutions in this Ask Tom thread.

Chris Saxon

Developer Advocate

Chris Saxon is an Oracle Developer Advocate for SQL. His job is to help you get the best out of the Oracle Database and have fun with SQL!

To help you with this he blogs at All Things SQL. He also creates videos combining SQL and magic on YouTube at the The Magic of SQL.

If you have questions about working with Oracle Database technology, please reach out to him. You can do this via Twitter or on Ask Tom.

Previous Post

Creating Multiple Tables in a Single Transaction

Chris Saxon | 4 min read

Next Post

Why You Can Get ORA-00942 Errors with Flashback Query

Chris Saxon | 3 min read