How to group rows into 5, 10 or 15-minute intervals with Oracle SQL

August 5, 2022 | 9 minute read
Chris Saxon
Developer Advocate
Text Size 100%:

Often you want to summarise time-series data into time intervals such as five, ten or fifteen minutes.

One way to do this is:

Using Oracle SQL, to find the minutes subtract the current datetime value from a start date. For example, the start of the day, which you can get by passing the column to trunc.

This returns the number of days between the times. Convert this to minutes by multiplying up by 1,440 (the number of minutes in a day).

To normalize these minutes to the time interval, use this formula:

floor ( minutes / time_interval ) * time_interval

Wall clocks
Photo courtesy of Gratisography

Count rows in 5-minute intervals

Putting this all together gives this SQL statement:

with rws as (
  select trunc ( datetime ) dy,
         trunc ( datetime, 'mi' ) mins,
         5 / 1440 time_interval
  from   time_data
), intervals as (
  select dy + (
           floor ( ( mins - dy ) / time_interval ) * time_interval
         ) start_datetime
  from   rws
)
  select start_datetime, count(*) from intervals
  group  by start_datetime
  order  by start_datetime;

The expression trunc ( …, 'mi' ) removes seconds from the time. This enables you to use this solution for both date and timestamp values.

This is because trunc always returns a date value. If you’re working with timestamps the difference between them is an interval. This needs different logic to process. Thus using trunc standardizes the logic whatever the input data types.

The named subquery is to aid readability and maintenance. If you prefer compact SQL, you can condense this all into one query. For example, this groups rows into 10-minute intervals:

select trunc ( datetime ) + (
         floor ( 
           ( trunc ( datetime, 'mi' ) - trunc ( datetime ) ) 
             * 1440 / 10 
         ) * 10 / 1440
       ) start_datetime,
       count (*)
from   time_data
group  by trunc ( datetime ) 
            + ( floor ( 
                  ( trunc ( datetime, 'mi' ) - trunc ( datetime ) ) 
                    * 1440 / 10 
                ) * 10 / 1440 
            )
order  by start_datetime;

You can use this principle to group rows into buckets of any size up to one day. Instead of hardcoding the duration to five, ten, or fifteen minutes, do the good thing and use bind variables:

with rws as (
  select trunc ( datetime ) dy,
         trunc ( datetime, 'mi' ) mins,
         :duration_in_minutes / 1440 time_interval
  from   time_data
), intervals as (
  select dy + (
           floor ( ( mins - dy ) / time_interval ) * time_interval
         ) start_datetime
  from   rws
)
  select start_datetime, count(*) from intervals
  group  by start_datetime
  order  by start_datetime;

When calling this SQL, assign the length of each interval to :duration_in_minutes. Note these assumptions when doing this:

  • Every interval starts at a multiple of :duration_in_minutes from midnight

  • Either the table stores at least one row for every period or it’s OK to have gaps between times

The first could be an issue if you use intervals that are not a factor of 1,440 and the date range spans days. For example, if you want 7-minute periods, the last group of the first day will start at 23:55. The first interval of the second day also starts at midnight. So the last group of each day will be a 5-minute interval!

To overcome this, call trunc with units greater than the period covered. Also ensure that the start and end dates are within the unit you choose. For example, if the time range is two days you can truncate to the month - provided all the intervals fall in the same month. If the start is on 31st August and the end 1st September, you'll need to choose a larger period such as quarters or years.

The last assumption is easy to overlook. While in busy systems it may be almost certain there are data recorded for every interval, slots could be missing due to outages or bugs. To be sure every period is in the output, you need a different approach.

Vintage alarm clock
Photo courtesy of Gratisography

Group rows into 10-minute buckets with sparse data

One way to guarantee you get a row for every interval in the date range is to:

Which looks like:

with times as (
  select to_date ( :start_time, 'DD-MON-YYYY HH24:MI' )
           + ( ( level - 1 ) * :duration_in_minutes / 1440 ) dt
  from   dual
  connect by level <= :number_of_intervals
), intervals as (
  select dt start_date,
         dt + ( :duration_in_minutes / 1440 ) end_date
  from   times
)
  select start_date, end_date, count ( datetime )
  from   intervals i
  left  join time_data t
  on     start_date <= datetime
  and    datetime < end_date
  group  by start_date, end_date
  order  by start_date;

The best thing about this solution is it generalizes to intervals of any size. For example, 12 minutes, 90 minutes, or 3 days. As before, set :duration_in_minutes to the number of minutes in the interval.

The query above accepts the start date as input. You may want to get the initial time from the data itself.

To do this, find the minimum datetime value first and use this as the driving point for the data generation:

with start_date as (
  select min ( datetime ) start_date,
         :duration_in_minutes / 1440 time_interval
  from   time_data
), intervals as (
  select start_date + ( ( level - 1 ) * time_interval ) start_date,
         start_date + ( ( level ) * time_interval ) end_date
  from   start_date
  connect by level <= :number_of_intervals
)
  select start_date, count ( datetime )
  from   intervals
  left  join time_data
  on     start_date <= datetime
  and    datetime < end_date
  group  by start_date
  order  by start_date;

Joining to a generated list of times is a good general case solution. It’s flexible and – most importantly – guaranteed to return a row for every interval in the period.

The drawback is you need to join, and the optimizer can struggle to estimate the number of rows generated, leading to poor execution plans. If you’re 100% certain there will be at least one row in every interval the grouping method is more efficient.

Grouping on the source data alone is also valid if you want to omit any missing intervals!

Sometimes you may want to keep these gaps, but with a twist. Each interval should begin at the time of the next row, not a fixed multiple of the start time.

Combine rows into 15-minute intervals with variable gaps

The solutions so far all assume that the start of each group should be a multiple of the initial time.

For example, if the times are 00:00, 00:07, 00:13, and 00:21, grouping into 10-minute intervals would have three groups starting at 00:00, 00:10, and 00:20. But starting each group from the next available time would produce two groups 00:00 – 00:10, and 00:13 – 00:23!

The challenge here is there could be any size gap between rows. It’s possible to solve this using recursive queries or the model clause. But I prefer to use SQL pattern matching. This looks like:

select *
from   time_data
  match_recognize (
    order by datetime
    measures
      init.datetime as interval_start,
      init.datetime + numtodsinterval (
        :duration_in_minutes, 'minute' 
      ) as interval_end,
      count(*) as rw#
    pattern ( init time_interval* )
    define
      time_interval as datetime < init.datetime 
        + numtodsinterval ( 
          :duration_in_minutes, 'minute' 
        )
  );

To understand what’s going on here, look at the pattern and define clauses. The database walks through the rows to see which variables they match, working from left to right through the pattern.

  • For the first row, the database checks if it matches the first variable: init. This variable is undefined - it's unlisted in the define clause - so is always true. Thus the first row matches init.

  • It then moves to the second row and second variable: time_interval. This has an asterisk after it. Patterns are regular expressions, meaning time_interval matches zero or more rows.

  • Thus the database adds all rows to the group where time_interval is true. These are rows which are within :duration_in_minutes of the first row in each group (init.datetime). To aid readability, I've added a minute interval instead of converting the minutes to a fraction of a day (:duration_in_minutes / 1440). Use whichever of these methods you prefer.

  • As soon as the database fetches a row :duration_in_minutes or more after the start time_interval is false. This completes the group. The database then starts searching for the pattern again, starting from the init variable. This variable is always true, so this fetched row becomes the starting point for the next group.

This process repeats until the database has fetched all the rows in the data set.

SQL pattern matching makes it easy to solve other non-standard time grouping needs. For example, Oren Nakdimon shows how you can use this to combine events into groups of equal frequency. Or you may want to group rows that are within N-minutes of the last row.

Large station clock
Photo courtesy of Gratisography

Group all rows that fall within 20 minutes of the previous

Instead of grouping rows into fixed size intervals, you may want to combine all rows that fall within twenty minutes of the previous. This could lead to intervals of any size!

The SQL pattern matching method for this is:

select *
from   time_data
  match_recognize (
    order by datetime
    measures
      init.datetime as interval_start,
      init.datetime + numtodsinterval ( 
        :duration_in_minutes, 'minute' 
      ) as interval_end,
      count(*) as rw#
    pattern ( init time_interval* )
    define
      time_interval as datetime < prev ( datetime ) 
        + numtodsinterval ( 
          :duration_in_minutes, 'minute' 
        )
  );

This is like the previous match_recognize solution. All that’s different is a tweak to the time_interval definition. Instead of grouping rows that are within N-minutes of the first timestamp in the group, it checks the value of the previous row.

Summary

There are many ways to group rows into N-minute intervals with SQL. If you’re asked to do this it’s important define exactly how to combine rows and deal with missing data.

Often you’ll need a row for every interval, even when there are gaps in the data. Generating a row per interval, then outer joining your data to this ensures every time is present.

The downside of joining is it adds work to the query. If performance matters and you’re certain there are no gaps in your data, basic grouping is fastest. SQL pattern matching is also efficient, but more importantly enables you to use non-standard grouping logic.

If you'd like to try these examples out, you can get the scripts for this post on Live SQL.

Want to improve your SQL skills? Oracle Dev Gym has a vast library of SQL quizzes to help you learn all about Oracle Database.

Chris Saxon

Developer Advocate

Chris Saxon is an Oracle Developer Advocate for SQL. His job is to help you get the best out of the Oracle Database and have fun with SQL!

To help you with this he blogs at All Things SQL. He also creates videos combining SQL and magic on YouTube at the The Magic of SQL.

If you have questions about working with Oracle Database technology, please reach out to him. You can do this via Twitter or on Ask Tom.


Previous Post

New quizzes page, quiz author, and daily featured quizzes on Oracle Dev Gym

Chris Saxon | 4 min read

Next Post


Avoid writing SQL inside loops for fast database code #JoelKallmanDay

Chris Saxon | 4 min read