Often you want to summarise time-series data into time intervals such as five, ten or fifteen minutes.
One way to do this is:
Get the minutes between the row’s datetime and a starting value
Normalize these to the interval you want
Add the normalized minutes to the start date
Group the result of this formula and count the rows
Using Oracle SQL, to find the minutes subtract the current datetime value from a start date. For example, the start of the day, which you can get by passing the column to trunc.
This returns the number of days between the times. Convert this to minutes by multiplying up by 1,440 (the number of minutes in a day).
To normalize these minutes to the time interval, use this formula:
floor ( minutes / time_interval ) * time_interval
Putting this all together gives this SQL statement:
with rws as ( select trunc ( datetime ) dy, trunc ( datetime, 'mi' ) mins, 5 / 1440 time_interval from time_data ), intervals as ( select dy + ( floor ( ( mins - dy ) / time_interval ) * time_interval ) start_datetime from rws ) select start_datetime, count(*) from intervals group by start_datetime order by start_datetime;
The expression trunc ( …, 'mi' )
removes seconds from the time. This enables you to use this solution for both date
and timestamp
values.
This is because trunc
always returns a date
value. If you’re working with timestamps
the difference between them is an interval
. This needs different logic to process. Thus using trunc
standardizes the logic whatever the input data types.
The named subquery is to aid readability and maintenance. If you prefer compact SQL, you can condense this all into one query. For example, this groups rows into 10-minute intervals:
select trunc ( datetime ) + ( floor ( ( trunc ( datetime, 'mi' ) - trunc ( datetime ) ) * 1440 / 10 ) * 10 / 1440 ) start_datetime, count (*) from time_data group by trunc ( datetime ) + ( floor ( ( trunc ( datetime, 'mi' ) - trunc ( datetime ) ) * 1440 / 10 ) * 10 / 1440 ) order by start_datetime;
You can use this principle to group rows into buckets of any size up to one day. Instead of hardcoding the duration to five, ten, or fifteen minutes, do the good thing and use bind variables:
with rws as ( select trunc ( datetime ) dy, trunc ( datetime, 'mi' ) mins, :duration_in_minutes / 1440 time_interval from time_data ), intervals as ( select dy + ( floor ( ( mins - dy ) / time_interval ) * time_interval ) start_datetime from rws ) select start_datetime, count(*) from intervals group by start_datetime order by start_datetime;
When calling this SQL, assign the length of each interval to :duration_in_minutes
. Note these assumptions when doing this:
Every interval starts at a multiple of :duration_in_minutes
from midnight
Either the table stores at least one row for every period or it’s OK to have gaps between times
The first could be an issue if you use intervals that are not a factor of 1,440 and the date range spans days. For example, if you want 7-minute periods, the last group of the first day will start at 23:55. The first interval of the second day also starts at midnight. So the last group of each day will be a 5-minute interval!
To overcome this, call trunc
with units greater than the period covered. Also ensure that the start and end dates are within the unit you choose. For example, if the time range is two days you can truncate to the month - provided all the intervals fall in the same month. If the start is on 31st August and the end 1st September, you'll need to choose a larger period such as quarters or years.
The last assumption is easy to overlook. While in busy systems it may be almost certain there are data recorded for every interval, slots could be missing due to outages or bugs. To be sure every period is in the output, you need a different approach.
One way to guarantee you get a row for every interval in the date range is to:
Generate a row for all the intervals you want
Outer join to these the times that are greater than or equal to the start and strictly less than the end of each interval
Which looks like:
with times as ( select to_date ( :start_time, 'DD-MON-YYYY HH24:MI' ) + ( ( level - 1 ) * :duration_in_minutes / 1440 ) dt from dual connect by level <= :number_of_intervals ), intervals as ( select dt start_date, dt + ( :duration_in_minutes / 1440 ) end_date from times ) select start_date, end_date, count ( datetime ) from intervals i left join time_data t on start_date <= datetime and datetime < end_date group by start_date, end_date order by start_date;
The best thing about this solution is it generalizes to intervals of any size. For example, 12 minutes, 90 minutes, or 3 days. As before, set :duration_in_minutes
to the number of minutes in the interval.
The query above accepts the start date as input. You may want to get the initial time from the data itself.
To do this, find the minimum datetime value first and use this as the driving point for the data generation:
with start_date as ( select min ( datetime ) start_date, :duration_in_minutes / 1440 time_interval from time_data ), intervals as ( select start_date + ( ( level - 1 ) * time_interval ) start_date, start_date + ( ( level ) * time_interval ) end_date from start_date connect by level <= :number_of_intervals ) select start_date, count ( datetime ) from intervals left join time_data on start_date <= datetime and datetime < end_date group by start_date order by start_date;
Joining to a generated list of times is a good general case solution. It’s flexible and – most importantly – guaranteed to return a row for every interval in the period.
The drawback is you need to join, and the optimizer can struggle to estimate the number of rows generated, leading to poor execution plans. If you’re 100% certain there will be at least one row in every interval the grouping method is more efficient.
Grouping on the source data alone is also valid if you want to omit any missing intervals!
Sometimes you may want to keep these gaps, but with a twist. Each interval should begin at the time of the next row, not a fixed multiple of the start time.
The solutions so far all assume that the start of each group should be a multiple of the initial time.
For example, if the times are 00:00, 00:07, 00:13, and 00:21, grouping into 10-minute intervals would have three groups starting at 00:00, 00:10, and 00:20. But starting each group from the next available time would produce two groups 00:00 – 00:10, and 00:13 – 00:23!
The challenge here is there could be any size gap between rows. It’s possible to solve this using recursive queries or the model
clause. But I prefer to use SQL pattern matching. This looks like:
select * from time_data match_recognize ( order by datetime measures init.datetime as interval_start, init.datetime + numtodsinterval ( :duration_in_minutes, 'minute' ) as interval_end, count(*) as rw# pattern ( init time_interval* ) define time_interval as datetime < init.datetime + numtodsinterval ( :duration_in_minutes, 'minute' ) );
To understand what’s going on here, look at the pattern
and define
clauses. The database walks through the rows to see which variables they match, working from left to right through the pattern.
For the first row, the database checks if it matches the first variable: init
. This variable is undefined - it's unlisted in the define
clause - so is always true. Thus the first row matches init
.
It then moves to the second row and second variable: time_interval
. This has an asterisk after it. Patterns are regular expressions, meaning time_interval
matches zero or more rows.
Thus the database adds all rows to the group where time_interval
is true. These are rows which are within :duration_in_minutes
of the first row in each group (init.datetime
). To aid readability, I've added a minute interval instead of converting the minutes to a fraction of a day (:duration_in_minutes / 1440
). Use whichever of these methods you prefer.
As soon as the database fetches a row :duration_in_minutes
or more after the start time_interval
is false. This completes the group. The database then starts searching for the pattern again, starting from the init
variable. This variable is always true, so this fetched row becomes the starting point for the next group.
This process repeats until the database has fetched all the rows in the data set.
SQL pattern matching makes it easy to solve other non-standard time grouping needs. For example, Oren Nakdimon shows how you can use this to combine events into groups of equal frequency. Or you may want to group rows that are within N-minutes of the last row.
Instead of grouping rows into fixed size intervals, you may want to combine all rows that fall within twenty minutes of the previous. This could lead to intervals of any size!
The SQL pattern matching method for this is:
select * from time_data match_recognize ( order by datetime measures init.datetime as interval_start, init.datetime + numtodsinterval ( :duration_in_minutes, 'minute' ) as interval_end, count(*) as rw# pattern ( init time_interval* ) define time_interval as datetime < prev ( datetime ) + numtodsinterval ( :duration_in_minutes, 'minute' ) );
This is like the previous match_recognize
solution. All that’s different is a tweak to the time_interval
definition. Instead of grouping rows that are within N-minutes of the first timestamp in the group, it checks the value of the previous row.
There are many ways to group rows into N-minute intervals with SQL. If you’re asked to do this it’s important define exactly how to combine rows and deal with missing data.
Often you’ll need a row for every interval, even when there are gaps in the data. Generating a row per interval, then outer joining your data to this ensures every time is present.
The downside of joining is it adds work to the query. If performance matters and you’re certain there are no gaps in your data, basic grouping is fastest. SQL pattern matching is also efficient, but more importantly enables you to use non-standard grouping logic.
If you'd like to try these examples out, you can get the scripts for this post on Live SQL.
Want to improve your SQL skills? Oracle Dev Gym has a vast library of SQL quizzes to help you learn all about Oracle Database.
Chris Saxon is an Oracle Developer Advocate for SQL. His job is to help you get the best out of the Oracle Database and have fun with SQL!
To help you with this he blogs at All Things SQL. He also creates videos combining SQL and magic on YouTube at the The Magic of SQL.
If you have questions about working with Oracle Database technology, please reach out to him. You can do this via Twitter or on Ask Tom.
Previous Post
Next Post