Many apps show user streaks – how many consecutive days they logged in, their winning streaks or they completed a task. On SQuizL we display your longest and current series of successful guesses.
The question is: how do you find these sequences using SQL?
To celebrate #JoelKallmanDay let's find out!
For example, say users 42 and 3141 both have four separate answer chains:
To find these groups you need to:
This is a job that SQL pattern matching is perfect for.
You can use this statement to find the streaks for each user:
This works by:
partition by user_id)
order by answer_date)
patternfinds any row (
init) followed by zero or more
consecutiveis rows where the
answer_datefor the current row equals the
answer_datefor the previous row plus one (day).
measuresclause defines the output columns. These are the number of rows in each group (
count(*) streak) and it’s start date (
min ( answer_date ) grp_start)
Note this assumes a maximum of one answer per day. If you can have multiple events per day, de-duplicate these in a subquery first.
At this point you have a row per streak per user. Grouping by user and returning the
max ( streak ) gives the length of the longest sequence.
It’s less obvious how to find when this chain started and the current streak’s length. The
last functions – perhaps better known as the
keep clause – are ideal here.
This is an extension to aggregate functions which looks like:
<aggregate function> keep ( dense_rank [ first | last ] order by <expr> )
order by <expr>)
first) or final (
last) values in the ordered group
So to find the longest chain you can:
min) or last (
answer_datefor these rows
The choice of
max matters when there’s a tie for the first or last value. Remember user 3141 has two separate three-day streaks. Using
min returns the start date of the oldest,
max the start date of the newest.
To find the start of the oldest, longest streak use:
min ( grp_start ) keep ( dense_rank last order by streak )
You can use a similar approach to find the length of the current series. These are the rows with the most recent
grp_start. Instead of getting the last of an ascending order, let’s get the first of a descending sequence:
firstvalue for this
max ( streak )
max ( streak ) keep ( dense_rank first order by grp_start desc )
Conceptually taking the last rows of an ascending sort or the first values of a descending order are the same. If the sort columns include
null, check whether you want these at the top or bottom of the data set. You can control this with the
nulls first|last clause if necessary.
Plug these into the pattern matching query above to give this final statement:
And viola: you have the start and length of the longest consecutive sequence, along with the days in the most recent streak!
For more details on using SQL pattern matching, here’s my slides on using this to find patterns in your data:
You can run the queries to find streaks in Live SQL or build the tables with this script:
Want to test your SQL syntax knowledge? Take SQuizL, a daily free guess the SQL statement quiz.
Chris Saxon is an Oracle Developer Advocate for SQL. His job is to help you get the best out of the Oracle Database and have fun with SQL!