The Olympics is over for another year. But there's still plenty of time for SQL-style data wrangling of the results!
To do this, I've compiled a table of medal winners from Rio for each sport:
OLYMPIC_YEAR SPORT GENDER EVENT MEDAL NOC ATHLETE
2016 Archery M Men's Individual Gold KOR KU Bonchan
2016 Archery M Men's Individual Silver FRA VALLADONT Jean-Charles
2016 Archery M Men's Individual Bronze USA ELLISON Brady
2016 Archery W Women's Individual Gold KOR CHANG Hyejin
2016 Archery W Women's Individual Silver GER UNRUH Lisa
2016 Archery W Women's Individual Bronze KOR KI Bobae
This is great when looking for a specific result. But what everyone really wants to know how their country fared overall. To get this you need to convert the table above to the final medal table:
Country | Gold | Silver | Bronze |
United States | 46 | 37 | 38 |
Great Britain | 27 | 23 | 17 |
China | 26 | 18 | 26 |
Russia | 19 | 18 | 19 |
Germany | 17 | 10 | 15 |
To do this, you need to count the number of gold, silver and bronze rows for each country. Then create new columns to hold the results.
The question is, how?
This post will teach you. You'll also learn various row and column transformations with SQL including:
If you want to play along you can access the scripts in LiveSQL. Or you can nab the create table scripts at the bottom of this post.
Ready? Let's begin!
Oracle Database 11g introduced the pivot operator. This makes switching rows to columns easy. To use this you need three things:
The value in the new columns must be an aggregate. For example, count, sum, min, etc. Place a pivot clause containing these items after the table name, like so:
select * from table
pivot ( 3 for 1 in (2, 2, 2) );
So to create the final medal table from the raw data, you need to plug in:
Stick it all together and you get:
select * from olympic_medal_winners
pivot (
count(*) for medal in (
'Gold' gold, 'Silver' silver, 'Bronze' bronze
)
)
order by noc
fetch first 6 rows only;
OLYMPIC_YEAR SPORT GENDER EVENT NOC ATHLETE GOLD SILVER BRONZE
2016 Athletics M Men's 1500m ALG MAKHLOUFI Taoufik 0 1 0
2016 Athletics M Men's 800m ALG MAKHLOUFI Taoufik 0 1 0
2016 Hockey M Men ARG Argentina 1 0 0
2016 Judo W Women -48 kg ARG PARETO Paula 1 0 0
2016 Sailing X Nacra 17 Mixed ARG Carranza Saroli 1 0 0
2016 Sailing X Nacra 17 Mixed ARG Lange 1 0 0
Hmmm, that's not right! You wanted the total medals for each country. This is giving the results per athlete!
This is because Oracle adds an implicit group by for all the columns not in the pivot clause. To avoid this, use an inline view that selects just the columns you want in the results:
select * from (
select noc, medal from olympic_medal_winners
)
pivot (
count(*) for medal in (
'Gold' gold, 'Silver' silver, 'Bronze' bronze
)
)
order by 2 desc, 3 desc, 4 desc
fetch first 5 rows only;
NOC GOLD SILVER BRONZE
USA 47 40 40
CHN 31 19 29
GBR 30 26 19
RUS 21 19 19
GER 20 11 17
And voila!
This is looking promising. But it's still not right. China didn't finish second. Team GB did! And all countries have too many medals.
So what's going on?
Doubles events.
In these cases multiple people win a medal. And each person has their own entry in the winners table. So for sports like badminton, tennis and rowing there are multiple rows for some events. But the medal table is counted per event, not per person!
To overcome this, instead of counting all rows you need to get the number of different events. Do this by changing the count(*) to a count (distinct). The distinct contains the columns defining a particular event. In this case that's sport, gender and event.
Include these in inline view and update the pivot clause, giving:
select * from (
select noc, medal, sport, event, gender
from olympic_medal_winners
)
pivot (
count ( distinct sport ||'#'|| event ||'#'||gender )
for medal in (
'Gold' gold, 'Silver' silver, 'Bronze' bronze
)
)
order by 2 desc, 3 desc, 4 desc
fetch first 5 rows only;
NOC GOLD SILVER BRONZE
USA 46 37 38
GBR 27 23 17
CHN 26 18 26
RUS 19 18 19
GER 17 10 15
That's better!
Now you've got your medal table. But if you're a true data geek like me you'll want to include other things in the total. For example:
Including these in the pivot is easy. All you need to do is add more aggregate functions to your SQL!
For the count of sports per medal, this is count(distinct sport). For the athlete names you can use listagg().
select * from (
select noc, medal, sport, event, gender, athlete
from olympic_medal_winners
)
pivot (
count( distinct sport ||'#'|| event ||'#'|| gender ) medals,
count( distinct sport ) sports,
listagg( athlete, ',') within group (order by athlete) athletes
for medal in ( 'Gold' gold )
)
where gold_medals > 1
order by gold_medals, gold_sports, noc
fetch first 5 rows only;
NOC GOLD_MEDALS GOLD_SPORTS GOLD_ATHLETES
RSA 2 1 SEMENYA Caster,VAN NIEKERK Wayde
THA 2 1 SRISURAT Sukanya,TANASAN Sopita
BEL 2 2 THIAM Nafissatou,VAN AVERMAET Greg
DEN 2 2 BLUME Pernille,Denmark
GEO 2 2 KHINCHEGASHVILI Vladimer,TALAKHADZE Lasha
Note that for the athlete's name, you get an entry per medal won. So people like Michael Phelps and Jason Kenny who won multiple medals will appear several times!
To overcome this you can slap a regular expression around the gold_athletes column. For example:
regexp_replace(gold_athletes, '([^,]+)(,\1)+', '\1')
This has to go in the select clause. Not the pivot clause!
When pivoting multiple functions, be aware that Oracle prefixes each generated column with the alias you provide in the "in" clause. Without the alias Oracle uses the values from the source column. So with many aggregates the new columns will all have the same name. This leads to nasty
ORA-00918: column ambiguously defined
errors. So make sure you alias the functions!
The SQL above only shows countries with more than one gold medal. Note the position of the where clause. This is after the pivot. This rule applies even if you're filtering on columns in the original table. For example, if you want to find gold medals won by countries starting with D, stick the where at the end:
select * from (
select noc, medal, sport, event, gender, athlete
from olympic_medal_winners
)
pivot (
count( distinct sport ||'#'|| event ||'#'|| gender ) medals,
count( distinct sport ) sports,
listagg( athlete, ',') within group (order by athlete) athletes
for medal in ( 'Gold' gold )
)
where noc like 'D%'
order by gold_medals;
NOC GOLD_MEDALS GOLD_SPORTS GOLD_ATHLETES
DOM 0 0
DEN 2 2 BLUME Pernille,Denmark
Maybe you don't want to sum, count or otherwise aggregate anything. Perhaps you want a matrix, showing which sports each country won a medal in. Sports across the top, countries down the side. And an X in each cell where that country was a medalist in that sport:
Steps 1 and 2 are easy. But how do you get the X? Remember you have to use an aggregate. So the solution is simply: |
min('X')
Plug those values in and you get:
select * from (
select noc, sport
from olympic_medal_winners
)
pivot (
min('X') for sport in (
'Archery' as arc, 'Athletics' as ath, 'Hockey' as hoc,
'Judo' as jud, 'Sailing' as sai, 'Wrestling' as wre
)
)
order by noc
fetch first 7 rows only;
NOC ARC ATH HOC JUD SAI WRE
ALG X
ARG X X X
ARM X
AUS X X X
AUT X
AZE X X
BAH X
Note you don't have to list out all the sports. Just those you want in your matrix. Rows from other sports are excluded from the final table. This rule applies to all forms of pivot. Only rows which match values in the in list appear in the results.
So far so good. But there are a few issues you may hit when pivoting. The first is that this feature is only available in Oracle Database 11g and above. So if you're on earlier versions you need a different approach.
If pivot isn't available in your database you'll have to do it the old-school way: manually. For each column you want to create in the final results you need to:
This means you'll have several columns in the form:
aggregate_function ( case when pivoting_column = 'VALUE' then pivoted_value end )
For example, the old fashioned way to define columns for the medals table is:
count ( case when medal = 'Gold' then 1 end ) gold_medals,
count ( case when medal = 'Silver' then 1 end ) silver_medals,
count ( case when medal = 'Bronze' then 1 end ) bronze_medals
To split the values out, just add a group by for the columns you want to count each value by. The final SQL statement then looks like:
select noc,
count ( case when medal = 'Gold' then 1 end ) gold_medals,
count ( case when medal = 'Silver' then 1 end ) silver_medals,
count ( case when medal = 'Bronze' then 1 end ) bronze_medals
from olympic_medal_winners
group by noc
order by 2 desc, 3 desc, 4 desc
fetch first 5 rows only;
NOC GOLD_MEDALS SILVER_MEDALS BRONZE_MEDALS
USA 47 40 40
CHN 31 19 29
GBR 30 26 19
RUS 21 19 19
GER 20 11 17
Convoluted, isn't it? And to fix the double counting athletes problem you need an even more obscure:
count (
distinct case
when medal = 'Gold'
then sport ||'#'|| event ||'#'|| gender
end
)
I'm glad the pivot function exists now!
Another hurdle that often comes up is changing the column list dynamically. The Olympic committee has a habit of tweaking which sports are contested each time. So the matrix of sports to country winners may have different columns each time.
For the Olympics this happens at most every four years. So changing your SQL is low effort.
But this could become a pain if the values often change. Fortunately there are ways around this.
If you know the columns will change regularly, it may be best to go with dynamic SQL. With this you generate the pivot values on the fly. All you need is a SQL query providing these in a comma-separated list.
Listagg is perfect here. For example, to generate a list of all the contested sports, use:
select listagg('''' || sport || ''' as ' || sport, ',') within group (order by sport)
from ( select distinct sport from olympic_medal_winners );
You can then pop this into a dynamic SQL statement, such as:
declare
sql_stmt clob;
pivot_clause clob;
begin
select listagg('''' || sport || ''' as "' || sport || '"', ',') within group (order by sport)
into pivot_clause
from (select distinct sport from olympic_medal_winners);
sql_stmt :=
'select * from ( select noc, sport from olympic_medal_winners )
pivot ( count(*) for sport in (' || pivot_clause || ') )';
execute immediate sql_stmt;
end;
/
Like pivot, listagg is only available in 11g and up. So if you're stuck in the dark ages on an earlier version, check out Tom Kyte's stragg function.
This works. But it can be fiddly to write. Especially if you have a complex pivot. And the number of columns can change on each execution. Which makes fetching the results tricky.
On top of all this you've got an extra query to find the columns. So this could slow you down.
Fortunately there's a method that's far easier to write, fetch the results from and is a single query!
To use this, just specify XML after the pivot keyword. Then place a subquery in the values clause. And you're done!
This gives you the results as an XML document. Each column is in an item tag:
|
At this point the Olympic geeks among you may be shouting "But Algeria didn't win any medals in archery or gymnastics in 2016! Why is this giving them one in each?"
Count(*) returns how many rows there are. There's one row per country. So this will give every country at least one medal in each sport!
To avoid this you need to count a specific column. Crucially, this needs to be empty if a country has no medals in a given event. This is because count(column) only returns the number of non-null rows. So if a country has no medals, in a sport it will appear as zero. For example:
select * from (
select noc, sport, athlete
from olympic_medal_winners
)
pivot xml (
count ( athlete ) medal_winners for sport in (
select sport
from olympic_medal_winners
where sport like 'A%')
)
where rownum = 1;
NOC SPORT_XML
ALG <?xml version="1.0" encoding="UTF-8"?>
<PivotSet>
<item>
<column name="SPORT">Archery</column>
<column name="MEDAL_WINNERS">0</column>
</item>
<item>
<column name="SPORT">Artistic Gymnastics</column>
<column name="MEDAL_WINNERS">0</column>
</item>
<item>
<column name="SPORT">Athletics</column>
<column name="MEDAL_WINNERS">2</column>
</item>
</PivotSet>
If you want the XML to include all the sports you can use the any keyword instead:
pivot xml ( count(*) medal_winners for sport in ( any ) )
The great part about this the results will automatically update if the query output changes. So when Armchair Sitting becomes an Olympic sport the XML will include this as an element :)
The terrible part is you get the output in XML! So you're going to have to parse this to extract the values...
But there is another way! Anton Scheffer put together a solution using custom types. This enables you to pivot the results of a query. For example:
select *
from table (
pivot (
'select noc,
medal,
count ( distinct sport ||''#''|| event ||''#''|| gender ) count_medal
from olympic_medal_winners
group by noc, medal'
)
);
This is really cool. But it comes with notable parsing overheads. And if your query creates a large number of columns the SQL it generates may be too large for the variables it uses!
Use with care.
So far we've been working with the raw event results. But what if you've only got the final medal table, like this:
NOC GOLD_MEDALS SILVER_MEDALS BRONZE_MEDALS
USA 46 37 38
GBR 27 23 17
CHN 26 18 26
RUS 19 18 19
GER 17 10 15
...
And you want to write SQL converting the gold, silver and bronze columns into rows?
It's time for the unpivot!
Unpivot works in a similar way to pivot. You need to supply three things:
So for the medal table this is:
select * from olympic_medal_tables
unpivot (
medal_count for medal_colour in (
gold_medals as 'GOLD',
silver_medals as 'SILVER',
bronze_medals as 'BRONZE'
)
)
order by noc
fetch first 6 rows only;
NOC MEDAL_COLOUR MEDAL_COUNT
ALG GOLD 0
ALG BRONZE 0
ALG SILVER 2
ARG GOLD 3
ARG BRONZE 0
ARG SILVER 1
Nice and easy.
But what if there are multiple sets of columns you want to unpivot? For example, medals won and distinct sports they were won in? e.g.
NOC GOLD_MEDALS GOLD_SPORTS SILVER_MEDALS SILVER_SPORTS BRONZE_MEDALS BRONZE_SPORTS
USA 46 14 37 13 38 17
GBR 27 16 23 15 17 11
CHN 26 10 18 11 26 13
RUS 19 10 18 11 19 11
GER 17 9 10 8 15 13
As with pivot, this is easy. If a little cumbersome to write!
First name the two columns you want in the output. Then specify each pair of columns that provide the values for the rows:
select * from olympic_medal_tables
unpivot (
( medal_count, sport_count ) for medal_colour in (
( gold_medals, gold_sports ) as 'GOLD',
( silver_medals, silver_sports ) as 'SILVER',
( bronze_medals, bronze_sports ) as 'BRONZE'
)
)
fetch first 9 rows only;
NOC MEDAL_COLOUR MEDAL_COUNT SPORT_COUNT
USA GOLD 46 14
USA SILVER 37 13
USA BRONZE 38 17
GBR GOLD 27 16
GBR SILVER 23 15
GBR BRONZE 17 11
CHN GOLD 26 10
CHN SILVER 18 11
CHN BRONZE 26 13
Note the aliases for the column pairings. Without these medal column contains the concatenation of the source column names. For example GOLD_MEDALS_GOLD_SPORTS.
Unfortunately you can't split the values back out. So if all you have is the final medal table, you can't go back and get the rows by sport, event and so on.
Unless your source columns are a comma separated list. Say, the names of the athletes who won each medal!
In this case, you can unpivot to get the medals and athlete lists as rows:
select * from olympic_medal_tables
unpivot (
( medal_count, athletes ) for medal_colour in (
( gold_medals, gold_athletes ) as 'GOLD',
( silver_medals, silver_athletes ) as 'SILVER',
( bronze_medals, bronze_athletes ) as 'BRONZE'
)
)
where medal_colour = 'GOLD'
and medal_count = 2
order by noc
fetch first 3 rows only;
NOC MEDAL_COLOUR MEDAL_COUNT ATHLETES
BEL GOLD 2 THIAM Nafissatou,VAN AVERMAET Greg
DEN GOLD 2 BLUME Pernille,Denmark
GEO GOLD 2 KHINCHEGASHVILI Vladimer,TALAKHADZE Lasha
You can now split the names into separate rows. There are several ways to do this. Here I've used Stew Ashton's XQuery tokenizing solution:
with rws as (
select * from olympic_medal_tables
unpivot (
( medal_count, athletes ) for medal_colour in (
( gold_medals, gold_athletes ) as 'GOLD',
( silver_medals, silver_athletes ) as 'SILVER',
( bronze_medals, bronze_athletes ) as 'BRONZE'
)
)
where medal_colour = 'GOLD'
and medal_count = 2
)
select noc, athlete
from rws, xmltable (
'if (contains($X,",")) then ora:tokenize($X,"\,") else $X'
passing athletes as X
columns athlete varchar2(4000) path '.'
)
order by 1, 2
fetch first 6 rows only;
NOC ATHLETE
BEL THIAM Nafissatou
BEL VAN AVERMAET Greg
DEN BLUME Pernille
DEN Denmark
GEO KHINCHEGASHVILI Vladimer
GEO TALAKHADZE Lasha
We've seen how to convert rows into columns. And columns back into rows. But what if you want to both at the same time?
Let's revisit the medals won by each country in each sport table:
NOC ATH GYM CYC BOX SAI
BRA 1 3 0 1 2
CHN 6 2 2 4 1
DEN 1 0 2 0 3
ESP 2 0 0 0 0
ETH 8 0 0 0 0
GRE 1 1 0 0 2
Your boss has asked you to switch it over, so countries are across the top and sports down the side. Like so:
SPORT BRA CHN DEN ESP ETH GRE
Athletics 1 6 1 2 8 1
Artistic Gym 3 2 0 0 0 1
Boxing 1 4 0 0 0 0
Sailing 2 1 3 0 0 2
Track Cycling 0 2 2 0 0 0
Aka a transpose.
But there isn't an in-built transpose function in SQL. So what's the solution?
Pivot and unpivot!
You can chain these together like so:
select * from olympic_country_sport_medals
pivot (
sum(ath) ath, sum(box) box, sum(gym) gym, sum(sai) sai, sum(cyc) cyc
for noc in (
'BRA' BRA, 'CHN' CHN, 'DEN' DEN, 'ESP' ESP, 'ETH' ETH, 'GRE' GRE
)
)
unpivot (
( BRA, CHN, DEN, ESP, ETH, GRE ) for sport in (
(BRA_ATH, CHN_ATH, DEN_ATH, ESP_ATH, ETH_ATH, GRE_ATH) as 'Athletics',
(BRA_GYM, CHN_GYM, DEN_GYM, ESP_GYM, ETH_GYM, GRE_GYM) as 'Artistic Gym',
(BRA_BOX, CHN_BOX, DEN_BOX, ESP_BOX, ETH_BOX, GRE_BOX) as 'Boxing',
(BRA_SAI, CHN_SAI, DEN_SAI, ESP_SAI, ETH_SAI, GRE_SAI) as 'Sailing',
(BRA_CYC, CHN_CYC, DEN_CYC, ESP_CYC, ETH_CYC, GRE_CYC) as 'Track Cycling'
)
);
SPORT BRA CHN DEN ESP ETH GRE
Athletics 1 6 1 2 8 1
Artistic Gym 3 2 0 0 0 1
Boxing 1 4 0 0 0 0
Sailing 2 1 3 0 0 2
Track Cycling 0 2 2 0 0 0
Hmmm, that looks painful. For each extra column you want to transpose, you need to add a sum to the pivot. Then a big list of countries in unpivot's sport list. This is tedious.
Fortunately there is an easier way. Unpivot first, then pivot!
Do this and your SQL is:
select * from olympic_country_sport_medals
unpivot (
(medals) for sport in ( ath, box, gym, sai, cyc )
)
pivot (
sum(medals) for noc in (
'BRA' BRA, 'CHN' CHN, 'DEN' DEN, 'ESP' ESP, 'ETH' ETH, 'GRE' GRE
)
);
SPORT BRA CHN DEN ESP ETH GRE
CYC 0 2 2 0 0 0
BOX 1 4 0 0 0 0
GYM 3 2 0 0 0 1
SAI 2 1 3 0 0 2
ATH 1 6 1 2 8 1
Much better :)
Adding more sports is simply a matter of adding a value to the unpivot. And for more countries you just extend the pivot country list.
You've seen how the SQL pivot and unpivot operators can enable powerful data transformations. Between these two you can swap your rows and columns over however you like!
Just remember the three step process for each. For pivot it's the column(s) you're pivoting on, the values defining the new columns and the functions giving the values for these.
With unpivot you need the column which states the source of the values, what these sources are and the list of columns you want to convert to rows.
All the examples so far are with a single table. Often you'll want to use pivots with joins. You can pivot or unpivot the result of a join, for example:
select * from t1 join t2 on ...
pivot ( ... )
Or
select * from t1 join t2 on ...
unpivot ( ... )
So you can embed these in much larger SQL queries.
If you want the scripts in this post, head over to LiveSQL. There you can find and run the demos on a sample of the data. Or you can use this table:
create table olympic_medal_winners (
olympic_year int,
sport varchar2( 30 ),
gender varchar2( 1 ),
event varchar2( 128 ),
medal varchar2( 10 ),
noc varchar2( 3 ),
athlete varchar2( 128 )
);
You can then create the tables for the unpivot and transpose examples from this like so:
create table olympic_medal_tables as
select * from (
select noc, medal, sport, event, gender
from olympic_medal_winners
)
pivot ( count(distinct sport ||'#'|| event ||'#'||gender ) for medal in (
'Gold' gold_medals, 'Silver' silver_medals, 'Bronze' bronze_medals
))
order by 2 desc, 3 desc, 4 desc;
-- Medal table with distinct sports
drop table olympic_medal_tables purge;
create table olympic_medal_tables as
select * from (
select noc, medal, sport, event, gender
from olympic_medal_winners
)
pivot ( count(distinct sport ||'#'|| event ||'#'||gender ) medals,
count(distinct sport) sports
for medal in (
'Gold' gold, 'Silver' silver, 'Bronze' bronze
))
order by 2 desc, 4 desc, 6 desc;
-- Medal table with athlete names
drop table olympic_medal_tables purge;
create table olympic_medal_tables as
select * from (
select noc, medal, sport, event, gender, athlete
from olympic_medal_winners
)
pivot ( count(distinct sport ||'#'|| event ||'#'||gender ) medals,
listagg(athlete, ',') within group (order by athlete) athletes
for medal in (
'Gold' gold, 'Silver' silver, 'Bronze' bronze
))
order by 2 desc, 4 desc, 6 desc;
create table olympic_country_sport_medals as
select * from (
select noc, sport
from olympic_medal_winners
)
pivot (count(sport) for sport in (
'Athletics' as ath, 'Artistic Gymnastics' as gym, 'Cycling Track' as cyc,
'Boxing' as box, 'Sailing' as sai
)
)
order by 1;
Check the docs for more examples of how to use pivot and unpivot. Or you can head over to Ask Tom where Connor and I can help you write your SQL. Just remember to provide a test case!
Learn more SQL tricks in Databases for Developers: Next Level
Note: the examples use the "fetch first" syntax to get the first N rows. This is only available in Oracle Database 12c and above. If you want to do top-N queries in releases before this, you need to use the rownum trick.
UPDATE 25 Feb, 2019: Fixed link to CSV data file
UPDATE 28 Jul, 2020: Added link to Databases for Developers
UPDATE 1 Apr, 2022: Updating formatting
UPDATE 22 July, 2022: Updated headings
UPDATE 16 Dec, 2022: Updated headings, adding alt text
Chris Saxon is an Oracle Developer Advocate for SQL. His job is to help you get the best out of the Oracle Database and have fun with SQL!
To help you with this he blogs at All Things SQL. He also creates videos combining SQL and magic on YouTube at the The Magic of SQL.
If you have questions about working with Oracle Database technology, please reach out to him. You can do this via Twitter or on Ask Tom.