Analyzing Formula 1 JSON Data with Oracle SQL/JSON Path Expressions
As part of Oracle JSON Insights, this post show how Oracle AI Database lets you query JSON data directly from SQL using SQL/JSON path expressions. Rather than flattening JSON into relational tables first, you can extract fields, filter documents, project columns, and even perform type-aware comparisons directly from JSON.
In this post we’ll use SQL/JSON path expressions to analyze a Formula 1 stint-level dataset.
The source data is available on GitHub as final_data.csv and has been loaded into an Oracle JSON collection table named f1_stints.
Each row contains one JSON document stored in the data column.
Example document:
{
"bestPreRaceTime": 80.452,
"eventYear": 2021,
"bestLapTimeIsFrom": "Qualifying",
"RoundNumber": 3,
"Stint": 4,
"EventName": "Portuguese Grand Prix",
"Position": 17,
"Team": "Haas F1 Team",
"fuel_bias": 86.35908974358976,
"raceStintsNums": 4,
"deg_bias": 85.33125917911491,
"Rainfall": 0,
"StintLen": 32,
"TyreAge": 1,
"meanTrackTemp": 38.328125,
"meanAirTemp": 19.184375,
"_id": "6a35c694bc11f1a36a82abe6",
"CircuitLength": 4.684,
"": 3348,
"designedLaps": 66,
"lapNumberAtBeginingOfStint": 33,
"GridPosition": 19,
"meanHumid": 40.721875,
"fuel_slope": -0.05593589743589682,
"Compound": "HARD",
"Driver": "MSC",
"deg_slope": -0.018910302945693302
}
Each document describes one race stint, including:
- Event and season
- Driver and team
- Tyre compound
- Stint number
- Grid and finishing positions
- Stint length
- Weather conditions
- Track temperature
- Tyre degradation metrics
We’ll progressively build four SQL examples:
- Simple field access
- Document-level predicates
- Column projection with
JSON_TABLE - JSON item methods
The goal is to answer useful Formula 1 questions while introducing the SQL/JSON path syntax naturally.
Query 1 — Simple Field Access
Let’s begin with the simplest use of SQL/JSON.
Suppose we want to answer:
Which tyre compounds did race winners start on?
Each JSON document contains:
- Compound
- Stint number
- Final finishing position
We can extract those values using JSON_VALUE and then let SQL perform the aggregation.
WITH opening_stints AS (
SELECT
JSON_VALUE(t.data, '$.Compound' RETURNING VARCHAR2(30)) AS compound,
JSON_VALUE(t.data, '$.Position' RETURNING NUMBER) AS finish_position,
JSON_VALUE(t.data, '$.Stint' RETURNING NUMBER) AS stint_number
FROM f1_stints t
)
SELECT
compound,
COUNT(*) AS winner_first_stints
FROM opening_stints
WHERE finish_position = 1
AND stint_number = 1
GROUP BY compound
ORDER BY winner_first_stints DESC, compound;
The JSON paths are intentionally simple:
$.Compound
$.Position
$.Stint
The $ symbol represents the root of the JSON document, and each path simply accesses a field by name.
Result
| Compound | Winner first stints |
|---|---|
| MEDIUM | 34 |
| SOFT | 27 |
| ULTRASOFT | 8 |
| SUPERSOFT | 6 |
| INTERMEDIATE | 5 |
| WET | 2 |
| HYPERSOFT | 2 |
Even this basic example provides an interesting racing insight.
Across this dataset, race winners most often started on MEDIUM tyres rather than the softest available compound.
The important takeaway is that SQL/JSON path expressions don’t have to be complicated. Often they’re simply the bridge that lets SQL work naturally with semi-structured JSON data.
Query 2 — Document-Level Predicates
In the previous query we extracted values first and filtered afterwards.
Sometimes it’s cleaner to filter the JSON document directly.
Let’s ask:
Which race winners started outside the top five?
We’re looking for documents where:
- Position = 1
- Stint = 1
- GridPosition > 5
This is a perfect use case for JSON_EXISTS.
SELECT
JSON_VALUE(t.data, '$.eventYear' RETURNING NUMBER) AS event_year,
JSON_VALUE(t.data, '$.EventName' RETURNING VARCHAR2(100)) AS event_name,
JSON_VALUE(t.data, '$.Driver' RETURNING VARCHAR2(10)) AS driver,
JSON_VALUE(t.data, '$.GridPosition' RETURNING NUMBER) AS grid_position,
JSON_VALUE(t.data, '$.Compound' RETURNING VARCHAR2(30)) AS compound,
JSON_VALUE(t.data, '$.StintLen' RETURNING NUMBER) AS first_stint_laps
FROM f1_stints t
WHERE JSON_EXISTS(
t.data,
'$?(@.Position == 1 && @.Stint == 1 && @.GridPosition > 5)'
)
ORDER BY event_year, event_name;
The key expression is the JSON path predicate:
$?(@.Position == 1 && @.Stint == 1 && @.GridPosition > 5)
Inside the predicate, @ refers to the current JSON document.
Oracle evaluates the predicate and keeps only documents that satisfy all three conditions.
Result
| Year | Race | Driver | Grid | Compound | First stint laps |
|---|---|---|---|---|---|
| 2018 | Chinese Grand Prix | RIC | 6 | ULTRASOFT | 16 |
| 2018 | German Grand Prix | HAM | 14 | SOFT | 41 |
| 2020 | Italian Grand Prix | GAS | 10 | SOFT | 19 |
| 2020 | Turkish Grand Prix | HAM | 6 | WET | 8 |
| 2021 | Azerbaijan Grand Prix | PER | 6 | SOFT | 13 |
| 2021 | Hungarian Grand Prix | OCO | 8 | INTERMEDIATE | 2 |
| 2021 | São Paulo Grand Prix | HAM | 10 | MEDIUM | 7 |
This is where SQL/JSON starts to feel more analytical.
Instead of extracting values first, we’re describing an entire racing scenario and asking Oracle Database to return only the matching JSON documents.
Query 3 — Projecting JSON with JSON_TABLE
When a query needs multiple JSON fields simultaneously, JSON_TABLE provides a much cleaner approach.
It projects JSON into relational columns so the remainder of the query looks like ordinary SQL.
Let’s compare tyre degradation by compound.
Our question is:
Across dry stints of at least 10 laps, which tyre compounds have the highest average degradation slope?
SELECT
jt.compound,
COUNT(*) AS stint_count,
ROUND(AVG(jt.deg_slope), 4) AS avg_deg_slope,
ROUND(STDDEV(jt.deg_slope), 4) AS deg_slope_stddev,
ROUND(MIN(jt.deg_slope), 4) AS best_case,
ROUND(MAX(jt.deg_slope), 4) AS worst_case
FROM f1_stints t,
JSON_TABLE(t.data, '$'
COLUMNS (
compound VARCHAR2(20) PATH '$.Compound',
deg_slope NUMBER PATH '$.deg_slope',
stint_len NUMBER PATH '$.StintLen',
rainfall NUMBER PATH '$.Rainfall'
)
) jt
WHERE
jt.rainfall = 0
AND jt.deg_slope IS NOT NULL
AND jt.stint_len >= 10
AND jt.compound IN (
'SOFT',
'MEDIUM',
'HARD',
'SUPERSOFT',
'ULTRASOFT',
'HYPERSOFT'
)
GROUP BY jt.compound
ORDER BY avg_deg_slope DESC;
The important part is the projection:
JSON_TABLE(t.data, '$'
COLUMNS (
compound VARCHAR2(20) PATH '$.Compound',
deg_slope NUMBER PATH '$.deg_slope',
stint_len NUMBER PATH '$.StintLen',
rainfall NUMBER PATH '$.Rainfall'
)
)
Once projected, the remainder of the query is simply relational SQL using functions like:
AVGSTDDEVMINMAX
Result
| Compound | Stints | Avg deg slope | Stddev |
|---|---|---|---|
| HARD | 820 | -0.0204 | 0.1335 |
| ULTRASOFT | 151 | -0.0311 | 0.1370 |
| MEDIUM | 1083 | -0.0325 | 0.2345 |
| SUPERSOFT | 178 | -0.0459 | 0.2456 |
| SOFT | 874 | -0.0495 | 0.4904 |
| HYPERSOFT | 32 | -0.1902 | 0.7377 |
The HARD compound shows the highest average degradation slope in this filtered sample, while HYPERSOFT has both the lowest average and the smallest sample size.
The standard deviation is equally informative.
SOFT and HYPERSOFT show much wider variation than HARD or ULTRASOFT, suggesting degradation depends not only on tyre compound but also on race conditions, strategy, temperature, traffic, and driver management.
Query 4 — JSON Item Methods
Our final example introduces SQL/JSON item methods.
These methods make type conversion explicit during extraction and filtering.
Let’s ask:
In dry conditions, which slick compounds were stretched the longest?
SELECT
JSON_VALUE(data, '$.Compound.string()' RETURNING VARCHAR2(20)) AS compound,
COUNT(*) AS stint_count,
ROUND(
AVG(JSON_VALUE(data, '$.StintLen.number()' RETURNING NUMBER)),
1
) AS avg_stint_laps,
MAX(
JSON_VALUE(data, '$.StintLen.number()' RETURNING NUMBER)
) AS longest_stint,
ROUND(
AVG(JSON_VALUE(data, '$.meanTrackTemp.number()' RETURNING NUMBER)),
1
) AS avg_track_temp
FROM f1_stints
WHERE JSON_EXISTS(
data,
'$?(
@.Rainfall.number() == 0
&& @.StintLen.number() >= 25
&& @.Compound.string() in ("SOFT", "MEDIUM", "HARD")
)'
)
GROUP BY
JSON_VALUE(data, '$.Compound.string()' RETURNING VARCHAR2(20))
ORDER BY
avg_stint_laps DESC;
Notice the explicit item methods:
$.Compound.string()
$.StintLen.number()
$.meanTrackTemp.number()
The predicate also uses item methods:
$?(
@.Rainfall.number() == 0
&& @.StintLen.number() >= 25
&& @.Compound.string() in ("SOFT","MEDIUM","HARD")
)
These methods don’t change the analytical question—they simply make the intended data types explicit.
This becomes particularly valuable when JSON fields may contain mixed types or require coercion.
Result
| Compound | Stints | Avg laps | Longest stint | Avg track temp |
|---|---|---|---|---|
| HARD | 587 | 35.6 | 66 | 38.5 |
| MEDIUM | 458 | 32.5 | 62 | 36.6 |
| SOFT | 255 | 32.4 | 56 | 37.9 |
This gives us another useful Formula 1 insight.
Among long dry stints, the HARD tyre is consistently stretched the furthest, while MEDIUM and SOFT produce remarkably similar average stint lengths.
Conclusion
These four examples demonstrate a natural progression when working with JSON data in Oracle Database.
- Simple field access is ideal when you only need a handful of values.
- Document-level predicates let the JSON document itself determine whether a row matches.
- JSON_TABLE projects multiple JSON fields into a relational shape for more complex analysis.
- JSON item methods make type handling explicit during filtering, extraction, and aggregation.
The important point is that SQL/JSON path expressions do not replace SQL, they complement it.
JSON path expressions excel at navigating semi-structured documents and expressing document-level conditions. SQL remains the best tool for grouping, joining, aggregating, sorting, and ranking data.
Together, they make it possible to perform rich analytical queries over JSON documents without first flattening the data into a traditional relational schema.
Oracle JSON Insights
Learn from the team that build the features at Oracle. Read articles like this post, watch demos and webcasts, or start to develop using the LiveLabs, all on available from one page: Oracle JSON Insights.
