Analyzing Formula 1 JSON Data with Oracle SQL/JSON Path Expressions

As part of Oracle JSON Insights, this post show how Oracle AI Database lets you query JSON data directly from SQL using SQL/JSON path expressions. Rather than flattening JSON into relational tables first, you can extract fields, filter documents, project columns, and even perform type-aware comparisons directly from JSON.

In this post we’ll use SQL/JSON path expressions to analyze a Formula 1 stint-level dataset.

The source data is available on GitHub as final_data.csv and has been loaded into an Oracle JSON collection table named f1_stints.

Each row contains one JSON document stored in the data column.

Example document:

{
    "bestPreRaceTime": 80.452,
    "eventYear": 2021,
    "bestLapTimeIsFrom": "Qualifying",
    "RoundNumber": 3,
    "Stint": 4,
    "EventName": "Portuguese Grand Prix",
    "Position": 17,
    "Team": "Haas F1 Team",
    "fuel_bias": 86.35908974358976,
    "raceStintsNums": 4,
    "deg_bias": 85.33125917911491,
    "Rainfall": 0,
    "StintLen": 32,
    "TyreAge": 1,
    "meanTrackTemp": 38.328125,
    "meanAirTemp": 19.184375,
    "_id": "6a35c694bc11f1a36a82abe6",
    "CircuitLength": 4.684,
    "": 3348,
    "designedLaps": 66,
    "lapNumberAtBeginingOfStint": 33,
    "GridPosition": 19,
    "meanHumid": 40.721875,
    "fuel_slope": -0.05593589743589682,
    "Compound": "HARD",
    "Driver": "MSC",
    "deg_slope": -0.018910302945693302
}

Each document describes one race stint, including:

  • Event and season
  • Driver and team
  • Tyre compound
  • Stint number
  • Grid and finishing positions
  • Stint length
  • Weather conditions
  • Track temperature
  • Tyre degradation metrics

We’ll progressively build four SQL examples:

  1. Simple field access
  2. Document-level predicates
  3. Column projection with JSON_TABLE
  4. JSON item methods

The goal is to answer useful Formula 1 questions while introducing the SQL/JSON path syntax naturally.


Query 1 — Simple Field Access

Let’s begin with the simplest use of SQL/JSON.

Suppose we want to answer:

Which tyre compounds did race winners start on?

Each JSON document contains:

  • Compound
  • Stint number
  • Final finishing position

We can extract those values using JSON_VALUE and then let SQL perform the aggregation.

WITH opening_stints AS (
  SELECT
    JSON_VALUE(t.data, '$.Compound' RETURNING VARCHAR2(30)) AS compound,
    JSON_VALUE(t.data, '$.Position' RETURNING NUMBER)       AS finish_position,
    JSON_VALUE(t.data, '$.Stint' RETURNING NUMBER)          AS stint_number
  FROM f1_stints t
)
SELECT
  compound,
  COUNT(*) AS winner_first_stints
FROM opening_stints
WHERE finish_position = 1
  AND stint_number = 1
GROUP BY compound
ORDER BY winner_first_stints DESC, compound;

The JSON paths are intentionally simple:

$.Compound
$.Position
$.Stint

The $ symbol represents the root of the JSON document, and each path simply accesses a field by name.

Result

CompoundWinner first stints
MEDIUM34
SOFT27
ULTRASOFT8
SUPERSOFT6
INTERMEDIATE5
WET2
HYPERSOFT2

Even this basic example provides an interesting racing insight.

Across this dataset, race winners most often started on MEDIUM tyres rather than the softest available compound.

The important takeaway is that SQL/JSON path expressions don’t have to be complicated. Often they’re simply the bridge that lets SQL work naturally with semi-structured JSON data.


Query 2 — Document-Level Predicates

In the previous query we extracted values first and filtered afterwards.

Sometimes it’s cleaner to filter the JSON document directly.

Let’s ask:

Which race winners started outside the top five?

We’re looking for documents where:

  • Position = 1
  • Stint = 1
  • GridPosition > 5

This is a perfect use case for JSON_EXISTS.

SELECT
  JSON_VALUE(t.data, '$.eventYear' RETURNING NUMBER)        AS event_year,
  JSON_VALUE(t.data, '$.EventName' RETURNING VARCHAR2(100)) AS event_name,
  JSON_VALUE(t.data, '$.Driver' RETURNING VARCHAR2(10))     AS driver,
  JSON_VALUE(t.data, '$.GridPosition' RETURNING NUMBER)     AS grid_position,
  JSON_VALUE(t.data, '$.Compound' RETURNING VARCHAR2(30))   AS compound,
  JSON_VALUE(t.data, '$.StintLen' RETURNING NUMBER)         AS first_stint_laps
FROM f1_stints t
WHERE JSON_EXISTS(
  t.data,
  '$?(@.Position == 1 && @.Stint == 1 && @.GridPosition > 5)'
)
ORDER BY event_year, event_name;

The key expression is the JSON path predicate:

$?(@.Position == 1 && @.Stint == 1 && @.GridPosition > 5)

Inside the predicate, @ refers to the current JSON document.

Oracle evaluates the predicate and keeps only documents that satisfy all three conditions.

Result

YearRaceDriverGridCompoundFirst stint laps
2018Chinese Grand PrixRIC6ULTRASOFT16
2018German Grand PrixHAM14SOFT41
2020Italian Grand PrixGAS10SOFT19
2020Turkish Grand PrixHAM6WET8
2021Azerbaijan Grand PrixPER6SOFT13
2021Hungarian Grand PrixOCO8INTERMEDIATE2
2021São Paulo Grand PrixHAM10MEDIUM7

This is where SQL/JSON starts to feel more analytical.

Instead of extracting values first, we’re describing an entire racing scenario and asking Oracle Database to return only the matching JSON documents.


Query 3 — Projecting JSON with JSON_TABLE

When a query needs multiple JSON fields simultaneously, JSON_TABLE provides a much cleaner approach.

It projects JSON into relational columns so the remainder of the query looks like ordinary SQL.

Let’s compare tyre degradation by compound.

Our question is:

Across dry stints of at least 10 laps, which tyre compounds have the highest average degradation slope?

SELECT
  jt.compound,
  COUNT(*)                       AS stint_count,
  ROUND(AVG(jt.deg_slope), 4)    AS avg_deg_slope,
  ROUND(STDDEV(jt.deg_slope), 4) AS deg_slope_stddev,
  ROUND(MIN(jt.deg_slope), 4)    AS best_case,
  ROUND(MAX(jt.deg_slope), 4)    AS worst_case
FROM f1_stints t,
  JSON_TABLE(t.data, '$'
    COLUMNS (
      compound  VARCHAR2(20) PATH '$.Compound',
      deg_slope NUMBER       PATH '$.deg_slope',
      stint_len NUMBER       PATH '$.StintLen',
      rainfall  NUMBER       PATH '$.Rainfall'
    )
  ) jt
WHERE
  jt.rainfall = 0
  AND jt.deg_slope IS NOT NULL
  AND jt.stint_len >= 10
  AND jt.compound IN (
    'SOFT',
    'MEDIUM',
    'HARD',
    'SUPERSOFT',
    'ULTRASOFT',
    'HYPERSOFT'
  )
GROUP BY jt.compound
ORDER BY avg_deg_slope DESC;

The important part is the projection:

JSON_TABLE(t.data, '$'
  COLUMNS (
    compound  VARCHAR2(20) PATH '$.Compound',
    deg_slope NUMBER       PATH '$.deg_slope',
    stint_len NUMBER       PATH '$.StintLen',
    rainfall  NUMBER       PATH '$.Rainfall'
  )
)

Once projected, the remainder of the query is simply relational SQL using functions like:

  • AVG
  • STDDEV
  • MIN
  • MAX

Result

CompoundStintsAvg deg slopeStddev
HARD820-0.02040.1335
ULTRASOFT151-0.03110.1370
MEDIUM1083-0.03250.2345
SUPERSOFT178-0.04590.2456
SOFT874-0.04950.4904
HYPERSOFT32-0.19020.7377

The HARD compound shows the highest average degradation slope in this filtered sample, while HYPERSOFT has both the lowest average and the smallest sample size.

The standard deviation is equally informative.

SOFT and HYPERSOFT show much wider variation than HARD or ULTRASOFT, suggesting degradation depends not only on tyre compound but also on race conditions, strategy, temperature, traffic, and driver management.


Query 4 — JSON Item Methods

Our final example introduces SQL/JSON item methods.

These methods make type conversion explicit during extraction and filtering.

Let’s ask:

In dry conditions, which slick compounds were stretched the longest?

SELECT
  JSON_VALUE(data, '$.Compound.string()' RETURNING VARCHAR2(20)) AS compound,

  COUNT(*) AS stint_count,

  ROUND(
    AVG(JSON_VALUE(data, '$.StintLen.number()' RETURNING NUMBER)),
    1
  ) AS avg_stint_laps,

  MAX(
    JSON_VALUE(data, '$.StintLen.number()' RETURNING NUMBER)
  ) AS longest_stint,

  ROUND(
    AVG(JSON_VALUE(data, '$.meanTrackTemp.number()' RETURNING NUMBER)),
    1
  ) AS avg_track_temp

FROM f1_stints
WHERE JSON_EXISTS(
  data,
  '$?(
    @.Rainfall.number() == 0
    && @.StintLen.number() >= 25
    && @.Compound.string() in ("SOFT", "MEDIUM", "HARD")
  )'
)
GROUP BY
  JSON_VALUE(data, '$.Compound.string()' RETURNING VARCHAR2(20))
ORDER BY
  avg_stint_laps DESC;

Notice the explicit item methods:

$.Compound.string()
$.StintLen.number()
$.meanTrackTemp.number()

The predicate also uses item methods:

$?(
  @.Rainfall.number() == 0
  && @.StintLen.number() >= 25
  && @.Compound.string() in ("SOFT","MEDIUM","HARD")
)

These methods don’t change the analytical question—they simply make the intended data types explicit.

This becomes particularly valuable when JSON fields may contain mixed types or require coercion.

Result

CompoundStintsAvg lapsLongest stintAvg track temp
HARD58735.66638.5
MEDIUM45832.56236.6
SOFT25532.45637.9

This gives us another useful Formula 1 insight.

Among long dry stints, the HARD tyre is consistently stretched the furthest, while MEDIUM and SOFT produce remarkably similar average stint lengths.


Conclusion

These four examples demonstrate a natural progression when working with JSON data in Oracle Database.

  • Simple field access is ideal when you only need a handful of values.
  • Document-level predicates let the JSON document itself determine whether a row matches.
  • JSON_TABLE projects multiple JSON fields into a relational shape for more complex analysis.
  • JSON item methods make type handling explicit during filtering, extraction, and aggregation.

The important point is that SQL/JSON path expressions do not replace SQL, they complement it.

JSON path expressions excel at navigating semi-structured documents and expressing document-level conditions. SQL remains the best tool for grouping, joining, aggregating, sorting, and ranking data.

Together, they make it possible to perform rich analytical queries over JSON documents without first flattening the data into a traditional relational schema.

Oracle JSON Insights

Learn from the team that build the features at Oracle. Read articles like this post, watch demos and webcasts, or start to develop using the LiveLabs, all on available from one page: Oracle JSON Insights.