X

All Things Database: Education, Best Practices,
Use Cases & More

Data Warehouse and Visualizations for Flight Analytics

Everyone who flies has experienced a flight delay at some point. Delays have negative impacts: for passengers there is nothing worse than being trapped in an airport and for airlines it is lost revenue.

Try a Data Warehouse to Improve Your Analytics Capabilities

Analysts are always looking for answers to reduce flight delays. They want it now. Not six months from now. They are asking for analytical capabilities to discover innovative answers to their questions. While analysts are looking for those data insights, leadership wants insights delivered in a clear and concise format to understand the business. IT can’t deal with difficult to manage legacy approaches requiring expensive teams with highly specialized skills.

Analyzing Flight Delays

For many airlines and passengers, one key performance measure comes to mind more than any other: flight delays. A delay is any period of time which a flight is late. It’s the difference between the time scheduled on your boarding passes and when you actually board the plane.

In this blog, we will target primary risk factors leading to delays and cancellations from the month of January 2018 to understand the five most common types of delays: carrier delay, weather delay, National Air System (NAS) delay, security delay, and late aircraft delay.

Understanding the Data Set

Using data visualizations to identify patterns and analyze flight delays increases the amount of time that planes are in the air with minimal effort by locating the weak links in the chain. For example, the data showed that in January 2018 alone domestic airlines collectively suffered 97,760 delays and 17,169 cancellations of schedule flights. We continued to drill into this raw data set to understand delays by these categories: classification, airport, airline, state, and days of the month. Airlines not only reduce costs by strategically identifying and mitigating major delays, but also monetize their data by targeting key areas where they can most easily improve their service delivery for customers.

We looked at the following questions:

  • What could I be delayed by and how long will my delay take?
  • Which are the best and worst days to fly based on expected delays?
  • Which state has the most flight cancellations?
  • Which airlines operated the most flights and had the most delays?
  • Which airports had the most departures and experienced the most delays?

To get started, we downloaded the public-domain Airline On-Time Performance dataset from Bureau of Transportation Statistics for the month of January 2018. This dataset has 570,119 rows. 

There are multiple ways to upload data to Oracle Cloud for analysis using Oracle Autonomous Data Warehouse. For this example, we uploaded the data pulled from the Bureau of Transportation Statistics and reviewed the data in Data Visualization Desktop.

Here’s a quick snapshot of the data from Data Visualization Desktop:

Observations

1. What could I be delayed by and how long will my delay take?

There are many reasons why you can experience flight delays. This graph showcases carrier, weather, National Air System (NAS), security, and late aircraft delays. We can see that on average the longest delays are caused by late aircraft delays (over 25 minutes) while the shortest delays are caused by security delays (less than a minute).

Challenge: Download the data from the Bureau of Transportation Statistics for the month of December 2017 and do a similar analysis.

 

2. Which are the best and worst days to fly based on expected delays?

For January 2018, the worst days to travel were the 12th and the 17th, where both days had an aggregate of over 7000 hours of flights delays. Our initial hypothesis was there would be a strong correlation between flight delays and number of flights. Our expectation was that a reduction in the number of flights meant reduced strain on capacity leading to a proportionate reduction of delays. However, when we overlaid the number of flights, we found that the number of flights remained relatively stable throughout January 2018. This mean that the flight delays experienced was independent of the number of flights. The best days to fly were the 27th and 31st. On average, flights took off early!

Challenge: Overlay pricing data to identify fluctuations in price depending on the day of the month.

 

3. Which state has the most flight delays?

Florida followed by Illinois and California have the highest total flight delays in the month of January 2018. In Data Visualization Desktop, you can hover over the states to see the exact amount of departure delays.

Challenge: Drill down to a specific day of the month and overlay with weather to show how weather is affecting flight delays.

 

*For visualizations #4, we will use the “My Calculations” tool to determine the total flights operated by each airline. We do this by taking a count of the flights operated by each unique carrier (seen above). In visualization #5, apply “My Calculations” to determine the total flights departing from each airport. 

 

4. Which airlines operated the most flights and had the most delays?

In this visualization, we see the total amount of flights operated by each airline and corresponding delays. Southwest (WN) had the most operated flights at 109,676 flights followed by American Airlines (AA) at 73,598 flights and Delta (DL) at 71,254 flights. The top 3 airlines with the most delays are Skywest (OO) followed by Southwest (WN) and Delta (DL).

Challenge: Based on the data above, which airlines have a disproportionate amount of delays?

 

5. Which airports had the most departures and experienced the most delays?

In this visualization, we see the total flights departing from each origin airport and the corresponding delays. The airports with the most departures flights are Hartsfield–Jackson Atlanta International Airport (ATL) followed by O'Hare International Airport (ORD) and Dallas/Fort Worth International Airport (DFW).

Airports that experienced the most net delays were: O'Hare International Airport (ORD), Hartsfield–Jackson Atlanta International Airport (ATL), and Dallas/Fort Worth International Airport (DFW). In just the month of January, the net delays from O'Hare International Airport totaled just over 397,000 minutes of delay which equates to 276 days.

Challenge: Delays are not only caused by the origin airport but also by the destination airport. Try replicating our results but with destination airports. What observations can you draw from comparing delays from origin and destination airports?

Summary

Oracle Autonomous Database allow users to easily create data marts in the cloud with no specialized DBA skills and generate powerful business insights. It took us less than ten minutes to provision a database and upload data for analysis.

Now you can also leverage the autonomous data warehouse through a cloud trial:

Sign up for your free Autonomous Data Warehouse trial today

Please visit the blogs below for a step-by-step guide on how to start your free cloud trial: upload your data into OCI Object Store, create an Object Store Authentication Token, create a Database Credential for user and load data using the Data Import Wizard in SQL Developer:

Feedback and question welcome. Tell us about the delays you’ve personally experienced!

Written by Sai Valluri and Philip Li

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.Captcha
Oracle

Integrated Cloud Applications & Platform Services