Skip to content

darktheDE/airline-dwh

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

49 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Airline Flight Operations & Asset Health Data Warehouse

This project implements a comprehensive Data Warehouse (DWH) system designed to analyze U.S. domestic flight performance integrated with technical aircraft registration data. By combining transactional flight data with the FAA Aircraft Registry, the system provides a 360-degree view of operational efficiency, financial impacts of delays, and asset maintenance lifecycles.

System Architecture

Main Dashboard

📌 Project Overview

The aviation industry incurs billions of dollars in losses annually due to flight delays and cancellations. This project moves beyond simple descriptive statistics by implementing a multidimensional model that correlates flight logs with aircraft age, engine types, and manufacturer data.

Key Business Objectives:

  • Operational Excellence: Evaluate On-Time Performance (OTP) and identify airport bottlenecks.

  • Financial Impact: Quantify estimated revenue loss based on delay duration and cancellation rates.

  • Asset Health & Maintenance: Analyze the correlation between aircraft age/flight hours and technical incident frequency.

  • Ground Operation Efficiency: Measure turnaround time performance at major hubs.

🏗 System Architecture

The system follows the Kimball Lifecycle methodology, utilizing a robust ETL pipeline:

  1. Data Sources: CSV datasets (Kaggle 2015 Flights) & FAA Registry (Master Data).
  2. OLTP Layer: Initial data ingestion into a relational MS SQL Server database.
  3. Staging Area: Data cleaning, type casting, and structural normalization.
  4. Data Warehouse (DWH): Multidimensional Star Schema with surrogate keys.
  5. OLAP Cube (SSAS): Multidimensional cubes for high-performance analytical querying.
  6. Visualization (Power BI): Executive dashboards for decision support.

🖼 Main Dashboard

Tech: Live Connection to SSAS (OLAP)

The project features a high-performance Executive Dashboard that provides real-time insights derived from the multidimensional cube. It is designed to answer critical business questions across three different Fact grains:

  • Insight 1: Financial Loss & Delay by State
    • Question: Which U.S. states incur the highest economic impact and arrival delay?
    • Visual: Clustered Bar Chart (Total Loss vs. Total Arr Delay).
  • Insight 2: Delay Root Cause Breakdown
    • Question: What are the primary factors (Weather, Carrier, NAS, Late Aircraft) causing delays for each airline?
    • Visual: Stacked Bar Chart with categorical delay measures.
  • Insight 3: Monthly Fleet Activity & Trends
    • Question: How do flight volumes and total delay minutes correlate seasonally?
    • Visual: Line and Stacked Column Chart (Flights count vs. Delay minutes).
  • Insight 4: Ground Turnaround Efficiency
    • Question: Which cities have the highest cumulative ground turnaround delay bottlenecks?
    • Visual: Top 10 Ranking Chart for Turnaround Variance.

📊 Dimensional Modeling

The project utilizes a Star Schema with Conformed Dimensions to ensure consistency across all Fact tables.

Dimensions:

  • Dim_Date & Dim_Time: Hierarchical time dimensions (Year > Quarter > Month > Day).
  • Dim_Airport: Geographic hierarchy (State > City > Airport). (SCD Type 1).
  • Dim_Airline: Carrier identification and categorization. (SCD Type 1).
  • Dim_Aircraft: Detailed technical specifications (Manufacturer, Model, Engine Type). (SCD Type 2) to track historical changes in engine upgrades or ownership.

Fact Tables:

  1. Fact_Flight_Transaction: (Transaction Fact) Records every individual flight leg and its associated delays.
  2. Fact_Aircraft_Daily_Snapshot: (Periodic Snapshot) Aggregates daily usage, flight cycles, and technical incidents per aircraft.
  3. Fact_Turnaround_Efficiency: (Accumulating Snapshot) Tracks the ground time between a flight’s arrival and its next departure.

⚙ ETL Strategy (SSIS)

The ETL process is built using SQL Server Integration Services (SSIS) with a focus on performance and reliability:

  • Incremental Loading: Implemented via a Watermark table logic. The system only extracts records where Updated_Date > Last_Load_Date.
  • Data Quality: Automated handling of NULL values, trimming trailing spaces from FAA data, and standardizing time formats (Float to HH:MM).
  • SCD Logic: Built-in SCD components to manage historical tracking for the Aircraft dimension.
  • Advanced SQL: Utilized Window Functions (LAG()) in the staging phase to calculate turnaround durations.

🧠 Analysis & Insights (SSAS & SQL)

The SSAS Multidimensional Cube provides the semantic layer for complex analytical queries. Business insights are delivered via SQL and visualized in Power BI:

  • Financial Loss by State: Identifying which U.S. states incur the highest estimated financial losses and arrival delay minutes.
  • Delay Root Cause by Airline: Breaking down Weather, Carrier, NAS, and Late Aircraft delay minutes per airline to pinpoint operational weaknesses.
  • Monthly Fleet Activity Trend: Tracking daily flight count vs. total delay minutes per month to reveal seasonal patterns.
  • Turnaround Variance by City: Ranking cities by cumulative ground turnaround delay to identify Ground Operations bottlenecks.

🚀 Tech Stack

  • Database: Microsoft SQL Server 2022
  • ETL: SQL Server Integration Services (SSIS)
  • OLAP: SQL Server Analysis Services (SSAS)
  • BI & Viz: Microsoft Power BI Desktop
  • Design Tools: Kimball Dimensional Modeling Workbooks (High-Level & Detailed)

📁 Repository Structure

.
├── Dashboard/              # Power BI reports and visualizations (.pbix, .pdf)
├── Data/                   # Raw datasets (Not tracked in Git, manually placed)
│   ├── 2015-flight-delays-and-cancellations/
│   └── faa-aircraft-registry/
├── Docs/                   # Project documentation, diagrams, and tasks
│   └── tasks/              # Step-by-step implementation logs
├── SQL_Script/             # DDL/DML scripts for OLTP, Staging, and DWH
├── SSAS_Cube/              # Analysis Services multidimensional project
├── SSIS_Package/           # Integration Services ETL packages
└── README.md

📥 Data Preparation

To run this project, you need to download the datasets and place them in the correct directory.

1. Download Links:

2. Manual Placement Instruction:

Extract and place the downloaded files into:

  • Flight Data: Data/2015-flight-delays-and-cancellations/ (airlines.csv, airports.csv, flights.csv)
  • FAA Registry: Data/faa-aircraft-registry/ (ACFTREF.txt, MASTER.txt)

👥 Team Contribution (Group 22)

This project was developed following Scrum principles across 4 Sprints:

  • Đỗ Kiến Hưng (23133030): Project Lead, System Architecture, Incremental Load & CDC, ETL Fact_Flight_Transaction, SQL Insight Analytics & Power BI Dashboard.
  • Phan Trọng Phú (23133056): Dataset Analysis, SCD Type 1 & Type 2 Mechanism, ETL Dim_Airport, Dim_Airline, Dim_Aircraft, ETL Fact_Turnaround_Efficiency.
  • Phan Trọng Quí (23133061): OLTP & DWH Database Design, Kimball Bus Matrix & Dimensional Modeling, SSIS Project Configuration, ETL Dim_Date, Dim_Time, ETL Fact_Aircraft_Daily_Snapshot.

Task contribution: Plane.so

About

Final project for the Data Warehouse course | Team of 3 | HCM-UTE. This project implements a comprehensive Data Warehouse (DWH) system designed to analyze U.S. domestic flight performance integrated with technical aircraft registration data.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages