Sujay Final Report


FLIGHT FARE PREDICTION

Project Report
Of Major Project

SUBMITTED BY:
Abhay Kumar (2200817)
in partial fulfillment for the award of the degree

of

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE & ENGINEERING
at

BABA BANDA SINGH BAHADUR ENGINEERING COLLEGE

FATEHGARH SAHIB, PUNJAB (INDIA) 140407

(AFFILIATED TO I.K.G. PUNJAB TECHNICAL UNIVERSITY, KAPURTHALA, PUNJAB (INDIA))

Jan-May 2025

CANDIDATE’S DECLARATION

I hereby certify that the project entitled “Flight Fare Prediction”, submitted by me, Sujay Kumar,
in partial fulfillment of the requirements for the award of the degree of B.Tech. (Computer
Science & Engineering) of I.K. Gujral Punjab Technical University, Kapurthala, at
Baba Banda Singh Bahadur Engineering College, Fatehgarh Sahib, is an authentic record of
my own work carried out from January 2025 to May 2025 under the guidance
of Prof. Kamalpreet Kaur Gurna (Assistant Professor, Department of Computer Science &
Engineering). The matter presented in this project has not formed the basis for the award of
any other degree, diploma, fellowship, or similar title.

Signature of the Student

Place:

Date:

BABA BANDA SINGH BAHADUR ENGINEERING COLLEGE
Approved by AICTE, Govt. Of Punjab, Affiliated to IKGPTU

(Courses Accredited by NBA (AICTE))


Dr. Lakhvir Singh
Principal

Ref. No. BBSBEC/CSE/MP/2025/914 Date .....................

CERTIFICATE

This is to certify that the project titled “Flight Fare Prediction” is the bona fide work carried
out by Sujay Kumar in partial fulfillment of the requirements for the award of the degree of
B.Tech. (Computer Science & Engineering) of I.K. Gujral Punjab Technical
University, Kapurthala, at Baba Banda Singh Bahadur Engineering
College, Fatehgarh Sahib, and is an authentic record of his/her work carried out from
January 2025 to May 2025 under the guidance of Prof. Kamalpreet Kaur Gurna (Assistant
Professor, Department of Computer Science & Engineering). The Major Project Viva-Voce
Examination has been held on

Signature of the Guide Signature of the HoD

Department of CSE.

Signature of the Principal

BBSBEC, Fatehgarh Sahib

CHANDIGARH ROAD, FATEHGARH SAHIB-140407 (INDIA)


Ph.: 01763 503056, 503143, 503141 Fax: 01763 50
Website: www.bbsbec.edu.in Email: principal@bbsbec.ac.in

Abstract
Airfare pricing is a highly dynamic and complex process influenced by numerous factors such
as travel dates, airline policies, route popularity, number of stops, and time of booking. These
variables create uncertainty for travelers trying to book affordable tickets and for airlines
attempting to optimize their pricing strategies. This project aims to address this challenge by
developing a machine learning-based solution to predict flight fares accurately using historical
data.
The project involves extensive data preprocessing, including handling missing values, encoding
categorical variables, and engineering new features such as flight duration, journey month, and
total stops. Several machine learning algorithms, including Linear Regression, Decision Trees,
Random Forest, and XGBoost, were trained and evaluated using performance metrics like MAE,
MSE, RMSE, and R² Score. Among these, XGBoost demonstrated the best predictive
performance.

Marks to be filled by Guide Marks Obtained


Regularity (8)

Self Motivation and Determination(8)

Working within Team(8)

Total (24)

Signature of the Guide

Acknowledgement

I express my sincere gratitude to I.K. Gujral Punjab Technical University, Kapurthala, for
giving me the opportunity to work on the Major Project during the 8th semester of my B.Tech.
(CSE), which is an important aspect of engineering education.

I would like to thank Dr. Lakhvir Singh, Principal and Dr. Jatinder Singh Saini, Head of
Department, CSE at Baba Banda Singh Bahadur Engineering College, Fatehgarh Sahib for
their kind support.

I also owe my sincerest gratitude to Prof. Kamalpreet Kaur Gurna for her valuable
advice and healthy criticism throughout my project which helped me immensely to complete
my work successfully.

I would also like to thank everyone who has knowingly and unknowingly helped me
throughout my work. Last but not least, a word of thanks for the authors of all those books
and papers which I have consulted during my project.

Index

S.No.  Content
1      Introduction
2      Literature Survey
3      Data Analysis and Preprocessing
4      Machine Learning Model Development
5      Challenges and Solutions
6      Visualizations and Insights
7      Model Deployment Strategy
8      Results/Output
9      Conclusion and Recommendations
10     Future Enhancements
11     References
12     Appendices

1. Introduction

Flight ticket pricing is an intricate and unpredictable process, influenced by multiple factors,
including airline policies, seasonality, demand fluctuations, and external conditions such as
fuel prices and geopolitical situations. Due to the dynamic nature of ticket prices, customers
often face challenges in identifying the best time to book flights at the most economical rates.
Airlines, too, need a robust system to optimize pricing strategies while maintaining
competitiveness.

This project aims to leverage machine learning techniques to analyze historical flight fare data
and build a predictive model capable of estimating future flight ticket prices. By applying data
science methodologies, this study examines key attributes such as airline type, date of journey,
source and destination, total stops, and flight duration. The goal is to develop an efficient and
accurate predictive model that can help travelers make informed decisions and airlines
enhance their pricing strategies.

To achieve this, extensive exploratory data analysis (EDA) is performed to understand the
data distribution and uncover hidden patterns. Several machine learning models are
implemented and evaluated to determine the most effective approach for predicting flight
prices. The final model is validated and deployed for real-world usability. Additionally,
potential challenges in data preprocessing, model selection, and deployment strategies are
addressed, along with recommendations for future improvements.

2. Literature Survey
The prediction of flight fares has been a subject of interest within both the academic and
commercial sectors due to the complex and dynamic nature of airline pricing strategies.
Several studies and research initiatives have contributed to the development of models and
systems that aim to forecast flight prices accurately.
2.1 Airline Revenue Management (ARM) Systems
Traditional ARM systems use demand forecasting, overbooking, and fare class optimization
to maximize airline revenue. These systems rely heavily on historical data, booking curves,
and market segmentation. While effective, these models lack flexibility and struggle with
adapting to sudden market changes.
2.2 Machine Learning in Fare Prediction
Recent research has applied machine learning techniques to airfare prediction, leveraging
algorithms such as linear regression, decision trees, random forests, and gradient boosting.
For example, Mishra et al. (2019) used regression models on scraped airline data and achieved
moderate success in short-term fare predictions. Similarly, Hossain et al. (2020) implemented
ensemble learning techniques, reporting improved accuracy by combining multiple models.
2.3 Time Series and Price Trend Analysis
Some approaches focus on time series forecasting using ARIMA or LSTM networks to
analyze fare changes over time. These methods capture temporal trends but may lack
contextual awareness of non-time-related features like airline brand or number of stops.
2.4 Explainability in ML Models
A growing body of literature highlights the importance of model interpretability. Tools like
SHAP (SHapley Additive Explanations) and LIME (Local Interpretable Model-agnostic
Explanations) help make complex models more transparent, especially when applied in
domains like pricing where user trust is critical.

3. Data Analysis and Preprocessing

3.1 Dataset Overview


The dataset consists of key attributes that influence flight ticket prices, such as:

• Airline: The name of the airline operating the flight (e.g., Air India, Indigo, Jet Airways).
• Date_of_Journey: The date when the passenger’s journey starts, which can
impact pricing due to seasonal demand fluctuations.

• Source & Destination: The locations where the journey starts and ends, affecting
the price based on distance and route popularity.

• Route: The flight path, including layovers, which may influence ticket costs.
• Arrival_Time & Departure_Time: Flight times, as peak-hour flights are typically
more expensive.
• Duration: Total flight time, including any layovers. Direct flights are shorter and usually
costlier than flights with layovers.
• Total_Stops: The number of layovers in a journey, affecting ticket pricing.
• Additional_Info: Information on baggage allowance, meals, and other services.
• Price: The target variable, representing the ticket price.
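
As a small illustration of how the Duration attribute can be turned into a numeric feature, the sketch below converts duration strings to minutes. It assumes values are written like "2h 50m", "19h", or "45m"; that format and the helper name duration_to_minutes are assumptions made for this example and are not taken from the dataset description above.

```python
import re

# Assumed format: optional "<n>h" followed by optional "<n>m", e.g. "2h 50m".
_DURATION_RE = re.compile(r"^\s*(?:(\d+)\s*h)?\s*(?:(\d+)\s*m)?\s*$")


def duration_to_minutes(text: str) -> int:
    """Convert a duration string such as '2h 50m' to total minutes."""
    match = _DURATION_RE.match(text)
    # Reject strings that do not fit the pattern or contain neither part.
    if match is None or not any(match.groups()):
        raise ValueError(f"Unrecognised duration: {text!r}")
    hours, minutes = match.groups()
    return int(hours or 0) * 60 + int(minutes or 0)


if __name__ == "__main__":
    for raw in ["2h 50m", "19h", "45m"]:
        print(raw, "->", duration_to_minutes(raw))
```

A single numeric column of this kind is easier for regression models to use than the raw text value.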

3.2 Exploratory Data Analysis (EDA)

EDA was conducted to gain insights into the dataset, identify trends, and prepare the data for
modeling. The following steps were carried out:

• Missing Values Handling: Missing data was detected in columns such as Route and Total_Stops.
Missing values were filled using mode imputation or inferred based on other attributes.
• Feature Engineering: Additional features such as month, day, and weekday were extracted from
the Date_of_Journey to identify seasonal and day-wise trends in pricing.

• Categorical Encoding: Since the dataset contains categorical variables (e.g., Airline, Source,
Destination), they were transformed using one-hot encoding and label encoding to be used in
machine learning models.

• Outlier Detection: Boxplots and Z-score analysis were used to identify outliers in ticket prices.
Unusual values were either removed or transformed to improve model performance.

• Correlation Analysis: Pearson correlation was used to identify relationships between variables.
Highly correlated variables were examined to avoid redundancy in the model.

• Visualization: Histograms, box plots, and scatter plots were used to analyze data distribution,
price variations among airlines, and trends across different journey dates.
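
The preprocessing steps listed above can be sketched with pandas as follows. This is a minimal illustration, not the project's actual code: the sample rows are made up, the day/month/year date format and the stop labels ("non-stop", "1 stop", "2 stops") are assumptions, and the function names preprocess and drop_price_outliers are hypothetical.

```python
import pandas as pd

# Made-up sample rows using column names from the dataset overview.
raw = pd.DataFrame(
    {
        "Airline": ["IndiGo", "Air India", "Jet Airways", "IndiGo", "Air India"],
        "Source": ["Delhi", "Kolkata", "Delhi", "Mumbai", "Kolkata"],
        "Destination": ["Cochin", "Delhi", "Cochin", "Hyderabad", "Delhi"],
        "Date_of_Journey": ["24/03/2019", "01/05/2019", "09/06/2019", "12/05/2019", "03/03/2019"],
        "Total_Stops": ["non-stop", "1 stop", None, "1 stop", "2 stops"],
        "Price": [4000, 7500, 13000, 6200, 9100],
    }
)


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()

    # Mode imputation for a categorical column with missing entries.
    out["Total_Stops"] = out["Total_Stops"].fillna(out["Total_Stops"].mode()[0])

    # Date-derived features: day, month, and weekday (0 = Monday).
    dates = pd.to_datetime(out["Date_of_Journey"], format="%d/%m/%Y")
    out["Journey_Day"] = dates.dt.day
    out["Journey_Month"] = dates.dt.month
    out["Journey_Weekday"] = dates.dt.dayofweek
    out = out.drop(columns=["Date_of_Journey"])

    # Ordered category -> integer codes; nominal categories -> one-hot columns.
    stops_map = {"non-stop": 0, "1 stop": 1, "2 stops": 2}  # assumed labels
    out["Total_Stops"] = out["Total_Stops"].map(stops_map)
    return pd.get_dummies(out, columns=["Airline", "Source", "Destination"])


def drop_price_outliers(df: pd.DataFrame, threshold: float = 3.0) -> pd.DataFrame:
    """Keep rows whose Price lies within `threshold` standard deviations of the mean."""
    std = df["Price"].std(ddof=0)
    if std == 0:
        return df.copy()  # all prices identical, nothing to flag
    z_scores = (df["Price"] - df["Price"].mean()) / std
    return df[z_scores.abs() <= threshold]


if __name__ == "__main__":
    print(preprocess(raw).head())
```

Mapping Total_Stops to integers preserves its natural order, while one-hot encoding avoids implying an order among airlines or cities.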

4. Machine Learning Model Development

4.1 Model Selection


Selecting the right model is crucial for accurate price prediction. To achieve this, multiple
machine learning models were trained, tested, and compared. The models used for this study
include:
• Linear Regression: A simple regression model that establishes a relationship between
independent variables and price but may not capture complex patterns effectively.

• Random Forest Regressor: An ensemble learning technique that combines multiple decision
trees to improve accuracy and reduce overfitting.

• XGBoost Regressor: A powerful gradient boosting algorithm known for its high performance
and efficiency, making it ideal for structured datasets.

• Decision Tree Regressor: A model that splits the data based on feature importance but is prone
to overfitting.

• Gradient Boosting Regressor: A boosting technique that sequentially improves weak models
to enhance predictive performance.

These models were trained on the dataset after preprocessing steps such as encoding
categorical variables, handling missing values, and normalizing numerical features.
Hyperparameter tuning was performed to optimize model performance.
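
The comparison described above might be organised along the following lines. This is a hedged sketch using scikit-learn on synthetic data, not the project's training code: the features, the target formula, and the settings are invented for illustration. XGBoost is left out only to keep the example free of an extra dependency; xgboost.XGBRegressor offers the same fit/predict interface and could be added to the dictionary where that package is installed.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the preprocessed data: three numeric features
# and a noisy, non-linear "price" target (all values are made up).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(400, 3))
y = 3000 + 4000 * X[:, 0] ** 2 + 1500 * (X[:, 1] > 0.5) + rng.normal(0, 100, 400)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

# Fit every model on the same split and score it on the held-out part.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {"MAE": mean_absolute_error(y_test, pred), "R2": r2_score(y_test, pred)}

# Report models from lowest to highest test MAE.
for name, scores in sorted(results.items(), key=lambda kv: kv[1]["MAE"]):
    print(f"{name:18s} MAE={scores['MAE']:8.1f}  R2={scores['R2']:.3f}")
```

Scoring all models on one shared held-out split keeps the comparison fair, since every model sees identical training and test rows.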

4.2 Model Evaluation
To evaluate the models, performance metrics such as Mean Absolute Error (MAE), Mean
Squared Error (MSE), and R-Squared Score were used:

• MAE (Mean Absolute Error): Measures the average absolute differences between predicted
and actual prices, indicating the model's accuracy.

• MSE (Mean Squared Error): Measures the average squared differences, giving higher weight
to larger errors.

• R-Squared Score: Determines how well the model explains the variance in the data.
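
To make the metric definitions concrete, the sketch below computes them in plain Python: MAE is the mean of the absolute errors, MSE the mean of the squared errors, RMSE the square root of MSE, and R-squared is one minus the ratio of the residual sum of squares to the total sum of squares. The function name regression_metrics and the sample fares are made up for illustration.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute MAE, MSE, RMSE, and R-squared for paired actual/predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n = len(y_true)
    errors = [actual - predicted for actual, predicted in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    mean_actual = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((actual - mean_actual) ** 2 for actual in y_true)
    # R-squared is undefined when every actual value is identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


if __name__ == "__main__":
    # Three made-up fares and predictions.
    print(regression_metrics([100, 200, 300], [110, 190, 330]))
```

For the three sample fares the errors are -10, 10, and -30, so MAE is about 16.67, MSE about 366.67, RMSE about 19.15, and R-squared 0.945. The single larger error dominates MSE, which is why MSE and RMSE penalise large misses more heavily than MAE.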

After extensive evaluation, the XGBoost Regressor demonstrated the best performance,
providing the lowest MAE and MSE while achieving the highest R-Squared Score. This
model effectively captures non-linear relationships within the dataset and generalizes well to
unseen data, making it the most suitable choice for deployment.
Further improvements, such as feature selection, hyperparameter tuning, and additional deep
learning models, could be explored in future iterations to enhance accuracy and efficiency.

5. Challenges and Solutions

5.1 Data Imbalance:


One of the primary challenges encountered during model training was an imbalance in the
dataset. Certain airlines had significantly fewer flight records compared to others, leading to
a bias in model predictions. This imbalance can cause the model to favor airlines with more
data, reducing prediction accuracy for underrepresented airlines. To mitigate this,
oversampling techniques such as Synthetic Minority Over-sampling Technique (SMOTE)
were applied to artificially increase the data points for underrepresented classes. Additionally,
under-sampling was considered for majority classes to prevent dominance in predictions.

5.2 Feature Importance:


Feature selection plays a crucial role in improving model accuracy. Certain features such as
Duration and Total_Stops had a high correlation with flight prices, indicating their strong
impact on pricing trends. Feature importance analysis using techniques like SHAP (SHapley
Additive exPlanations) and feature permutation was performed to determine which attributes
contributed the most to price predictions. Redundant or less impactful features were removed
to enhance model efficiency and prevent overfitting. This ensured that the model focused only
on the most relevant predictors, improving generalization to unseen data.
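
Permutation-based feature importance, one of the techniques mentioned above, can be sketched with scikit-learn as follows. The data is synthetic and is constructed so that duration and stops drive the price while a third column is pure noise; the feature names and coefficients are assumptions for illustration and do not reproduce the project's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 500
feature_names = ["duration_min", "total_stops", "noise"]  # hypothetical names

# Synthetic data: price depends on duration and stops, not on the noise column.
duration = rng.uniform(60, 600, n)
stops = rng.integers(0, 3, n)
noise = rng.normal(size=n)
X = np.column_stack([duration, stops, noise])
y = 2000 + 10 * duration + 1500 * stops + rng.normal(0, 200, n)

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)
model = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_train, y_train)

# Shuffle one column at a time on held-out data and measure the drop in score.
result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=42)
importances = dict(zip(feature_names, result.importances_mean))

for name, score in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:14s} {score:.3f}")
```

Features whose shuffling barely changes the validation score, such as the noise column here, are candidates for removal.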

5.3 Hyperparameter Tuning:


Optimizing model parameters is essential to enhance predictive performance. Grid Search and
Randomized Search techniques were employed to fine-tune hyperparameters for models like
Random Forest Regressor and XGBoost Regressor. Parameters such as learning rate,
maximum depth of trees, number of estimators, and minimum samples split were adjusted
iteratively to achieve the best results. The tuning process significantly improved the model’s
accuracy and robustness, reducing errors in prediction and ensuring a more generalized model.
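
A randomized search over Random Forest hyperparameters could look like the sketch below. The search space, the synthetic data, and the scoring choice are assumptions made for this example, not the values used in the project. A learning-rate parameter applies to boosting models rather than Random Forest, so it is not part of this search space.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic regression data standing in for the preprocessed flight features.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(300, 4))
y = 5000 * X[:, 0] + 2000 * X[:, 1] ** 2 + rng.normal(0, 100, 300)

# Illustrative search space (assumed values).
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5, 10],
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=1),
    param_distributions=param_distributions,
    n_iter=5,  # number of sampled parameter combinations
    cv=3,  # 3-fold cross-validation per combination
    scoring="neg_mean_absolute_error",
    random_state=1,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated MAE:", -search.best_score_)
```

GridSearchCV follows the same pattern with a param_grid argument and tries every combination, which is more thorough but slower as the grid grows.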

6. Visualizations and Insights

Visualizations play a crucial role in understanding data distribution, identifying trends, and
assessing the performance of the machine learning model. The following visualizations were
created to provide deeper insights into the dataset and model evaluation:

• Distribution of Flight Prices: A histogram was used to visualize the spread of ticket prices. This
helped in understanding the price distribution, identifying skewness, and detecting outliers. It
revealed that flight prices vary significantly based on factors such as airline, route, and demand
trends.

• Correlation Heatmap: A heatmap was generated to display relationships between numerical
features. It highlighted strong correlations between attributes like flight duration, total stops, and
price. This analysis helped in feature selection by removing highly correlated redundant variables
that might introduce multicollinearity.

• Boxplot of Prices by Airline: A boxplot comparison among different airlines was created to
analyze how ticket prices vary across airlines. The visualization demonstrated that premium
airlines tend to have higher price ranges, whereas budget airlines offer relatively lower fares.
This insight was useful in understanding airline pricing strategies.

• Feature Importance Chart: A bar graph showcasing the importance of different features in
predicting flight prices was created. The most influential features were flight duration, number
of stops, and departure time. This visualization helped in refining the model by focusing on the
most impactful variables.

• Predicted vs. Actual Prices: A scatter plot was generated to compare predicted ticket prices with
actual prices. Ideally, points should align along the diagonal, indicating accurate predictions. Any
deviations helped assess the model's errors and potential areas for improvement.

• Flight Duration vs. Price Analysis: A scatter plot was used to analyze the relationship between
flight duration and ticket price. It revealed that non-stop flights typically have higher prices than
connecting flights, even if the duration difference is minimal. This helped reinforce the
significance of the "Total_Stops" and "Duration" features in price prediction.

• Departure Time vs. Price Trends: A line graph was plotted to understand the effect of departure
time on ticket prices. Flights departing at peak hours (morning and evening) were found to have
higher fares compared to off-peak hours. This insight was useful in optimizing flight schedules
for better cost efficiency.

These visualizations not only provided valuable insights into the dataset but also enhanced
model performance by guiding feature selection and improving interpretability. Future work
could incorporate interactive dashboards to allow users to explore these insights dynamically.
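
As one example of the plots described above, the sketch below draws a predicted-versus-actual scatter plot with a diagonal reference line using matplotlib. The function name plot_predicted_vs_actual and the styling are illustrative choices, not the project's plotting code.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_predicted_vs_actual(y_true, y_pred):
    """Scatter predicted against actual prices with a y = x reference line."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    fig, ax = plt.subplots(figsize=(5, 5))
    ax.scatter(y_true, y_pred, alpha=0.6)

    # Points on this diagonal are perfect predictions.
    lo = min(y_true.min(), y_pred.min())
    hi = max(y_true.max(), y_pred.max())
    ax.plot([lo, hi], [lo, hi], linestyle="--", color="black", label="perfect prediction")

    ax.set_xlabel("Actual price")
    ax.set_ylabel("Predicted price")
    ax.set_title("Predicted vs. actual prices")
    ax.legend()
    return fig
```

The returned figure can be written to disk with its savefig method. Points above the diagonal are over-predictions and points below it are under-predictions.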

7. Model Deployment Strategy

To make the predictive model accessible and scalable, several deployment strategies were
considered:

• Flask Web Application: A user-friendly web interface was developed using Flask, allowing
users to input flight details and receive price predictions instantly. This approach ensures a
seamless and interactive user experience.

• API Development: A REST API was built to provide easy access to the predictive model for
external applications. The API allows developers to integrate flight price predictions into existing
booking systems, travel websites, and mobile applications, enhancing usability.

• Cloud Deployment: To ensure scalability and real-time availability, the model was deployed on
cloud platforms such as AWS and Google Cloud. Cloud deployment allows the model to handle
multiple requests simultaneously, reducing latency and improving performance.

• Containerization with Docker: To ensure consistency across different environments, the model
and its dependencies were containerized using Docker. This approach simplifies deployment
across multiple cloud services and on-premise infrastructures.

• Continuous Monitoring and Updates: Model performance is continuously monitored using
logging and performance tracking tools. Periodic updates are made based on real-time data to
ensure predictions remain accurate and relevant.

By implementing these deployment strategies, the flight price prediction model is made
accessible to users while maintaining efficiency, scalability, and ease of integration with
existing travel platforms.
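
A prediction endpoint of the kind described above might be sketched in Flask as follows. The /predict route, the JSON field names, and the predict_price placeholder are all hypothetical; in particular, predict_price uses a made-up formula in place of the trained model so that the example stays self-contained.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical input fields; a real service would mirror the model's features.
REQUIRED_FIELDS = ("airline", "source", "destination", "total_stops", "duration_minutes")


def predict_price(features: dict) -> float:
    """Placeholder for the trained model (made-up formula, for illustration only).

    A real deployment would load the fitted model once at startup and call
    its predict method here.
    """
    return 3000.0 + 8.0 * float(features["duration_minutes"]) + 1200.0 * int(features["total_stops"])


@app.route("/predict", methods=["POST"])
def predict():
    # silent=True yields None instead of an error for non-JSON bodies.
    payload = request.get_json(silent=True) or {}
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        return jsonify({"error": f"missing fields: {', '.join(missing)}"}), 400
    try:
        price = predict_price(payload)
    except (TypeError, ValueError):
        return jsonify({"error": "total_stops and duration_minutes must be numeric"}), 400
    return jsonify({"predicted_price": price})
```

Calling app.run() starts Flask's development server for local testing. Validating the payload before prediction lets the service return a clear 400 response instead of failing inside the model code.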

8. Results/Output

9. Conclusion and Recommendations

The predictive model successfully estimates flight ticket prices, helping travelers plan their
journeys efficiently. The model considers key factors such as airline type, journey date,
departure time, total stops, and duration to provide accurate price predictions. By leveraging
machine learning techniques, it offers a data-driven approach that enhances transparency in
flight pricing and assists customers in making cost-effective travel decisions.
Further improvements can be made by integrating real-time data from airline booking
systems, enabling more accurate predictions. External factors such as weather conditions, fuel
price fluctuations, and demand trends should also be incorporated into the model to refine its
accuracy. Additionally, deep learning techniques, such as recurrent neural networks (RNNs)
or transformers, could be explored to capture complex patterns in flight price trends.

10. Future Enhancements

To further improve the model and its usability, the following enhancements are suggested:

• Integration of Real-Time Data: Currently, the model relies on historical data for price
prediction. Integrating real-time flight ticket prices from airline websites and travel agencies
would enhance accuracy by capturing live market trends and fluctuations. Web scraping
techniques or API integrations can be used for this purpose.

• Inclusion of Weather Data: Weather conditions can significantly impact flight schedules and
prices. Incorporating meteorological data, such as severe weather warnings or seasonal climate
variations, could refine the model’s predictive capability by identifying patterns related to flight
delays or cancellations.

• Customer Demand Analysis: Seasonal trends and demand fluctuations play a crucial role in
pricing. Analyzing peak travel seasons, holiday periods, and economic conditions affecting travel
demand can help improve price prediction. Time series analysis techniques can be employed to
factor in these variations.

• Deep Learning Implementation: While the current model leverages machine learning
algorithms, further enhancements could include the use of deep learning techniques such as
recurrent neural networks (RNNs) or transformers. These models can identify complex patterns
in pricing trends and improve long-term forecasting.

• Enhanced Feature Engineering: Additional factors such as fuel price trends, economic
indicators, and airline promotional strategies can be incorporated into the model. These factors
can provide a more comprehensive understanding of flight pricing.

• User-Friendly Web Application: Expanding the deployment strategy to include a full-stack web
application with interactive dashboards would improve accessibility. Users could visualize flight
price trends, input custom parameters, and receive real-time insights tailored to their travel needs.

By implementing these enhancements, the model can evolve into a more robust, accurate, and
user-centric prediction system, making airfare estimation even more reliable for travelers and
industry stakeholders.

11. References

• Doganis, R. (2019). The Airline Business. Routledge. This book explores airline pricing
strategies and revenue management, providing insights into the factors affecting flight fares.
• IATA. (2022). Dynamic Pricing in Airline Industry. International Air Transport Association. A
comprehensive report on how airlines use machine learning to optimize ticket prices in real
time.
• Hastie, T., Tibshirani, R., & Friedman, J. (2017). The Elements of Statistical Learning: Data
Mining, Inference, and Prediction. Springer. This book provides a deep understanding of
machine learning techniques used for predictive modeling, including price forecasting.
• Dataset Link: Flight Fare Dataset

12. Appendices

A. Code Snippets

B. Data Dictionary

