M.A.M COLLEGE OF ENGINEERING AND TECHNOLOGY
SIRUGANUR, TRICHY-621105
DEPARTMENT OF INFORMATION TECHNOLOGY
NAAN MUDHALVAN
NM1022 – EXPERIENCED BASED PROJECT LEARNING
LABORATORY
TITLE: DATA ANALYTICS PROJECT DEVELOPMENT
ENHANCING PUBLIC TRANSPORTATION EFFICIENCY AND ACCESSIBILITY THROUGH DATA ANALYSIS
Submitted by
SUBASHINI S - 812022205051
ANNA UNIVERSITY
CHENNAI
2023- 2024
CERTIFICATE
This is to certify that the bonafide record of this work is done by Selvan
/ Selvi:………………………………….. Reg. No………………………............of II-year IV
semester in Information Technology for NM1022 – Experienced Based Project Learning
Laboratory during the academic year……………………………………….
Faculty-in-charge HoD
Submitted for the University Practical Examination held on…………………………….
INTERNAL EXAMINER EXTERNAL EXAMINER
ABSTRACT:
In modern urban environments, efficient and accessible transportation systems
are critical for economic growth, social inclusion, and environmental
sustainability. This project explores how data analytics can enhance public
transportation systems by improving efficiency and accessibility. By leveraging
large datasets from various sources such as public transit schedules, real-time
traffic information, demographic data, and user feedback, this study aims to
identify patterns, optimize routes, and propose data-driven strategies for public
transportation management.
Key methodologies employed include descriptive analytics to understand current
usage patterns, predictive analytics to forecast demand and potential disruptions,
and prescriptive analytics to recommend actionable improvements. Geographic
Information Systems (GIS) are used to visualize spatial data, highlighting areas
with inadequate service and suggesting enhancements.
The results demonstrate that data analytics can significantly contribute to
reducing travel time, increasing reliability, and making transportation more
equitable. Specific recommendations include adjusting schedules based on peak
usage times, reallocating resources to underserved areas, and integrating real-time
data into transit apps to provide users with timely updates. This study underscores
the potential of data-driven decision-making in transforming public
transportation, ultimately contributing to a more connected and efficient urban
landscape.
CHAPTER CONTENT
1 Problem Statement
2 Methodology
3 Existing Work
4 Program & Output
5 Proposed Work
6 Dashboard Creation Link
7 Live Sessions
8 Action Plan
9 Conclusion
10 References
CHAPTER 1
PROBLEM STATEMENT:
The efficiency and accessibility of public transportation are critical components
in urban development and quality of life. Despite investments in public transit
infrastructure, many cities still face challenges in optimizing their systems.
Common issues include inconsistent service, long wait times, overcrowding, and
inadequate coverage, which can deter usage and contribute to increased traffic
congestion and pollution. Additionally, access to reliable and affordable public
transportation is often uneven, disproportionately affecting low-income and
marginalized communities.
The objective is to enhance the efficiency and accessibility of public
transportation through comprehensive data analysis. By leveraging data on
ridership patterns, traffic flows, demographic information, and existing transit
operations, we aim to identify key areas for improvement, optimize routes and
schedules, and develop strategies to ensure equitable access. This approach seeks
to create a more efficient, reliable, and inclusive public transportation system that
meets the needs of all residents.
CHAPTER 2
METHODOLOGY:
The methodology for this project involves a multi-phase approach leveraging
various data analytics techniques to enhance the efficiency and accessibility of
public transportation systems. The process is structured into the following phases:
1. Data Collection:
▪ Sources: Collect data from diverse sources including public transit
schedules, real-time traffic data, GPS data from public transit vehicles,
demographic information from census databases, and user feedback from
surveys and social media.
▪ Integration: Integrate these datasets into a unified data warehouse to
facilitate comprehensive analysis.
2. Data Cleaning and Preprocessing:
▪ Cleaning: Address missing values, remove duplicates, and correct
inconsistencies in the datasets.
▪ Transformation: Standardize data formats and transform raw data into usable
formats. For instance, convert time-stamped GPS data into route segments
and align traffic data with transit schedules.
3. Descriptive Analytics:
▪ Usage Patterns: Analyze current usage patterns, identifying peak times,
high-traffic routes, and underutilized services.
▪ Performance Metrics: Calculate key performance indicators (KPIs) such as
average travel time, service frequency, and passenger load factors.
4. Predictive Analytics:
▪ Demand Forecasting: Use machine learning models (e.g., time series
analysis, regression models) to forecast future ridership trends and
potential service disruptions.
▪ Scenario Analysis: Conduct what-if analyses to predict the impact of
various changes (e.g., adding new routes, changing schedules) on system
performance.
5. Prescriptive Analytics:
▪ Optimization Models: Develop optimization models to improve route
planning, schedule adjustments, and resource allocation. Linear
programming and genetic algorithms can be utilized for this purpose.
▪ GIS Mapping: Use Geographic Information Systems (GIS) to visualize
spatial data, highlighting areas with inadequate service and suggesting
areas for improvement.
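The transformation step in phase 2 above (converting time-stamped GPS data into route segments) can be sketched in pandas. This is a minimal illustration under stated assumptions: the GPS log is hypothetical, and each ping is assumed to be already matched to a stop (columns `vehicle_id`, `stop_id`, `timestamp`); raw GPS feeds normally need map-matching before this step.

```python
import pandas as pd

# Hypothetical GPS log: one row per ping, already matched to the nearest stop.
pings = pd.DataFrame({
    "vehicle_id": [1, 1, 1, 2, 2],
    "stop_id": ["A", "B", "C", "A", "B"],
    "timestamp": pd.to_datetime([
        "2024-01-08 08:00", "2024-01-08 08:07", "2024-01-08 08:15",
        "2024-01-08 08:10", "2024-01-08 08:19",
    ]),
})


def to_segments(pings: pd.DataFrame) -> pd.DataFrame:
    """Turn consecutive pings of the same vehicle into stop-to-stop segments."""
    p = pings.sort_values(["vehicle_id", "timestamp"]).copy()
    grp = p.groupby("vehicle_id")
    # The next ping of the same vehicle marks where the segment ends.
    p["to_stop"] = grp["stop_id"].shift(-1)
    p["end_time"] = grp["timestamp"].shift(-1)
    # The last ping of each vehicle has no successor, so it starts no segment.
    seg = p.dropna(subset=["to_stop"]).rename(
        columns={"stop_id": "from_stop", "timestamp": "start_time"})
    seg["minutes"] = (seg["end_time"] - seg["start_time"]).dt.total_seconds() / 60
    cols = ["vehicle_id", "from_stop", "to_stop", "start_time", "minutes"]
    return seg[cols].reset_index(drop=True)


segments = to_segments(pings)
print(segments)
```

Averaging `minutes` per (`from_stop`, `to_stop`) pair then gives typical segment travel times that can be compared against the published schedule.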
By following this comprehensive methodology, the project aims to create a data-
driven framework for enhancing public transportation systems, making them
more efficient, reliable, and accessible for all users. The iterative nature of the
methodology allows for continuous improvement and adaptation to changing
urban dynamics and user needs.
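The linear-programming option named under Prescriptive Analytics can be illustrated with a deliberately small fleet-allocation model: assign a fixed number of buses to routes so as to carry as many passengers per hour as possible. The demand, per-bus capacity and fleet size below are hypothetical, and the sketch uses `scipy.optimize.linprog` with the HiGHS solver; it is one possible formulation, not the project's own model.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical inputs for three routes.
demand = np.array([600.0, 300.0, 900.0])      # passengers per hour wanting to travel
cap_per_bus = np.array([120.0, 60.0, 180.0])  # passengers per hour one bus carries on each route
fleet = 8                                     # buses available

n = len(demand)
# Decision vector z = [x_1..x_n, s_1..s_n]:
#   x_r = buses assigned to route r, s_r = passengers served on route r.
c = np.concatenate([np.zeros(n), -np.ones(n)])  # linprog minimises, so negate "served"

# s_r - cap_r * x_r <= 0  (cannot serve more than the assigned capacity)
A_cap = np.hstack([-np.diag(cap_per_bus), np.eye(n)])
# sum_r x_r <= fleet
A_fleet = np.concatenate([np.ones(n), np.zeros(n)])[None, :]
A_ub = np.vstack([A_cap, A_fleet])
b_ub = np.concatenate([np.zeros(n), [fleet]])

# Bounds: x_r >= 0 and 0 <= s_r <= demand_r
bounds = [(0, None)] * n + [(0, d) for d in demand]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
buses = res.x[:n]
served = -res.fun
print("buses per route:", np.round(buses, 2), "passengers served:", served)
```

Here the optimum puts 5 buses on route 3 and 3 on route 1 (1,260 passengers per hour) and leaves route 2 unserved. Bus counts from a linear relaxation can in general be fractional, so an integer program or a rounding step may be needed, and a pure "maximize passengers" objective can starve low-demand routes; adding a minimum-service constraint per route is one way to reflect the accessibility goal.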
CHAPTER 3
EXISTING WORK:
The application of data analytics in public transportation is a growing field with
numerous studies and projects demonstrating its potential to enhance system
efficiency and accessibility. This section reviews significant existing work that
has laid the foundation for our project.
1. Real-Time Transit Information Systems:
• Cities like New York, London, and San Francisco have implemented real-
time transit information systems that use GPS data from buses and trains
to provide passengers with up-to-date arrival times and service alerts.
These systems have been shown to reduce wait times and improve user
satisfaction.
2. Predictive Analytics for Demand Forecasting:
• Research by Zhang et al. (2017) applied machine learning models to
forecast public transit demand in Beijing, achieving high accuracy in
predicting ridership levels during different times of the day. Such models
help transit authorities prepare for peak demand periods and allocate
resources more effectively.
3. Route Optimization Algorithms:
• The work of Fan and Machemehl (2006) focused on optimizing bus routes
and schedules using genetic algorithms. Their study demonstrated
significant improvements in service efficiency and passenger convenience
by minimizing travel time and operational costs.
4. Geographic Information Systems (GIS) in Public Transportation:
• Studies like those by Delmelle et al. (2011) have used GIS to map public
transit accessibility and identify gaps in service coverage. These visual
tools help planners understand spatial inequalities in transit access and
prioritize areas for service enhancements.
CHAPTER 4
PROGRAM:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
# Load data from CSV file
data = pd.read_csv('transportation_data.csv')
# Print data summary
print(data.head())
data.info()  # info() prints its own report and returns None
print(data.describe())
# Handle missing values: fill each numeric column with its mean
data = data.fillna(data.mean(numeric_only=True))
# Column to predict. The sample output below contains no column named 'target',
# so set this to the column you want to forecast; it must not be one of the
# scaled feature columns listed further down.
target = 'target'
X = data.drop(columns=[target])
y = data[target]
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
# Normalize features, fitting the scaler on the training set only so that
# no information from the test set leaks into training
scale_cols = ['passenger_count', 'travel_time']
scaler = MinMaxScaler()
X_train, X_test = X_train.copy(), X_test.copy()
X_train[scale_cols] = scaler.fit_transform(X_train[scale_cols])
X_test[scale_cols] = scaler.transform(X_test[scale_cols])
# Train random forest model
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate model
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse:.2f}')
OUTPUT:
route_id vehicle_id passenger_count travel_time maintenance_time
0 1 1 10 30 60
1 1 1 15 35 65
2 1 1 12 32 62
3 2 2 20 40 70
4 2 2 18 38 68
CHAPTER 5
PROPOSED WORK:
This project aims to leverage data analytics to enhance the efficiency and
accessibility of public transportation systems. The proposed work involves a
systematic approach comprising multiple phases, each focused on utilizing
advanced analytics techniques to address specific challenges in public
transportation.
1. Data Collection and Integration:
Sources: Gather data from a wide array of sources, including public transit
schedules, real-time traffic data, GPS data from transit vehicles, demographic
data, and user feedback from surveys and social media.
2. Data Cleaning and Preprocessing:
Cleaning: Implement robust data cleaning processes to handle missing values,
remove duplicates, and correct inconsistencies.
3. Descriptive Analytics:
Usage Patterns: Analyze current usage patterns to identify peak usage
times, high-traffic routes, and underutilized services.
4. Predictive Analytics:
Demand Forecasting: Utilize machine learning models (e.g., time series
analysis, regression models) to forecast future ridership trends, identify potential
disruptions, and anticipate future demands on the transportation network.
Scenario Analysis: Conduct what-if analyses to predict the impact of various
changes (e.g., introducing new routes, modifying schedules) on system
performance.
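A what-if analysis can be as simple as re-evaluating a KPI with one parameter changed. The sketch below assumes passengers arrive at random and buses run at perfectly regular intervals, in which case the average wait is half the headway (irregular service makes waits longer). The route names, headways and boardings are hypothetical, and boardings are held fixed, so any extra demand a better service might attract is ignored.

```python
from typing import Optional

# Hypothetical routes: current headway (minutes between buses) and daily boardings.
ROUTES = {
    "R1": {"headway": 15, "boardings": 4000},
    "R2": {"headway": 20, "boardings": 1500},
    "R3": {"headway": 10, "boardings": 6000},
}


def avg_wait(headway_min: float) -> float:
    """Mean wait for randomly arriving passengers, assuming perfectly regular service."""
    return headway_min / 2


def total_wait_hours(routes: dict, headway_override: Optional[dict] = None) -> float:
    """Total passenger waiting time per day (hours), optionally with changed headways."""
    headway_override = headway_override or {}
    total_min = 0.0
    for name, r in routes.items():
        headway = headway_override.get(name, r["headway"])
        total_min += r["boardings"] * avg_wait(headway)
    return total_min / 60


baseline = total_wait_hours(ROUTES)
scenario = total_wait_hours(ROUTES, {"R1": 10})  # what if R1 ran every 10 minutes?
print(f"baseline: {baseline:.1f} h/day, scenario: {scenario:.1f} h/day")
```

Comparing the saving in waiting time with the cost of the extra buses needed for the shorter headway gives a simple basis for ranking scenarios.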
SYSTEM REQUIREMENTS:
• Software Components
1. Data Collection
• APIs for Real-time Data: APIs to gather real-time transportation data
from sources like transit authorities, GPS, and IoT devices.
Examples: Google Maps API, OpenTripPlanner, Transport API.
2. Data Processing
• ETL Tools (Extract, Transform, Load): Tools to extract data from
sources, transform it into usable formats, and load it into a database.
Examples: Apache NiFi, Talend, Apache Airflow.
• Data Cleaning and Preprocessing Libraries: Libraries to clean and
preprocess data to ensure quality and consistency.
Examples: Pandas (Python), dplyr (R).
3. Data Analysis:
• Statistical Analysis Software: Software for conducting statistical analyses
and modeling.
Examples: R, Python (with libraries like SciPy, NumPy, StatsModels).
4. Visualization:
• Data Visualization Tools: Tools to create interactive and static
visualizations of transportation data.
Examples: Tableau, Power BI, Plotly (Python), ggplot2 (R).
• Geospatial Analysis Tools: Tools to analyze and visualize geospatial data.
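The extract-transform-load pattern that tools such as Apache NiFi, Talend or Apache Airflow run at scale can be illustrated in a few lines of pandas and Python's built-in `sqlite3` module. The CSV content, table name and column names below are hypothetical; a real pipeline would read from files, APIs or agency database exports.

```python
import io
import sqlite3

import pandas as pd

# Extract: an in-memory CSV stands in for a file, API response or database export.
raw_csv = io.StringIO(
    "route_id,stop_id,arrival\n"
    "1,A,2024-01-08 08:00\n"
    "1,A,2024-01-08 08:00\n"   # duplicate row
    "1,B,2024-01-08 08:07\n"
    "2,A,\n"                   # missing timestamp
)
df = pd.read_csv(raw_csv)

# Transform: drop duplicates and unusable rows, standardise types, add a feature.
df = df.drop_duplicates().dropna(subset=["arrival"])
df["arrival"] = pd.to_datetime(df["arrival"])
df["hour"] = df["arrival"].dt.hour

# Load: write to a database table (an in-memory SQLite database here).
con = sqlite3.connect(":memory:")
df.to_sql("stop_arrivals", con, index=False)
count = con.execute("SELECT COUNT(*) FROM stop_arrivals").fetchone()[0]
con.close()
print(count, "rows loaded")
```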
IMPLEMENTATION DETAILS:
WORKFLOW DIAGRAM:
DATA COLLECTION AND PRE-PROCESSING:
A. Data Collection
1. Identify Data Sources
• Government Databases: City planning departments, transportation
agencies, census data.
• Public Transportation Systems: Usage data from buses, trains, subways
(e.g., ticket sales, passenger counts).
2. Collect Data
• APIs: Use APIs from transportation departments, weather services, and
mapping services.
• Web Scraping: Collect data from public websites and traffic reports.
• Database Exports: Obtain data exports from local transportation agencies.
• IoT Devices: Integrate data from IoT devices like smart traffic lights and
GPS trackers.
3. Data Integration
• Merge Datasets: Combine datasets from different sources using common
keys (e.g., geographic coordinates, timestamps).
• Geospatial Data Processing: Use GIS tools to integrate and process
geospatial data (e.g., mapping traffic incidents to specific locations).
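Merging on a common key is straightforward when the keys match exactly, but timestamps from different feeds rarely do. One option for time keys is `pandas.merge_asof`, which attaches to each record the most recent observation from another table. The trip and weather tables below are hypothetical, as is the one-hour tolerance.

```python
import pandas as pd

trips = pd.DataFrame({
    "trip_id": [101, 102, 103],
    "departure": pd.to_datetime(
        ["2024-01-08 07:55", "2024-01-08 08:20", "2024-01-08 09:05"]),
    "travel_time": [30, 38, 33],
})
weather = pd.DataFrame({
    "observed": pd.to_datetime(
        ["2024-01-08 07:00", "2024-01-08 08:00", "2024-01-08 09:00"]),
    "rain_mm": [0.0, 4.2, 1.0],
})

# Both tables must be sorted on their time keys for merge_asof.
merged = pd.merge_asof(
    trips.sort_values("departure"),
    weather.sort_values("observed"),
    left_on="departure",
    right_on="observed",
    direction="backward",          # latest reading at or before each departure
    tolerance=pd.Timedelta("1h"),  # leave rain_mm empty if the reading is older than 1 hour
)
print(merged[["trip_id", "departure", "rain_mm"]])
```

For exact keys such as `route_id`, an ordinary `pd.merge` is enough; `merge_asof` also accepts a `by=` argument to match on such a key before matching on time.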
B. Data Preparation
1. Create New Features
• Time-Based Features: Extract features like hour of the day, day of the
week, month, which can be crucial for traffic pattern analysis.
• Geospatial Features: Calculate distances between points, create features
based on geographic clusters (e.g., neighborhoods).
• Aggregated Metrics: Create rolling averages, sums, or other aggregations
over time periods (e.g., weekly average traffic volume).
2. Dimensionality Reduction
• Principal Component Analysis (PCA): Reduce the dimensionality of the
dataset while retaining most of the variance.
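The time-based, geospatial and aggregated features listed above can be derived with pandas and the standard haversine formula for great-circle distance. The observations, the reference point `CENTRE` and the 3-observation rolling window are hypothetical choices for illustration; PCA is not shown.

```python
import math

import pandas as pd


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Hypothetical hourly traffic observations with the location they were taken at.
obs = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-08 06:00", periods=6, freq="60min"),
    "traffic_volume": [120, 340, 560, 410, 300, 280],
    "lat": [10.80, 10.81, 10.82, 10.80, 10.79, 10.80],
    "lon": [78.68, 78.69, 78.70, 78.68, 78.67, 78.68],
})

# Time-based features
obs["hour"] = obs["timestamp"].dt.hour
obs["day_of_week"] = obs["timestamp"].dt.dayofweek  # Monday = 0
obs["month"] = obs["timestamp"].dt.month

# Geospatial feature: distance from a hypothetical reference point (e.g. a central bus stand)
CENTRE = (10.80, 78.68)
obs["km_from_centre"] = [
    haversine_km(CENTRE[0], CENTRE[1], la, lo) for la, lo in zip(obs["lat"], obs["lon"])
]

# Aggregated feature: mean traffic volume over the current and two previous observations
obs["traffic_rolling3"] = obs["traffic_volume"].rolling(window=3).mean()
print(obs[["hour", "day_of_week", "km_from_centre", "traffic_rolling3"]])
```

The first two rolling values are empty (NaN) because a full 3-observation window is not yet available; they are usually dropped or filled before modeling.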
DATA EXPLORATION (DATA VISUALIZATION & ANALYSIS):
Data exploration, visualization and analysis in this project are organized
around four questions:
1. Data: Which dataset is being explored, and what does each column represent?
2. Goals: What are the main goals of the exploration and analysis, and which
specific questions or patterns are of interest?
3. Visualization: Which types of visualization suit those questions (e.g., bar
charts, scatter plots, heatmaps)?
4. Tools: Which tools or libraries are used for the analysis (e.g., Pandas,
Matplotlib, Seaborn)?
Once the dataset and these details are fixed, the exploration proceeds by creating
visualizations that uncover insights.
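A first exploration pass over data with the columns shown in the Chapter 4 output (`route_id`, `vehicle_id`, `passenger_count`, `travel_time`, `maintenance_time`) might look like the sketch below. It reuses the five sample rows from that output; with the full file, `pd.read_csv` would be used instead. The per-route table feeds a bar chart and the correlation matrix feeds a heatmap.

```python
import pandas as pd

# The five sample rows from the Chapter 4 output (replace with pd.read_csv(...) for the full data).
data = pd.DataFrame({
    "route_id": [1, 1, 1, 2, 2],
    "vehicle_id": [1, 1, 1, 2, 2],
    "passenger_count": [10, 15, 12, 20, 18],
    "travel_time": [30, 35, 32, 40, 38],
    "maintenance_time": [60, 65, 62, 70, 68],
})

# Per-route summary: how many records, how busy, and how long trips take.
by_route = data.groupby("route_id").agg(
    trips=("passenger_count", "size"),
    avg_passengers=("passenger_count", "mean"),
    avg_travel_time=("travel_time", "mean"),
)
print(by_route)

# Pairwise correlations between the numeric measures (the input to a heatmap).
corr = data[["passenger_count", "travel_time", "maintenance_time"]].corr()
print(corr.round(2))
```

On these five rows every pair of measures is perfectly correlated (travel_time is always passenger_count + 20), which suggests the sample is too small or too regular to support conclusions on its own.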
CHAPTER 6
DASHBOARD CREATION LINK:
Creating a dashboard typically involves using a platform or software designed for
data visualization and reporting. Popular tools, with links to their official
learning resources, include:
1. Tableau:
[Tableau Dashboard Creation](https://www.tableau.com/learn/training)
2. Power BI:
[Power BI Dashboard Creation](https://docs.microsoft.com/en-us/power-bi/create-reports)
3. Google Data Studio:
[Google Data Studio Dashboard Creation](https://datastudio.google.com/)
4. Looker:
[Looker Dashboard Creation](https://cloud.google.com/looker/docs)
CHAPTER 7
LIVE SESSIONS:
REPORTS CREATION LINK & SCREENSHOT:
Reports can also be created with various tools and platforms designed for data
analysis and presentation. Popular options, with links to their guides and
resources, include:
1. Tableau:
[Tableau Report Creation](https://www.tableau.com/learn/training)
2. Power BI:
[Power BI Report Creation](https://docs.microsoft.com/en-us/power-bi/create-reports)
3. Looker:
[Looker Report Creation](https://cloud.google.com/looker/docs/r/building-reports)
STORY (LINK & SCREENSHOT):
Data analytics is the process of examining, cleaning, transforming, and modeling
data to discover useful information, draw conclusions, and support decision-
making. It involves a variety of techniques and tools for analyzing data and
extracting meaningful insights. Key steps in data analytics include:
Data Transformation: Converting data into a format suitable for analysis,
which may include normalization, aggregation, and encoding.
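The three transformations named above (normalization, aggregation and encoding) can each be shown in a line or two of pandas. The trip records with `mode` and `fare` columns are hypothetical.

```python
import pandas as pd

# Hypothetical trip records.
trips = pd.DataFrame({
    "mode": ["bus", "train", "bus", "metro", "train"],
    "fare": [20.0, 45.0, 25.0, 30.0, 55.0],
})

# Normalization: min-max scale fare to the range [0, 1].
fmin, fmax = trips["fare"].min(), trips["fare"].max()
trips["fare_scaled"] = (trips["fare"] - fmin) / (fmax - fmin)

# Aggregation: average fare per mode.
avg_fare = trips.groupby("mode")["fare"].mean()

# Encoding: one indicator column per value of the categorical 'mode' field.
encoded = pd.get_dummies(trips, columns=["mode"], prefix="mode")
print(trips, avg_fare, encoded.columns.tolist(), sep="\n\n")
```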
WEB CREATION (LINK & SCREENSHOT):
https://constructor.university/programs/graduate-education/data-science-for-society-business
Descriptive Analysis about this dataset (Insights):
Descriptive analysis of a transportation dataset for enhancing public transportation
through data analysis could include insights such as:
1. Usage Patterns: Identifying peak hours and days for transportation services.
2. Route Efficiency: Analyzing which routes are most heavily used and which
may need optimization.
3. Demographic Trends: Understanding who uses the transportation services,
such as age groups, income levels, etc.
4. Accessibility: Assessing the accessibility of transportation services in different
areas and identifying areas with limited access.
5. Service Reliability: Examining on-time performance and reliability metrics.
6. Mode Preferences: Determining which modes of transportation (bus, train,
subway, etc.) are most popular among commuters.
7. Seasonal Variations: Identifying any seasonal trends or fluctuations in
transportation usage.
8. Impact of Events: Analyzing how events or incidents affect transportation
usage and patterns.
These insights can help policymakers and transportation authorities make data-
driven decisions to improve efficiency and accessibility for the public.
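Two of the insights listed above, usage patterns (peak hours) and service reliability (on-time performance), reduce to simple group-by computations. The sketch assumes a hypothetical trip log with `route_id`, `scheduled` and `actual` arrival times, and `boardings`. It counts an arrival as on time if it is no more than 5 minutes late; that threshold is an assumption, since agencies define on-time performance differently (some also penalize early running).

```python
import pandas as pd

# Hypothetical trip log.
log = pd.DataFrame({
    "route_id": [1, 1, 2, 2, 1, 2],
    "scheduled": pd.to_datetime(["2024-01-08 07:00", "2024-01-08 08:00", "2024-01-08 08:10",
                                 "2024-01-08 17:30", "2024-01-08 18:00", "2024-01-08 21:00"]),
    "actual": pd.to_datetime(["2024-01-08 07:02", "2024-01-08 08:09", "2024-01-08 08:12",
                              "2024-01-08 17:41", "2024-01-08 18:03", "2024-01-08 21:07"]),
    "boardings": [40, 85, 60, 90, 70, 15],
})

# Usage pattern: total boardings per hour of day, busiest hour first.
by_hour = log.groupby(log["scheduled"].dt.hour)["boardings"].sum().sort_values(ascending=False)
peak_hour = by_hour.index[0]

# Service reliability: share of arrivals no more than 5 minutes late (assumed threshold).
log["delay_min"] = (log["actual"] - log["scheduled"]).dt.total_seconds() / 60
log["on_time"] = log["delay_min"] <= 5
on_time_by_route = log.groupby("route_id")["on_time"].mean()

print("peak hour:", peak_hour)
print(on_time_by_route.round(2))
```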
Predictive Analysis from the insights
Enhancing the efficiency and accessibility of public transportation through data
analytics involves gathering and analyzing data to identify patterns and trends.
Predictive analysis uses these insights to forecast future trends and optimize
resources, for example by predicting traffic patterns to improve public
transportation routes or by estimating demand for specific services to allocate
resources more effectively.
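A minimal form of demand forecasting from historical patterns is a regression on lagged ridership. The sketch below builds a synthetic hourly series from a hypothetical daily profile with morning and evening peaks plus random noise, uses the value 24 hours earlier and one hour earlier as features, and fits scikit-learn's `LinearRegression`. Real ridership forecasting would also need calendar, weather and event features and a proper train/test evaluation.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical daily ridership profile with morning (08:00) and evening (18:00) peaks.
h = np.arange(24)
profile = 50 + 400 * np.exp(-((h - 8) ** 2) / 4) + 350 * np.exp(-((h - 18) ** 2) / 4)

# Two weeks of hourly ridership = repeating profile + noise.
n_hours = 24 * 14
ridership = pd.Series(profile[np.arange(n_hours) % 24] + rng.normal(0, 5, n_hours))

# Lag features: same hour on the previous day, and the previous hour.
frame = pd.DataFrame({
    "y": ridership,
    "lag_24": ridership.shift(24),
    "lag_1": ridership.shift(1),
}).dropna()

model = LinearRegression().fit(frame[["lag_24", "lag_1"]], frame["y"])

# Forecast the first hour after the data ends (hour index n_hours).
next_features = pd.DataFrame({
    "lag_24": [ridership.iloc[n_hours - 24]],
    "lag_1": [ridership.iloc[n_hours - 1]],
})
forecast = model.predict(next_features)[0]
print(f"forecast: {forecast:.0f}, profile value: {profile[n_hours % 24]:.0f}")
```

Because the synthetic series repeats daily, the fitted model leans almost entirely on `lag_24`; with real data the relative weights indicate how strongly demand follows the previous day's pattern.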
CHAPTER 8
ACTION PLAN:
To enhance transportation efficiency and accessibility through data analysis,
consider the following action plan:
1. Data Collection: Gather data from various sources including transportation
agencies, GPS tracking systems, traffic cameras, and mobile apps to collect
comprehensive information on transportation patterns, routes, congestion, and
usage.
2. Data Cleaning and Integration: Cleanse and integrate the collected data to
ensure consistency and reliability. This step involves removing duplicates,
correcting errors, and formatting the data for analysis.
3. Data Analysis: Utilize statistical and machine learning techniques to analyze the
data and identify patterns, trends, and areas for improvement in transportation
efficiency and accessibility. This could involve predictive modeling, clustering
analysis, and network optimization algorithms.
4. Identify Key Metrics: Define key performance indicators (KPIs) to measure
transportation efficiency and accessibility such as average travel time, congestion
levels, mode share, and service coverage.
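Several of the KPIs named in step 4 can be computed directly from trip and zone records. In this sketch the trip records, the zone table and the `stop_within_500m` walk-access flag are hypothetical. Service coverage is taken here as the share of residents living in zones that have a stop within walking distance, which is one common definition among several; congestion levels would need traffic data and are not shown.

```python
import pandas as pd

# Hypothetical trip records (e.g. from a travel survey).
trips = pd.DataFrame({
    "mode": ["bus", "bus", "train", "car", "bus", "car", "train", "car"],
    "travel_time_min": [35, 42, 25, 20, 38, 30, 28, 22],
})
# Hypothetical zones with population and a walk-access flag.
zones = pd.DataFrame({
    "zone": ["Z1", "Z2", "Z3", "Z4"],
    "population": [12000, 8000, 5000, 15000],
    "stop_within_500m": [True, True, False, False],
})

# Average travel time per mode (minutes).
avg_travel_time = trips.groupby("mode")["travel_time_min"].mean()
# Mode share: fraction of trips made by each mode.
mode_share = trips["mode"].value_counts(normalize=True)
# Service coverage: share of residents in zones with a nearby stop.
covered = zones.loc[zones["stop_within_500m"], "population"].sum()
coverage = covered / zones["population"].sum()

print(avg_travel_time, mode_share, f"service coverage: {coverage:.0%}", sep="\n\n")
```

Tracking these values before and after a service change turns the action plan's improvements into measurable targets.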
CHAPTER 9
CONCLUSION:
In conclusion, the analysis conducted in this project has highlighted the
significant impact of data analytics on enhancing public transportation efficiency
and accessibility. By harnessing data-driven insights, transportation authorities
can optimize routes, schedules, and infrastructure to better serve communities.
Furthermore, leveraging data analytics enables proactive decision-making and
the implementation of targeted interventions to address specific challenges faced
by commuters. Moving forward, continued investment in data analytics will be
crucial for fostering sustainable and inclusive transportation systems that meet
the evolving needs of urban populations.
CHAPTER 10
REFERENCES:
1. Transportation Research Board (TRB): http://www.trb.org/
2. American Public Transportation Association (APTA): http://www.apta.com/
3. Federal Transit Administration (FTA): http://www.transit.dot.gov/
4. Data.gov (Transportation): http://www.data.gov/transportation
5. Kaggle (Transportation datasets): http://www.kaggle.com/datasets?search=transportation