PHASE-2 SUBMISSION
Student Name: PRIYADHARSHINI.S
Institution: 422223149019
Department: Computer Science and Engineering (Cyber Security)
Date of Submission: 04/05/2025
GitHub Repository Link: https://github.com/pakalavan1/phase2.git
1. Problem Statement
Road traffic accidents are a major global concern, leading to fatalities, injuries, and economic losses. Traditional safety measures based on manual historical analysis are insufficient.
Type of Problem: Classification and Regression
Refined Understanding:
After deeper exploration of the data, the focus is on predicting accident severity and accident likelihood from historical accident datasets.
Impact:
• Helps authorities identify high-risk zones and time periods.
• Supports smarter and safer road infrastructure planning.
• Saves lives and reduces economic costs.
2. Project Objectives
• Analyze global road accident data to discover key risk factors and trends.
• Predict the likelihood and severity of road accidents using AI-based models.
• Identify accident hotspots and peak risk periods.
• Build a decision-support tool for authorities.
• Improve model interpretability and real-world applicability by using visualization techniques.
(Goals refined slightly after EDA, focusing more on severity prediction.)
3. Flowchart of the Project Workflow
Data Collection → Data Preprocessing → Exploratory Data Analysis → Feature Engineering → Model Building → Model Evaluation → Visualization & Insights
4. Data Description
• Dataset Name: Global Road Accidents Dataset
• Source: Kaggle (https://doi.org/10.34740/kaggle/dsv/10575045)
• Type: Structured (Tabular Data)
• Records and Features: Several thousand records with fields such as time, location, environmental factors, and accident severity.
• Nature: Static (downloaded and used locally)
• Target Variable: Severity of accident (for regression) or accident occurrence (for classification)
5. Data Preprocessing
• Missing Values: Handled using mean or median imputation, or by removing the affected records.
• Duplicates/Outliers: Duplicate records dropped; outliers removed using statistical methods (IQR, Z-score).
• Data Type Consistency: Ensured standard datetime formats and consistent speed units (km/h).
• Categorical Encoding: Label encoding and one-hot encoding used.
• Normalization/Standardization: Applied Min-Max scaling and Z-score standardization.
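The preprocessing steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not the project's actual code: the column names (speed_kmh, weather, severity) and the sample values are assumptions made for the example, and the real dataset's schema may differ. It uses median imputation, the IQR rule, one-hot encoding, and Min-Max scaling; the other options listed above (mean imputation, Z-score) would slot in the same way.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Impute, de-duplicate, drop IQR outliers, encode, and scale.

    Column names are hypothetical placeholders for this sketch.
    """
    out = df.copy()

    # 1. Missing values: fill numeric gaps with the column median.
    out["speed_kmh"] = out["speed_kmh"].fillna(out["speed_kmh"].median())

    # 2. Duplicates: keep the first occurrence of each identical row.
    out = out.drop_duplicates()

    # 3. Outliers: keep rows within 1.5 * IQR of the quartiles.
    q1 = out["speed_kmh"].quantile(0.25)
    q3 = out["speed_kmh"].quantile(0.75)
    iqr = q3 - q1
    in_range = out["speed_kmh"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    out = out[in_range]

    # 4. Categorical encoding: one-hot encode the weather column.
    out = pd.get_dummies(out, columns=["weather"])

    # 5. Min-Max scaling: map speed onto the [0, 1] interval.
    lo, hi = out["speed_kmh"].min(), out["speed_kmh"].max()
    out["speed_kmh"] = (out["speed_kmh"] - lo) / (hi - lo)

    return out.reset_index(drop=True)


# Made-up sample data, for illustration only.
raw = pd.DataFrame({
    "speed_kmh": [40.0, 55.0, np.nan, 60.0, 300.0, 55.0, 50.0, 45.0],
    "weather": ["rain", "clear", "clear", "fog", "rain", "clear", "rain", "clear"],
    "severity": [1, 2, 2, 3, 3, 2, 1, 2],
})
clean = preprocess(raw)
```

In practice the imputation and scaling statistics would be computed on the training split only and then reused on the test split, to avoid leaking information between the two.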
6. Exploratory Data Analysis (EDA)
• Univariate Analysis:
o Histograms, boxplots to study feature distributions.
• Bivariate/Multivariate Analysis:
o Heatmaps for correlation.
o Geospatial maps to locate accident hotspots.
o Time-series analysis for accidents across seasons/months.
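As a rough sketch of the analyses listed above, the numbers behind a monthly trend plot and a correlation heatmap can be computed with pandas alone. The column names (date, speed_kmh, severity) and the records are invented for illustration and are not taken from the project dataset.

```python
import pandas as pd

# Made-up records; the column names are assumptions for this sketch.
records = pd.DataFrame({
    "date": pd.to_datetime([
        "2024-01-05", "2024-01-20", "2024-02-11",
        "2024-07-02", "2024-07-15", "2024-07-30",
    ]),
    "speed_kmh": [40, 60, 50, 80, 90, 70],
    "severity": [1, 2, 1, 3, 3, 2],
})

# Time-series view: number of accidents per calendar month.
monthly_counts = records.groupby(records["date"].dt.month).size()

# Bivariate view: Pearson correlation between numeric features.
corr = records[["speed_kmh", "severity"]].corr()

print(monthly_counts)
print(corr)
```

The resulting corr table is the kind of input a heatmap function such as seaborn.heatmap would take, and monthly_counts can be plotted directly as a bar or line chart.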
Insights:
• Most accidents occur during rainy evenings at intersections.
7. Feature Engineering
8. Model Building
9. Visualization of Results & Model Insights
• Confusion Matrix for classification models.
• ROC Curve to evaluate model discrimination.
• Feature Importance Plots (using SHAP, LIME).
• Residual plots for regression models.
• Accident Risk Maps using Folium/Plotly.
• Dashboard (optional) using Power BI, Tableau, or Streamlit.
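For the classification metrics listed above, a minimal sketch using scikit-learn might look like the following. The labels and scores are invented for illustration and are not results from this project; the 0.5 decision threshold is likewise an assumption.

```python
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve

# Made-up ground truth (1 = severe accident) and model scores.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

# Turn scores into hard predictions with an assumed 0.5 threshold.
y_pred = [int(s >= 0.5) for s in y_score]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

# Points of the ROC curve and the area under it.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

print(cm)
print(f"ROC AUC: {auc:.3f}")
```

The fpr and tpr arrays can be passed to matplotlib to draw the ROC curve, and the confusion matrix can be rendered as an annotated heatmap.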
10. Tools and Technologies Used
• Programming Language: Python
• IDE/Notebook: Google Colab / Jupyter Notebook
• Libraries:
o Data Manipulation: pandas, numpy
o Visualization: matplotlib, seaborn, plotly, folium
o Machine Learning: scikit-learn, xgboost, lightgbm
o Model Interpretation: shap, lime
o Deployment (Optional): Streamlit, Flask
11. Team Members and Contributions
Name                   Role                         Contributions
Priyadharshini.S       Project Manager              Oversaw project timeline and deliverables; coordinated team communication and milestones
Kavinaya selshiya.D    Data Scientist               Collected and cleaned traffic accident datasets; performed exploratory data analysis (EDA)
Suji.N                 Machine Learning Engineer    Developed and trained AI/ML models; tuned models for accident prediction accuracy