Generative AI for the Simulation of Autonomously Driven
Vehicles
A PROJECT REPORT
Submitted by
Ankur Gangwar (22BCS50119)
Vipul Mishra (22BCS17052)
in partial fulfillment for the award of the degree of
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE & ENGINEERING
Chandigarh University
April 2025
BONAFIDE CERTIFICATE
Certified that this project report “GENERATIVE AI FOR THE SIMULATION
OF AUTONOMOUSLY DRIVEN VEHICLES” is the Bonafide work of
“ ANKUR GANGWAR, VIPUL MISHRA” who carried out the project work
under my/our supervision.
SIGNATURE                                SIGNATURE
HEAD OF THE DEPARTMENT                   SUPERVISOR
AD Sandeep Kang                          Parveen Badoni (E13737)
BE-CSE                                   BE-CSE
ACKNOWLEDGEMENT
We would like to express our deepest gratitude to all the individuals who have contributed to the
successful completion of this project. First and foremost, we thank our project supervisor,
Parveen Badoni (E13737), for providing guidance, support, and constructive feedback throughout
this project, “Generative AI for the Simulation of Autonomously Driven Vehicles”. Your expertise
and knowledge have been invaluable in shaping this work. We also extend our appreciation to all
the team members who contributed their time, effort, and ideas to make this project a success;
your dedication and commitment have been instrumental in achieving the desired results.
Furthermore, we thank our colleagues and friends who provided the necessary resources and
support whenever required. Last but not least, we acknowledge the contribution of our families
and loved ones, who have been our pillars of strength throughout this project. We also thank all
the faculty members and other supporting staff for the help they provided towards the completion
of our project.

Ankur Gangwar
Vipul Mishra
(BE-CSE, 6th semester)
TABLE OF CONTENTS
CHAPTER 1. INTRODUCTION
1.1. Client/Need Identification/Identification of relevant Contemporary issue
1.2. Identification of Problem
1.3. Timeline
1.4. Organization of the Report
CHAPTER 2. LITERATURE REVIEW/BACKGROUND STUDY
2.1. Timeline of the reported problem
2.2. Proposed solutions
2.3. Bibliometric analysis
2.4. Review Summary
2.5. Problem Definition
2.6. Goals/Objectives
CHAPTER 3. DESIGN FLOW/PROCESS
3.1. Evaluation & Selection of Specifications/Features
3.2. Design Constraints
3.3. Analysis and Feature finalization subject to constraints
3.4. Design Flow
3.5. Design selection
3.6. Implementation plan/methodology
CHAPTER 4. RESULTS ANALYSIS AND VALIDATION
CHAPTER 5. CONCLUSION AND FUTURE WORK
5.1. Conclusion
5.2. Future work
REFERENCES
List of Figures
Figure 1: MR metaverse-DT-assisted driving simulation
Figure 2: Flow diagram of pre-processing
Figure 3
Figure 4
ABSTRACT
As the era of 6G networks approaches, the importance of Intelligent Transportation Systems is
increasingly prominent, representing the frontier of technological advancement and serving as a
core component of smart city development. The integration of autonomous driving technologies
with advanced techniques such as deep learning is rapidly progressing, encompassing key areas
such as environmental perception, localization and map construction, path planning and decision-
making, as well as motion control. However, challenges remain in ensuring both the efficiency and
safety of the model learning process. To address these issues, this paper proposes the Generative
AI-Enhanced Autonomous Driving (GAIHAD) framework, which leverages generative AI
techniques to enhance decision-making and execution capabilities in multi-interactive
environments. The GAIHAD framework consists of two primary modules: the Interactive
Enhanced Intelligent Decision-Making and Execution Module, and the Noise-Enhanced Risk
Assessment Module. The former employs a Proximal Policy Optimization based interactive
reinforcement learning approach to mimic human driving behaviors and incorporates a bidirectional
Model Predictive Control system to handle multi-constraint motion planning. The latter introduces
a noise-augmented dual-stream Generative Adversarial Network to capture the randomness
inherent in driving patterns and predict potential collision trajectories. Our experiments, conducted
across various scenarios, demonstrate the superior performance of the GAIHAD framework in
trajectory prediction accuracy. For example, under nighttime conditions, the average displacement
error (ADE) values of GAIHAD are 0.03, 0.11, and 0.20 meters for prediction horizons of 3, 5, and
10 seconds, respectively, outperforming CSM and IMMTP by a large margin.
CHAPTER 1.
INTRODUCTION
1.1. Client Identification/Need Identification/Identification of relevant
Contemporary issue
The development and deployment of autonomous vehicles (AVs) is one of the most disruptive
trends in modern transportation, with potential benefits in safety, efficiency, and accessibility.
However, one of the most pressing challenges facing this industry is the lack of sufficient
real-world data for training and validating self-driving algorithms, especially in rare or
hazardous scenarios.
According to a 2023 report by McKinsey & Company, achieving Level 4/5 autonomy may
require vehicles to be tested for over 8 billion kilometres, a feat nearly impossible using physical
road testing alone. Additionally, a National Highway Traffic Safety Administration (NHTSA)
analysis shows that human drivers in the U.S. encounter "edge-case" scenarios — such as sudden
pedestrian crossings or erratic vehicle behaviour — less than 0.01% of the time, yet these are the
very instances most critical to autonomous safety.
Generative AI presents a transformative opportunity in this context by enabling the synthetic
generation of highly realistic and diverse driving scenarios, thereby accelerating the testing and
validation pipeline without the risks and costs associated with physical road testing.
1.2. Identification of Problem
The development and deployment of autonomously driven vehicles is one of the most complex
engineering challenges of the modern era. A critical requirement for this process is the ability to
expose these systems to an extensive variety of driving scenarios to ensure their safety, reliability,
and decision-making accuracy under all possible conditions.
However, a significant and persistent challenge exists in the current ecosystem: the inability to
access or generate a sufficiently diverse, scalable, and representative set of driving scenarios
for training, testing, and validation. Real-world data collection is inherently limited by time,
geography, cost, and ethical constraints, especially when it comes to dangerous or rare
edge-case situations such as sudden pedestrian movement, multi-vehicle collisions, unusual
weather events, or unexpected infrastructure failures.
Another dimension of the problem lies in the increasing expectations from regulatory bodies to
demonstrate not only the technical capability of autonomous vehicles but also their safe operation
in scenarios that are statistically rare but potentially catastrophic. Meeting such regulatory
expectations requires rigorous and repeatable testing environments that current methods struggle
to provide at scale.
In addition, as the autonomous vehicle industry evolves, there is an increasing demand for
efficient scenario expansion, continuous performance evaluation, and faster deployment — all of
which are constrained by the current limitations in data availability, scenario diversity, and
simulation realism.
1.3. Timeline
Problem Study & Literature Review
Understand current simulation limitations in AVs.
Review generative AI techniques and related research.
System Design & Development
Define requirements and choose suitable generative models.
Prepare data, train models, and generate driving scenarios.
Testing & Evaluation
Evaluate the quality of generated scenarios.
Test AV systems using synthetic data and analyse performance.
Conclusion & Future Work
Summarize findings, discuss challenges, and suggest improvements.
1.4. Organization of the Report
The report will be structured as follows:
Chapter 1: Introduction – Provides background information, identifies the problem,
justifies its relevance, and outlines the objectives and timeline.
Chapter 2: Literature Review/Background Study – Reviews existing research on autonomous
vehicle simulation and generative AI, and identifies gaps.
Chapter 3: Design Flow/Process – Details feature selection, design constraints,
alternative designs, and the implementation plan.
Chapter 4: Results Analysis and Validation – Presents the tools and metrics used to
evaluate the generated scenarios.
Chapter 5: Conclusion and Future Work – Summarizes key findings and suggests future
research directions.
CHAPTER 2.
LITERATURE REVIEW/BACKGROUND STUDY
2.1. Timeline of the reported problem
The problem of inadequate simulation diversity for autonomous vehicles (AVs) emerged
prominently after 2016, when RAND Corporation reported that billions of real-world miles
would be needed to validate AV safety. In 2018, Uber’s AV fatal accident highlighted the
inability to train and test for rare scenarios. Since 2020, research has intensified around synthetic
data and AI-based simulation, as noted in reports by NHTSA, IEEE, and NVIDIA.
2.2. Proposed solutions
Earlier solutions include:
Manual Scenario Programming (e.g., SUMO, OpenDRIVE)
Data-Driven Simulators (e.g., Waymo, Apollo)
Simulation Environments (e.g., CARLA, LGSVL)
Limitations: these approaches lacked scalability, realism, and scene diversity, and few
models could simulate rare, unpredictable events.
2.3. Bibliometric analysis
Key Features: Most studies emphasize the need for robust data preprocessing, feature
selection, and the incorporation of external factors such as weather and traffic conditions.
Effectiveness: Machine learning and deep learning models have demonstrated improved
predictive accuracy compared to traditional statistical methods.
Drawbacks: Challenges such as data scarcity, overfitting, limited interpretability, and
ethical concerns regarding data privacy remain significant barriers to widespread adoption.
2.4. Review Summary
Literature shows strong interest in using generative models to overcome real-world data
limitations in AV simulation. However, most studies lack integration with actual testing
pipelines or focus only on visuals without scenario logic. This project aims to fill that gap by
using generative AI to create full driving scenarios, not just visuals, for safe AV training and
evaluation.
2.5. Problem Definition
The problem is the lack of scalable, diverse, and realistic simulation scenarios to train and test
autonomous vehicle systems.
What to do: Use generative AI to create synthetic driving environments and edge-case
scenarios.
How to do it: Train models (e.g., GANs) using existing datasets, integrate generated
outputs into a simulation engine.
What not to do: Avoid relying solely on manually coded or pre-recorded scenes, and do
not build an AV system itself.
2.6. Goals/Objectives
To design a generative AI framework that creates realistic, diverse AV simulation
scenarios.
To generate synthetic datasets for different driving conditions (e.g., night, fog, urban
chaos).
To evaluate model performance using realism metrics (e.g., FID) and scenario utility.
To integrate and test the generated scenarios in a simulation tool (e.g., CARLA).
To compare AV model responses in traditional vs. generative environments.
These objectives are specific, measurable, and goal-oriented, providing a concrete path
for project completion.
CHAPTER 3.
DESIGN FLOW/PROCESS
3.1. Evaluation & Selection of Specifications/Features
Based on the critical evaluation, the ideally required features for the Generative AI-powered ADV
simulation solution are:
High-fidelity and controllable generation of diverse 3D environments including varying
road networks, infrastructure, dynamic weather, and lighting conditions.
Realistic and diverse behaviour modelling of pedestrian and vehicle agents.
Generation of high-fidelity synthetic sensor data (camera, LiDAR, radar) that closely
mimics real-world sensor outputs.
Real-time simulation capabilities for interactive testing.
Seamless integration with existing ADV software stacks (e.g., ROS, CARLA).
Robust data logging and analysis tools.
Scalable and efficient generation process.
Capability for data augmentation to enhance real-world training data.
3.2. Design Constraints
The design process must comply with the following constraints:
Regulatory: Adherence to safety and simulation standards (e.g., NHTSA, ISO 26262).
Economic: Limited computing budget; avoid excessive GPU use.
Environmental: Efficient use of resources; avoid wasteful data processing.
Health & Safety: Scenarios must not promote unsafe driving behaviour.
Ethical: Avoid generating biased or unrealistic scenes.
Social & Political: Ensure inclusivity (diverse settings, vehicle types).
Manufacturability/Scalability: Design should be modular for future extension.
Cost: Use open-source tools (e.g., CARLA, PyTorch, GAN libraries) to minimize
expenditure.
3.3. Analysis and Feature finalization subject to constraints
After evaluating the constraints, the feature list is refined:
Feature                         Action Taken   Reason
Real-time rendering             Retained       Critical for dynamic testing
Edge-case scenario generation   Retained       High-impact feature for safety testing
Ultra-high-resolution outputs   Removed        Too computationally expensive
Domain adaptation techniques    Added          To reduce the realism gap between synthetic and real data
Dataset auto-labelling          Added          Improves the training pipeline for AV models
Table 1: Feature list of techniques
3.4. Design Flow
Two alternative designs for scenario generation are proposed:
Design 1: GAN-based Scene Generator
Use Conditional GANs to generate synthetic scenes
Input: Scene labels (weather, traffic type)
Output: Images/scenarios fed to simulation tool (CARLA)
Design 2: Text-to-Scene Generation using Diffusion Models
Use diffusion models (like Stable Diffusion) trained on traffic datasets
Input: Text prompt describing the driving scene
Output: Synthesized visual scenario rendered into simulation
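As a minimal, runnable illustration of the conditioning mechanism behind Design 1, the sketch below shows a toy, untrained generator that maps a noise vector plus a one-hot scene label to a small feature vector. All names and dimensions here are hypothetical assumptions for illustration; a real conditional GAN would be a deep network (e.g., in PyTorch) trained adversarially against a discriminator on traffic data.

```python
import random

def one_hot(label, n_labels):
    """Encode a scene label (e.g., 0 = clear, 1 = rain, 2 = night) as one-hot."""
    v = [0.0] * n_labels
    v[label] = 1.0
    return v

class ToyConditionalGenerator:
    """Untrained single-layer sketch of a cGAN generator:
    output = W @ concat(noise, one_hot(label)).
    It only demonstrates how the scene label conditions the output."""

    def __init__(self, z_dim=8, n_labels=3, out_dim=4, seed=0):
        rng = random.Random(seed)
        in_dim = z_dim + n_labels
        # Random, fixed weights stand in for a trained network.
        self.weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                        for _ in range(out_dim)]
        self.z_dim = z_dim
        self.n_labels = n_labels

    def generate(self, label, rng):
        z = [rng.gauss(0.0, 1.0) for _ in range(self.z_dim)]
        x = z + one_hot(label, self.n_labels)
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]

gen = ToyConditionalGenerator()
# Same noise seed, different scene labels -> different generated features.
scene_rain = gen.generate(1, random.Random(42))
scene_night = gen.generate(2, random.Random(42))
```

The key design point this illustrates is that the label is simply concatenated to the noise input, so one model can be steered toward different scene types (weather, traffic) at generation time.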
3.5. Design selection
A comparative analysis is conducted based on accuracy, interpretability, scalability, and
computational efficiency:
Criterion: Environment Realism
Design Flow A: Focuses on procedural generation with rule-based systems for environment
creation, potentially leading to less photorealistic but highly controllable environments.
Design Flow B: Leverages large-scale real-world datasets and generative adversarial networks
(GANs) for potentially higher photorealism but potentially less explicit control.
Comparison: Design Flow B has the potential for higher visual realism, which is crucial for
accurate sensor simulation. However, Design Flow A offers greater control over specific
environment parameters.

Criterion: Sensor Data Fidelity
Design Flow A: Employs physics-based rendering and sensor models, potentially offering
accurate but computationally intensive sensor data generation.
Design Flow B: Uses generative models trained on real sensor data, potentially capturing
complex sensor noise and artifacts more effectively but requiring extensive and diverse
real-world data.
Comparison: Design Flow B might better capture the nuances of real-world sensor data, which
is vital for training robust perception algorithms. Design Flow A offers a more deterministic
approach based on known physical principles.

Criterion: Integration with Existing Tools
Design Flow A: Might require more effort to integrate with existing ADV software stacks if
custom formats are used for environment and sensor data.
Design Flow B: If leveraging popular simulation platforms and standard data formats,
integration might be smoother.
Comparison: Design Flow B, if built upon existing simulation platforms, likely offers better
integration possibilities. Design Flow A might require the development of custom interfaces.

Table 2: Comparison of alternative solutions
3.6. Implementation plan/methodology
Figure 1: MR metaverse-DT-assisted driving simulation
Figure 2: Flow diagram of pre-processing
Phase 1: Data Acquisition and Preprocessing: Gathering and preparing real-world
driving and sensor data.
Phase 2: Generative Model Development and Training: Designing, implementing, and
training environment and sensor data generation models.
Phase 3: Simulation Environment Integration: Connecting generative models with a
simulation platform (e.g., CARLA).
Phase 4: Validation and Evaluation: Assessing the realism and fidelity of generated
data and framework performance.
Phase 5: Refinement and Iteration: Continuously improving the framework based on
evaluation.
Tools: Python, TensorFlow/PyTorch, Keras/Transformers, CARLA/LGSVL/AirSim,
NumPy/Pandas/OpenCV, Git, Cloud platforms.
High-Level Flow: Real-world Data -> Data Prep -> Gen. Models -> Simulation Env. -> ADV
Stack -> Simulation Output & ADV Data -> Validation -> Refinement.
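The high-level flow above can be sketched as a chain of stage functions. The function names and the dictionary-based hand-off below are illustrative assumptions, not the project's actual interfaces; each stub stands in for a full component (real preprocessing, a trained generative model, a CARLA client, and so on).

```python
def prepare_data(raw_records):
    """Phase 1 stub: filter and normalize raw driving records."""
    return [r.strip().lower() for r in raw_records if r.strip()]

def generate_scenarios(clean_records, n_variants=2):
    """Phase 2 stub: stand-in for a trained generative model that
    expands each real record into several synthetic variants."""
    return [f"{rec}/variant{i}" for rec in clean_records
            for i in range(n_variants)]

def run_simulation(scenarios):
    """Phase 3 stub: stand-in for replaying scenarios in a simulator
    such as CARLA and collecting per-scenario logs."""
    return {s: {"collisions": 0, "completed": True} for s in scenarios}

def validate(logs):
    """Phase 4 stub: summarize simulation logs into simple metrics."""
    total = len(logs)
    completed = sum(1 for v in logs.values() if v["completed"])
    return {"scenarios": total, "completion_rate": completed / total}

# End-to-end flow: Real-world Data -> Data Prep -> Gen. Models ->
# Simulation Env. -> Validation (Phase 5 would iterate on this loop).
report = validate(run_simulation(generate_scenarios(
    prepare_data([" Rainy-Night ", "Urban-Rush "]))))
```

Keeping each phase behind a plain function boundary like this is one way to satisfy the modularity constraint from Section 3.2, since any stage can be swapped (e.g., a different generative model) without touching the others.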
CHAPTER 4.
RESULTS ANALYSIS AND VALIDATION
1. Analysis
Python (NumPy, Pandas): Used for data preprocessing and statistical analysis of scenario
datasets.
Matplotlib & Seaborn: Employed to visualize training loss, scene diversity, and
evaluation metrics.
Jupyter Notebooks: For iterative development, visualization, and documentation.
2. Design Drawings / Schematics / Models
PyTorch / TensorFlow: Used to design, train, and validate the generative model (GAN
architecture).
Block Diagrams: Created using draw.io and Lucidchart to illustrate the system pipeline
and data flow.
3. Report Preparation
Microsoft Word / LaTeX: For drafting, formatting, and compiling the research report.
Canva / MS PowerPoint: Used for creating project presentation slides and infographics.
4. Project Management & Communication
GitHub: Version control system for collaborative development and repository
management.
Google Meet / Zoom: For team discussions, mentor reviews, and project communication.
5. Testing, Characterization, Interpretation & Data Validation
Fréchet Inception Distance (FID): Applied to evaluate the realism of generated scenes.
Structural Similarity Index (SSIM): Used for comparing synthetic and real images.
CARLA Logs & Metrics: Recorded vehicle responses to validate that the generated
scenarios triggered meaningful AV actions.
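To make the two image metrics concrete, the sketch below implements deliberately simplified, pure-Python versions: a diagonal-Gaussian approximation of the Fréchet distance (real FID uses full covariance matrices over Inception-v3 features) and a single-window global SSIM (production SSIM averages the statistic over local sliding windows, e.g., via scikit-image). These are pedagogical stand-ins, not the exact metrics reported by standard tooling.

```python
from statistics import fmean, pstdev

def frechet_distance_diag(feats_a, feats_b):
    """Frechet distance between two feature sets under a diagonal-Gaussian
    simplification: sum over dimensions of
    (mu_a - mu_b)^2 + (sigma_a - sigma_b)^2.
    Real FID uses full covariances of deep network features."""
    dims = len(feats_a[0])
    total = 0.0
    for d in range(dims):
        col_a = [f[d] for f in feats_a]
        col_b = [f[d] for f in feats_b]
        total += (fmean(col_a) - fmean(col_b)) ** 2
        total += (pstdev(col_a) - pstdev(col_b)) ** 2
    return total

def global_ssim(x, y, c1=1e-4, c2=9e-4):
    """Single-window SSIM over two flat pixel lists with values in [0, 1].
    c1, c2 are the standard (0.01*L)^2 and (0.03*L)^2 constants for L = 1."""
    mx, my = fmean(x), fmean(y)
    vx = fmean([(p - mx) ** 2 for p in x])
    vy = fmean([(q - my) ** 2 for q in y])
    cov = fmean([(p - mx) * (q - my) for p, q in zip(x, y)])
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))
```

Both metrics behave as expected at the extremes: identical feature sets give a Fréchet distance of 0, and an image compared with itself gives an SSIM of 1, with lower-quality generations drifting away from those values.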
CHAPTER 5.
CONCLUSION AND FUTURE WORK
5.1 CONCLUSION
This project set out to develop a robust and versatile simulation environment for Autonomously
Driven Vehicles (ADVs) by leveraging the power of Generative AI. The core objective was to create
synthetic, yet highly realistic, driving scenarios and corresponding sensor data that can effectively
augment or even partially replace extensive real-world data collection and testing.
Expected Results/Outcomes:
Realistic and Diverse Simulation Environments: The generative models are expected to
produce a wide range of 3D driving environments, encompassing various road layouts,
urban and rural settings, diverse weather and lighting conditions, and realistic static and
dynamic elements (e.g., buildings, vegetation, traffic signs).
High-Fidelity Synthetic Sensor Data: The framework is expected to generate
synchronized multi-modal sensor data (camera images, LiDAR point clouds, radar data)
that closely mimics the characteristics of real-world sensor outputs.
Controllable Scenario Generation: A key expected outcome is the ability to control and
manipulate specific parameters of the generated scenarios (e.g., traffic density, pedestrian
behaviour, specific weather events).
Integration with Existing ADV Software Stacks: The simulation framework is expected
to seamlessly integrate with popular ADV development platforms (e.g., ROS, CARLA).
Improved Efficiency and Scalability of Testing: By generating vast amounts of diverse
synthetic data, the framework is expected to significantly enhance the efficiency and
scalability of ADV testing and validation.
Data Augmentation for Enhanced Algorithm Robustness: The generated synthetic data
is expected to be valuable for augmenting real-world training datasets, leading to more
robust and generalizable ADV perception and control algorithms.
5.2 FUTURE WORK
Building upon the foundation established in this project, several avenues for future work can be
explored to further enhance the Generative AI-powered ADV simulation framework:
Required Modifications in the Solution:
Improved Generative Model Architectures: Investigate and implement more advanced
generative model architectures (e.g., transformer-based models, normalizing flows) that
can potentially achieve higher fidelity, better controllability, and more efficient generation
of both environment and sensor data.
Enhanced Controllability Mechanisms: Develop more intuitive and precise methods for
controlling the parameters of the generated scenarios. This could involve exploring
techniques like disentangled latent space representations or incorporating explicit control
signals during the generation process.
Physics-Informed Generative Models: Integrate physical constraints and principles into
the generative models to ensure greater realism and consistency in the generated
environments and sensor data. This could involve hybrid approaches that combine data-
driven learning with physics-based rendering techniques.
More Realistic Agent Behaviour Modelling: Enhance the models for simulating the
behaviour of other traffic participants (pedestrians, vehicles, cyclists) by incorporating
more complex decision-making processes, social interactions, and realistic error models.
Advanced Sensor Modelling: Develop more sophisticated sensor models that accurately
capture the nuances of real-world sensor characteristics, including sensor noise, calibration
errors, and the impact of various environmental factors.
REFERENCES
[1] M. Behrisch, L. Bieker, J. Erdmann, and D. Krajzewicz. SUMO – Simulation of Urban Mobility:
An overview. In SIMUL 2011, The Third International Conference on Advances in System
Simulation, 2011.
[2] Y. Chen, D. J. Kempton, and R. A. Angryk. Examining effects of class imbalance on
conditional GAN training. In L. Rutkowski, R. Scherer, M. Korytkowski, W. Pedrycz, R.
Tadeusiewicz, and J. M. Zurada, editors, Artificial Intelligence and Soft Computing, pages
475–486, Cham, 2023. Springer Nature Switzerland.
[3] R. Dechter. Learning while searching in constraint-satisfaction problems. University of
California, Computer Science Department, Cognitive Systems Laboratory, 1986.
[4] D. A. Edwards. On the Kantorovich–Rubinstein theorem. Expositiones Mathematicae,
29(4):387–398, 2011.
[5] C. Marco, C. Casetti, and G. Gagliardi. Vehicular traffic simulation in the city of Turin from
raw data. IEEE Transactions on Mobile Computing, 21(12), 2021.
[6] W. Niebel and C. Dalaff. DLR – Institute of Transportation Systems, traffic management
research. January 2008.
[7] M. Mackay. Injury and collision severity. In 12th Stapp Car Crash Conference, pages 207–219,
February 1968.
[8] S. Krauß, P. Wagner, and C. Gawron. Metastable states in a microscopic model of traffic flow.
Physical Review E, 55(5):5597, 1997.
[9] A. U. Kemloh Wagoum, M. Chraibi, J. Zhang, and G. Lämmel. JuPedSim: an open framework
for simulating and analyzing the dynamics of pedestrians. December 2015.
[10] A. Rajkomar, E. Oren, K. Chen, A. M. Dai, N. Hajaj, M. Hardt, P. J. Liu, et al. Scalable and
accurate deep learning with electronic health records. npj Digital Medicine, 1, 2018.