Real-time Multifocus Object Detection
Capstone Project
Submitted in partial fulfillment of the requirements for the degree of
Master of Technology
in
Software Engineering
by
AKILESH VR
20MIS0335
April, 2025
DECLARATION
I hereby declare that the Capstone Project entitled “Real-time Multifocus Object Detection”
submitted by me, for the award of the degree of Master of Technology in Software
Engineering, School of Computer Science Engineering and Information Systems to VIT is a
record of bonafide work carried out by me under the supervision of Prof. Hari Ram Vishwakarma, Senior Professor, SCORE, VIT Vellore.
I further declare that the work reported in this project has not been submitted and will not be submitted, either in part or in full, for the award of any other degree or diploma in this institute or any other institute or university.
Place: Vellore
Date: 03.04.2025
CERTIFICATE
This is to certify that the Capstone Project entitled "Real-Time Multifocus Object Detection" submitted by AKILESH V R - 20MIS0335, SCORE, VIT, for the award of the degree of Master of Technology in Software Engineering, is a record of bonafide work carried out by him under my supervision during the period 13.12.2024 to 17.04.2025, as per the VIT code of academic and research ethics.
The contents of this report have not been submitted and will not be submitted either in part or in full, for the award of any other degree or diploma in this institute or any other institute or university. The project fulfills the requirements and regulations of the University and in my opinion meets the necessary standards for submission.
Place: Vellore
Date:
Signature of the Guide
ACKNOWLEDGEMENT
"I would like to express my heartfelt gratitude to Honorable Chancellor Dr. G Viswanathan;
respected Vice Presidents Mr. Sankar Viswanathan, Dr. Sekar Viswanathan, Vice
Chancellor Dr. V. S. Kanchana Bhaaskaran; Pro-Vice Chancellor Dr. Partha Sharathi
Mallick; and Registrar Dr. Jayabarathi T.
It is indeed a pleasure to thank my parents and friends who persuaded and encouraged me to
take up and complete my capstone project successfully. Last, but not least, I express my
gratitude and appreciation to all those who have helped me directly or indirectly towards the
successful completion of the project.
Place: Vellore
Date: 03.04.2025
Akilesh V R
Executive Summary
The dissertation titled "Real-Time Multifocus Object Detection" focuses on enhancing the
efficiency and accuracy of object detection systems using real-time video streams. The
research explores the integration of YOLO (You Only Look Once), a state-of-the-art object
detection algorithm, and Roboflow for supervised training to detect and classify multiple
objects from varying focal points in real-time. The objective is to create a robust system
capable of identifying and tracking objects under diverse environmental conditions, including
dynamic and cluttered scenes.
Leveraging literature on object detection techniques using OpenCV (Dewangan et al., 2020)
and YOLO (Mittal et al., 2019; Gupta et al., 2021), the dissertation examines the
effectiveness of these technologies in different applications, such as vehicle detection (Rana
et al., 2022) and facial recognition (Khan et al., 2020). The study also discusses challenges
such as handling occlusion, real-time processing, and limited computational resources, which
are addressed by optimizing the YOLO algorithm with Roboflow’s advanced training tools.
This work contributes to the development of a real-time object detection system that provides
accurate, scalable, and efficient performance for diverse applications in surveillance, traffic
monitoring, and automated systems.
CONTENTS

Acknowledgement
Executive Summary
List of Figures
List of Tables
List of Abbreviations
1 INTRODUCTION
1.1 Objectives
1.2 Motivation
1.3 Background
2 PROJECT DESCRIPTION AND GOALS
3 TECHNICAL SPECIFICATION
4 PROPOSED METHODOLOGY AND SYSTEM ARCHITECTURE
5 GANTT CHART
6 DISSERTATION DEMONSTRATION
7 RESULTS AND DISCUSSION
8 SUMMARY AND FUTURE SCOPE
9 REFERENCES
List of Figures

The figures illustrate the system architecture: the hardware requirements (computing devices, cameras, networking equipment); the software components (Roboflow, the YOLO framework, the operating system); the network architecture, in which a server receives frames from the edge device for additional processing when needed; and the decision layer, where detected objects and their locations are sent back to the edge device or server, triggering actions such as alerts, alarms, or logging.
List of Tables

The tables summarize multipath routing properties and applications relevant to the system. Reduced latency: multiple available paths can lower end-to-end delay, which is essential when transmitting video frames or detection results in time-sensitive applications; dynamic path establishment also provides redundancy when transmitting video data from cameras to processing nodes. Applications in object detection systems include surveillance (cameras dynamically connect and transmit video streams to a central server for processing), disaster response (mobile devices detect objects such as survivors or obstacles and communicate findings over established multipath routes), and traffic monitoring (drones equipped with cameras use multipath routing to send real-time video feeds for analyzing traffic patterns or detecting accidents).
List of Abbreviations
CHAPTER 1
INTRODUCTION
ABSTRACT
Object detection is a crucial component of modern computer vision systems, with wide-
ranging applications across industries such as autonomous driving, security surveillance, and
urban traffic management. This project presents an approach to object detection utilizing the
YOLO (You Only Look Once) model, a state-of-the-art real-time object detection algorithm,
combined with Roboflow, a powerful platform for image data management, annotation, and
augmentation. The model is implemented
within a Jupyter Notebook environment, providing an interactive setup for development,
experimentation, and evaluation.
YOLO is selected for its high accuracy and speed, achieving real-time detection by using
a single neural network to predict bounding boxes and class probabilities in one forward
pass. YOLO’s efficiency makes it well-suited for applications that require rapid
identification of multiple objects, such as vehicles, bikes, pedestrians, and more in urban
or traffic scenes. The project utilizes Roboflow to create and preprocess custom datasets
tailored to these specific detection classes. Roboflow's tools streamline the process of
image annotation, data augmentation, and dataset management, which is crucial for
enhancing model performance and generalizability, especially when working with
limited or imbalanced datasets.
Data Preparation: Using Roboflow, we curate and annotate datasets, apply augmentations such as
rotations, flips, and brightness adjustments to improve the model's robustness, and export the dataset
in a YOLO-compatible format.
Model Training: We configure and train a YOLO model (YOLOv8) on the annotated dataset within
Jupyter Notebook, leveraging GPU support for faster training times.
Evaluation: We evaluate the model's performance using standard metrics such as Mean Average
Precision (mAP), precision, recall, and F1-score, testing the model on real-world video or image
samples.
Deployment: To illustrate potential deployment, we demonstrate how to load the trained model for real-time detection in video streams or surveillance feeds, enabling object tracking and situational awareness.
The results indicate that the YOLO model, when fine-tuned on customized, well-annotated
datasets created in Roboflow, achieves high accuracy in detecting multiple classes of objects.
The integration of Roboflow’s data pipeline significantly enhances model robustness by
augmenting training data and providing efficient tools for labeling and exporting.
Additionally, running the setup in a Jupyter Notebook provides flexibility for users to modify
the dataset or model parameters, visualize intermediate results, and iteratively improve
performance.
This approach offers a scalable, cost-effective solution for a variety of object detection tasks,
demonstrating potential for further applications in smart city infrastructure, autonomous
systems, and human-activity monitoring.
1.1 Objectives
The objective of this project is to create a robust system capable of detecting, classifying, and tracking multiple objects from varying focal points in real time, under diverse environmental conditions, including dynamic and cluttered scenes.
1.2 Motivation
The motivation behind this project stems from the growing need for intelligent surveillance
systems that can operate in real time. With increasing security concerns and the need for
effective monitoring solutions, developing a robust object detection system can enhance
situational awareness and response capabilities in critical environments. This project aims to
leverage the capabilities of YOLO and Roboflow to create a system that not only detects but
also classifies objects with high precision, ultimately contributing to improved safety and
security measures.
1.3 Background
In recent years, the proliferation of digital imagery and video data has led to
significant advancements in computer vision, particularly in the field of object
detection. Object detection involves not only identifying objects within images
or video frames but also localizing them through bounding boxes, which is crucial
for applications in security surveillance, autonomous vehicles, and robotics.
Traditional methods of object detection often struggled with speed and accuracy,
necessitating the development of more advanced techniques.
The YOLO (You Only Look Once) algorithm has emerged as a state-of-the-art
approach to real-time object detection. By treating the detection task as a single
regression problem rather than a series of classifications, YOLO significantly
reduces processing time while maintaining competitive accuracy. Coupled with
Roboflow, a platform that streamlines data annotation, augmentation, and model
training, the implementation of YOLO becomes more accessible and efficient,
paving the way for practical applications in various domains.
The literature reviewed for this project is summarized below. Each entry gives the citation, its focus, the techniques or tools used, and its main limitation; numbering follows the reference list in Chapter 9.

3. Kavitha, D., et al. "Multiple Object Recognition Using OpenCV." REVISTA GEINTEC-GESTAO INOVACAO E TECNOLOGIAS 11.2 (2021): 1736-1747.
Focus: multiple object recognition. Techniques: OpenCV for object detection and recognition. Limitation: limited scalability in dynamic environments.

4. Mittal, Naman, Akarsh Vaidya, and Shreya Kapoor. "Object detection and classification using Yolo." Int. J. Sci. Res. Eng. Trends 5 (2019): 562-565.
Focus: investigated object detection using YOLO. Techniques: YOLO, OpenCV. Limitation: may not work well on small or overlapping objects.

5. Gupta, Akshara, Aditya Verma, and A. Yadav. "YOLO OBJECT DETECTION USING OPENCV." International Journal of Engineering Applied Sciences and Technology 5.10 (2021).
Focus: applied YOLO object detection with OpenCV. Techniques: YOLO, OpenCV. Limitation: high computation requirement for real-time processing.

6. Bathija, Akansha, and Grishma Sharma. "Visual object detection and tracking using yolo and sort." International Journal of Engineering Research Technology 8.11 (2019): 345-355.
Focus: combined YOLO for detection and SORT for tracking. Techniques: YOLO, SORT, OpenCV. Limitation: inefficient tracking for high-speed moving objects.

7. Emami, Shervin, and Valentin Petrut Suciu. "Facial recognition using OpenCV." Journal of Mobile, Embedded and Distributed Systems 4.1 (2012): 38-43.
Focus: developed facial recognition using OpenCV. Techniques: OpenCV Haar cascade for face detection. Limitation: susceptible to lighting changes and angle variations.

8. Xie, Guobo, and Wen Lu. "Image edge detection based on opencv." International Journal of Electronics and Electrical Engineering 1.2 (2013): 104-106.
Focus: image edge detection using OpenCV. Techniques: Canny edge detection, OpenCV. Limitation: sensitive to noise; requires fine-tuning for different images.

9. Kumari, Sweta, Leeza Gupta, and Prena Gupta. "Automatic license plate recognition using OpenCV and neural network." International Journal of Computer Science Trends and Technology (IJCST) 5.3 (2017): 114-118.
Focus: applied neural networks for license plate recognition. Techniques: OpenCV, neural networks. Limitation: limited accuracy in complex environments such as night-time or low-resolution plates.

10. Kumar, Ajay, et al. "Face Detection and Recognition using OpenCV." International Journal of Computer Applications 975: 8887.
Focus: face detection and recognition using OpenCV. Techniques: Haar cascade, OpenCV. Limitation: less effective in real-time applications and under varying light conditions.

11. Mohaideen Abdul Kadhar, K., and G. Anand. "Image Processing Using OpenCV." Industrial Vision Systems with Raspberry Pi: Build and Design Vision Products Using Python and OpenCV. Berkeley, CA: Apress, 2024. 87-140.
Focus: image processing with OpenCV for industrial vision systems. Techniques: OpenCV image processing techniques. Limitation: limitations for high-precision industrial applications.

12. Goel, Tushar, K. C. Tripathi, and M. L. Sharma. "Single line license plate detection using OpenCV and tesseract." International Research Journal of Engineering and Technology (IRJET) 5.07 (2020).
Focus: single-line license plate detection. Techniques: OpenCV, Tesseract OCR. Limitation: limited capability for complex, multi-line plates.

13. Rana, Md Milon, Tajkuruna Akter Tithy, and Md Mehedi Hasan. "Vehicle Detection And Count In The Captured Stream Video Using OpenCV In Machine Learning." Computer Science & Engineering: An International Journal (CSEIJ) 12.3 (2022).
Focus: vehicle detection and counting in video. Techniques: OpenCV, machine learning. Limitation: may struggle with occlusion or overlapping vehicles.

14. Khan, Sikandar, Adeel Akram, and Nighat Usman. "Real time automatic attendance system for face recognition using face API and OpenCV." Wireless Personal Communications 113.1 (2020): 469-480.
Focus: real-time attendance system with face recognition. Techniques: OpenCV, Face API. Limitation: privacy concerns and potential errors in crowded environments.

15. Castrillón, Modesto, et al. "A comparison of face and facial feature detectors based on the Viola–Jones general object detection framework." Machine Vision and Applications 22 (2011): 481-494.
Focus: compared face detectors within the Viola-Jones framework. Techniques: Viola-Jones framework. Limitation: inaccurate with partial faces or low-quality images.

17. Liu, Yandong. "Moving object detection and distance calculation based on Opencv." Second Guangdong-Hong Kong-Macao Greater Bay Area Artificial Intelligence and Big Data Forum (AIBDF 2022). Vol. 12593. SPIE, 2023.
Focus: moving object detection and distance calculation. Techniques: OpenCV, computer vision algorithms. Limitation: performance drops in dynamic and cluttered environments.

18. Bhardwaj, Sarthak, et al. "Object Detection Framework Using OpenCV for Low Cost and High Performance." International Conference on Recent Trends in Computing. Singapore: Springer Nature Singapore, 2023.
Focus: designed a cost-effective object detection framework. Techniques: OpenCV, low-cost sensors. Limitation: may lack the accuracy of high-end systems.

19. Wang, Wen Jun, and Meng Gao. "Vehicle detection and counting in traffic video based on OpenCV." Applied Mechanics and Materials 361 (2013): 2232-2235.
Focus: vehicle detection and counting in traffic video. Techniques: OpenCV, video analysis. Limitation: may not perform well in complex traffic scenes with heavy occlusion.

20. Arya, Zankruti, and Vibha Tiwari. "Automatic face recognition and detection using OpenCV, haar cascade and recognizer for frontal face." Int. J. Eng. Res. Appl. 10.6 (2020): 13-19.
Focus: automatic face recognition and detection. Techniques: OpenCV, Haar cascade, recognizer. Limitation: only effective for frontal faces; low accuracy at other angles.

21. Firgiawan, Gustyanto, Nazwa Lintang Seina, and Perani Rosyani. "Implementasi Metode You Only Look Once (YOLO) untuk Pendeteksi Objek dengan Tools OpenCV." AI dan SPK: Jurnal Artificial Intelligent dan Sistem Penunjang Keputusan 2.2 (2024): 137-141.
Focus: YOLO implementation for object detection. Techniques: YOLO, OpenCV. Limitation: requires high computational resources for real-time processing.

22. Patel, Prinsi, and Barkha Bhavsar. "Object Detection and Identification." International Journal 10.3 (2021).
Focus: object detection and identification. Techniques: OpenCV, machine learning. Limitation: performance issues with ambiguous or small objects.

23. Syahrudin, Erwin, Ema Utami, and Anggit Dwi Hartanto. "Enhanced Yolov8 with OpenCV for Blind-Friendly Object Detection and Distance Estimation." Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) 8.2 (2024): 199-207.
Focus: improved YOLOv8 for blind-friendly object detection. Techniques: YOLOv8, OpenCV, distance estimation. Limitation: still requires improvement for real-time applications.

24. Pulungan, Ali Basrah, et al. "Object detection with a webcam using the Python programming language." Journal of Applied Engineering and Technological Science (JAETS) 2.2 (2021): 103-111.
Focus: object detection with a webcam. Techniques: OpenCV, Python. Limitation: limited detection range and environment dependency.

25. Kumbhar, P. Y., et al. "Real time face detection and tracking using OpenCV." International Journal for Research in Emerging Science and Technology 4.4 (2017): 39-43.
Focus: real-time face detection and tracking. Techniques: OpenCV, tracking algorithms. Limitation: performance degradation in crowded or dynamic scenes.
CHAPTER 2
PROJECT DESCRIPTION AND GOALS
Project Description:
The project focuses on developing a real-time multifocus object detection system that can
accurately detect, classify, and track multiple objects in various environments using the
YOLO (You Only Look Once) algorithm integrated with Roboflow for supervised training.
The system aims to detect objects at varying focal lengths or perspectives, providing a more
dynamic and adaptable solution for real-time object detection across a wide range of
applications.
The detection system will be powered by YOLO, which is one of the most efficient and fast
object detection algorithms, capable of processing images and videos in real-time with high
accuracy. YOLO’s capabilities will be enhanced using Roboflow, a tool for data annotation
and model training, allowing the system to be trained with diverse and labeled datasets.
These datasets will be used to recognize multiple types of objects in complex environments
such as surveillance, traffic monitoring, and automated systems.
The goal of this project is to provide a scalable, efficient, and accurate object detection
framework that can operate in real-time applications, making it suitable for industries such
as autonomous vehicles, security surveillance, and smart cities.
Goals:
Real-time Object Detection: Develop a system that can detect and track multiple objects
in real-time video streams, ensuring low latency and fast processing times.
High Detection Accuracy: Enhance the accuracy of object detection by training the
model using labeled datasets from Roboflow, improving its performance on complex and
cluttered environments.
Scalability and Flexibility: Create a scalable framework that can be extended to detect
and classify more object types and adapt to various use cases, such as face recognition,
vehicle detection, or general surveillance.
Optimized for Real-world Conditions: Ensure that the system works efficiently under
diverse conditions, such as varying lighting, occlusions, and fast-moving objects, making
it suitable for real-world deployment in dynamic environments.
User-Friendly Interface: Design an easy-to-use interface for users to interact with the
object detection system, allowing easy integration into existing applications, from
security systems to traffic monitoring.
Enhanced Performance with YOLO and Roboflow: Utilize YOLO for fast detection
and Roboflow for model training, ensuring that the system can process and analyze large
datasets efficiently, with reduced computational overhead.
CHAPTER 3
3. Technical Specification

1. Hardware Requirements
Processor: Intel i5
RAM: at least 8 GB
GPU (optional): needed only to run the models locally instead of Google Colab

2. Software Requirements
Operating System: Windows 10
Software used: Python, Jupyter Notebook
Languages used: Python
CHAPTER 4
PROPOSED METHODOLOGY AND SYSTEM ARCHITECTURE

4.1 Proposed Methodology
The proposed methodology for implementing object detection using YOLO (You Only Look
Once) in conjunction with Roboflow comprises several systematic steps. The key phases of
the methodology are as follows:
Data Collection and Annotation:
Utilize Roboflow to collect and annotate a diverse dataset of images containing various object classes. This platform simplifies the process of dataset management, including importing, annotating, and exporting images in the required formats for YOLO.
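As an illustration, a minimal sketch of pulling an annotated dataset out of Roboflow in a YOLO-ready format is shown below; the API key, workspace name, project name, and version number are hypothetical placeholders.

    from roboflow import Roboflow

    # Hypothetical credentials and project identifiers; substitute your own.
    rf = Roboflow(api_key="YOUR_API_KEY")
    project = rf.workspace("my-workspace").project("multifocus-detection")
    dataset = project.version(1).download("yolov8")  # export in YOLOv8 format
    print(dataset.location)  # local folder with images, labels, and data.yaml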
Data Preprocessing:
Use Roboflow to preprocess images. This may involve resizing images, normalization, and
applying techniques such as image enhancement or augmentation to improve model
robustness. Roboflow offers easy-to-use tools to automate preprocessing tasks.
Model Selection and Configuration:
Select the appropriate YOLO model variant (e.g., YOLOv5 or YOLOv8) based on the specific requirements of the application, such as accuracy and speed. Configure model parameters, including input size, number of classes, and hyperparameters for training.
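A minimal configuration sketch, assuming the ultralytics package is used: the variant is chosen by its speed/accuracy trade-off, and the class list and input size are supplied later at training time.

    from ultralytics import YOLO

    # yolov8n is the smallest/fastest variant; yolov8s/m/l/x trade speed for accuracy.
    model = YOLO("yolov8n.pt")  # start from pretrained COCO weights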
Model Training:
Use the annotated dataset from Roboflow to train the YOLO model. This involves dividing the dataset into training, validation, and test sets, and executing the training process while monitoring performance metrics such as mean Average Precision (mAP).
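Continuing the sketch, training runs against the data.yaml produced by the Roboflow export, which lists the train/validation/test splits and class names; the epoch count and batch size below are illustrative, not tuned values.

    # data.yaml comes from the Roboflow export; hyperparameters are illustrative.
    results = model.train(
        data="data.yaml",
        epochs=50,
        imgsz=640,   # network input size
        batch=16,
    )
    # ultralytics logs per-epoch losses and validation mAP50 / mAP50-95 as it trains.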
Model Evaluation:
After training, evaluate the model’s performance using the test dataset. Analyze metrics
such as precision, recall, and F1 score to assess accuracy and generalization capabilities.
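A sketch of this evaluation step with ultralytics, assuming the library's default save path for the best weights; the metric attributes shown are those exposed on the validation results object.

    from ultralytics import YOLO

    model = YOLO("runs/detect/train/weights/best.pt")    # default output path; adjust per run
    metrics = model.val(data="data.yaml", split="test")  # evaluate on the held-out test split

    p, r = metrics.box.mp, metrics.box.mr  # mean precision and recall over classes
    print("mAP@0.5:      ", metrics.box.map50)
    print("mAP@0.5:0.95: ", metrics.box.map)
    print("F1-score:     ", 2 * p * r / (p + r))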
Real-time Inference:
Implement real-time object detection using the trained YOLO model on new images or
video streams. The YOLO model can be used directly to detect objects in real-time,
providing immediate results without the need for additional preprocessing tools like
OpenCV.
Visualization:
Utilize YOLO's built-in visualization tools to draw bounding boxes and labels on the detected objects in the output images or video streams, enhancing interpretability and usability.
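A sketch of the real-time loop under these assumptions, with OpenCV used only to capture and display frames while YOLO's built-in plot() draws the boxes and labels; the camera index and quit key are arbitrary choices.

    import cv2
    from ultralytics import YOLO

    model = YOLO("runs/detect/train/weights/best.pt")
    cap = cv2.VideoCapture(0)  # webcam; a video file path or RTSP URL also works

    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = model(frame)         # one forward pass per frame
        annotated = results[0].plot()  # built-in box-and-label rendering
        cv2.imshow("detections", annotated)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to stop
            break

    cap.release()
    cv2.destroyAllWindows()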
4.2 System Architecture
The system architecture for the proposed object detection application consists of the
following components:
Input Layer:
Images or video feeds from various sources (e.g., webcam, file uploads) are captured and
passed to the system.
Preprocessing Module:
This module handles image preprocessing tasks using Roboflow, which includes resizing, normalization, and data augmentation techniques.
Model Training Module:
The YOLO model is trained on the annotated dataset sourced from Roboflow. This module handles loading the dataset, configuring training parameters, and executing the training process.
Inference Module:
This module utilizes the trained YOLO model to perform inference on new images or video streams, detecting objects in real time.
Post-processing Module:
YOLO is employed in this module to visualize the results by drawing bounding boxes and
labels around detected objects.
Output Layer:
The final output, which is a visual representation of detected objects, is displayed in the
Jupyter Notebook, providing immediate feedback on the detection results.
4.3 Module Descriptions
Data Collection and Annotation Module:
Description: Integrates with Roboflow to facilitate the collection and annotation of images. Provides an interface to import datasets and manage annotations.
Key Features: Easy-to-use interface for annotation, support for various data formats, and integration with YOLO.
Preprocessing Module:
Description: Uses Roboflow to preprocess images, ensuring they meet the input requirements of the YOLO model.
Key Functions:
Model Training Module:
Description: Configures and trains the YOLO model using the annotated dataset.
Key Functions:
• Model Configuration: Set parameters like learning rate, batch size, and number of
epochs.
• Training Execution: Run the training loop and log performance metrics.
Inference Module:
Description: Performs object detection using the trained YOLO model on new inputs.
Key Functions:
Post-processing Module:
Description: Enhances the output visualization of detected objects using YOLO's built-in
tools.
Key Functions:
Output Module:
Key Features: Immediate feedback on detection performance and visual results, facilitating easy interpretation and further analysis.
CHAPTER 5
GANTT CHART
CHAPTER 6
DISSERTATION DEMONSTRATION

6.1 DATA SET
Connect GPU
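In Colab, the GPU runtime is enabled from the notebook settings; a quick sanity check of the attached accelerator might look like the following sketch.

    !nvidia-smi  # shows the GPU model and driver if an accelerator is attached

    import torch
    print(torch.cuda.is_available())  # True when a CUDA device is visible to PyTorch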
Install YOLO and import ultralytics
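A typical install-and-verify cell, assuming a notebook environment:

    %pip install ultralytics

    import ultralytics
    ultralytics.checks()  # prints version, Python/torch info, and the detected GPU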
Install Roboflow and Supervision
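A sketch of installing both packages and bridging YOLO results into supervision's annotators; the sample image name is a placeholder.

    %pip install roboflow supervision

    import supervision as sv
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")
    result = model("sample.jpg")[0]                      # placeholder test image
    detections = sv.Detections.from_ultralytics(result)  # convert YOLO output
    annotated = sv.BoxAnnotator().annotate(
        scene=result.orig_img.copy(), detections=detections
    )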
6.3 SAMPLE OUTPUT
6.4 TEST PLAN & DATA VERIFICATION
1. Purpose: The goal of this test plan is to verify the correctness, performance, and functionality of the object detection system developed using YOLO (through Roboflow for data preparation and model training) in a Jupyter Notebook environment. The system will be evaluated on its ability to accurately detect and classify objects from images or videos, validate data accuracy, and ensure that the model performs as expected.
2. Test Objectives: Ensure the YOLO model trained using Roboflow detects objects
accurately and efficiently.
3. Scope:
4. Testing Approach:
5. Resources:
b. YOLO model implementation in Python (using libraries like torch, opencv, or
tensorflow).
6. Test Deliverables:
7. Entry Criteria:
8. Exit Criteria:
Data Verification
Data verification ensures that the dataset used to train and test the YOLO model is accurate,
clean, and consistent. Since Roboflow is being used for dataset preparation, the verification
process should also ensure that the dataset annotations are correct and the data is properly
formatted.
The dataset used for training the YOLO model needs to be verified for correctness and
completeness.
Verification Steps (a consistency-check sketch follows this list):
• Check for Missing Images or Annotations: Verify that every image has
corresponding annotations (bounding boxes, labels) in the Roboflow dataset.
• Annotation Accuracy: Validate that the bounding boxes and labels are correctly
aligned with the objects in the images. This can be done by visual inspection or
automated checks (e.g., comparing the coordinates of bounding boxes).
• Data Completeness: Ensure that the dataset contains a diverse set of images covering
different scenarios and object orientations.
• Data Augmentation: If data augmentation was applied (e.g., flipping, rotation),
verify that these augmentations were correctly applied and do not distort the labels.
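A minimal consistency-check sketch for the first item, assuming the standard folder layout of a Roboflow YOLO export (images/ and labels/ with matching file stems):

    from pathlib import Path

    # Compare file stems between the image and label folders of one split.
    images = {p.stem for p in Path("train/images").glob("*")}
    labels = {p.stem for p in Path("train/labels").glob("*.txt")}

    print("images without annotations:", sorted(images - labels))
    print("annotations without images:", sorted(labels - images))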
YOLO requires a specific input format for training, so it is important to verify that the images and annotations are correctly formatted; a format-check sketch follows this list.
• Image Format: Ensure that images are in a supported format (e.g., JPG, PNG).
• Annotation Format: YOLO typically uses text files for annotations, where each line
represents a bounding box: [class_id, x_center, y_center, width, height] (normalized
values). Verify that annotation files follow this format for all images.
• Consistency: Check that the class IDs match the correct object categories and that
there are no mismatched or incorrect class labels.
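A format-check sketch along these lines; NUM_CLASSES is a hypothetical value that should match the class list in data.yaml.

    from pathlib import Path

    NUM_CLASSES = 5  # hypothetical; set to the class count in data.yaml

    def check_label_file(path: Path) -> list:
        """Collect format problems found in one YOLO annotation file."""
        problems = []
        for i, line in enumerate(path.read_text().splitlines(), start=1):
            parts = line.split()
            if len(parts) != 5:
                problems.append(f"{path.name}:{i}: expected 5 fields, got {len(parts)}")
                continue
            class_id, *coords = parts
            if not class_id.isdigit() or int(class_id) >= NUM_CLASSES:
                problems.append(f"{path.name}:{i}: bad class id {class_id!r}")
            try:
                if not all(0.0 <= float(v) <= 1.0 for v in coords):
                    problems.append(f"{path.name}:{i}: coordinates must lie in [0, 1]")
            except ValueError:
                problems.append(f"{path.name}:{i}: non-numeric coordinate")
        return problems

    for label_file in Path("train/labels").glob("*.txt"):
        for problem in check_label_file(label_file):
            print(problem)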
A validation set (a portion of the dataset that was not used for training) is essential to verify
that the model generalizes well to unseen data.
Test Cases:
• Test Case 1: Validation Set Performance
Input: Validation dataset (images not used in training).
Expected Output: The model should achieve a reasonable accuracy on the validation set (e.g., 80-90%).
Pass Criteria: Detection accuracy must meet or exceed a predefined threshold on the validation set.
CHAPTER 7
RESULTS AND DISCUSSION
Found Gap: Much of the existing research evaluates YOLO on controlled datasets, including the works of Mittal et al. (2019) and Gupta et al. (2021). Research on YOLO's performance in uncertain real-world settings with changing variables (such as lighting and weather) is scarce.
Suggestion: To evaluate YOLO's resilience and flexibility, future research should
concentrate on implementing it in various real-world situations.
3. Metrics of Performance:
Found Gap: Although some studies (such as Mishra et al., 2022) include speed and accuracy measurements, there are few thorough analyses that contrast YOLO's performance with that of other state-of-the-art object detection algorithms on a variety of benchmarks and datasets.
Suggestion: To gain a better understanding of YOLO's advantages and disadvantages,
comparison research involving metrics such as precision, recall, and F1-score across different
datasets and situations should be carried out.
4. Efficiency and Scalability:
Gap identified: Little is known about YOLO's scalability for large-scale applications. While
focusing on real-time processing, Gupta et al. (2021) don't discuss how YOLO manages
higher image resolutions or many object classes in high-density situations.
Suggestion: To ascertain YOLO's effectiveness and processing capacity, studies should
assess its scalability in high-density settings, such as crowded cities or sizable gatherings.
5. User-Centered Research:
Found Gap: Studies such as Kumari et al. (2017) and Rana et al. (2022) have shown that there is little investigation of YOLO's use in specialized fields, such as healthcare for medical imaging or agriculture for crop monitoring.
Suggestion: Future studies should look into how YOLO might be adapted for such particular uses, tackling domain-specific problems and offering customized solutions.
Found Gap: Dynamically detecting and tracking moving objects is a capability that is frequently discussed but not fully assessed. For example, although tracking is covered by Bathija and Sharma (2019), little is known about how well YOLO works in complicated, dynamic contexts.
Suggestion: To improve YOLO's use in domains like autonomous driving, research should
concentrate on enhancing its performance in dynamic environments, such as crowded scenes
or rapidly moving items.
1. Mean Average Precision (mAP): Dividing the number of true positives by the sum of true positives and false positives yields the precision of a model. The mAP metric is computed as the mean of the average precision over all classes, and it considers both the model's precision and recall. A higher mAP value denotes better model performance.
2. Average Precision (AP): This represents the model's accuracy across a range of recall thresholds. Precision = true positives / (true positives + false positives); recall = true positives / (true positives + false negatives). AP is computed as the area under the precision-recall curve. A higher AP value denotes better model performance.
3. Intersection over Union (IoU): This metric quantifies how much the ground-truth bounding box and the predicted bounding box overlap. It is computed by dividing the area of the boxes' intersection by the area of their union. A higher IoU value indicates a better match between the predicted and ground-truth bounding boxes.
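For concreteness, a small worked sketch of the IoU computation for boxes given in corner format:

    def iou(box_a, box_b):
        """IoU for axis-aligned boxes given as (x1, y1, x2, y2) corners."""
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter)

    # Two 10x10 boxes overlapping in a 5x5 patch: IoU = 25 / 175 ≈ 0.143.
    print(iou((0, 0, 10, 10), (5, 5, 15, 15)))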
CHAPTER 8
SUMMARY AND FUTURE SCOPE
Summary
The integration of Roboflow with YOLO marks a significant advancement in the field
of object detection technology. This collaboration enhances YOLO's ability to detect a
diverse range of object classes with high accuracy, as highlighted by Kavitha et al.
(2021), who demonstrate that Roboflow simplifies the training process through efficient
dataset preparation and annotation features. Firgiawan et al. (2024) further validate the
robustness of YOLO across various detection tasks, noting that Roboflow facilitates
easier experimentation with multiple data augmentation techniques. Additionally, Gupta
et al. (2021) provide compelling real-world examples of YOLO's application in
autonomous systems, showcasing its real-time capabilities, which are essential in
scenarios that demand rapid decision-making. Moreover, the potential of YOLOv8 in
enhancing accessibility technologies for visually impaired users, as illustrated by
Syahrudin et al. (2024), underscores the technology's versatility.
Despite these advancements, challenges persist, particularly regarding the impact of
environmental variations on detection accuracy in complex scenes. As noted by Khan et
al. (2020), real-time applications like automated attendance systems face difficulties due
to fluctuating lighting conditions and occlusions. The literature reviewed demonstrates a
wide array of applications, from accessibility solutions to real-time tracking, indicating
that while significant progress has been made, the path ahead is fraught with obstacles.
Future Work:
Future studies should concentrate on enhancing the resilience of YOLO algorithms to address the challenges posed by environmental variations. Research can explore methods to improve detection accuracy under varying conditions, such as poor lighting and occlusions.
Additionally, Kumari et al. (2017) suggest that integrating YOLO with supplementary
machine learning techniques could yield significant performance improvements in
specialized applications, such as license plate recognition. Utilizing Roboflow’s robust
annotation tools in these contexts may provide valuable insights and enhancements. Moreover, further investigation into novel strategies that leverage the
combined strengths of Roboflow and YOLO could broaden the application scope and
efficacy of these technologies. By continuing to evolve these tools and methodologies,
researchers can pave the way for more reliable and effective object detection solutions that
address current limitations and meet the demands of future applications.
CHAPTER 9
REFERENCES
[1] Dewangan, Rajeshwar Kumar, and Yamini Chouhan. "A Review on Object Detection using OpenCV Method." International Research Journal of Engineering and Technology (IRJET) (2020).
[2] Mishra, Shubham, et al. "An intelligent motion detection using OpenCV." International
Journal of Scientific Research in Science, Engineering, and Technology
9.2 (2022): 51-63.
[3] Kavitha, D., et al. "Multiple Object Recognition Using OpenCV." REVISTA GEINTEC-
GESTAO INOVACAO E TECNOLOGIAS 11.2 (2021): 1736-1747.
[4] Mittal, Naman, Akarsh Vaidya, and Shreya Kapoor. "Object detection and classification using Yolo." Int. J. Sci. Res. Eng. Trends 5 (2019): 562-565.
[5] Gupta, Akshara, Aditya Verma, and A. Yadav. "YOLO OBJECT DETECTION USING
OPENCV." International Journal of Engineering Applied Sciences and Technology 5.10
(2021).
[6] Bathija, Akansha, and Grishma Sharma. "Visual object detection and tracking using yolo
and sort." International Journal of Engineering Research Technology 8.11 (2019): 345-355.
[7] Emami, Shervin, and Valentin Petrut Suciu. "Facial recognition using OpenCV." Journal of Mobile, Embedded and Distributed Systems 4.1 (2012): 38-43.
[8] Xie, Guobo, and Wen Lu. "Image edge detection based on opencv." International Journal
of Electronics and Electrical Engineering 1.2 (2013): 104-106.
[9] Kumari, Sweta, Leeza Gupta, and Prena Gupta. "Automatic license plate recognition
using OpenCV and neural network." International Journal of Computer Science Trends and
Technology (IJCST) 5.3 (2017): 114-118.
[10] Kumar, Ajay, et al. "Face Detection and Recognition using OpenCV." International
Journal of Computer Applications 975: 8887.
[11] Mohaideen Abdul Kadhar, K., and G. Anand. "Image Processing Using OpenCV."
Industrial Vision Systems with Raspberry Pi: Build and Design Vision products Using
Python and OpenCV. Berkeley, CA: Apress, 2024. 87-140.
[12] Goel, Tushar, K. C. Tripathi, and M. L. Sharma. "Single line license plate detection
using OpenCV and tesseract." International Research Journal of Engineering and Technology
(IRJET) 5.07 (2020).
[13] Rana, Md Milon, Tajkuruna Akter Tithy, and Md Mehedi Hasan. "Vehicle Detection And Count In The Captured Stream Video Using OpenCV In Machine Learning." Computer Science & Engineering: An International Journal (CSEIJ) 12.3 (2022).
[14] Khan, Sikandar, Adeel Akram, and Nighat Usman. "Real time automatic attendance
system for face recognition using face API and OpenCV." Wireless Personal
Communications 113.1 (2020): 469-480.
[15] Castrillón, Modesto, et al. "A comparison of face and facial feature detectors based on
the Viola–Jones general object detection framework." Machine Vision and Applications 22
(2011): 481-494.
[16] Tuohy, Shane, et al. "Distance determination for an automobile environment using
inverse perspective mapping in OpenCV." (2010): 100-105.
[17] Liu, Yandong. "Moving object detection and distance calculation based on Opencv."
Second Guangdong-Hong Kong-Macao Greater Bay Area Artificial Intelligence and Big
Data Forum (AIBDF 2022). Vol. 12593. SPIE, 2023.
[18] Bhardwaj, Sarthak, et al. "Object Detection Framework Using OpenCV for Low Cost
and High Performance." International Conference on Recent Trends in Computing.
Singapore: Springer Nature Singapore, 2023.
[19] Wang, Wen Jun, and Meng Gao. "Vehicle detection and counting in traffic video based
on OpenCV." Applied Mechanics and Materials 361 (2013): 2232-2235.
[20] Arya, Zankruti, and Vibha Tiwari. "Automatic face recognition and detection using OpenCV, haar cascade and recognizer for frontal face." Int. J. Eng. Res. Appl. 10.6 (2020): 13-19.
[21] Firgiawan, Gustyanto, Nazwa Lintang Seina, and Perani Rosyani. "Implementasi
Metode You Only Look Once (YOLO) untuk Pendeteksi Objek dengan Tools OpenCV." AI
dan SPK: Jurnal Artificial Intelligent dan Sistem Penunjang Keputusan
2.2 (2024): 137-141.
[22] Patel, Prinsi, and Barkha Bhavsar. "Object Detection and Identification." International
Journal 10.3 (2021).
[23] Syahrudin, Erwin, Ema Utami, and Anggit Dwi Hartanto. "Enhanced Yolov8 with
OpenCV for Blind-Friendly Object Detection and Distance Estimation." Jurnal RESTI
(Rekayasa Sistem dan Teknologi Informasi) 8.2 (2024): 199-207.
[24] Pulungan, Ali Basrah, et al. "Object detection with a webcam using the Python programming language." Journal of Applied Engineering and Technological Science (JAETS) 2.2 (2021): 103-111.
[25] Kumbhar, P. Y., et al. "Real time face detection and tracking using OpenCV." International Journal for Research in Emerging Science and Technology 4.4 (2017): 39-43.