
Real-time Multifocus Object Detection

Capstone Project
Submitted in partial fulfillment of the requirements for the degree of

Master of Technology

in

Software Engineering

by

AKILESH VR

20MIS0335

Under the guidance of


Prof. Hari Ram Vishwakarma

School of Computer Science Engineering and Information Systems


VIT, Vellore

April, 2025

DECLARATION

I hereby declare that the Capstone Project entitled "Real-time Multifocus Object Detection"
submitted by me, for the award of the degree of Master of Technology in Software
Engineering, School of Computer Science Engineering and Information Systems, to VIT is a
record of bonafide work carried out by me under the supervision of Prof. Hari Ram
Vishwakarma, Senior Professor, SCORE, VIT Vellore.
I further declare that the work reported in this project has not been submitted and will not
be submitted, either in part or in full, for the award of any other degree or diploma in this
institute or any other institute or university.

Place: Vellore

Date: 03.04.2025

Signature of the Candidate

CERTIFICATE

This is to certify that the Capstone Project entitled "Real-Time Multifocus Object
Detection" submitted by AKILESH V R (20MIS0335), SCORE, VIT, for the award
of the degree of Master of Technology in Software Engineering, is a record of bonafide
work carried out by him under my supervision during the period 13.12.2024 to
17.04.2025, as per the VIT code of academic and research ethics.

The contents of this report have not been submitted and will not be submitted, either in
part or in full, for the award of any other degree or diploma in this institute or any other
institute or university. The project fulfills the requirements and regulations of the
University and, in my opinion, meets the necessary standards for submission.

Place: Vellore
Date:

Signature of the Guide

Internal Examiner                    External Examiner

Head of the Department

Department of Software and Systems Engineering

ACKNOWLEDGEMENT

It is my pleasure to express, with a deep sense of gratitude, my thanks to my Capstone Project
guide, Prof. Hari Ram Vishwakarma, Senior Professor, School of Computer Science Engineering
and Information Systems, Vellore Institute of Technology, Vellore, for his constant guidance and
continual encouragement throughout my endeavor. My association with him is not confined to
academics alone; it has been a great opportunity on my part to work with an intellectual and an
expert in the field of machine learning.

"I would like to express my heartfelt gratitude to Honorable Chancellor Dr. G Viswanathan;
respected Vice Presidents Mr. Sankar Viswanathan, Dr. Sekar Viswanathan, Vice
Chancellor Dr. V. S. Kanchana Bhaaskaran; Pro-Vice Chancellor Dr. Partha Sharathi
Mallick; and Registrar Dr. Jayabarathi T.

My whole-hearted thanks to Dean Dr. Daphne Lopez, School of Computer Science
Engineering and Information Systems; Head, Department of Software and Systems
Engineering, Dr. Neelu Khare; M.Tech Project Coordinators Dr. C. Navaneethan and
Dr. Malathy E; SCORE School Project Coordinator Dr. Thandeeswaran R; and all faculty,
staff, and members of our university for their continuous guidance throughout my
course of study.

It is indeed a pleasure to thank my parents and friends who persuaded and encouraged me to
take up and complete my capstone project successfully. Last, but not least, I express my
gratitude and appreciation to all those who have helped me directly or indirectly towards the
successful completion of the project.

Place: Vellore
Date: 03.04.2025 Akilesh.VR

Executive Summary

The dissertation titled "Real-Time Multifocus Object Detection" focuses on enhancing the
efficiency and accuracy of object detection systems using real-time video streams. The
research explores the integration of YOLO (You Only Look Once), a state-of-the-art object
detection algorithm, and Roboflow for supervised training to detect and classify multiple
objects from varying focal points in real-time. The objective is to create a robust system
capable of identifying and tracking objects under diverse environmental conditions, including
dynamic and cluttered scenes.

Leveraging literature on object detection techniques using OpenCV (Dewangan et al., 2020)
and YOLO (Mittal et al., 2019; Gupta et al., 2021), the dissertation examines the
effectiveness of these technologies in different applications, such as vehicle detection (Rana
et al., 2022) and facial recognition (Khan et al., 2020). The study also discusses challenges
such as handling occlusion, real-time processing, and limited computational resources, which
are addressed by optimizing the YOLO algorithm with Roboflow’s advanced training tools.

This work contributes to the development of a real-time object detection system that provides
accurate, scalable, and efficient performance for diverse applications in surveillance, traffic
monitoring, and automated systems.

CONTENTS                                                    Page No.

Acknowledgement                                             i
Executive Summary                                           v
Table of Contents                                           ii
List of Figures                                             ix
List of Tables                                              x
Abbreviations                                               xii

1  INTRODUCTION                                             1
   1.1  Objectives                                          1
   1.2  Motivation                                          1
   1.3  Background                                          1
   1.4  Literature Survey                                   2
2  PROJECT DESCRIPTION AND GOALS                            10
3  TECHNICAL SPECIFICATION                                  12
4  DESIGN APPROACH AND DETAILS                              12
   4.1  Proposed Methodology                                12
   4.2  System Architecture                                 14
   4.3  Module Descriptions                                 15
5  GANTT CHART                                              17
6  DISSERTATION DEMONSTRATION                               17
   6.1  Sample Codes                                        17
   6.2  Sample Screenshots                                  18
7  RESULTS AND DISCUSSION                                   24
8  SUMMARY AND FUTURE SCOPE                                 27
9  REFERENCES                                               29

List of Figures

Figure No.   Title                                Page No.

2.1          INFRASTRUCTURE-BASED NETWORK        viii

2.1 INFRASTRUCTURE-BASED NETWORK

Hardware Requirements
• Computing Devices
• Cameras
• Networking Equipment
Software Components
• Roboflow
• YOLO Framework
• Operating System

Network Architecture:

Data Capture Layer


Cameras: Capture images or video feeds.
Connection: Send data to the edge device or directly to the server over the network.
Processing Layer
Edge Device:
Processes incoming frames in real-time using the YOLO model.
Performs initial inference to reduce latency and bandwidth usage.

Server:
• Receives frames from the edge device for additional
processing when needed.

Model Management Layer


Roboflow:
• Upload images, annotate them, and train the YOLO model.
• Export the trained model in a suitable format (e.g., TensorFlow or PyTorch) for
deployment.
Model Hosting:

• The trained model can be hosted on either the edge device or the server.

Decision Layer
Processing Results:
• Detected objects and their locations are sent back to the
edge device or server.
• Triggers actions such as alerts, alarms, or logging.

User Interface Layer


Application:
• Web-based or mobile application to monitor real-time
detections and analytics.
• UI interacts with the server to display detection results,
statistics, and logs.

List of Tables

Table No.   Title                                          Page No.

2.1         MULTIPATH ROUTING PROTOCOLS IN AMNETS          x

2.1 MULTIPATH ROUTING PROTOCOLS IN AMNETS


In the context of object detection systems utilizing YOLO (You Only Look Once) and
Roboflow, multipath routing protocols can significantly enhance data transmission
efficiency and reliability among mobile nodes (such as drones or cameras) in an Ad Hoc
network (AMNET). These protocols allow multiple paths to be established between nodes,
ensuring robust communication even in dynamic environments.

Key Features of Multipath Routing Protocols


Enhanced Reliability:
By establishing multiple communication paths, the system can continue functioning even
if one or more paths fail. This is particularly important for real-time object detection,
where data integrity is crucial.
Load Balancing:
Distributes data packets across multiple paths, minimizing congestion and ensuring that
no single route becomes a bottleneck, thereby maintaining optimal performance for
real-time processing.

Reduced Latency:
Multiple available paths can lead to lower end-to-end delays, which is essential when
transmitting video frames or detection results in time-sensitive applications.

Types of Multipath Routing Protocols Suitable for Object Detection

AOMDV (Ad Hoc On-Demand Multipath Distance Vector):

Ideal for establishing multiple paths dynamically. It can be used to transmit video data
from cameras to processing nodes while ensuring redundancy.

MP-AODV (Multipath AODV):


Extends AODV by enabling the discovery of multiple paths. This protocol can ensure
that detection results are relayed efficiently, even in scenarios with high node mobility.

MRA (Multipath Routing Algorithm):


Can be tailored to optimize routing based on metrics relevant to object detection
applications, such as minimizing energy consumption of mobile devices.

Applications in Object Detection Systems
Surveillance: In security scenarios, cameras can dynamically connect and transmit video
streams to a central server for processing.

Disaster Response: In emergency situations, mobile devices can detect objects (like
survivors or obstacles) and communicate findings through established multipath routes.

Traffic Monitoring: Drones equipped with cameras can use multipath routing to send real-
time video feeds for analyzing traffic patterns or detecting accidents.

List of Abbreviations

YOLO You Only Look Once

MAP Mean Average Precision

GPU Graphics Processing Unit

CNN Convolutional Neural Networks

R-CNN Region-based Convolutional Neural Networks

CHAPTER 1

INTRODUCTION

ABSTRACT

Object detection is a crucial component of modern computer vision systems, with wide-
ranging applications across industries such as autonomous driving, security surveillance, and
urban traffic management. This project presents an approach to object detection utilizing the
YOLO (You Only Look Once) model, a state-of-the-art real-time object detection algorithm,
combined with Roboflow, a powerful platform for image data management, annotation, and
augmentation. The model is implemented
within a Jupyter Notebook environment, providing an interactive setup for development,
experimentation, and evaluation.

YOLO is selected for its high accuracy and speed, achieving real-time detection by using
a single neural network to predict bounding boxes and class probabilities in one forward
pass. YOLO’s efficiency makes it well-suited for applications that require rapid
identification of multiple objects, such as vehicles, bikes, pedestrians, and more in urban
or traffic scenes. The project utilizes Roboflow to create and preprocess custom datasets
tailored to these specific detection classes. Roboflow's tools streamline the process of
image annotation, data augmentation, and dataset management, which is crucial for
enhancing model performance and generalizability, especially when working with
limited or imbalanced datasets.

In this study, we detail the end-to-end process of setting up a YOLO-based object
detection system. The workflow involves:

Data Preparation: Using Roboflow, we curate and annotate datasets, apply augmentations such as
rotations, flips, and brightness adjustments to improve the model's robustness, and export the dataset
in a YOLO-compatible format.
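A minimal sketch of this step, assuming the roboflow Python package and placeholder
workspace, project, and version names rather than the project's actual identifiers:

    # Sketch: download an annotated dataset from Roboflow in YOLOv8 format.
    # The API key, workspace, project, and version below are placeholders.
    from roboflow import Roboflow

    rf = Roboflow(api_key="YOUR_API_KEY")
    project = rf.workspace("your-workspace").project("traffic-objects")
    dataset = project.version(1).download("yolov8")   # writes images, labels, and data.yaml
    print(dataset.location)                           # local folder of the exported dataset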

Model Training: We configure and train a YOLO model (YOLOv8) on the annotated dataset within
Jupyter Notebook, leveraging GPU support for faster training times.
Evaluation: We evaluate the model's performance using standard metrics such as Mean Average
Precision (mAP), precision, recall, and F1-score, testing the model on real-world video or image
samples.
Deployment: To illustrate potential deployment, we demonstrate how to load the trained model
for real-time detection in video streams or surveillance feeds, enabling object tracking and
situational awareness (a minimal sketch follows this list).
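A compact sketch of the training, evaluation, and deployment steps using the ultralytics
package; paths, epoch counts, and thresholds are illustrative assumptions, not the report's
actual settings:

    # Sketch: train YOLOv8 on the exported dataset, validate it, then run it on a video.
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")                                    # start from pretrained weights
    model.train(data="dataset/data.yaml", epochs=50, imgsz=640)   # data.yaml from the Roboflow export
    metrics = model.val()                                         # precision, recall, mAP on the val split
    print(metrics.box.map)                                        # mAP averaged over IoU 0.50:0.95

    model.predict(source="sample_video.mp4", conf=0.25, save=True)  # saves an annotated video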

The results indicate that the YOLO model, when fine-tuned on customized, well-annotated
datasets created in Roboflow, achieves high accuracy in detecting multiple classes of objects.
The integration of Roboflow’s data pipeline significantly enhances model robustness by
augmenting training data and providing efficient tools for labeling and exporting.
Additionally, running the setup in a Jupyter Notebook provides flexibility for users to modify
the dataset or model parameters, visualize intermediate results, and iteratively improve
performance.

This approach offers a scalable, cost-effective solution for a variety of object detection tasks,
demonstrating potential for further applications in smart city infrastructure, autonomous
systems, and human-activity monitoring.

1.1 Objectives

The primary objectives of this project include:


To develop a user-friendly interface that allows real-time monitoring of object
detection results.
To train the YOLO model using a diverse dataset prepared with Roboflow for
high accuracy in object detection.
To evaluate the system’s performance based on metrics such as accuracy, precision,
and speed of detection.
To deploy the model on edge devices for real-time inference, facilitating practical
applications in surveillance and monitoring.

1.2 Motivation

The motivation behind this project stems from the growing need for intelligent surveillance
systems that can operate in real time. With increasing security concerns and the need for
effective monitoring solutions, developing a robust object detection system can enhance
situational awareness and response capabilities in critical environments. This project aims to
leverage the capabilities of YOLO and Roboflow to create a system that not only detects but
also classifies objects with high precision, ultimately contributing to improved safety and
security measures.

1.3 Background

In recent years, the proliferation of digital imagery and video data has led to
significant advancements in computer vision, particularly in the field of object
detection. Object detection involves not only identifying objects within images
or video frames but also localizing them through bounding boxes, which is crucial
for applications in security surveillance, autonomous vehicles, and robotics.
Traditional methods of object detection often struggled with speed and accuracy,
necessitating the development of more advanced techniques.

The YOLO (You Only Look Once) algorithm has emerged as a state-of-the-art
approach to real-time object detection. By treating the detection task as a single
regression problem rather than a series of classifications, YOLO significantly
reduces processing time while maintaining competitive accuracy. Coupled with
Roboflow, a platform that streamlines data annotation, augmentation, and model
training, the implementation of YOLO becomes more accessible and efficient,
paving the way for practical applications in various domains.

1.4 Literature Survey

For each work, the survey lists the citation, the work done, the techniques, methods, or
approaches used, and the limitations or drawbacks.

1. Dewangan, Rajeshwar Kumar, and Yamini Chouhan. "A Review on Object Detection using
OpenCV Method." International Research Journal of Engineering and Technology (IRJET) (2020).
   Work done: A review of object detection techniques using OpenCV.
   Techniques: OpenCV object detection methods.
   Limitations: Lack of focus on the latest developments; overview format.

2. Mishra, Shubham, et al. "An intelligent motion detection using OpenCV." International
Journal of Scientific Research in Science, Engineering, and Technology 9.2 (2022): 51-63.
   Work done: Proposed an intelligent motion detection system.
   Techniques: OpenCV motion detection.
   Limitations: May not handle complex environments with multiple moving objects well.

3. Kavitha, D., et al. "Multiple Object Recognition Using OpenCV." REVISTA GEINTEC-GESTAO
INOVACAO E TECNOLOGIAS 11.2 (2021): 1736-1747.
   Work done: Focused on multiple object recognition.
   Techniques: OpenCV for object detection and recognition.
   Limitations: Limited scalability in dynamic environments.

4. Mittal, Naman, Akarsh Vaidya, and Shreya Kapoor. "Object detection and classification
using Yolo." Int. J. Sci. Res. Eng. Trends 5 (2019): 562-565.
   Work done: Investigated object detection using YOLO.
   Techniques: YOLO, OpenCV.
   Limitations: May not work well on small or overlapping objects.

5. Gupta, Akshara, Aditya Verma, and A. Yadav. "YOLO Object Detection Using OpenCV."
International Journal of Engineering Applied Sciences and Technology 5.10 (2021).
   Work done: Applied YOLO object detection with OpenCV.
   Techniques: YOLO, OpenCV.
   Limitations: High computation requirement for real-time processing.

6. Bathija, Akansha, and Grishma Sharma. "Visual object detection and tracking using yolo
and sort." International Journal of Engineering Research Technology 8.11 (2019): 345-355.
   Work done: Combined YOLO for detection and SORT for tracking.
   Techniques: YOLO, SORT, OpenCV.
   Limitations: Inefficient tracking of high-speed moving objects.

7. Emami, Shervin, and Valentin Petrut Suciu. "Facial recognition using OpenCV." Journal of
Mobile, Embedded and Distributed Systems 4.1 (2012): 38-43.
   Work done: Developed facial recognition using OpenCV.
   Techniques: OpenCV Haar cascade for face detection.
   Limitations: Susceptible to lighting changes and angle variations.

8. Xie, Guobo, and Wen Lu. "Image edge detection based on opencv." International Journal of
Electronics and Electrical Engineering 1.2 (2013): 104-106.
   Work done: Image edge detection using OpenCV.
   Techniques: Canny edge detection, OpenCV.
   Limitations: Sensitive to noise; requires fine-tuning for different images.

9. Kumari, Sweta, Leeza Gupta, and Prena Gupta. "Automatic license plate recognition using
OpenCV and neural network." International Journal of Computer Science Trends and Technology
(IJCST) 5.3 (2017): 114-118.
   Work done: Applied neural networks for license plate recognition.
   Techniques: OpenCV, neural networks.
   Limitations: Limited accuracy in complex environments such as night-time or
   low-resolution plates.

10. Kumar, Ajay, et al. "Face Detection and Recognition using OpenCV." International Journal
of Computer Applications 975: 8887.
    Work done: Face detection and recognition using OpenCV.
    Techniques: Haar cascade, OpenCV.
    Limitations: Less effective in real-time applications and under varying light conditions.

11. Mohaideen Abdul Kadhar, K., and G. Anand. "Image Processing Using OpenCV." Industrial
Vision Systems with Raspberry Pi: Build and Design Vision Products Using Python and OpenCV.
Berkeley, CA: Apress, 2024. 87-140.
    Work done: Describes image processing using OpenCV for industrial vision systems.
    Techniques: OpenCV image processing techniques.
    Limitations: Limitations for high-precision industrial applications.

12. Goel, Tushar, K. C. Tripathi, and M. L. Sharma. "Single line license plate detection
using OpenCV and tesseract." International Research Journal of Engineering and Technology
(IRJET) 5.07 (2020).
    Work done: Single-line license plate detection.
    Techniques: OpenCV, Tesseract OCR.
    Limitations: Limited detection capability for complex, multi-line plates.

13. Rana, Md Milon, Tajkuruna Akter Tithy, and Md Mehedi Hasan. "Vehicle Detection And Count
In The Captured Stream Video Using OpenCV In Machine Learning." Computer Science &
Engineering: An International Journal (CSEIJ) 12.3 (2022).
    Work done: Vehicle detection and counting in video.
    Techniques: OpenCV, machine learning.
    Limitations: May struggle with occlusion or overlapping vehicles.

14. Khan, Sikandar, Adeel Akram, and Nighat Usman. "Real time automatic attendance system
for face recognition using face API and OpenCV." Wireless Personal Communications 113.1
(2020): 469-480.
    Work done: Real-time attendance system with face recognition.
    Techniques: OpenCV, Face API.
    Limitations: Privacy concerns; potential errors in crowded environments.

15. Castrillón, Modesto, et al. "A comparison of face and facial feature detectors based on
the Viola–Jones general object detection framework." Machine Vision and Applications 22
(2011): 481-494.
    Work done: Compared face detectors within the Viola-Jones framework.
    Techniques: Viola-Jones framework.
    Limitations: Inaccuracy with partial faces or low-quality images.

16. Tuohy, Shane, et al. "Distance determination for an automobile environment using inverse
perspective mapping in OpenCV." (2010): 100-105.
    Work done: Distance calculation for an automobile environment.
    Techniques: Inverse perspective mapping, OpenCV.
    Limitations: Limited to specific conditions such as camera angle and road surface.

17. Liu, Yandong. "Moving object detection and distance calculation based on Opencv." Second
Guangdong-Hong Kong-Macao Greater Bay Area Artificial Intelligence and Big Data Forum (AIBDF
2022). Vol. 12593. SPIE, 2023.
    Work done: Moving object detection and distance calculation.
    Techniques: OpenCV, computer vision algorithms.
    Limitations: Performance drops in dynamic and cluttered environments.

18. Bhardwaj, Sarthak, et al. "Object Detection Framework Using OpenCV for Low Cost and High
Performance." International Conference on Recent Trends in Computing. Singapore: Springer
Nature Singapore, 2023.
    Work done: Designed a cost-effective object detection framework.
    Techniques: OpenCV, low-cost sensors.
    Limitations: May lack high accuracy compared to high-end systems.

19. Wang, Wen Jun, and Meng Gao. "Vehicle detection and counting in traffic video based on
OpenCV." Applied Mechanics and Materials 361 (2013): 2232-2235.
    Work done: Vehicle detection and counting in traffic video.
    Techniques: OpenCV, video analysis.
    Limitations: May not perform well in complex traffic scenes with heavy occlusion.

20. Arya, Zankruti, and Vibha Tiwari. "Automatic face recognition and detection using
OpenCV, haar cascade and recognizer for frontal face." Int. J. Eng. Res. Appl. 10.6 (2020):
13-19.
    Work done: Automatic face recognition and detection.
    Techniques: OpenCV, Haar cascade, recognizer.
    Limitations: Only effective for frontal faces; low accuracy for angled faces.

21. Firgiawan, Gustyanto, Nazwa Lintang Seina, and Perani Rosyani. "Implementasi Metode You
Only Look Once (YOLO) untuk Pendeteksi Objek dengan Tools OpenCV." AI dan SPK: Jurnal
Artificial Intelligent dan Sistem Penunjang Keputusan 2.2 (2024): 137-141.
    Work done: YOLO implementation for object detection.
    Techniques: YOLO, OpenCV.
    Limitations: Requires high computational resources for real-time processing.

22. Patel, Prinsi, and Barkha Bhavsar. "Object Detection and Identification." International
Journal 10.3 (2021).
    Work done: Object detection and identification.
    Techniques: OpenCV, machine learning.
    Limitations: Performance issues with ambiguous or small objects.

23. Syahrudin, Erwin, Ema Utami, and Anggit Dwi Hartanto. "Enhanced Yolov8 with OpenCV for
Blind-Friendly Object Detection and Distance Estimation." Jurnal RESTI (Rekayasa Sistem dan
Teknologi Informasi) 8.2 (2024): 199-207.
    Work done: Improved YOLOv8 for blind-friendly object detection.
    Techniques: YOLOv8, OpenCV, distance estimation.
    Limitations: Still requires improvement in real-time applications.

24. Pulungan, Ali Basrah, et al. "Object detection with a webcam using the Python programming
language." Journal of Applied Engineering and Technological Science (JAETS) 2.2 (2021):
103-111.
    Work done: Object detection using a webcam.
    Techniques: OpenCV, Python.
    Limitations: Limited detection range and environment dependency.

25. Kumbhar, P. Y., et al. "Real time face detection and tracking using OpenCV."
International Journal for Research in Emerging Science and Technology 4.4 (2017): 39-43.
    Work done: Real-time face detection and tracking.
    Techniques: OpenCV, tracking algorithms.
    Limitations: Performance degradation in crowded or dynamic scenes.

CHAPTER 2
PROJECT DESCRIPTION AND GOALS


Project Description:

The project focuses on developing a real-time multifocus object detection system that can
accurately detect, classify, and track multiple objects in various environments using the
YOLO (You Only Look Once) algorithm integrated with Roboflow for supervised training.
The system aims to detect objects at varying focal lengths or perspectives, providing a more
dynamic and adaptable solution for real-time object detection across a wide range of
applications.

The detection system will be powered by YOLO, which is one of the most efficient and fast
object detection algorithms, capable of processing images and videos in real-time with high
accuracy. YOLO’s capabilities will be enhanced using Roboflow, a tool for data annotation
and model training, allowing the system to be trained with diverse and labeled datasets.
These datasets will be used to recognize multiple types of objects in complex environments
such as surveillance, traffic monitoring, and automated systems.

The goal of this project is to provide a scalable, efficient, and accurate object detection
framework that can operate in real-time applications, making it suitable for industries such
as autonomous vehicles, security surveillance, and smart cities.

Goals:

Real-time Object Detection: Develop a system that can detect and track multiple objects
in real-time video streams, ensuring low latency and fast processing times.

Multifocal Detection: Implement a method to handle detection at various focal lengths,
enabling the system to work effectively under different camera perspectives and
distances.

High Detection Accuracy: Enhance the accuracy of object detection by training the
model using labeled datasets from Roboflow, improving its performance on complex and
cluttered environments.

Scalability and Flexibility: Create a scalable framework that can be extended to detect
and classify more object types and adapt to various use cases, such as face recognition,
vehicle detection, or general surveillance.

Optimized for Real-world Conditions: Ensure that the system works efficiently under
diverse conditions, such as varying lighting, occlusions, and fast-moving objects, making
it suitable for real-world deployment in dynamic environments.

User-Friendly Interface: Design an easy-to-use interface for users to interact with the
object detection system, allowing easy integration into existing applications, from
security systems to traffic monitoring.

Enhanced Performance with YOLO and Roboflow: Utilize YOLO for fast detection
and Roboflow for model training, ensuring that the system can process and analyze large
datasets efficiently, with reduced computational overhead.

Address Limitations of Existing Systems: Overcome challenges like occlusion,
overlapping objects, and real-time processing delays that limit the effectiveness of
traditional object detection systems.

CHAPTER 3

TECHNICAL SPECIFICATION

1. HARDWARE REQUIREMENTS
Processor: Intel Core i5
RAM: at least 8 GB
GPU (optional): to run the models locally instead of Google Colab

2. SOFTWARE REQUIREMENTS
Operating System: Windows 10
Software used: Python, Jupyter Notebook
Languages used: Python

CHAPTER 4

ANALYSIS & DESIGN

4.1 Proposed Methodology

The proposed methodology for implementing object detection using YOLO (You Only Look
Once) in conjunction with Roboflow comprises several systematic steps. The key phases of
the methodology are as follows:

Dataset Collection and Annotation:

Utilize Roboflow to collect and annotate a diverse dataset of images containing various
object classes. This platform simplifies the process of dataset management, including
importing, annotating, and exporting images in the required formats for YOLO.

Data Preprocessing:

Use Roboflow to preprocess images. This may involve resizing images, normalization, and
applying techniques such as image enhancement or augmentation to improve model
robustness. Roboflow offers easy-to-use tools to automate preprocessing tasks.

Model Selection and Configuration:

Select the appropriate YOLO model variant (e.g., YOLOv5 or YOLOv8) based on the
specific requirements of the application, such as accuracy and speed. Configure model
parameters, including input size, number of classes, and hyperparameters for training.

Training the YOLO Model:

Use the annotated dataset from Roboflow to train the YOLO model. This involves dividing
the dataset into training, validation, and test sets, and executing the training process while
monitoring performance metrics such as mean Average Precision (mAP).

Model Evaluation:

After training, evaluate the model’s performance using the test dataset. Analyze metrics
such as precision, recall, and F1 score to assess accuracy and generalization capabilities.

Real-time Inference:

Implement real-time object detection using the trained YOLO model on new images or
video streams. The YOLO model can be used directly to detect objects in real-time,
providing immediate results without the need for additional preprocessing tools like
OpenCV.
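As an illustrative sketch of this step (the weights path and webcam source are assumptions),
the ultralytics package supports streaming inference directly:

    # Sketch: frame-by-frame real-time inference from a webcam.
    from ultralytics import YOLO

    model = YOLO("runs/detect/train/weights/best.pt")   # placeholder path to trained weights
    # stream=True yields one result per frame instead of buffering the whole run
    for result in model.predict(source=0, stream=True, conf=0.25):
        print(len(result.boxes), "objects in this frame")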

Post-processing and Visualization:

Utilize YOLO’s built-in visualization tools to draw bounding boxes and labels on the
detected objects in the output images or video streams, enhancing interpretability and
usability.

4.2 System Architecture

The system architecture for the proposed object detection application consists of the
following components:

Input Layer:

Images or video feeds from various sources (e.g., webcam, file uploads) are captured and
passed to the system.

Data Preprocessing Module:

This module handles image preprocessing tasks using Roboflow, which includes resizing,
normalization, and data augmentation techniques.

Model Training Module:

The YOLO model is trained on the annotated dataset sourced from Roboflow. This module
handles loading the dataset, configuring training parameters, and executing the training
process.

Model Inference Module:

This module utilizes the trained YOLO model to perform inference on new images or video
streams, detecting objects in real-time.

Post-processing Module:

YOLO is employed in this module to visualize the results by drawing bounding boxes and
labels around detected objects.

Output Layer:

The final output, which is a visual representation of detected objects, is displayed in the
Jupyter Notebook, providing immediate feedback on the detection results.

4.3 Module Descriptions

Dataset Collection and Annotation Module:

Description: Integrates with Roboflow to facilitate the collection and annotation of images.
Provides an interface to import datasets and manage annotations.

Key Features: Easy-to-use interface for annotation, support for various data formats, and
integration with YOLO.

Data Preprocessing Module:

Description: Uses Roboflow to preprocess images, ensuring they meet the input
requirements of the YOLO model.

Key Functions:

• Image Resizing: Adjust images to the input dimensions required by YOLO.
• Normalization: Scale pixel values to a suitable range.
• Data Augmentation: Apply transformations (flips, rotations) to increase dataset
diversity.

Model Training Module:

Description: Configures and trains the YOLO model using the annotated dataset.

Key Functions:

• Model Configuration: Set parameters like learning rate, batch size, and number of
epochs.
• Training Execution: Run the training loop and log performance metrics.

Model Inference Module:

Description: Performs object detection using the trained YOLO model on new inputs.

Key Functions:

• Load Model Weights: Import the trained model for inference.
• Run Inference: Detect objects in images or video frames and return results.

Post-processing Module:

Description: Enhances the output visualization of detected objects using YOLO's built-in
tools.

Key Functions:

• Bounding Box Drawing: Overlay boxes on detected objects.
• Labeling: Annotate objects with class names and confidence scores (a sketch follows
this list).
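A brief sketch of this module, assuming ultralytics' built-in plotting and placeholder
file names:

    # Sketch: draw bounding boxes and class/confidence labels on one image.
    import cv2
    from ultralytics import YOLO

    model = YOLO("runs/detect/train/weights/best.pt")    # placeholder weights path
    result = model.predict(source="sample.jpg", conf=0.25)[0]
    annotated = result.plot()                            # image with boxes and labels drawn
    cv2.imwrite("annotated.jpg", annotated)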

Output Visualization Module:

Description: Displays the results in the Jupyter Notebook environment.

Key Features: Immediate feedback on detection performance and visual results, facilitating
easy interpretation and further analysis.

CHAPTER 5

GANTT CHART

(The Gantt chart figure appears in the original report.)

CHAPTER 6

SYSTEM IMPLEMENTATION AND TESTING

6.1 DATASET

A custom dataset derived from movies.

6.2 SAMPLE CODE

Connect GPU
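The original report shows notebook screenshots for each of these steps; the listings below
are illustrative reconstructions, not the report's exact cells. Checking that a GPU runtime
is attached:

    # Sketch: verify the notebook's GPU in Colab or a local environment.
    !nvidia-smi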

Upload dataset
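One common way to bring a dataset archive into a Colab runtime (the archive name is a
placeholder):

    # Sketch: upload and unpack a dataset archive in Google Colab.
    from google.colab import files

    uploaded = files.upload()            # opens a file picker, e.g. for dataset.zip
    !unzip -q dataset.zip -d dataset     # placeholder archive and target folder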

Install YOLO
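The install cell presumably resembles:

    # Sketch: install the Ultralytics YOLO package.
    %pip install ultralytics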

Import ultralytics
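A typical import-and-check cell:

    # Sketch: import ultralytics and print an environment summary.
    import ultralytics
    ultralytics.checks()   # reports package version, Python/torch versions, GPU availability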

Install ByteTrack
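ByteTrack can be obtained in several ways; one plausible route (the report's exact cell may
differ) is cloning the reference implementation:

    # Sketch: one common way to obtain ByteTrack, by cloning its reference repository.
    !git clone https://github.com/ifzhang/ByteTrack.git
    %pip install -r ByteTrack/requirements.txt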

Install Roboflow supervision
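Installing the Roboflow and supervision packages:

    # Sketch: install Roboflow (dataset access) and supervision (detection/tracking utilities).
    %pip install roboflow supervision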

Import YOLO from ultralytics
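Loading the model class:

    # Sketch: import the YOLO class and load pretrained weights.
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")   # placeholder pretrained checkpoint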

Predicting on the dataset
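A prediction cell over the prepared dataset might look like this (paths are placeholders):

    # Sketch: run the model over the dataset's test images and save annotated copies.
    results = model.predict(source="dataset/test/images", conf=0.25, save=True)
    print(len(results), "images processed")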

6.3 SAMPLE OUTPUT

(Sample detection outputs appear as screenshots in the original report.)
6.4 TEST PLAN & DATA VERIFICATION

The goal of this test plan is to verify the correctness, performance, and functionality of
the object detection system developed using YOLO (through Roboflow for data
preparation and model training) in a Jupyter Notebook environment. The system will be
evaluated on its ability to accurately detect and classify objects from images or videos,
validate data accuracy, and ensure that the model performs as expected.

Test Plan Structure:

1. Test Plan ID: OD-YOLO-001

2. Test Objectives: Ensure the YOLO model trained using Roboflow detects objects
accurately and efficiently.

3. Scope:

a. YOLO model detection capabilities.

b. Integration of Roboflow for dataset creation and annotation.

c. Evaluation of object detection on new, unseen images.

d. Verification of data accuracy and integrity.

4. Testing Approach:

a. Functional testing to evaluate detection accuracy.

b. Performance testing to assess the speed of inference.

c. Regression testing to ensure model robustness.

5. Resources:

a. Jupyter Notebook environment.

b. YOLO model implementation in Python (using libraries like torch, opencv, or
tensorflow).

c. Roboflow API for dataset management.

d. Sample images/videos for testing.

6. Test Deliverables:

Test case results, logs, defect reports, and performance analysis.

7. Entry Criteria:

a. YOLO model is trained and exported.

b. Dataset prepared and annotated via Roboflow.

8. Exit Criteria:

a. Model accuracy reaches the required threshold.

b. No critical defects are present in object detection.

Data Verification

Data verification ensures that the dataset used to train and test the YOLO model is accurate,
clean, and consistent. Since Roboflow is being used for dataset preparation, the verification
process should also ensure that the dataset annotations are correct and the data is properly
formatted.

1. Dataset Integrity Checks

The dataset used for training the YOLO model needs to be verified for correctness and
completeness.

Verification Steps:
• Check for Missing Images or Annotations: Verify that every image has
corresponding annotations (bounding boxes, labels) in the Roboflow dataset.
• Annotation Accuracy: Validate that the bounding boxes and labels are correctly
aligned with the objects in the images. This can be done by visual inspection or
automated checks (e.g., comparing the coordinates of bounding boxes).
• Data Completeness: Ensure that the dataset contains a diverse set of images covering
different scenarios and object orientations.
• Data Augmentation: If data augmentation was applied (e.g., flipping, rotation),
verify that these augmentations were correctly applied and do not distort the labels.

2. Image and Annotation Format Validation

YOLO requires specific input formats for training. It is important to verify that the images
and annotations are correctly formatted.
• Image Format: Ensure that images are in a supported format (e.g., JPG, PNG).
• Annotation Format: YOLO typically uses text files for annotations, where each line
represents a bounding box: [class_id, x_center, y_center, width, height] (normalized
values). Verify that annotation files follow this format for all images (a sketch of an
automated check follows this list).
• Consistency: Check that the class IDs match the correct object categories and that
there are no mismatched or incorrect class labels.
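A minimal sketch of such an automated format check, assuming placeholder directory names
and class count:

    # Sketch: validate YOLO-format label files: five fields per line,
    # an integer class id, and normalized coordinates in [0, 1].
    from pathlib import Path

    def check_labels(label_dir, num_classes):
        problems = []
        for f in Path(label_dir).glob("*.txt"):
            for i, line in enumerate(f.read_text().splitlines(), 1):
                parts = line.split()
                if len(parts) != 5:
                    problems.append(f"{f.name}:{i}: expected 5 fields")
                    continue
                cls, *coords = parts
                if not cls.isdigit() or int(cls) >= num_classes:
                    problems.append(f"{f.name}:{i}: bad class id {cls}")
                elif any(not 0.0 <= float(c) <= 1.0 for c in coords):
                    problems.append(f"{f.name}:{i}: coordinate outside [0, 1]")
        return problems

    print(check_labels("dataset/train/labels", num_classes=5))   # placeholder path and class count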

3. Model Validation Using a Validation Set

A validation set (a portion of the dataset that was not used for training) is essential to verify
that the model generalizes well to unseen data.
Test Cases:
• Test Case 1: Validation Set Performance
Input: Validation dataset (images not used in training).
Expected Output: The model should achieve a reasonable accuracy on the validation set
(e.g., 80-90%).
Pass Criteria: Detection accuracy must meet or exceed a predefined threshold on the
validation set.

Chapter 7
Results and Discussion

1. Testing Applications in the Real World:

Found Gap: Much of the existing research evaluates YOLO on controlled datasets, including
works by Mittal et al. (2019) and Gupta et al. (2021). Research on YOLO's performance in
uncertain real-world settings with changing variables (such as lighting and weather) is
scarce.
Suggestion: To evaluate YOLO's resilience and flexibility, future research should
concentrate on implementing it in various real-world situations.

2. Combining Emerging Technologies:

Found Gap: The integration of YOLO with cutting-edge technology like drone surveillance
or Internet of Things devices is not thoroughly explored in the current literature, such as
the study by Bathija and Sharma (2019). The majority of implementations concentrate on
conventional surveillance situations.
Suggestion: Studies may look into how YOLO might be adapted for real-time use in smart
city settings or drone technologies.

3. Metrics of Performance:
Found Gap: Although some studies (such as Mishra et al., 2022) include speed and accuracy
measurements, there are few thorough analyses that contrast YOLO's performance with that
of other cutting-edge object detection algorithms on a variety of benchmarks and datasets.
Suggestion: To gain a better understanding of YOLO's advantages and disadvantages,
comparison research involving metrics such as precision, recall, and F1-score across
different datasets and situations should be carried out.

4. Efficiency and Scalability:
Found Gap: Little is known about YOLO's scalability for large-scale applications. While
focusing on real-time processing, Gupta et al. (2021) do not discuss how YOLO manages
higher image resolutions or many object classes in high-density situations.
Suggestion: To ascertain YOLO's effectiveness and processing capacity, studies should
assess its scalability in high-density settings, such as crowded cities or sizable gatherings.

5. User-Centered Research:

Found Gap: In real-world implementations of object detection systems, the literature
frequently concentrates on algorithm performance rather than user experience or interface
design, as evidenced by studies by Syahrudin et al. (2024) and Firgiawan et al. (2024).
Suggestion: Including user-centric assessments, such as usability tests and user satisfaction
questionnaires, may offer important insights into how YOLO-based systems should be
implemented in practice.

6. Applications in Specialized Domains:

Found Gap: Research such as Kumari et al. (2017) and Rana et al. (2022) has shown that
there is little investigation of YOLO's use in specialized fields, such as healthcare for
medical imaging or agriculture for crop monitoring.
Suggestion: Future studies should look into how YOLO might be adapted for such
particular uses, tackling domain-specific problems and offering customized solutions.

7. Adaptive Object Recognition:

Found Gap: Dynamically detecting and tracking moving objects is a capability that is
frequently discussed but not fully assessed. For example, although tracking is covered by
Bathija and Sharma (2019), little is known about how well YOLO works in complicated,
dynamic contexts.
Suggestion: To improve YOLO's use in domains like autonomous driving, research should
concentrate on enhancing its performance in dynamic environments, such as crowded scenes
or rapidly moving objects.

RESULT ANALYSIS & EVALUATION METRICS:

1. Mean Average Precision (mAP): mAP is computed as the mean of the average
   precision (AP) over all object classes, so it accounts for both the model's precision
   and its recall. A higher mAP value denotes better model performance.
2. Average Precision (AP): AP summarizes the model's precision across a range of recall
   thresholds, where precision = true positives / (true positives + false positives) and
   recall = true positives / (true positives + false negatives). AP is computed as the
   area under the precision-recall curve; a higher AP value denotes better model
   performance.
3. Intersection over Union (IoU): This metric quantifies how much the ground-truth
   bounding box and the predicted bounding box overlap. It is computed by dividing the
   area of the boxes' intersection by the area of their union. A higher IoU value
   indicates a better match between the predicted and ground-truth bounding boxes.
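As a concrete sketch of the IoU computation (boxes are given as [x1, y1, x2, y2] corner
coordinates; the sample values are illustrative):

    # Sketch: IoU between two axis-aligned boxes in [x1, y1, x2, y2] format.
    def iou(box_a, box_b):
        # intersection rectangle
        x1 = max(box_a[0], box_b[0])
        y1 = max(box_a[1], box_b[1])
        x2 = min(box_a[2], box_b[2])
        y2 = min(box_a[3], box_b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

    print(iou([0, 0, 10, 10], [5, 5, 15, 15]))   # 25 / 175, approximately 0.143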

Chapter 8
Summary and Future Scope
Summary

The integration of Roboflow with YOLO marks a significant advancement in the field
of object detection technology. This collaboration enhances YOLO's ability to detect a
diverse range of object classes with high accuracy, as highlighted by Kavitha et al.
(2021), who demonstrate that Roboflow simplifies the training process through efficient
dataset preparation and annotation features. Firgiawan et al. (2024) further validate the
robustness of YOLO across various detection tasks, noting that Roboflow facilitates
easier experimentation with multiple data augmentation techniques. Additionally, Gupta
et al. (2021) provide compelling real-world examples of YOLO's application in
autonomous systems, showcasing its real-time capabilities, which are essential in
scenarios that demand rapid decision-making. Moreover, the potential of YOLOv8 in
enhancing accessibility technologies for visually impaired users, as illustrated by
Syahrudin et al. (2024), underscores the technology's versatility.
Despite these advancements, challenges persist, particularly regarding the impact of
environmental variations on detection accuracy in complex scenes. As noted by Khan et
al. (2020), real-time applications like automated attendance systems face difficulties due
to fluctuating lighting conditions and occlusions. The literature reviewed demonstrates a
wide array of applications, from accessibility solutions to real-time tracking, indicating
that while significant progress has been made, the path ahead is fraught with obstacles.

Future work:
Future studies should concentrate on enhancing the resilience of YOLO algorithms to address
the challenges posed by environmental variations. Research can explore methods to improve
detection accuracy under varying conditions, such as poor lighting and occlusions.
Additionally, Kumari et al. (2017) suggest that integrating YOLO with supplementary
machine learning techniques could yield significant performance improvements in
specialized applications, such as license plate recognition. Utilizing Roboflow’s robust
annotation tools in these contexts may provide valuable insights and enhancements.
Moreover, further investigation into novel strategies that leverage the
combined strengths of Roboflow and YOLO could broaden the application scope and
efficacy of these technologies. By continuing to evolve these tools and methodologies,
researchers can pave the way for more reliable and effective object detection solutions that
address current limitations and meet the demands of future applications.

CHAPTER 9
REFERENCES

[1] Dewangan, Rajeshwar Kumar, and Yamini Chouhan. "A Review on Object Detection
using Open CV Method." 2020 International Research Journal Engineering and Technology
(IRJET) (2020).

[2] Mishra, Shubham, et al. "An intelligent motion detection using OpenCV." International
Journal of Scientific Research in Science, Engineering, and Technology
9.2 (2022): 51-63.

[3] Kavitha, D., et al. "Multiple Object Recognition Using OpenCV." REVISTA GEINTEC-
GESTAO INOVACAO E TECNOLOGIAS 11.2 (2021): 1736-1747.

[4] Mittal, Naman, Akarsh Vaidya, and Shreya Kapoor. "Object detection and classification
using Yolo." Int. J. Sci. Res. Eng. Trends 5 (2019): 562-565.

[5] Gupta, Akshara, Aditya Verma, and A. Yadav. "YOLO OBJECT DETECTION USING
OPENCV." International Journal of Engineering Applied Sciences and Technology 5.10
(2021).

[6] Bathija, Akansha, and Grishma Sharma. "Visual object detection and tracking using yolo
and sort." International Journal of Engineering Research Technology 8.11 (2019): 345-355.

[7] Emami, Shervin, and Valentin Petrut Suciu. "Facial recognition using OpenCV." Journal
of Mobile, Embedded and Distributed Systems 4.1 (2012): 38-43.

[8] Xie, Guobo, and Wen Lu. "Image edge detection based on opencv." International Journal
of Electronics and Electrical Engineering 1.2 (2013): 104-106.

[9] Kumari, Sweta, Leeza Gupta, and Prena Gupta. "Automatic license plate recognition
using OpenCV and neural network." International Journal of Computer Science Trends and
Technology (IJCST) 5.3 (2017): 114-118.

[10] Kumar, Ajay, et al. "Face Detection and Recognition using OpenCV." International
Journal of Computer Applications 975: 8887.

[11] Mohaideen Abdul Kadhar, K., and G. Anand. "Image Processing Using OpenCV."
Industrial Vision Systems with Raspberry Pi: Build and Design Vision products Using
Python and OpenCV. Berkeley, CA: Apress, 2024. 87-140.

[12] Goel, Tushar, K. C. Tripathi, and M. L. Sharma. "Single line license plate detection
using OpenCV and tesseract." International Research Journal of Engineering and Technology
(IRJET) 5.07 (2020).

[13] Rana, Md Milon, Tajkuruna Akter Tithy, and Md Mehedi Hasan. "Vehicle Detection
And Count In The Captured Stream Video Using OpenCV In Machine Learning." the
Computer Science & Engineering: An International Journal (CSEIJ). Vol. 12. No. 3. 2022.

[14] Khan, Sikandar, Adeel Akram, and Nighat Usman. "Real time automatic attendance
system for face recognition using face API and OpenCV." Wireless Personal
Communications 113.1 (2020): 469-480.

[15] Castrillón, Modesto, et al. "A comparison of face and facial feature detectors based on
the Viola–Jones general object detection framework." Machine Vision and Applications 22
(2011): 481-494.

[16] Tuohy, Shane, et al. "Distance determination for an automobile environment using
inverse perspective mapping in OpenCV." (2010): 100-105.

[17] Liu, Yandong. "Moving object detection and distance calculation based on Opencv."
Second Guangdong-Hong Kong-Macao Greater Bay Area Artificial Intelligence and Big
Data Forum (AIBDF 2022). Vol. 12593. SPIE, 2023.

[18] Bhardwaj, Sarthak, et al. "Object Detection Framework Using OpenCV for Low Cost
and High Performance." International Conference on Recent Trends in Computing.
Singapore: Springer Nature Singapore, 2023.

[19] Wang, Wen Jun, and Meng Gao. "Vehicle detection and counting in traffic video based
on OpenCV." Applied Mechanics and Materials 361 (2013): 2232-2235.

[20] Arya, Zankruti, and Vibha Tiwari. "Automatic face recognition and detection using
OpenCV, haar cascade and recognizer for frontal face." Int. J. Eng. Res. Appl.
(www.ijera.com) 10.6 (2020): 13-19.

[21] Firgiawan, Gustyanto, Nazwa Lintang Seina, and Perani Rosyani. "Implementasi
Metode You Only Look Once (YOLO) untuk Pendeteksi Objek dengan Tools OpenCV." AI
dan SPK: Jurnal Artificial Intelligent dan Sistem Penunjang Keputusan 2.2 (2024): 137-141.

[22] Patel, Prinsi, and Barkha Bhavsar. "Object Detection and Identification." International
Journal 10.3 (2021).

[23] Syahrudin, Erwin, Ema Utami, and Anggit Dwi Hartanto. "Enhanced Yolov8 with
OpenCV for Blind-Friendly Object Detection and Distance Estimation." Jurnal RESTI
(Rekayasa Sistem dan Teknologi Informasi) 8.2 (2024): 199-207.

[24] Pulungan, Ali Basrah, et al. "Object detection with a webcam using the Python
programming language." Journal of Applied Engineering and Technological Science
(JAETS) 2.2 (2021): 103-111.

[25] Kumbhar, P. Y., et al. "Real time face detection and tracking using OpenCV."
International Journal for Research in Emerging Science and Technology 4.4 (2017): 39-43.
