DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE
REAL-TIME OBJECT RECOGNITION AND AUDIO NARRATION SYSTEM FOR ASSISTING VISUALLY IMPAIRED INDIVIDUALS
5th Review

Guided by:
Ilakkia Ramanan, Assistant Professor, Department of Artificial Intelligence and Data Science

Presented by:
Harshinee        (21UAI034)
Megha Shinoj     (21UAI055)
Pravina A        (21UAI064)
Swettha E        (21UAI096)
PROJECT OBJECTIVE
Objective:
To develop a system that assists visually impaired individuals in obtaining information about objects in their surroundings by detecting objects in captured images, converting the detected objects into speech output, and providing detailed scene descriptions, enhancing their ability to understand and interact with their surroundings.

Role:
To create a model that uses a camera module to capture images, identifies objects using the YOLOv4 model, and converts the recognized objects into audio using Google Text-to-Speech (gTTS), thereby enabling visually impaired persons to understand their environment independently.
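A minimal sketch of this capture, detect, and narrate loop, assuming a default webcam at index 0 and a placeholder detector (the YOLOv4 path itself is sketched under ARCHITECTURE):

# Minimal sketch of the capture -> detect -> narrate loop.
# Assumptions: a webcam at index 0; detect_objects() is a placeholder
# standing in for the YOLOv4 detector sketched later in this deck.
import cv2
from gtts import gTTS

def detect_objects(frame):
    # Placeholder: the real system runs YOLOv4 here and returns class labels.
    return ["person", "chair"]

def narrate(labels, out_path="narration.mp3"):
    # gTTS converts the label text to spoken audio saved as an MP3 file.
    text = ("I can see " + ", ".join(labels)) if labels else "Nothing detected"
    gTTS(text=text, lang="en").save(out_path)

cap = cv2.VideoCapture(0)   # open the default camera
ok, frame = cap.read()      # grab a single frame
cap.release()
if ok:
    narrate(detect_objects(frame))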
DOMAIN
Domain of the Project: Computer Vision
Domain Explanation: Computer Vision is a field of artificial intelligence (AI) that enables computers and systems to interpret and understand the visual world. Using digital images from cameras and videos together with deep learning models, machines can accurately identify and classify objects, and then react to what they "see."
DOMAIN EXPLANATION
Computer Vision: Computer Vision is a field of artificial intelligence that allows machines to interpret and understand the visual world. This is achieved by enabling computers to accurately identify and classify objects, analyze images and videos, and extract meaningful information from visual data.

Object Detection: Object detection is a crucial aspect of computer vision, where algorithms are trained to identify and locate specific objects within images or videos. This technology finds applications in diverse fields such as autonomous vehicles, medical imaging, and surveillance systems.

Speech Synthesis: Speech synthesis is the artificial production of human speech. It plays a crucial role in converting text into spoken language, making technology more accessible to users across various domains.
ARCHITECTURE
1. Camera Feed (Image Input): Captures the image data.
2. Image Normalization: Preprocesses the image for further analysis.
3. Object Detection (parallel path): Identifies objects in the image and converts them into text.
4. OCR Text Extraction (parallel path): Extracts text from the image using Optical Character Recognition (OCR) and processes it.
5. Feature Extraction: Uses CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory) networks to extract key image features, aiding description generation.
6. Post-Processing: Refines the extracted text and generated descriptions.
7. Speech Conversion & Audio Output: Converts the processed text into speech for audio output.
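A minimal sketch of the object detection path (step 3) using OpenCV's DNN module. It assumes the standard Darknet release files yolov4.cfg, yolov4.weights, and coco.names are available locally; file names and thresholds are illustrative, not fixed by the project.

# Sketch of the YOLOv4 detection path via OpenCV's DNN module.
# Assumptions: yolov4.cfg, yolov4.weights, and coco.names placed
# alongside this script (from the official Darknet release).
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
layer_names = net.getUnconnectedOutLayersNames()
classes = open("coco.names").read().strip().split("\n")

def detect(frame, conf_thresh=0.5, nms_thresh=0.4):
    h, w = frame.shape[:2]
    # Normalize to [0, 1], resize to the network input size, swap BGR -> RGB.
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    boxes, confidences, labels = [], [], []
    for output in net.forward(layer_names):
        for det in output:
            scores = det[5:]
            class_id = int(np.argmax(scores))
            conf = float(scores[class_id])
            if conf > conf_thresh:
                cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
                boxes.append([int(cx - bw / 2), int(cy - bh / 2),
                              int(bw), int(bh)])
                confidences.append(conf)
                labels.append(classes[class_id])
    # Non-maximum suppression removes overlapping duplicate boxes.
    keep = cv2.dnn.NMSBoxes(boxes, confidences, conf_thresh, nms_thresh)
    return [(labels[i], boxes[i]) for i in np.array(keep).flatten()]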
PERFORMANCE EVALUATION
1. Experimental Setup & Parameters
The system was tested using a laptop camera for real-time object
detection and scene description.
   Model Used: YOLOv4 for object detection, LSTM for scene
   description
   Text-to-Speech (TTS): Google Text-to-Speech (gTTS) for audio
   conversion
   Image Preprocessing: OpenCV for normalization, resizing, and
   grayscale conversion (sketched after this list)
   Hardware: Standard laptop with CPU processing (no additional
   hardware)
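A minimal sketch of the preprocessing stage described above, assuming a BGR input frame as returned by OpenCV's camera capture:

# Sketch of the OpenCV preprocessing used before detection/description.
# Assumption: `frame` is a BGR image from cv2.VideoCapture.read().
import cv2
import numpy as np

def preprocess(frame, size=(416, 416)):
    resized = cv2.resize(frame, size)                  # fixed network input size
    gray = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)   # grayscale copy for OCR
    normalized = resized.astype(np.float32) / 255.0    # scale pixels to [0, 1]
    return normalized, gray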
2. Results & Observations
   Object Detection Accuracy: High detection accuracy (~90%; see
   Performance Comparison) across varied test environments.
   Processing Speed: Real-time detection with an average latency of
   ~0.5 s per frame.
   Scene Description Quality: Meaningful, coherent descriptions were
   generated using CNN + LSTM (architecture sketched after this list).
   Audio Output: Clear and accurate narration of detected objects and
   scene details.
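An architecture-only sketch of a CNN + LSTM description generator of the kind referenced above. It assumes Keras/TensorFlow, 2048-dimensional image features from a pretrained CNN encoder, and hypothetical vocabulary and caption-length sizes; it shows the model shape, not the project's trained weights.

# Architecture-only sketch of a CNN + LSTM caption generator.
# Assumptions: image features are 2048-d vectors from a pretrained CNN;
# vocab_size and max_len are hypothetical placeholder values.
from tensorflow.keras.layers import (Input, Dense, Embedding, LSTM,
                                     Dropout, add)
from tensorflow.keras.models import Model

vocab_size, max_len, feat_dim = 5000, 30, 2048   # hypothetical sizes

img_in = Input(shape=(feat_dim,))                # CNN feature vector
img_vec = Dense(256, activation="relu")(Dropout(0.5)(img_in))

txt_in = Input(shape=(max_len,))                 # partial caption (word ids)
txt_vec = LSTM(256)(Embedding(vocab_size, 256, mask_zero=True)(txt_in))

merged = Dense(256, activation="relu")(add([img_vec, txt_vec]))
out = Dense(vocab_size, activation="softmax")(merged)  # next-word distribution

model = Model(inputs=[img_in, txt_in], outputs=out)
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()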
3. Output Screens & Simulated Graphs
  Detection Output: Bounding boxes and labels over
  detected objects.
  Scene Description Output: Generated text
  summarizing the environment.
  Audio Output: Successfully converted text into
  speech.
  Performance Graphs: FPS rate, object detection
  accuracy, and latency variations over multiple test
  cases.
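A minimal sketch of how per-frame latency and FPS figures like those above can be collected, assuming the detect() function sketched under ARCHITECTURE:

# Sketch of per-frame latency / FPS measurement over multiple test frames.
# Assumption: detect(frame) is the detection function sketched earlier.
import time
import cv2

cap = cv2.VideoCapture(0)
latencies = []
for _ in range(100):                 # sample up to 100 frames
    ok, frame = cap.read()
    if not ok:
        break
    t0 = time.perf_counter()
    detect(frame)                    # timed stage: detection only
    latencies.append(time.perf_counter() - t0)
cap.release()

if latencies:
    avg = sum(latencies) / len(latencies)
    print(f"avg latency: {avg:.3f} s  ({1 / avg:.1f} FPS)")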
ADVANTAGES
1. Real-Time Object Detection & Description
   Provides instant feedback, ensuring quick identification of objects.
   Enables seamless interaction with the environment.
2. High Accuracy & Efficiency
   Uses YOLOv4 for robust and precise object detection.
   CNN + LSTM ensures meaningful scene descriptions.
3. Independence for Visually Impaired Users
   Reduces reliance on physical assistance.
   Helps users navigate and understand surroundings effortlessly.
4. User-Friendly & Automated
   Simple interface with hands-free operation.
   Automatic detection and narration without manual input.
5. Scalability & Future Enhancements
   Can be expanded with multi-language support, navigation assistance,
   and voice commands.
   Adaptable for indoor and outdoor environments.
6. Cost-Effective & Open-Source
   Uses open-source tools (YOLO, OpenCV, gTTS), reducing overall cost.
   Works on standard hardware without additional devices.
Conclusion
The proposed system successfully enables real-time object detection
and scene description, providing instant auditory feedback for
visually impaired users. By leveraging YOLOv4 for object detection
and LSTM for scene description, the system ensures accurate and
meaningful narration of surroundings. This enhances independence
and mobility, reducing the need for external assistance.
Performance Comparison
The proposed system demonstrates significant improvements over existing
solutions:
   Accuracy Improvement: Proposed system achieves higher object detection
   accuracy (~90%) compared to traditional methods (~75%).
   Processing Speed: Real-time processing with ~0.5s latency, significantly faster
   than older models (~1.5s).
   User Experience: Automated and hands-free, offering a more seamless
   interaction than existing assistive technologies.
Justification & Final Thoughts
The system outperforms existing assistive solutions in terms of efficiency, accuracy,
and user experience. Its ability to provide instant feedback and meaningful scene
descriptions makes it a powerful tool for visually impaired individuals. Future
enhancements, such as multi-language support, cloud-based improvements, and
optimized deep learning models, can further expand its impact.
Future Work
Enhancing System Capabilities
To keep up with recent advancements and improve system performance,
future developments may include:
   Upgrading AI Models: Exploring YOLOv8 and transformer-based models for
   enhanced object detection accuracy.
   Optimized Processing: Reducing latency through more efficient model
   compression and parallel processing techniques.
   Improved Scene Understanding: Enhancing context recognition by
   integrating advanced NLP techniques for more descriptive outputs.
                Aligning with Recent Trends
Multi-Modal AI: Combining vision, text, and speech models to
improve interaction quality.
Cloud Integration: Implementing real-time cloud synchronization
to enhance processing efficiency.
Expanded Language Support: Developing multilingual capabilities
to cater to diverse users.
Adaptive Learning: Training models on diverse datasets to
improve robustness in different environments.
THANK YOU!