
SMART DRISHTI FOR BLIND

CHAPTER – 1

PREAMBLE

1.1 INTRODUCTION

The fourth industrial revolution, also known as the technological revolution, has swept through almost every industry. The foremost technology turning this revolution into reality is Artificial Intelligence, whether in the form of Machine Learning or Deep Learning. The Machine Learning domain spans many areas, such as speech recognition, computer vision, and natural language processing. With the humongous availability of data around the world, it has become comparatively easier to train an algorithm, manipulate it, and apply it as needed.

In this project, our prime concept is to simplify the lives of visually impaired people by translating their perceptible surroundings into audio descriptions, so that they can understand what is going on around them. Visual impairment is a serious condition that segregates people from the basic way of life, depriving them of various opportunities and everyday chores that sighted people scarcely pay heed to. However, with the advent of technology and the technological revolution, the lives of visually impaired people can be made comparatively smoother, helping them experience their surroundings in the most optimal way. Thus, for our proposed concept and to reach the required objective, we shall use the aforementioned domains in the ways explained hereunder.


For the same purpose, we use various Machine Learning libraries, of which OpenCV is a prime one. The Open Source Computer Vision Library, generally termed OpenCV, is an open-source library that deals with image manipulation and various real-time operations. In this report, we will use OpenCV for detecting objects. Beyond that, the Google Speech API and a suitable training algorithm are used to obtain the proper outcome from the model.

1.2 ABOUT THE PROJECT

In this project, our goal is to build a prototype of a model that sees the objects around us and translates them in a way that visually impaired people can perceive with little to no difficulty. We use areas of machine learning such as speech recognition and computer vision to conduct this project. The dataset used to administer this project is the MS COCO dataset.

The MS COCO dataset is a comprehensive object detection, segmentation, and labeling dataset published by Microsoft. COCO stands for Common Objects in Context: the image dataset was created with the aim of advancing image recognition and visual datasets for computer vision, mostly targeting state-of-the-art neural networks. The COCO dataset offers a variety of features, including object segmentation with detailed instance annotations, in-context detection, and superpixel stuff segmentation, to name a few. Of its more than 300,000 images, around 200,000 are labeled. In addition, COCO provides 80 object categories (COCO "things"), 91 stuff categories, and 5 captions per image. For pose estimation, it also includes around 250,000 person instances labeled with 17 keypoints.


The project can be segmented into various steps for a better understanding of the work. The first objective of the project is to detect objects around us and to identify them using the YOLOv3 algorithm, which builds on the Darknet architecture, thus giving a wide view of the object detection and recognition process as a whole. The latter half of the project aims at translating the recognized objects into speech through the Google Text-to-Speech (gTTS) Python library. Combining both of these fulfils the primary objective of the prototype.

Thereby, the initial designs of the project prototype are made, addressing the required objectives mentioned above.

1.3 PROJECT OUTLINE

This project is developed and structured using various Python libraries for working with different domains of Machine Learning, such as speech recognition, computer vision, and natural language processing. The whole project is built in Jupyter Notebook, which is a very usable platform for projects like this.

The project can be implemented after the successful installation of OpenCV 3.4+. At the time of writing, OpenCV 4 was still in beta and had not been officially released, so it is safer to use version 3 of the OpenCV library. Apart from that, we need to download the pre-trained YOLOv3 weights and the MS COCO dataset files to start our project implementation.


The whole project can be visualised with the tree command of the terminal as 4 directories and 19 files initially. The four key inputs can be described as follows (a minimal command-line sketch follows this list):

 image: The path to the input image. We shall be detecting objects in this image using YOLO.
 yolo: The base path to the YOLO directory. The scripts will load the required YOLO files from here in order to perform object detection on the image.
 confidence: The minimum probability used to filter weak detections. The default value is 0.5, and the value is open to experimentation.
 threshold: Our non-maxima suppression (NMS) threshold, with a default value of 0.3.
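A minimal sketch of how these four inputs might be exposed as command-line arguments with argparse; the flag names are assumptions, while the defaults mirror the values above:

import argparse

# Hypothetical argument parser mirroring the four inputs described above
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="path to input image")
ap.add_argument("-y", "--yolo", required=True, help="base path to YOLO directory")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
                help="minimum probability to filter weak detections")
ap.add_argument("-t", "--threshold", type=float, default=0.3,
                help="non-maxima suppression threshold")
args = vars(ap.parse_args())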

A variety of object detection techniques is applied using the YOLOv3 algorithm, which shall be explained further on, and the detected output is translated to audio for audio feedback. Implementing the same on video-based input meets the objective we set out to achieve.

The basic concept behind our research is to propose a system that will assist millions of visually impaired people in reading typed, handwritten, and printed text without using the old-fashioned, difficult system of Braille mapping. As discussed above, a number of advancements have been made for the same purpose, but none of them has completely overcome the technical challenges and hurdles. So we aim to overcome all those challenges and propose a versatile and complete system.


CHAPTER – 2

LITERATURE SURVEY

2.1 LITERATURE REVIEW

As part of the research for building our model, a literature review of a handful of IEEE-published papers was made. We shall briefly discuss the review hereunder, with the appropriate details.

2.1.1 Real Time Object Detection with Audio Feedback using Yolo vs. Yolo_v3:

The first paper is titled "Real Time Object Detection with Audio Feedback using Yolo vs. Yolo_v3" and was published in 2021. This paper uses algorithms and techniques such as the OpenCV library, YOLO, and YOLOv3. The performance recorded in the paper indicates that the approach works better for smaller objects, with future work mentioned as expanding the research to a self-explored dataset [1].

2.1.2 Reader and Object Detector for Blind:

The next paper is titled "Reader and Object Detector for Blind" and was published in 2020. It uses algorithms and techniques such as Raspberry Pi, OCR, Tesseract, and TensorFlow to carry out the project. Its performance evaluation records that text reading and object detection were successful, but not for fonts smaller than 16 points. As the objective for future work, making the system available in multiple languages is recorded [2].


2.1.3 Obstacle Detection for Visually Impaired Patients:

The paper studied next is titled "Obstacle Detection for Visually Impaired Patients" and was published in 2014. The techniques and algorithms used in this paper are a stereoscopic sonar system, sound buzzers, and the APR9600 voice IC. A wearable optical detection system is provided that produces a full-body vibration effect on obstacle detection. However, the device has a very limited range compared to its own size, and users also find it difficult to comprehend the guidance signals in time [3].

2.1.4 Voice Based Smart Assistive Device for Visually Challenged:

The paper studied next is titled "Voice Based Smart Assistive Device for Visually Challenged" and was published in 2020. The article describes Raspberry Pi, Deep Learning, conversational AI, speech recognition, and assistive-technology methodologies. After being trained on only 50 photos of each object, the model reaches an accuracy of 83 percent and detects commonly available campus objects. However, because it was trained on 8,000 photos from the Flickr8k dataset, the accuracy drops as image complexity grows [4].

2.1.5 A Wearable Assistive Technology for the Visually Impaired with Door Knob Detection and Real-Time Feedback for Hand-to-Handle Manipulation:

The next paper is titled "A Wearable Assistive Technology for the Visually Impaired with Door Knob Detection and Real-Time Feedback for Hand-to-Handle Manipulation" and was published in 2017. Algorithms and techniques such as YOLOv2, Deep Learning, and Neural Networks were used. The performance of the device increases manyfold when the hand detection is stable; the biggest difficulty, however, is the consistency of the hand detection performance. More images will be added to the database in the future, and the door knob identification feature will be extended to more general door handles [5].

2.1.6 VISION - Wearable Speech Based Feedback System for the Visually Impaired using Computer Vision:

The paper is titled "VISION - Wearable Speech Based Feedback System for the Visually Impaired using Computer Vision" and was published in 2020. It is a wearable device based on Raspberry Pi, gTTS, and YOLO. The text is read out in English at a slow speed and saved as an MP3 file. Future work is mentioned as location navigation that works in low-light conditions while remaining cost-effective [6].

2.1.7 YOLO-compact: An Efficient YOLO Network for Single Category Real-time Object Detection:

This IEEE paper, titled "YOLO-compact: An Efficient YOLO Network for Single Category Real-time Object Detection" and published in 2020, presents an efficient, compact YOLO network for real-time detection of a single object category. The model would be more precise if its depth and width were improved [7].


2.1.8 CPU based YOLO: A Real Time Object Detection Algorithm:

The next paper, published in 2020 and titled "CPU based YOLO: A Real Time Object Detection Algorithm", builds on Faster R-CNN, YOLO, R-CNN, Fast R-CNN, SSD, Mask R-CNN, R-FCN, OpenCV, and RetinaNet. The model detects objects from video at a pace of 10.12-16.29 frames per second on many non-GPU platforms, with an accuracy of 80-99 percent. CPU-based YOLO achieves an mAP of 31.05 percent, with future work stated as increasing FPS and mAP by optimizing the model [8].

2.1.9 A Novel YOLO-based Real-time People Counting Approach:

A 2017 paper titled "A Novel YOLO-based Real-time People Counting Approach" is based on YOLO-PC and CNNs. It provides an automatic way to count people, and the reported accuracy shows that it works well both for large crowds and for a single person. Future work stated in the paper is to add abnormal-behavior detection during counting, as well as child counting [9].

2.1.10 Edge detection based boundary box construction algorithm for improving the precision of object detection in YOLOv3:

The last paper we reviewed is "Edge detection based boundary box construction algorithm for improving the precision of object detection in YOLOv3", published in 2017. YOLOv3, edge detection, YOLO, YOLO9000, boundary-box prediction, and object detection are all used in this research. The intersection over union is determined for both the proposed algorithm and YOLOv3, and the proposed approach outperforms YOLOv3 in terms of boundary-box accuracy. The model becomes constrained when there are sharp objects in the image or too much noise [10].

2.2 PROBLEM STATEMENT

The visually impaired community encounters formidable challenges in accessing printed and digital text independently. With conventional reading methods often proving insufficient, there is a compelling need to harness the power of artificial intelligence to empower the blind. This project aims to address these challenges by developing an AI smart reader using Python. Visually impaired individuals frequently rely on others for reading assistance, limiting their access to information and hindering their autonomy. Traditional solutions fall short in providing real-time, context-aware assistance. The proposed AI smart reader seeks to bridge this accessibility gap, offering a transformative solution to enhance the quality of life for the blind.

The overarching problem is the limited access to information for the visually impaired, resulting in a dependence on sighted individuals for reading various materials. This dependency restricts independence and hinders the ability to engage with a diverse range of written content. The lack of an efficient, real-time solution exacerbates these challenges, necessitating the development of an AI smart reader that can seamlessly convert text from different sources into audio.

Challenges: The challenges faced by the visually impaired include difficulty in reading printed materials, digital documents, and online content without assistance. Existing solutions often lack accuracy, real-time processing, and a comprehensive understanding of context. These challenges underscore the critical need for an AI-driven solution that can intelligently interpret text and provide immediate auditory feedback.

Visually impaired individuals face significant challenges in independently accessing and comprehending printed or digital text, relying on external assistance for reading various materials. This dependence limits their autonomy and access to information, creating the need for an AI-driven smart reader. The problem is to develop a solution that employs Python to convert text from diverse sources into audio in real time, empowering the blind community by promoting accessibility and independence.

The users are visually impaired people, who shall be addressed by the proponents of this model: common users who will access the model for their own benefit, under the supervision of a medical facility, within their own comfortable accommodation. The following are the qualities to be addressed by the proponents:

 Accuracy

 Reliability

 Usability

 Efficiency

 User Friendliness

2.3 OBJECTIVES

The prime objective of this project is to understand the underlying factors of visual impairment and to help people suffering from it know what they are surrounded by, at a very friendly cost.


Adding to that, the project also aims at running seamlessly, without any system barrier, and with high usability. The accuracy of the built model is one to look out for, since it deals with detecting objects and recognizing them in real time. The specific objectives are:

1. Ensuring Accurate Text Recognition

2. Facilitating Natural Text-to-Speech Conversion

3. Enhancing Real-time Object Recognition

4. Designing a User-Friendly Interface

5. Offering Customization and Personalization

2.3.1 Scope and Limitation

Scope

 The built model shall carry out the idea and concept of healthcare machine learning by accurately detecting and recognizing objects around the user.
 The model shall also provide the best-suited and most accurate ML classification algorithm for achieving the required objective, as mentioned in this project.
 The model is most accurate when it comes to the given objective, thereby resulting in the most desirable and precise predictions.
 The model is easy to use and understandable by regular people, giving them high accessibility and reliability.

Limitation

 The foremost drawback of the model is that although it is highly accurate, it can never be 100% accurate.


 This is a prototype of the idea we have proposed, and thus we are yet to use the hardware required for the proper execution of this project.
 Although it is highly reliable and recommends an accurate algorithm to follow, the accuracy of the algorithms used might differ depending upon the processing unit being used.

2.4 MOTIVATION

In the realm of technological innovation, the development of an AI-driven smart reader tailored for the blind community is motivated by a deep commitment to improving the quality of life of visually impaired individuals. This initiative seeks to address the significant challenges they encounter in accessing written information, providing a solution that not only enhances accessibility but also fosters independence, inclusivity, and a sense of empowerment.

Understanding the challenges:

Visually impaired individuals face obstacles in navigating a world primarily designed for those with sight. The inability to independently access and comprehend written information hampers educational, professional, and personal pursuits. Recognizing these challenges propels the motivation to create a sophisticated AI smart reader that transcends traditional solutions, offering a holistic and adaptive approach to reading for the blind.

Empowering Independence:

The core motivation revolves around empowering blind users to navigate the vast landscape of written content with confidence and autonomy. By integrating artificial intelligence into the smart reader, the goal is to elevate the reading experience beyond basic text-to-speech functionality. This innovation aims to accurately recognize diverse text formats, provide natural and intelligible speech synthesis, and incorporate real-time object recognition for enhanced environmental awareness.

Fostering Inclusivity:

The motivation extends to fostering inclusivity and breaking down barriers that limit the participation of visually impaired individuals in various aspects of life. The AI smart reader aspires to be a catalyst for positive change, ensuring that blind users can seamlessly access a multitude of written materials, regardless of source or format. By doing so, it contributes to creating a more inclusive and equitable digital landscape.

Enabling Personalization:

Understanding that each user has unique preferences and needs, the motivation behind the smart reader is to offer a high degree of customization.


CHAPTER – 3

OVERVIEW OF SMART DRISHTI FOR BLIND

Our comprehensive literature survey revealed several key areas for improvement and innovation in the Smart Drishti project for assisting the blind. These insights have been critical in guiding the modifications and enhancements we plan to implement. The survey examined existing technologies, methodologies, and challenges in the realm of assistive devices for the visually impaired, specifically focusing on real-time navigation and obstacle detection. Based on these findings, several key conclusions can be drawn to guide the development of the "Smart Drishti" project.

INTRODUCTION TO ANACONDA

Anaconda is a popular open-source distribution platform for the Python and R programming languages used in data science, machine learning, and scientific computing. It simplifies package management and deployment by providing a comprehensive collection of pre-built libraries and tools. Anaconda includes the Conda package manager, which allows users to install, update, and manage packages and dependencies seamlessly. It is widely used in data analysis and research communities due to its ease of use and extensive library support.
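As an illustration, a Conda environment for this project could be created and populated as follows (the environment name and version pin are arbitrary):

conda create -n smart-drishti python=3.9
conda activate smart-drishti
conda install numpy opencv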


3.1 DATASET AND PACKAGES

3.1.1 DATASET

COCO stands for Common Objects In Context. It is a predefined dataset published by Microsoft: a collection of demanding, high-quality data for computer vision, mostly used with state-of-the-art neural networks. The name is also used to refer to the format in which the datasets are stored. It is an object detection, segmentation, and captioning dataset. It contains around 330k images, of which more than 200k are labelled, which makes it much easier to recognize the class (category) of a detected object. It has around 1.5 million object instances and 80 object categories. COCO annotations employ the JSON file format, whose top-level value is a dictionary (key-value pairs inside braces). It can also contain nested dictionaries or lists (ordered collections of objects inside brackets), as shown below:

"info": {…},

"licenses": […],

"images": […],

"categories": […], "annotations": […]

 Info section: contains metadata about the dataset, such as its description, URL, and version.
 Licenses section: contains links to the licenses for the images present in the dataset. Each license carries an id field, which is used to recognize the license.
 Images section: the second most important dictionary of the dataset, with fields such as license, file_name, coco_url, height, width, and date_captured.
 Categories section: contains the classes of objects that may be detected in the images.
 Annotations section: the most important section of the dataset, containing the information vital to each task of the specific COCO dataset; a short loading sketch follows this list.
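As a minimal illustration, an annotation file can be read with Python's standard json module; the path annotations/instances_val2017.json below is an assumption about where the annotations are stored:

import json

# Hypothetical path to one COCO annotation file (one JSON per task and split)
with open("annotations/instances_val2017.json") as f:
    coco = json.load(f)

# The top-level keys mirror the structure shown above
print(sorted(coco.keys()))  # ['annotations', 'categories', 'images', 'info', 'licenses']

# Map category ids to names, e.g. to label detected objects later
id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
print(len(coco["images"]), "images,", len(coco["annotations"]), "annotations")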

3.1.2 PACKAGES

The following packages have been used for building the model:

i. NumPy: NumPy is a general-purpose array-processing package. It provides a high-performance multidimensional array object and tools for working with these arrays, and is the fundamental package for scientific computing with Python. NumPy can also be used as an efficient multidimensional container of generic data. Arbitrary data types can be defined, which allows NumPy to integrate seamlessly and speedily with a wide variety of databases.

ii. OpenCV: Open Source Computer Vision. It is one of the most widely used tools for computer vision and image processing tasks. It is used in various applications such as face detection, video capturing, tracking moving objects, and object detection, and nowadays it plays a major role in the real-time operation that is so important in today's systems. Using it, one can process images and videos to identify objects, faces, or even human handwriting.


iii. gTTS: gTTS (Google Text-to-Speech) is a very easy-to-use tool which converts entered text into audio that can be saved as an MP3 file. The gTTS API supports several languages, including English, Hindi, Tamil, French, German, and many more. The speech can be delivered at either of the two available audio speeds, fast or slow. However, as of the latest update, it is not possible to change the voice of the generated audio. A short usage sketch follows this list.
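A minimal usage sketch of gTTS as described above (the text and output file name are arbitrary):

from gtts import gTTS

# Convert a text string to speech; slow=False selects the faster of the two speeds
tts = gTTS(text="Obstacle ahead", lang="en", slow=False)
tts.save("speech.mp3")  # save the audio as an MP3 file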

3.2 ALGORITHM

In order to build the model, we cross-validated various ML classification algorithms. However, the high-accuracy algorithms that we then used to fit the model shall be discussed henceforth.

Fig. 3.1: Working Model


YOLO v3:

The term "You Only Look Once" is abbreviated as YOLO. It is an algorithm for detecting and recognising different objects in an image in real time. YOLO treats object detection as a regression problem and outputs the class probabilities of the detected objects. Convolutional neural networks (CNNs) are used in the YOLO method to recognise objects in real time. To detect objects, the approach takes just a single forward propagation through a neural network, as the name suggests: a single run of the algorithm makes predictions over the entire image. The CNN forecasts multiple bounding boxes and class probabilities at the same time. It is a real-time object detection algorithm that achieves a COCO test-dev mAP of 57.9% while analyzing images at a rate of 30 frames per second. The main strengths of YOLOv3 are that it is very fast and accurate, and speed can easily be traded for accuracy simply by changing the size of the model, with no retraining required.
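Since the project loads YOLOv3 through OpenCV, a hedged sketch of how the Darknet configuration and weights might be loaded with OpenCV's dnn module follows; the file names are assumptions about how the YOLO directory is laid out:

import cv2

# Assumed file layout inside the YOLO directory
net = cv2.dnn.readNetFromDarknet("yolo/yolov3.cfg", "yolo/yolov3.weights")
labels = open("yolo/coco.names").read().strip().split("\n")

image = cv2.imread("images/example.jpg")

# YOLOv3 expects a 416x416 blob scaled to [0, 1], with BGR swapped to RGB
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)

# A single forward pass through the unconnected output layers yields all boxes
ln = net.getLayerNames()
out_layers = [ln[i - 1] for i in net.getUnconnectedOutLayers().flatten()]
outputs = net.forward(out_layers)
print(len(outputs), "output scales")  # typically 3 for YOLOv3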

3.2.1 Working of YOLO algorithm:

YOLO is regression-based. Initially, it takes the video input and segments it into frames (24 frames per second). Each frame is then divided into grid cells, and image classification and localization are applied to each cell. YOLO then predicts the bounding boxes and their corresponding class probabilities for the objects.

To break it down into simpler terms, suppose the labelled data is divided into a 3x3 grid and there are a total of 3 classes into which we want objects to be classified. Then, for each grid cell, the label y is defined as an eight-dimensional vector:


y = [pc, bx, by, bh, bw, c1, c2, c3]

Here, pc is the probability that an object is present in the grid cell; bx, by, bh, and bw specify the bounding box if there is an object; and c1, c2, c3 are the class indicators of the detected object.

The bounding box values bx, by, bh, and bw are calculated relative to the grid cell being dealt with. bx and by are the x and y coordinates of the midpoint of the object with respect to this grid cell. bh is the ratio of the height of the bounding box to the height of the corresponding grid cell, and bw is the ratio of the width of the bounding box to the width of the grid cell. bx and by will always range between 0 and 1, as the midpoint always lies within the grid cell, whereas bh and bw can exceed 1 when the dimensions of the bounding box exceed the dimensions of the grid cell.
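As a concrete illustration of this encoding (the numbers are invented for the example):

import numpy as np

# pc = 1: an object is present; it belongs to class 2 of 3 (one-hot c2 = 1).
# The midpoint sits 40% across and 70% down the cell, so bx and by lie in
# [0, 1]; the box is 1.5 cells tall, so bh exceeds 1 while bw = 0.8 does not.
y = np.array([1.0, 0.4, 0.7, 1.5, 0.8, 0.0, 1.0, 0.0])
print(y.shape)  # (8,): one eight-dimensional label per grid cell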

3.2.2 Darknet Architecture

YOLO v3 is an improvement over the previous two YOLO versions: it is more robust but a little slower than its predecessors. The model features multi-scale detection, a stronger feature-extraction network, and a few changes in the loss function. To understand the network architecture at a high level, let us divide the entire architecture into two major components: the feature extractor and the feature detector (multi-scale detector). The image is first given to the feature extractor, which extracts feature embeddings, and is then passed on to the feature detector part of the network, which outputs the processed image with bounding boxes around the detected classes.

Dept. of Computer Science & Engineering 19


Faculty of Engineering and Technology (Ex-Women)
Sharnbasva University, Kalaburagi
SMART DRISHTI FOR BLIND

3.2.2.1 Feature Extractor

Darknet-19 (a custom neural network architecture developed in C and CUDA) was utilised as the feature extractor in prior YOLO versions, with 19 layers, as the name suggests. YOLO v2 added 11 extra layers, bringing Darknet-19 to a total of 30 layers. However, because of the down-sampling of the input image and the resulting loss of fine-grained features, the system had difficulty detecting small objects. The feature extractor used in YOLO v3 combines ideas from YOLO v2, Darknet-53 (an ImageNet-trained network), and residual networks (ResNet), resulting in a better architecture.

The network is formed of consecutive 3x3 and 1x1 convolution layers followed by skip connections (introduced by ResNet to help activations propagate through deeper layers without the gradient diminishing), resulting in a total of 53 convolution layers, hence the name Darknet-53. For the detection head, another 53 layers are stacked on top of the darknet's 53, giving YOLO v3 a total of 106 layers of fully convolutional underlying architecture. As a result, it has a huge architecture, which makes it a little slower than YOLO v2 but improves accuracy at the same time.

3.2.2.2 TTS Architecture:

The TTS architecture is explained in the block diagram of the proposed model (part B of the working model in Fig. 3.1), which describes the process in the following way. When an input image containing text, or a text file, is fed into the system, it passes through different phases before coming out as voice output.

 In the text analysis phase, the text is arranged into a manageable list of words.
 Text normalization converts the input text into a pronounceable format; identifying pauses and punctuation marks is the key aim of this process. A toy sketch of these first two phases follows this list.
 The transformation of orthographic symbols into phonological ones, taking the phonetic alphabet into account, is commonly referred to as grapheme-to-phoneme conversion.
 The amalgamation of stress patterns, the rise and fall of the speech, and rhythm is known as prosody, while the emotion of the speaker is captured by modeling. This phase contributes to generating natural synthesized speech.
 Acoustic processing refers to the process in which the type of speech synthesis is decided: the synthesis may be pre-recorded human voice or intelligible synthetic speech. Articulatory synthesis, a computational technique that produces speech based on models of the human vocal tract, falls in the domain of acoustic processing.
 After processing through all these phases, the intended voice output is produced.
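A toy sketch of the first two phases, text analysis and normalization (the token split and the abbreviation table are invented for illustration; a real TTS front end uses far larger rule sets):

def text_analysis(text):
    # Text analysis phase: arrange the input into a manageable list of words
    return text.split()

def normalize(tokens):
    # Toy normalization: expand a few sample abbreviations into a
    # pronounceable form
    expansions = {"Dr.": "Doctor", "St.": "Street"}
    return [expansions.get(t, t) for t in tokens]

sentence = "Dr. Smith lives on Main St. near the park."
print(" ".join(normalize(text_analysis(sentence))))
# -> Doctor Smith lives on Main Street near the park.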


CHAPTER – 4

IMPLEMENTATION

The implementation of the "Smart Drishti" project involves setting up the hardware, integrating the software components, and ensuring real-time processing for obstacle detection and navigation assistance. Here is a step-by-step guide.

Software Setup

Installing Required Libraries: Install OpenCV for image processing:

pip install opencv-python

Install TensorFlow for object detection:

pip install tensorflow

Install pyttsx3 for text-to-speech conversion:

pip install pyttsx3

Install RPi.GPIO for interfacing with GPIO pins:

sudo apt-get install python3-rpi.gpio

Object Detection Model: Download and set up a pre-trained object detection model (e.g., SSD MobileNet, YOLO). For this example, we'll assume you have a TensorFlow model saved in the directory model.


Writing the Code: Create a Python script to integrate all components. The complete Python script follows (the model path, GPIO pin numbers, and the 100 cm announcement threshold are placeholders to adjust for your hardware):

import cv2
import tensorflow as tf
import pyttsx3
import RPi.GPIO as GPIO
import time

# Initialize the camera and other peripherals

camera = cv2.VideoCapture(0)
model = tf.saved_model.load('path_to_saved_model')
tts = pyttsx3.init()

# GPIO setup for ultrasonic sensors

GPIO.setmode(GPIO.BCM)
TRIG = 23
ECHO = 24
GPIO.setup(TRIG, GPIO.OUT)
GPIO.setup(ECHO, GPIO.IN)

def measure_distance():
    # Let the sensor settle, then send a 10-microsecond trigger pulse
    GPIO.output(TRIG, False)
    time.sleep(0.05)
    GPIO.output(TRIG, True)
    time.sleep(0.00001)
    GPIO.output(TRIG, False)

    # Time the echo pulse; initialise both stamps so they are defined
    # even if a loop body never runs
    pulse_start = pulse_end = time.time()
    while GPIO.input(ECHO) == 0:
        pulse_start = time.time()
    while GPIO.input(ECHO) == 1:
        pulse_end = time.time()

    pulse_duration = pulse_end - pulse_start
    # Speed of sound is ~34300 cm/s; halve it for the round trip (17150)
    distance = pulse_duration * 17150
    return round(distance, 2)

def detect_objects(frame):
    # Process the frame with the object detection model. Exported TF2
    # detection models expect a batched uint8 tensor of shape [1, H, W, 3]
    input_tensor = tf.convert_to_tensor(frame, dtype=tf.uint8)
    input_tensor = tf.expand_dims(input_tensor, 0)
    detections = model(input_tensor)
    # Extract object detection results
    return detections

try:
    while True:
        ret, frame = camera.read()
        if not ret:
            break

        objects = detect_objects(frame)
        distance = measure_distance()

        # Example: generate audio feedback, announcing an obstacle only
        # when it is within one metre (the 100 cm threshold is an assumption)
        if distance < 100:
            tts.say(f"Obstacle detected at {distance} centimeters")
            tts.runAndWait()
finally:
    # Release the camera and reset the GPIO pins on exit
    camera.release()
    GPIO.cleanup()

By following this implementation guide, you will be able to develop a functional prototype of the "Smart Drishti for the Blind" project using Python, providing visually impaired individuals with a valuable tool for safer and more independent navigation.


CHAPTER – 5

RESULTS AND DISCUSSION

Real-Time Object Detection:

The system successfully detects and identifies objects in real time using the camera feed. The TensorFlow object detection model provides accurate bounding boxes and labels for common obstacles such as furniture, stairs, and people. Detection speed and accuracy are satisfactory for real-time navigation, with minimal latency.

Distance Measurement:

Ultrasonic sensors accurately measure the distance to obstacles within a range of up to 4 meters. The distance measurements are integrated with the object detection results to provide contextual information about the environment.

Audio Feedback:

The Text-to-Speech (TTS) system effectively conveys information about detected objects and their distances to the user. Feedback is timely, clear, and customizable, allowing users to adjust the volume and speech rate according to their preferences.
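A small sketch of how the volume and speech rate can be adjusted with pyttsx3 (the values chosen are arbitrary):

import pyttsx3

engine = pyttsx3.init()
engine.setProperty('rate', 150)    # speech rate in words per minute
engine.setProperty('volume', 0.8)  # volume in the range 0.0 to 1.0
engine.say("Obstacle detected at ninety centimeters")
engine.runAndWait()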

User Testing:

Initial user testing with visually impaired individuals showed positive results, with users finding the system helpful for indoor navigation. Users reported feeling more confident when navigating unfamiliar environments with the aid of audio feedback. Feedback indicated a need for additional customization options and potential integration with other assistive technologies.

DISCUSSION

1. System Performance:

The combination of the Raspberry Pi, camera, and ultrasonic sensors provides a reliable and cost-effective solution for real-time obstacle detection. The use of Python and its extensive libraries, such as OpenCV and TensorFlow, streamlines the development process and allows for rapid prototyping and iteration.

2. Challenges Encountered:

 Computational Limitations: The Raspberry Pi, while capable, has limitations in processing power, affecting the speed of complex object detection models. Future iterations may benefit from using more powerful hardware or optimizing the existing models for better performance.
 Environment Variability: Performance can vary significantly based on environmental conditions, such as lighting and clutter. Training the model on a more diverse dataset can improve robustness.
 Audio Feedback: Ensuring that the audio feedback is not overwhelming or distracting is crucial. Balancing the amount and frequency of information provided is important for user comfort.

3. User Experience:

The system's usability is a key factor in its success. Ensuring that the device is lightweight, easy to use, and non-intrusive is vital for adoption by visually impaired users. Continuous user feedback and iterative design improvements are essential to refine the device and enhance user satisfaction.

Fig 5.1: Screenshot of Scanning and Detection of an Object

Fig 5.2: Real Scanning and Detection of an Object


CHAPTER – 6

CONCLUSION AND FUTURE SCOPE

The proposed method was applied to the hardware and tested repeatedly with different samples. Our research methodology has successfully carried out the processing of an image containing text and its transformation into audible speech.

Fig. 6.1: Future model

In this project, we have built a prototype of a model that can accurately and efficiently detect, recognize, and give audio feedback about the objects around us with minimal effort. Visual impairment is a serious condition that affects lives in a multitude of ways that we can only imagine; day-to-day activities are affected by it. With this project, we have thus built an idea that can help people suffering from any such vision-related issues. The plight of these people is beyond our measures of control, but we can help them substitute for lost vision with the advent of technology and empathy.

For building this project, everything learnt from various sources was put to use, and hence it resulted in the successful implementation of the project. Although quite a few significant future enhancements have been suggested, as aforementioned, the model as it stands shall be the strong foundation for them all, giving the right start that those future enhancements will require.

Visual impairment is disruptive to the normal course of life in a variety of ways, and with the booming of technology it can not only be mitigated but perhaps, in the far future, even eradicated from the face of the earth.


REFERENCES

[1] Patel, P., & Bhavsar, B. (2021). Object Detection and Identification. International Journal, 10(3).

[2] Mahendru, M., & Dubey, S. K. (2021, January). Real Time Object Detection with Audio Feedback using Yolo vs. Yolo_v3. In 2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence) (pp. 734-740). IEEE.

[3] Annapoorani, A., Kumar, N. S., & Vidhya, V. (2021). Blind-Sight: Object Detection with Voice Feedback.

[4] Murali, M., Sharma, S., & Nagansure, N. (2020, July). Reader and Object Detector for Blind. In 2020 International Conference on Communication and Signal Processing (ICCSP) (pp. 0795-0798). IEEE.

[5] Srikanteswara, R., Reddy, M. C., Himateja, M., & Kumar, K. M. (2022). Object Detection and Voice Guidance for the Visually Impaired Using a Smart App. In Recent Advances in Artificial Intelligence and Data Engineering (pp. 133-144). Springer, Singapore.

[6] Karmarkar, M. R. R., & Honmane, V. N. Object Detection System for the Blind with Voice Guidance.

[7] Dewangan, R. K., & Chaubey, S. (2021). Object Detection System with Voice Output using Python.

[8] Samhita, M. S., Ashrita, T., Raju, D. P., & Ramachandran, B. (2021). A critical investigation on blind guiding device using CNN algorithm based on motion stereo tomography images. Materials Today: Proceedings.

[9] Potdar, K., Pai, C. D., & Akolkar, S. (2018). A convolutional neural network based live object recognition system as blind aid. arXiv preprint arXiv:1811.10399.

[10] Lakde, C. K., & Prasad, P. S. (2015, April). Navigation system for visually impaired people. In 2015 International Conference on Computation of Power, Energy, Information and Communication (ICCPEIC) (pp. 0093-0098). IEEE.
