ABSTRACT
Visually impaired persons are those who have lost their vision, either partially or completely (blindness). This leads to
difficulties in performing daily activities such as reading, socializing and walking. Text and speech are the main
communication media for humans [1]. The number of visually impaired persons increases day by day due to eye
conditions such as cataract, refractive error, glaucoma, childhood blindness, age-related macular degeneration and
diabetes. We present a camera-based text reading mechanism that captures the text on a page with an auto-focusing
camera and converts it into speech, which the person can listen to through a pair of headphones. The proposed
approach extracts text from the captured image using Tesseract Optical Character Recognition (OCR) and
converts the text to speech. To produce speech, the text is divided into syllables, each of which has a
prerecorded sound.
Text-to-speech is an AI-based technology that converts text into speech. It uses Natural Language Processing and Digital
Signal Processing to achieve this. The paper also describes a web-based
system that converts text to speech. It was developed using JavaScript and is compatible with a
variety of speech synthesis libraries. The programming language used in this project is lightweight
and well suited to developing web applications. The system can convert plain text files into an audio
format. Once it is linked to the web, the software can be used simultaneously by users
all around the world, and it is supported on major desktop and hand-held devices.
The system will be of great help to lecturers and students in the
classroom, in e-libraries and in computer-aided learning for visually impaired users.
This paper aims to build an effective, convenient and intuitive camera-based text reading device.
CHAPTER ONE
1.0 INTRODUCTION
One of the greatest difficulties faced by a sightless
person is the inability to read. Text is present everywhere,
from bulletins and billboards to digital displays.
Blind people therefore face many difficulties. There have been
developments on mobile phones and computers that assist a
blind person by combining computer vision tools with other
existing products such as Optical Character
Recognition (OCR) systems.
The proposed system assists blind people by capturing
text and then reading it aloud to them. Text extraction
is performed with OCR, a technique for converting
images of writing on labels, printed books and the like into text. OCR
converts binary images into text and also detects white
space. It also checks the integrity of the recognized text.
Optical Character Recognition is the mechanical or
electronic conversion of images of typed, handwritten or
printed text into machine-encoded text, including subtitle text
superimposed on an image [5]. Capturing a picture takes
around 2 seconds, after which the image is processed by the
Raspberry Pi. The text is split into syllables and the
corresponding sounds are produced, which can be heard
through the audio jack.
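The syllable-splitting step described above can be sketched as follows. This is only an
illustrative Python sketch, not the system's actual implementation. The vowel-group rule is a
crude spelling-based heuristic chosen for brevity, and the names split_syllables and clip_paths,
the clips directory and the .wav naming scheme are all assumptions introduced here.

```python
import re

# One chunk = optional leading consonants + a vowel group; consonants left
# over at the end of the word are attached to the final chunk.
# This is a rough spelling-based heuristic, not true linguistic syllabification.
_SYLLABLE = re.compile(r"[^aeiouy]*[aeiouy]+(?:[^aeiouy]+$)?")


def split_syllables(word):
    """Split one word into approximate syllables."""
    word = word.lower()
    chunks = _SYLLABLE.findall(word)
    # Words without any vowel (e.g. "tv") are kept whole.
    return chunks if chunks else [word]


def clip_paths(text, clip_dir="clips"):
    """Map recognized text to the prerecorded clip assumed to exist per syllable."""
    words = re.findall(r"[a-z]+", text.lower())
    return [f"{clip_dir}/{syl}.wav" for w in words for syl in split_syllables(w)]
```

A playback loop would then play the listed files in order through the audio jack. Syllables
for which no recording exists would need a fallback, which this sketch does not cover.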
According to a 2010 survey by the World Health Organization,
India had a total population of 1181.4 million, of
whom 152.238 million suffered from blindness, low vision or
visual impairment [1]. Figure 1 shows
the number of people who are blind, have low vision or are
visually impaired (in thousands) per million population.
According to Dr. Bjorn, impaired vision can have
negative effects on learning and social interaction. It can
affect the natural development of intellectual, academic,
social and professional abilities [2]. Visual impairment of
this kind cannot be corrected with glasses. As a result,
people with low vision cannot even read
normally printed text; they can read only if the
characters or letters are large enough. This condition lengthens
the reading process and tires the eyes. To
help improve the quality of life for people with low vision, a
tool for reading text is needed. The degree of vision
impairment varies from one individual with low vision to another.
Therefore, the device developed in this work uses other
sensory functions to convey information from a text. The
device is specifically designed for people with low
vision, so that they can use it easily without
having to ask others for help, and can use it
to support their academic and intellectual development.
Society is increasingly focused on data handling,
processing, storage and dissemination using electronics-based technologies. Today’s
computers are able to simulate several human capabilities, such as reading,
grasping, calculating, speaking, recalling from memory, comparing numbers,
drawing, passing judgments and even interactive learning. Researchers are working
to expand these capabilities, and therefore the power of computers, by developing
hardware and software that can imitate intelligent human behavior (Raiyetunbi and
Ayeh, 2020). Some researchers are working on systems that have the ability to
reason, to learn or accumulate knowledge, to strive for self-improvement, and to
simulate human sensory and mechanical capabilities. This general area of research
is known as Artificial Intelligence.
Artificial intelligence (AI) is a wide-ranging branch of computer science
concerned with building smart machines capable of performing tasks that typically
require human intelligence. AI is a major influence on our
society today, and particularly on the state of education, where its implications are
huge. AI has the potential to transform education and learning, in particular by
empowering learners with different abilities. It is a technology that enables
physically challenged and semi-literate persons to access and read through electronic
documents, thus bridging the digital divide with the aid of an interactive intelligent
system.
An interactive intelligent system is an intelligent system that people interact
with. An intelligent system embodies one or more capabilities that have traditionally
been associated more strongly with humans than with computers, such as the
abilities to perceive, interpret, learn, use language, reason, plan and decide. Systems
that exhibit these capabilities mostly use techniques that originated within the field
of artificial intelligence. One such intelligent system is the text-to-speech synthesizer.
Speech synthesis is the artificial production of human speech. A computer system
used for this purpose is called a speech computer or speech synthesizer, and can be
implemented in software or hardware products. A text-to-speech (TTS) system
(Artificial Interpreter, AI) converts normal language text into speech; other systems
render symbolic linguistic representations, such as phonetic transcriptions, into speech
(Allen, 1987, cited in Raiyetunbi and Ayeh, 2020).
Text-to-speech (TTS) synthesis is the automatic conversion of a text into
speech that resembles, as closely as possible, a native speaker of the language
reading that text. A text-to-speech synthesizer is the technology that lets a
computer speak to you. The TTS system takes text as input; a
computer algorithm called the TTS engine then analyses the text, pre-processes
it and synthesizes the speech using mathematical models. The TTS engine usually
generates sound data in an audio format as its output (Isewon,
Oyelade and Oladipupo, 2012). TTS systems are capable of "reading" any string of
text characters to form original sentences.
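The pre-processing stage mentioned above could look roughly like the following Python sketch.
It is a minimal illustration under stated assumptions, not the engine used by any of the cited
systems. The abbreviation table, the digit-by-digit reading of numbers and the names
ABBREVIATIONS, spell_digits and normalize are all hypothetical choices made here.

```python
import re

# Hypothetical abbreviation table; a real TTS front end would use a far larger one.
ABBREVIATIONS = {"dr.": "doctor", "etc.": "et cetera", "e.g.": "for example"}

DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def spell_digits(match):
    """Read a run of digits one digit at a time, e.g. '42' -> ' four two '."""
    return " " + " ".join(DIGIT_NAMES[int(d)] for d in match.group()) + " "


def normalize(text):
    """Turn raw text into a list of lowercase, speakable word tokens."""
    tokens = []
    for raw in text.lower().split():
        # Expand known abbreviations before punctuation is stripped.
        raw = ABBREVIATIONS.get(raw, raw)
        # Replace digit runs with spelled-out digits.
        raw = re.sub(r"[0-9]+", spell_digits, raw)
        # Keep only letters, apostrophes and spaces, then split into words.
        cleaned = re.sub(r"[^a-z' ]", " ", raw)
        tokens.extend(w.strip("'") for w in cleaned.split() if w.strip("'"))
    return tokens
```

The resulting tokens would then be passed to the synthesis stage. Reading numbers digit by
digit is a simplification; a fuller front end would also handle dates, currency and ordinals.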
In special education, that is, education for the physically disabled, reading is
essential to learning. The act of reading permits students to learn vocabulary and
concepts and to access different types of reading materials (Serafini 2004). This is
why able-bodied persons are required to read textbooks, lecture materials and other
texts aloud to visually impaired persons, so as to enable the physically
challenged to learn. If students fall behind in reading comprehension for their
age or grade level, they struggle to process new vocabulary and concepts
presented in textbooks and other literature. Difficulty in reading may translate into
poor school performance due to the inability to process new vocabulary and concepts
in a meaningful manner. These difficulties can cause students, especially the
physically challenged, to lose interest in education. This cycle can lead to students
dropping out of high school and possessing below-average reading comprehension
skills as adults. Hence the manual or traditional reading approaches that are usually
employed in schools, lecture rooms, libraries and workshops can pose
challenges for the physically challenged. It is through reading that students
expand their vocabulary and learn about the world.
Researchers have demonstrated that the use of technology exposes struggling
readers to different types of literature and assists them with vocabulary acquisition
(Stone-Harris 2008). The significance of this study also lies in showing that the use of
web-based text-to-speech synthesizer apps can lead to an improvement in struggling
readers' skills and attitudes. If the use of such apps can be proven to benefit struggling
readers, then educators will possess another instructional technique to help
struggling readers improve their reading skills and attitudes.
Several earlier works inspired the development of our software.
Isewon, Oyelade and Oladipupo (2012) developed the Text To Speech Robot, a simple application with
text-to-speech functionality. The system was developed
using the Java programming language and, because it was not cloud-enhanced, could only function
offline when installed on a computer. Kaladharan
(2015) worked on An English Text to Speech Conversion System, which
he developed using the Microsoft .NET framework. Although the system was successful,
it was not cloud-enhanced or cloud-enabled. Serafini (2004) explains that much
research validates the importance of reading aloud to students, positing that the act
of reading aloud introduces new vocabulary and concepts, provides a fluent model,
and allows students access to literature they are unable to read independently. He
adds that audiobooks, sometimes called text-to-speech (TTS) systems or Artificial
Interpreters, are an important component of a comprehensive reading program.
Kylene (1998) notes that audiobooks, when used with reluctant, struggling or
second-language learners, serve as a scaffold that allows students to read beyond
their reading level. Since the reading process develops through oral language
experiences, audiobooks benefit struggling readers by increasing comprehension
and appreciation of written text (Wolfson 2008). This benefit has long been seen by
classroom teachers.
The general objective of the project is to develop a text-to-speech synthesizer for
physically impaired and speech-impaired individuals
using the English language. The specific objectives are: to enable people who are deaf or
unable to speak to communicate and contribute to the growth of an organization through a synthesized
voice, and to enable blind and elderly people to enjoy a user-friendly computer
interface.
1.1 Background of the Study