ABSTRACT
The project is a major part of our course. It is a period in which we are
introduced to the industrial environment; in other words, industrial training
familiarizes us with an industry that increasingly relies on computer technology
and automation to raise production. As computers and electronics enter the field
of process control, inputs have to be much more accurate and controllers much
faster in response. State-of-the-art control systems are now used in all
process-controlled industries, where they manage fast processes, improve
operating efficiency and ensure full protection.
The objective of the project work is to raise the level of performance in one or
more of its aspects. This may be achieved by providing new knowledge and
information relevant to a job, by teaching new trends, and by imbuing an
individual with new attitudes, motives and other personality characteristics.
Often these techniques are applied to segments of a workforce, regardless of
their existing performance level, so that they can operate efficiently.
Project work must be focused on the individual and on the situation where the
need arises. Coordination and cooperation with the service department provide an
overview of the total training and development process.
Practical work is an important complement to theoretical studies. It covers what
remains uncovered in the classroom; without it the studies remain ineffective.
It also offers exposure to real management in a business organization.
It is also a well-known fact that practical training plays an important role in
building an individual's future. Gaining theoretical knowledge alone is not
sufficient for success in life; practical experience is a must, and I have been
given the opportunity to gain it. I availed myself of this opportunity in a very
satisfactory manner and think it will be very beneficial for me in building my
future.
TABLE OF CONTENTS
i). Introduction to project
Project background
Speech Recognition
ii). Project Analysis and Modelling
Requirement gathering
Use case diagram
Feasibility study
iii). Project Management issues
iv). Implementation and testing
System requirements
Working
Coding
Testing
Software Design
v). Results
Project Objective
Goal achievements
Result Analysis
vi). SWOT Analysis
vii). Conclusion and Future scope
INTRODUCTION TO PROJECT
PROJECT BACKGROUND
Speech recognition is a technology that enables a computer
to capture the words spoken by a human with the help of a
microphone. These words are then recognized by the
speech recognizer, and in the end the system outputs the
recognized words. The process of speech recognition
consists of several steps, which are discussed one by one
in the following sections.
TYPES OF SPEECH RECOGNITION
Speech recognition systems can be divided into a number
of classes based on the kind of utterances they are able to
recognize and the vocabulary they have. A few of these
classes are described below:
Isolated Speech: Isolated-word systems usually require a
pause between two utterances. This does not mean that they
accept only a single word, but that they require one
utterance at a time.
Connected Speech: Connected words, or connected
speech, is similar to isolated speech but allows separate
utterances with a minimal pause between them.
Continuous Speech: Continuous speech allows the user to
speak almost naturally; it is also called computer
dictation.
Spontaneous Speech: At a basic level, it can be thought
of as speech that is natural sounding and not rehearsed.
An ASR system with spontaneous speech ability should
be able to handle a variety of natural speech features such
as words being run together, "ums" and "ahs", and even
slight stutters.
[Figure: Speech Recognition Process. Blocks: Audio Input, Analog to Digital,
Acoustic Model, Language Model, Speech Engine, Display]
COMPONENTS OF SPEECH RECOGNITION SYSTEM
Voice Input: Audio is input to the system through a
microphone, and the PC sound card produces the equivalent
digital representation of the received audio.
Digitization: The process of converting an analog signal
into digital form is known as digitization. It involves
both sampling and quantization. Sampling converts a
continuous-time signal into a discrete-time signal, while
quantization approximates a continuous range of
amplitude values by a finite set of levels.
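The two digitization steps can be illustrated with a short Java sketch. It is only an illustration under stated assumptions: the 440 Hz test tone, the 8 kHz sampling rate and the 16-bit depth are chosen for the example and are not taken from the project, and the class and method names are hypothetical.

```java
// Minimal sketch of digitization: sampling followed by uniform quantization.
// The tone frequency, sampling rate and bit depth are illustrative assumptions.
class DigitizationSketch {

    // Sampling: evaluate a continuous-time sine wave at the instants n / sampleRateHz.
    static double[] sample(double frequencyHz, double sampleRateHz, int count) {
        double[] samples = new double[count];
        for (int n = 0; n < count; n++) {
            samples[n] = Math.sin(2.0 * Math.PI * frequencyHz * n / sampleRateHz);
        }
        return samples;
    }

    // Quantization: clip each amplitude to [-1.0, 1.0] and map it to the
    // nearest signed 16-bit level.
    static short[] quantize16(double[] samples) {
        short[] out = new short[samples.length];
        for (int n = 0; n < samples.length; n++) {
            double clipped = Math.max(-1.0, Math.min(1.0, samples[n]));
            out[n] = (short) Math.round(clipped * Short.MAX_VALUE);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] analog = sample(440.0, 8000.0, 8);
        short[] digital = quantize16(analog);
        for (int n = 0; n < digital.length; n++) {
            System.out.printf("n=%d  sampled=%.4f  quantized=%d%n", n, analog[n], digital[n]);
        }
    }
}
```

In a real system the sound card performs both steps in hardware; the sketch only makes the distinction between the discrete time axis (sampling) and the discrete amplitude axis (quantization) visible.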
Acoustic Model: An acoustic model is created by taking
audio recordings of speech and their text transcriptions,
and using software to create statistical representations of
the sounds that make up each word. It is used by a speech
recognition engine to recognize speech. The acoustic
model breaks the words into phonemes.
Language Model: Language modelling is used in many
natural language processing applications, such as speech
recognition. It tries to capture the properties of a language
and to predict the next word in the speech sequence. The
language model compares the phonemes to the words in
its built-in dictionary.
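The idea of predicting the next word can be sketched with a bigram model, which counts how often one word follows another. This is an assumption made for illustration: the report does not say which kind of language model the recognizer uses, and the class name, method names and training sentence are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal bigram language model: predicts the word that most often
// followed the given word in the training text.
class BigramModelSketch {

    // counts.get(w1).get(w2) = number of times w2 directly followed w1.
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Counts every pair of adjacent words in the text.
    void train(String text) {
        String[] words = text.toLowerCase().trim().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            counts.computeIfAbsent(words[i], k -> new HashMap<>())
                  .merge(words[i + 1], 1, Integer::sum);
        }
    }

    // Returns the most frequent follower of the word, or null if the word was never seen.
    String predictNext(String word) {
        Map<String, Integer> followers = counts.get(word.toLowerCase());
        if (followers == null) {
            return null;
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : followers.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        BigramModelSketch model = new BigramModelSketch();
        model.train("open the file open the door open the file");
        System.out.println("after 'the': " + model.predictNext("the"));
    }
}
```

A recognizer can use such counts to prefer word sequences that are likely in the language when the acoustic evidence alone is ambiguous.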
Speech Engine: The job of the speech recognition engine is
to convert the input audio into text. To accomplish this it
uses all sorts of data, software algorithms and statistics.
Its first operation is digitization, as discussed earlier,
which converts the audio into a suitable format for further
processing. Once the audio signal is in the proper format,
the engine searches for the best match among the words it
knows. Once the signal is recognized, it returns the
corresponding text string.
PROJECT OVERVIEW
This report gives an overview of speech recognition
technology, software development, and its applications.
The first section describes the speech recognition
process, its applications in different sectors, its flaws
and finally the future of the technology. The later part of
the report covers the code for the software and its
working. Finally, the report concludes with the different
potential uses of the application and further
improvements and considerations.
PROJECT OBJECTIVE
• To understand speech recognition and its
fundamentals.
• To study its working and applications in different areas.
• To implement it as a desktop application.
• To develop software that can mainly be used for:
- Speech Recognition
- Speech Generation
- Text Editing
- Tool for operating Machine through voice.
PROJECT SCOPE
This project has speech recognition and speech
synthesis capabilities. Although it is not a complete
replacement for Notepad, it is still a good text editor
that can be used through voice. The software is able
to retrieve data from Wikipedia and to search on Google
and YouTube through voice input. It can also control the
personal computer (PC) through voice.
PROJECT ANALYSIS & MODELLING
REQUIREMENT GATHERING
POINT OF MOTIVATION
We planned to make some common tasks that every user
performs on his/her computer (opening and closing programs,
editing text, calculating) possible not only by mouse and
keyboard, but also by voice.
USED TOOLS
For voice recognition:
Google JARVIS API.
Sphinx Voice Recognition API.
For voice Synthesis:
Google JARVIS API.
MaryTTS (Text to Speech) API.
FreeTTS (Free Text to Speech) API.
MODELLING
The aim of the modelling technique is to use speaker-specific
features to create speaker models. Speaker modelling is
basically classified into speaker recognition and speaker
identification. Speaker identification determines who is
speaking on the basis of individual information obtained from
the speech signal. Recognition is further divided into two
modes, speaker dependent and speaker independent. In the
speaker-independent mode of speech recognition, the computer
ignores the speaker-specific characteristics of the speech
signal and extracts the intended message. In speaker
recognition, on the other hand, the machine should extract the
speaker characteristics from the acoustic signal. The speech
signal from an unknown speaker is then compared with a
database of known speakers.
Speaker recognition can also be divided into text-dependent and
text-independent methods. In the text-dependent method the
speaker speaks key words or sentences having the same text for
both training and testing trials, whereas the text-independent
method does not rely on a specific text being spoken [8].
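The comparison of an unknown speaker with a database of known speakers can be pictured as a nearest-neighbour search. The sketch below is an assumption made for illustration, not the method used in the project: each speaker is reduced to a small feature vector (the two features, the speaker names and their values are hypothetical), and the enrolled speaker with the smallest Euclidean distance is returned.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Speaker identification as a nearest-neighbour search over enrolled speakers.
class SpeakerIdSketch {

    // Euclidean distance between two feature vectors of equal length.
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("feature vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Returns the name of the enrolled speaker closest to the unknown sample,
    // or null if nobody is enrolled.
    static String identify(double[] unknown, Map<String, double[]> enrolled) {
        String best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> e : enrolled.entrySet()) {
            double d = distance(unknown, e.getValue());
            if (d < bestDistance) {
                best = e.getKey();
                bestDistance = d;
            }
        }
        return best;
    }

    // Hypothetical database: {mean pitch in Hz, a normalized spectral measure}.
    static Map<String, double[]> demoSpeakers() {
        Map<String, double[]> enrolled = new LinkedHashMap<>();
        enrolled.put("alice", new double[] {210.0, 0.35});
        enrolled.put("bob", new double[] {120.0, 0.60});
        return enrolled;
    }

    public static void main(String[] args) {
        System.out.println(identify(new double[] {118.0, 0.62}, demoSpeakers()));
    }
}
```

Practical systems use much richer features and statistical speaker models; the sketch only shows the compare-against-a-database step.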
The following methods are used in the speech recognition process:
i). Pattern recognition approach: A speech pattern
representation can be in the form of a speech template or a
statistical model (e.g., a hidden Markov model, or HMM) and
can be applied to a sound (smaller than a word), a word, or a
phrase. Pattern recognition has been developed over two
decades, has received much attention and has been applied
widely to many practical problems. It involves two essential
steps, namely pattern training and pattern comparison.
ii). Acoustic-phonetic approach: This method has been
studied and used for more than 40 years. It is based upon the
theory of acoustic phonetics and its postulates [10]. The
earliest work on speech recognition was based on finding
speech sounds and providing appropriate labels to these
sounds.
iii). Learning-based approaches: To overcome the
disadvantages of HMMs, machine learning methods such as
neural networks and genetic algorithms/programming were
introduced. In learning-based approaches, the models are
learned automatically through emulation or an evolutionary
process.
iv). Knowledge-based approaches: Expert knowledge about
variations in speech is hand-coded into the system. This
approach gives the advantage of explicit modelling, but such
expert knowledge is difficult to obtain and to use
successfully. The knowledge-based approach uses linguistic,
phonetic and spectrogram information.
v). Artificial intelligence approach: The artificial
intelligence approach organizes the recognition procedure
according to the way a person applies intelligence, such as
visualizing and analysing, in making a decision on the
measured acoustic features. The artificial intelligence
approach [13] is a hybrid of the acoustic-phonetic approach and
the pattern recognition approach. In its pure form, knowledge
engineering design involves the direct and explicit
incorporation of an expert's speech knowledge into a
recognition system.
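The pattern comparison step of the pattern recognition approach (i) can be sketched with dynamic time warping (DTW), a classic way to compare a spoken input with stored templates of different length. DTW is chosen here as an assumption; the report does not name the comparison technique, and the one-dimensional feature sequences and names below are hypothetical.

```java
import java.util.Arrays;

// Template matching with dynamic time warping (DTW).
class DtwSketch {

    // DTW distance between two one-dimensional feature sequences.
    // cost[i][j] is the cheapest alignment of the first i values of a
    // with the first j values of b.
    static double dtw(double[] a, double[] b) {
        int n = a.length;
        int m = b.length;
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        cost[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double d = Math.abs(a[i - 1] - b[j - 1]);
                double bestPrevious = Math.min(cost[i - 1][j - 1],
                        Math.min(cost[i - 1][j], cost[i][j - 1]));
                cost[i][j] = d + bestPrevious;
            }
        }
        return cost[n][m];
    }

    // Pattern comparison: index of the stored template closest to the input,
    // or -1 if there are no templates.
    static int closestTemplate(double[] input, double[][] templates) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int k = 0; k < templates.length; k++) {
            double d = dtw(input, templates[k]);
            if (d < bestDistance) {
                best = k;
                bestDistance = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] templates = { {5.0, 5.0, 5.0}, {1.0, 2.0, 3.0} };
        double[] input = {1.0, 2.0, 2.0, 3.0}; // same shape as template 1, spoken more slowly
        System.out.println("closest template: " + closestTemplate(input, templates));
    }
}
```

Pattern training corresponds to building the stored templates from recorded examples; statistical models such as HMMs replace the fixed templates with probability distributions.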
USE CASE DIAGRAM
PROBLEM STATEMENT
Speech recognition is one of the fastest growing engineering
technologies. It has a number of applications in different
areas and provides potential benefits. Nearly 20% of the
people of the world suffer from various disabilities; many of
them are blind or unable to use their hands effectively. In
those cases speech recognition systems provide significant
help, so that they can share information with people by
operating a computer through voice input. This project is
designed and developed with that in mind, and a small effort
is made towards this aim.
Our project is able to recognize speech and convert the input
audio into text; it also enables a user to perform file
operations such as “save”, “open” and “exit” by voice input.
It also helps the user to open system software such as
MS Paint, Notepad and Calculator. At the initial level the
effort is to support the basic operations discussed above, but
the software can be further updated and enhanced to cover
more operations.
FEASIBILITY STUDY
(i) Gathering information needs: Automated speech
recognition (ASR) is used in many areas of medicine today.
However, not many studies have evaluated the usefulness of
ASR applications for capturing clinician information needs in
noisy environments. We evaluated 72 ASR transcribed
clinician-generated questions and assessed them for semantic
and syntactic errors. The results showed that basic user training
is not sufficient in order to capture the semantics of recordings.
(ii). Training delivery: This feasibility study examined the
possibility of using an independent voice recognition system
as the input device during a training delivery requirement. The
intent was to determine whether the voice recognition system
could be incorporated into a training delivery system designed
to train students how to use the Communications Electronics
Operating Instructions manual, a tool used for communicating
over the radio network during military operations.
This study showed how the voice recognition system worked
in an integrated voice based delivery system for the purpose
of delivering instruction. An added importance of the study
was that the voice system was an independent speech
recognition system. At the time this study was conducted,
there did not exist a reasonably priced speech recognition
system that interfaced with both graphics and authoring
software which allowed any student to speak to the system
without training the system to recognize the individual
student's voice. This feature increased the usefulness and
flexibility of the system.
PROJECT MANAGEMENT ISSUES
PROJECT MANAGEMENT
i. Before:
Team member selection and definition of roles for
each member
Continue our ongoing research, provide theoretical
results to decide methods to be used during the
project
E-mail discussion between members, share results
and information
ii. During:
A meeting at the beginning of every week
Development of the software, primarily real time
recognition phase
Preparation of the project documentation
Preparation of reports
Software packaging, wrapping of libraries etc.
iii. After:
Publication of the results
Further joint research between the team members.
IMPLEMENTATION & TESTING
SYSTEM REQUIREMENTS
Minimum requirements
-Pentium 200 MHz processor
-64 MB of RAM
-Microphone
-Sound card
Recommended requirements
-1.6 GHz Processor
-128 MB or more of RAM
-Sound cards with very clear signals
-High quality microphones
Hardware Requirements
(i). Sound cards: Since speech requires relatively low bandwidth,
a good-quality 16-bit sound card is sufficient [13]. Sound must
be enabled and the proper driver installed. Sound cards with the
'cleanest' A/D (analog-to-digital) conversion are recommended,
but most often the clarity of the digital sample depends more on
the microphone quality and even more on the environmental
noise. Some speech recognition systems might require specific
sound cards.
(ii). Microphones: A quality microphone is key when using a
speech recognition system. Desktop microphones are not well
suited to speech recognition because they tend to pick up more
ambient noise. The best and most common choice is the headset
style. It minimizes ambient noise while keeping the microphone
close to the mouth at all times. Headsets are available with or
without earphones (mono or stereo).
(iii). Computer/Processors: Speech recognition applications can
be heavily dependent on processing speed, because a large
amount of digital filtering and signal processing can take place
in ASR.
INTERFACES
Opening: The software is opened from the command prompt. After
changing to the JDK bin directory, the code file is compiled
and then run.
Running File Menu: The software is able to perform different
operations. The following screenshot shows the contents of the
File menu. Currently the ‘NEW’ item is active; whenever it is
clicked, a new text area appears where the user can work on
text through both keyboard and voice.
Saving Document: After writing in the software’s text area
through keyboard or voice, the user can save the document in
any directory he wants. The save operation can be performed
using keyboard/mouse or voice input. In the screenshot of the
directory, we have assumed that the directory is
“My Documents”.
Opening Document: Any document in text format, located in any
directory on the computer, can be opened. If a user is working
in the text area of the software and wants to open another
document, an option to save the current work is offered
automatically before the new document is opened. After
selecting the open option, the user can browse to the desired
directory. The following assumes the My Documents directory.
Edit Menu: The Edit menu consists of a number of options for
different operations: cut, copy, paste, select all, font size
and font style, as can be seen in the following screenshot.
Cut Option: The cut option is similar to that in other editors.
The selected text is removed from its location and can be
pasted anywhere else in the document.
Copy Option: Copies the selected text so that it can be pasted
at another location in the document.
Paste: Pastes the text that was taken using the cut or copy
option at the selected location in the document.
Select All: Selects all the text currently present in the text
area of the notepad.
Font Size: Sets the user-selected size of the text.
Font Style: Sets the user-specified font style of the text.
Running System Commands: Through this option the software
runs different programs based on the identification of the
spoken command.
Text Through Voice: After enabling this option, the software
records human speech, converts it into text, and outputs it in
written form based on identification of the input speech.
Verifying Text: Through this option a user can verify the text;
the system reads the text back in the form of speech.
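The "Running System Commands" option can be sketched as a lookup from keywords in the recognized text to program names. This is a minimal sketch under stated assumptions, not the project's actual code: the keyword table is hypothetical, and the executable names are the standard Windows ones for Notepad, Calculator and Paint.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Maps recognized speech to a Windows program and launches it.
class VoiceCommandSketch {

    // Hypothetical keyword table: spoken keyword -> executable name.
    private static final Map<String, String> PROGRAMS = new LinkedHashMap<>();
    static {
        PROGRAMS.put("notepad", "notepad.exe");
        PROGRAMS.put("calculator", "calc.exe");
        PROGRAMS.put("paint", "mspaint.exe");
    }

    // Returns the executable for the first known keyword found in the
    // recognized text, or null if no keyword matches.
    static String resolve(String spokenText) {
        String text = spokenText.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> e : PROGRAMS.entrySet()) {
            if (text.contains(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }

    // Launches the matching program; returns false if nothing matched.
    static boolean run(String spokenText) throws IOException {
        String executable = resolve(spokenText);
        if (executable == null) {
            return false;
        }
        new ProcessBuilder(executable).start();
        return true;
    }

    public static void main(String[] args) {
        // Only resolves the commands; call run(...) to actually start a program.
        System.out.println(resolve("open notepad"));
        System.out.println(resolve("start the calculator"));
    }
}
```

Keeping the lookup (`resolve`) separate from the launch (`run`) makes it possible to check the command table without starting any program.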
WORKING
This software is designed to recognize speech and also has
speech synthesis capabilities, which means it can convert
speech to text and text to speech. The software, named
‘VOICE BASED ASSISTANT’, recognizes commands such as searching
on Wikipedia, Google and YouTube through voice input. It is
also capable of opening Windows software such as Notepad,
MS Paint and Calculator through voice input. The synthesizer
part of the software speaks the result gathered from Wikipedia.
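The search commands can be sketched as building a search URL from the recognized phrase. This is a sketch under stated assumptions rather than the project's implementation: the class and method names are hypothetical, and the way the target site is passed in is an assumption about how the recognized command is split.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Locale;

// Builds a search URL for Wikipedia, Google or YouTube from recognized text.
class SearchUrlSketch {

    // Returns the search URL for the given site, or null for an unknown site.
    static String buildUrl(String site, String query) {
        String encoded;
        try {
            // URL-encode the query so that spaces and special characters are safe.
            encoded = URLEncoder.encode(query.trim(), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
        switch (site.toLowerCase(Locale.ROOT)) {
            case "wikipedia":
                return "https://en.wikipedia.org/w/index.php?search=" + encoded;
            case "google":
                return "https://www.google.com/search?q=" + encoded;
            case "youtube":
                return "https://www.youtube.com/results?search_query=" + encoded;
            default:
                return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("google", "speech recognition"));
        System.out.println(buildUrl("youtube", "java tutorial"));
    }
}
```

On a desktop system the resulting URL could then be opened in the default browser, for example with `java.awt.Desktop.getDesktop().browse(...)`.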
CODING
CODE FOR VOICE RECOGNITION
// Imports required at the top of the enclosing class file. The speech classes
// (GSpeechDuplex, GSpeechResponseListener, GoogleResponse, Microphone) come
// from the speech API library used by the project.
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.io.IOException;
import java.net.URISyntaxException;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.sound.sampled.LineUnavailableException;
import javax.swing.Timer;

// Starts recognition; when an utterance ends, the recognized phrase is looked
// up on Wikipedia and the result is typed into the text area.
public void RecognizeWikipedia() throws IOException, LineUnavailableException {
    duplex.setLanguage("en");
    syn.setLanguage("en");
    duplex.recognize(mic.getTargetDataLine(), mic.getAudioFormat());
    duplex.addResponseListener(new GSpeechResponseListener() {
        @Override
        public void onResponse(GoogleResponse gr) {
            String output = gr.getResponse();
            if (output == null) {
                // A null response marks the end of the utterance.
                String spoken = jTextField1.getText();
                // Keep only the primary transcription, not the alternative
                // that is displayed in parentheses.
                int paren = spoken.indexOf('(');
                if (paren >= 0) {
                    spoken = spoken.substring(0, paren);
                }
                final String query = spoken.trim();
                System.out.println("Paragraph Line Added");
                jTextField1.setText(query);
                // Query Wikipedia on a separate thread so the listener returns quickly.
                new Thread() {
                    @Override
                    public void run() {
                        searchWikipedia(query);
                    }
                }.start();
                return;
            }
            if (output.contains("(")) {
                output = output.substring(0, output.indexOf('('));
            }
            // Show the first alternative transcription in parentheses.
            if (!gr.getOtherPossibleResponses().isEmpty()) {
                output = output + " (" + (String) gr.getOtherPossibleResponses().get(0) + ")";
            }
            System.out.println(output);
            jTextField1.setText(output);
        }
    });
}

// Uses the words after "about" as the topic (or the whole phrase when
// "about" is absent) and displays the Wikipedia result for it.
private void searchWikipedia(String query) {
    String title = query;
    int idx = query.indexOf("about");
    if (idx >= 0) {
        title = query.substring(idx + "about".length()).trim();
        if (title.isEmpty()) {
            System.err.println("I did not find any topic related to it");
            return;
        }
    }
    try {
        result = wiki.getResult(title);
        typeOut(result);
        mic.close();
        mic_wiki.setIcon(mic2);
    } catch (IOException | URISyntaxException ex) {
        Logger.getLogger(GUI.class.getName()).log(Level.SEVERE, null, ex);
    }
}

// Appends the text to the text area one character every 50 ms.
private void typeOut(String text) {
    final char[] ch = text.toCharArray();
    t = new Timer(50, new ActionListener() {
        int i = 0;

        @Override
        public void actionPerformed(ActionEvent ae) {
            if (i < ch.length) {
                // Runs up to and including the last character.
                jTextArea1.append(String.valueOf(ch[i++]));
            } else {
                ((Timer) ae.getSource()).stop();
            }
        }
    });
    t.start();
}
SOFTWARE DESIGN
HOME TAB
HOME TAB WITH SIDE BAR
WIKIPEDIA SEARCH TAB
YOUTUBE SEARCH TAB
GOOGLE SEARCH TAB
TESTING
EXECUTION OF TEST CASES
TEST1: VOICE TEST
When voice input was provided, the software recognized the
spoken words and gave the expected result as text.
TEST2: SYNTHESIZER TEST
When text input was provided, the software synthesized the
written words correctly and gave the expected result as
voice.
TEST3: SEARCH TEST
The software took the text recognised by the recognizer as
input, searched for it and provided the appropriate
result.
RESULTS
PROJECT OBJECTIVE
• To understand speech recognition and its
fundamentals.
• To study its working and applications in different areas.
• To implement it as a desktop application.
• To develop software that can mainly be used for:
- Speech Recognition
- Speech Generation
- Text Editing
- Tool for operating Machine through voice.
GOAL ACHIEVEMENTS
- Able to write text through both keyboard and voice
input.
- Able to play music on the basis of voice input.
- Able to search for anything on Google and Wikipedia.
- Able to play any video on YouTube on voice command.
- Requires less time for writing text.
- Provides significant help for people with disabilities.
RESULT ANALYSIS
- When voice input was provided, the software
recognized the spoken words only after a few
attempts; this is due to the noisy environment,
variation in the voice and multiple users.
- We provided voice input to run the Music Player
and it worked according to expectations.
- When voice input was provided for searching on
Google, Wikipedia and YouTube, the software
worked fine and gave the expected results without
the search having to be repeated twice or thrice.
SWOT ANALYSIS
STRENGTH OF PROJECT
- Speech is a very natural way to interact, and it is
not necessary to sit at a keyboard or work with a
remote control.
- No training required for users!
- Increases productivity
- Can help with menial computer tasks, such as
browsing and scrolling
- Can help people who have trouble using their
hands
- Can help people who have cognitive disabilities
- Has long term benefits for students
THREATS TO THE PROJECT
Despite all these advantages and benefits, a hundred
percent perfect speech recognition system has not yet
been developed. There are a number of factors that can
reduce the accuracy and performance of a speech
recognition program.
Speech recognition is easy for a human but a difficult
task for a machine. Compared with the human mind,
speech recognition programs seem less intelligent. This
is because the human mind is naturally gifted with the
capability of thinking, understanding and reacting,
while for a computer program it is a complicated task:
first it needs to understand the spoken words with
respect to their meanings, and it has to strike a
sufficient balance between the words, noise and pauses.
A human has a built-in capability of filtering the noise
from speech, while a machine requires training; the
computer requires help in separating the speech sound
from the other sounds.
CONCLUSION & FUTURE SCOPE
CONCLUSION
This thesis/project work on speech recognition started
with a brief introduction to the technology and its
applications in different sectors. The project part of the
report was based on software development for speech
recognition. At a later stage we discussed different
tools for bringing that idea into practice. After
development, the software was tested and the results
were discussed, and a few deficiencies were brought to
light. After the testing work, the advantages of the
software were described and suggestions for further
enhancement and improvement were discussed.
FUTURE SCOPE
This work can be taken further, and more work can be
done on the project in order to bring modifications and
additional features. The current software does not
support a large vocabulary; future work will
accumulate a larger number of samples and increase
the efficiency of the software.