Speech Recognition
BATCH 10
             Team
Guide: Mr. VSVS Murthy
Team members:
M Siva Badarinath   1215316025
G Manish Reddy      1215316015
M Sumanth           1215316027
Y Raviteja          1215316059
   Project Brief:
Technologies used:
Python
Libraries: gTTS, SpeechRecognition (Google Speech Recognition), PyAudio
          Abstract:
• The aim of the project is to convert speech to text and browse
  the internet with voice commands (much like Google Assistant).
• Read text documents in the English language and convert them
  into voice.
• Make browsing the net more comfortable and accessing
  documents easier.
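The voice-browsing idea can be sketched as a small dispatcher that turns an already-recognized phrase into a URL and opens it with Python's standard `webbrowser` module. This is a minimal sketch, not the project's code: the command words `open` and `search`, the `SITES` shortcut table, and the function names are hypothetical.

```python
import webbrowser
from typing import Optional
from urllib.parse import quote_plus

# Hypothetical shortcut table; the project does not specify which sites it supports.
SITES = {
    "youtube": "https://www.youtube.com",
    "wikipedia": "https://www.wikipedia.org",
    "gmail": "https://mail.google.com",
}


def command_to_url(command: str) -> Optional[str]:
    """Map a recognized phrase such as 'open youtube' or 'search python' to a URL.

    Returns None when the phrase does not match the (hypothetical) command grammar.
    """
    words = command.lower().split()
    if not words:
        return None
    verb, rest = words[0], " ".join(words[1:])
    if verb == "open" and rest in SITES:
        return SITES[rest]
    if verb == "search" and rest:
        # quote_plus encodes spaces as '+', as expected in a query string.
        return "https://www.google.com/search?q=" + quote_plus(rest)
    return None


def handle_command(command: str) -> bool:
    """Open the URL for a command in the default browser; return False if not understood."""
    url = command_to_url(command)
    if url is None:
        return False
    webbrowser.open(url)
    return True


if __name__ == "__main__":
    print(command_to_url("search speech recognition in python"))
```

In the full pipeline the `command` string would come from the speech recognizer instead of being typed.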
gTTS: Google Text-to-Speech
      • A Python library and tool to interface with Google Translate's
        text-to-speech API.
      • Writes spoken MP3 data to a file, to a file-like object (bytestring) for
        further audio manipulation, or to stdout.
      • It features flexible pre-processing and tokenizing, as well as
        automatic retrieval of supported languages.
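A minimal sketch of the text-document-to-voice step with gTTS, assuming gTTS is installed (`pip install gTTS`) and the machine has internet access, since gTTS sends the text to Google's service. `gTTS(text=..., lang=...)`, `save()` and `write_to_fp()` belong to gTTS; the helper names `normalize_text`, `document_to_mp3` and `text_to_mp3_bytes` are hypothetical.

```python
from io import BytesIO
from pathlib import Path


def normalize_text(raw: str) -> str:
    """Collapse line breaks and repeated spaces so the document is read as continuous prose."""
    return " ".join(raw.split())


def document_to_mp3(doc_path: str, mp3_path: str, lang: str = "en") -> None:
    """Read a UTF-8 text document and write its spoken form to an MP3 file."""
    from gtts import gTTS  # imported lazily so normalize_text works without gTTS installed

    text = normalize_text(Path(doc_path).read_text(encoding="utf-8"))
    gTTS(text=text, lang=lang).save(mp3_path)


def text_to_mp3_bytes(text: str, lang: str = "en") -> bytes:
    """Return MP3 data held in memory (a file-like object) for further audio manipulation."""
    from gtts import gTTS

    buffer = BytesIO()
    gTTS(text=text, lang=lang).write_to_fp(buffer)
    return buffer.getvalue()
```

For example, `document_to_mp3("notes.txt", "notes.mp3")` (hypothetical file names) would produce a spoken version of the document.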
PyAudio:
      • PyAudio provides Python bindings for PortAudio, the cross-platform audio I/O library.
      • With PyAudio, you can easily use Python to play and record audio on a variety of platforms.
      • PyAudio is inspired by:
          • pyPortAudio/fastaudio: Python bindings for the PortAudio v18 API.
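A sketch of recording from the default microphone with PyAudio and saving the result as a WAV file through the standard `wave` module. `PyAudio()`, `open()`, `read()`, `get_sample_size()` and `terminate()` are PyAudio's API; the 16 kHz mono settings, the chunk size and the function names are assumptions chosen for speech, not values taken from the project.

```python
import math
import wave

RATE = 16000   # samples per second; 16 kHz mono is a common choice for speech (assumption)
CHANNELS = 1
CHUNK = 1024   # frames read from the stream per call


def chunks_needed(seconds: float, rate: int = RATE, chunk: int = CHUNK) -> int:
    """Number of CHUNK-sized reads required to cover at least `seconds` of audio."""
    return math.ceil(rate * seconds / chunk)


def record_wav(path: str, seconds: float = 5.0) -> None:
    """Record from the default input device and write 16-bit PCM audio to `path`."""
    import pyaudio  # imported lazily so chunks_needed works without PyAudio installed

    pa = pyaudio.PyAudio()
    sample_width = pa.get_sample_size(pyaudio.paInt16)
    stream = pa.open(format=pyaudio.paInt16, channels=CHANNELS, rate=RATE,
                     input=True, frames_per_buffer=CHUNK)
    try:
        chunks = [stream.read(CHUNK) for _ in range(chunks_needed(seconds))]
    finally:
        # Always release the audio device, even if reading fails.
        stream.stop_stream()
        stream.close()
        pa.terminate()

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(RATE)
        wav_file.writeframes(b"".join(chunks))
```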
Speech Recognition
• Speech recognition is an important feature in several applications, such as
  home automation and artificial intelligence.
• This project aims to provide an introduction to using the
  SpeechRecognition library of Python.
• This is useful because it can run on single-board computers such as the Raspberry Pi with
  the help of an external microphone.
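A minimal sketch of one listen-and-transcribe cycle with the SpeechRecognition library, assuming PyAudio is installed (required by `sr.Microphone`) and an internet connection is available for `recognize_google`. The same code should also work on a Raspberry Pi with a USB microphone. The `language` default and the helper names `listen_once` and `clean_transcript` are assumptions.

```python
import string
from typing import Optional


def clean_transcript(text: str) -> str:
    """Lower-case a transcript and strip punctuation so it can be matched against commands."""
    return text.lower().translate(str.maketrans("", "", string.punctuation)).strip()


def listen_once(language: str = "en-US") -> Optional[str]:
    """Capture one phrase from the default microphone and transcribe it with Google's recognizer.

    Returns None if the speech was unintelligible.
    """
    import speech_recognition as sr  # imported lazily; the PyPI package is SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Sample background noise briefly so the energy threshold suits the room.
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        audio = recognizer.listen(source)
    try:
        return clean_transcript(recognizer.recognize_google(audio, language=language))
    except sr.UnknownValueError:
        return None
    except sr.RequestError as exc:
        raise RuntimeError("Google speech API could not be reached") from exc
```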
    Outcome:
• Can detect different voices
• Generates audio files from the given text
• Reduces the time spent browsing or typing files
Conclusion 1
We have analysed the various parameters involved in taking
conventional voice engines to the next level. Our model demonstrates how the voice
engine can be used to perform real-time activities, such as controlling appliances with
mere voice commands and no physical movement. The model is currently developed on
the base version of Android Froyo 2.2 and can be made compatible with later
versions by following certain procedures.
The DTMF signalling principle has been followed to demonstrate the automation of electrical
appliances, as the hardware setup is cost-effective and any individual can
implement it without much expenditure.
                                                      Author: P. Magesh Kannan
                                                     Date of Conference: 28-30 Aug. 2014
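The DTMF principle itself is simple: each keypad key is sent as the sum of one low (row) tone and one high (column) tone, and the receiver identifies that pair. The sketch below generates a key's tone and decodes it again with the Goertzel algorithm, a common way to measure signal power at a few known frequencies. It only illustrates the signalling scheme and is not the paper's Android code or decoder circuit; the 8 kHz sample rate, 50 ms tone length and function names are assumptions.

```python
import math
from typing import List

# Standard DTMF keypad: row i uses LOW[i] Hz, column j uses HIGH[j] Hz.
LOW = [697, 770, 852, 941]
HIGH = [1209, 1336, 1477, 1633]
KEYPAD = ["123A", "456B", "789C", "*0#D"]

RATE = 8000      # samples per second (assumption)
DURATION = 0.05  # seconds per tone (assumption)


def dtmf_tone(key: str, rate: int = RATE, duration: float = DURATION) -> List[float]:
    """Return samples of the two-tone signal for one keypad key."""
    for row, keys in enumerate(KEYPAD):
        if len(key) == 1 and key in keys:
            col = keys.index(key)
            break
    else:
        raise ValueError(f"not a DTMF key: {key!r}")
    f_low, f_high = LOW[row], HIGH[col]
    n = round(rate * duration)
    return [0.5 * math.sin(2 * math.pi * f_low * t / rate)
            + 0.5 * math.sin(2 * math.pi * f_high * t / rate) for t in range(n)]


def goertzel_power(samples: List[float], freq: float, rate: int = RATE) -> float:
    """Signal power at `freq`, computed with the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def decode_dtmf(samples: List[float], rate: int = RATE) -> str:
    """Pick the strongest row and column frequency and map the pair back to a key."""
    row = max(range(4), key=lambda i: goertzel_power(samples, LOW[i], rate))
    col = max(range(4), key=lambda j: goertzel_power(samples, HIGH[j], rate))
    return KEYPAD[row][col]
```

A real decoder would also apply a power threshold so that silence or speech is not decoded as a key; that check is omitted here for brevity.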
Conclusion 2
In this paper, the results of the time optimization of a real-time speaker recognition
system were presented. The obtained parameters show that the system accuracy can be
held at the same level (or even increased) while reducing the number of computations
related to the MFCC (8 times fewer FFT computations) and GMM (half as many computations)
algorithms. As a consequence, the sampling rate can be increased to provide more
accurate real-time speaker recognition (more than 10%).
                                                             Author: Radosław Weychan
                                                        Date of Conference: 23-25 Sept. 2015
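Conclusion 2 relies on MFCC features scored against per-speaker Gaussian mixture models (GMMs). The NumPy sketch below shows only the GMM scoring step: it computes the average per-frame log-likelihood of a sequence of feature vectors under a diagonal-covariance GMM and picks the best-matching speaker. Feature extraction, model training and the paper's specific optimizations are not shown; the function names and the model layout are hypothetical.

```python
from typing import Dict, Tuple

import numpy as np

# A model is (weights (K,), means (K, D), variances (K, D)) for K diagonal Gaussians.
GMM = Tuple[np.ndarray, np.ndarray, np.ndarray]


def gmm_log_likelihood(frames: np.ndarray, weights: np.ndarray,
                       means: np.ndarray, variances: np.ndarray) -> float:
    """Average per-frame log-likelihood of `frames` (T x D) under a diagonal-covariance GMM."""
    frames = np.atleast_2d(frames)
    diff = frames[:, None, :] - means[None, :, :]                    # (T, K, D)
    log_norm = np.sum(np.log(2 * np.pi * variances), axis=1)         # (K,)
    mahalanobis = np.sum(diff ** 2 / variances[None, :, :], axis=2)  # (T, K)
    log_comp = np.log(weights)[None, :] - 0.5 * (log_norm[None, :] + mahalanobis)
    # Log-sum-exp over components, shifted by the max for numerical stability.
    peak = log_comp.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))
    return float(per_frame.mean())


def identify_speaker(frames: np.ndarray, models: Dict[str, GMM]) -> str:
    """Return the name of the enrolled speaker whose GMM explains the frames best."""
    return max(models, key=lambda name: gmm_log_likelihood(frames, *models[name]))
```

The cost of scoring grows with the number of frames times the number of mixture components, which is why reducing either one speeds up real-time recognition.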
Conclusion 3
This research project has developed a technique for converting a
text image directly to speech using Python and a Raspberry Pi 3
minicomputer. The hardware provides a portable and economical
way of converting an image to text. Our method is more reliable
than others, as Tesseract OCR has an accuracy of 99% and eSpeak
uses two methods to read out the image in a more human-like
manner.
                                                     Author: Hasan U. Zaman
                                         Date of Conference: 26-28 Oct. 2018
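The image-to-speech pipeline of Conclusion 3 can be sketched in Python as OCR with Tesseract followed by a call to the eSpeak command-line synthesizer. This assumes the `tesseract` and `espeak` programs plus the `pytesseract` and Pillow packages are installed. `pytesseract.image_to_string` and eSpeak's `-v` (voice) and `-s` (speed in words per minute) options exist; the helper names and the 140 wpm default are assumptions, not the paper's settings.

```python
import shutil
import subprocess
from typing import List


def clean_ocr_text(raw: str) -> str:
    """Join non-empty OCR lines into one string so eSpeak reads it as continuous text."""
    return " ".join(line.strip() for line in raw.splitlines() if line.strip())


def espeak_command(text: str, voice: str = "en", words_per_minute: int = 140) -> List[str]:
    """Build the eSpeak argument list: -v selects the voice, -s the speed in words per minute."""
    return ["espeak", "-v", voice, "-s", str(words_per_minute), text]


def read_image_aloud(image_path: str) -> str:
    """OCR an image with Tesseract, speak the text with eSpeak if available, and return the text."""
    import pytesseract  # Python wrapper around the tesseract executable
    from PIL import Image

    text = clean_ocr_text(pytesseract.image_to_string(Image.open(image_path)))
    if text and shutil.which("espeak"):
        subprocess.run(espeak_command(text), check=True)
    return text
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems when the recognized text contains quotes or other shell characters.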
Conclusion 4
In this work, an end-to-end speech-to-text conversion model using neural networks is
implemented. Techniques such as max pooling and batch normalization are used to
further optimize the model and boost its accuracy. The process of porting the trained
model to a Raspberry Pi is explained. The use of this kind of neural network model
is confined to the labels used in the dataset. Better datasets with more labels and
a wider variety of accents would improve the application's efficiency.
                                                       Author: A. Pardha Saradhi
                                                      Date of Conference: 6 April 2019
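Conclusion 4 mentions max pooling and batch normalization. As a rough illustration of what those two layers compute (not the paper's architecture, which is not given here), the NumPy functions below implement non-overlapping max pooling along the time axis of a feature matrix and training-mode batch normalization. The names, shapes and the `eps` value are assumptions; at inference time, batch normalization normally uses running averages of the mean and variance instead of the current batch's statistics.

```python
import numpy as np


def max_pool_1d(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping max pooling over time for a (T, C) feature matrix.

    Trailing frames that do not fill a whole window are dropped.
    """
    usable = (x.shape[0] // size) * size
    return x[:usable].reshape(-1, size, x.shape[1]).max(axis=1)


def batch_norm(x: np.ndarray, gamma: np.ndarray, beta: np.ndarray,
               eps: float = 1e-5) -> np.ndarray:
    """Training-mode batch normalization of an (N, C) batch using the batch's own statistics."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, (nearly) unit variance per channel
    return gamma * x_hat + beta              # learnable scale and shift
```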
THANK YOU