
Final Slide

The document presents a speech recognition system that aims to improve accuracy and accessibility across various languages and accents. It outlines the challenges faced by existing technologies, the objectives of the project, and the development methodology, including the use of algorithms like Hidden Markov Models and Dynamic Time Warping. The expected outcome is a more intuitive user experience that promotes inclusivity and adapts to individual speech patterns.

Uploaded by

Abhishek Khadka
Copyright
© All Rights Reserved

SPEECH RECOGNITION SYSTEM

PRESENTERS

Nirajan Khadka
Saugat Dahal
Abhishek Khadka

Presentation On Speech Recognition System 2


Table of Contents:
1. Introduction
2. Problem Statement
3. Objectives
4. Development Methodology
5. Working Mechanism
6. Algorithms Used
7. Challenges
8. Expected Outcome

01/21/2025 Presentation On Speech Recognition System 3


1. INTRODUCTION

• Speech recognition technology has recently reached a higher level of performance
and robustness, allowing systems to communicate with users through speech.

• It is a process that enables computers to recognize and translate spoken language
into text. It is also known as "automatic speech recognition" (ASR), "computer
speech recognition", or just "speech to text" (STT).

• Speech recognition is the process of decoding an acoustic speech signal, captured
by a microphone or telephone, into a set of words.

2. PROBLEM STATEMENT
• Existing speech recognition technology falls short in delivering the
required accuracy, adaptability, and accessibility across various
applications and languages.

• The key issues include inaccuracies in transcription due to accents and
noise, a lack of adaptability to specialized domains and languages, and
limited accessibility for individuals with hearing impairments and those
in resource-constrained areas.

• This project aims to tackle these challenges by creating a superior
speech recognition system that enhances communication, efficiency, and
inclusivity across diverse user groups and applications.
3. OBJECTIVES
Objectives of the Project:

• To develop a speech recognition system capable of transcribing
spoken language across various accents, dialects, and noisy
environments.

• To build a speech recognition system capable of producing
results with high accuracy.



4. DEVELOPMENT METHODOLOGY
Agile Development Approach

Fig: Phases of Agile Development


5. WORKING MECHANISM
➢ Audio Input: The process begins with an "Audio Input," where the system receives
spoken language in the form of audio signals.
➢ Audio Pre-processing: Incoming audio undergoes "Audio Pre-processing" to
enhance its quality. This involves tasks like noise reduction, normalization, and
segmentation to prepare the data for analysis.
➢ Feature Extraction: The pre-processed audio is then subjected to "Feature
Extraction." This step converts raw audio signals into a more manageable and
informative format, typically using techniques like Mel-frequency cepstral
coefficients (MFCCs).
➢ Neural Network (ASR Model): The heart of the system is the "Neural Network
(ASR Model)." This deep learning architecture processes the extracted features to
decode spoken language into text. It learns to recognize phonemes, words, and
context through training on extensive datasets.
Contd.
➢ Language Models: To improve accuracy and context understanding, "Language Models"
are integrated into the system. These models help in predicting the most likely words or
phrases based on the context of the spoken words.
➢ Transcription Output: The final output is the "Transcription Output," where the system
provides a text-based representation of the spoken words.

Fig: Working mechanism of Speech Recognition System
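
The pre-processing and feature extraction steps above can be sketched in Python with NumPy. This is a minimal illustration, not the presenters' implementation: the function names (`preprocess`, `frame_signal`, `mel_filterbank`, `mfcc`) are hypothetical, the 25 ms frames, 10 ms hop, 26 mel filters and 13 coefficients are assumed as commonly used defaults, and noise reduction and segmentation are left out.

```python
import numpy as np

def preprocess(signal):
    """Normalization only: remove the DC offset and scale the peak to 1."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()
    peak = np.max(np.abs(signal))
    return signal / peak if peak > 0 else signal

def frame_signal(signal, frame_len, hop):
    """Split the signal into overlapping, Hamming-windowed frames."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate, frame_ms=25, hop_ms=10,
         n_filters=26, n_coeffs=13, n_fft=512):
    """Return an array of shape (n_frames, n_coeffs) of MFCC features."""
    signal = preprocess(signal)
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    frames = frame_signal(signal, frame_len, hop)
    # Power spectrum of each frame (frames are zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # Log mel filterbank energies; the small constant avoids log(0).
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    energies = np.log(power @ fbank.T + 1e-10)
    # Unnormalized DCT-II decorrelates the log energies.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None] * (2 * n + 1) / (2 * n_filters))
    return energies @ basis.T
```

For one second of audio sampled at 16 kHz, `mfcc(signal, 16000)` yields one 13-dimensional feature vector per 10 ms hop. These vectors are the kind of input the ASR model in the next step would consume.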

FLOWCHART



6. ALGORITHMS USED

6.1 Hidden Markov Models


• Machine learning method
• Makes use of state machines
• Based on a probabilistic model
• Only the outputs of the states can be observed, not the states themselves
• Example: speech recognition
• Observations: acoustic signals
• Hidden states: phonemes (distinctive sounds of a language)



Contd.
Mathematical Interpretation of Hidden Markov Models
• Here's a simplified mathematical interpretation of HMMs for speech recognition:

• States: Let $S = \{S_1, S_2, \ldots, S_N\}$ represent the set of states (e.g., phonemes) in the HMM, where $N$
is the number of states.

• Transition Probabilities: Define $A = \{a_{ij}\}$, where $a_{ij}$ is the probability of transitioning from state
$S_i$ to state $S_j$. These probabilities are typically organized into an $N \times N$ transition matrix.

• Observations (Features): Let $O$ represent the observed feature sequence, which consists of $T$
feature vectors: $O = \{O_1, O_2, \ldots, O_T\}$.

• Emission Probabilities: For each state $S_i$, there is an emission probability distribution $B_i$ that
describes the likelihood of observing the feature vector $O_t$ given the state $S_i$: $B_i(O_t) = P(O_t \mid S_i)$.
Contd.
• Initialization: The initial state probabilities are represented by $\pi = \{\pi_i\}$, where $\pi_i$ is the probability of starting in state $S_i$.

• Viterbi Algorithm: The Viterbi algorithm finds the best state sequence $Q^* = \{q_1^*, q_2^*, \ldots, q_T^*\}$ that maximizes
the joint probability:

$$Q^* = \arg\max_Q P(O, Q) = \arg\max_Q \left[ \pi_{q_1} B_{q_1}(O_1) \prod_{t=2}^{T} a_{q_{t-1} q_t} \, B_{q_t}(O_t) \right]$$

• In practice, the Viterbi algorithm efficiently computes the most likely state sequence by maintaining a trellis of
probabilities and backtracking to find the optimal path.
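
The trellis-and-backtracking procedure described above can be sketched in Python with NumPy. This is an illustrative sketch, not the presenters' code: the function name `viterbi` and the array layout are assumptions, the emission likelihoods are passed in precomputed as a matrix with entry `B[i, t]` equal to the likelihood of observation t under state i, and the two-state numbers at the bottom are made-up toy values. Working in log space avoids numerical underflow on long sequences.

```python
import numpy as np

def viterbi(pi, A, B):
    """Most likely state sequence for an HMM.

    pi : (N,)   initial state probabilities
    A  : (N, N) transition probabilities, A[i, j] = P(state j | state i)
    B  : (N, T) emission likelihoods, B[i, t] = P(O_t | state i)
    Returns (path, log_prob): the best state indices and log P(O, Q*).
    """
    B = np.asarray(B, dtype=float)
    N, T = B.shape
    with np.errstate(divide="ignore"):       # log(0) = -inf is acceptable here
        log_pi = np.log(np.asarray(pi, dtype=float))
        log_A = np.log(np.asarray(A, dtype=float))
        log_B = np.log(B)

    delta = np.full((T, N), -np.inf)         # best log score ending in each state
    psi = np.zeros((T, N), dtype=int)        # back-pointers (the trellis)
    delta[0] = log_pi + log_B[:, 0]

    for t in range(1, T):
        # scores[i, j]: best path ending in state i at t-1, then moving to j
        scores = delta[t - 1][:, None] + log_A
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(N)] + log_B[:, t]

    # Backtrack from the best final state to recover the optimal path.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    path.reverse()
    return path, float(np.max(delta[-1]))

# Toy example with two hidden states and three observations (made-up values).
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.5, 0.4, 0.1],    # likelihood of each observation under state 0
     [0.1, 0.3, 0.6]]    # likelihood of each observation under state 1
path, log_prob = viterbi(pi, A, B)   # path == [0, 0, 1]
```

In a speech recognizer the states would correspond to phonemes (or sub-phoneme units) and each column of `B` would be computed from a feature vector, but the recursion is the same.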



6.2 Dynamic Time Warping Algorithm

• Dynamic Time Warping (DTW) is a technique used in speech recognition to align and compare sequences of feature
vectors, allowing for the recognition of spoken words or phrases with varying durations.

Mathematical Interpretation of Dynamic Time Warping

➢ Let $X = \{x_1, x_2, \ldots, x_N\}$ represent the reference sequence, and $Y = \{y_1, y_2, \ldots, y_M\}$ represent the input sequence.

➢ Define a distance or cost matrix C such that C[i][j] represents the cost of aligning $x_i$ from the reference sequence with $y_j$
from the input sequence. This cost can be computed using a distance metric (e.g., Euclidean distance).
➢ Create a DTW matrix D with dimensions (N+1) × (M+1), initialized with large values (conceptually infinity), except D[0][0] = 0.
➢ Calculate the DTW matrix D using dynamic programming, for i = 1, …, N and j = 1, …, M:

• D[i][j] = C[i][j] + min(D[i−1][j], D[i][j−1], D[i−1][j−1])

➢ Backtrack through the DTW matrix to find the optimal alignment path, which represents the alignment between X and Y.
➢ Compute a recognition score or distance measure based on the alignment path, and make a recognition decision based on
this score (for example, by choosing the reference with the lowest score).
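
The DTW steps listed above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions rather than the presenters' implementation: the function names `dtw` and `recognize` are hypothetical, the cost is the Euclidean distance, and the recognition decision simply picks the reference template with the smallest accumulated cost. Note that the code uses 0-based sequence indices, so `C[i - 1, j - 1]` plays the role of C[i][j] in the recurrence.

```python
import numpy as np

def dtw(X, Y):
    """Align reference X (N frames) with input Y (M frames).

    Returns (distance, path), where path is a list of (i, j) index pairs
    (0-based) matching X[i] to Y[j] along the optimal alignment.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.ndim == 1:                 # treat a 1-D sequence as scalar features
        X = X[:, None]
    if Y.ndim == 1:
        Y = Y[:, None]
    N, M = len(X), len(Y)

    # Cost matrix: Euclidean distance between every pair of frames.
    C = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)

    # Accumulated-cost matrix with an extra row and column of infinities.
    D = np.full((N + 1, M + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            D[i, j] = C[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])

    # Backtrack from (N, M) to (1, 1), always stepping to the cheapest neighbor.
    i, j = N, M
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [(D[i - 1, j - 1], i - 1, j - 1),
                      (D[i - 1, j], i - 1, j),
                      (D[i, j - 1], i, j - 1)]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return float(D[N, M]), path

def recognize(templates, Y):
    """Return the label of the reference template closest to Y under DTW."""
    return min(templates, key=lambda label: dtw(templates[label], Y)[0])
```

For example, `dtw([1, 2, 3], [1, 2, 2, 3])` returns a distance of 0.0, because the repeated 2 in the input is absorbed by the warping path. This is the property that lets DTW compare utterances spoken at different speeds.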
7. CHALLENGES

• Ambient Noise

• Accents and Dialects

• Speaker Variability

• Limited Vocabulary

• Context Understanding

• Real Time Processing



8. EXPECTED OUTCOME
• The system aims to enhance the user experience by delivering
accurate and efficient speech recognition, resulting in more
intuitive and convenient interactions with digital devices.

• By enabling voice interaction, the project promotes inclusivity
and ensures technology is accessible to a broader audience.

• It will make the technology more accessible to individuals with
disabilities and those facing language barriers. Users will benefit
from a personalized experience as the system adapts to their
unique speech patterns and preferences, improving recognition
accuracy over time.
Any further inquiries you'd like to make?



THANK YOU
