To Get Started
before the Course
Steven (吴 中)
Mobile: 13641865488
E-mail: stevenwuzhong@sues.edu.cn
#01 To Know You and Me
Me
Steven (吴 中)
Mobile: 13641865488
E-mail: stevenwuzhong@sues.edu.cn
You
SN ID Name Chinese Name Nationality
1 027121101 AKTERN.
2 027121102 AL-ADEMIB.
3 027121103 AURNABM.
4 027121104 BARUAS.
5 027121105 CHIRWAK.
6 027121106 DELANIEG.
7 027121107 ELM.
8 027121108 GHAFRIF.
9 027121109 HAQUEI.
10 027121110 HAQUEM.
11 027121111 HOSSAINM.A.
12 027121112 HOSSAINM.S.
13 027121113 HUBBLEA.
14 027121114 JAHIDS.
15 027121115 JALLAHW.
16 027121116 KAMALI.
17 027121117 KAULAC.
18 027121118 KOLISONC.
19 027121119 LAMTOUEHR.
20 027121120 M’KADDAMY.
21 027121121 MAHMUDM.
22 027121122 MOSHAROFM.
23 027121123 PAVELM.
24 027121124 PONEMASHO.
25 027121125 SABBIRM.
26 027121126 SHAIKATM.
27 027121127 SHAKERINM.
28 027121128 TOKPAHE.
29 037121119 ISKAKOVK.
2024/3/1 Shanghai University of Engineering Science
#02 Reference Materials
• Spoken Language Processing: A Guide to Theory, Algorithm, and System Development
• Speech and Language Processing
• Automatic Speech Recognition
#03 Contents for Theory and Experiments
Theory: 16 lectures
• Introduction: 1
• Fundamental Theory: 2
• Speech Features: 2
• Hidden Markov Model: 4
• Language Model: 2
• DNN for SR: 4
• Course Review and Q&A: 1
Experiments: 8 lectures
• Spectrum Analysis: 1
• Spectrogram and MFCC: 1
• Observation Probability: 1
• Optimal State Path: 1
• N-Gram: 1
• DNN for SR: 3
#04 Timeline in the Semester
• 12 weeks, delivered live, starting now (1st week)
• 2 lectures per week: Tuesday and Friday
• Timeline (February to May):
  • 2/27: the 1st lecture
  • 5/17: the last lecture
  • The 13th week: final exam
#05 Assessment
• Attendance: 10%
• Homework: 20%
• Experiment: 20%
• Final Exam: 50%
Chapter 1
Introduction
Lecture 01: Objectives
⭐ What is Speech Recognition?
⭐ A Typical Speech Recognition System
⭐ Speech Recognition Components
⭐ History of Speech Recognition
#01 What is Speech Recognition?
Example speech input: "Hi, Siri!"
• Speech-to-text (STT) transcription
• Transform recorded audio into a sequence of words.
• Just the words, no meaning. But we do need to deal with acoustic ambiguity: "Recognize speech?" or "Wreck a nice beach?"
#02 An Example of Speech Recognition
Utterance: 今天天气很好 ("The weather is very nice today")
Word seq.: 今天 天气 很 好 [Language Model: P(W)]
Phonemes: j in1 t ian1 t ian1 q i1 h en2 h ao3
States: s1 s2 s3 s4 ⋯ [Acoustic Model: P(O|W)]
Features
#02 An Example of Speech Recognition
Utterance: 今天天气很好 ("The weather is very nice today")
Word seq.: 今天 天气 很 好
Lexicon:
今天 j in1 t ian1
天气 t ian1 q i1
很 h en2
好 h ao3
Phonemes: j in1 t ian1 t ian1 q i1 h en2 h ao3
HMM: s1 s2 s3 s4 s5 ⋯ s1 s2 s3 s4 s5
Observations: o1 o2 o3 ⋯
Stochastic:
• Speak what?
• How to speak?
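The lexicon lookup on this slide can be sketched in Python as follows. This is a minimal illustration that assumes only the four lexicon entries listed above; the names `LEXICON` and `words_to_phonemes` are hypothetical and do not come from the course materials.

```python
# Pronunciation lexicon copied from the slide: word -> phoneme sequence.
LEXICON = {
    "今天": ["j", "in1", "t", "ian1"],
    "天气": ["t", "ian1", "q", "i1"],
    "很": ["h", "en2"],
    "好": ["h", "ao3"],
}


def words_to_phonemes(words):
    """Expand a word sequence into one flat phoneme sequence via the lexicon."""
    phonemes = []
    for word in words:
        if word not in LEXICON:
            # A real system would need a fallback for out-of-vocabulary words.
            raise KeyError(f"word not in lexicon: {word}")
        phonemes.extend(LEXICON[word])
    return phonemes
```

Calling `words_to_phonemes(["今天", "天气", "很", "好"])` reproduces the phoneme sequence shown on the slide. Each phoneme would then be modeled by its own HMM states.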
#03 What is “Automatic” SR?
• Computer recognition of speech
• Enabling a computer to "recognize" what was spoken
• Usually understood as the ability to faithfully transcribe what was spoken
  • Something even humans often cannot do
• More completely, the ability to understand what was spoken
  • Which humans do extremely well
#04 Why Speech?
• Most natural form of human communication
• With modern telephones, people can communicate over long distances
• For natural human-machine interaction, like voice search
• For spoken document processing, like speech mining and retrieval
• For fun: artificially intelligent robots that talk like humans
• Voice commands can free hands and eyes for other tasks
  • Especially in cars, where hands and eyes are busy
#05 A Framework of ASR
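Recognition:
• Feature Extraction → Decoder → Text
• The Decoder uses the Acoustic Model, the Lexicon, and the Language Model
Training:
• Speech Corpora → Acoustic Modeling → Acoustic Model
• Text Corpora → Language Modeling → Language Model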
#06 Hierarchical Modelling of Speech
• We generally represent recorded speech as a sequence of acoustic feature vectors (observations) X, and the output word sequence as W.
• At recognition time, our aim is to find the most likely W, given X.
• To achieve this, statistical models are trained using a corpus (Xn, Wn).
Use an acoustic model, language model, and lexicon to obtain the most probable word sequence W∗ given the observed acoustics X:
$$W^* = \arg\max_{W} P(W \mid X)$$
#07 Fundamental Equation of Statistical SR
If X is the sequence of acoustic feature vectors (observations) and W denotes a word
sequence, the most likely word sequence W∗ is given by
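$$W^* = \arg\max_{W} P(W \mid X)$$
Applying Bayes' Theorem:
$$P(W \mid X) = \frac{P(X \mid W)\, P(W)}{P(X)} \propto P(X \mid W)\, P(W)$$
$$W^* = \arg\max_{W} P(X \mid W)\, P(W)$$
Here P(X|W) is the Acoustic Model and P(W) is the Language Model.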
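As a minimal sketch of this decision rule, the Python below scores two candidate transcriptions in the log domain and picks the arg max. All probabilities are made-up illustrative numbers, and `decode`, `acoustic_logprob`, and `language_logprob` are hypothetical names; a real recognizer would compute these scores from trained models and search over far more hypotheses.

```python
import math

# Hypothetical scores for one utterance X (illustrative numbers only).
# log P(X|W): acoustic model score; the two candidates sound almost alike.
acoustic_logprob = {
    "recognize speech": math.log(0.40),
    "wreck a nice beach": math.log(0.45),
}
# log P(W): language model score; the first phrase is far more plausible.
language_logprob = {
    "recognize speech": math.log(0.010),
    "wreck a nice beach": math.log(0.0001),
}


def decode(candidates):
    """Return W* = argmax_W [log P(X|W) + log P(W)] over the candidates."""
    return max(candidates, key=lambda w: acoustic_logprob[w] + language_logprob[w])


best = decode(list(acoustic_logprob))
```

With these numbers the acoustic score alone slightly favors "wreck a nice beach", but adding the language model score makes "recognize speech" the winner. P(X) is left out because it is the same for every candidate W and so does not change the arg max.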
#08 ASR History
Phase I: Template Matching (isolated word)
• 1950s: Template Matching, 10 words
• 1970s: DTW, VQ
Phase II: Statistical Models (continuous speech)
• 1980~1990s: GMM-HMM
Phase III: Deep Learning (complex scenarios)
• 2006: DNN
• 2012: DNN-HMM
• 2015~: End-to-End (CTC, RNN-T, Attention, Transformer)
#09 Further Reading…
• Chapter 1 in Spoken Language Processing
• Chapter 1 in Automatic Speech Recognition: A Deep Learning Approach
• A Historical Perspective of Speech Recognition
#10 Coursework
• What are the input and the output of a typical ASR system?
• What are the main parts of an ASR system? Please draw the framework of ASR.
• What are the three phases in ASR history?
THANK YOU