Lecture01 Introduction

The document outlines a course on Speech Recognition at Shanghai University of Engineering Science, detailing the schedule, assessment criteria, and course content. It includes a list of students, reference materials, and a framework for Automatic Speech Recognition (ASR). The course spans 12 weeks with lectures and experiments focusing on various aspects of speech processing and recognition.

Uploaded by

Shubrata Barua

To Get Started before the Course

Steven (吴中)

Mobile: 13641865488
E-mail: stevenwuzhong@sues.edu.cn
#01 To Know You and Me

Me: Steven (吴中)

You:
SN  ID         Name          Chinese Name  Nationality
1   027121101  AKTERN.
2   027121102  AL-ADEMIB.
3   027121103  AURNABM.
4   027121104  BARUAS.
5   027121105  CHIRWAK.
6   027121106  DELANIEG.
7   027121107  ELM.
8   027121108  GHAFRIF.
9   027121109  HAQUEI.
10  027121110  HAQUEM.
11  027121111  HOSSAINM.A.
12  027121112  HOSSAINM.S.
13  027121113  HUBBLEA.
14  027121114  JAHIDS.
15  027121115  JALLAHW.
16  027121116  KAMALI.
17  027121117  KAULAC.
18  027121118  KOLISONC.
19  027121119  LAMTOUEHR.
20  027121120  M’KADDAMY.
21  027121121  MAHMUDM.
22  027121122  MOSHAROFM.
23  027121123  PAVELM.
24  027121124  PONEMASHO.
25  027121125  SABBIRM.
26  027121126  SHAIKATM.
27  027121127  SHAKERINM.
28  027121128  TOKPAHE.
29  037121119  ISKAKOVK.

2024/3/1 Shanghai University of Engineering Science
#02 Reference Materials
• Spoken Language Processing: A Guide to Theory, Algorithm, and System Development
• Speech and Language Processing
• Automatic Speech Recognition: A Deep Learning Approach


#03 Contents for Theory and Experiments

Theory: 16 Lectures             Experiments: 8 Lectures
• Introduction: 1               • Spectrum Analysis: 1
• Fundamental Theory: 2         • Spectrogram and MFCC: 1
• Speech Features: 2            • Observation Probability: 1
• Hidden Markov Model: 4        • Optimal State Path: 1
• Language Model: 2             • N-Gram: 1
• DNN for SR: 4                 • DNN for SR: 3
• Course Review and Q&A: 1


#04 Timeline in the Semester
• 12 weeks of live lectures, starting this week (week 1).
• 2 lectures per week: Tuesday and Friday.
• Week 13: final exam.
• Semester timeline (February to May): the first lecture is on 2/27, the last lecture is on 5/17.


#05 Assessment

• Attendance worth 10%
• Homework worth 20%
• Experiment worth 20%
• Final Exam worth 50%


Chapter 1

Introduction

Lecture 01: Objectives

⭐ What is Speech Recognition?

⭐ A Typical Speech Recognition System

⭐ Speech Recognition Components

⭐ History of Speech Recognition


#01 What is Speech Recognition?

Speech: “Hi, Siri!”

• Speech-to-Text transcription (STT)
  • Transform recorded audio into a sequence of words.
  • Just the words, no meaning. But the system does need to deal with acoustic ambiguity: “Recognize speech?” or “Wreck a nice beach?”


#02 An Example of Speech Recognition
Utterance: 今天天气很好

Word seq.   今天 天气 很 好                           Language Model P(W)
Phoneme     j in1 t ian1 t ian1 q i1 h en2 h ao3
States      s1 s2 s3 s4 ⋯                             Acoustic Model P(O|W)
Features


#02 An Example of Speech Recognition
Utterance: 今天天气很好

Lexicon:
今天   j in1 t ian1
天气   t ian1 q i1
很     h en2
好     h ao3

Phoneme sequence: j in1 t ian1 t ian1 q i1 h en2 h ao3

HMM: each phoneme is modeled by a chain of states (s1 s2 s3 s4 s5), and the states emit the observations o1 o2 o3 ⋯

Stochastic:
• Speak what?
• How to speak?
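The lexicon step on this slide can be sketched in a few lines of Python. This is only an illustration, assuming the lexicon is a plain dictionary; the function name `words_to_phonemes` is hypothetical and not part of the course material. The entries are the four words from the slide.

```python
# Toy pronunciation lexicon copied from the slide: word -> phoneme list.
LEXICON = {
    "今天": ["j", "in1", "t", "ian1"],
    "天气": ["t", "ian1", "q", "i1"],
    "很": ["h", "en2"],
    "好": ["h", "ao3"],
}


def words_to_phonemes(words, lexicon=LEXICON):
    """Concatenate the pronunciation of each word, in order."""
    phonemes = []
    for word in words:
        if word not in lexicon:
            raise KeyError(f"word not in lexicon: {word}")
        phonemes.extend(lexicon[word])
    return phonemes


word_sequence = ["今天", "天气", "很", "好"]
phoneme_sequence = words_to_phonemes(word_sequence)
print(" ".join(phoneme_sequence))
# prints: j in1 t ian1 t ian1 q i1 h en2 h ao3
```

In a full recognizer, each phoneme would then be expanded into its HMM states; that step is left out here.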


#03 What is “Automatic” SR?

• Computer recognition of speech
  • Enabling a computer to “recognize” what was spoken
• Usually understood as the ability to faithfully transcribe what was spoken
  • Something even humans often cannot do
• More completely, the ability to understand what was spoken
  • Which humans do extremely well


#04 Why Speech?
• Most natural form of human communication
  • With modern telephones, people can communicate over long distances
• For natural human-machine interaction, like voice search
• For spoken document processing, like speech mining and retrieval
• For fun: artificially intelligent robots that talk like humans
• Voice commands can free hands and eyes for other tasks
  • Especially in cars, where hands and eyes are busy


#05 A Framework of ASR

Recognition:
  Speech → Feature Extraction → Decoder → Text
  The Decoder draws on: Acoustic Model, Lexicon, Language Model

Training:
  Speech Corpora → Acoustic Modeling → Acoustic Model
  Text Corpora → Language Modeling → Language Model
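To make the language-modeling branch of the framework (Text Corpora, Language Modeling, Language Model) concrete, here is a minimal sketch that estimates a bigram model from a tiny text corpus. Everything in it is an assumption for illustration: the corpus is made up, the estimate is plain maximum likelihood with no smoothing, and the names `bigram_prob` and `sentence_prob` are hypothetical.

```python
from collections import Counter

# Hypothetical toy text corpus (not from the lecture), already tokenized.
corpus = [
    ["recognize", "speech"],
    ["recognize", "speech", "well"],
    ["wreck", "a", "nice", "beach"],
]

history_counts = Counter()  # how often each token appears as a history
bigram_counts = Counter()   # how often each (previous, current) pair appears
for sentence in corpus:
    tokens = ["<s>"] + sentence + ["</s>"]
    history_counts.update(tokens[:-1])
    bigram_counts.update(zip(tokens[:-1], tokens[1:]))


def bigram_prob(prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev is unseen."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / history_counts[prev]


def sentence_prob(sentence):
    """P(W) approximated as a product of bigram probabilities."""
    tokens = ["<s>"] + sentence + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        prob *= bigram_prob(prev, word)
    return prob


print(sentence_prob(["recognize", "speech"]))
print(sentence_prob(["speech", "recognize"]))  # unseen word order gives 0.0
```

A practical language model would add smoothing so that unseen word pairs do not receive probability zero.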


#06 Hierarchical Modelling of Speech

• We generally represent recorded speech as a sequence of acoustic feature vectors (observations) X, and the output word sequence as W
• At recognition time, our aim is to find the most likely W, given X
• To achieve this, statistical models are trained using a corpus of paired examples (X_n, W_n)

Use an acoustic model, language model, and lexicon to obtain the most probable word sequence W∗ given the observed acoustics X:

$$W^* = \arg\max_W P(W \mid X)$$


#07 Fundamental Equation of Statistical SR
If X is the sequence of acoustic feature vectors (observations) and W denotes a word sequence, the most likely word sequence W∗ is given by

$$W^* = \arg\max_W P(W \mid X)$$

Applying Bayes’ Theorem:

$$P(W \mid X) = \frac{P(X \mid W)\,P(W)}{P(X)} \propto P(X \mid W)\,P(W)$$

$$W^* = \arg\max_W \underbrace{P(X \mid W)}_{\text{Acoustic Model}}\;\underbrace{P(W)}_{\text{Language Model}}$$
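As a rough illustration of this decision rule, the sketch below picks the best word sequence from two candidates by adding log P(X|W) and log P(W). The numbers are invented for the example and do not come from any real model; the `decode` function and the two score tables are hypothetical. P(X) is dropped because it is the same for every candidate W.

```python
import math

# Hypothetical scores for one utterance X (illustrative numbers only).
# acoustic[W] stands in for P(X|W); language[W] stands in for P(W).
acoustic = {
    "recognize speech": 0.0010,
    "wreck a nice beach": 0.0012,
}
language = {
    "recognize speech": 0.020,
    "wreck a nice beach": 0.001,
}


def decode(candidates):
    """Return the candidate W that maximizes log P(X|W) + log P(W)."""
    def score(w):
        return math.log(acoustic[w]) + math.log(language[w])
    return max(candidates, key=score)


best = decode(list(acoustic))
print(best)
# prints: recognize speech
```

With these made-up numbers the acoustic score alone slightly prefers "wreck a nice beach", and the language model tips the decision to "recognize speech". Working in log probabilities avoids numerical underflow when many small probabilities are multiplied.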


#08 ASR History

Phase I: Template Matching (isolated words)
  1950s        Template matching, 10 words
  1970s        DTW, VQ

Phase II: Statistical Models (continuous speech)
  1980~1990s   GMM-HMM

Phase III: Deep Learning (complex scenarios)
  2006         DNN
  2012         DNN-HMM
  2015~        End-to-End: CTC, RNN-T, Attention, Transformer


#09 Further Reading…

Chapter 1 in Spoken Language Processing

Chapter 1 in Automatic Speech Recognition: A Deep Learning Approach

A Historical Perspective of Speech Recognition


#10 Coursework

• What are the input and the output of a typical ASR system?

• What are the main parts of an ASR system? Please draw the Framework of ASR.

• What are the three phases in ASR history?


THANK YOU
