ASR Development for Indian Languages

Uploaded by

Sanath

Building an Automatic Speech Recognition (ASR) System

2/20/2024 1
[Figure: sound wave]
What is ASR?

• Automatic Speech Recognition (ASR) is the process by which a computer maps an acoustic speech signal to text.

• In other words, the task of ASR is to convert any speech signal into its orthographic representation.

• By contrast, a text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations, such as phonetic transcriptions, into speech.

Applications

• Speech-to-speech translation between pairs of Indian languages

• Command and control applications

• Multimodal interfaces to the computer in Indian languages

• E-mail and SMS readers over the telephone

• Readers for the visually impaired

• Speech-enabled office suite

• Entertainment productions such as games and animations
Steps to Destination

• Preparation of the Speech Data Set
  ▪ Collection of data
  ▪ Segmentation of data
  ▪ Warehousing of data
  ▪ Annotation of data

What is a Speech Corpus?

• A speech corpus is a database of speech audio files and text transcriptions in a format that can be used to create an acoustic model (which can then be used with a speech recognition engine).

• The audio data is stored in files whose format is suited to programmatic processing.
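
The pairing of audio files with transcriptions can be sketched in a few lines of Python. This is only an illustration under assumed conventions: the `Utterance` class, the `load_corpus` function, and the rule that each `.wav` file sits next to a `.txt` transcript of the same name are hypothetical and are not taken from any particular corpus.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Utterance:
    """One corpus entry: an audio file and its text transcription."""
    audio_path: Path
    transcript: str


def load_corpus(root):
    """Pair every .wav file under root with the .txt file of the same name.

    Assumed layout (hypothetical): foo.wav and foo.txt live side by side.
    Audio files without a transcript are skipped.
    """
    utterances = []
    for wav in sorted(Path(root).rglob("*.wav")):
        txt = wav.with_suffix(".txt")
        if txt.exists():
            text = txt.read_text(encoding="utf-8").strip()
            utterances.append(Utterance(wav, text))
    return utterances
```

A list of such pairs is the typical input when an acoustic model is trained; real toolkits each define their own manifest format.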

Why do we need a Speech Corpus?

• To develop tools that facilitate the collection of high-quality speech data.
▪ To collect data that can be used to build speech recognition, speech synthesis, and speech-to-speech translation systems between the languages spoken in India (including Indian English).

Speech Data Set

• Preplanned written data

• Contains more than nine sections

• Metadata of data set

Example: the LDC-IL Speech Dataset

1. Date Format – 2 (D)

2. Command and Control Words – 250 (W1)

3. Proper Nouns (412 place names and 412 person names) – 824 (W2)

4. Most Frequent Words – 1000 (W3)

5. Form and Function Words – 200 (W5)

6. News domain (news, editorial, essay; each above 500 words) – 450 (T1)

7. Phonetically Balanced Vocabulary – 800 (W4)

8. Phonetically Balanced Sentences – 500 (S)

9. Connected Text created using phonetically balanced vocabulary – 6 (T2)
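
As a quick check of the size of this prompt set, the counts listed above can be summed. The snippet below is a minimal sketch; the `SECTIONS` dictionary simply restates the numbers from the list, and the sum mixes different units (words, sentences, texts), so it is a count of prompt items rather than a measure of recording time.

```python
# Number of prompt items per section, copied from the list above.
SECTIONS = {
    "Date format": 2,
    "Command and control words": 250,
    "Proper nouns": 412 + 412,  # place names + person names
    "Most frequent words": 1000,
    "Form and function words": 200,
    "News domain texts": 450,
    "Phonetically balanced vocabulary": 800,
    "Phonetically balanced sentences": 500,
    "Connected text": 6,
}

# Total number of prompt items across all nine sections.
total_items = sum(SECTIONS.values())
print(total_items)  # 4032
```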

Metadata of the Dataset

1. Name and Address of the Speaker
2. Dataset ID
3. Gender
4. Age
5. Educational Qualification
6. Place of Elementary Education
7. Mother Tongue
8. Place
9. Region
10. Investigator Information
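
One way to keep these ten fields together is a small record type. The sketch below is an assumption for illustration only: the class name `SpeakerMetadata`, the field names, and the field types are hypothetical and do not reflect the actual LDC-IL storage format.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class SpeakerMetadata:
    """The ten metadata fields listed above (names and types are assumed)."""
    name_and_address: str
    dataset_id: str
    gender: str
    age: int
    educational_qualification: str
    place_of_elementary_education: str
    mother_tongue: str
    place: str
    region: str
    investigator: str

    def to_json(self):
        """Serialize the record so it can be stored next to the audio."""
        return json.dumps(asdict(self), ensure_ascii=False)
```

Storing one such record per speaker makes it possible to filter the recordings later, for example by gender, age, or region.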

Number of Speakers

• Data will be collected from a minimum of 450 speakers (225 male and 225 female) for each language.

Age group       Male   Female   Total
16 to 20          24       24      48
21 to 50         135      135     270
51 and above      66       66     132
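
The age-group quotas can be checked against the stated totals with a short script. This is a minimal sketch; the `QUOTAS` table restates the figures above, while the `age_group` helper and its group labels are hypothetical names introduced for illustration.

```python
# (male, female) speaker quotas per age group, from the slide above.
QUOTAS = {
    "16-20": (24, 24),
    "21-50": (135, 135),
    "51+": (66, 66),
}


def age_group(age):
    """Map a speaker's age to one of the three quota groups."""
    if age < 16:
        raise ValueError("speakers below 16 are outside the stated age groups")
    if age <= 20:
        return "16-20"
    if age <= 50:
        return "21-50"
    return "51+"


male_total = sum(m for m, _ in QUOTAS.values())
female_total = sum(f for _, f in QUOTAS.values())
print(male_total, female_total, male_total + female_total)  # 225 225 450
```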

Speech Segmentation
Segmentation of data:
• The collected speech data is in continuous form, so it has to be segmented according to the various content types, i.e., text, sentences, and words.

Segmentation guideline

Segmentation tools:
• WaveSurfer and Praat are the tools used for segmenting the speech data.
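
Once segment boundaries have been marked in such a tool, cutting the continuous recording is mechanical. The sketch below is not how WaveSurfer or Praat work internally; it is an assumed helper, `cut_segment`, that uses only Python's standard `wave` module and takes boundaries in seconds.

```python
import wave


def cut_segment(src_path, dst_path, start_s, end_s):
    """Copy the audio between start_s and end_s (seconds) into a new WAV file.

    Returns the number of audio frames written.
    """
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        start = int(round(start_s * rate))
        # Clamp the end boundary to the length of the recording.
        end = min(int(round(end_s * rate)), src.getnframes())
        if not 0 <= start < end:
            raise ValueError("segment boundaries fall outside the recording")
        src.setpos(start)
        frames = src.readframes(end - start)
        params = src.getparams()
    with wave.open(dst_path, "wb") as dst:
        # Reuse channel count, sample width, and rate of the source file;
        # the frame count in the header is corrected when the file closes.
        dst.setparams(params)
        dst.writeframes(frames)
    return end - start
```

For a 16 kHz recording, `cut_segment("full.wav", "word_0001.wav", 0.25, 0.75)` would write a half-second file of 8000 frames; both file names here are placeholders.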

Warehousing:
• After the data has been segmented according to the various content types, it has to be warehoused properly. The data is stored separately for each content type, using the metadata information.
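
A warehouse layout that separates content types and uses metadata in the file names could look like the following. The directory scheme, the `warehouse_path` function, and the speaker ID format are all assumptions made for illustration; the slides do not specify the actual layout.

```python
from pathlib import PurePosixPath


def warehouse_path(root, language, content_code, speaker_id, index):
    """Build the storage path for one segmented audio file.

    Assumed layout: <root>/<language>/<content code>/<speaker>/<file>.wav,
    where the content code is a section label such as "W1".
    """
    filename = f"{speaker_id}_{content_code}_{index:04d}.wav"
    return PurePosixPath(root) / language / content_code / speaker_id / filename


print(warehouse_path("corpus", "kannada", "W1", "SPK001", 7))
# corpus/kannada/W1/SPK001/SPK001_W1_0007.wav
```

Encoding the content type and speaker in both the directory and the file name keeps each segment traceable to its metadata record even if files are moved.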