ASR Development for Indian Languages

Uploaded by

Sanath

Building an Automatic Speech Recognition (ASR)

2/20/2024 1
Sound Wave
What is ASR?

• Automatic Speech Recognition (ASR) is the process by which a computer maps an acoustic speech signal to text.

• In other words, the task of ASR is to convert any speech signal into its orthographic representation.

• By contrast, a text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations, such as phonetic transcriptions, into speech.

Applications
• Speech-to-speech translation for a pair of Indian languages

• Command and control applications

• Multimodal interfaces to the computer in Indian languages

• E-mail and SMS readers over the telephone

• Readers for the visually impaired

• Speech-enabled office suite

• Entertainment productions such as games and animations

Steps to Destination

• Preparation of Speech Data Set
• Collection of Data
• Segmentation of Data
• Warehousing of Data
• Annotation of Data

What is a Speech Corpus?

• A speech corpus is a database of speech audio files and their text transcriptions, in a format that can be used to create an acoustic model (which can then be used with a speech recognition engine).

• The data is held in audio files in a format suited to programmatic processing.

Why do we need a Speech Corpus?

• To develop tools that facilitate the collection of high-quality speech data.
▪ To collect data that can be used for building speech recognition and speech synthesis systems, and for providing speech-to-speech translation from one language to another language spoken in India (including Indian English).

Speech Data Set

• Preplanned written data

• Contains more than nine sections

• Metadata of data set

An Example (LDC-IL) Speech Dataset
1. Date Format – 2 (D)
2. Command and Control Words – 250 (W1)
3. Proper Nouns: 412 place names and 412 person names – 824 (W2)
4. Most Frequent Words – 1000 (W3)
5. Form and Function Words – 200 (W5)
6. News domain: news, editorial, essay, each above 500 words – 450 (T1)
7. Phonetically Balanced Vocabulary – 800 (S)
8. Phonetically Balanced Sentences – 500 (W4)
9. Connected Text created using phonetically balanced vocabulary – 6 (T2)
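To get a feel for the size of the prompt sheet each speaker reads, the section counts listed above can be added up. The following is only a minimal sketch: the section codes and counts are copied from the list, while the names `SECTIONS`, `total_items`, and `items_by_code` are hypothetical and not part of any LDC-IL tooling.

```python
# Section sizes copied from the slide: (code, description, item count).
SECTIONS = [
    ("D", "Date format", 2),
    ("W1", "Command and control words", 250),
    ("W2", "Proper nouns (412 place + 412 person names)", 824),
    ("W3", "Most frequent words", 1000),
    ("W5", "Form and function words", 200),
    ("T1", "News-domain texts", 450),
    ("S", "Phonetically balanced vocabulary", 800),
    ("W4", "Phonetically balanced sentences", 500),
    ("T2", "Connected text", 6),
]


def total_items(sections):
    """Return the total number of prompt items across all sections."""
    return sum(count for _, _, count in sections)


def items_by_code(sections):
    """Map each section code to its item count."""
    return {code: count for code, _, count in sections}


if __name__ == "__main__":
    print(total_items(SECTIONS))  # 4032 items per speaker
```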

Metadata of dataset

1. Name and Address of the Speaker
2. Dataset ID
3. Gender
4. Age
5. Educational Qualification
6. Place of Elementary Education
7. Mother tongue
8. Place
9. Region
10. Investigator Information
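One way to keep these ten metadata items together is a small record type with a completeness check. This is a sketch under assumptions: the field names below are hypothetical renderings of the items above, and the slides do not specify any storage format.

```python
from dataclasses import dataclass, fields


@dataclass
class SpeakerMetadata:
    # Field names are illustrative; they mirror the ten metadata items.
    # Item 1 (name and address) is split into two fields here.
    speaker_name: str
    speaker_address: str
    dataset_id: str
    gender: str
    age: int
    educational_qualification: str
    place_of_elementary_education: str
    mother_tongue: str
    place: str
    region: str
    investigator: str

    def missing_fields(self):
        """Return the names of fields that were left empty."""
        return [
            f.name for f in fields(self)
            if getattr(self, f.name) in ("", None)
        ]
```

A record with an empty `region` would report `["region"]` from `missing_fields()`, which lets an investigator catch incomplete forms before the recording is stored.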

Number of Speakers

• Data will be collected from a minimum of 450 speakers (225 male and 225 female) for each language.

Age group      Male   Female   Total
16 to 20         24       24      48
21 to 50        135      135     270
51 and above     66       66     132
Total           225      225     450
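The speaker quotas above can be tracked during collection with a small helper. This is a minimal sketch: the quota figures come from the slide, while `QUOTAS`, `age_group`, and `remaining_quota` are hypothetical names, and the gender codes "M" and "F" are an assumed encoding.

```python
from collections import Counter

# Minimum speakers per (age group, gender) cell, as given on the slide.
QUOTAS = {
    ("16-20", "M"): 24, ("16-20", "F"): 24,
    ("21-50", "M"): 135, ("21-50", "F"): 135,
    ("51+", "M"): 66, ("51+", "F"): 66,
}


def age_group(age):
    """Map an age in years to one of the three age groups."""
    if 16 <= age <= 20:
        return "16-20"
    if 21 <= age <= 50:
        return "21-50"
    if age >= 51:
        return "51+"
    raise ValueError(f"age {age} is below the youngest age group")


def remaining_quota(speakers):
    """Given (age, gender) pairs already recorded, return speakers still needed per cell."""
    recorded = Counter((age_group(age), gender) for age, gender in speakers)
    return {cell: quota - recorded[cell] for cell, quota in QUOTAS.items()}
```

For example, after recording an 18-year-old male and a 30-year-old female, `remaining_quota` reports 23 still needed in the ("16-20", "M") cell and 134 in ("21-50", "F").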

Speech Segmentation
Segmentation of data:
• Collected speech data is in continuous form, so it has to be segmented according to the content types: text, sentences, and words.

Segmentation guideline

Segmentation tools:
• WaveSurfer and Praat are the tools used for segmenting the speech data.
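Once segment boundaries have been marked in such a tool, cutting the continuous recording into separate files is mechanical. The sketch below is not how WaveSurfer or Praat work internally; it assumes the boundaries have already been exported as (start seconds, end seconds, label) tuples and uses only Python's standard `wave` module. The function name `cut_segments` and the label-based file naming are hypothetical.

```python
import os
import wave


def cut_segments(src_path, boundaries, out_dir):
    """Write one WAV file per (start_sec, end_sec, label) boundary.

    Returns the list of output paths in the order given.
    """
    written = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        for start, end, label in boundaries:
            if end <= start:
                raise ValueError(f"empty segment for label {label!r}")
            first = int(start * rate)          # first frame of the segment
            count = int(end * rate) - first    # number of frames to copy
            src.setpos(first)
            frames = src.readframes(count)
            out_path = os.path.join(out_dir, f"{label}.wav")
            with wave.open(out_path, "wb") as dst:
                # Same channels, sample width, and rate as the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(out_path)
    return written
```

A call such as `cut_segments("session.wav", [(0.0, 0.25, "W1_001")], "out")` would write `out/W1_001.wav`; the file names here are placeholders.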

Warehousing:
• After the data is segmented by content type, it has to be warehoused properly: the data is stored separately for each content type, organized using the metadata information.
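As an illustration of warehousing by content type and metadata, a storage path can be derived from a few metadata fields. The directory layout, file-naming pattern, and the name `warehouse_path` are assumptions made for this sketch; the slides do not describe the actual layout used.

```python
from pathlib import PurePosixPath

# Content types named on the segmentation slide.
CONTENT_TYPES = {"text", "sentences", "words"}


def warehouse_path(root, language, content_type, dataset_id, gender, item_no):
    """Build a storage path for one segmented audio file (hypothetical layout)."""
    if content_type not in CONTENT_TYPES:
        raise ValueError(f"unknown content type: {content_type!r}")
    filename = f"{dataset_id}_{gender}_{content_type}_{item_no:04d}.wav"
    return PurePosixPath(root) / language / content_type / dataset_id / filename
```

Grouping first by language and content type keeps all word-level files together for training, while the dataset ID in the path links each file back to its speaker metadata.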
