Speech Coding and Phoneme Classification Using MATLAB and NeuralWorks
Brett A. St. George, Ellen C. Wooten, and Louiza Sellami Department of Electrical Engineering U.S. Naval Academy Annapolis, MD 21402
Abstract - Applications involving speech coding and phonetic classification are introduced as educational tools for reinforcing signal processing concepts learned in senior-level communication classes at the U.S. Naval Academy. These applications utilize the software packages MATLAB* and NeuralWorks* and are used here to explore the concepts of impulse sampling, Fourier transforms, data windowing, and homomorphic filtering.
Introduction

In 1963 the Naval Academy instituted the Trident Scholar undergraduate study program to provide a select number of exceptionally capable students the opportunity to perform independent research during their senior year. As a Trident Scholar, I used MATLAB and NeuralWorks software to create a method for identifying different voice patterns within my own speech signal. Many of the MATLAB m-files that were written could be instituted as labs that teach students the properties of the Fast Fourier Transform (FFT) while presenting different windowing and filtering techniques. Structuring a course around such a sequence of labs would allow students to develop a system for performing speech coding and phoneme classification while reinforcing many digital signal processing concepts learned in the classroom.

Because speech coding involves extracting information from an analog, time-based signal and improving the efficiency of speech recognition, many of the techniques and algorithms employed inherently involve digital filtering, data windowing, and spectral analysis. In this paper, several experiments with an analog speech signal are presented that could be instituted as labs to reinforce communication theory that students are receiving in the classroom. Specifically, these experiments allow students to extract the pitch frequency of their voice as well as those frequencies that add constructively within the oral cavity. These frequencies are known as formants and can be used to classify voice data using a phonetic alphabet comprised of 40 distinct sounds called phonemes. In these labs many of the important properties intrinsic to the FFT and discrete sampling are introduced. For example, the Hermitian symmetry of the amplitude spectrum would become evident in experiments with various padding lengths and data windows. Further properties of the FFT as well as various filtering routines could be instituted as labs to supplement digital signal processing theory.

Speech Coding

The flow chart depicted in Fig. 1 maps the method of our speech coding process and classification algorithm. To understand the speech coding process, it is necessary to begin with a description of the physical nature of speech. Sound is produced when air is forced from the lungs and is filtered by variations in the vocal tract shape to produce a speech signal [1, p. 53]. These variations in shape determine the characteristics of the filtering function that shapes the frequency spectrum of the final speech signal. If this filtering function can be directly extracted from a sampled speech signal, it can be used to identify which phonetic character is being pronounced. To begin the speech coding process, suppose that the vocal tract acts as a linear time-invariant system within a
[Flowchart: Speech → sample speech signal and write data to .WAV file → MATLAB: (1) access .WAV file; (2) window voice data; (3) add zero padding; (4) normalize to unity RMS; (5) cepstral analysis; (6) homomorphic filtering; (7) formant estimation → NeuralWorks: (1) construct backpropagation network; (2) train network using processed data for all phonemes; (3) phonetically classify time slices of unknown speech signal]

Fig. 1 Flowchart of speech coding and classification algorithm

*MATLAB is a trademark of The MathWorks, Inc. NeuralWorks is a trademark of NeuralWare, Inc.
sufficiently short time slice. This is a valid assumption for a short segment of speech over which the vocal tract shape remains fixed. Hence, the frequency representation of a speech segment, F(jω), can be represented as the product of an excitation source, E(jω), and a transfer function, H(jω):

F(jω) = E(jω)H(jω)   (1)
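To make the source-filter model of Eq. 1 concrete before the deconvolution steps that follow, the short MATLAB sketch below synthesizes a vowel-like slice by driving a chain of two-pole resonators with a 125 Hz impulse train. This is our illustration rather than code from the original labs; the formant frequencies and bandwidths are assumed values.

    % Source-filter demo of Eq. 1: excitation e(t) shaped by a vocal-tract
    % transfer function H.  All parameter values are illustrative.
    fs = 22050;                      % sampling rate used in the paper
    f0 = 125;                        % pitch frequency (Hz)
    n  = round(0.030 * fs);          % one 30 msec slice
    e  = zeros(n, 1);
    e(1:round(fs/f0):end) = 1;       % impulse train at the pitch period
    fmt = [500 1500 2500];           % assumed formant frequencies (Hz)
    bw  = [ 60   90  120];           % assumed formant bandwidths (Hz)
    x = e;
    for k = 1:3                      % cascade of two-pole resonators = H
        r  = exp(-pi * bw(k) / fs);          % pole radius from bandwidth
        th = 2 * pi * fmt(k) / fs;           % pole angle from formant
        x  = filter(1 - r, [1, -2*r*cos(th), r^2], x);
    end
    % In the frequency domain, fft(x) is the product E(jw).*H(jw) of Eq. 1.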
To separate these two functions, the complex logarithm of Eq. 1 is used to create a real part and an imaginary part that reflect the magnitude and phase angle, respectively:

Ln(F(jω)) = Ln|F(jω)| + jθF(jω)   (2)

where θF(jω) denotes the phase angle of F(jω). Because the phase angle carries only information about the time origin of the signal, the imaginary part of Eq. 2 can be ignored [2, p. 376]. The real values, however, represent a magnitude function in which the excitation source and transfer function are additive in the frequency domain [3, p. 266]:

Ln|F(jω)| = Ln|E(jω)| + Ln|H(jω)|   (3)

Taking the Inverse Fast Fourier Transform (IFFT) of the real portion of the complex logarithm, Eq. 3, produces a time domain signal in which the logarithms of the excitation source and impulse response are separable. This filtering process is known as homomorphic deconvolution and involves a technique referred to as "cepstral deconvolution." To employ this technique, the combined signals need to have their main components of energy concentrated at different frequencies [3, p. 266]. This condition holds for speech when only the transfer function and excitation source are considered. The transfer function describing the vocal tract, H(jω), has a band-limiting characteristic that confines the energy of the speech signal within a 5 kHz range. Conversely, the excitation source, E(jω), can be modeled as a white noise source, which contains an equal distribution of energy across a frequency range far in excess of 5 kHz. Because the primary energy components of the two functions do not overlap, cepstral analysis applies. This technique allows students to readily understand how the properties of logarithms can be used to remove unwanted noise from a frequency-banded signal.
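The additivity in Eq. 3 is easy for students to verify numerically. The following sketch, with toy signals of our own choosing, confirms that when f is the convolution of an excitation e with an impulse response h, the log magnitude spectrum of f is the sum of the individual log magnitude spectra.

    % Numerical check of Eq. 3 with toy signals (illustrative only).
    e = randn(1, 256);                            % white-noise excitation
    h = exp(-(0:63)/8) .* cos(2*pi*0.1*(0:63));   % toy impulse response
    f = conv(e, h);                               % time-domain speech model
    N = 512;                       % N >= length(f): linear conv = circular
    Fl = log(abs(fft(f, N)));
    El = log(abs(fft(e, N)));
    Hl = log(abs(fft(h, N)));
    max(abs(Fl - (El + Hl)))       % ~1e-13: the logs add, as in Eq. 3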
These algorithmic processes are fundamental to speech coding and are easily implemented in MATLAB, as demonstrated in the next section. The process culminates when the formant frequencies derived from a segment of speech are coded into a training set vector in NeuralWorks that correlates the data to a particular phoneme. A simulation using the phoneme "e" as in "bet" is now introduced.

MATLAB and NeuralWorks Simulation

By analyzing the periodicity of time slices of a speech signal, segments can be classified as voiced or unvoiced. During voiced speech the vocal cords vibrate at a constant frequency known as the fundamental or pitch frequency. Alternatively, in unvoiced speech the vocal cords do not move and air is forced past the glottis, tongue, teeth, and lips [1, p. 36]. These characteristics become apparent when students view their speech as a MATLAB plot. In Fig. 2, the periodicity of the speech signal can be observed for the voiced phoneme "e". This signal was sampled using a 16-bit analog-to-digital converter and was then stored as a binary .WAV file. The sampling rate (22.05 kHz) and storage of data were controlled by a soundcard driver. The speech signal in Fig. 2 represents the sampled
points that are passed as a data vector into MATLAB. Within MATLAB, the voice data is segmented into 30 msec time slices (records) and fitted to various data windows (Fig. 3).

Fig. 2 Voice data for the phoneme "e" as in "bet"
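The access and segmentation steps translate directly into a few MATLAB lines. The sketch below assumes the recording has been stored as a .WAV file as in Fig. 1; audioread is the current MATLAB routine (the original work accessed the data through a soundcard driver), and the file name is hypothetical.

    % Steps (1)-(2) of Fig. 1: access the .WAV file and window one record.
    [vd, fs] = audioread('phoneme_e.wav');   % fs should be 22050 Hz
    vd = vd(:, 1);                           % keep one channel
    sliceLen = round(0.030 * fs);            % 30 msec record
    slice = vd(1:sliceLen) .* hamming(sliceLen);  % Hamming-windowed record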
Fig. 3 Time slice of the phoneme "e" with Hamming window applied
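Students can see why the choice of data window matters by comparing the spectra that different windows produce from the same record. The sketch below, our illustration continuing from vd and fs above, contrasts the rectangular and Hamming windows, whose side-lobe leakage differs markedly.

    % Compare side-band leakage of rectangular vs. Hamming windows.
    rec  = vd(1:round(0.030 * fs));            % unwindowed 30 msec record
    N    = 4096;                               % generous zero padding
    Srec = 20*log10(abs(fft(rec, N)) + eps);                    % rectangular
    Sham = 20*log10(abs(fft(rec .* hamming(length(rec)), N)) + eps);
    f    = (0:N-1)' * fs / N;
    idx  = f < 5000;                           % speech band of interest
    plot(f(idx), Srec(idx), f(idx), Sham(idx));
    legend('rectangular', 'Hamming'); xlabel('Hertz'); ylabel('dB');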
Fig. 4 Amplitude spectrum for 30 msec time slice of phoneme "e"
The windowed data is then padded with zeros to a power of 2, normalized to unity RMS, and transformed using the radix-2 FFT (Fig. 4). This process accounts for any variations in the intensity of the voice and adds resolution to the frequency domain output. These aspects provide students an opportunity to experiment with the properties of the FFT while observing the results created by applying various windowing functions to the padded voice data. Throughout this experimental process, students are forced to consider any masking effects that these signal processing techniques might impose on the output.

To ensure that these techniques are being implemented correctly, various checks are available. The first of these is the computation of the pitch frequency and involves a technique called cepstral analysis. In this process, a second Fourier analysis is applied to the logarithm of the frequency spectrum (Fig. 4). The resultant function is called the cepstrum and contains a spike at each harmonic of the pitch period [2, p. 362]. The time domain function in Fig. 5 shows an initial spike at 8 msec. This spike corresponds to a pitch of 125 Hz. Students can compare the periodicity of the waveform presented in Fig. 2 with the pitch period computed using cepstral analysis as shown in Fig. 5 and verify the validity of this analysis technique. Once in the cepstral domain, a time window applied to the real part separates the impulse response from the excitation source, thus revealing the formant frequencies (Fig. 6) that cause the vocal tract to resonate. As pictured in Fig. 6, these formant frequencies appear as peaks and can be masked if the side-band leakage is too great or the resolution is poor. From such plots students can judge the effectiveness of their choice of padding length and data window.
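Steps (3)-(5) of Fig. 1 amount to only a few MATLAB lines. The sketch below, continuing from the windowed slice above, pads to a power of 2, normalizes to unity RMS, and locates the pitch spike in the cepstrum; the quefrency search range is our assumption.

    % Steps (3)-(5) of Fig. 1: zero padding, RMS normalization, FFT, and
    % the cepstral pitch check. Continues from `slice` and `fs` above.
    N = 2^nextpow2(2 * length(slice));        % pad to a power of 2
    s = slice / sqrt(mean(slice.^2));         % normalize to unity RMS
    S = fft(s, N);                            % radix-2 FFT of padded record
    c = real(ifft(log(abs(S) + eps)));        % real cepstrum (Eq. 3 + IFFT)
    % The pitch spike lies at the pitch period; search 2-20 msec
    % quefrencies (an assumed range covering typical voices).
    q = round(0.002*fs) : round(0.020*fs);
    [pk, i] = max(c(q));
    pitchHz = fs / q(i)                       % ~125 Hz for this speaker's "e"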
Fig. 5 Cepstrum (real part of the complex cepstrum) for 30 msec time slice of the phoneme "e"

Fig. 6 Formant estimation (normalized to unity peak value) for 30 msec time slice of the phoneme "e"
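Steps (6)-(7) of Fig. 1 can be sketched as a low-time lifter that keeps the slowly varying vocal-tract component of the cepstrum; transforming back gives the smoothed log spectrum whose peaks below 5 kHz are the formant estimates. This continues from the cepstrum c above; the 2 msec cutoff and the use of findpeaks are our choices, not taken from the paper's m-files.

    % Steps (6)-(7) of Fig. 1: homomorphic filtering and formant estimation.
    Lc = round(0.002 * fs);               % keep quefrencies below ~2 msec,
                                          % i.e. below the 8 msec pitch spike
    lifter = zeros(N, 1);
    lifter([1:Lc, N-Lc+2:N]) = 1;         % symmetric low-time lifter
    Henv = real(fft(c .* lifter, N));     % smoothed log-magnitude spectrum
    f = (0:N-1)' * fs / N;
    band = f < 5000;                      % formants lie within 5 kHz
    [pks, locs] = findpeaks(Henv(band));  % Signal Processing Toolbox
    formantsHz = f(locs)                  % peak locations = formant estimates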
With these frequency characteristics resolved, a neural network can be used to identify different phonetic pronunciations [1, p. 48]. The formant frequencies derived by this technique are used to create a set of vectors that trains a backpropagation neural network to recognize different phonemes. Because the vocal tract shape is well defined for the pronunciation of each phoneme but changes slowly with time, the formant frequencies reveal more information than is presented by the time domain analysis of the original speech waveform. In NeuralWorks students are able to observe the global error function converging to zero. This function is proportional to the square of the Euclidean distance between the desired output and the actual output of the network for a particular input pattern [4, p. 69]. The error function is displayed as a strip chart in Fig. 7. In addition, a graphical matrix that quantifies the classification rate of the network is also displayed. This rate describes the ability of the network to correctly classify the training and test data. By observing these parameters and their convergence, students are able to judge the effectiveness of the data windowing and homomorphic filtering processes.
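NeuralWorks performs the network construction and training through its graphical interface, so no user code is required. For students who want to see the underlying computation, the sketch below implements one epoch of gradient-descent backpropagation for the 255-400-40 architecture described in the caption of Fig. 7. The learning rate, initialization, and logistic activations are our assumptions, and P and T stand for hypothetical matrices of input vectors and one-hot phoneme targets.

    % One training epoch of a 255-400-40 backpropagation network (sketch).
    % P: 255 x M inputs (vocal-tract frequency-response vectors);
    % T: 40 x M one-hot phoneme targets. eta is an assumed learning rate.
    eta = 0.1;
    W1 = 0.1*randn(400, 255);  b1 = zeros(400, 1);
    W2 = 0.1*randn(40, 400);   b2 = zeros(40, 1);
    for m = 1:size(P, 2)
        % Forward pass with logistic activations.
        h = 1 ./ (1 + exp(-(W1*P(:,m) + b1)));
        y = 1 ./ (1 + exp(-(W2*h + b2)));
        % Global error: squared Euclidean distance to the target [4, p. 69].
        err = T(:,m) - y;
        % Backward pass: delta rules for output and hidden layers.
        d2 = err .* y .* (1 - y);
        d1 = (W2' * d2) .* h .* (1 - h);
        W2 = W2 + eta * d2 * h';      b2 = b2 + eta * d2;
        W1 = W1 + eta * d1 * P(:,m)'; b1 = b1 + eta * d1;
    end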
Fig. 7 NeuralWorks display of a back-propagation neural network trained to phonetically classify time slices of a speech signal. The network consists of 255 inputs that represent the frequency response of the vocal tract. The hidden layer contains 400 processing elements and is fully connected to an output layer of 40 elements, each corresponding to one phoneme.

Each of the routines described in the preceding discussion is implemented as either program code in MATLAB or as a design file in NeuralWorks. Through this process, students gain better insight into signal processing theory by applying speech coding techniques to sampled voice data.
Conclusions
In this paper, speech coding and classification techniques have been explored via MATLAB and NeuralWorks. These software tools allowed us to sample an analog speech signal, find the pitch and formant frequencies, and phonetically classify voice data. The speech coding algorithms used here involve digital filtering, data windowing, and spectral analysis. This particular application provided a means of presenting some aspects of diverse signal processing theory in a graphical and procedural manner. Each of the techniques introduced in this paper can be implemented as a lab to support signal processing theory learned in the classroom. Through such a sequence of labs, students learn the properties of the FFT while discovering the trade-offs of various data windowing techniques and filtering routines.
References

1. Pelton, Gordon, Voice Processing, McGraw-Hill, 1993.
2. Rabiner, Lawrence and Schafer, Ronald, Digital Processing of Speech Signals, Prentice-Hall, 1978.
3. Proakis, John and Manolakis, Dimitris, Digital Signal Processing, 3rd Edition, Prentice-Hall, 1996.
4. NeuralWare, Inc., Neural Computing, NeuralWare, 1995.