Voice Recognition with Neural Networks,
Type-2 Fuzzy Logic and Genetic Algorithms
1. Apeksha Reddy, SDMCET, Dharwad. Email: apeksha.r.r@gmail.com
2. Ashrit Mangesh R, SDMCET, Dharwad. Email: mangeshashrit@gmail.com
Abstract: We describe in this paper the use of neural networks, fuzzy logic and genetic algorithms for voice recognition. In particular, we consider the case of speaker recognition by analyzing the sound signals with the help of intelligent techniques, such as neural networks and fuzzy systems. We use the neural networks for analyzing the sound signal of an unknown speaker, and after this first step, a set of type-2 fuzzy rules is used for decision making. We need to use fuzzy logic due to the uncertainty of the decision process. We also use genetic algorithms to optimize the architecture of the neural networks. We illustrate our approach with a sample of sound signals from real speakers in our institution.

Index Terms: Type-2 Fuzzy Logic, Neural Networks, Genetic Algorithms, Voice Recognition.
                       I. INTRODUCTION
Speaker recognition, which can be classified into identification and verification, is the process of automatically recognizing who is speaking on the basis of individual information included in speech waves. This technique makes it possible to use the speaker's voice to verify their identity and to control access to services such as voice dialling, banking by telephone, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers [10].

Fig. 1 shows the basic components of speaker identification and verification systems. Speaker identification is the process of determining which registered speaker provides a given utterance. Speaker verification, on the other hand, is the process of accepting or rejecting the identity claim of a speaker. Most applications in which a voice is used as the key to confirm the identity of a speaker are classified as speaker verification [11].

Speaker recognition methods can also be divided into text-dependent and text-independent methods. The former require the speaker to say key words or sentences having the same text for both training and recognition trials, whereas the latter do not rely on a specific text being spoken. Both text-dependent and text-independent methods share a problem, however: these systems can be deceived easily, because someone who plays back the recorded voice of a registered speaker saying the key words or sentences can be accepted as the registered speaker. To cope with this problem, there are methods in which a small set of words, such as digits, are used as key words, and each user is prompted to utter a given sequence of key words that is randomly chosen every time the system is used. Yet even this method is not completely reliable, since it can be deceived with advanced electronic recording equipment that can reproduce key words in a requested order. Therefore, a text-prompted speaker recognition method has recently been proposed.

Fig. 1. Basic structure of speaker recognition systems: (a) speaker identification; (b) speaker verification.

II. TRADITIONAL METHODS FOR SPEAKER RECOGNITION

Speaker identity is correlated with the physiological and behavioural characteristics of the speaker. These characteristics exist both in the spectral envelope (vocal tract characteristics) and in the supra-segmental features (voice source characteristics and dynamic features spanning several segments).
The most common short-term spectral measurements currently used are Linear Predictive Coding (LPC)-derived cepstral coefficients and their regression coefficients. A spectral envelope reconstructed from a truncated set of cepstral coefficients is much smoother than one reconstructed from LPC coefficients, so it provides a more stable representation from one repetition to another of a particular speaker's utterances. As for the regression coefficients, typically the first- and second-order coefficients are extracted at every frame period to represent the spectral dynamics. These coefficients are derivatives of the time functions of the cepstral coefficients and are respectively called the delta-cepstral and delta-delta-cepstral coefficients.
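To make the regression step concrete, the following is a minimal numpy sketch of the standard delta-coefficient formula; the window size N, the edge padding, and the random stand-in data are our assumptions rather than details given in the text:

```python
import numpy as np

def delta(features: np.ndarray, N: int = 2) -> np.ndarray:
    """Regression-based delta coefficients for a (frames x coeffs) matrix."""
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")  # repeat edge frames
    denom = 2 * sum(n * n for n in range(1, N + 1))
    T = len(features)
    # d[t] = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2)
    return sum(n * (padded[N + n:T + N + n] - padded[N - n:T + N - n])
               for n in range(1, N + 1)) / denom

cepstra = np.random.randn(100, 12)  # stand-in for real cepstral frames
d1 = delta(cepstra)                 # delta-cepstral coefficients
d2 = delta(d1)                      # delta-delta-cepstral coefficients
```

Applying the same regression to the delta coefficients, as in the last line, yields the second-order (delta-delta) dynamics.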
A. Normalization Techniques

The most significant factor affecting automatic speaker recognition performance is variation in the signal characteristics from trial to trial (inter-session variability and variability over time). Variations arise from the speakers themselves, from differences in recording and transmission conditions, and from background noise. Speakers cannot repeat an utterance precisely the same way from trial to trial. It is well known that samples of the same utterance recorded in one session are much more highly correlated than samples recorded in separate sessions. There are also long-term changes in voices. It is important for speaker recognition systems to accommodate these variations. Two types of normalization techniques have been tried: one in the parameter domain, and the other in the distance/similarity domain.
B. Parameter-Domain Normalization

Spectral equalization, the so-called blind equalization method, is a typical normalization technique in the parameter domain that has been confirmed to be effective in reducing linear channel effects and long-term spectral variation. This method is especially effective for text-dependent speaker recognition applications that use sufficiently long utterances. Cepstral coefficients are averaged over the duration of an entire utterance, and the averaged values are subtracted from the cepstral coefficients of each frame. Additive variation in the log-spectral domain can be compensated for fairly well by this method. However, it unavoidably removes some text-dependent and speaker-specific features, so it is inappropriate for short utterances in speaker recognition applications.
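The averaging-and-subtraction step just described (commonly known as cepstral mean normalization) is small enough to sketch directly; the array layout is our assumption:

```python
import numpy as np

def cepstral_mean_normalization(cepstra: np.ndarray) -> np.ndarray:
    """Subtract the utterance-wide average from every frame's cepstrum.

    cepstra: (num_frames, num_coeffs) matrix of cepstral coefficients.
    Removes additive biases in the log-spectral domain, e.g. a linear channel.
    """
    mean = cepstra.mean(axis=0, keepdims=True)  # average over the whole utterance
    return cepstra - mean                       # per-frame normalized cepstra
```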
C. Distance/Similarity-Domain Normalization

A normalization method for distance (similarity, likelihood) values using a likelihood ratio has been proposed. The likelihood ratio is defined as the ratio of two conditional probabilities of the observed measurements of the utterance: the first probability is the likelihood of the acoustic data given the claimed identity of the speaker, and the second is the likelihood given that the speaker is an impostor. Likelihood-ratio normalization approximates optimal scoring in the Bayes sense.

A normalization method based on a posteriori probability has also been proposed. The difference between the normalization method based on the likelihood ratio and the method based on a posteriori probability is whether or not the claimed speaker is included in the speaker set for normalization; the speaker set used in the method based on the likelihood ratio does not include the claimed speaker, whereas the normalization term for the method based on a posteriori probability is calculated by using all the reference speakers, including the claimed speaker.

Experimental results indicate that the two normalization methods are almost equally effective. They both improve speaker separability and reduce the need for speaker-dependent or text-dependent thresholding, as compared with scoring using only a model of the claimed speaker.

A new method has recently been proposed in which the normalization term is approximated by the likelihood of a single mixture model representing the parameter distribution for all the reference speakers. An advantage of this method is that the computational cost of calculating the normalization term is very small, and it has been confirmed to give much better results than either of the above-mentioned normalization methods.
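As an illustration of likelihood-ratio scoring, here is a hedged sketch using Gaussian mixture models as the speaker and impostor (background) models; the mixture sizes, feature dimensionality and random stand-in data are our assumptions, not details from the text:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def likelihood_ratio_score(frames, claimed_gmm, background_gmm):
    """Log likelihood-ratio of an utterance: claimed speaker vs. impostors.

    frames: (num_frames, num_coeffs) feature matrix for the test utterance.
    Positive scores favour the claimed speaker; a threshold decides.
    """
    log_p_claimed = claimed_gmm.score_samples(frames).sum()
    log_p_impostor = background_gmm.score_samples(frames).sum()
    return log_p_claimed - log_p_impostor

# Hypothetical usage with random stand-ins for real cepstral features.
claimed = GaussianMixture(n_components=8).fit(np.random.randn(500, 12))
background = GaussianMixture(n_components=16).fit(np.random.randn(2000, 12))
score = likelihood_ratio_score(np.random.randn(80, 12), claimed, background)
accept = score > 0.0  # the threshold is application-dependent
```

Using a single background mixture for all reference speakers, as in the last paragraph above, makes the normalization term a single score_samples call, which is what keeps its computational cost small.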
D. Text-Dependent Speaker Recognition Methods

Text-dependent methods are usually based on template-matching techniques. In this approach, the input utterance is represented by a sequence of feature vectors, generally short-term spectral feature vectors. The time axes of the input utterance and of each reference template or reference model of the registered speakers are aligned using a dynamic time warping (DTW) algorithm, and the degree of similarity between them, accumulated from the beginning to the end of the utterance, is calculated (a minimal DTW sketch is given at the end of this subsection).

The hidden Markov model (HMM) can efficiently model statistical variation in spectral features. Therefore, HMM-based methods were introduced as extensions of the DTW-based methods, and have achieved significantly better recognition accuracies.
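The DTW alignment described above can be sketched as the classical dynamic-programming recurrence; the Euclidean local distance and the three-step transition rule are conventional choices, not details specified in the text:

```python
import numpy as np

def dtw_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Accumulated frame-to-frame distance between two utterances.

    x: (n, d) and y: (m, d) sequences of short-term spectral feature vectors.
    Returns the DTW-aligned sum of distances; smaller means more similar.
    """
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])  # local distance
            # Best of the three allowed warping steps.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

# The registered speaker whose reference template is closest is selected.
```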
E. Text-Independent Speaker Recognition Methods

One of the most successful text-independent recognition methods is based on vector quantization (VQ). In this method, VQ code-books consisting of a small number of representative feature vectors are used as an efficient means of characterizing speaker-specific features. A speaker-specific code-book is generated by clustering the training feature vectors of each speaker. In the recognition stage, an input utterance is vector-quantized using the code-book of each reference speaker, and the VQ distortion accumulated over the entire input utterance is used to make the recognition decision (see the sketch below).
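A sketch of this VQ approach, using scipy's k-means clustering to build a code-book and the mean quantization distortion as the matching score; the code-book size is our assumption:

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

def train_codebook(training_frames: np.ndarray, size: int = 32) -> np.ndarray:
    """Cluster a speaker's training feature vectors into a small code-book."""
    codebook, _ = kmeans(training_frames.astype(float), size)
    return codebook

def vq_distortion(frames: np.ndarray, codebook: np.ndarray) -> float:
    """Average distortion of an utterance against one speaker's code-book."""
    _, distances = vq(frames.astype(float), codebook)  # nearest code vector per frame
    return float(distances.mean())

# Identification: the speaker whose code-book yields minimum distortion wins, e.g.
# speakers = {name: train_codebook(frames) for name, frames in training.items()}
# best = min(speakers, key=lambda s: vq_distortion(test_frames, speakers[s]))
```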
Temporal variation in speech signal parameters over the long term can be represented by stochastic Markovian transitions between states. Therefore, methods using an ergodic HMM, where all possible transitions between states are allowed, have been proposed. Speech segments are classified into one of the broad phonetic categories corresponding to the HMM states. After the classification, appropriate features are selected.

In the training phase, reference templates are generated and verification thresholds are computed for each phonetic category. In the verification phase, after the phonetic categorization, a comparison with the reference template for each particular category provides a verification score for that category. The final verification score is a weighted linear combination of the scores from each category.

This method was extended to the richer class of mixture autoregressive (AR) HMMs. In these models, the states are described as a linear combination (mixture) of AR sources. It can be shown that mixture models are equivalent to a larger HMM with simple states, with additional constraints on the possible transitions between states.

It has been shown that a continuous ergodic HMM method is far superior to a discrete ergodic HMM method, and that a continuous ergodic HMM method is as robust as a VQ-based method when enough training data is available. However, when little data is available, the VQ-based method is more robust than a continuous HMM method. A method using statistical dynamic features has recently been proposed. In this method, a multivariate auto-regression (MAR) model is applied to the time series of cepstral vectors and used to characterize speakers. It was reported that identification and verification rates were almost the same as those obtained by an HMM-based method.
F. Text-Prompted Speaker Recognition Method

In the text-prompted speaker recognition method, the recognition system prompts each user with a new key sentence every time the system is used, and it accepts the input utterance only when it decides that it was the registered speaker who repeated the prompted sentence. The sentence can be displayed as characters or spoken by a synthesized voice. Because the vocabulary is unlimited, prospective impostors cannot know in advance what sentence will be requested. Not only can this method accurately recognize speakers, but it can also reject utterances whose text differs from the prompted text, even if they are spoken by the registered speaker. A recorded voice can thus be correctly rejected.

This method is facilitated by using speaker-specific phoneme models as basic acoustic units. One of the major issues in applying this method is how to properly create these speaker-specific phoneme models from training utterances of a limited size. The phoneme models are represented by Gaussian-mixture continuous HMMs or tied-mixture HMMs, and they are made by adapting speaker-independent phoneme models to each speaker's voice. In order to properly adapt the models of phonemes that are not included in the training utterances, a new adaptation method based on tied-mixture HMMs was recently proposed.

In the recognition stage, the system concatenates the phoneme models of each registered speaker to create a sentence HMM, according to the prompted text. Then the likelihood of the input speech matching the sentence model is calculated and used for the speaker recognition decision. If the likelihood is high enough, the speaker is accepted as the claimed speaker.

Although many recent advances and successes in speaker recognition have been achieved, there are still many problems for which good solutions remain to be found. Most of these problems arise from variability, including speaker-generated variability and variability in channel and recording conditions. It is very important to investigate feature parameters that are stable over time, insensitive to variations in speaking manner (including speaking rate and level), and robust against variations in voice quality due to causes such as voice disguise or colds. It is also important to develop methods to cope with distortion due to telephone sets and channels, and with background and channel noise.

From the human-interface point of view, it is important to consider how the users should be prompted, and how recognition errors should be handled. Studies on ways to automatically extract the speech periods of each person separately from a dialogue involving more than two people have recently appeared as an extension of speaker recognition technology. This section was not intended to be a comprehensive review of speaker recognition technology; rather, it was intended to give an overview of recent advances and of the problems that must be solved in the future.
G. Speaker Verification

The speaker-specific characteristics of speech are due to differences in the physiological and behavioural aspects of the speech production system in humans. The main physiological aspect of the human speech production system is the vocal tract shape. The vocal tract modifies the spectral content of an acoustic wave as it passes through it, thereby producing speech. Hence, it is common in speaker verification systems to make use of features derived only from the vocal tract.

The acoustic wave is produced when the airflow from the lungs is carried by the trachea through the vocal folds. This source of excitation can be characterized as phonation, whispering, frication, compression, vibration, or a combination of these. Phonated excitation occurs when the airflow is modulated by the vocal folds. Whispered excitation is produced by airflow rushing through a small triangular opening between the arytenoid cartilages at the rear of the nearly closed vocal folds. Frication excitation is produced by constrictions in the vocal tract. Compression excitation results from releasing a completely closed and pressurized vocal tract. Vibration excitation is caused by air being forced through a closure other than the vocal folds, especially at the tongue. Speech produced by phonated excitation is called voiced, speech produced by phonated excitation plus frication is called mixed voiced, and speech produced by other types of excitation is called unvoiced.

Using cepstral analysis as described in the previous section, an utterance may be represented as a sequence of feature vectors. Utterances spoken by the same person but at different times result in similar, yet different, sequences of feature vectors. The purpose of voice modeling is to build a model that captures these variations in the extracted set of features.

There are two types of models that have been used extensively in speaker verification and speech recognition systems: stochastic models and template models. The stochastic model treats the speech production process as a parametric random process and assumes that the parameters of the underlying stochastic process can be estimated in a precise, well-defined manner. The template model attempts to model the speech production process in a non-parametric manner by retaining a number of sequences of feature vectors derived from multiple utterances of the same word by the same person. Template models dominated early work in speaker verification and speech recognition because the template model is intuitively more reasonable. However, recent work has demonstrated that stochastic models are more flexible and hence allow for better modelling of the speech production process. A very popular stochastic model for modelling the speech production process is the Hidden Markov Model (HMM). HMMs are extensions of conventional Markov models in which the observations are a probabilistic function of the state; i.e., the model is a doubly embedded stochastic process where the underlying stochastic process is not directly observable (it is hidden). The HMM can only be viewed through another set of stochastic processes that produce the sequence of observations.

The pattern matching process involves comparing a given set of input feature vectors against the speaker model for the claimed identity and computing a matching score. For the Hidden Markov models discussed above, the matching score is the probability that a given set of feature vectors was generated by a specific model; a minimal sketch of this computation is given below. We show in Figure 2 a schematic diagram of a typical speaker recognition system.
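To make the matching-score computation concrete, here is a minimal log-domain forward-algorithm sketch for the probability that an HMM generated an observation sequence; evaluating the per-frame emission log-probabilities (e.g. with per-state GMMs) is assumed to have been done elsewhere:

```python
import numpy as np
from scipy.special import logsumexp

def forward_log_likelihood(log_pi, log_A, log_B):
    """Log-probability that an HMM generated an observation sequence.

    log_pi: (S,) initial-state log-probabilities.
    log_A:  (S, S) transition log-probabilities.
    log_B:  (T, S) per-frame emission log-probabilities, already evaluated
            on the observed feature vectors (e.g. by per-state GMMs).
    """
    alpha = log_pi + log_B[0]  # forward variable at frame 0
    for t in range(1, len(log_B)):
        # alpha_j(t) = logsum_i [alpha_i(t-1) + log a_ij] + log b_j(o_t)
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_B[t]
    return float(logsumexp(alpha))

# The claimed speaker's model yields the matching score; for identification,
# the registered speaker whose model scores highest is selected.
```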
III. VOICE CAPTURING AND PROCESSING

The first step for achieving voice recognition is to capture the sound signal of the voice. We use a standard microphone for capturing the voice signal. After this, we use the sound recorder of the Windows operating system to record the sounds that belong to the database of voices of different persons. A fixed recording time is established to ensure homogeneity of the signals. We show in Figure 3 the sound signal recorder used in the experiments.

Fig. 3. Sound recorder used in the experiments.

After capturing the sound signals, these voice signals are digitized at a frequency of 8 kHz, and as a consequence we obtain a signal with 8008 sample points. This information is the one used for analyzing the voice.

We also used the Sound Forge 6.0 computer program for processing the sound signal. This program allows us to cancel noise in the signal, which may have come from environmental noise or from the sensitivity of the microphones. After using this computer program, we obtain a sound signal that is as pure as possible. The program can also use the fast Fourier transform for voice filtering. We show in Figure 4 the use of the computer program for a particular sound signal.

Fig. 4. Main window of the computer program for processing the signals.

We also show in Figure 5 the use of the Fast Fourier Transform (FFT) to obtain the spectral analysis of the word "way" in Spanish.

Fig. 5. Spectral analysis of a specific word using the FFT.
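The spectral analysis of Fig. 5 can be reproduced in outline with numpy's FFT; the Hann window and the stand-in signal are our assumptions, while the 8 kHz rate and the 8008-sample length follow the text:

```python
import numpy as np

def magnitude_spectrum(signal: np.ndarray, fs: int = 8000):
    """Single-sided FFT magnitude spectrum of a voice signal sampled at fs Hz."""
    spectrum = np.fft.rfft(signal * np.hanning(len(signal)))  # windowed FFT
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)          # bin frequencies in Hz
    return freqs, np.abs(spectrum)

signal = np.random.randn(8008)        # stand-in for a recorded word
freqs, mag = magnitude_spectrum(signal)
```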
IV. NEURAL NETWORKS FOR VOICE RECOGNITION

We used the sound signals of 20 words in Spanish as training data for a supervised feedforward neural network with one hidden layer. The training algorithm used was Resilient Backpropagation (trainrp), which has been used previously with good results (a code sketch of this setup is given at the end of this section). We show in Table I the results of the experiments with this type of neural network.

TABLE I. RESULTS OF FEEDFORWARD NEURAL NETWORKS FOR 20 WORDS IN SPANISH.

The results of Table I are for the Resilient Backpropagation training algorithm because this was the fastest learning algorithm found in all the experiments (it required only 7% of the total time of the experiments). The comparison of the time performance with the other training methods is shown in Figure 6.

Fig. 6. Comparison of the time performance of several training algorithms.

We now show in Table II a comparison of the recognition ability achieved with the different training algorithms for the supervised neural networks. We show average values over the experiments performed with all the training algorithms. We can appreciate from this table that the resilient backpropagation algorithm is also the most accurate method, with a 92% average recognition rate.

TABLE II. COMPARISON OF AVERAGE RECOGNITION OF FOUR TRAINING ALGORITHMS.

We describe below some simulation results of our approach for speaker recognition using neural networks. First, in Fig. 7 we have the sound signal of the word "example" with noise. Next, in Fig. 8 we have the identification of the word "example" without noise. We also show in Fig. 9 the word "layer" with noise. In Fig. 10, we show the identification of the correct word "layer" without noise.

Fig. 7. Input signal of the word "example" with noise.

Fig. 8. Identification of the word "example".

Fig. 9. Input signal of the word "layer" with noise added.

Fig. 10. Identification of the word "layer".

From Figures 7 to 10 it is clear that simple monolithic neural networks can be useful in voice recognition with a small number of words: even words with added noise can be identified, with at least a 92% recognition rate (for 20 words). Of course, for a larger set of words the recognition rate goes down and the computation time increases. For these reasons it is necessary to consider better methods for voice recognition.
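As an illustration of the setup described in this section, here is a compact numpy sketch of a one-hidden-layer feedforward network trained with a simplified resilient-backpropagation (Rprop) update; the feature dimensionality, the step-size constants and the random stand-in data are our assumptions, not values from the experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: 20 words, each represented by a fixed-length feature vector
# (e.g. flattened spectral features from the FFT step).
X = rng.standard_normal((200, 64))   # 10 samples per word, 64 features
y = np.repeat(np.arange(20), 10)     # class label = word index
T = np.eye(20)[y]                    # one-hot targets

# One-hidden-layer network, as in the experiments described above.
W1 = rng.standard_normal((64, 50)) * 0.1
W2 = rng.standard_normal((50, 20)) * 0.1

def forward(X):
    H = np.tanh(X @ W1)
    O = 1.0 / (1.0 + np.exp(-(H @ W2)))  # sigmoid outputs read as memberships
    return H, O

# Simplified Rprop: per-weight step sizes adapted from the sign of the
# gradient only, which is what makes the method fast.
steps = [np.full_like(W1, 0.01), np.full_like(W2, 0.01)]
prev_grads = [np.zeros_like(W1), np.zeros_like(W2)]

for epoch in range(200):
    H, O = forward(X)
    dO = (O - T) * O * (1 - O)           # squared-error gradient at the output
    dH = (dO @ W2.T) * (1 - H**2)        # backpropagated to the hidden layer
    grads = [X.T @ dH, H.T @ dO]
    for W, g, pg, s in zip((W1, W2), grads, prev_grads, steps):
        sign = np.sign(g * pg)           # did the gradient keep its sign?
        s *= np.where(sign > 0, 1.2, np.where(sign < 0, 0.5, 1.0))
        np.clip(s, 1e-6, 1.0, out=s)
        W -= np.sign(g) * s              # move by step size, gradient sign only
        pg[...] = g

sse = np.sum((forward(X)[1] - T) ** 2)   # sum of squared errors, as reported
```

Constant factors in the gradient are absorbed by the adaptive step sizes, which is why Rprop only needs the gradient's sign.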
V. VOICE RECOGNITION WITH MODULAR NEURAL NETWORKS AND TYPE-2 FUZZY LOGIC
We can improve on the results obtained in the previous section by using modular neural networks, because modularity enables us to divide the problem of recognition into simpler sub-problems, which can be solved more easily. We also use type-2 fuzzy logic to model the uncertainty in the results given by the neural networks trained from the same training data. We describe in this section our modular neural network approach with the use of type-2 fuzzy logic in the integration of results.

We now show some examples to illustrate the hybrid approach. We use two modules with one neural network each in this modular architecture. Each module is trained with the same data, but the results are somewhat different due to the uncertainty involved in the learning process. In all cases, we use neural networks with one hidden layer of 50 nodes and "trainrp" as the learning algorithm. The difference in the results is then used to create an interval type-2 fuzzy set that represents the uncertainty in the classification of the word. The first example is of the word "example", which is shown in Fig. 11.

Fig. 11. Sound signal of the word "example".
Considering for now only 10 words in the training, we have that the first neural network gives the following results:

SSE = 4.17649e-005 (sum of squared errors)
Output = [0.0023, 0.0001, 0.0000, 0.0020, 0.0113, 0.0053, 0.0065, 0.9901, 0.0007, 0.0001]

The output can be interpreted as giving the membership values of the given sound signal to each of the 10 different words in the database. In this case, we can appreciate that the value of 0.9901 is the membership value to the word "example", which is very close to 1. But if we now train a second neural network with the same architecture, the results will be different, due to the different random initialization of the weights. We now give the results for the second neural network:

SSE = 0.0124899
Output = [0.0002, 0.0041, 0.0037, 0.0013, 0.0091, 0.0009, 0.0004, 0.9821, 0.0007, 0.0007]

We can note that the membership value to the word "example" is now 0.9821. With the two different membership values, we can define an interval [0.9821, 0.9901], which gives us the uncertainty in the membership of the input signal to the word "example" in the database. We have to use centroid defuzzification to obtain a single membership value (a numeric sketch follows Table III). If we now repeat the same procedure for the whole database, we obtain the results shown in Table III. In this table, we can see the results for a sample of 6 different words.

TABLE III. SUMMARY OF RESULTS FOR THE TWO MODULES (M1 AND M2) FOR A SET OF WORDS IN SPANISH.
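The interval construction and its defuzzification can be made concrete with the two output vectors quoted above; since the text does not spell out the type-reduction step, we take the centroid of each interval to be its midpoint:

```python
import numpy as np

# Outputs of the two modules for the same utterance: membership values for
# each of the 10 words (the vectors quoted above).
m1 = np.array([0.0023, 0.0001, 0.0000, 0.0020, 0.0113,
               0.0053, 0.0065, 0.9901, 0.0007, 0.0001])
m2 = np.array([0.0002, 0.0041, 0.0037, 0.0013, 0.0091,
               0.0009, 0.0004, 0.9821, 0.0007, 0.0007])

# Interval type-2 membership per word: [lower, upper] bounds of the footprint
# of uncertainty created by the disagreement between the modules.
lower = np.minimum(m1, m2)
upper = np.maximum(m1, m2)

# Centroid defuzzification of an interval reduces to its midpoint.
crisp = (lower + upper) / 2.0
word_index = int(np.argmax(crisp))   # index 7, i.e. the word "example"
```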
We now describe the complete modular neural network architecture (Fig. 12) for voice recognition, in which we now use three neural networks in each module. Also, each module only processes a part of the word, which is divided into three parts, one for each module.

Fig. 12. Complete modular neural network architecture for voice recognition.

We have also experimented with using a genetic algorithm for optimizing the number of layers and nodes of the neural networks of the modules, with very good results. The approach is similar to the one described in our previous work. We show in Fig. 13 an example of the use of a genetic algorithm for optimizing the number of layers and nodes of one of the neural networks in the modular architecture. In this figure we can appreciate the minimization of the fitness function, which takes into account two objectives: the sum of squared errors and the complexity of the neural network. A sketch of such an optimization loop is given below, after Fig. 13.

Fig. 13. Genetic algorithm showing the optimization of a neural network.
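A hedged sketch of such a genetic algorithm over (layers, nodes) genomes follows; the population size, selection scheme, mutation rate and the stand-in train_sse function are all our assumptions, with fitness = SSE plus a complexity penalty as described above:

```python
import numpy as np

rng = np.random.default_rng(1)

def train_sse(layers, nodes):
    # Placeholder: a real run would train a network with this architecture
    # and return its sum of squared errors on the training data.
    return 1.0 / (1.0 + 0.1 * layers * nodes) + 0.01 * rng.random()

def fitness(genome):
    """Fitness to minimize: training SSE plus a network-complexity penalty."""
    layers, nodes = genome
    return train_sse(layers, nodes) + 0.001 * layers * nodes

population = [(int(rng.integers(1, 4)), int(rng.integers(10, 101)))
              for _ in range(20)]

for generation in range(30):
    scored = sorted(population, key=fitness)
    parents = scored[:10]                     # truncation selection
    children = []
    for _ in range(10):
        a, b = rng.choice(len(parents), size=2, replace=False)
        layers = parents[a][0]                # one-point crossover
        nodes = parents[b][1]
        if rng.random() < 0.2:                # mutation of the node count
            nodes = int(np.clip(nodes + rng.integers(-10, 11), 10, 100))
        children.append((layers, nodes))
    population = parents + children

best = min(population, key=fitness)           # smallest SSE + complexity
```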
The same modular neural network approach was extended to the previous 20 words (mentioned in the previous section), and the recognition rate improved to 100%, which shows the advantage of modularity and of the utilization of type-2 fuzzy logic. We also have to say that the computation time was reduced slightly due to the use of modularity.
VI. CONCLUSIONS

We have described in this paper an intelligent approach to pattern recognition for the case of speaker identification. We first described the use of monolithic neural networks for voice recognition. We then described a modular neural network approach with type-2 fuzzy logic. We have shown examples of words for which a correct identification was achieved. We have performed tests with about 20 different words, which were spoken by three different speakers. The results are very good for the monolithic neural network approach, and excellent for the modular neural network approach. We have considered increasing the database of words, and with the modular approach we have been able to achieve about a 96% recognition rate on over 100 words. We still have to perform more tests with different words and levels of noise.

VII. ACKNOWLEDGEMENTS

1. We thank JNCE, Shivamogga for conducting Techzone 2k10 and giving us an opportunity to participate in it.
2. We thank the Principal, staff and management of our college, SDMCET, Dharwad, for continuously supporting us in completing this paper.
                          VIII. REFERENCES
[1] O. Castillo and P. Melin, "A New Approach for Plant Monitoring using
     Type-2 Fuzzy Logic and Fractal Theory", International Journal of
     General Systems, Taylor and Francis, Vol. 33, 2004, pp. 305-319.
[2] S. Furui, "Cepstral analysis technique for automatic speaker verification",
      IEEE Transactions on Acoustics, Speech and Signal Processing, 29(2),
      1981, pp. 254-272.
[3] S. Furui, "Research on individuality features in speech waves and
     automatic speaker recognition techniques", Speech Communication,
     5(2), 1986, pp. 183-197.
[4] S. Furui, "Speaker-independent isolated word recognition using dynamic
      features of the speech spectrum", IEEE Transactions on Acoustics,
      Speech and Signal Processing, 34(1), 1986, pp. 52-59.
[5] S. Furui, "Digital Speech Processing, Synthesis, and Recognition". Marcel
      Dekker, New York, 1989.
[6] S. Furui, "Speaker-dependent-feature extraction, recognition and
    processing techniques", Speech Communication, 10(5-6), 1991, pp. 505-
    520.
[7] S. Furui, "An overview of speaker recognition technology", Proceedings
      of the ESCA Workshop on Automatic Speaker Recognition,
      Identification and Verification, 1994, pp. 1-9.
[8] A. L. Higgins, L. Bahler, and J. Porter, "Speaker verification using
     randomized phrase prompting", Digital Signal Processing, Vol. 1, 1991,
     pp. 89-106.
[9] N. N. Karnik and J. M. Mendel, An Introduction to Type-2 Fuzzy Logic
     Systems, Technical Report, University of Southern California, 1998.
[10] T. Matsui, and S. Furui, "Concatenated phoneme models for text-variable
      speaker recognition", Proceedings of ICASSP'93, 1993, pp. 391-394.
[11] T. Matsui, and S. Furui, "Similarity normalization method for speaker
     verification based on a posteriori probability", Proceedings of the ESCA
     Workshop on Automatic Speaker Recognition, Identification and
     Verification, 1994, pp. 59-62.
[12] P. Melin, M. L. Acosta, and C. Felix, "Pattern Recognition Using Fuzzy
      Logic and Neural Networks", Proceedings of IC-AI'03, Las Vegas,
      USA, 2003, pp. 221-227.
[13] P. Melin and O. Castillo, "A New Method for Adaptive Control of Non-
      Linear Plants Using Type-2 Fuzzy Logic and Neural Networks",
      International Journal of General Systems, Taylor and Francis, Vol. 33, 2004.