TABLE OF CONTENTS
Declaration i
Certificate of the Supervisor ii
Acknowledgements iii
List of Publications iv
Abstract v
Table of Contents vii
List of Tables xii
List of Figures xiv
List of Abbreviations xvii
CHAPTER I. INTRODUCTION 1-13
1.1 Fundamentals of Speaker Recognition 1
1.2 Applications 7
1.3 Historical Achievements in Speaker Recognition Technology 8
1.4 Challenges to the Speaker Recognition System 9
1.5 Motivation 10
1.6 Problem Formulation 11
1.7 Objectives of Research 11
1.8 Organization of Thesis 12
CHAPTER II. LITERATURE REVIEW 14-43
2.1 Introduction 14
2.1.1 Speech Production Mechanism in Human Beings 15
2.1.2 Source Filter Model of Speech Production 17
2.1.3 Short Term Analysis of Speech Signal 19
2.2 Basic Structure of Speaker Recognition System 19
2.3 Voice Activity Detection 22
2.4 Feature Extraction Methods used in Speaker Recognition 23
CONTENTS Page No.
2.4.1 Spectral Features 24
2.4.2 Dynamic Features 25
2.4.3 Prosodic Features 26
2.4.4 High-level Features 27
2.5 Speaker Modeling - Classical Approaches 27
2.5.1 Template Models 28
2.5.2 VQ Source Modeling 29
2.5.3 Hidden Markov Model 30
2.5.4 Neural Networks 31
2.5.5 Support Vector Machines 32
2.5.6 Gaussian Mixture Models 32
2.6 Dimensionality Reduction Techniques 35
2.7 Performance Terms for Speaker Recognition Task 36
2.8 Gaps in the Study 41
2.9 Conclusions 42
CHAPTER III. FEATURE EXTRACTION 44-76
3.1 Introduction 44
3.2 Pre-processing 47
3.2.1 Pre-emphasis 48
3.2.2 Voice Activity Detection 49
3.3 Proposed Method of Voice Activity Detection 52
3.4 Mel Frequency Cepstral Coefficients 54
3.4.1 Frame Blocking 55
3.4.2 Windowing 56
3.4.3 Short Term Fast Fourier Transform 57
3.4.4 Mel-Frequency Warping 57
3.4.5 Log Compression and Discrete Cosine Transform 59
3.4.6 Delta and Delta-Delta Coefficients 60
3.5 Simulation 62
3.5.1 Voice Activity Detection 63
3.5.2 MFCC 63
3.6 Feature Extraction using MFCC and its Derivatives 65
3.6.1 Number of Filters in the Filter Bank vs. Identification Rate 65
3.6.2 Effect of Variation in Type of Window 66
3.6.3 Effect of Adding Derivatives 67
3.7 Effect of VAD on Speaker Recognition Rate 69
3.8 Factors Affecting MFCC Performance 71
3.9 Conclusions 75
CHAPTER IV. SPEAKER MODELING 77-109
4.1 The Neural Network 77
4.2 Network Structures 80
4.3 Training of Artificial Neural Networks 82
4.4 Implementation of the Speaker Recognition System using Back Propagation Algorithm 86
4.5 Support Vector Machines 89
4.6 SVM Classification Mechanism 91
4.6.1 Linear Separable Case 91
4.6.2 Linear Non-separable Case 94
4.6.3 Nonlinear Case 95
4.7 Implementation of the Speaker Recognition System using SVM 97
4.8 Performance of the Speaker Recognition System 100
4.8.1 Performance of the Speaker Identification System in Presence of Noise 100
4.8.2 Relative Performance of SVM and Neural Network in a Speaker Recognition System 102
4.9 Real Time Speaker Recognition System for Hindi Words 103
4.9.1 Methodology 104
4.9.2 Graphical User Interface (GUI) for Real Time Speaker Recognition 106
4.9.3 Display on LCD 108
4.10 Conclusions 109
CHAPTER V. DIMENSIONALITY REDUCTION OF FEATURE VECTORS 110-128
5.1 Introduction 110
5.2 Genetic Algorithms 113
5.3 Feature Selection using GA 116
5.4 Performance of the Speaker Recognition System using GA 117
5.4.1 Effect of Noise on Speaker Recognition Rate 119
5.4.2 Processing Time 121
5.4.3 Effect of Number of Utterances per Speaker on Recognition Rate 122
5.4.4 Relative Performance of GA and PCA in a Speaker Recognition System 123
5.4.5 Performance of GA with Different Kernel Functions of SVM using Reduced Dimensional Feature Vectors 125
5.5 Conclusions 127
CHAPTER VI. CONCLUSIONS AND FUTURE WORK 129-135
6.1 Introduction 129
6.2 Summary and Findings 130
6.3 Future Scope 135
APPENDICES
A. Voicebox 136
B. Description of Speaker Databases 137
REFERENCES 139
BRIEF PROFILE OF THE RESEARCH SCHOLAR 151