Skip to content

r0nl/HMMBasedDigitRecognition

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Hidden Markov Model Based Digit Recognition System

=====================================================

Overview


This system implements a Hidden Markov Model (HMM) based approach for digit recognition. The core component is the Viterbi algorithm, which determines the most likely sequence of hidden states (digits) that produced an observed sequence of features extracted from audio. The recognized digits are stored in .txt format.

System Components


1. Hidden Markov Models (HMMs)

  • Definition: HMMs are statistical models that represent systems which change states according to transition probabilities.
  • Implementation: The system uses discrete HMMs where each state corresponds to a portion of a spoken digit.

2. K-means Clustering

  • Definition: K-means is an unsupervised algorithm that partitions feature vectors into K clusters.
  • Role: Used to quantize the speech feature vectors into a finite set of symbols.

3. Linde-Buzo-Gray (LBG) Algorithm

  • Definition: An algorithm for vector quantization that divides the feature space into regions based on centroids.
  • Role: Optimizes the codebook for vector quantization, improving recognition accuracy.

Workflow


Step 1: Feature Extraction

  • Extract feature vectors from audio signals.
  • Apply K-means clustering to group similar features.

Step 2: Vector Quantization

  • Use the LBG algorithm to quantize feature vectors.
  • Create codebooks for efficient representation.

Step 3: HMM Training

  • Train the HMM parameters (a, b, pi) for each digit.
  • Optimize parameters using forward-backward algorithm.

Step 4: Recognition Using Viterbi

  • For a new audio sample, extract and quantize features.
  • Use the Viterbi algorithm to find the most likely digit.
  • Store recognition results in text format.

Implementation Details


Data Structures

  • Transition Matrix: a[i][j] stores the probability of transitioning from state i to j.
  • Emission Matrix: b[i][j] stores the probability of observing symbol j in state i.
  • Viterbi Matrix: D[t][i] stores the maximum probability of being in state i at time t.
  • Backpointer Matrix: Psy[t][i] tracks the path for backtracking.

Optimizer Function

The system includes an optimizer function to improve model parameters (incomplete in the provided code).

Usage Example


  1. Training:

    • Prepare audio samples of digits.
    • Extract features and quantize them.
    • Train HMM parameters for each digit.
  2. Recognition:

    • For a new audio sample, extract features.
    • Run the Viterbi algorithm for each digit model.
    • Select the digit with the highest probability.
    • Save the recognized digit to a text file.

Future Improvements


  • Continuous HMMs: Extend to continuous observation densities.
  • Context Dependency: Implement context-dependent models for improved accuracy.
  • Deep Learning Integration: Combine HMMs with neural networks for feature extraction.

References


  • L.R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, Vol. 77, No. 2, pp. 257-286, 1989.
  • Y. Linde, A. Buzo, and R. Gray, "An Algorithm for Vector Quantizer Design," IEEE Transactions on Communications, Vol. 28, pp. 84-95, Jan 1980.

About

Following is the Hidden Markov Model Implementation for Digit Recognition

Topics

Resources

License

Stars

Watchers

Forks