This repository contains an implementation of the paper Deep Speech 2: End-to-End Speech Recognition and the newly proposed parallel minGRU architecture from Were RNNs All We Needed?, using PyTorch 🔥 and Lightning AI ⚡.
- Gated Recurrent Neural Networks
- Deep Speech 2: End-to-End Speech Recognition
- Were RNNs All We Needed?
- KenLM
- Boosting Sequence Generation Performance with Beam Search Language Model Decoding
Deep Speech 2, published in 2015, was a state-of-the-art ASR model that transcribes speech into text and is trained end-to-end with deep learning.
On the other hand, Were RNNs All We Needed? introduces a new RNN-based architecture with a parallelized version of the minGRU (Minimum Gated Recurrent Unit), aiming to enhance the efficiency of RNNs by reducing the dependency on sequential data processing. This architecture enables faster training and inference, making it potentially more suitable for ASR tasks and other real-time applications.
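The minGRU update is simple enough to sketch. Below is an illustrative NumPy version (not this repository's code) of the recurrence `h_t = (1 - z_t) * h_{t-1} + z_t * h̃_t`, together with an equivalent closed-form scan that removes the step-by-step dependency. The paper performs this scan in log space for numerical stability; the division-based form here is only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def min_gru_sequential(x, w_z, w_h, h0):
    """minGRU, step by step:
    z_t = sigmoid(x_t @ w_z),  h~_t = x_t @ w_h
    h_t = (1 - z_t) * h_{t-1} + z_t * h~_t
    """
    h, outs = h0, []
    for t in range(x.shape[0]):
        z = sigmoid(x[t] @ w_z)
        h_tilde = x[t] @ w_h
        h = (1.0 - z) * h + z * h_tilde
        outs.append(h)
    return np.stack(outs)

def min_gru_parallel(x, w_z, w_h, h0):
    """Same recurrence without the sequential loop.
    With a_t = 1 - z_t and b_t = z_t * h~_t, the recurrence
    h_t = a_t * h_{t-1} + b_t unrolls to
    h_t = (prod_{k<=t} a_k) * (h0 + sum_{j<=t} b_j / prod_{k<=j} a_k),
    so all timesteps follow from one cumprod and one cumsum.
    """
    z = sigmoid(x @ w_z)            # (T, d) gates, all computed at once
    h_tilde = x @ w_h
    a, b = 1.0 - z, z * h_tilde
    a_cum = np.cumprod(a, axis=0)   # prod_{k<=t} a_k
    return a_cum * (h0 + np.cumsum(b / a_cum, axis=0))
```

Because the gates and candidate states depend only on `x_t` (not on `h_{t-1}`), every timestep can be computed in parallel, which is what makes minGRU training fast on GPUs.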
- Clone the repository:

  ```bash
  git clone --recursive https://github.com/LuluW8071/Deep-Speech-2.git
  cd Deep-Speech-2
  ```
- Install PyTorch and the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

  Ensure you have PyTorch and Lightning AI installed.
> [!IMPORTANT]
> Before training, make sure you have placed your Comet ML API key and project name in the environment variable file `.env`.
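The exact variable names depend on how `train.py` loads them; assuming the standard Comet ML environment variables, a `.env` file might look like:

```env
COMET_API_KEY=your_comet_api_key
COMET_PROJECT_NAME=your_project_name
```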
To train the Deep Speech 2 model with the default training configuration, run:

```bash
python3 train.py
```

Customize the training parameters by passing arguments to `train.py` to suit your needs. Refer to the table below to change hyperparameters and training configurations.
| Args | Description | Default Value |
|---|---|---|
| `-g, --gpus` | Number of GPUs per node | `1` |
| `-w, --num_workers` | Number of CPU workers | `8` |
| `-db, --dist_backend` | Distributed backend to use for training | `ddp_find_unused_parameters_true` |
| `--epochs` | Number of total epochs to run | `50` |
| `--batch_size` | Size of the batch | `32` |
| `-lr, --learning_rate` | Learning rate | `2e-4` (0.0002) |
| `--checkpoint_path` | Checkpoint path to resume training from | `None` |
| `--precision` | Precision of the training | `16-mixed` |
```bash
python3 train.py \
  -g 4 \
  -w 8 \
  --epochs 10 \
  --batch_size 64 \
  -lr 2e-5 \
  --precision 16-mixed \
  --checkpoint_path path_to_checkpoint.ckpt
```
The model was trained on the LibriSpeech train sets (100 + 360 + 500 hours) and validated on the LibriSpeech test set (~10.5 hours).
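Validation quality for ASR models is typically reported as word error rate (WER): the word-level Levenshtein distance between the reference transcript and the hypothesis, divided by the reference length. A minimal self-contained sketch (not the repository's evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance over word tokens / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                     # deleting all of ref[:i]
    for j in range(len(hyp) + 1):
        d[0][j] = j                     # inserting all of hyp[:j]
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[-1][-1] / len(ref)
```

For example, `wer("the cat sat", "the bat sat")` is one substitution over three reference words, i.e. about 0.33.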
```bibtex
@misc{amodei2015deepspeech2endtoend,
  title={Deep Speech 2: End-to-End Speech Recognition in English and Mandarin},
  author={Dario Amodei and Rishita Anubhai and Eric Battenberg and Carl Case and others},
  year={2015},
  url={https://arxiv.org/abs/1512.02595},
}

@inproceedings{Feng2024WereRA,
  title={Were RNNs All We Needed?},
  author={Leo Feng and Frederick Tung and Mohamed Osama Ahmed and Yoshua Bengio and Hossein Hajimirsadegh},
  year={2024},
  url={https://api.semanticscholar.org/CorpusID:273025630}
}
```