Make sure the DeepSpeech version is the same for training and for running the model (this guide uses v0.5.1).
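- Clone DeepSpeech from https://github.com/mozilla/DeepSpeech, check out the v0.5.1 release tag, and install the Python requirements:
$ git clone https://github.com/mozilla/DeepSpeech
$ cd DeepSpeech
$ git checkout v0.5.1
$ pip3 install -r requirements.txt
- Install the ds_ctcdecoder package that DeepSpeech.py uses for decoding. util/taskcluster.py prints the download URL matching your platform and Python version:
$ pip3 install $(python3 util/taskcluster.py --decoder)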
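- Download and build the n-gram language-model toolkit KenLM (https://github.com/kpu/kenlm):
$ sudo apt install zlib1g-dev libbz2-dev liblzma-dev libeigen3-dev libboost1.65-all-dev cmake
$ git clone https://github.com/kpu/kenlm
$ cd kenlm
$ mkdir build
$ cd build
$ cmake ..
$ sudo make install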
- Install the native client. Run from the DeepSpeech root directory, util/taskcluster.py downloads the prebuilt native client binaries for the chosen architecture (including generate_trie, which builds the trie) into ./native_client:
$ python3 util/taskcluster.py --arch gpu --target ./native_client
- Prepare the dataset. Create three folders: train, test, and dev. Each folder holds its audio files plus a CSV file with a header row and three columns: wav_filename, wav_filesize, and transcript.
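A minimal sketch for generating one folder's CSV, assuming the transcripts are stored in a hypothetical tab-separated file with one `name.wav<TAB>transcript` line per clip; the script name make_csv.py and that transcript format are illustrations, not part of DeepSpeech. It writes absolute paths, so the CSV does not depend on the working directory, and the file size in bytes as wav_filesize.

```python
import csv
import os
import sys


def read_transcripts(path):
    """Read a hypothetical tab-separated file: <wav file name>\t<transcript>."""
    transcripts = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line or "\t" not in line:
                continue  # skip blank or malformed lines
            name, text = line.split("\t", 1)
            transcripts[name] = text.strip()
    return transcripts


def write_deepspeech_csv(folder, transcripts, csv_path):
    """Write wav_filename, wav_filesize, transcript rows for every WAV in folder.

    WAV files without a transcript are skipped. Returns the number of rows written.
    """
    rows = 0
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for name in sorted(os.listdir(folder)):
            if not name.lower().endswith(".wav") or name not in transcripts:
                continue
            wav_path = os.path.abspath(os.path.join(folder, name))
            writer.writerow([wav_path, os.path.getsize(wav_path), transcripts[name]])
            rows += 1
    return rows


if __name__ == "__main__":
    # Usage: python3 make_csv.py <folder> <transcripts.txt> <output.csv>
    folder, transcripts_file, out_csv = sys.argv[1:4]
    n = write_deepspeech_csv(folder, read_transcripts(transcripts_file), out_csv)
    print("wrote %d rows to %s" % (n, out_csv))
```

Run it once per folder, e.g. `python3 make_csv.py train train/transcripts.txt train/train.csv`.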
- Create the alphabet file (alphabet.txt), the vocabulary, the KenLM binary language model (lm.bin), and the trie.
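One way to produce these files from the CSVs is sketched below, under several assumptions: alphabet.txt is built from every character in all transcripts (so no clip contains a label the model does not know), the vocabulary uses only the train CSV (the first argument) to keep test sentences out of the language model, n-gram order 3 is an arbitrary starting point, and the generate_trie argument order (alphabet, LM, trie) is assumed for the 0.5.x native client. The script name make_lm_inputs.py is hypothetical.

```python
import csv
import subprocess
import sys


def collect_transcripts(csv_paths):
    """Collect the transcript column from DeepSpeech CSV files."""
    texts = []
    for path in csv_paths:
        with open(path, encoding="utf-8", newline="") as f:
            for row in csv.DictReader(f):
                texts.append(row["transcript"])
    return texts


def write_alphabet(texts, path):
    """Write every distinct character on its own line, sorted (space comes first)."""
    chars = sorted(set("".join(texts)))
    with open(path, "w", encoding="utf-8") as f:
        f.write("# Generated from the dataset transcripts\n")
        for ch in chars:
            # '#' starts a comment in alphabet.txt, so a literal '#' is written as '\#'.
            f.write(("\\#" if ch == "#" else ch) + "\n")
    return chars


def write_vocabulary(texts, path):
    """Write one transcript per line; this is the text KenLM is trained on."""
    with open(path, "w", encoding="utf-8") as f:
        for text in texts:
            f.write(text + "\n")


def lm_commands(vocab, alphabet, order=3):
    """Command lines for KenLM and generate_trie (argument order assumed for 0.5.x)."""
    return [
        ["lmplz", "--order", str(order), "--text", vocab, "--arpa", "lm.arpa"],
        ["build_binary", "lm.arpa", "lm.bin"],
        ["./native_client/generate_trie", alphabet, "lm.bin", "trie"],
    ]


if __name__ == "__main__":
    # Usage: python3 make_lm_inputs.py train.csv dev.csv test.csv
    csv_paths = sys.argv[1:]
    write_alphabet(collect_transcripts(csv_paths), "alphabet.txt")
    # Only the train CSV (first argument) feeds the language model.
    write_vocabulary(collect_transcripts(csv_paths[:1]), "vocabulary.txt")
    for cmd in lm_commands("vocabulary.txt", "alphabet.txt"):
        print(" ".join(cmd))
        subprocess.check_call(cmd)
```

On a small corpus lmplz may abort while estimating discounts; its `--discount_fallback` option is the usual workaround.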
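- Save the script below as train.sh and run it from the DeepSpeech root directory. Adjust the paths to your own layout; to run validation (and therefore early stopping) on the dev set, also pass --dev_files with the dev CSV.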
#!/bin/sh
set -xe
if [ ! -f DeepSpeech.py ]; then
    echo "Please make sure you run this from DeepSpeech's top level directory."
    exit 1
fi;
python3.5 -u DeepSpeech.py \
  --train_files /usr/workspace/pykaldi/deepspeech/vietnam/deepspeech_training/train/train.csv \
  --test_files /usr/workspace/pykaldi/deepspeech/vietnam/deepspeech_training/test/test.csv \
  --train_batch_size 80 \
  --test_batch_size 40 \
  --n_hidden 375 \
  --epochs 3 \
  --validation_step 1 \
  --early_stop=True \
  --earlystop_nsteps 6 \
  --estop_mean_thresh 0.1 \
  --estop_std_thresh 0.1 \
  --dropout_rate 0.22 \
  --learning_rate 0.00095 \
  --report_count 100 \
  --use_seq_length=False \
  --export_dir /usr/workspace/pykaldi/deepspeech/vietnam/deepspeech_training/results/model_export/ \
  --checkpoint_dir /usr/workspace/pykaldi/deepspeech/vietnam/deepspeech_training/results/checkout/ \
  --alphabet_config_path /usr/workspace/pykaldi/deepspeech/vietnam/deepspeech_training/alphabet.txt \
  --lm_binary_path /usr/workspace/pykaldi/deepspeech/vietnam/deepspeech_training/lm.bin \
  --lm_trie_path /usr/workspace/pykaldi/deepspeech/vietnam/deepspeech_training/trie \
  "$@"
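Before starting train.sh, a quick check of the CSVs can catch common data problems (missing files, wrong sizes, transcript characters missing from alphabet.txt) before TensorFlow starts. This is a hypothetical helper, check_data.py, written against the CSV layout and alphabet format described above; the line numbers it reports assume one CSV record per physical line.

```python
import csv
import os
import sys

EXPECTED_HEADER = ["wav_filename", "wav_filesize", "transcript"]


def load_alphabet(path):
    """Read alphabet.txt: one label per line, '#' lines are comments, '\\#' is a literal '#'."""
    labels = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line.startswith("\\#"):
                labels.add("#")
            elif line.startswith("#") or not line:
                continue
            else:
                labels.add(line)
    return labels


def check_csv(csv_path, alphabet):
    """Return a list of human-readable problems found in one DeepSpeech CSV."""
    problems = []
    with open(csv_path, encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != EXPECTED_HEADER:
            return ["%s: unexpected header %r" % (csv_path, reader.fieldnames)]
        for lineno, row in enumerate(reader, start=2):
            wav, size, text = row["wav_filename"], row["wav_filesize"], row["transcript"]
            if not os.path.isfile(wav):
                problems.append("%s:%d: missing file %s" % (csv_path, lineno, wav))
            elif str(os.path.getsize(wav)) != size:
                problems.append("%s:%d: size mismatch for %s" % (csv_path, lineno, wav))
            unknown = sorted(set(text) - alphabet)
            if unknown:
                problems.append("%s:%d: characters not in alphabet: %r" % (csv_path, lineno, unknown))
            if not text:
                problems.append("%s:%d: empty transcript" % (csv_path, lineno))
    return problems


if __name__ == "__main__":
    # Usage: python3 check_data.py alphabet.txt train.csv dev.csv test.csv
    alphabet = load_alphabet(sys.argv[1])
    issues = [p for csv_path in sys.argv[2:] for p in check_csv(csv_path, alphabet)]
    for issue in issues:
        print(issue)
    sys.exit(1 if issues else 0)
```

A non-zero exit status makes it easy to chain, e.g. `python3 check_data.py alphabet.txt train/train.csv test/test.csv && ./train.sh`.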
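- Run the model. Copy output_graph.pb from the export directory into model_data/ together with alphabet.txt, lm.bin, and trie, install the client of the same version (pip3 install deepspeech==0.5.1, or deepspeech-gpu==0.5.1 for CUDA), and run:
$ deepspeech --model model_data/output_graph.pb --alphabet model_data/alphabet.txt --lm model_data/lm.bin --trie model_data/trie --audio model_data/x.wav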
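The client expects audio in the same format the model was trained on. The sketch below reads the WAV header with Python's standard wave module and assumes 16-bit mono PCM at 16 kHz, the format DeepSpeech's own models use; change expected_rate if your training data differs. check_wav.py is a hypothetical name.

```python
import sys
import wave


def wav_format(path):
    """Return (channels, sample_width_in_bytes, sample_rate) of a PCM WAV file."""
    with wave.open(path, "rb") as w:
        return w.getnchannels(), w.getsampwidth(), w.getframerate()


def check_wav(path, expected_rate=16000):
    """Return the reasons the file does not match the assumed training format."""
    channels, width, rate = wav_format(path)
    problems = []
    if channels != 1:
        problems.append("expected mono, got %d channels" % channels)
    if width != 2:
        problems.append("expected 16-bit samples, got %d-bit" % (8 * width))
    if rate != expected_rate:
        problems.append("expected %d Hz, got %d Hz" % (expected_rate, rate))
    return problems


if __name__ == "__main__":
    # Usage: python3 check_wav.py model_data/x.wav
    for problem in check_wav(sys.argv[1]):
        print(problem)
```

A file that fails the check can be converted, for example, with `sox input.wav -r 16000 -c 1 -b 16 output.wav`.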