An implementation of DeepSpeech's pre-trained English model for use in the SLaM lab at the University of Florida.

Dependencies:
- deepspeech 0.6.1
- jiwer 1.3.2
- numpy 1.17.3
- PyAudio 0.2.11
- scipy 1.3.1
- tensorflow 1.14.0
This interface is intended to make DeepSpeech's pre-trained models easier to use with large batches of audio files. It also includes built-in error calculations. Currently, Word Error Rate (WER) is implemented using the jiwer package.
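For reference, WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the ground truth, which is what the jiwer-based calculation reports. A minimal pure-Python sketch of the same quantity (the function name `word_error_rate` is illustrative, not part of this package):

```python
def word_error_rate(truth: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the ground-truth length.

    Equivalent in spirit to what jiwer computes; this name is
    illustrative, not part of SLaM_DeepSpeech.
    """
    ref, hyp = truth.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # one deletion in six words
```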
- `-h` help
- `-i` input_dir, the path to a directory containing audio files; audio files should be in `.wav` format; the transcriber will skip any files without a `.wav` extension
- `-o` output_file, the path to an empty `.txt` file where the results are written
- `-m` model directory, the path to the directory containing the DeepSpeech pre-trained model; should contain
- `-r` error type, current options: `word`, `phone`; if `-r` is not included, error will not be calculated
- `-g` ground_truth, the path to a `.txt` file containing the intended transcriptions of the audio files, used to calculate the error rate; transcriptions should be in sorted order, separated by new lines
- `-a` lm_alpha, the relative weight of the language model vs. Connectionist Temporal Classification (CTC)
- `-b` lm_beta, the word insertion weight; higher values favor hypotheses with more words
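For intuition on `-a` and `-b`: in a CTC beam-search decoder with an external language model, the LM weight scales the language-model log probability and the word-insertion weight adds a per-word bonus, which is roughly how lm_alpha and lm_beta enter DeepSpeech's scoring. A schematic sketch (the function name, variable names, and the sample values are illustrative assumptions, not DeepSpeech internals):

```python
def hypothesis_score(acoustic_logp: float, lm_logp: float,
                     word_count: int, lm_alpha: float, lm_beta: float) -> float:
    """Schematic CTC + LM beam-search score; all probabilities in log space."""
    return acoustic_logp + lm_alpha * lm_logp + lm_beta * word_count

# A larger lm_beta rewards hypotheses containing more words:
short = hypothesis_score(-4.0, -2.0, word_count=2, lm_alpha=0.75, lm_beta=1.85)
longer = hypothesis_score(-4.2, -2.1, word_count=3, lm_alpha=0.75, lm_beta=1.85)
print(short, longer)  # the three-word hypothesis scores higher here
```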
Example:

```shell
python3 SLaM_DeepSpeech -i $HOME/deepspeech-venv/SLaM_DeepSpeech/SLaM_DeepSpeech/temp -o $HOME/deepspeech-venv/SLaM_DeepSpeech/SLaM_DeepSpeech/temp/output.txt -m $HOME/deepspeech-venv/deepspeech-0.6.1-models -r word -g $HOME/deepspeech-venv/SLaM_DeepSpeech/SLaM_DeepSpeech/temp/gt.txt
```
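The batch behavior described above (transcribing every `.wav` in the input directory in sorted order, skipping anything else) can be sketched like this; `collect_wav_files` is a hypothetical helper, not the package's actual code:

```python
import os

def collect_wav_files(input_dir: str) -> list:
    """Return sorted paths of the .wav files in input_dir, skipping
    other extensions.

    Sorted order matters: results are matched line-by-line against the
    sorted ground-truth transcriptions supplied with -g.
    """
    names = sorted(n for n in os.listdir(input_dir)
                   if n.lower().endswith(".wav"))
    return [os.path.join(input_dir, n) for n in names]

# Each collected file would then be read (e.g. with scipy.io.wavfile)
# and passed to the DeepSpeech model for transcription.
```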