Support for partial data usage for LibriSpeech #105

kushal-g · 2021-09-24T05:42:48Z

There should be a way to download only part of the dataset and train on that subset, instead of having to download the entire dataset first. If that isn't supported, the documentation should clearly describe what the dataset directory structure should look like, so that it's easier to use our own partial dataset. I'm currently trying to train an RNN-T model and keep running into issues with the directory structure.

Command I'm using:
python ./openspeech_cli/hydra_train.py dataset=librispeech dataset.dataset_download=False dataset.dataset_path=/home/guest/flsp/SpeechToText/RNN-T/openspeech/LIBRISPEECH_AUTO_DOWNLOAD/LibriSpeech dataset.manifest_file_path=/home/guest/flsp/SpeechToText/RNN-T/openspeech/LIBRISPEECH_AUTO_MANIFEST tokenizer=libri_subword model=rnn_transducer audio=melspectrogram lr_scheduler=warmup_reduce_lr_on_plateau trainer=gpu
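For anyone hitting the same problem: to my knowledge, the official LibriSpeech release from OpenSLR unpacks each subset into LibriSpeech/<subset>/<speaker_id>/<chapter_id>/, with one <speaker_id>-<chapter_id>.trans.txt file per chapter sitting next to that chapter's .flac files. The sketch below uses a hypothetical helper, `collect_librispeech_pairs` (not part of openspeech), that walks such a tree and pairs each utterance with its transcript, using whichever subsets happen to be present. It only checks the standard LibriSpeech layout. Whether openspeech's loader or manifest generator expects exactly this layout is an assumption that this thread does not confirm.

```python
from pathlib import Path


def collect_librispeech_pairs(root):
    """Return (flac_path, transcript) pairs from a LibriSpeech-style tree.

    Assumes (not confirmed for openspeech) the standard OpenSLR layout:
        <root>/<subset>/<speaker_id>/<chapter_id>/<speaker_id>-<chapter_id>.trans.txt
        <root>/<subset>/<speaker_id>/<chapter_id>/<utterance_id>.flac
    Only the subset folders that exist are scanned, so a partial download works.
    """
    pairs = []
    # Four levels below root: subset / speaker / chapter / transcript file.
    for trans_file in sorted(Path(root).glob("*/*/*/*.trans.txt")):
        chapter_dir = trans_file.parent
        with open(trans_file, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                # Each line is "<utterance_id> <TRANSCRIPT>".
                utt_id, _, text = line.partition(" ")
                audio = chapter_dir / f"{utt_id}.flac"
                # Skip transcript lines whose audio file is missing.
                if audio.exists():
                    pairs.append((str(audio), text))
    return pairs
```

Running it on the directory passed as dataset.dataset_path and getting an empty list would suggest the files are not laid out as the standard release expects.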


sooftware · 2021-09-25T07:58:34Z

There have been many questions about the directory structure, so I think I should document it.
Please wait a moment.

kushal-g · 2021-09-29T06:32:48Z

What is the status of this?

sooftware added the FEATURE REQUEST and QUESTION (Further information is requested) labels on Sep 25, 2021
