Replies: 2 comments
- @nithinraok Could you please assist me with some thoughts regarding that?
- It is not recommended to try SSL at that size; I don't think you would see any benefits. In any case, you might not see much benefit with 300 hours of pretraining data. You could use this dataset to get more hours: https://github.com/facebookresearch/libri-light
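For reference, here is a rough sketch of turning downloaded Libri-Light audio into a NeMo-style manifest so it can be mixed with the in-domain data. The directory layout and output path below are placeholders for a local download, not anything Libri-Light or NeMo prescribes:

```python
import glob
import json
import os

import soundfile as sf  # reads duration from the file header

# Placeholder paths: point these at the unpacked Libri-Light subset
# and wherever your manifests live.
LIBRI_LIGHT_DIR = "data/librilight/small"
OUT_MANIFEST = "manifests/librilight_small.json"

with open(OUT_MANIFEST, "w") as out:
    for path in glob.glob(os.path.join(LIBRI_LIGHT_DIR, "**", "*.flac"), recursive=True):
        info = sf.info(path)
        # NeMo manifests are JSON lines; SSL pretraining only needs the
        # audio path and duration, so the text field can stay empty.
        entry = {
            "audio_filepath": os.path.abspath(path),
            "duration": info.duration,
            "text": "",
        }
        out.write(json.dumps(entry) + "\n")
```

As far as I know, the train dataset config accepts a list of manifest files, so a Libri-Light manifest like this can simply be passed alongside the in-domain one.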
I am trying to run SSL pretraining (Wav2Vec-BERT) on a small speech dataset (~300 hours) for an ASR task. I understand that pretraining should be done with large, diverse datasets, so that the learned representations can later be fine-tuned for downstream tasks. However, I want to explore the gain from SSL on a medium-sized, domain-specific English dataset and compare it with fine-tuning scenarios. Could you please share some tips I should consider in my case? I am using the fast-conformer config, following the instructions in https://github.com/NVIDIA/NeMo/tree/main/examples/asr/speech_pretraining
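To make the setup concrete, here is a minimal sketch of the launch pattern I have in mind, based on the speech_pretraining example; the config path and manifest names are placeholders, and the exact class and config keys may differ by NeMo version:

```python
import pytorch_lightning as pl
from omegaconf import OmegaConf

import nemo.collections.asr as nemo_asr
from nemo.utils.exp_manager import exp_manager

# Local copy of the fast-conformer SSL config from examples/asr/speech_pretraining
# (the file name is a placeholder).
cfg = OmegaConf.load("configs/fastconformer_ssl.yaml")

# Point the data configs at the ~300 h in-domain manifests (placeholder paths).
cfg.model.train_ds.manifest_filepath = "manifests/train.json"
cfg.model.validation_ds.manifest_filepath = "manifests/dev.json"

trainer = pl.Trainer(**cfg.trainer)
exp_manager(trainer, cfg.get("exp_manager", None))

# Self-supervised pretraining model used by speech_pre_training.py.
ssl_model = nemo_asr.models.SpeechEncDecSelfSupervisedModel(cfg=cfg.model, trainer=trainer)
trainer.fit(ssl_model)
```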
I also have audio of varying durations, from about 1 s to 100 s. I want to make use of all the data I have since it is limited, so I set min_duration to 1 s and max_duration to 100 s. Could that conflict with anything in the SSL config? Should I consider smaller learning rates, or a particular model size (e.g., Large vs. XLarge)? If I want to use Lhotse, are there settings I should adjust for a dataset of this size (~300 hours)?
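Concretely, these are the duration- and Lhotse-related keys I mean (min_duration/max_duration are standard NeMo dataset-config fields; the Lhotse ones follow the names I see in the Lhotse dataloading docs and may differ between NeMo versions, so treat this as a sketch):

```python
from omegaconf import OmegaConf

cfg = OmegaConf.load("configs/fastconformer_ssl.yaml")  # placeholder path

# Keep as much of the limited data as possible.
cfg.model.train_ds.min_duration = 1.0    # seconds
cfg.model.train_ds.max_duration = 100.0  # seconds

# With utterances up to 100 s, a fixed batch_size can easily run out of memory,
# so a duration-based dynamic batch is usually safer when Lhotse is enabled.
cfg.model.train_ds.use_lhotse = True
cfg.model.train_ds.batch_duration = 360      # total seconds of audio per batch; tune to GPU memory
cfg.model.train_ds.quadratic_duration = 30   # penalize very long cuts when forming batches
cfg.model.train_ds.num_buckets = 30          # duration bucketing to reduce padding waste
```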
I also want to try NEST for SSL. Should I use settings different from the defaults in the config file, since I have a much smaller dataset? What factors should I consider when augmenting with noise, given the amount of data I have? Any tips would help. Thank you.
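One adjustment I am already planning for the smaller dataset is to recompute the step budget instead of reusing a warmup/max_steps schedule tuned for thousands of hours; a rough back-of-the-envelope helper, where all the numbers are placeholders for my setup rather than recommendations:

```python
# Rough arithmetic for scaling scheduler settings to a small dataset.
total_hours = 300           # in-domain data
batch_duration_sec = 360    # seconds of audio per batch, per GPU
num_gpus = 4
target_epochs = 50          # planned passes over the data

seconds_of_audio = total_hours * 3600
steps_per_epoch = seconds_of_audio / (batch_duration_sec * num_gpus)
max_steps = int(steps_per_epoch * target_epochs)
warmup_steps = int(0.05 * max_steps)  # e.g. ~5% warmup rather than a fixed large value

print(f"steps/epoch ~ {steps_per_epoch:.0f}, max_steps ~ {max_steps}, warmup_steps ~ {warmup_steps}")
```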