How to adapt knowledge-distillation to the ASR classes? #2041
Replies: 3 comments
Actually, EncDecCTCModel and EncDecCTCBPEModel are LightningModules, and you can call forward just like in the example you provided; they simply have some extra methods to support more capabilities. You should be able to use the same trick from that example with these two models, and if you follow that approach, I don't think you need to worry about the data processing pipeline. You may start by creating a new class that inherits from EncDecCTCModel or EncDecCTCBPEModel and overrides __init__ and training_step. For Conformer or Citrinet, the BPE-based models should be better in terms of accuracy.
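As a concrete illustration of that suggestion (not an official NeMo recipe), here is a minimal, untested sketch of such a subclass for a CTC model. It assumes the NeMo 1.x-style API where forward(input_signal=..., input_signal_length=...) returns (log_probs, encoded_len, predictions) and a training batch is (signal, signal_len, transcript, transcript_len). The teacher checkpoint name is just an example, and `distill_alpha` plus the plain KL term are illustrative placeholders, not NeMo features:

```python
import torch
import torch.nn.functional as F
from omegaconf import DictConfig
from nemo.collections.asr.models import EncDecCTCModel


class DistilEncDecCTCModel(EncDecCTCModel):
    """Hypothetical student model that adds a distillation term to the usual CTC loss."""

    def __init__(self, cfg: DictConfig, trainer=None):
        super().__init__(cfg=cfg, trainer=trainer)
        # Frozen teacher; restore_from(...) would work for a local .nemo file instead.
        self.teacher = EncDecCTCModel.from_pretrained(model_name="stt_en_conformer_ctc_large")
        self.teacher.freeze()
        self.distill_alpha = 0.5  # weight between CTC loss and the KD term (placeholder)

    def training_step(self, batch, batch_idx):
        signal, signal_len, transcript, transcript_len = batch

        # Student forward pass: the model's own preprocessor/augmentation runs inside
        # forward(), so we only hand it the raw audio batch.
        log_probs, encoded_len, _ = self.forward(
            input_signal=signal, input_signal_length=signal_len
        )

        # Regular CTC loss, same call as in the parent class.
        ctc_loss = self.loss(
            log_probs=log_probs,
            targets=transcript,
            input_lengths=encoded_len,
            target_lengths=transcript_len,
        )

        # Teacher forward pass (no gradients). forward() returns log-softmax outputs,
        # so a temperature would require access to pre-softmax logits; here we use a
        # plain frame-level KL divergence instead.
        with torch.no_grad():
            t_log_probs, _, _ = self.teacher(
                input_signal=signal, input_signal_length=signal_len
            )

        # Assumes teacher and student share the vocabulary and frame rate; otherwise
        # the two sequences would need to be aligned or projected first.
        min_t = min(log_probs.shape[1], t_log_probs.shape[1])
        kd_loss = F.kl_div(
            log_probs[:, :min_t, :], t_log_probs[:, :min_t, :],
            log_target=True, reduction="batchmean",
        )

        loss = (1.0 - self.distill_alpha) * ctc_loss + self.distill_alpha * kd_loss
        self.log("train_loss", loss)
        self.log("ctc_loss", ctc_loss)
        self.log("kd_loss", kd_loss)
        return loss
```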
Thanks a lot! I'll give it a try!
I would suggest also reading through the forward step plus the train/validation steps. For RNNT in particular, it is not sufficient to just call forward: a lot of extra work happens after that inside the train and validation steps.
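To see why, here is a simplified view of how the transducer models structure their training step: forward() only runs the preprocessor and acoustic encoder, while the prediction (decoder) network, the joint network, and the transducer loss are applied afterwards inside training_step. The sketch below is illustrative, loosely follows NeMo's EncDecRNNTBPEModel from the 1.x releases, and omits details such as the fused-loss path and WER logging; check the current source for the exact signatures:

```python
from nemo.collections.asr.models import EncDecRNNTBPEModel


class DistilRNNTModel(EncDecRNNTBPEModel):
    """Hypothetical subclass showing where extra (e.g. distillation) terms would go."""

    def training_step(self, batch, batch_idx):
        signal, signal_len, transcript, transcript_len = batch

        # 1) forward() only runs the preprocessor + acoustic encoder.
        encoded, encoded_len = self.forward(
            input_signal=signal, input_signal_length=signal_len
        )

        # 2) The prediction (decoder) network runs on the transcripts.
        decoder_out, target_length, _ = self.decoder(
            targets=transcript, target_length=transcript_len
        )

        # 3) The joint network combines both streams, then the transducer loss.
        joint = self.joint(encoder_outputs=encoded, decoder_outputs=decoder_out)
        loss = self.loss(
            log_probs=joint, targets=transcript,
            input_lengths=encoded_len, target_lengths=target_length,
        )

        # Any distillation term (e.g. against a teacher's joint output) would have to
        # be added here, after all of these steps, not after forward() alone.
        self.log("train_loss", loss)
        return loss
```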
Thanks for this great toolkit!
I'm seriously looking at implementing some knowledge-distillation techniques for the ASR classes. I found this very simple and elegant pytorch-lightning approach here: https://github.com/vrvlive/knowlege-distillation/blob/master/training_module.py and I wonder how I could adapt it to the EncDecCTCModel or EncDecCTCBPEModel classes (as well as to the RNN-T or Conformer cases). I would like a pointer or two, if possible, since the NeMo classes aren't trivial pl.LightningModules. What "worries" me the most is how the audio preprocessing pipeline would affect simply calling forward(), for example. Where should I start?