T0 (p=1) replicability #35
Comments
thanks for your patience @tuhinjubcse
I'll let @awebson confirm!
Yeah, trained for 12'200 steps (I don't think we ever reached even one epoch). The 1'112'200 comes from 1'000'000 T5 pretraining steps + 100'000 LM-adaptation steps to obtain T5-LM + 12'200 steps of multitask fine-tuning.
The mixtures
Since in TF the shapes are fixed (not dynamic), we need to make sure to reduce padding as much as possible to make the best use of the compute. Packing means concatenating multiple inputs on the encoder side, and predicting the concatenation of the targets. Code: https://github.com/tensorflow/mesh/blob/master/mesh_tensorflow/transformer/dataset.py#L64
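For illustration only, here is a minimal sketch of the packing idea, not the mesh_tensorflow implementation linked above: greedily concatenate tokenized examples into fixed-length rows, tracking segment ids so each original example remains distinguishable. The function name, field names, and pad-with-0 convention are assumptions made for this sketch.

```python
def pack_examples(examples, max_length):
    """Greedily pack tokenized (inputs, targets) pairs into fixed-length rows.

    Sketch only: segment ids are tracked for the inputs (targets segmentation
    omitted for brevity), and 0 is used as the padding id.
    """
    packed = []
    cur_inputs, cur_targets, cur_segments = [], [], []
    seg_id = 1
    for inputs, targets in examples:
        # Start a new row if this example would overflow either side.
        if (len(cur_inputs) + len(inputs) > max_length
                or len(cur_targets) + len(targets) > max_length):
            packed.append({
                "inputs": cur_inputs + [0] * (max_length - len(cur_inputs)),
                "targets": cur_targets + [0] * (max_length - len(cur_targets)),
                "inputs_segmentation": cur_segments + [0] * (max_length - len(cur_segments)),
            })
            cur_inputs, cur_targets, cur_segments = [], [], []
            seg_id = 1
        cur_inputs += inputs
        cur_targets += targets
        cur_segments += [seg_id] * len(inputs)
        seg_id += 1
    if cur_inputs:  # flush the last partially filled row
        packed.append({
            "inputs": cur_inputs + [0] * (max_length - len(cur_inputs)),
            "targets": cur_targets + [0] * (max_length - len(cur_targets)),
            "inputs_segmentation": cur_segments + [0] * (max_length - len(cur_segments)),
        })
    return packed


# Example: three short "tokenized" examples packed into rows of length 12.
rows = pack_examples(
    [([1, 2, 3], [7, 8]), ([4, 5], [9]), ([6, 6, 6, 6], [9, 9])],
    max_length=12,
)
for row in rows:
    print(row)
```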
Hi @VictorSanh,
Thanks for releasing the code and data. I am trying to retrain it in PyTorch.
Some questions: in your paper you have p=1 vs p=5.7 results.
Say for p=1 we take one random prompt per example of a dataset. This is perfectly fine.
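For concreteness, a minimal sketch of that reading of p=1 (one randomly sampled prompt applied to each example). The `PROMPTS` list and `apply_one_random_prompt` helper are purely illustrative assumptions, not the promptsource API or the actual T0 training code:

```python
import random

# Hypothetical prompt templates for a binary sentiment dataset. In the real setup
# the prompts come from promptsource; these lambdas only illustrate the idea.
PROMPTS = [
    lambda ex: (f"Review: {ex['text']}\nIs this review positive or negative?",
                "positive" if ex["label"] == 1 else "negative"),
    lambda ex: (f"{ex['text']}\nDid the reviewer like the product? yes or no?",
                "yes" if ex["label"] == 1 else "no"),
]

def apply_one_random_prompt(example, rng=random):
    # One randomly chosen prompt for this example (the p=1 reading above).
    prompt = rng.choice(PROMPTS)
    source, target = prompt(example)
    return {"source": source, "target": target}

print(apply_one_random_prompt({"text": "Great battery life.", "label": 1}))
```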
I have some doubts about the