Update validation loss and links for Jorge Vanco #94
Conversation
|
I added learned value embeddings, just like in the modded nano-gpt repo, and further reduced the number of layers to 10. I have also added the Muon momentum warmup. It yields barely any improvement, but since further gains are already very hard to find and it does not affect the speed, I might as well include it. |
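For readers unfamiliar with momentum warmup, here is a minimal sketch of the idea, not the code in this PR: the optimizer's momentum coefficient is linearly interpolated from a lower starting value to its final value over the first steps of training. The function name `muon_momentum` and the defaults (300 warmup steps, 0.85 to 0.95) are illustrative assumptions.

```python
def muon_momentum(
    step: int,
    warmup_steps: int = 300,  # assumed warmup length, for illustration only
    start: float = 0.85,      # assumed initial momentum
    end: float = 0.95,        # assumed final momentum
) -> float:
    """Linearly interpolate the momentum from `start` to `end` over `warmup_steps`."""
    frac = min(step / warmup_steps, 1.0)  # fraction of warmup completed, capped at 1
    return (1.0 - frac) * start + frac * end


# Hypothetical use inside a training loop, assuming the optimizer exposes
# its momentum through param_groups:
#     for group in optimizer.param_groups:
#         group["momentum"] = muon_momentum(step)
```

Because this only updates one scalar per step, it adds no measurable cost to training, which matches the remark that it does not affect the speed.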
|
good! |
|
Edit 10/29/25
|
|
That's an amazing improvement! 👍 When evaluating on the validation set, I assume you do so with the new maximum context length (1792)? |
|
Thank you! |
|
Ah, I should have made this clear in this repo earlier (will add it to the readme) but verification will happen at context length 512. I think this might have been something we communicated over our class Slack which should have been added to the writeup. Could you eval using these settings? Sorry for the confusion! |
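As a rough illustration of evaluating at a fixed context length of 512, the sketch below splits a validation token stream into non-overlapping 512-token windows with next-token targets and averages a per-window loss. It is an assumption about how such an evaluation could be set up, not the repo's actual verification script; `eval_windows`, `mean_validation_loss`, and `loss_fn` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

Window = Tuple[Sequence[int], Sequence[int]]


def eval_windows(tokens: Sequence[int], context_length: int = 512) -> List[Window]:
    """Split a token stream into non-overlapping (inputs, targets) windows.

    Targets are the inputs shifted by one token. A trailing window that
    cannot supply `context_length` targets is dropped.
    """
    windows: List[Window] = []
    for start in range(0, len(tokens) - context_length, context_length):
        inputs = tokens[start : start + context_length]
        targets = tokens[start + 1 : start + context_length + 1]
        windows.append((inputs, targets))
    return windows


def mean_validation_loss(
    tokens: Sequence[int],
    loss_fn: Callable[[Sequence[int], Sequence[int]], float],  # hypothetical model wrapper
    context_length: int = 512,
) -> float:
    """Average the per-window loss; all windows have equal length, so a plain mean suffices."""
    windows = eval_windows(tokens, context_length)
    if not windows:
        raise ValueError("token stream is shorter than one full window")
    return sum(loss_fn(x, y) for x, y in windows) / len(windows)
```

A model trained with a longer context (for example 1792) sees less history per prediction when evaluated this way, so its loss at 512 can be higher than the loss measured at its training context length.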
|
Great work! |
|
Sure! I'll update it in the next couple of days. Thanks for the clarification! It was too good to be true haha
Sorry about hijacking the thread, Jorge, but could you clarify a bit about what is and isn't allowed in leaderboard submissions?
I reduced the number of layers to 12 and doubled the batch size to 256. I also had to change the number of steps to 18200.