
Conversation

@JorgeVanco
Contributor

I reduced the number of layers to 12 and doubled the batch size to 256. I had to adjust the number of steps to 18,200.
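For reference, a rough sketch of what that change looks like in a training config (the field names here are illustrative, not this PR's actual code):

```python
from dataclasses import dataclass

# Illustrative only; the real names and surrounding code live in the PR.
@dataclass
class TrainConfig:
    n_layer: int = 12        # reduced depth
    batch_size: int = 256    # doubled batch size
    max_steps: int = 18_200  # step count adjusted for the new batch size

cfg = TrainConfig()
```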

@JorgeVanco
Contributor Author

I added learned value embeddings, just like in the modded-nanogpt repo. I have also further reduced the number of layers to 10.
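Roughly the shape of the idea, as a sketch (module and parameter names below are mine, not modded-nanogpt's exact code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ValueEmbedAttention(nn.Module):
    """Causal self-attention with a learned value embedding: a per-token
    embedding is mixed into the value vectors with a learnable scalar,
    in the spirit of modded-nanogpt. Shapes and names are illustrative."""

    def __init__(self, d_model: int, n_head: int, vocab_size: int):
        super().__init__()
        self.n_head = n_head
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.proj = nn.Linear(d_model, d_model, bias=False)
        self.value_embed = nn.Embedding(vocab_size, d_model)  # learned value embedding
        self.mix = nn.Parameter(torch.tensor(0.5))            # learnable mixing scalar

    def forward(self, x: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        v = v + self.mix * self.value_embed(idx)  # token-indexed values join the stream
        q, k, v = (t.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
                   for t in (q, k, v))
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.proj(y.transpose(1, 2).reshape(B, T, C))
```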

I have also added the Muon momentum warmup. It barely gives any improvement, but since the run is already very hard to improve further, I might as well include it, as it does not affect speed.
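The warmup itself is tiny; something like this (the endpoints and horizon here are my guesses at the usual modded-nanogpt values, not necessarily what this PR uses):

```python
def muon_momentum(step: int, warmup_steps: int = 300,
                  start: float = 0.85, end: float = 0.95) -> float:
    """Linearly ramp Muon's momentum from `start` to `end` over the first
    `warmup_steps` steps, then hold it constant."""
    frac = min(step / warmup_steps, 1.0)
    return (1.0 - frac) * start + frac * end

# In the training loop (assuming a Muon optimizer whose param groups
# expose a `momentum` hyperparameter):
#   for group in optimizer.param_groups:
#       group["momentum"] = muon_momentum(step)
```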

@whiteOsky

good!

@JorgeVanco
Contributor Author

Edit 10/29/25

  • Slightly increased the overall learning rate, as well as the learning rate for the embeddings.
  • Implemented YaRN to progressively increase the sequence length from 256 to 1792 (see the schedule sketch after this list).
  • Added the NorMuon update.
  • Increased weight decay to 0.01.
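For the sequence-length ramp, a minimal sketch of the kind of schedule I mean (the exact ramp shape and rounding in my run may differ; YaRN additionally rescales the RoPE frequencies so the model tolerates lengths beyond the current training context):

```python
def seq_len_schedule(step: int, total_steps: int,
                     start_len: int = 256, end_len: int = 1792) -> int:
    """Linearly grow the training context from start_len to end_len,
    rounded down to a multiple of 128 to keep batch shapes friendly.
    Purely illustrative."""
    frac = min(step / total_steps, 1.0)
    length = start_len + frac * (end_len - start_len)
    return max(start_len, int(length // 128) * 128)
```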

@vskogstad
Contributor

That's an amazing improvement! 👍

When evaluating on the validation set, I assume you do so with the new maximum context length (1792)?
I noticed for my model that validation loss decreases just from doubling the context length and decreasing the batch size during validation, even with no training at the extended context. This is/was also done in the NanoGPT speedrun at some point. It seems we get some gains during validation simply from reducing the number of occurrences where the model has very little available context and has to guess.
(Just to be clear: I think validating your model with a large context length is correct, since you've actually trained the model on that context length.)
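To make the effect concrete, this is roughly the comparison I ran, as a sketch (it assumes a model that maps a [B, T] tensor of token ids to [B, T, vocab] logits):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def val_loss_at_context(model, tokens: torch.Tensor, ctx_len: int,
                        batch_size: int, device: str = "cuda") -> float:
    """Evaluate one flat validation token stream chunked at ctx_len.
    Longer chunks mean fewer positions sit right after a chunk boundary
    with almost no history, which alone can lower the average loss."""
    n = (tokens.numel() - 1) // ctx_len
    x = tokens[: n * ctx_len].view(n, ctx_len)
    y = tokens[1 : n * ctx_len + 1].view(n, ctx_len)
    total, count = 0.0, 0
    for i in range(0, n, batch_size):
        xb = x[i : i + batch_size].to(device)
        yb = y[i : i + batch_size].to(device)
        logits = model(xb)  # assumed: raw logits out
        total += F.cross_entropy(logits.view(-1, logits.size(-1)),
                                 yb.view(-1), reduction="sum").item()
        count += yb.numel()
    return total / count

# e.g. compare val_loss_at_context(model, val_tokens, 512, 64)
#      with    val_loss_at_context(model, val_tokens, 1024, 32)
```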

@JorgeVanco
Contributor Author

Thank you!
Yes exactly, I am running validation with the maximum context length.

@marcelroed
Member

Ah, I should have made this clear in this repo earlier (I'll add it to the readme), but verification will happen at context length 512. I think this might have been something we communicated over our class Slack that should have been added to the writeup.

Could you eval using these settings? Sorry for the confusion!

@marcelroed
Member

Great work!

@JorgeVanco
Contributor Author

Sure! I'll update it in the next couple of days. Thanks for the clarification! It was too good to be true haha

@vskogstad
Contributor

> Ah, I should have made this clear in this repo earlier (I'll add it to the readme), but verification will happen at context length 512. I think this might have been something we communicated over our class Slack that should have been added to the writeup.
>
> Could you eval using these settings? Sorry for the confusion!

Sorry about hijacking the thread, Jorge, but could you clarify a bit about what is and isn't allowed in leaderboard submissions?
For assignment 1 we are supposed to build everything from scratch, and as such there are limitations on using torch.nn. Is this requirement waived for the leaderboard? (Basically, can we use torch.nn.functional.scaled_dot_product_attention, flex_attention, etc. to get systems speedups?)
In the readme it says:

> The code must clearly be your own work, and you can't use external implementations for systems-critical aspects of your model.

If we can't use torch.nn, can we write our own kernels? And what level of abstraction still qualifies as our own work? (CUDA only, or can we use higher-level abstractions like Triton/ThunderKittens?)
