Update README.md #62

slimtune2023 · 2025-06-10T07:10:57Z

I trained this model with the best off-policy hyperparameters from my earlier experiments: a learning rate of 3e-5, train batch size = rollout batch size = 256, GRPO clip loss, no mean normalization, and no standard deviation normalization. This configuration gave the best results in my shorter runs, so I used it for the final model as well.
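For readers unfamiliar with these settings, here is a minimal sketch of how they might fit together. It is not the training code from this repo. The config keys, the function names, and the `clip_range` default of 0.2 are all hypothetical. It also assumes that "mean normalization" means subtracting the group mean reward and that "standard deviation normalization" means dividing by the group standard deviation, so with both disabled the advantage is just the raw reward.

```python
import math
from statistics import mean, pstdev

# Values from the PR description; the key names are hypothetical.
CONFIG = {
    "learning_rate": 3e-5,
    "train_batch_size": 256,
    "rollout_batch_size": 256,
    "loss_type": "grpo_clip",
    "normalize_mean": False,
    "normalize_std": False,
}

def group_advantages(rewards, normalize_mean, normalize_std, eps=1e-6):
    """Turn one group's rewards into advantages (assumed interpretation)."""
    adv = list(rewards)
    if normalize_mean:
        m = mean(rewards)
        adv = [a - m for a in adv]
    if normalize_std:
        s = pstdev(rewards)
        adv = [a / (s + eps) for a in adv]
    return adv

def grpo_clip_loss(logp_new, logp_old, advantage, clip_range=0.2):
    """Clipped surrogate loss for a single token.

    clip_range=0.2 is an assumed default, not a value stated in this PR.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
    return -min(ratio * advantage, clipped * advantage)

# With both normalizations off, advantages equal the raw rewards.
advantages = group_advantages(
    [1.0, 0.0, 1.0, 0.0],
    CONFIG["normalize_mean"],
    CONFIG["normalize_std"],
)
```

With a positive advantage, the clip caps how much a token whose probability has already risen can keep contributing to the objective, which is what keeps updates bounded when the training policy drifts from the rollout policy.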

Update README.md

a4b6999
