Replies: 2 comments
- Yep! We have an example we'll be merging soon where we got OpenAI's Learning to Summarize reward model working with TRLX on a 20B language model. We also have a very minimal version of CodeRL working; it's included as an example here. We've also been discussing TRLX with plenty of RLHF industry folks and have gotten a few seals of approval at this point.
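As a rough sketch of how a reward model plugs into TRLX: the library's `reward_fn` contract is essentially "list of sampled strings in, list of scalar rewards out". The brevity-based reward below is a hypothetical toy stand-in (not the actual Learning to Summarize reward model), and the commented-out `trlx.train` call is an assumption about the high-level API, shown only to illustrate where the function slots in:

```python
# Toy sketch of the reward_fn contract TRLX expects:
# a list of sampled strings in, a list of scalar rewards out.
# This brevity-based reward is a hypothetical stand-in for a
# real learned reward model (e.g. Learning to Summarize's).
from typing import List


def reward_fn(samples: List[str], **kwargs) -> List[float]:
    # Reward shorter outputs, as a crude proxy for summary quality.
    return [1.0 / (1.0 + len(s.split())) for s in samples]


# With TRLX installed, training would look something like:
#
#   import trlx
#   trainer = trlx.train(
#       "gpt2",               # base model to fine-tune with PPO
#       reward_fn=reward_fn,  # scalar reward per sampled text
#       prompts=prompts,      # training prompts
#   )
```

In practice the toy function above would be replaced by a forward pass through a trained reward model scoring each sample.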
- What's the largest PPO model size that has been trained and tested with TRLX? Can you share some performance metrics, e.g. GPU count and training time?
- Has this been tested on anything?