On page 3, the paper mentions that they "retrain the DeepSeek-V3-Base model", so I'm wondering whether there was a typo in both places and it should have been "the converged reasoning-oriented RL checkpoint"?
Please correct me if I am wrong, but I guess you meant to mention DeepSeek-R1 instead of DeepSeek-V3-Base, as shown in the image below.