Add overlong reward shaping for DAPO. #947

copybara-service · 2026-01-06T21:39:22Z

Add overlong reward shaping for DAPO.
Refactor rl_learner.compute_reward to use reward_manager.

PiperOrigin-RevId: 852838075
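For readers unfamiliar with the term, here is a minimal sketch of the soft overlong punishment described in the DAPO paper, which is presumably what "overlong reward shaping" refers to. It is not the code in this PR: the function names `soft_overlong_penalty` and `shape_reward` and the parameters `max_length` and `cache_length` are hypothetical, and the sketch makes no assumption about the actual `reward_manager` interface.

```python
def soft_overlong_penalty(length: int, max_length: int, cache_length: int) -> float:
    """Length penalty in [-1, 0], following DAPO's soft overlong punishment.

    All names here are illustrative assumptions, not identifiers from this PR.
    """
    if cache_length <= 0 or cache_length > max_length:
        raise ValueError("cache_length must be in (0, max_length]")
    # Responses up to this length are not penalized at all.
    soft_limit = max_length - cache_length
    if length <= soft_limit:
        return 0.0
    if length <= max_length:
        # Linear ramp from 0 (at soft_limit) down to -1 (at max_length).
        return (soft_limit - length) / cache_length
    # Anything past the hard limit receives the full penalty.
    return -1.0


def shape_reward(base_reward: float, length: int, max_length: int, cache_length: int) -> float:
    """Add the length penalty to a task reward (hypothetical helper)."""
    return base_reward + soft_overlong_penalty(length, max_length, cache_length)
```

With `max_length=200` and `cache_length=50`, a response of 100 tokens is unpenalized, one of 175 tokens gets half the penalty, and anything at or beyond 200 tokens gets the full penalty of -1.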

copybara-service bot requested review from abheesht17, hgao327, jiangyangmu, lc5211, sizhit2, tianshub and wang2yn84 as code owners January 6, 2026 21:39

copybara-service bot had a problem deploying to testing January 6, 2026 21:39 Failure

Add overlong reward shaping for DAPO.

5ac4f1c

Refactor rl_learner.compute_reward to use reward_manager.

PiperOrigin-RevId: 852838075

copybara-service bot force-pushed the test_852838075 branch from bf3cef0 to 5ac4f1c Compare January 7, 2026 22:13

copybara-service bot had a problem deploying to testing January 7, 2026 22:14 Failure
