Add overlong reward shaping for DAPO. #947

copybara-service · 2026-01-06T21:39:22Z

Add overlong reward shaping for DAPO.
Refactor `rl_learner.compute_reward` to use `reward_manager`.
Enable logging `algo_config` at learner init.

PiperOrigin-RevId: 857337442
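For readers unfamiliar with the term, overlong reward shaping can be sketched as follows. This is a minimal illustration of the soft overlong punishment described in the DAPO paper, not the code merged in this PR: the function names `overlong_penalty` and `shaped_reward` and the parameters `max_len` and `buffer_len` are hypothetical, and the actual `reward_manager` interface may look quite different.

```python
def overlong_penalty(response_len: int, max_len: int, buffer_len: int) -> float:
    """Length penalty in the style of DAPO's soft overlong punishment.

    Hypothetical helper, not the PR's API. Returns a value in [-1, 0]:
      * 0 while the response fits below the soft limit (max_len - buffer_len),
      * a linearly growing penalty inside the buffer zone,
      * -1 once the response exceeds max_len.
    """
    if buffer_len <= 0 or buffer_len > max_len:
        raise ValueError("buffer_len must be in (0, max_len]")

    soft_limit = max_len - buffer_len
    if response_len <= soft_limit:
        return 0.0
    if response_len <= max_len:
        # Negative, reaching -1 exactly at response_len == max_len.
        return (soft_limit - response_len) / buffer_len
    return -1.0


def shaped_reward(base_reward: float, response_len: int,
                  max_len: int, buffer_len: int) -> float:
    """Add the length penalty to a task reward (assumed additive shaping)."""
    return base_reward + overlong_penalty(response_len, max_len, buffer_len)
```

With `max_len=200` and `buffer_len=50`, responses up to 150 tokens are not penalized, a 175-token response loses 0.5, and anything past 200 tokens loses the full 1.0. The linear ramp is meant to discourage overly long responses without the abrupt reward cliff that hard truncation would introduce.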

copybara-service bot requested review from abheesht17, hgao327, jiangyangmu, lc5211, sizhit2, tianshub and wang2yn84 as code owners January 6, 2026 21:39

copybara-service bot had a problem deploying to testing January 6, 2026 21:39 Failure

copybara-service bot force-pushed the test_852838075 branch from bf3cef0 to 5ac4f1c January 7, 2026 22:13

copybara-service bot had a problem deploying to testing January 7, 2026 22:14 Failure

copybara-service bot force-pushed the test_852838075 branch from 5ac4f1c to 349687d January 13, 2026 22:48

copybara-service bot had a problem deploying to testing January 13, 2026 22:48 Failure

copybara-service bot force-pushed the test_852838075 branch from 349687d to 1b3d6ac January 13, 2026 23:15

copybara-service bot had a problem deploying to testing January 13, 2026 23:15 Failure

copybara-service bot force-pushed the test_852838075 branch from 1b3d6ac to 8051428 January 15, 2026 20:10

copybara-service bot had a problem deploying to testing January 15, 2026 20:10 Failure

copybara-service bot force-pushed the test_852838075 branch from 8051428 to cb3ff4a January 16, 2026 00:11

copybara-service bot had a problem deploying to testing January 16, 2026 00:12 Failure

copybara-service bot temporarily deployed to testing January 16, 2026 00:12 Inactive

copybara-service bot force-pushed the test_852838075 branch from cb3ff4a to f275557 January 16, 2026 00:34

copybara-service bot had a problem deploying to testing January 16, 2026 00:35 Failure

copybara-service bot had a problem deploying to testing January 16, 2026 00:35 Error

copybara-service bot force-pushed the test_852838075 branch from f275557 to 64eea94 January 16, 2026 00:49

copybara-service bot had a problem deploying to testing January 16, 2026 00:50 Failure

copybara-service bot temporarily deployed to testing January 16, 2026 00:50 Inactive

copybara-service bot force-pushed the test_852838075 branch from 70c0d42 to 8bb597f January 16, 2026 01:50

copybara-service bot had a problem deploying to testing January 16, 2026 01:50 Failure

copybara-service bot temporarily deployed to testing January 16, 2026 01:50 Inactive

copybara-service bot force-pushed the test_852838075 branch from 8bb597f to 7c56e6f January 16, 2026 04:15

copybara-service bot had a problem deploying to testing January 16, 2026 04:15 Failure

copybara-service bot temporarily deployed to testing January 16, 2026 04:15 Inactive

copybara-service bot force-pushed the test_852838075 branch from 7c56e6f to a731aa5 January 16, 2026 18:40

copybara-service bot had a problem deploying to testing January 16, 2026 18:40 Failure

copybara-service bot temporarily deployed to testing January 16, 2026 18:40 Inactive

copybara-service bot force-pushed the test_852838075 branch from a731aa5 to 450d882 January 16, 2026 22:11

copybara-service bot temporarily deployed to testing January 16, 2026 22:11 Inactive

copybara-service bot force-pushed the test_852838075 branch from 450d882 to f717ea1 January 16, 2026 22:52

copybara-service bot had a problem deploying to testing January 16, 2026 22:52 Failure

copybara-service bot temporarily deployed to testing January 16, 2026 22:52 Inactive

copybara-service bot force-pushed the test_852838075 branch from f717ea1 to dd3f4f7 January 16, 2026 23:14

copybara-service bot temporarily deployed to testing January 16, 2026 23:15 Inactive

copybara-service bot had a problem deploying to testing January 16, 2026 23:15 Failure

copybara-service bot force-pushed the test_852838075 branch from dd3f4f7 to 3bdc2e7 January 16, 2026 23:30

copybara-service bot had a problem deploying to testing January 16, 2026 23:30 Error

copybara-service bot force-pushed the test_852838075 branch from 3bdc2e7 to b06127b January 16, 2026 23:40

copybara-service bot temporarily deployed to testing January 16, 2026 23:41 Inactive

Add overlong reward shaping for DAPO.

f55e0a0

Refactor rl_learner.compute_reward to use reward_manager Enable logging `algo_config` at learner init. PiperOrigin-RevId: 857337442

copybara-service bot force-pushed the test_852838075 branch from b06127b to f55e0a0 January 17, 2026 00:08

copybara-service bot merged commit f55e0a0 into main Jan 17, 2026

copybara-service bot deleted the test_852838075 branch January 17, 2026 00:08

copybara-service bot temporarily deployed to testing January 17, 2026 00:08 Inactive
