
[fully_async] feat: standalone log prob server (Model Engine Server) support#5990

Open
sl-1314 wants to merge 41 commits into verl-project:main from meituan-search:standalone_old_log_prob_support

Conversation


@sl-1314 sl-1314 commented Apr 13, 2026

What does this PR do?

This PR introduces a standalone Model Engine Server for the fully_async training pipeline to compute log_probs. In the existing design, old_log_probs are recomputed by the actor training engine, which requires saving/restoring actor weights. This PR decouples that computation into a dedicated inference server (running on additionally allocated GPUs for log_probs computation) that runs concurrently with rollout generation.

Currently, only the Megatron backend is supported.
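Conceptually, the standalone server runs a forward-only pass over the already-sampled rollout tokens and gathers their log-probabilities. Below is a pure-Python sketch of that gather step (the helper name and list-based inputs are hypothetical; the real computation is a batched Megatron forward on GPU):

```python
import math

def token_log_probs(logits, token_ids):
    """Per-token log-probabilities of the sampled tokens.

    logits: one logit vector per generated token position
    token_ids: the token id actually sampled at each position
    Hypothetical helper for illustration; the Model Engine Server
    runs this as a batched forward pass inside the training engine.
    """
    out = []
    for step_logits, tok in zip(logits, token_ids):
        m = max(step_logits)  # subtract max for a stable softmax
        log_z = m + math.log(sum(math.exp(x - m) for x in step_logits))
        out.append(step_logits[tok] - log_z)
    return out

# With uniform logits over 4 tokens, any sampled id gets log(1/4).
lp = token_log_probs([[0.0, 0.0, 0.0, 0.0]], [2])
```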

The implementation follows the existing RolloutReplica / BaseRollout / CheckpointEngineWorker architecture:

| Class | Role |
| --- | --- |
| `ModelEngineReplica` | `RolloutReplica` subclass: resource allocation, lifecycle, weight sync |
| `ModelEngineWorker` | Ray actor per GPU; `CheckpointEngineWorker` subclass that receives weights |
| `ModelEngineServerAdapter` | `BaseRollout` adapter wrapping `TrainingWorker` for forward-only inference |
| `ModelEngineServer` | Ray actor: async batch scheduler with a `pause_serving`/`resume_serving` protocol |
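The `pause_serving`/`resume_serving` protocol can be sketched with an asyncio event (class name and method bodies here are hypothetical simplifications; the real `ModelEngineServer` is a Ray actor that also batches requests and pauses while weights are synced):

```python
import asyncio

class ModelEngineServerSketch:
    """Minimal sketch of the pause/resume serving protocol."""

    def __init__(self):
        self._serving = asyncio.Event()
        self._serving.set()  # serving by default

    def pause_serving(self):
        # e.g. called before new weights arrive from the trainer
        self._serving.clear()

    def resume_serving(self):
        self._serving.set()

    async def compute_log_probs(self, batch):
        await self._serving.wait()  # requests block while paused
        return [0.0 for _ in batch]  # placeholder for the forward pass

async def demo():
    srv = ModelEngineServerSketch()
    srv.pause_serving()
    task = asyncio.create_task(srv.compute_log_probs([1, 2, 3]))
    await asyncio.sleep(0)  # task starts and blocks on the event
    srv.resume_serving()    # weights "updated": unblock requests
    return await task

result = asyncio.run(demo())
```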

To use the Model Engine Server, you need mbridge with this PR applied: ISEEKYAN/mbridge#117

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching
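The title convention above can be expressed as a validation check. This is a hedged approximation of the CI rule (the module list is abbreviated and the actual CI regex may differ):

```python
import re

# Abbreviated module list from the checklist; the CI uses the full set.
MODULES = {"fsdp", "megatron", "rollout", "trainer", "fully_async", "doc"}
TYPES = {"feat", "fix", "refactor", "chore", "test"}

# [BREAKING] prefix optional, then [modules], then "type: description".
TITLE_RE = re.compile(r"^(\[BREAKING\])?\[([a-z_, ]+)\] (\w+): .+$")

def title_ok(title):
    m = TITLE_RE.match(title)
    if not m:
        return False
    mods = {s.strip() for s in m.group(2).split(",")}
    return mods <= MODULES and m.group(3) in TYPES

ok = title_ok("[fully_async] feat: standalone log prob server support")
```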

Test

Using the Model Engine Server slightly increases timing_s/gen, but it effectively eliminates timing_s/old_log_prob and reduces end-to-end time.


Compared to the original old_log_prob computation method (16 GPUs for training and 16 GPUs for rollout), adding a Model Engine Server on 8 additional GPUs yields an end-to-end time speedup of approximately 1.64x. Accounting for the increased resource consumption, the effective speedup ratio is about 1.09x.

API and Usage Example

Enable with:

model_engine_server.enable_standalone=True
model_engine_server.nnodes=nnodes
model_engine_server.n_gpus_per_node=n_gpus_per_node
# independent model-parallel config for the server
model_engine_server.megatron.pipeline_model_parallel_size=pp_size
model_engine_server.megatron.tensor_model_parallel_size=tp_size
...
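As an illustration of how these settings interact, here is a hypothetical sanity check (the keys mirror the fragment above, but this is not actual verl code): the server's world size must be divisible by its TP*PP product, and the quotient gives the number of data-parallel engine replicas.

```python
# Hypothetical config mirroring the model_engine_server fragment above.
cfg = {
    "enable_standalone": True,
    "nnodes": 1,
    "n_gpus_per_node": 8,
    "megatron": {
        "tensor_model_parallel_size": 2,
        "pipeline_model_parallel_size": 2,
    },
}

def server_world_size(cfg):
    return cfg["nnodes"] * cfg["n_gpus_per_node"]

def check_parallel_config(cfg):
    world = server_world_size(cfg)
    mp = (cfg["megatron"]["tensor_model_parallel_size"]
          * cfg["megatron"]["pipeline_model_parallel_size"])
    assert world % mp == 0, f"{world} GPUs not divisible by TP*PP={mp}"
    return world // mp  # data-parallel replicas of the server engine

dp = check_parallel_config(cfg)
```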

Example script:

verl/experimental/fully_async_policy/shell/dapo_7b_math_megatron_16_16_8.sh

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a standalone ModelEngineServer to compute 'old' log probabilities for fully asynchronous training. Key changes include the implementation of ModelEngineReplica and ModelEngineWorker, updates to the agent loop to handle response_oldlogprobs, and enhancements to Megatron utilities for weight synchronization via async generators. A critical bug was identified in the tool_agent_loop.py where old log probabilities were being appended to the wrong data list, which would lead to incorrect training data.

sl-1314 and others added 2 commits April 13, 2026 17:58
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

@ArronHZG ArronHZG left a comment


I feel it also needs to support multiple replicas and a server manager for data distribution.

Furthermore, modifications to the main branch should be reduced, converging into the fully_async module.

@ArronHZG

In the experiments, even without rejection-sampling correction, the rollout-mismatch metric needs to be enabled so the results can be observed.

Additionally, the current implementation can display the Entropy metric.

