[rollout, tool] feat: add experimental agent framework and gateway runtime #5931

Draft
zackcxb wants to merge 4 commits into verl-project:main from zackcxb:main

@zackcxb zackcxb commented Apr 8, 2026

What does this PR do?

This PR adds an experimental agent framework and gateway runtime for multi-turn, agent-style rollout
in VERL, following #5790.

Specifically, it:

  • adds verl.experimental.agent_framework, a new abstraction for agent systems, with an example implementation,
  • adds verl.experimental.agent_gateway for OpenAI-compatible session serving and sticky session
    routing,
  • integrates gateway-backed session runtime into AsyncLLMServerManager,
  • adds focused tests for framework assembly, gateway actor/manager behavior, and session runtime
    ownership.
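To make "sticky session routing" concrete: the idea is that every request carrying the same session ID lands on the same gateway, so per-session state stays in one place. The following is a minimal sketch of that idea only, not the routing code in this PR. `StickyRouter`, `route`, `release`, and the least-loaded assignment policy are all hypothetical names and choices made for illustration.

```python
class StickyRouter:
    """Hypothetical router: pins each session to one gateway for its lifetime."""

    def __init__(self, gateway_ids: list[str]) -> None:
        if not gateway_ids:
            raise ValueError("at least one gateway is required")
        self._gateway_ids = list(gateway_ids)
        self._session_to_gateway: dict[str, str] = {}
        self._load: dict[str, int] = {g: 0 for g in self._gateway_ids}

    def route(self, session_id: str) -> str:
        # Reuse the existing assignment so the session stays "sticky".
        if session_id in self._session_to_gateway:
            return self._session_to_gateway[session_id]
        # New session: pick the gateway with the fewest active sessions
        # (ties are broken by the order in which gateways were given).
        gateway = min(self._gateway_ids, key=lambda g: self._load[g])
        self._session_to_gateway[session_id] = gateway
        self._load[gateway] += 1
        return gateway

    def release(self, session_id: str) -> None:
        # Forget a finished session and free its slot on the gateway.
        gateway = self._session_to_gateway.pop(session_id, None)
        if gateway is not None:
            self._load[gateway] -= 1


router = StickyRouter(["gateway-0", "gateway-1"])
first = router.route("session-a")
second = router.route("session-b")
```

Repeated calls to `router.route("session-a")` return the same gateway until `release("session-a")` is called, which is the property a sessionized runtime relies on.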

WIP:

  • Core implementation is ready for review.
  • I still need to finish the following TODO items before taking this PR out of draft:
    • e2e recipe test with SWE agents or other tool agents
    • multi-modal support
    • hygiene items for CI

    Checklist Before Starting

    • Search for similar PRs. Paste at least one query link here:
      https://github.com/verl-project/verl/pulls?q=is%3Apr+agent+framework+gateway
    • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
      • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci,
        training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc,
        perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async,
        one_step_off
      • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
      • {type} is in feat, fix, refactor, chore, test
      • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING]
        to the beginning of the title.
      • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

    Test

    • pytest -q tests/experimental/agent_framework tests/experimental/agent_gateway

    Result:

    • 22 passed, 1 warning

    API and Usage Example

    This PR adds experimental APIs under:

    • verl.experimental.agent_framework
    • verl.experimental.agent_gateway

    Example:

    from verl.experimental.agent_framework.framework import OpenAICompatibleAgentFramework
    from verl.experimental.agent_loop.agent_loop import AsyncLLMServerManager

    # `tokenizer`, `backend`, and `agent_runner` are assumed to be created elsewhere by the caller.
    manager = AsyncLLMServerManager(
        config=None,
        servers=[],
        load_balancer_handle=None,
        gateway_count=1,
        gateway_actor_kwargs={
            "tokenizer": tokenizer,
            "backend": backend,
            "host": "127.0.0.1",
        },
    )
    
    framework = OpenAICompatibleAgentFramework(
        session_runtime=manager,
        agent_runner=agent_runner,
        reward_key="score",
    )

    Design & Code Changes

    High-level changes:

    • add experimental framework abstractions and trajectory assembly utilities,
    • add gateway actor/manager runtime for sessionized OpenAI-compatible serving,
    • let AsyncLLMServerManager own gateway lifecycle and session runtime.
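The last point, one object owning the gateway lifecycle, can be illustrated with plain asyncio. This is a sketch under stated assumptions, not the PR's code: `FakeGateway` and `GatewayOwner` are hypothetical stand-ins, no Ray actors are involved, and the only behaviour shown is "create the gateways, shut them all down concurrently, then drop the references".

```python
import asyncio


class FakeGateway:
    """Stand-in for a gateway actor (hypothetical; no Ray involved)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running = True

    async def shutdown(self) -> None:
        self.running = False


class GatewayOwner:
    """Hypothetical owner that creates gateways and tears them down itself."""

    def __init__(self, gateway_count: int) -> None:
        self.owned_gateways = [FakeGateway(f"gateway-{i}") for i in range(gateway_count)]

    async def shutdown(self) -> None:
        # Take the list and clear the attribute first, so a second
        # shutdown() call finds nothing left to stop.
        gateways, self.owned_gateways = self.owned_gateways, []
        # Shut down all owned gateways concurrently.
        await asyncio.gather(*(g.shutdown() for g in gateways))


async def main() -> list[bool]:
    owner = GatewayOwner(gateway_count=2)
    gateways = list(owner.owned_gateways)
    await owner.shutdown()
    await owner.shutdown()  # second call is a no-op
    return [g.running for g in gateways]


states = asyncio.run(main())
```

Keeping creation and teardown in the same object means callers never have to track which gateways still need to be stopped.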

    Checklist Before Submitting

    [!IMPORTANT]
    Please check all the following items before requesting a review, otherwise the reviewer might
    deprioritize this PR for review.

    (https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

    • If your PR is related to the recipe submodule, please also update the reference to the
      submodule commit via git submodule update --remote or cd recipe && git pull origin main.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new experimental agent framework for collecting and assembling agent trajectories. It includes a GatewayActor for managing OpenAI-compatible chat sessions, a GatewayManager for routing sessions across multiple gateways, and a TrajectoryAssembler to convert collected trajectories into a training-ready DataProto. Additionally, the AsyncLLMServerManager has been updated to support this new gateway-backed session runtime. I have no feedback to provide as there are no review comments.

@wuxibin89 wuxibin89 mentioned this pull request Apr 9, 2026
33 tasks
if "rm_scores" in batch_tensors:
    meta_info["reward_extra_keys"] = reward_extra_keys

return DataProto(
Collaborator

Do not use DataProto, use TensorDict instead.

  self.config = config
  self._load_balancer = load_balancer_handle
- self._server_id_to_handle: dict[str, ray.actor.ActorHandle] = dict(servers)
+ self._server_id_to_handle: dict[str, ray.actor.ActorHandle] = dict(servers or [])
Collaborator

Do not modify agent_loop, we will adapt to agent gateway once it's mature.

So should we rewrite the AsyncLLMServerManager class and put it, together with anything we inherit from the current agent_loop.py, under a new path (e.g. verl/agent)?

await asyncio.gather(*[_await_ray_ref(gateway.shutdown.remote()) for gateway in self.owned_gateway_actors])
self.owned_gateway_actors = []
self.gateway_manager = None
self._server_id_to_handle: dict[str, ray.actor.ActorHandle] = dict(servers)

self._server_id_to_handle: dict[str, ray.actor.ActorHandle] = dict(servers or [])
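
The suggested `servers or []` presumably guards against `servers` being `None`, since `dict(None)` raises a `TypeError`. A minimal sketch of the difference follows. `build_server_map` is a hypothetical helper, and plain strings stand in for the Ray actor handles.

```python
from typing import Optional


def build_server_map(servers: Optional[list[tuple[str, str]]]) -> dict[str, str]:
    # `servers or []` substitutes an empty list when servers is None,
    # so dict() always receives an iterable of (id, handle) pairs.
    return dict(servers or [])


try:
    dict(None)  # what dict(servers) does when servers is None
    raised = False
except TypeError:
    raised = True

empty = build_server_map(None)
populated = build_server_map([("server-0", "handle-0")])
```

Note that `or []` also replaces an empty list with a new empty list, which is harmless here because the result is the same empty dict.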

4 participants