ProRLAgent Server is a scalable multi-turn rollout system for training and evaluating RL agents. Built on top of OpenHands, it offers high concurrency and a pluggable handler interface to support diverse agent tasks.
- Decoupled RL Training & Rollouts: rollouts run as a service; any RL trainer can consume the outputs.
- High concurrency: execute large-scale jobs with LLM load balancing.
- Pluggable AgentHandler: customize for different tasks and agents.
- Lifecycle management: built-in support for status tracking, queuing, timeouts, and cleanup.
- Token-in / Token-out: communicate in tokens to maintain turn alignment and ensure stable training.
- Singularity runtime: rootless execution with single-file containers (.sif), seamless Slurm integration, secure multi-user support.
- Efficient Bash tool: ptyprocess-based implementation that delivers roughly a 6x speedup over the tmux-based approach.
- Efficient IPython tool: direct IPython kernel integration without network overhead.
- UDS communication: Unix domain sockets for better throughput and isolation.
- Install dependencies
- Install OpenHands Dependencies
```bash
poetry install --with dev,test,runtime,evaluation
pip install git+https://github.com/SWE-Gym/SWE-Bench-Package.git
pip install git+https://github.com/R2E-Gym/R2E-Gym.git
```
- Install Singularity/Apptainer Sandbox
```bash
sudo apt-get update
sudo apt-get install -y software-properties-common curl gnupg
sudo apt-get install -y singularity-container fuse
sudo add-apt-repository -y ppa:apptainer/ppa
sudo apt-get update
sudo apt-get install -y apptainer
```
- Start the VLLM server with your desired Hugging Face model:
```bash
vllm serve path/to/your/model --enable-auto-tool-choice --tool-call-parser hermes --host 127.0.0.1 --port 8000 --api-key key --served-model-name model_name &
```
Replace `path/to/your/model` with the actual path to your Hugging Face model, and set the host, port, API key, and served model name as needed.
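Optionally, verify the endpoint responds before pointing the rollout server at it. A minimal sketch with the OpenAI Python client, assuming the example host, port, API key, and served model name from the command above:

```python
# Connectivity check against the VLLM OpenAI-compatible endpoint.
# base_url, api_key, and model mirror the example `vllm serve` flags above;
# replace them with your actual values.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="key")
response = client.chat.completions.create(
    model="model_name",
    messages=[{"role": "user", "content": "Say hello."}],
    max_tokens=16,
)
print(response.choices[0].message.content)
```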
- Pull Singularity sandboxes for SWE tasks
```bash
python scripts/pull_swe_images.py --parquet-file /path/to/train.parquet --dest-dir /some/dir --temp-base /some/dir --log-name log
```
Download the parquet data from Hugging Face (to inspect a parquet's fields before pulling, see the sketch after this list). Supported training data:
- swe-gym: https://huggingface.co/datasets/NovaSky-AI/SkyRL-v0-293-data
- r2egym: https://huggingface.co/R2E-Gym
- swe-bench-multimodal: https://huggingface.co/datasets/SWE-bench/SWE-bench_Multimodal
- swe-bench: https://huggingface.co/datasets/SWE-bench/SWE-bench
- swe-smith: https://huggingface.co/datasets/SWE-bench/SWE-smith
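If you want to see which fields a training parquet provides before pulling images, a quick look with pandas is enough. A minimal sketch; the columns it prints vary by dataset, and field names such as `instance_id` are assumptions based on the request schema shown later in this README:

```python
# Inspect a training parquet before running scripts/pull_swe_images.py.
# Column names vary by dataset; nothing below is specific to ProRLAgent.
import pandas as pd

df = pd.read_parquet("/path/to/train.parquet")
print(df.columns.tolist())       # available fields
print(len(df), "instances")
print(df.head(3).to_string())    # a few sample rows
```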
- Start the async evaluation server (FastAPI)
This command starts the FastAPI-based async evaluation server and listens on the given host/port. It exposes /start, /process, and /status endpoints, and uses --max-init-workers/--max-run-workers and --timeout to control concurrency and time limits.
```bash
python scripts/start_server.py --host 0.0.0.0 --port 8006 --max-init-workers 64 --max-run-workers 64 --timeout 300
```
- Test the server (HTTP I/O)
Before sending jobs to /process, make sure you follow this sequence (assumes you already started a VLLM server in step 2):
- Register at least one LLM server address (include `/v1`):
```bash
curl -X POST http://localhost:8006/add_llm_server \
  -H 'Content-Type: application/json' \
  -d '{"address":"http://127.0.0.1:8000/v1"}'
```
- Start the worker process:
```bash
curl -X POST http://localhost:8006/start
```
- (Optional) Check status:
```bash
curl http://localhost:8006/status
```
Notes:
- You can call `/add_llm_server` before `/start`; the address will be buffered and applied when the worker starts.
- Ensure the `sampling_params.model` and `api_key` in your request match the model name and key you used when launching VLLM in step 2.
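The same setup sequence can also be driven programmatically, for example when an RL trainer registers its VLLM replicas at startup. A minimal sketch with `requests`, assuming the localhost addresses used above:

```python
# Register an LLM backend, start the worker, and poll /status.
# Endpoints and payloads mirror the curl commands above; the localhost
# addresses are the example values from this README.
import requests

SERVER = "http://localhost:8006"

# Register one VLLM replica (the address must include /v1).
requests.post(f"{SERVER}/add_llm_server",
              json={"address": "http://127.0.0.1:8000/v1"}).raise_for_status()

# Start the worker process.
requests.post(f"{SERVER}/start").raise_for_status()

# Check server status.
print(requests.get(f"{SERVER}/status").json())
```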
Option 1: Quick test using the built-in script
```bash
python scripts/tests/test_server.py
```
Option 2: Test using curl
Quick try: send a task to /process and read the JSON result.
Input (request body):
- `instance`: the task info (must include `data_source` and any fields your handler needs)
- `sampling_params`: optional LLM/agent settings (e.g., `temperature`, `top_p`, `max_tokens`)
- `job_id` (optional): your own identifier
Example:
```bash
curl -X POST http://localhost:8006/process \
-H 'Content-Type: application/json' \
-d '{
"instance": {
"data_source": "swebench",
"instance_id": "python__mypy-16203",
"trajectory_id": "t0",
"patch": "",
"metadata": {}
},
"sampling_params": {
"model": "hosted_vllm/Qwen2.5-7B-Instruct",
"api_key": "key",
"modify_params": false,
"log_completions": true,
"native_tool_calling": false,
"temperature": 0.6,
"top_p": 0.9,
"token_level_generation": true,
"custom_tokenizer": "tokenizer_path",
"max_iterations": 5
}
}'
```
Output (response body):
```json
{
"resolved": true,
"report": {"pass@1": 0.0, "details": {"...": "..."}},
"timing": {"init": 2.1, "run": 41.3, "eval": 5.2, "others": 1.4, "timeout": 300.0}
}
```
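For programmatic submission (e.g. from a training loop), the same request can be sent with `requests`. A minimal sketch reusing the example payload above; the model name, API key, and HTTP timeout are placeholders to adapt to your setup:

```python
# Submit one rollout job to /process and read the JSON result.
# The payload mirrors the curl example above; model, api_key, and the
# HTTP timeout are placeholders for your own configuration.
import requests

payload = {
    "instance": {
        "data_source": "swebench",
        "instance_id": "python__mypy-16203",
        "trajectory_id": "t0",
        "patch": "",
        "metadata": {},
    },
    "sampling_params": {
        "model": "hosted_vllm/Qwen2.5-7B-Instruct",
        "api_key": "key",
        "temperature": 0.6,
        "top_p": 0.9,
        "max_iterations": 5,
    },
}

result = requests.post("http://localhost:8006/process",
                       json=payload, timeout=600).json()
print(result.get("resolved"), result.get("timing"))
```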
To add a new task:
- Implement an `AgentHandler` with `name`, `init(job_details, ...)`, `run(job_details, ...)`, and `eval(job_details, ...)`.
- Register it in the registry so that `instance["data_source"] == name` routes requests to your handler.
- Provide a `final_result(job_details)` function for result shaping.
- Ensure your handler returns a consistent result schema and handles timeouts/errors.
Minimal sketch:
```python
from openhands.nvidia.registry import AgentHandler, register_agent_handler

class MyTaskHandler(AgentHandler):
    @property
    def name(self) -> str:
        return "my_task"

    async def init(self, job_details, sid=None, **kwargs):
        return runtime, metadata, config

    async def run(self, job_details, sid=None, **kwargs):
        return {"git_patch": "...", "messages": []}

    async def eval(self, job_details, sid=None, allow_skip=True, reward=None):
        return {"report": {"resolved": True}}

register_agent_handler(MyTaskHandler())
```
Then submit requests with `{"data_source": "my_task", ...}` in the `instance`.
Example:
```bash
TEST_RUNTIME=singularity RUN_AS_OPENHANDS=False PYTHONPATH='.' pytest tests/runtime/test_browsing.py -v -s
```
`OH_RUNTIME_SINGULARITY_IMAGE_REPO` - Specifies the directory where Singularity runtime images will be stored.
```bash
OH_RUNTIME_SINGULARITY_IMAGE_REPO=/path/to/singularity_images
```
More module READMEs:
- openhands/README.md
- openhands/nvidia/README.md
- openhands/llm/nvidia/README.md
- scripts/README.md
- tests/nvidia/README.md
To validate the functionality of the ProRLAgent servers, we conducted proof-of-concept experiments on software engineering (SWE) tasks by integrating the server with the Verl reinforcement learning (RL) framework. Specifically, we used swe-gym along with a subset of the R2E-gym dataset, comprising a total of 800 training instances, to perform GRPO training. Our experiments were carried out on the Qwen3-4B-Instruct model and evaluated on the SWE-Bench-Verified benchmark. The results demonstrate a performance improvement, with accuracy increasing from 15.0% to 20.4%.

