ProRLAgent Server is a scalable multi-turn rollout system for training and evaluating RL agents. Built on top of OpenHands, it offers high concurrency and a pluggable handler interface to support diverse agent tasks.
- Decoupled RL Training & Rollouts: rollouts run as a service; any RL trainer can consume the outputs.
- High concurrency: execute large-scale jobs with LLM load balancing.
- Pluggable AgentHandler: customize for different tasks and agents.
- Lifecycle management: built-in support for status tracking, queuing, timeouts, and cleanup.
- Token-in / Token-out: communicate in tokens to maintain turn alignment and ensure stable training.
- Singularity runtime: rootless execution with single-file containers (.sif), seamless Slurm integration, secure multi-user support.
- Efficient Bash tool: ptyprocess-based implementation that delivers roughly a 6x speedup over the tmux-based approach.
- Efficient IPython tool: direct IPython kernel integration without network overhead.
- UDS communication: Unix domain sockets for better throughput and isolation.
- Install dependencies
- Install OpenHands Dependencies
```bash
poetry install --with dev,test,runtime,evaluation
pip install git+https://github.com/SWE-Gym/SWE-Bench-Package.git
pip install git+https://github.com/R2E-Gym/R2E-Gym.git
```
- Install Singularity/Apptainer Sandbox
```bash
sudo apt-get update
sudo apt-get install -y software-properties-common curl gnupg
sudo apt-get install -y singularity-container fuse
sudo add-apt-repository -y ppa:apptainer/ppa
sudo apt-get update
sudo apt-get install -y apptainer
```
- Start the VLLM server with your desired Hugging Face model:
```bash
vllm serve path/to/your/model --enable-auto-tool-choice --tool-call-parser hermes --host 127.0.0.1 --port 8000 --api-key key --served-model-name model_name &
```
Replace `path/to/your/model` with the actual path to your Hugging Face model, and set the host, port, API key, and served model name as needed.
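Optionally, verify the endpoint responds before pointing the rollout server at it. A minimal sketch with the OpenAI Python client, assuming the example host, port, API key, and served model name from the command above:

```python
# Connectivity check against the VLLM OpenAI-compatible endpoint.
# base_url, api_key, and model mirror the example `vllm serve` flags above;
# replace them with your actual values.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="key")
response = client.chat.completions.create(
    model="model_name",
    messages=[{"role": "user", "content": "Say hello."}],
    max_tokens=16,
)
print(response.choices[0].message.content)
```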
- Pull Singularity sandboxes for SWE tasks
```bash
python scripts/pull_swe_images.py --parquet-file /path/to/train.parquet --dest-dir /some/dir --temp-base /some/dir --log-name log
```
Download the parquet data from Hugging Face (to inspect a parquet's fields before pulling, see the sketch after this list). Supported training data:
- swe-gym: https://huggingface.co/datasets/NovaSky-AI/SkyRL-v0-293-data
- r2egym: https://huggingface.co/R2E-Gym
- swe-bench-multimodal: https://huggingface.co/datasets/SWE-bench/SWE-bench_Multimodal
- swe-bench: https://huggingface.co/datasets/SWE-bench/SWE-bench
- swe-smith: https://huggingface.co/datasets/SWE-bench/SWE-smith
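If you want to see which fields a training parquet provides before pulling images, a quick look with pandas is enough. A minimal sketch; the columns it prints vary by dataset, and field names such as `instance_id` are assumptions based on the request schema shown later in this README:

```python
# Inspect a training parquet before running scripts/pull_swe_images.py.
# Column names vary by dataset; nothing below is specific to ProRLAgent.
import pandas as pd

df = pd.read_parquet("/path/to/train.parquet")
print(df.columns.tolist())       # available fields
print(len(df), "instances")
print(df.head(3).to_string())    # a few sample rows
```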
- Start the async evaluation server (FastAPI)
This command starts the FastAPI-based async evaluation server and listens on the given host/port. It exposes /start, /process, and /status endpoints, and uses --max-init-workers/--max-run-workers and --timeout to control concurrency and time limits.
```bash
python scripts/start_server.py --host 0.0.0.0 --port 8006 --max-init-workers 64 --max-run-workers 64 --timeout 300
```
- Test the server (HTTP I/O)
Before sending jobs to /process, make sure you follow this sequence (assumes you already started a VLLM server in step 2):
- Register at least one LLM server address (include `/v1`):
```bash
curl -X POST http://localhost:8006/add_llm_server \
  -H 'Content-Type: application/json' \
  -d '{"address":"http://127.0.0.1:8000/v1"}'
```
- Start the worker process:
```bash
curl -X POST http://localhost:8006/start
```
- (Optional) Check status:
```bash
curl http://localhost:8006/status
```
Notes:
- You can call `/add_llm_server` before `/start`; the address will be buffered and applied when the worker starts.
- Ensure the `sampling_params.model` and `api_key` in your request match the model name and key you used when launching VLLM in step 2.
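The same setup sequence can also be driven programmatically, for example when an RL trainer registers its VLLM replicas at startup. A minimal sketch with `requests`, assuming the localhost addresses used above:

```python
# Register an LLM backend, start the worker, and poll /status.
# Endpoints and payloads mirror the curl commands above; the localhost
# addresses are the example values from this README.
import requests

SERVER = "http://localhost:8006"

# Register one VLLM replica (the address must include /v1).
requests.post(f"{SERVER}/add_llm_server",
              json={"address": "http://127.0.0.1:8000/v1"}).raise_for_status()

# Start the worker process.
requests.post(f"{SERVER}/start").raise_for_status()

# Check server status.
print(requests.get(f"{SERVER}/status").json())
```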
Option 1: Quick test using the built-in script
```bash
python scripts/tests/test_server.py
```
Option 2: Test using curl
Quick try: send a task to /process and read the JSON result.
Input (request body):
- `instance`: the task info (must include `data_source` and any fields your handler needs)
- `sampling_params`: optional LLM/agent settings (e.g., `temperature`, `top_p`, `max_tokens`)
- `job_id` (optional): your own identifier
Example:
```bash
curl -X POST http://localhost:8006/process \
-H 'Content-Type: application/json' \
-d '{
"instance": {
"data_source": "swebench",
"instance_id": "python__mypy-16203",
"trajectory_id": "t0",
"patch": "",
"metadata": {}
},
"sampling_params": {
"model": "hosted_vllm/Qwen2.5-7B-Instruct",
"api_key": "key",
"modify_params": false,
"log_completions": true,
"native_tool_calling": false,
"temperature": 0.6,
"top_p": 0.9,
"token_level_generation": true,
"custom_tokenizer": "tokenizer_path",
"max_iterations": 5
}
}'
```
Output (response body):
```json
{
"resolved": true,
"report": {"pass@1": 0.0, "details": {"...": "..."}},
"timing": {"init": 2.1, "run": 41.3, "eval": 5.2, "others": 1.4, "timeout": 300.0}
}
```
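For programmatic submission (e.g. from a training loop), the same request can be sent with `requests`. A minimal sketch reusing the example payload above; the model name, API key, and HTTP timeout are placeholders to adapt to your setup:

```python
# Submit one rollout job to /process and read the JSON result.
# The payload mirrors the curl example above; model, api_key, and the
# HTTP timeout are placeholders for your own configuration.
import requests

payload = {
    "instance": {
        "data_source": "swebench",
        "instance_id": "python__mypy-16203",
        "trajectory_id": "t0",
        "patch": "",
        "metadata": {},
    },
    "sampling_params": {
        "model": "hosted_vllm/Qwen2.5-7B-Instruct",
        "api_key": "key",
        "temperature": 0.6,
        "top_p": 0.9,
        "max_iterations": 5,
    },
}

result = requests.post("http://localhost:8006/process",
                       json=payload, timeout=600).json()
print(result.get("resolved"), result.get("timing"))
```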
To add a new task:
- Implement an `AgentHandler` with `name`, `init(job_details, ...)`, `run(job_details, ...)`, and `eval(job_details, ...)`.
- Register it in the registry so that `instance["data_source"] == name` routes requests to your handler.
- Provide a `final_result(job_details)` function for result shaping.
- Ensure your handler returns a consistent result schema and handles timeouts/errors.
Minimal sketch:
```python
from openhands.nvidia.registry import AgentHandler, register_agent_handler

class MyTaskHandler(AgentHandler):
    @property
    def name(self) -> str:
        return "my_task"

    async def init(self, job_details, sid=None, **kwargs):
        return runtime, metadata, config

    async def run(self, job_details, sid=None, **kwargs):
        return {"git_patch": "...", "messages": []}

    async def eval(self, job_details, sid=None, allow_skip=True, reward=None):
        return {"report": {"resolved": True}}

register_agent_handler(MyTaskHandler())
```
Then submit requests with `{"data_source": "my_task", ...}` in the `instance`.
Example:
```bash
TEST_RUNTIME=singularity RUN_AS_OPENHANDS=False PYTHONPATH='.' pytest tests/runtime/test_browsing.py -v -s
```
`OH_RUNTIME_SINGULARITY_IMAGE_REPO` - Specifies the directory where Singularity runtime images will be stored.
```bash
OH_RUNTIME_SINGULARITY_IMAGE_REPO=/path/to/singularity_images
```
More module READMEs:
- openhands/README.md
- openhands/nvidia/README.md
- openhands/llm/nvidia/README.md
- scripts/README.md
- tests/nvidia/README.md
To validate the functionality of the ProRLAgent servers, we conducted proof-of-concept experiments on software engineering (SWE) tasks by integrating the server with the Verl reinforcement learning (RL) framework. Specifically, we used swe-gym along with a subset of the R2E-gym dataset, comprising a total of 800 training instances, to perform GRPO training. Our experiments were carried out on the Qwen3-4B-Instruct model and evaluated on the SWE-Bench-Verified benchmark. The results demonstrate a performance improvement, with accuracy increasing from 15.0% to 20.4%.

