This example connects slime with strands-sglang, an SGLang extension for the agentic scaffolding framework strands, for agentic RL training.
| Component | Agent Loop | TITO (token-in/token-out) Support |
|---|---|---|
| Strands-Agents | ✅ Handles agent loop, custom hooks | ❌ text-based, requires retokenization |
| SGLang | ❌ Single generation only | ✅ Native input_ids in/out |
| strands-sglang | ✅ Via Strands | ✅ Via SGLang's native API |
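
The "requires retokenization" entry in the table matters because decoding sampled tokens to text and encoding that text again does not always return the same token IDs. The sketch below shows the effect with a made-up three-entry vocabulary and a greedy longest-match tokenizer; both are assumptions for illustration and do not correspond to any real tokenizer or to the strands-sglang API.

```python
# Toy illustration of retokenization drift. The vocabulary and the greedy
# tokenizer are invented for this example; they are not a real tokenizer.
VOCAB = {"a": 0, "b": 1, "ab": 2}
ID_TO_TEXT = {i: t for t, i in VOCAB.items()}


def decode(token_ids):
    """Concatenate the text piece of each token ID."""
    return "".join(ID_TO_TEXT[i] for i in token_ids)


def encode(text):
    """Greedy longest-match tokenization over the toy vocabulary."""
    ids, pos = [], 0
    pieces = sorted(VOCAB, key=len, reverse=True)  # try longer pieces first
    while pos < len(text):
        for piece in pieces:
            if text.startswith(piece, pos):
                ids.append(VOCAB[piece])
                pos += len(piece)
                break
        else:
            raise ValueError(f"cannot tokenize {text[pos:]!r}")
    return ids


sampled_ids = [0, 1]            # the model sampled "a" and then "b"
text = decode(sampled_ids)      # "ab"
retokenized_ids = encode(text)  # [2]: same text, different token sequence
```

If training used `retokenized_ids`, the log-probabilities would be computed for a sequence the policy never actually sampled, which is the mismatch a token-in/token-out pipeline avoids.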
strands-sglang bridges the gap by extending strands with SGLang's native `/generate` endpoint:
- Captures exact token IDs during generation (no retokenization drift)
- Automatically tracks `loss_mask` via `token_manager`
- Provides `ToolLimiter` for clean trajectory truncation
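
To make the role of a loss mask concrete, here is a minimal sketch of the usual convention in multi-turn agentic RL: tokens the policy generated get mask 1, while prompt and tool-output tokens get mask 0 so they do not contribute to the loss. The `Segment` class and `build_training_sequence` function are hypothetical names for this illustration; they are not the strands-sglang `token_manager` API, and the token IDs are arbitrary.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """Hypothetical container for one contiguous span of a trajectory."""
    token_ids: List[int]
    from_model: bool  # True if the policy generated these tokens


def build_training_sequence(segments: List[Segment]) -> Tuple[List[int], List[int]]:
    """Flatten segments into input_ids plus a per-token loss mask."""
    input_ids: List[int] = []
    loss_mask: List[int] = []
    for seg in segments:
        input_ids.extend(seg.token_ids)
        # 1 for model-generated tokens, 0 for prompt and tool output.
        loss_mask.extend([1 if seg.from_model else 0] * len(seg.token_ids))
    return input_ids, loss_mask


trajectory = [
    Segment([101, 102, 103], from_model=False),  # prompt
    Segment([7, 8], from_model=True),            # model emits a tool call
    Segment([201, 202, 203], from_model=False),  # tool result fed back in
    Segment([9], from_model=True),               # model's final answer
]
input_ids, loss_mask = build_training_sequence(trajectory)
```

Because the mask is built alongside the token IDs, the two lists always have the same length, which is what a trainer needs to align per-token losses with the sampled sequence.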
- Pull the `slimerl/slime:latest` image and enter it
- Go to slime folder: `cd /root/slime`
- Install slime: `pip install -e . --no-deps`
- Go to the example folder: `cd /root/slime/examples/strands_sglang`
- Install other dependencies: `pip install -r requirements.txt`

NOTE: `strands-sglang` is under rapid development, so we recommend using the GitHub repo version: `strands-sglang @ git+https://github.com/horizon-rl/strands-sglang.git`

NOTE: We use camel-ai's subprocess code interpreter for Python code execution. This is NOT good practice; it is used here only for the convenience of this example.
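
For readers unfamiliar with the approach, the following is a rough sketch of what subprocess-based code execution generally looks like, written with only the Python standard library. It is not camel-ai's implementation; `run_python` and its `timeout_s` parameter are hypothetical names. It also shows why the approach is risky: the child process runs model-generated code with the same user, filesystem, and network access as the caller, and a timeout is the only guard.

```python
import subprocess
import sys


def run_python(code: str, timeout_s: float = 5.0) -> str:
    """Run `code` in a fresh interpreter; return its stdout or an error string."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed if it runs too long
        )
    except subprocess.TimeoutExpired:
        return f"Error: execution exceeded {timeout_s} seconds"
    if proc.returncode != 0:
        return f"Error: {proc.stderr.strip()}"
    return proc.stdout.strip()


result = run_python("print(sum(range(10)))")
```

A production setup would more typically run tool code inside an isolated sandbox or container rather than a bare subprocess.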

```bash
# hf checkpoint
huggingface-cli download Qwen/Qwen3-8B --local-dir /root/models/Qwen/Qwen3-8B

# mcore checkpoint
cd /root/slime
source scripts/models/qwen3-8B.sh
PYTHONPATH=/root/Megatron-LM python tools/convert_hf_to_torch_dist.py \
    ${MODEL_ARGS[@]} \
    --hf-checkpoint /root/models/Qwen/Qwen3-8B \
    --save /root/models/Qwen/Qwen3-8B_torch_dist
```

Following Retool, we use dapo-math-17k as training data:

```python
from datasets import load_dataset

ds = load_dataset("zhuzilin/dapo-math-17k", split="train")
ds.to_json("/root/data/dapo-math-17k.jsonl", orient="records", lines=True)
```

and aime-2024 as eval data:

```python
from datasets import load_dataset

ds = load_dataset("zhuzilin/aime-2024", split="train")
ds.to_json("/root/data/aime-2024.jsonl", orient="records", lines=True)
```

To launch training:

```bash
cd /root/slime
export WANDB_KEY=$your_wandb_key
bash examples/strands_sglang/strands_qwen3_8b.sh
```
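
Before launching the script, it can be worth confirming that the exported JSONL files parse cleanly. The helper below is a small sketch using only the standard library; `summarize_jsonl` is a hypothetical name, and it makes no assumption about which fields the datasets contain.

```python
import json
from pathlib import Path


def summarize_jsonl(path):
    """Return (record count, sorted field names of the first record)."""
    count, fields = 0, []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)  # raises on malformed JSON
            if count == 0:
                fields = sorted(record)
            count += 1
    return count, fields
```

For example, `summarize_jsonl("/root/data/dapo-math-17k.jsonl")` and `summarize_jsonl("/root/data/aime-2024.jsonl")` report how many records were written and which keys each record carries.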