slime x Strands-SGLang

This example connects slime with strands-sglang (an SGLang extension for the Strands agentic scaffolding) for agentic RL training.

Why `strands-sglang`?

| Component | Agent Loop | TITO Support |
| --- | --- | --- |
| Strands-Agents | ✅ Handles agent loop, custom hooks | ❌ Text-based, requires retokenization |
| SGLang | ❌ Single generation only | ✅ Native `input_ids` in/out |
| strands-sglang | ✅ Via Strands | ✅ Via SGLang's native API |
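To make the "requires retokenization" entry concrete, here is a minimal sketch of how a text round trip can change token IDs. The three-entry vocabulary and the greedy `encode` function are hypothetical toys, not the tokenizer of any real model or library; they only show why decoding sampled tokens to text and encoding that text again is not guaranteed to return the original IDs.

```python
# Toy vocabulary (hypothetical): "ab" exists as a single token AND as "a" + "b".
VOCAB = {"a": 0, "b": 1, "ab": 2}
ID_TO_TEXT = {token_id: text for text, token_id in VOCAB.items()}


def decode(token_ids):
    """Concatenate the text of each token ID."""
    return "".join(ID_TO_TEXT[token_id] for token_id in token_ids)


def encode(text):
    """Greedy longest-match tokenization over the toy vocabulary."""
    pieces = sorted(VOCAB, key=len, reverse=True)  # try longer pieces first
    token_ids = []
    pos = 0
    while pos < len(text):
        for piece in pieces:
            if text.startswith(piece, pos):
                token_ids.append(VOCAB[piece])
                pos += len(piece)
                break
        else:
            raise ValueError(f"cannot tokenize {text[pos:]!r}")
    return token_ids


sampled_ids = [0, 1]            # the policy sampled "a", then "b"
text = decode(sampled_ids)      # a text-based agent loop only keeps "ab"
retokenized_ids = encode(text)  # encoding the text again yields [2]

print(sampled_ids, retokenized_ids)  # [0, 1] [2]
```

If training used `retokenized_ids`, the log-probabilities would be computed for a token sequence the policy never sampled. Passing token IDs in and out of the inference engine avoids this mismatch.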

strands-sglang bridges the gap by extending strands with SGLang's native /generate endpoint:

- Captures exact token IDs during generation (no retokenization drift)
- Automatically tracks `loss_mask` via `token_manager`
- Provides `ToolLimiter` for clean trajectory truncation
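As an illustration of what a loss mask over an agentic trajectory looks like, the sketch below marks model-generated tokens with 1 and everything else (prompt, tool results) with 0. The `Segment` class, the `build_loss_mask` function, and the token IDs are hypothetical and written for this example only; they are not the `token_manager` API of strands-sglang.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A contiguous run of tokens in a trajectory (hypothetical helper)."""
    token_ids: list
    from_model: bool  # True if the policy sampled these tokens


def build_loss_mask(segments):
    """Flatten segments into input_ids plus a parallel 0/1 loss mask."""
    input_ids, loss_mask = [], []
    for segment in segments:
        input_ids.extend(segment.token_ids)
        flag = 1 if segment.from_model else 0
        loss_mask.extend([flag] * len(segment.token_ids))
    return input_ids, loss_mask


# Made-up token IDs for a two-turn tool-use trajectory.
trajectory = [
    Segment([101, 102, 103], from_model=False),  # prompt
    Segment([7, 8], from_model=True),            # model emits a tool call
    Segment([55, 56, 57], from_model=False),     # tool result fed back in
    Segment([9], from_model=True),               # model's final answer
]

input_ids, loss_mask = build_loss_mask(trajectory)
print(loss_mask)  # [0, 0, 0, 1, 1, 0, 0, 0, 1]
```

Only positions where the mask is 1 should contribute to the policy loss, since the prompt and tool outputs were not produced by the model being trained.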

Install Dependencies

1. Pull the `slimerl/slime:latest` image and enter it
2. Go to the slime folder: `cd /root/slime`
3. Install slime: `pip install -e . --no-deps`
4. Go to the example folder: `cd /root/slime/examples/strands_sglang`
5. Install other dependencies: `pip install -r requirements.txt`

NOTE: strands-sglang is under rapid development, so we recommend using the GitHub repo version: strands-sglang @ git+https://github.com/horizon-rl/strands-sglang.git

NOTE: We use camel-ai's subprocess code interpreter to execute Python code. This is NOT good practice; it is used here only for the convenience of this example.
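After the install steps above, a quick import check can confirm that the environment is ready before launching a long run. This is a minimal sketch using only the standard library; the import names in `REQUIRED` (`slime`, `strands_sglang`) are assumptions about how the packages are exposed, so adjust them if your installation differs.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found."""
    return [
        name
        for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Assumed import names; not confirmed by this README.
REQUIRED = ["slime", "strands_sglang"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```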

Prepare Model

```bash
# hf checkpoint
huggingface-cli download Qwen/Qwen3-8B --local-dir /root/models/Qwen/Qwen3-8B

# mcore checkpoint
cd /root/slime
source scripts/models/qwen3-8B.sh
PYTHONPATH=/root/Megatron-LM python tools/convert_hf_to_torch_dist.py \
    ${MODEL_ARGS[@]} \
    --hf-checkpoint /root/models/Qwen/Qwen3-8B \
    --save /root/models/Qwen/Qwen3-8B_torch_dist
```
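Before running the conversion, it can help to confirm that the Hugging Face download finished. The sketch below is a hypothetical helper, not part of slime: it assumes the checkpoint directory should contain a `config.json` and at least one `*.safetensors` weight file, which is the usual layout for Hugging Face model repositories but is not verified here for every model.

```python
from pathlib import Path


def check_hf_checkpoint(path):
    """Return a list of problems found in a local Hugging Face checkpoint."""
    root = Path(path)
    if not root.is_dir():
        return [f"{root} is not a directory"]

    problems = []
    if not (root / "config.json").is_file():
        problems.append("config.json is missing")
    # Assumption: weights are stored as safetensors shards.
    if not any(root.glob("*.safetensors")):
        problems.append("no *.safetensors weight files found")
    return problems


if __name__ == "__main__":
    for problem in check_hf_checkpoint("/root/models/Qwen/Qwen3-8B"):
        print("Problem:", problem)
```

An empty list means the basic files are present; it does not prove that every shard downloaded completely.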

Prepare Dataset

Following Retool, we use dapo-math-17k as training data:

```python
from datasets import load_dataset
ds = load_dataset("zhuzilin/dapo-math-17k", split="train")
ds.to_json("/root/data/dapo-math-17k.jsonl", orient="records", lines=True)
```

and aime-2024 as eval data:

```python
from datasets import load_dataset
ds = load_dataset("zhuzilin/aime-2024", split="train")
ds.to_json("/root/data/aime-2024.jsonl", orient="records", lines=True)
```
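Once both files are exported, a small check can confirm that each one is valid JSONL and show which fields the records carry. This is a minimal sketch with a hypothetical helper, `summarize_jsonl`; it makes no assumption about the field names in either dataset and simply reports what it finds.

```python
import json


def summarize_jsonl(path):
    """Return (record_count, sorted_field_names) for a JSONL file."""
    count = 0
    keys = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)  # raises ValueError on malformed JSON
            keys.update(record)
            count += 1
    return count, sorted(keys)


if __name__ == "__main__":
    for path in ("/root/data/dapo-math-17k.jsonl", "/root/data/aime-2024.jsonl"):
        count, keys = summarize_jsonl(path)
        print(f"{path}: {count} records, fields: {keys}")
```

The reported field names are worth comparing against the data-related arguments in `strands_qwen3_8b.sh` before starting a run.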

Run Training

```bash
cd /root/slime
export WANDB_KEY=$your_wandb_key
bash examples/strands_sglang/strands_qwen3_8b.sh
```
