19 changes: 15 additions & 4 deletions examples/experimental/swe-agent-v2/run.py
@@ -38,7 +38,7 @@ class ScriptArgs(U.ExecuteTrainConfig):
hf_checkpoint: str = "zai-org/GLM-4.7-Flash"
ref_load: str = "/root/GLM-4.7-Flash_torch_dist"
save_dir: str = "/root/GLM-4.7-Flash_agent_v2/"
- max_seq_len: int = 16384
+ max_seq_len: int = 64000
prompt_data: str = "/root/swe_train.jsonl"

# Agent settings
@@ -107,7 +107,7 @@ def execute(args: ScriptArgs):
"--rollout-batch-size 2 "
"--n-samples-per-prompt 4 "
"--rollout-temperature 0.8 "
- "--rollout-max-response-len 8192 "
+ "--rollout-max-response-len 16384 "
f"--max-seq-len {args.max_seq_len} "
"--global-batch-size 8 "
"--balance-data "
@@ -150,13 +150,24 @@ def execute(args: ScriptArgs):
)

sglang_args = (
- "--rollout-num-gpus-per-engine 1 "
"--sglang-mem-fraction-static 0.7 "
"--sglang-tool-call-parser glm47 "
"--sglang-reasoning-parser glm45 "
"--use-miles-router "
"--sglang-router-port 31000 "
- # TODO: speculative decoding has issue, need to fix later
+ # Agent tasks can run long (complex CoT + multi-step tool calls);
+ # default 1800s may not be enough for the hardest instances.
+ "--miles-router-timeout 3600 "
+ "--rollout-num-gpus-per-engine 8 "
Contributor

high

Setting --rollout-num-gpus-per-engine to 8 will set the Tensor Parallel (TP) size to 8 in the SGLang engine (see miles/backends/sglang_utils/sglang_engine.py:622). Combined with --sglang-data-parallel-size 8, this would require 64 GPUs (TP=8 * DP=8). For an 8-GPU setup intended to use Data Parallel (DP) attention, this should be set to 1 so that each of the 8 replicas uses a single GPU.

Suggested change
- "--rollout-num-gpus-per-engine 8 "
+ "--rollout-num-gpus-per-engine 1 "

Contributor

Setting the rollout engine to 8 GPUs is fine here, but won't we need EP 8 for the MoE layers? Otherwise the MoE part would run as TP 8 instead of EP 8.
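
The two comments above count GPUs per engine differently. A minimal sketch of both readings follows; the function names are hypothetical, nothing here is taken from miles or SGLang, and which reading applies depends on how the engine maps DP attention onto its TP ranks.

```python
def gpus_if_dp_replicates_tp(tp_size: int, dp_size: int) -> int:
    """First comment's reading: every DP replica is its own TP group."""
    return tp_size * dp_size


def gpus_if_dp_splits_tp(tp_size: int, dp_size: int) -> int:
    """Alternative reading (assumption): DP attention partitions the ranks
    of a single TP group, so the engine still spans tp_size GPUs."""
    if tp_size % dp_size != 0:
        raise ValueError("dp_size must divide tp_size under this reading")
    return tp_size


if __name__ == "__main__":
    print(gpus_if_dp_replicates_tp(8, 8))  # 64
    print(gpus_if_dp_splits_tp(8, 8))      # 8
```

Under the first reading the suggested change to 1 GPU per engine is needed on an 8-GPU node; under the second, 8 GPUs per engine with a DP size of 8 already fits.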

+ "--sglang-data-parallel-size 8 "
Contributor

high

The SGLang engine implementation specifically looks for the sglang_dp_size attribute (see miles/backends/sglang_utils/sglang_engine.py:623). Using --sglang-data-parallel-size will likely result in the setting being ignored. Use --sglang-dp-size instead.

Suggested change
- "--sglang-data-parallel-size 8 "
+ "--sglang-dp-size 8 "

Contributor

This sglang-dp-size is correct

+ "--sglang-enable-dp-attention "
+ "--sglang-speculative-algorithm EAGLE "
+ "--sglang-speculative-num-steps 2 "
+ "--sglang-speculative-eagle-topk 1 "
+ "--sglang-speculative-num-draft-tokens 3 "
Comment on lines +164 to +167
Contributor

medium

These speculative decoding parameters are currently active, but the pull request description states they should be commented out for future use. Furthermore, the EAGLE algorithm requires a draft model to be specified (e.g., via --sglang-speculative-draft-model), otherwise the engine will fail to initialize. Note that parameters like eagle-topk should ideally be retrieved from configuration rather than being hardcoded.

Suggested change
- "--sglang-speculative-algorithm EAGLE "
- "--sglang-speculative-num-steps 2 "
- "--sglang-speculative-eagle-topk 1 "
- "--sglang-speculative-num-draft-tokens 3 "
+ # "--sglang-speculative-algorithm EAGLE "
+ # "--sglang-speculative-num-steps 2 "
+ # "--sglang-speculative-eagle-topk 1 "
+ # "--sglang-speculative-num-draft-tokens 3 "
References
  1. Model parameters, such as index_topk, should be retrieved from the model configuration rather than being hardcoded.

Contributor

Spec decoding by default?
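
One way to keep speculative decoding opt-in, rather than on by default or commented out, is to gate the flags behind a config field. This is a sketch under assumptions: `build_sglang_args` and `enable_spec_decoding` are hypothetical names and not part of run.py; the flag strings are copied from the diff above.

```python
# Flags copied from the diff; grouping them in a constant is an assumption.
SPEC_DECODING_ARGS = (
    "--sglang-speculative-algorithm EAGLE "
    "--sglang-speculative-num-steps 2 "
    "--sglang-speculative-eagle-topk 1 "
    "--sglang-speculative-num-draft-tokens 3 "
)


def build_sglang_args(base_args: str, enable_spec_decoding: bool = False) -> str:
    """Append the speculative decoding flags only when explicitly enabled."""
    if enable_spec_decoding:
        return base_args + SPEC_DECODING_ARGS
    return base_args


if __name__ == "__main__":
    base = "--sglang-dp-size 8 --sglang-enable-dp-attention "
    print(build_sglang_args(base))
    print(build_sglang_args(base, enable_spec_decoding=True))
```

With the default of `False`, the launch command is unchanged unless the user opts in, which also avoids leaving commented-out flags in the script.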

+ # "--sglang-expert-parallel-size 8 "
+ # "--sglang-moe-dense-tp-size 1 "
+ # "--sglang-enable-dp-lm-head "
Comment on lines +168 to +170
Contributor

medium

Avoid including commented-out code in the repository. These lines should be removed if they are not currently needed.

)

agent_args = (
2 changes: 1 addition & 1 deletion miles/rollout/session/session_server.py
@@ -29,7 +29,7 @@ def __init__(self, args, backend_url: str):
self.backend_url = backend_url
self.app = FastAPI()

- timeout = getattr(args, "miles_router_timeout", 600.0)
+ timeout = getattr(args, "miles_router_timeout", 1800.0)
self.client = httpx.AsyncClient(
limits=httpx.Limits(max_connections=1024),
timeout=httpx.Timeout(timeout),
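
The timeout lookup in session_server.py relies on `getattr` with a fallback. A small sketch of how that resolves against the `--miles-router-timeout 3600` flag set in run.py; `resolve_router_timeout` and the constant name are hypothetical, and `argparse.Namespace` stands in for the real args object.

```python
from argparse import Namespace

# Fallback used when the args object has no miles_router_timeout attribute.
DEFAULT_ROUTER_TIMEOUT_S = 1800.0


def resolve_router_timeout(args) -> float:
    """Return the configured router timeout, or the default if it is absent."""
    return getattr(args, "miles_router_timeout", DEFAULT_ROUTER_TIMEOUT_S)


if __name__ == "__main__":
    print(resolve_router_timeout(Namespace()))                           # 1800.0
    print(resolve_router_timeout(Namespace(miles_router_timeout=3600)))  # 3600
```

Note that `getattr` only falls back when the attribute is missing; an attribute that exists but is set to `None` would be returned as `None`.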