[feat] update swe-agent runtime params for long-context DP attention #959
- Increase default session server timeout from 600s to 1800s
- Increase max_seq_len to 64000 and rollout-max-response-len to 16384
- Configure 8-GPU DP attention (data-parallel-size 8, enable-dp-attention)
- Explicitly set --miles-router-timeout 3600 for long agent tasks
- Add commented-out speculative decoding and MoE params for future use

Made-with: Cursor
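As a rough sanity check on the new length settings, the sketch below computes how many tokens remain for the prompt and tool-call history once the full response budget is reserved. It assumes that `max_seq_len` bounds prompt plus response tokens together, which the description does not state; `prompt_budget` is a hypothetical helper, not a function in this repository.

```python
# Values taken from the PR description.
MAX_SEQ_LEN = 64000
ROLLOUT_MAX_RESPONSE_LEN = 16384


def prompt_budget(max_seq_len: int, max_response_len: int) -> int:
    """Tokens left for the prompt if the whole response budget is used.

    Assumption: max_seq_len covers prompt and response together.
    """
    if max_response_len > max_seq_len:
        raise ValueError("response budget exceeds the maximum sequence length")
    return max_seq_len - max_response_len


if __name__ == "__main__":
    print(prompt_budget(MAX_SEQ_LEN, ROLLOUT_MAX_RESPONSE_LEN))
```

Under that assumption, roughly three quarters of the context window stays available for the agent's accumulated prompt.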
Code Review
This pull request updates the configuration for the swe-agent-v2 experiment, including increasing the maximum sequence length, adjusting rollout response lengths, and updating the SGLang engine arguments to support data-parallel attention and speculative decoding. Feedback was provided regarding incorrect GPU and data-parallel configuration flags, the need to comment out incomplete speculative decoding parameters to prevent initialization failures, and a recommendation to remove unused commented-out code.
| # Agent tasks can run long (complex CoT + multi-step tool calls); | ||
| # default 1800s may not be enough for the hardest instances. | ||
| "--miles-router-timeout 3600 " | ||
| "--rollout-num-gpus-per-engine 8 " |
Setting --rollout-num-gpus-per-engine to 8 will set the Tensor Parallel (TP) size to 8 in the SGLang engine (see miles/backends/sglang_utils/sglang_engine.py:622). Combined with --sglang-data-parallel-size 8, this would require 64 GPUs (TP=8 * DP=8). For an 8-GPU setup intended to use Data Parallel (DP) attention, this should be set to 1 so that each of the 8 replicas uses a single GPU.
| "--rollout-num-gpus-per-engine 8 " | |
| "--rollout-num-gpus-per-engine 1 " |
Setting the rollout engine to 8 GPUs is fine here, but we will need EP 8 for MoE, right? Otherwise the MoE part would run as TP 8 instead of EP 8.
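The two comments in this thread differ on how the TP and DP sizes combine. The sketch below spells out both readings. It assumes, without having checked the miles or SGLang source, that with DP attention enabled the DP groups subdivide the TP ranks of a single engine, whereas without it each DP replica is a full TP group. `gpus_required` is a hypothetical helper for illustration only.

```python
def gpus_required(tp_size: int, dp_size: int, dp_attention: bool) -> int:
    """GPU count for one engine under the two readings in this thread.

    Assumption: with DP attention, the dp_size groups partition the
    tp_size ranks, so the engine still occupies tp_size GPUs.
    Without it, every DP replica is a separate TP group.
    """
    if dp_attention:
        if tp_size % dp_size != 0:
            raise ValueError("dp_size must divide tp_size under DP attention")
        return tp_size
    return tp_size * dp_size


if __name__ == "__main__":
    # The reading in the automated review: independent replicas.
    print(gpus_required(tp_size=8, dp_size=8, dp_attention=False))
    # The reading in the reply: DP attention inside one 8-GPU engine.
    print(gpus_required(tp_size=8, dp_size=8, dp_attention=True))
```

Under the second reading, keeping `--rollout-num-gpus-per-engine 8` is consistent with an 8-GPU node, and the open question becomes how the MoE layers are sharded across those 8 ranks.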
| # default 1800s may not be enough for the hardest instances. | ||
| "--miles-router-timeout 3600 " | ||
| "--rollout-num-gpus-per-engine 8 " | ||
| "--sglang-data-parallel-size 8 " |
The SGLang engine implementation specifically looks for the sglang_dp_size attribute (see miles/backends/sglang_utils/sglang_engine.py:623). Using --sglang-data-parallel-size will likely result in the setting being ignored. Use --sglang-dp-size instead.
| "--sglang-data-parallel-size 8 " | |
| "--sglang-dp-size 8 " |
This sglang-dp-size is correct
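Why the flag spelling matters can be shown with plain `argparse`, which derives the attribute name from the long option string. This is a sketch under the assumption that only `--sglang-dp-size` is registered and that unknown flags are tolerated rather than rejected; how miles actually registers and parses these arguments is not shown in this thread.

```python
import argparse

# Hypothetical parser that registers only the short spelling.
parser = argparse.ArgumentParser()
parser.add_argument("--sglang-dp-size", type=int, default=1)

# The long spelling is not recognized, so the default is kept and the
# unrecognized tokens are returned separately.
args, unknown = parser.parse_known_args("--sglang-data-parallel-size 8".split())
print(args.sglang_dp_size, unknown)

# The registered spelling populates the attribute the engine would read.
args, unknown = parser.parse_known_args("--sglang-dp-size 8".split())
print(args.sglang_dp_size, unknown)
```

If the real parser uses strict `parse_args` instead, the unrecognized spelling would fail loudly rather than being silently ignored.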
| "--sglang-speculative-algorithm EAGLE " | ||
| "--sglang-speculative-num-steps 2 " | ||
| "--sglang-speculative-eagle-topk 1 " | ||
| "--sglang-speculative-num-draft-tokens 3 " |
These speculative decoding parameters are currently active, but the pull request description states they should be commented out for future use. Furthermore, the EAGLE algorithm requires a draft model to be specified (e.g., via --sglang-speculative-draft-model), otherwise the engine will fail to initialize. Note that parameters like eagle-topk should ideally be retrieved from configuration rather than being hardcoded.
| "--sglang-speculative-algorithm EAGLE " | |
| "--sglang-speculative-num-steps 2 " | |
| "--sglang-speculative-eagle-topk 1 " | |
| "--sglang-speculative-num-draft-tokens 3 " | |
| # "--sglang-speculative-algorithm EAGLE " | |
| # "--sglang-speculative-num-steps 2 " | |
| # "--sglang-speculative-eagle-topk 1 " | |
| # "--sglang-speculative-num-draft-tokens 3 " |
References
- Model parameters, such as index_topk, should be retrieved from the model configuration rather than being hardcoded.
Spec decoding by default?
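One way to address this without leaving commented-out lines in the config is to make the speculative flags opt-in. The following is only a sketch: `build_rollout_args` and `enable_speculative` are hypothetical names, and the flag strings are copied from the diff above rather than checked against the SGLang CLI.

```python
# Speculative decoding flags as they appear in the diff.
SPECULATIVE_ARGS = (
    "--sglang-speculative-algorithm EAGLE "
    "--sglang-speculative-num-steps 2 "
    "--sglang-speculative-eagle-topk 1 "
    "--sglang-speculative-num-draft-tokens 3 "
)


def build_rollout_args(enable_speculative: bool = False) -> str:
    """Assemble the rollout argument string; speculative decoding is off by default."""
    args = (
        "--miles-router-timeout 3600 "
        "--rollout-num-gpus-per-engine 8 "
    )
    if enable_speculative:
        args += SPECULATIVE_ARGS
    return args


if __name__ == "__main__":
    print(build_rollout_args())
    print(build_rollout_args(enable_speculative=True))
```

With this shape the default run matches the PR description (no speculative decoding), and enabling it later is a one-argument change.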
| # "--sglang-expert-parallel-size 8 " | ||
| # "--sglang-moe-dense-tp-size 1 " | ||
| # "--sglang-enable-dp-lm-head " |