[feat] update swe-agent runtime params for long-context DP attention #959

Open

guapisolo wants to merge 1 commit into main from feat/update-swe-runtime-params

Collaborator

guapisolo commented Apr 7, 2026

- Increase default session server timeout from 600s to 1800s
- Increase max_seq_len to 64000 and rollout-max-response-len to 16384
- Configure 8-GPU DP attention (data-parallel-size 8, enable-dp-attention)
- Explicitly set --miles-router-timeout 3600 for long agent tasks
- Add commented-out speculative decoding and MoE params for future use
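As a rough sketch of how the settings listed above might be assembled into the rollout argument string in run.py: the flag spellings `--miles-router-timeout`, `--rollout-num-gpus-per-engine` and `--sglang-data-parallel-size` are quoted from this PR's diff, while `--rollout-max-response-len` and `--sglang-enable-dp-attention` are assumed from the description, and `build_rollout_args` is a hypothetical helper rather than code from the repository.

```python
def build_rollout_args(
    num_gpus: int = 8,
    router_timeout_s: int = 3600,
    max_response_len: int = 16384,
) -> str:
    """Join rollout flags into one space-terminated string (illustrative only)."""
    parts = [
        # Spelling assumed from the PR description, not shown in the diff.
        f"--rollout-max-response-len {max_response_len}",
        # Agent tasks can run long, so the router timeout is set explicitly.
        f"--miles-router-timeout {router_timeout_s}",
        f"--rollout-num-gpus-per-engine {num_gpus}",
        f"--sglang-data-parallel-size {num_gpus}",
        # Spelling assumed from the PR description, not shown in the diff.
        "--sglang-enable-dp-attention",
    ]
    return " ".join(parts) + " "


if __name__ == "__main__":
    print(build_rollout_args())
```

Keeping the GPU count in one parameter makes it harder for the engine size and the DP size to drift apart, which is the mismatch the review comments below are concerned with.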

Made-with: Cursor


feat: update swe-agent runtime params for long-context DP attention (b0de4a6)

guapisolo requested review from fzyzcjy, maocheng23 and yueming-yuan as code owners

April 7, 2026 23:06

gemini-code-assist bot reviewed

Contributor

gemini-code-assist bot left a comment

Code Review

This pull request updates the configuration for the swe-agent-v2 experiment, including increasing the maximum sequence length, adjusting rollout response lengths, and updating the SGLang engine arguments to support data-parallel attention and speculative decoding. Feedback was provided regarding incorrect GPU and data-parallel configuration flags, the need to comment out incomplete speculative decoding parameters to prevent initialization failures, and a recommendation to remove unused commented-out code.

examples/experimental/swe-agent-v2/run.py

+                      # Agent tasks can run long (complex CoT + multi-step tool calls);
+                      # default 1800s may not be enough for the hardest instances.
+                      "--miles-router-timeout 3600 "
+                      "--rollout-num-gpus-per-engine 8 "

Contributor

gemini-code-assist bot Apr 7, 2026

Setting --rollout-num-gpus-per-engine to 8 will set the Tensor Parallel (TP) size to 8 in the SGLang engine (see miles/backends/sglang_utils/sglang_engine.py:622). Combined with --sglang-data-parallel-size 8, this would require 64 GPUs (TP=8 * DP=8). For an 8-GPU setup intended to use Data Parallel (DP) attention, this should be set to 1 so that each of the 8 replicas uses a single GPU.

Suggested change

      
-                     "--rollout-num-gpus-per-engine 8 "
+                     "--rollout-num-gpus-per-engine 1 "

Contributor

maocheng23 Apr 8, 2026

Setting the rollout engine to 8 GPUs is fine here, but we will need EP 8 for MoE, right? Otherwise the MoE part would run as TP 8 instead of EP 8.
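The two comments in this thread differ on how the DP size composes with the TP size. A minimal sketch of both readings follows, assuming (this PR does not confirm it) that with DP attention enabled the DP groups are carved out of the engine's TP ranks instead of multiplying them. The function name and the divisibility check are illustrative, not taken from Miles or SGLang.

```python
def gpus_per_engine(tp_size: int, dp_size: int, dp_attention: bool) -> int:
    """Return the GPU count one engine would need under each reading."""
    if dp_attention:
        # Assumed behavior: DP attention groups reuse the TP ranks,
        # so the engine still occupies tp_size GPUs in total.
        if tp_size % dp_size != 0:
            raise ValueError("tp_size must be divisible by dp_size")
        return tp_size
    # Plain data parallelism: each replica needs its own TP group.
    return tp_size * dp_size


if __name__ == "__main__":
    print(gpus_per_engine(8, 8, dp_attention=True))
    print(gpus_per_engine(8, 8, dp_attention=False))
```

Under the first reading the configuration in the diff fits on 8 GPUs; under the second it would need 64, which is the figure the bot's comment arrives at.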

examples/experimental/swe-agent-v2/run.py

+                      # default 1800s may not be enough for the hardest instances.
+                      "--miles-router-timeout 3600 "
+                      "--rollout-num-gpus-per-engine 8 "
+                      "--sglang-data-parallel-size 8 "

Contributor

gemini-code-assist bot Apr 7, 2026

The SGLang engine implementation specifically looks for the sglang_dp_size attribute (see miles/backends/sglang_utils/sglang_engine.py:623). Using --sglang-data-parallel-size will likely result in the setting being ignored. Use --sglang-dp-size instead.

Suggested change

      
-                     "--sglang-data-parallel-size 8 "
+                     "--sglang-dp-size 8 "

Contributor

maocheng23 Apr 8, 2026

This sglang-dp-size is correct
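Whether one or both spellings are accepted depends on how the flag is registered. As a generic illustration only (this is not the Miles or SGLang source), argparse allows a single option to carry several spellings that all write to one attribute. The parser below is hypothetical; the attribute name `sglang_dp_size` is the one cited in the bot's comment.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Build a toy parser where two spellings share one destination."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--sglang-data-parallel-size",
        "--sglang-dp-size",
        dest="sglang_dp_size",  # both spellings populate this attribute
        type=int,
        default=1,
    )
    return parser


if __name__ == "__main__":
    long_form = make_parser().parse_args(["--sglang-data-parallel-size", "8"])
    short_form = make_parser().parse_args(["--sglang-dp-size", "8"])
    print(long_form.sglang_dp_size, short_form.sglang_dp_size)
```

If the real parser is registered this way, either spelling reaches the engine; checking the argument definition would settle the question raised in this thread.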

examples/experimental/swe-agent-v2/run.py

Comment on lines +164 to +167

+                      "--sglang-speculative-algorithm EAGLE "
+                      "--sglang-speculative-num-steps 2 "
+                      "--sglang-speculative-eagle-topk 1 "
+                      "--sglang-speculative-num-draft-tokens 3 "

Contributor

gemini-code-assist bot Apr 7, 2026

These speculative decoding parameters are currently active, but the pull request description states they should be commented out for future use. Furthermore, the EAGLE algorithm requires a draft model to be specified (e.g., via --sglang-speculative-draft-model), otherwise the engine will fail to initialize. Note that parameters like eagle-topk should ideally be retrieved from configuration rather than being hardcoded.

Suggested change

      
-                     "--sglang-speculative-algorithm EAGLE "
-                     "--sglang-speculative-num-steps 2 "
-                     "--sglang-speculative-eagle-topk 1 "
-                     "--sglang-speculative-num-draft-tokens 3 "
+                     # "--sglang-speculative-algorithm EAGLE "
+                     # "--sglang-speculative-num-steps 2 "
+                     # "--sglang-speculative-eagle-topk 1 "
+                     # "--sglang-speculative-num-draft-tokens 3 "

References

Model parameters, such as index_topk, should be retrieved from the model configuration rather than being hardcoded.

Contributor

maocheng23 Apr 8, 2026

Spec decoding by default?
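One way to address both comments in this thread is to make speculative decoding opt-in instead of commenting the lines out. The sketch below is hypothetical: `speculative_args` is not a function in the repository, the draft-model flag spelling is copied from the bot's example and may not match the real option, and the requirement for a draft model follows the bot's claim rather than anything verified here.

```python
from typing import Optional


def speculative_args(enabled: bool, draft_model: Optional[str] = None) -> str:
    """Return speculative decoding flags, or an empty string when disabled."""
    if not enabled:
        return ""
    if not draft_model:
        # Fail early instead of letting the engine fail at startup.
        raise ValueError("speculative decoding requires a draft model")
    return (
        "--sglang-speculative-algorithm EAGLE "
        # Flag spelling taken from the review comment; treat as an assumption.
        f"--sglang-speculative-draft-model {draft_model} "
        "--sglang-speculative-num-steps 2 "
        "--sglang-speculative-eagle-topk 1 "
        "--sglang-speculative-num-draft-tokens 3 "
    )


if __name__ == "__main__":
    # Disabled by default: contributes nothing to the argument string.
    print(repr(speculative_args(enabled=False)))
```

With this shape the default run stays free of speculative decoding, and enabling it without a draft model raises immediately.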

examples/experimental/swe-agent-v2/run.py

Comment on lines +168 to +170

+                      # "--sglang-expert-parallel-size 8 "
+                      # "--sglang-moe-dense-tp-size 1 "
+                      # "--sglang-enable-dp-lm-head "

Contributor

gemini-code-assist bot Apr 7, 2026

Avoid including commented-out code in the repository. These lines should be removed if they are not currently needed.

maocheng23 requested changes

Reviewers

maocheng23: requested changes

fzyzcjy (code owner): awaiting requested review

yueming-yuan (code owner): awaiting requested review

+1 more reviewer

gemini-code-assist[bot]: left review comments

Requested changes must be addressed to merge this pull request.
