dsr1-fp4-b200-dynamo-sglang-mtp: 8k1k 6-variant MTP sweep on local split recipes #1688

Open
Ankur-singh wants to merge 5 commits into main from dsr1-fp4-b200-8k1k-mtp

@Ankur-singh Ankur-singh commented Jun 8, 2026

Collaborator

Restructure the DeepSeek-R1 FP4 B200 dynamo-sglang MTP disagg sweep to an 8k1k-only, 6-variant configuration backed by local split recipes (one flat recipe YAML per topology).

  • 3 low-latency (1p5d / 1p3d / 1p1d, TP4 prefill / TP8 decode) + 3 MTP2 high-throughput (2p1d / 3p1d / 5p1d, DEP4 prefill / DEP8 decode) topologies
  • Bump container image to lmsysorg/sglang:v0.5.12.post1
  • Add the dsr1/fp4 recipe-copy path to launch_b200-dgxc.sh
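
For a rough sense of the footprint of each topology, the sketch below totals GPUs per variant. It assumes that every prefill worker occupies 4 GPUs (TP4 or DEP4) and every decode worker 8 GPUs (TP8 or DEP8); `gpu_count` is a hypothetical helper, not something in this PR.

```bash
# Hypothetical helper: total GPUs for a "<P>p<D>d" topology.
# Assumption: one prefill worker = $2 GPUs, one decode worker = $3 GPUs.
gpu_count() {
  p="${1%%p*}"   # number of prefill workers (text before the "p")
  d="${1#*p}"    # drop everything through "p", leaving e.g. "5d"
  d="${d%d}"     # drop the trailing "d"
  echo $(( p * $2 + d * $3 ))
}

# The six 8k1k variants named in the description.
for topo in 1p5d 1p3d 1p1d 2p1d 3p1d 5p1d; do
  echo "$topo: $(gpu_count "$topo" 4 8) GPUs"
done
```

Under that assumption the largest variant is 1p5d at 44 GPUs, which is worth keeping in mind for multi-node scheduling.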

Note

Medium Risk
Large benchmark/topology and container-version changes affect multi-node GPU job scheduling and runtime behavior, though scope is limited to the dsr1-fp4-b200-dynamo-sglang-mtp config and launch plumbing.

Overview
Moves the dsr1-fp4-b200-dynamo-sglang-mtp sweep off srt-slurm zip overrides onto local flat recipe YAMLs under benchmarks/multi_node/srt-slurm-recipes/sglang/dsr1/b200-fp4/, and points nvidia-master.yaml at those paths for both 1k1k (four MTP variants, same topologies as before) and 8k1k.

For 8k1k, the search space is reorganized into six explicit disagg topologies: three low-latency (1p5d / 1p3d / 1p1d with TP4 prefill, TP8 decode, ep: 1) and three MTP2 throughput Pareto points (2p1d / 3p1d / 5p1d with DEP4 prefill / DEP8 decode and fixed high concurrencies). Recipes pin Dynamo to dynamo.hash=5b4bc1dd… with install: true for 1k1k as well as 8k1k.

Bumps the config image to lmsysorg/sglang:v0.5.12.post1, documents the change in perf-changelog.yaml, and adds a launch_b200-dgxc.sh branch that clones srt-slurm@main and copies the local dsr1/b200-fp4 recipes into the checkout before runs.
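
The recipe-copy step described above can be sketched roughly as follows. This is an illustration rather than the code in launch_b200-dgxc.sh: `stage_recipes` is a hypothetical name, and the sketch relies on GNU `cp` for the `-T` flag.

```bash
# Hypothetical helper: overlay a local recipe tree onto a directory inside
# the srt-slurm checkout.
stage_recipes() {
  src="$1"   # local recipe tree, e.g. the workspace's dsr1/b200-fp4 directory
  dst="$2"   # matching directory inside the checkout
  [ -d "$src" ] || { echo "missing recipe dir: $src" >&2; return 1; }
  mkdir -p "$dst" || return 1
  # -T (GNU cp) treats dst as the target itself, so the contents of src are
  # merged into dst instead of landing in dst/<basename of src>.
  cp -rT "$src" "$dst"
}
```

A call would look like `stage_recipes "$GITHUB_WORKSPACE/benchmarks/multi_node/srt-slurm-recipes/sglang/dsr1/b200-fp4" recipes/sglang/dsr1/b200-fp4`, run from the root of the checkout.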

Reviewed by Cursor Bugbot for commit d47c291. Bugbot is set up for automated code reviews on this repo.

@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

Thanks for the contribution! For vLLM and SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR first, before we merge your single-node PR into the master branch. Let's ensure that the documentation is first-class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that all GitHub Action jobs fully pass after merging. Failures are often just flakes, and re-running the failed jobs will usually fix them. If failed jobs are re-run, PR authors are responsible for ensuring that they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Comment thread .github/configs/nvidia-master.yaml Outdated
…tinode runner

Only the 8k1k scenario is updated (6-variant local split recipes). The
1k1k scenario and the b200-multinode runner are unchanged from main; the
image bump to v0.5.12.post1 is shared (1k1k follows via the dynamo-sglang
container alias).
…cipes

Flatten the srt-slurm b200-fp4 1k1k recipe (base + zip_override_mtp_*[i])
into 4 standalone per-topology recipes under
recipes/sglang/dsr1/b200-fp4/1k1k/disagg/mtp/, matching the 8k1k local
layout, and point the config at them instead of srt-slurm. Behavior is
unchanged (faithful flatten; dynamo-sglang container alias preserved).

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 79a9f56.

cd "$SRT_REPO_DIR" || exit 1
git checkout main
mkdir -p recipes/sglang/dsr1/b200-fp4
cp -rT "$GITHUB_WORKSPACE/benchmarks/multi_node/srt-slurm-recipes/sglang/dsr1/b200-fp4" recipes/sglang/dsr1/b200-fp4

Launch branch breaks non-MTP

High Severity

The new dynamo-sglang + dsr1 + fp4 branch applies to every DeepSeek-R1 FP4 disagg run, not only the MTP sweep. It checks out srt-slurm main and copies only recipes/sglang/dsr1/b200-fp4, while dsr1-fp4-b200-dynamo-sglang still points CONFIG_FILE at the recipes/b200-fp4/1k1k.yaml overrides, which lived on sa-submission-q2-2026. The non-MTP config would therefore likely fail to find its recipe in the main checkout.
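
One possible way to narrow the branch is to key off the recipe path rather than the framework/model/precision triple. The sketch below assumes that the MTP configs point CONFIG_FILE under recipes/sglang/dsr1/b200-fp4/ and that every other config still expects its original srt-slurm branch; `recipe_source` is a hypothetical helper, not part of launch_b200-dgxc.sh.

```bash
# Hypothetical guard: decide where a config's recipe should come from,
# based only on the CONFIG_FILE path.
recipe_source() {
  case "$1" in
    recipes/sglang/dsr1/b200-fp4/*)
      echo local      # flat recipes copied in from this repo
      ;;
    *)
      echo upstream   # assumed to still need the original srt-slurm branch
      ;;
  esac
}
```

The launch script could then take the clone-main-and-copy path only when `recipe_source "$CONFIG_FILE"` prints `local`, leaving the non-MTP run on its existing checkout.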

…hash

Add dynamo.hash=5b4bc1dd70965017a737c71b19db5a0aeaa88727 + install: true to
the four 1k1k MTP recipes so the 1k1k and 8k1k scenarios build dynamo from an
identical revision; the 1k1k recipes keep the dynamo-sglang container alias.
Update the perf-changelog entry to note the pin.
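
A quick way to confirm that every recipe carries the same pin is to search the recipe tree for files that lack the hash. This is a minimal sketch: `check_dynamo_pin` is a hypothetical helper, and it only checks that the hash string appears somewhere in each YAML file; it does not parse the YAML or verify which key holds it.

```bash
# Hypothetical check: fail if any *.yaml under a directory lacks the hash.
check_dynamo_pin() {
  dir="$1"
  hash="$2"
  # grep -L prints the names of files that do NOT contain the fixed string.
  missing=$(find "$dir" -type f -name '*.yaml' -exec grep -LF -- "$hash" {} +)
  if [ -n "$missing" ]; then
    echo "recipes without the pinned hash:" >&2
    echo "$missing" >&2
    return 1
  fi
}
```

For example, `check_dynamo_pin benchmarks/multi_node/srt-slurm-recipes/sglang/dsr1/b200-fp4 5b4bc1dd70965017a737c71b19db5a0aeaa88727` would cover both the 1k1k and 8k1k recipes.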