Sync dsv4-fp4-b300-trt recipes with B300 agg frontier config by Oseltamivir · Pull Request #1703 · SemiAnalysisAI/InferenceX

Oseltamivir · 2026-06-10T18:05:52Z

What

B300 analog of #1699 (which did this for B200). Sync the DeepSeek-V4-Pro aggregated frontier configs into the single-node TensorRT-LLM B300 recipes and bump the feature image. The non-MTP recipe carries the MTP0 settings; the MTP recipe carries the MTP settings.

Changes

Image (`.github/configs/nvidia-master.yaml`)

The dsv4-fp4-b300-trt and dsv4-fp4-b300-trt-mtp images are bumped from feat-deepseek_v4-9aa3715 to feat-deepseek_v4-c185066.

`benchmarks/single_node/fixed_seq_len/dsv4_fp4_b300_trt.sh` (MTP0)

- Worker envs (all overridable): TRTLLM_SERVER_DISABLE_GC=1, TRTLLM_WORKER_DISABLE_GC=1, NCCL_GRAPH_MIXING_SUPPORT=0, MIMALLOC_PURGE_DELAY=0, PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True.
- kv_cache_config.free_gpu_memory_fraction: 0.9 without DP-attn (TP) / 0.7 with DP-attn; was 0.50.
- attention_dp_config: batching_wait_iters 0 → 30; timeout_iters dropped.
- stream_interval 10 → 100; moe_config.use_low_precision_moe_combine: true.
- max_num_tokens drops the OSL term and becomes ISL + 256.
- MOE_BACKEND made overridable (default TRTLLM; MEGAMOE_DEEPGEMM at high concurrency on 1k ISL).
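
For reviewers, the non-MTP settings listed above can be sketched in Python. This is an illustration only: the recipe itself is a bash script, and the function names, the `high_conc_threshold` value, and the reading of "1k ISL" as 1024 tokens are assumptions that do not come from this PR.

```python
# Defaults taken from the PR description. "Overridable" is modeled as:
# apply the default only when the caller has not already set the variable.
WORKER_ENV_DEFAULTS = {
    "TRTLLM_SERVER_DISABLE_GC": "1",
    "TRTLLM_WORKER_DISABLE_GC": "1",
    "NCCL_GRAPH_MIXING_SUPPORT": "0",
    "MIMALLOC_PURGE_DELAY": "0",
    "PYTORCH_CUDA_ALLOC_CONF": "expandable_segments:True",
}


def worker_env(base_env):
    """Return a copy of base_env with the recipe defaults filled in."""
    env = dict(base_env)
    for name, default in WORKER_ENV_DEFAULTS.items():
        env.setdefault(name, default)
    return env


def mtp0_settings(isl, conc, dp_attn, env, high_conc_threshold=1024):
    """Derive the non-MTP values listed in the PR description.

    high_conc_threshold is a hypothetical placeholder: the PR only says
    "high conc" and does not give the cutoff.
    """
    default_backend = "TRTLLM"
    if isl == 1024 and conc >= high_conc_threshold:  # assumes 1k ISL == 1024
        default_backend = "MEGAMOE_DEEPGEMM"
    return {
        "free_gpu_memory_fraction": 0.7 if dp_attn else 0.9,
        "max_num_tokens": isl + 256,  # OSL term dropped
        "stream_interval": 100,
        "batching_wait_iters": 30,
        "use_low_precision_moe_combine": True,
        # An explicit MOE_BACKEND in the environment wins over the default.
        "moe_backend": env.get("MOE_BACKEND", default_backend),
    }
```

Under these assumptions, `mtp0_settings(8192, 256, True, {})` yields `max_num_tokens` of 8192 + 256 = 8448 and a KV cache fraction of 0.7.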

`benchmarks/single_node/fixed_seq_len/dsv4_fp4_b300_trt_mtp.sh` (MTP)

Same as above, plus:

- DP-attn free_gpu_memory_fraction = 0.6.
- enable_lm_head_tp_in_adp: true on the DP-attn path.
- speculative_config uses max_draft_len; the default draft length goes 2 → 3 (overridable via TRTLLM_DSV4_MTP_NUM_NEXTN_LAYERS), stepping back to 2 at high concurrency on 8k ISL.
- max_num_tokens = ISL + (draft+1)*batch + 256 (drops OSL; keeps the speculative-verification headroom).
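
The MTP token budget and draft-length selection can be worked through in a small Python sketch. It is a hedged illustration, not the recipe code: `high_conc_threshold` is a hypothetical cutoff, "8k ISL" is read as 8192 tokens, and letting the environment override take precedence over the high-concurrency step-back is an assumption.

```python
def mtp_draft_len(isl, conc, env, high_conc_threshold=512):
    """Pick the MTP draft length (max_draft_len).

    high_conc_threshold is hypothetical; the PR only says "high conc".
    """
    override = env.get("TRTLLM_DSV4_MTP_NUM_NEXTN_LAYERS")
    if override is not None:
        return int(override)  # assumed: an explicit override always wins
    if isl == 8192 and conc >= high_conc_threshold:  # assumes 8k ISL == 8192
        return 2
    return 3


def mtp_max_num_tokens(isl, batch, draft_len):
    """ISL + (draft+1)*batch + 256, as stated in the PR description."""
    return isl + (draft_len + 1) * batch + 256


def mtp_settings(isl, conc, batch, dp_attn, env):
    """Collect the MTP-specific values; batch is passed in explicitly
    because the PR does not spell out how it is derived from CONC."""
    draft = mtp_draft_len(isl, conc, env)
    settings = {
        "max_draft_len": draft,
        "max_num_tokens": mtp_max_num_tokens(isl, batch, draft),
        "free_gpu_memory_fraction": 0.6 if dp_attn else 0.9,
    }
    if dp_attn:
        settings["enable_lm_head_tp_in_adp"] = True
    return settings
```

For example, with ISL 8192, batch 256, and draft length 3, the budget is 8192 + 4 × 256 + 256 = 9472 tokens; stepping the draft length back to 2 lowers it to 9216.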

Deliberate non-changes

- B300-specific bits preserved: the MODEL_PATH download block, TRTLLM_MHC_ENABLE_FUSED_HC=1, and trtllm-serve "$MODEL_PATH".
- Search space left as-is. Unlike #1699 (Sync dsv4-fp4-b200-trt recipes with B200 agg frontier config), which raised the B200 MTP conc-end, the B300 fixed-seq-len sweeps already cover the high-concurrency regime the recipe changes target (1k up to 2048, 8k up to 1024), so no conc-end edit is needed.
- cuda_graph_config / max_batch_size left CONC-derived.
- max_seq_len kept floored at ≥ 8192.
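
The max_seq_len floor amounts to a simple clamp. In the sketch below, using ISL + OSL as the unclamped value is an assumption made for illustration; the PR states only the 8192 floor, not how the recipe sizes a request.

```python
MAX_SEQ_LEN_FLOOR = 8192  # floor stated in the PR description


def floored_max_seq_len(isl, osl, floor=MAX_SEQ_LEN_FLOOR):
    """Clamp the sequence length from below.

    isl + osl is an assumed stand-in for the recipe's own sizing rule.
    """
    return max(isl + osl, floor)
```

With this stand-in, a 1k/1k run stays at the 8192 floor, while an 8k/1k run exceeds it.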

Validation

- bash -n passes on both recipes.
- Generated YAML for the DP/non-DP paths (incl. top-level enable_lm_head_tp_in_adp and max_draft_len) parses as valid YAML.
- The B300 recipes differ from the #1699 B200 recipes only in the B300-specific bits above.
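
The `bash -n` check can be scripted for repeat runs. The following is a minimal sketch assuming `bash` is on PATH; the helper name `bash_syntax_ok` is made up for illustration, and the YAML parse check is not covered here.

```python
import subprocess

# Recipe paths as given in the PR description.
RECIPES = [
    "benchmarks/single_node/fixed_seq_len/dsv4_fp4_b300_trt.sh",
    "benchmarks/single_node/fixed_seq_len/dsv4_fp4_b300_trt_mtp.sh",
]


def bash_syntax_ok(script_path):
    """Return True when `bash -n` parses the script without errors.

    `bash -n` reads the commands but does not execute them, so this is
    a syntax check only. A missing file also returns False.
    """
    result = subprocess.run(
        ["bash", "-n", str(script_path)],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

Run from the repository root, `all(bash_syntax_ok(p) for p in RECIPES)` should be true if both recipes parse.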

🤖 Generated with Claude Code

Note

Low Risk
Benchmark/CI recipe and environment tuning only; no application auth or production serving paths changed.

Overview
B300 DeepSeek-V4-Pro TensorRT-LLM benchmark recipes are aligned with the aggregated frontier settings (B300 follow-on to B200 PR #1699): both dsv4-fp4-b300-trt and dsv4-fp4-b300-trt-mtp use image feat-deepseek_v4-c185066, and the non-MTP 8k/1k tp8/ep8 DP-attn sweep conc-end is reduced from 1024 to 256.

The dsv4_fp4_b300_trt.sh and dsv4_fp4_b300_trt_mtp.sh scripts add default runtime env (GC off, NCCL graph mixing off, alloc tweaks), raise KV cache fractions by DP path, set stream_interval 100, use_low_precision_moe_combine, and batching_wait_iters 30 (MTP drops timeout_iters). max_num_tokens no longer includes the OSL term; MoE backend switches to MEGAMOE_DEEPGEMM at high concurrency on short ISL. MTP uses max_draft_len, variable draft length defaults, and enable_lm_head_tp_in_adp on DP-attn.

perf-changelog.yaml documents the above for both config keys.

Reviewed by Cursor Bugbot for commit a5b4fd4. Bugbot is set up for automated code reviews on this repo.

B300 analog of PR #1699 (B200). Apply the same TensorRT-LLM recipe sync to dsv4_fp4_b300_trt.sh (MTP0) and dsv4_fp4_b300_trt_mtp.sh (MTP), and bump the dsv4-fp4-b300-trt / -mtp images to feat-deepseek_v4-c185066.

Recipe changes (both):
- Worker envs (overridable): TRTLLM_SERVER_DISABLE_GC, TRTLLM_WORKER_DISABLE_GC, NCCL_GRAPH_MIXING_SUPPORT=0, MIMALLOC_PURGE_DELAY=0, PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True.
- kv_cache free_gpu_memory_fraction: 0.9 (no DP-attn) / 0.7 non-MTP, 0.6 MTP (DP-attn), was 0.50.
- attention_dp_config batching_wait_iters 0 -> 30, drop timeout_iters.
- stream_interval 10 -> 100; moe_config.use_low_precision_moe_combine: true.
- MOE_BACKEND overridable, switches to MEGAMOE_DEEPGEMM at high conc on 1k ISL.
- max_num_tokens drops the OSL term.

MTP additionally: max_draft_len (was num_nextn_predict_layers), default draft 3 stepping to 2 at high conc on 8k ISL, enable_lm_head_tp_in_adp on DP-attn.

B300-specific bits preserved: MODEL_PATH download block, TRTLLM_MHC_ENABLE_FUSED_HC=1, trtllm-serve "$MODEL_PATH".

B300 search space left as-is (already covers the high-concurrency frontier the recipe changes target).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-10T18:06:02Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Covers the dsv4-fp4-b300-trt / -mtp image bump to feat-deepseek_v4-c185066 and the B300 agg frontier recipe sync (PR #1703, B300 analog of #1699).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-10T22:09:46Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27297153576
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27297153576

github-actions · 2026-06-11T00:10:15Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27297153576
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27297153576

github-actions · 2026-06-11T02:36:04Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27297153576
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27297153576

Cap the 8k1k tp8/ep8 DP-attn sweep at conc 256 (was 256-1024) for dsv4-fp4-b300-trt. trt-mtp and the 1k1k sweep are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-11T05:01:46Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27324968855
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27324968855

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit d8c3caa.

github-actions · 2026-06-11T05:02:55Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27325008715
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27325008715

github-actions · 2026-06-11T06:22:15Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27325043819
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27325043819

Oseltamivir requested a review from a team June 10, 2026 18:05

Oseltamivir requested review from jgangani and kedarpotdar-nv as code owners June 10, 2026 18:05

github-project-automation Bot added this to InferenceMAX Board Jun 10, 2026

Add perf-changelog entry for B300 DSv4 TRT image + recipe sync

c315646

Oseltamivir added the full-sweep-enabled label Jun 10, 2026

Oseltamivir and others added 2 commits June 10, 2026 22:00

Trim dsv4-fp4-b300-trt 8k1k conc to max 256

1e548b9

Merge branch 'main' into sync-dsv4-fp4-b300-trt-0608-config

d8c3caa

cursor Bot reviewed Jun 11, 2026

Comment thread perf-changelog.yaml Outdated

Update perf-changelog.yaml

d1cf7a4

Update perf-changelog.yaml

a5b4fd4

Oseltamivir wants to merge 6 commits into main from sync-dsv4-fp4-b300-trt-0608-config
