
Fix DSv4 MTP graph metadata diagnostics #2

Open
bhaktatejas922 wants to merge 27 commits into main from dsv4flashv2-05102026


bhaktatejas922 commented May 14, 2026

Summary:

  • fix DSv4 EAGLE/MTP decode metadata handling during CUDA graph capture/replay
  • add opt-in DSv4 graph-capture tracing around collectives and CUDA graph runner phases
  • add opt-in B12X sparse MLA shadow-compare diagnostics for FlashMLA reference checks
  • skip DSv4 sparse MLA shadow compare during CUDA graph capture using both torch stream-capture state and SGLang capture context
  • enrich shadow records with capture state, forward context, valid/padding row split, and row-wise compare stats so post-capture traffic can be separated from graph dummy rows
  • keep B12X sparse MLA draft-decode gated off by default while target-verify and draft-extend remain explicit opt-in paths
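The capture-skip and valid/padding-split bullets can be illustrated with a minimal sketch. It assumes the SGLang capture-context check is passed in as a callable (the PR's actual helper is not shown here, so `sglang_capture_mode` is a hypothetical parameter), and `ShadowRecord`, `row_max_abs`, and `build_shadow_record` are hypothetical names rather than the PR's schema. Only `torch.cuda.is_current_stream_capturing()` is a real PyTorch call; it is imported lazily so the sketch still runs without torch.

```python
"""Hypothetical sketch: gate a shadow compare on capture state and split
valid rows from CUDA-graph padding rows before computing row-wise stats."""
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


def torch_stream_capturing() -> bool:
    # Real PyTorch API: True while the current CUDA stream is being captured.
    # Returns False when torch or CUDA is unavailable.
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available() and torch.cuda.is_current_stream_capturing()


def should_skip_shadow_compare(
    sglang_capture_mode: Callable[[], bool],  # hypothetical stand-in for SGLang's capture context
    torch_capturing: Callable[[], bool] = torch_stream_capturing,
) -> bool:
    # Skip if EITHER signal reports capture: rows seen during capture are dummies.
    return torch_capturing() or sglang_capture_mode()


@dataclass
class ShadowRecord:
    # Illustrative fields only; not the PR's actual record layout.
    raw_bs: int
    graph_bs: int
    valid_row_max_abs: List[float] = field(default_factory=list)
    padding_row_max_abs: List[float] = field(default_factory=list)


def row_max_abs(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> List[float]:
    # Per-row max |a - b| between candidate (B12X) and reference (FlashMLA) rows.
    return [max(abs(x - y) for x, y in zip(ra, rb)) for ra, rb in zip(a, b)]


def build_shadow_record(candidate, reference, raw_bs: int) -> ShadowRecord:
    # The first raw_bs rows are real traffic; the rest are graph padding rows.
    diffs = row_max_abs(candidate, reference)
    return ShadowRecord(
        raw_bs=raw_bs,
        graph_bs=len(diffs),
        valid_row_max_abs=diffs[:raw_bs],
        padding_row_max_abs=diffs[raw_bs:],
    )
```

Keeping padding-row stats in a separate field means a large mismatch in a dummy row (as in the earlier capture records) no longer looks like a real-traffic mismatch.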
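The opt-in tracing bullet can be pictured as an environment-gated context manager wrapped around a collective or graph-runner phase. This is a sketch under assumptions: the `DSV4_GRAPH_TRACE` variable, the `dsv4_graph_trace` logger, and `trace_phase` are hypothetical names, not the flags or helpers the PR adds.

```python
import logging
import os
import time
from contextlib import contextmanager
from typing import Iterator

# Hypothetical logger name for the sketch.
logger = logging.getLogger("dsv4_graph_trace")


def trace_enabled() -> bool:
    # Hypothetical opt-in flag: tracing stays off unless explicitly set to "1".
    return os.environ.get("DSV4_GRAPH_TRACE", "0") == "1"


@contextmanager
def trace_phase(name: str, **ctx) -> Iterator[None]:
    # When disabled, run the wrapped phase with no extra work.
    if not trace_enabled():
        yield
        return
    start = time.perf_counter()
    logger.info("enter %s %s", name, ctx)
    try:
        yield
    finally:
        # Host-side wall time only; during graph capture this measures
        # capture/launch time, not GPU execution time.
        logger.info("exit %s %.3f ms", name, (time.perf_counter() - start) * 1e3)
```

Usage would look like `with trace_phase("all_reduce", rank=rank): ...` around a collective. Gating on a flag keeps the default path free of logging overhead.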

Latest validation:

  • python3 -m py_compile python/sglang/srt/distributed/parallel_state.py python/sglang/srt/layers/attention/deepseek_v4_backend.py python/sglang/srt/model_executor/cuda_graph_runner.py
  • Full TP4 C32 all-mode sparse gate: started with CUDA graphs enabled and B12X sparse MLA hits increasing, but throughput collapsed; saved logs showed EAGLE draft graph replay padding raw_bs 29-31 up to graph_bs 32
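The raw_bs to graph_bs padding in the logs follows the usual CUDA graph replay pattern: pick the smallest captured batch size that fits the live batch, then fill the remaining rows with dummy entries. A minimal sketch, assuming a sorted list of captured sizes; the `pick_graph_bs` helper and the size list below are hypothetical, not SGLang's actual configuration.

```python
import bisect
from typing import List, Tuple


def pick_graph_bs(raw_bs: int, capture_bs: List[int]) -> Tuple[int, int]:
    """Return (graph_bs, padding_rows) for the smallest captured size >= raw_bs."""
    sizes = sorted(capture_bs)
    idx = bisect.bisect_left(sizes, raw_bs)
    if idx == len(sizes):
        raise ValueError(f"raw_bs={raw_bs} exceeds largest captured bs {sizes[-1]}")
    graph_bs = sizes[idx]
    return graph_bs, graph_bs - raw_bs


# Hypothetical capture sizes for illustration.
CAPTURE_BS = [1, 2, 4, 8, 16, 24, 32]
for raw in (29, 30, 31):
    print(raw, pick_graph_bs(raw, CAPTURE_BS))  # graph_bs 32 with 3, 2, 1 padding rows
```

With these sizes, raw_bs 29-31 all replay the bs-32 graph, so 1-3 rows per replay carry dummy metadata; those are the padded rows the notes below flag for investigation.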

Notes:

  • The previous shadow mismatch records from dsv4-c32-shadow-20260514T130244Z were capture dummy rows, not real traffic. This PR now prevents that class of false diagnostic.
  • Next investigation should focus on EAGLE draft/target-verify metadata and padded replay rows before promoting all-mode sparse.

bhaktatejas922 (author):

@claude review this pr
