Motivation
Recent frontier MoE models — notably DeepSeek V3.2 and GLM 5.1 — use HiSparse sparse attention (LMSYS, "HiSparse: Turbocharging Sparse Attention with Hierarchical Memory") in place of dense attention at long context. HiSparse has already been integrated into SGLang; there is currently no equivalent kernel available in FlashInfer, so serving stacks that build on FlashInfer (vLLM in particular) cannot run these architectures efficiently without falling back to dense attention, which is prohibitively expensive at the context lengths these models are designed for.
FlashInfer already tracks some adjacent pieces of the DeepSeek V3.2 stack (e.g. `model: dsv3.2`-labelled issues such as the segmented top-K work in #3096) — this issue is to track the HiSparse attention kernel itself.
Proposal
Implement HiSparse attention as a FlashInfer attention kernel on Blackwell:
- Support the hierarchical-memory sparse-attention pattern as described in the HiSparse reference (LMSYS) and as used in DeepSeek V3.2 / GLM 5.1.
- Prefill and decode paths.
- Integrate with FlashInfer's existing attention launchers / planner APIs so downstream serving stacks can select HiSparse the same way they select other attention variants today.
- Blackwell / SM100 as the primary target; other SMs as a follow-up.
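To make the requested kernel's behavior concrete, here is a minimal NumPy sketch of a two-level block-sparse attention pattern: a coarse pass scores KV blocks per query, keeps the top-k blocks, and a fine pass runs dense attention only over the kept tokens. This illustrates the general shape of hierarchical sparse attention; the function name, block scoring via block means, and all parameters are illustrative assumptions, not the actual HiSparse algorithm or FlashInfer's API.

```python
# Illustrative two-level sparse attention (NOT the HiSparse kernel itself):
# coarse stage selects top-k KV blocks per query, fine stage attends densely
# within the selected blocks only.
import numpy as np

def block_sparse_attention(q, k, v, block_size=4, topk_blocks=2):
    """q: (d,), k/v: (n, d). Attention output over the selected blocks only."""
    n, d = k.shape
    n_blocks = n // block_size
    # Coarse stage: score each block by the query's dot product with the block mean
    # (a stand-in for whatever block summary the real kernel would keep in fast memory).
    block_means = k[: n_blocks * block_size].reshape(n_blocks, block_size, d).mean(axis=1)
    coarse_scores = block_means @ q
    keep = np.argsort(coarse_scores)[-topk_blocks:]  # indices of the top-k blocks
    # Fine stage: standard softmax attention restricted to the kept blocks' tokens.
    idx = np.concatenate([np.arange(b * block_size, (b + 1) * block_size) for b in keep])
    scores = (k[idx] @ q) / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ v[idx]

rng = np.random.default_rng(0)
q = rng.normal(size=8)
k = rng.normal(size=(16, 8))
v = rng.normal(size=(16, 8))
out = block_sparse_attention(q, k, v)
print(out.shape)  # (8,)
```

With 16 keys, 4-token blocks, and top-2 selection, the fine stage touches only 8 of 16 tokens; at long context this is where the savings over dense attention come from.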
Success criteria
- Functional parity with the reference HiSparse implementation on representative shapes.
- Usable end-to-end on DeepSeek V3.2 and GLM 5.1 model architectures (validated against expected logits / generations).
- Performance advantage over dense attention at the context lengths these models are deployed at, and reasonable parity with the SGLang HiSparse integration as a public baseline.
Notes