[draft] add tileN = 8,16 to SM120 blockscale GEMM. by b8zhong · Pull Request #3495 · flashinfer-ai/flashinfer

b8zhong · 2026-06-02T15:39:49Z

📌 Description

Note: it's intentionally not added for grouped GEMM to reduce complexity, since there is nearly no speedup (1-2%).

🔍 Related Issues

NVIDIA/cutlass#3292

🧪 Tests

Tests have been added or updated as needed.
All tests are passing (unittest, etc.).

Reviewer Notes

coderabbitai · 2026-06-02T15:40:00Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


gemini-code-assist

Code Review

This pull request introduces support for smaller tile sizes (tile_n = 8 and tile_n = 16) for SM120/121 in FP4 and MXFP8 GEMM kernels, along with emitting StreamK scheduler launchers. However, several critical issues were identified: adding tile_n = 8 to the dispatch macros in both MXFP4 and NVFP4 group GEMM files will cause linker errors because the corresponding template instantiations are excluded when swap_ab is false. Additionally, the new tile_n = 8 configurations are missing from the heuristic candidate array, and test coverage for tile_n = 8 needs to be added to the test suite while safely handling the invalid swap_ab = false configuration.

gemini-code-assist · 2026-06-02T15:42:57Z

+    } else if (tile_n == 8) {                      \
+      constexpr int TILE_N = 8;                    \
+      return __VA_ARGS__();                        \


Critical Linker Error Bug

Adding tile_n == 8 to the dispatch macro here will cause a linker error (undefined reference) when the module is built.

Why this happens:

tile_n and swap_ab are runtime variables dispatched via nested macros (DISPATCH_TILE_N and DISPATCH_SWAP_AB).

When tile_n == 8, the compiler generates branches for both SWAP_AB = true and SWAP_AB = false.

In the SWAP_AB = false branch, it attempts to call CutlassMXFP4GroupwiseScaledGroupGEMMSM120<128, 8, 128, false>.

However, this template instantiation is explicitly excluded in csrc/group_gemm_mxfp4_groupwise_sm120_kernel_inst.jinja via {% if not (tile_n == 8 and swap_ab == "false") %}.

Because the symbol is referenced but never defined, the linker fails.

Suggested Fix:

Use if constexpr in the innermost block (around line 134) to prevent instantiating the invalid configuration:

    if constexpr (is_valid_config<c_type_in_a, c_type_in_b, c_type_sf_a, c_type_sf_b, c_type_out>()) {
      if constexpr (TILE_N == 8 && !SWAP_AB) {
        TVM_FFI_ICHECK(false) << "tile_n=8 is only supported with swap_ab=true";
        return false;
      } else {
        using cutlass_t_in_a = cutlass_dtype_t<c_type_in_a>;
        // ... existing dispatch code ...
      }
    }
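To illustrate the failure mode and the guard, here is a minimal, self-contained sketch. The names `run_gemm`, `launch`, `dispatch_swap_ab`, `dispatch`, and `self_check` are hypothetical stand-ins, not FlashInfer or CUTLASS symbols, and the integer return values only mark which specialization ran. It assumes the kernel template is declared up front and defined only for the combinations that the instantiation file emits.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the kernel launcher: declared for every
// <TileN, SwapAB> pair, but defined only for the pairs listed below.
// This mimics explicit instantiations generated from a template file.
template <int TileN, bool SwapAB>
int run_gemm();

template <> int run_gemm<8, true>() { return 8; }
template <> int run_gemm<16, true>() { return 16; }
template <> int run_gemm<16, false>() { return -16; }
// run_gemm<8, false> is intentionally left without a definition.

template <int TileN, bool SwapAB>
int launch() {
  if constexpr (TileN == 8 && !SwapAB) {
    // The else branch is discarded for this pair, so run_gemm<8, false>
    // is never referenced and the linker has nothing to resolve.
    throw std::invalid_argument("tile_n=8 is only supported with swap_ab=true");
  } else {
    return run_gemm<TileN, SwapAB>();
  }
}

// Turn the runtime swap_ab flag into a compile-time template argument.
template <int TileN>
int dispatch_swap_ab(bool swap_ab) {
  return swap_ab ? launch<TileN, true>() : launch<TileN, false>();
}

// Turn the runtime tile_n value into a compile-time template argument.
int dispatch(int tile_n, bool swap_ab) {
  switch (tile_n) {
    case 8:
      return dispatch_swap_ab<8>(swap_ab);
    case 16:
      return dispatch_swap_ab<16>(swap_ab);
    default:
      throw std::invalid_argument("unsupported tile_n: " + std::to_string(tile_n));
  }
}

// Usage check: valid pairs run, the excluded pair is rejected at run time.
inline void self_check() {
  assert(dispatch(8, true) == 8);
  assert(dispatch(16, false) == -16);
  bool threw = false;
  try {
    dispatch(8, false);
  } catch (const std::invalid_argument&) {
    threw = true;
  }
  assert(threw);
}
```

If the `if constexpr` guard in `launch` is replaced by an unconditional call, `launch<8, false>` references `run_gemm<8, false>`, which has no definition, so the build would be expected to fail at link time with an undefined reference.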

gemini-code-assist · 2026-06-02T15:42:57Z

+    } else if (tile_n == 8) {                      \
+      constexpr int TILE_N = 8;                    \
+      return __VA_ARGS__();                        \


Critical Linker Error Bug

Adding tile_n == 8 to the dispatch macro here will cause a linker error (undefined reference) when the module is built.

Why this happens:

tile_n and swap_ab are runtime variables dispatched via nested macros (DISPATCH_TILE_N and DISPATCH_SWAP_AB).

When tile_n == 8, the compiler generates branches for both SWAP_AB = true and SWAP_AB = false.

In the SWAP_AB = false branch, it attempts to call CutlassNVFP4GroupwiseScaledGroupGEMMSM120<128, 8, 128, false>.

However, this template instantiation is explicitly excluded in csrc/group_gemm_nvfp4_groupwise_sm120_kernel_inst.jinja via {% if not (tile_n == 8 and swap_ab == "false") %}.

Because the symbol is referenced but never defined, the linker fails.

Suggested Fix:

Use if constexpr in the innermost block (around line 141) to prevent instantiating the invalid configuration:

    if constexpr (is_valid_config<c_type_in_a, c_type_in_b, c_type_sf_a, c_type_sf_b, c_type_out>()) {
      if constexpr (TILE_N == 8 && !SWAP_AB) {
        TVM_FFI_ICHECK(false) << "tile_n=8 is only supported with swap_ab=true";
        return false;
      } else {
        using cutlass_t_in_a = cutlass_dtype_t<c_type_in_a>;
        // ... existing dispatch code ...
      }
    }

gemini-code-assist · 2026-06-02T15:42:57Z

      CutlassTileConfigSM120::CtaShape128x128x256B, CutlassTileConfigSM120::CtaShape256x128x128B,
      CutlassTileConfigSM120::CtaShape128x32x128B,  CutlassTileConfigSM120::CtaShape128x32x64B,
      CutlassTileConfigSM120::CtaShape128x64x128B,  CutlassTileConfigSM120::CtaShape128x64x64B,
+      CutlassTileConfigSM120::CtaShape128x16x128B,  CutlassTileConfigSM120::CtaShape128x16x64B,


Missing Candidate Configurations in Heuristic

You have added CtaShape128x8x128B and CtaShape128x8x64B to the CutlassTileConfigSM120 enum and the switch-case dispatchers, but they are missing from the all_tiles candidate array in get_candidate_configs_sm120.

Without adding them here, the heuristic search will never consider or profile these tile_n = 8 configurations, making them completely unreachable at runtime when using the heuristic path.

      CutlassTileConfigSM120::CtaShape128x16x128B,  CutlassTileConfigSM120::CtaShape128x16x64B,
      CutlassTileConfigSM120::CtaShape128x8x128B,   CutlassTileConfigSM120::CtaShape128x8x64B,
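As a rough illustration of why a configuration missing from the candidate array can never be selected, consider the sketch below. Everything in it is hypothetical: `TileConfig`, `profiled_cost`, `pick_best`, and `heuristic_check` are invented for this example, and the cost values are made up to stand in for profiling results. It is not the logic of `get_candidate_configs_sm120`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical subset of tile shapes; the names only echo the enum in the PR.
enum class TileConfig { Cta128x16x128, Cta128x8x128 };

// Made-up cost model standing in for profiling: pretend the 128x8 tile is fastest.
constexpr double profiled_cost(TileConfig c) {
  return c == TileConfig::Cta128x8x128 ? 1.0 : 2.0;
}

// Pick the cheapest configuration among the candidates. A configuration that
// is not in the array is never evaluated, however fast it would have been.
template <std::size_t N>
TileConfig pick_best(const std::array<TileConfig, N>& candidates) {
  static_assert(N > 0, "need at least one candidate");
  TileConfig best = candidates[0];
  for (TileConfig c : candidates) {
    if (profiled_cost(c) < profiled_cost(best)) {
      best = c;
    }
  }
  return best;
}

// Usage check: the 128x8 tile wins only once it is listed as a candidate.
inline void heuristic_check() {
  const std::array<TileConfig, 1> without_8{TileConfig::Cta128x16x128};
  const std::array<TileConfig, 2> with_8{TileConfig::Cta128x16x128,
                                         TileConfig::Cta128x8x128};
  assert(pick_best(without_8) == TileConfig::Cta128x16x128);
  assert(pick_best(with_8) == TileConfig::Cta128x8x128);
}
```

The enum value and the dispatcher case can both exist, yet the selection loop only ever sees what the candidate array contains.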

gemini-code-assist · 2026-06-02T15:42:57Z


    for swap_ab in [True, False]:
-        for tile_n in [32, 64, 128]:
+        for tile_n in [16, 32, 64, 128]:


Missing Test Coverage for tile_n = 8

You added tile_n = 8 support to the Grouped GEMM kernels, but it is excluded from the test suite here.

To test tile_n = 8 without triggering the ValueError when swap_ab is False, you can dynamically adjust the tile_n list based on swap_ab:

    for swap_ab in [True, False]:
        for tile_n in ([8, 16, 32, 64, 128] if swap_ab else [16, 32, 64, 128]):

gemini-code-assist · 2026-06-02T15:42:57Z

        mma_sm_list = [1]
        tile_m_list = [128]
-        tile_n_list = [32, 64, 128]
+        tile_n_list = [16, 32, 64, 128]


Missing Test Coverage for tile_n = 8

You added tile_n = 8 support to the Grouped GEMM kernels, but it is excluded from the test suite here.

To test tile_n = 8 safely, add 8 to tile_n_list and skip the invalid swap_ab = False configuration inside the test loop:

        # In the loop:
        for mma_sm, tile_m, tile_n, tile_k, swap_ab in product(...):
            if tile_n == 8 and not swap_ab:
                continue

Suggested change

-        tile_n_list = [16, 32, 64, 128]
+        tile_n_list = [8, 16, 32, 64, 128]

add tileN = 8,16

c03d723

flashinfer-bot added the op: gemm label Jun 2, 2026

b8zhong changed the title ~~add tileN = 8,16 to SM120 blockscale GEMM.~~ [draft] add tileN = 8,16 to SM120 blockscale GEMM. Jun 2, 2026

gemini-code-assist Bot reviewed Jun 2, 2026

b8zhong wants to merge 1 commit into flashinfer-ai:main from bzhng-development:brayden/sm120-tile-n-16
