Add logic for block-scaled tensors with GEMM swizzled scales #2486
+914
−424
Description
All of the supported block-scaled tensor formats (MXFP8, NVFP4, DSv3 FP8) have two ways of ordering their scaling factors: a compact ordering and a GEMM-ready "swizzled" ordering expected by the GEMM kernels.
The core infrastructure handles this in an ad hoc way, blindly assuming that the "right" scale ordering is used for each operation. The PyTorch infrastructure only supports MXFP8 and NVFP4 scales in compact order, although DSv3 FP8 does distinguish between "compact" and "GEMM-ready" formats. This situation makes it hard to implement fused kernels that can bypass the swizzle kernel.
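The toy sketch below (not Transformer Engine code) illustrates the idea: compact scales are row-major, a GEMM-ready copy is tile-interleaved, and a flag on the tensor lets callers skip the standalone swizzle kernel when the scales are already in GEMM order. The tile size and permutation here are placeholders, not the real cuBLAS layout.

```python
# Conceptual illustration only; the actual swizzled layout used by the
# GEMM kernels is different and more involved.
import numpy as np


def swizzle_scales(scales_compact: np.ndarray, tile: int = 4) -> np.ndarray:
    """Rearrange row-major (compact) scales into a tile-interleaved order."""
    rows, cols = scales_compact.shape
    assert rows % tile == 0
    return (scales_compact
            .reshape(rows // tile, tile, cols)
            .transpose(0, 2, 1)      # interleave rows within each tile
            .reshape(rows, cols))


def scales_for_gemm(scales: np.ndarray, already_swizzled: bool) -> np.ndarray:
    """Return GEMM-ready scales, swizzling only when needed."""
    return scales if already_swizzled else swizzle_scales(scales)


compact = np.arange(32, dtype=np.float32).reshape(8, 4)
swizzled = swizzle_scales(compact)
# A producer that already wrote swizzled scales can skip the extra kernel:
assert np.array_equal(scales_for_gemm(swizzled, already_swizzled=True), swizzled)
assert np.array_equal(scales_for_gemm(compact, already_swizzled=False), swizzled)
```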
This PR adds a `with_gemm_swizzled_scales` field in the C++ tensor class so that the core infrastructure can distinguish between the different scale orderings. It also adds this field to the PyTorch quantized tensor classes and exposes an `optimize_for_gemm` option in the quantizer so that we can create tensors that do not need communication or checkpointing.
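As a rough sketch of the intended semantics: only the `optimize_for_gemm` and `with_gemm_swizzled_scales` names below come from this PR; the classes and the trivial "quantization" are stand-ins for illustration, not the real API.

```python
# Minimal sketch of the intended semantics, not Transformer Engine code.
from dataclasses import dataclass

import torch


@dataclass
class BlockScaledTensor:
    """Stand-in for the quantized tensor classes extended by this PR."""
    data: torch.Tensor
    scales: torch.Tensor
    with_gemm_swizzled_scales: bool  # True -> scales already in GEMM order


class BlockQuantizer:
    """Stand-in for the PyTorch quantizer gaining `optimize_for_gemm`."""

    def __init__(self, block_size: int = 32, optimize_for_gemm: bool = False):
        self.block_size = block_size
        self.optimize_for_gemm = optimize_for_gemm

    def __call__(self, x: torch.Tensor) -> BlockScaledTensor:
        # Toy per-block amax "scales"; the real quantization kernels differ.
        blocks = x.reshape(-1, self.block_size)
        scales = blocks.abs().amax(dim=1)
        if self.optimize_for_gemm:
            scales = scales.contiguous()  # placeholder for the real swizzle
        return BlockScaledTensor(x, scales, self.optimize_for_gemm)


quantizer = BlockQuantizer(optimize_for_gemm=True)
x_q = quantizer(torch.randn(4, 64))
# GEMM-swizzled tensors can go straight to the GEMM, but (as noted above)
# they are not meant to be communicated or checkpointed.
assert x_q.with_gemm_swizzled_scales
```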
Progress

Closes #2446.
Type of change
Changes
Please list the changes introduced in this PR:
`optimize_for_gemm` option in PyTorch quantizer

Checklist: