@henrylhtsang henrylhtsang commented Jan 7, 2026

Note: work done by Claude Code.

Replace deprecated cute DSL function calls with their new equivalents:

  • `cute.make_fragment` → `cute.make_rmem_tensor`
  • `cute.make_fragment_like` → `cute.make_rmem_tensor_like` [Note: this one doesn't contribute to the deprecation warning; see https://github.com/NVIDIA/cutlass/blob/f86feb0aa8a9490a7ab27bc991e36d7b5bf300e3/media/docs/pythonDSL/cute_dsl_api/changelog.rst#L22]
  • `cute.arch.exp2(x)` → `cute.math.exp2(x, fastmath=True)`
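
The replacements above can be sketched as a small compatibility shim (the helper names `make_rmem_tensor_compat` and `exp2_compat` are hypothetical, not part of this PR) that prefers the new CuTe DSL API when it exists and falls back to the deprecated one otherwise:

```python
def make_rmem_tensor_compat(cute, *args, **kwargs):
    # Prefer the new cute.make_rmem_tensor; fall back to the
    # deprecated cute.make_fragment on older CUTLASS DSL versions.
    fn = getattr(cute, "make_rmem_tensor", None) or cute.make_fragment
    return fn(*args, **kwargs)

def exp2_compat(cute, x):
    # Prefer cute.math.exp2(x, fastmath=True), which replaces the
    # deprecated cute.arch.exp2(x).
    math_mod = getattr(cute, "math", None)
    if math_mod is not None and hasattr(math_mod, "exp2"):
        return math_mod.exp2(x, fastmath=True)
    return cute.arch.exp2(x)
```

In the PR itself the call sites are rewritten directly rather than shimmed, since the new names are already available in the pinned CUTLASS version.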

Before:

cute/test_flash_attn.py: 1500 warnings
  /home/henrylhtsang/.conda/envs/flash/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/_mlir_helpers/op.py:60: DeprecationWarning: `make_fragment` is deprecated, use `make_rmem_tensor` instead
    res_or_list = opFunc(*args, **kwargs, loc=loc)

cute/test_flash_attn.py: 9440 warnings
  /home/henrylhtsang/.conda/envs/flash/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/_mlir_helpers/op.py:60: DeprecationWarning: cute.arch.exp2 is deprecated, use cute.math.exp2 with `fastmath=True` instead
    res_or_list = opFunc(*args, **kwargs, loc=loc)

After: the deprecation warnings no longer appear. Tested with:

cd ~/flash-attention/tests/cute
pytest .
cd ~/flash-attention/tests/cute
pytest --collect-only -q 2>/dev/null | grep "::" | sed 's|^cute/||' | shuf | head -100 | xargs pytest -x
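
The random-subset one-liner above collects test node IDs, strips the `cute/` prefix, shuffles, and runs 100 of them fail-fast. Its text-filtering stages can be exercised on synthetic input (an illustration only; the real run needs the flash-attention checkout):

```shell
# Feed fake `pytest --collect-only -q` output through the same filter stages;
# `sort` is appended here only to make the shuffled output deterministic.
printf 'cute/test_a.py::t1\ncute/test_b.py::t2\n== warnings ==\n' \
  | grep '::' | sed 's|^cute/||' | shuf | head -100 | sort
# prints: test_a.py::t1
#         test_b.py::t2
```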

Latest test run after rebase

cd ~/flash-attention/tests/cute
pytest .
================================================================================================== warnings summary ==================================================================================================
cute/test_mask_mod.py: 71 warnings
  /home/henrylhtsang/flash-attention/flash_attn/cute/mask.py:367: DSLOptimizationWarning: This static loop has 128 iterations, which may be very slow to compile, consider using `cutlass.range(..., unroll_full=True)` instead.
    for i in cutlass.range_constexpr(ncol):

cute/test_mask_mod.py::test_mask_mod_ima_partial_block
  /home/henrylhtsang/.conda/envs/flash/lib/python3.12/site-packages/torch/nn/attention/flex_attention.py:1687: UserWarning: flex_attention called without torch.compile() - this will use an unfused implementation that materializes the full scores matrix instead of generating a fused kernel.
  
  SOLUTION: Use torch.compile(flex_attention)(...)
  
  If you want to debug your score_mod/mask_mod, you can set:
  torch.nn.attention.flex_attention._FLEX_ATTENTION_DISABLE_COMPILE_DEBUG = True
  
  This will allow you to use print statements or breakpoints. Note: This doesn't work with the backwards pass and may produce incorrect results.
    _warn_once(

cute/test_mask_mod.py: 46 warnings
  /home/henrylhtsang/flash-attention/flash_attn/cute/mask.py:510: DSLOptimizationWarning: This static loop has 64 iterations, which may be very slow to compile, consider using `cutlass.range(..., unroll_full=True)` instead.
    for i in cutlass.range_constexpr(ncol):

cute/test_mask_mod.py: 46 warnings
  /home/henrylhtsang/flash-attention/flash_attn/cute/mask.py:533: DSLOptimizationWarning: This static loop has 64 iterations, which may be very slow to compile, consider using `cutlass.range(..., unroll_full=True)` instead.
    for i in cutlass.range_constexpr(ncol):

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================== 49957 passed, 34769 skipped, 164 warnings in 20350.92s (5:39:10) ==========================================================================

@henrylhtsang henrylhtsang marked this pull request as draft January 7, 2026 23:27
@henrylhtsang henrylhtsang marked this pull request as ready for review January 7, 2026 23:46
Replace deprecated cute DSL function calls with their new equivalents:
- `cute.make_fragment` → `cute.make_rmem_tensor`
- `cute.make_fragment_like` → `cute.make_rmem_tensor_like`
- `cute.arch.exp2(x)` → `cute.math.exp2(x, fastmath=True)`

This fixes ~11k deprecation warnings when running the cute tests.
@henrylhtsang

maybe @jayhshah? I just finished testing.
