Conversation


@yaox12 (Member) commented Nov 28, 2025

Description

Very initial effort to add FA4.

Done:

  • Basic GQA/MQA (head dim = 64, 96, 128) support for SM100

Known issues:

  • SM90 is not working, so I just disabled it.
  • Some configurations only work in the forward pass, so I just disabled them, e.g.,
    • qk_head_dim = 192 and v_head_dim = 128
    • packed sequence

TODO:

  • Add tests
  • Correctly handle sliding window

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Add initial FlashAttention 4 (FA4) support: basic GQA/MQA (head dim = 64, 96, 128) on SM100
  • Disable FA4 on SM90 and for configurations that currently only work in the forward pass (qk_head_dim = 192 with v_head_dim = 128, packed sequences)

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

if not use_flash_attn_3:
    if use_flash_attn_4:
        fa_4_optional_forward_kwargs = {
            # "window_size": window_size,

@yaox12 (Member Author):

For FA4, the default window_size = (-1, 0) does not mean "no sliding window", so the kwarg is left commented out until sliding windows are handled correctly (see the TODO above).
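
As a rough illustration of where this could go once sliding window is handled (a sketch only, with a hypothetical helper name, not the PR's code): forward window_size only when the caller explicitly set a sliding window, so FA4's own default never silently applies.

def build_fa4_optional_forward_kwargs(window_size=None):
    # Sketch: treat None / (-1, -1) as "no sliding window" (the FA2/FA3 convention)
    # and forward anything else explicitly, so FA4's differing default for
    # window_size is never relied upon.
    kwargs = {}
    if window_size is not None and tuple(window_size) != (-1, -1):
        kwargs["window_size"] = tuple(window_size)
    return kwargs

print(build_fa4_optional_forward_kwargs())          # {}
print(build_fa4_optional_forward_kwargs((128, 0)))  # {'window_size': (128, 0)}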

if use_flash_attention_2 and FlashAttentionUtils.is_installed:
    logger.debug("Disabling FlashAttention 2 as it does not support MLA.")
    use_flash_attention_2 = False
if use_flash_attention_4 and FlashAttentionUtils.v4_is_installed:

@yaox12 (Member Author) Nov 28, 2025:

Currently, FA4 only supports MLA in the forward pass.
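
For illustration, a minimal sketch of the kind of gate this implies (the helper name, the simplified MLA test, and the is_training flag are assumptions for the sketch; the real logic lives in get_attention_backend):

import logging

logger = logging.getLogger(__name__)

def maybe_disable_fa4_for_mla(use_flash_attention_4, qk_head_dim, v_head_dim, is_training):
    # Simplified MLA detection for this sketch: differing QK and V head dims.
    is_mla = qk_head_dim != v_head_dim
    if use_flash_attention_4 and is_mla and is_training:
        logger.debug("Disabling FlashAttention 4 as it only supports MLA in forward.")
        return False
    return use_flash_attention_4

print(maybe_disable_fa4_for_mla(True, 192, 128, is_training=True))   # False: backward needed
print(maybe_disable_fa4_for_mla(True, 192, 128, is_training=False))  # True: forward-only is fine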

     use_flash_attention_2 = False
-if use_flash_attention_3:
+if use_flash_attention_3 and FlashAttentionUtils.v3_is_installed:

@yaox12 (Member Author):

Move FlashAttentionUtils.v3_is_installed into the condition so the checks below are skipped entirely when FA3 is not installed.

" not supported for compute capability = sm120"
)
use_fused_attention = False
if use_flash_attention_4 and FlashAttentionUtils.v4_is_installed:

@yaox12 (Member Author):

FA4 only supports packed sequences in the forward pass.
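
A small sketch of the same restriction, assuming TE's "thd" qkv_format denotes packed sequences (the function name and signature are illustrative, not the PR's API):

def fa4_supports_config(qkv_format, requires_grad):
    # Packed (THD) sequences currently run only in FA4's forward pass,
    # so reject them whenever a backward pass will be needed.
    packed = qkv_format == "thd"
    return not (packed and requires_grad)

assert fa4_supports_config("bshd", requires_grad=True)
assert fa4_supports_config("thd", requires_grad=False)      # inference-only is fine
assert not fa4_supports_config("thd", requires_grad=True)   # fall back to another backend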

if flash_attention_backend is not None and flash_attention_backend > PkgVersion("3.0.0b"):
    use_flash_attn_3 = True
use_flash_attn_4 = False
if flash_attention_backend is not None and str(flash_attention_backend).endswith("cute"):

@yaox12 (Member Author):

The suffix "cute" is added in get_attention_backend because FA4 is released under the package name flash-attn-cute with versions starting from 0.1.0. We append ".cute" to the version number to distinguish it from the other FlashAttention versions.
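
For illustration (the exact string the PR builds may differ), one PEP 440-friendly way to make str(version) end with "cute" is a local version label, which keeps the endswith("cute") check and the existing PkgVersion comparison working side by side:

from packaging.version import Version as PkgVersion  # assumed alias, as used elsewhere in TE

fa3_backend = PkgVersion("3.0.0b1")
fa4_backend = PkgVersion("0.1.0+cute")  # illustrative stand-in for the suffixed FA4 version

for backend in (fa3_backend, fa4_backend):
    if str(backend).endswith("cute"):
        print(backend, "-> FA4 (flash-attn-cute) code path")
    elif backend > PkgVersion("3.0.0b"):
        print(backend, "-> FA3 code path")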


# `FusedAttention` and `FlashAttention` are faster backends than `UnfusedDotProductAttention`.
# When `FusedAttention` does not support the provided attention params, and `FlashAttention`
# does, we recommend users to install flash-attn if not installed already.

@yaox12 (Member Author) Nov 28, 2025:

This might not be working correctly.

Many of the checks above have the form

if use_flash_attention_3 and FlashAttentionUtils.v3_is_installed:
    if xxx:
        use_flash_attention_3 = False

Those checks are skipped entirely when FA3 is not installed, so even use_flash_attention_3 == True at this point does not mean that all the requirements are met.
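
One possible restructuring (a sketch with hypothetical names, not the PR's fix): evaluate the per-configuration filters on a separate "supported" flag regardless of installation, and fold the installation status in only at the end, so the install recommendation can trust the flag:

def select_fa3(fa3_requested, fa3_installed, blocking_checks):
    # blocking_checks: iterable of booleans, True meaning "this check rules FA3 out".
    fa3_supported = fa3_requested and not any(blocking_checks)
    use_flash_attention_3 = fa3_supported and fa3_installed
    recommend_install = fa3_supported and not fa3_installed
    return use_flash_attention_3, recommend_install

print(select_fa3(True, False, blocking_checks=[False, False]))  # (False, True): recommend installing
print(select_fa3(True, True, blocking_checks=[True]))           # (False, False): config unsupported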
