feat: add functional per-head FP8 quantization for FA3 #1033

happierpig · 2025-04-23T06:29:32Z

This PR adds FP8 support in FA3 to speed up compute-bound prefill kernels. It follows up on #869.

1. Bug fixes

Fixed deadlock, illegal memory access, and wrong results for varied num_heads and seq_len.
Covered by unit tests.

2. New features

Enabled the FP8 in-kernel transpose logic in mainloop_sparse.cuh.
FP8 now works in:
- BatchPrefillWithPagedKVCache
- BlockSparseAttentionWrapper: supports both sparse and quantized attention
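
For readers unfamiliar with per-head quantization, here is a minimal sketch of the general idea, not the code in this PR: each head gets its own scale so that its values fit the FP8 E4M3 range (largest finite value 448). The helper names `per_head_scales` and `scale_to_fp8_range`, the `[seq_len][num_heads][head_dim]` layout, and the zero guard are assumptions made for illustration. The actual FP8 rounding step is omitted.

```python
# Illustrative only: per-head symmetric scaling into the FP8 E4M3 range.
# This is not FlashInfer code; all names here are hypothetical.

FP8_E4M3_MAX = 448.0  # largest finite value of the float8 E4M3 (e4m3fn) format


def per_head_scales(x):
    """x is a nested list shaped [seq_len][num_heads][head_dim].

    Returns one scale per head so that x / scale fits in [-448, 448].
    """
    num_heads = len(x[0])
    scales = []
    for h in range(num_heads):
        amax = max(abs(v) for token in x for v in token[h])
        # Guard against an all-zero head to avoid dividing by zero later.
        scales.append(max(amax, 1e-12) / FP8_E4M3_MAX)
    return scales


def scale_to_fp8_range(x, scales):
    """Divide each head by its scale and clamp to the E4M3 range.

    Rounding to the actual FP8 grid is left out; a real kernel or
    framework cast would perform it.
    """
    return [
        [
            [min(max(v / scales[h], -FP8_E4M3_MAX), FP8_E4M3_MAX) for v in head]
            for h, head in enumerate(token)
        ]
        for token in x
    ]


# Two tokens, two heads, head_dim = 2; the heads differ widely in magnitude.
x = [
    [[0.5, -1.0], [100.0, -50.0]],
    [[0.25, 2.0], [-200.0, 25.0]],
]
scales = per_head_scales(x)
x_scaled = scale_to_fp8_range(x, scales)
```

With one scale per head, the small-magnitude head keeps its resolution instead of being squeezed by the large-magnitude head, which is the usual motivation for per-head over per-tensor scaling.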

3. Python JIT interface

Exposed kernels to Python:
- BlockSparseAttentionWrapper
- single_prefill_with_kv_cache
Migrated tests and benchmarks to Python scripts:
- tests/test_hopper_fp8_attention.py
- benchmarks/bench_hopper_fp8_attention.py
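
One of the commits in this PR is titled "add test mse". As a rough sketch of what an MSE-based accuracy check can look like (the function name, sample values, and tolerance below are assumptions, not the contents of tests/test_hopper_fp8_attention.py):

```python
# Illustrative only: an MSE-style accuracy check. Names and the tolerance
# are hypothetical and not taken from tests/test_hopper_fp8_attention.py.


def mse(a, b):
    """Mean squared error between two equal-length, non-empty flat sequences."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


reference = [0.10, -0.25, 0.40, 0.05]  # stand-in for a higher-precision output
quantized = [0.11, -0.24, 0.38, 0.05]  # stand-in for an FP8 kernel output

error = mse(reference, quantized)
TOLERANCE = 1e-3  # assumed threshold, chosen only for this example
within_tolerance = error < TOLERANCE
```

An averaged metric such as MSE is often preferred over elementwise exact comparison for quantized kernels, since individual elements can differ by a rounding step while the overall output stays close.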

Note: Performance is currently on par with #869; further tuning is needed.

cc @yzh119

yzh119

Thank you @happierpig, great contribution!

happierpig added 4 commits April 23, 2025 04:55

add jit for single prefill fp8 fa3

2066ffb

upd benchmarks

1c5c9cc

add test mse

65170bd

clean code

6ab0163

happierpig requested a review from yzh119 April 23, 2025 06:29

happierpig added 7 commits April 23, 2025 23:10

fix: use stsm to directly do column permutation

9036225

format

a7874a4

add fp8 v tranpose into sparse mainloop

3ba5c57

fix: fix deadlock

1211b4c

upd test cases

23022e2

fix: fix deadlock by adding vt pipeline producer sync

7fa3a82

fix: add memory barrier before WG_MMA write_o to allow STSM finish.

082cf24

happierpig changed the title ~~misc: make python interface for SingleFP8PrefillWithKVCacheDispatched~~ feat: add functional per-head FP8 quantization for FA3 Apr 24, 2025

happierpig added 3 commits April 29, 2025 18:34

fix typo

3bd14cc

upd

90469a1

upd

589fa3e

yzh119 approved these changes Apr 29, 2025

yzh119 merged commit 116d97d into flashinfer-ai:main Apr 29, 2025
2 checks passed
