Bf16*fp4 gemm #2801

eliotwang · 2025-09-08T12:14:23Z

Proposed changes

Added an example of a bf16*fp4 GEMM in which the fp4 values and their scales (fp4_scale) are stored as uint8. In the pipeline, matrix B (fp4) is dequantized to bf16 before the multiplication.
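To make the dequantization step concrete, here is a minimal host-side sketch. It is not the kernel code from this PR. It assumes the OCP MX conventions: each fp4 value is E2M1 (1 sign bit, 2 exponent bits, 1 mantissa bit), each scale byte is E8M0 (a power of two with bias 127), and one uint8 packs two fp4 values with the first element in the low nibble. The nibble order and the helper names (`decode_e2m1`, `decode_e8m0`, `dequant_pk_fp4`) are assumptions for illustration, and the result is kept in `float` rather than bf16 for simplicity.

```cpp
#include <cmath>
#include <cstdint>
#include <utility>

// E2M1 magnitudes, indexed by the low three bits of a nibble.
constexpr float kE2M1Magnitude[8] = {0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 4.0f, 6.0f};

// Decode one 4-bit E2M1 value; bit 3 is the sign.
inline float decode_e2m1(std::uint8_t nibble)
{
    const float mag = kE2M1Magnitude[nibble & 0x7];
    return (nibble & 0x8) ? -mag : mag;
}

// Decode an E8M0 scale byte as 2^(e - 127).
// The NaN encoding (0xFF) is not handled in this sketch.
inline float decode_e8m0(std::uint8_t e)
{
    return std::ldexp(1.0f, static_cast<int>(e) - 127);
}

// Dequantize the two fp4 values packed in one byte (low nibble first, by assumption)
// using the scale shared by their block.
inline std::pair<float, float> dequant_pk_fp4(std::uint8_t packed, std::uint8_t scale)
{
    const float s = decode_e8m0(scale);
    return {decode_e2m1(packed & 0xF) * s, decode_e2m1(packed >> 4) * s};
}
```

In a real pipeline the scale byte would be looked up once per block of B elements along K and reused for every packed byte in that block.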

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

I have added tests relevant to the introduced functionality, and the unit tests are passing locally
I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
I have added inline documentation which enables the maintainers with understanding the motivation
I have removed the stale documentation which is no longer relevant after this pull request
(If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
I have run clang-format on all changed files
Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

spolifroni-amd

Readme looks ok.

ThomasNing

Could we add a gtest for the newly developed kernel?

include/ck_tile/core/arch/amd_buffer_addressing.hpp

ThomasNing · 2025-11-11T09:37:40Z

example/ck_tile/38_block_scale_gemm/gemm_quant_mxfp4.cpp

+#include "ck_tile/host.hpp"
+#include "gemm_utils.hpp"
+
+template <typename GemmConfig,


We should also put the MX GEMM into the block-scale GEMM example so that it shares the util and example.inc code.

So, to reuse the code, my understanding is that we should add our own mx_gemm.cpp entry point within example/38_* instead of defining a new example/45_* as we do now?

Yes, we do not need to create an mxfp4_gemm example operator. We should just have a .cpp file at example/38_*/mx_gemm.cpp. The datatype should also be a configuration option of that example.
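One way to read "the datatype should be a configuration" is a single entry point that maps a runtime precision string to a compile-time type and then calls one shared, templated runner. The sketch below only illustrates that pattern. The tag types, the runner `run_quant_gemm_example`, and the strings "fp8" and "mxfp4" are hypothetical and are not taken from the ck_tile example code.

```cpp
#include <string>

// Hypothetical stand-ins for the real element types; only the bit width matters here.
struct Fp8Tag
{
    static constexpr int bits = 8;
};
struct PkFp4Tag
{
    static constexpr int bits = 4;
};

// Hypothetical shared runner. A real example would build the tensors and launch
// the kernel here; this sketch only reports the width of one B element.
template <typename BTag>
int run_quant_gemm_example()
{
    return BTag::bits;
}

// Map a runtime precision string to a compile-time type.
// Returns -1 for an unsupported precision.
int dispatch_b_precision(const std::string& prec)
{
    if(prec == "fp8")
        return run_quant_gemm_example<Fp8Tag>();
    if(prec == "mxfp4")
        return run_quant_gemm_example<PkFp4Tag>();
    return -1;
}
```

With this shape, adding a new B data type means adding one branch and one instantiation, while the utility code stays shared.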

ThomasNing · 2025-11-11T09:39:11Z

include/ck_tile/ops/gemm_quant/kernel/gemm_quant_kernel.hpp

-                            make_tuple(kargs.stride_B, 1),
-                            number<GemmPipeline::GetVectorSizeB()>{},
-                            number<1>{});
+                        if constexpr(std::is_same_v<BDataType, pk_fp4_raw_t>)


Why is the fp4 case transposed compared with the other data types?

In the original gemm_quant kernel, we noticed that in the tensor_view definitions of B and Bq, B's shape is taken as (N, K), while Bq's is (K, N). We don't quite understand why Bq needs to be transposed here, since in our implementation both B and Bq are defined as (N, K).

I see. After reviewing the solution, we can unify these into one version without the transpose. I will create a PR to do that soon, so the unnecessary transpose branch will not be needed.
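The (N, K) versus (K, N) question is about how a view describes the buffer, not about where the data lives. The following sketch uses a hypothetical `View2D` struct, not ck_tile's tensor_view API, to show that a buffer described as (N, K) with strides (stride_B, 1) and the same buffer described as (K, N) with strides (1, stride_B) address identical elements.

```cpp
#include <cstddef>

// Hypothetical minimal 2D view: element (i, j) lives at i * stride0 + j * stride1.
struct View2D
{
    const float* data;
    std::size_t stride0;
    std::size_t stride1;

    float operator()(std::size_t i, std::size_t j) const
    {
        return data[i * stride0 + j * stride1];
    }
};

// The buffer described as (N, K), with K contiguous.
inline View2D as_n_by_k(const float* p, std::size_t stride_b)
{
    return View2D{p, stride_b, 1};
}

// The same buffer described as (K, N): the strides swap, the memory does not move.
inline View2D as_k_by_n(const float* p, std::size_t stride_b)
{
    return View2D{p, 1, stride_b};
}
```

Because both descriptions reach the same memory, a kernel can settle on one convention for B and Bq, which is consistent with the plan in the thread to drop the type-specific transpose branch.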

illsilin · 2025-11-13T18:37:44Z

Hi @eliotwang, please resolve conflicts and sync branch to latest develop in order to proceed! Thanks!

ThomasNing · 2025-11-17T23:31:05Z

@eliotwang LGTM overall. Please add the unit test.

eliotwang · 2025-11-19T03:29:43Z

> @eliotwang LGTM overall. Please add the unit test.

We have added unit tests for bf16_mxfp4_gemm in the test/ck_tile/gemm_block_scale/ directory. Please review them.

ThomasNing · 2025-11-20T05:46:58Z

test/ck_tile/gemm_block_scale/test_gemm_quant_base.hpp

    {
-        using ComputeType =
-            std::conditional_t<sizeof(ADataType_) < sizeof(BDataType_), ADataType_, BDataType_>;
+        // using ComputeType =


Code leftovers?

This has been updated. Please take another look.

ThomasNing · 2025-11-21T00:58:44Z

@eliotwang LGTM. We can do the final merge iteration after PR #3245 is merged into develop. Thanks!

cc. @CongMa13

eliotwang and others added 3 commits September 5, 2025 07:50

support bf16*mxfp4 gemm

d1bf200

rebase bf16*fp4 example to develop branch

4e205c4

Clean up commented debug code in GEMM kernel

52c5ed5

eliotwang requested review from a team, ThomasNing, afagaj, andriy-ca, aosewski, asleepzzz, bartekxk, carlushuang, coderfeli, ddembeckAMD, geyyer, illsilin, poyenc, qianfengz, shumway, tenpercent and vidyasagar-amd as code owners September 8, 2025 12:14

Merge branch 'develop' into bf16_fp4_gemm

e1d0365

spolifroni-amd previously approved these changes Sep 8, 2025

Merge branch 'develop' into bf16_fp4_gemm

43db1f7

eliotwang changed the title ~~Bf16 fp4 gemm~~ Bf16*fp4 gemm Sep 9, 2025

charyang-ai requested review from charyang-ai and zhangnju September 9, 2025 04:50

rename example folder

ff89459

eliotwang dismissed spolifroni-amd’s stale review via ff89459 September 9, 2025 11:00

eliotwang added 2 commits September 9, 2025 19:03

Merge branch 'develop' into bf16_fp4_gemm

1409d62

Merge branch 'develop' into bf16_fp4_gemm

637f2e8

Update README.md

01f5c75

ThomasNing requested changes Nov 11, 2025

eliotwang and others added 5 commits November 13, 2025 06:26

update code according to reviewer's comment

92d7082

Merge branch 'bf16_fp4_gemm' of https://github.com/eliotwang/heyi_com…

7e32cb9

…posable_kernel into bf16_fp4_gemm

update code according to reviewer's comment

48e6393

Update CMakeLists.txt

88c6a8c

Merge branch 'develop' into bf16_fp4_gemm

87c4e07

eliotwang and others added 6 commits November 14, 2025 10:55

Merge remote-tracking branch 'upstream/develop' into bf16_fp4_gemm

6883654

Update README.md

3579741

Update CMakeLists.txt

9579e6f

Delete files

8a4ac27

Delete files

f6ffb76

Merge branch 'develop' into bf16_fp4_gemm

5225b4d

Merge branch 'develop' into bf16_fp4_gemm

eb15154

eliotwang closed this Nov 18, 2025

eliotwang reopened this Nov 18, 2025

eliotwang and others added 3 commits November 18, 2025 18:18

Merge branch 'develop' into bf16_fp4_gemm

ba12e7d

Add unit tests

fde6e39

Merge branch 'develop' into bf16_fp4_gemm

f54857f

eliotwang closed this Nov 19, 2025

eliotwang reopened this Nov 19, 2025

eliotwang added 2 commits November 19, 2025 19:02

Merge branch 'develop' into bf16_fp4_gemm

7ceedeb

Merge branch 'develop' into bf16_fp4_gemm

d5ce464

ThomasNing reviewed Nov 20, 2025

eliotwang added 2 commits November 20, 2025 16:25

Update test_gemm_quant_base.hpp

329c601

Merge branch 'develop' into bf16_fp4_gemm

8c75bc1

7 participants