[AIE2P] Legalize and select VMUL.f from G_FMUL #360

Open
wants to merge 2 commits into
base: aie-public

khallouh
Collaborator

No description provided.

@khallouh khallouh force-pushed the hamza.fmul branch 2 times, most recently from 02cc673 to 872e379 Compare February 20, 2025 17:37
@khallouh khallouh marked this pull request as ready for review February 20, 2025 17:39
@khallouh khallouh changed the title [AIE2P] [WIP] Legalize and select VMUL.f from G_FMUL [AIE2P] Legalize and select VMUL.f from G_FMUL Feb 20, 2025
@@ -40,6 +40,8 @@ class VecConf {
int BMODE_16x16_b = 1;
int BMODE_32x16 = 0;
Collaborator

Funny to have aliases here.

@@ -40,6 +40,8 @@ class VecConf {
int BMODE_16x16_b = 1;
int BMODE_32x16 = 0;

int VARIANT_BF16xBF16_1_elem_1 = 1;
Collaborator

Sounds as if there are more variants. List them all in one go?

Collaborator Author

I could, but I'm not sure we will ever be able to use all of them in any patterns.

Collaborator

It looks like the translation of a hardware enumeration into tablegen speak. I'm hoping that one day we'll have a single point of definition for these, and the full list would make them more recognisable.

@@ -59,6 +61,7 @@ class VecConf {
}

def accfp32_vecconf : VecConf { let amode = AMODE_FP32; let bmode = BMODE_16x16; }
def mulbf16_vecconf : VecConf { let amode = AMODE_FP32; let bmode = BMODE_16x16; let cmode = VARIANT_BF16xBF16_1_elem_1; }
Collaborator

Since this is a local definition, I wouldn't mind using CMODE as prefix.

sub_1024_acc_hi)),
sub_512_hi))>;

def : Pat<(v32bf16 (fmul v32bf16:$vec1, v32bf16:$vec2)),
Collaborator

This isn't a standard legalization?

Collaborator Author

For this case, I don't know of any, but for the wider v64bf16 case above we could possibly use .fewerElements to keep only one pattern. I will try it.

@@ -225,12 +225,17 @@ AIE2PLegalizerInfo::AIE2PLegalizerInfo(const AIE2PSubtarget &ST)

getActionDefinitionsBuilder(G_FABS).customFor({S16, S32, S64}).scalarize(0);

getActionDefinitionsBuilder(G_FMUL)
.legalFor({V64S16, V32S16})
.customFor({S16})
Collaborator

Do we need to retain .clampScalar?

Collaborator Author

We have custom legalization for S16 now, so there is no need to clamp it to S32/S64. Any other scalar should be illegal.

Collaborator

I mean .clampScalar(0, S16, S64)

Collaborator Author

But why? The only float type of 16 bits or fewer that we have is bfloat (aka S16).

Collaborator

True. Just pointing out that we deviate from the old behavior (s128 to s64, or s8 to s32). But you are right, it does not make sense for these types.
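
For readers following this thread, the clamping being discussed can be modelled in a few lines. This is only a stand-alone sketch of the widen/narrow decision, not the LLVM implementation: `clampScalarBits` is a hypothetical helper, and the bounds passed to it are whatever the rule specifies (32 and 64 reproduce the "s8 to s32, s128 to s64" behavior mentioned above).

```cpp
// Hypothetical model of a scalar clamp rule: sizes below the minimum are
// widened to the minimum, sizes above the maximum are narrowed to the
// maximum, and anything in between is left unchanged.
unsigned clampScalarBits(unsigned Bits, unsigned MinBits, unsigned MaxBits) {
  if (Bits < MinBits)
    return MinBits; // widen, e.g. s8 -> s32 with bounds (32, 64)
  if (Bits > MaxBits)
    return MaxBits; // narrow, e.g. s128 -> s64 with bounds (32, 64)
  return Bits;      // already in range, no change
}
```

With bounds (32, 64) an s16 would be widened to s32, which is what the custom S16 rule now avoids; with bounds (16, 64) an s16 is left alone.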

@@ -5,7 +5,6 @@
# (c) Copyright 2024 Advanced Micro Devices, Inc. or its affiliates

# RUN: llc -mtriple aie2 -run-pass=legalizer %s -verify-machineinstrs -o - | FileCheck -DVER=2 --check-prefix=COMMON --check-prefix=AIE2 %s
# RUN: llc -mtriple aie2p -run-pass=legalizer %s -verify-machineinstrs -o - | FileCheck -DVER=2p --check-prefix=COMMON --check-prefix=AIE2P %s
Collaborator

We still have an AIE2P check line in the test. You could also remove -DVER=2 --check-prefix=COMMON.

#
# (c) Copyright 2024 Advanced Micro Devices, Inc. or its affiliates

# RUN: llc -mtriple aie2p -run-pass=legalizer %s -verify-machineinstrs -o - | FileCheck %s
Collaborator

Would be nice to include the libcall tests as well.

Collaborator Author

@khallouh khallouh Feb 21, 2025

We didn't have them before, but I will add them while I'm at it.

@@ -225,12 +225,17 @@ AIE2PLegalizerInfo::AIE2PLegalizerInfo(const AIE2PSubtarget &ST)

getActionDefinitionsBuilder(G_FABS).customFor({S16, S32, S64}).scalarize(0);

getActionDefinitionsBuilder(G_FMUL)
.legalFor({V64S16, V32S16})
.customFor({S16})
Collaborator

It would be nice to have a comment explaining why we customize this for s16. I don't really get the context here.

Collaborator Author

We don't have an instruction to multiply bf16 scalars. Instead of using an inefficient and potentially unsafe libcall (e.g. in the case of hardware loops), we need custom legalization: insert the bf16 scalars into vectors, perform the element-wise multiplication with VMUL.f, and extract the bf16 scalar result again. I can add this explanation as a comment.

Collaborator

It's the same as for FADD / FSUB. We implement a scalar multiplication by a full element by element vector mul.
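
The insert / multiply / extract sequence described in this thread can be sketched as a stand-alone model. This is not the legalizer code from the patch: all names (`scalarMulViaVector`, `vectorMul`, and the conversion helpers) are hypothetical, the unused lanes are zero-filled here purely for determinism, and round-to-nearest-even on the conversion back to bf16 is an assumption about the rounding mode (NaN inputs are not handled).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Bf16Bits = std::uint16_t;       // raw bf16 bit pattern
constexpr std::size_t NumLanes = 32;  // models a v32bf16 vector
using Vec = std::array<Bf16Bits, NumLanes>;

// bf16 is the upper 16 bits of an IEEE-754 binary32 value.
float bf16ToFloat(Bf16Bits B) {
  std::uint32_t Bits = static_cast<std::uint32_t>(B) << 16;
  float F;
  std::memcpy(&F, &Bits, sizeof(F));
  return F;
}

// Assumed rounding: round-to-nearest-even. NaN handling is omitted.
Bf16Bits floatToBf16(float F) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  Bits += 0x7FFFu + ((Bits >> 16) & 1u);
  return static_cast<Bf16Bits>(Bits >> 16);
}

// Stand-in for the element-wise vector multiply.
Vec vectorMul(const Vec &A, const Vec &B) {
  Vec R{};
  for (std::size_t I = 0; I < NumLanes; ++I)
    R[I] = floatToBf16(bf16ToFloat(A[I]) * bf16ToFloat(B[I]));
  return R;
}

// Scalar multiply emulated with a full vector multiply:
// insert at lane 0, multiply all lanes, extract lane 0.
Bf16Bits scalarMulViaVector(Bf16Bits A, Bf16Bits B) {
  Vec VA{}, VB{};
  VA[0] = A;
  VB[0] = B;
  return vectorMul(VA, VB)[0];
}
```

Only lane 0 carries meaningful data, so the remaining lanes of the multiply are wasted work; that is the trade-off accepted in exchange for avoiding the libcall.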

const unsigned InsertEltOpc =
ST.getInstrInfo()->getGenericInsertVectorEltOpcode();

const Register IdxReg = MIRBuilder.buildConstant(S32, 0).getReg(0);
Collaborator

@martien-de-jong martien-de-jong Feb 21, 2025

Wouldn't it be cheaper to broadcast? Or is this picked up by a push.lo?

@@ -222,6 +225,26 @@ def : Pat<(fadd ACC2048:$acc1, ACC2048:$acc2),
def : Pat<(fsub ACC2048:$acc1, ACC2048:$acc2),
(VSUB_f_vmac_cm2_add_reg ACC2048:$acc1, ACC2048:$acc2, (i32 accfp32_vecconf.ConfBits))>;

// MUL
def : Pat<(v64bf16 (fmul v64bf16:$vec1, v64bf16:$vec2)),
Collaborator

Check: we are performing the same multiplication twice, once to extract lo and once to extract hi. I guess we cannot express an optimized reuse of the same VMUL here, right?

4 participants