Add grouped gemm instances for RDNA4 #3237
base: develop
…ot break old tests
… problems still being ran
…remove unnecessary paths
…ing fails in this case
…ved to ck namespace in develop
… unsupported instances
  BaseArgument& operator=(const BaseArgument&) = default;

- virtual ~BaseArgument() {}
+ virtual __host__ __device__ ~BaseArgument() {}
Can we use CK_TILE_HOST_DEVICE macro?
This is "old" CK, not CK Tile. I'd imagine we don't want to include the header for CK tile here?
I looked for a similar macro in old CK, but there doesn't seem to be one. Other code also defines the attributes inline.
I see! Thank you for the response!
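For readers comparing the two spellings discussed in this thread, here is a minimal sketch of a virtual destructor marked for host and device through a macro. `SKETCH_HOST_DEVICE`, `BaseArgumentSketch` and `GroupedArgumentSketch` are hypothetical names made up for illustration. They are not CK or CK Tile identifiers, and the actual definition of `CK_TILE_HOST_DEVICE` is not shown in this conversation.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical macro (NOT part of CK): expands to the host/device attributes
// under a HIP or CUDA compiler and to nothing under a plain host compiler,
// so this sketch builds either way.
#if defined(__HIPCC__) || defined(__CUDACC__)
#define SKETCH_HOST_DEVICE __host__ __device__
#else
#define SKETCH_HOST_DEVICE
#endif

// Hypothetical stand-in for the base class touched in the diff.
struct BaseArgumentSketch
{
    BaseArgumentSketch()                                     = default;
    BaseArgumentSketch(const BaseArgumentSketch&)            = default;
    BaseArgumentSketch& operator=(const BaseArgumentSketch&) = default;

    // The diff writes `__host__ __device__` inline at this position;
    // the macro form is the alternative raised in review.
    virtual SKETCH_HOST_DEVICE ~BaseArgumentSketch() {}
};

// Hypothetical derived argument type; its implicit destructor
// overrides the virtual destructor of the base.
struct GroupedArgumentSketch : BaseArgumentSketch
{
    int group_count = 0;
};

// Compile-time check that the destructor really is virtual.
static_assert(std::has_virtual_destructor<BaseArgumentSketch>::value,
              "base destructor must be virtual");
```

Either spelling yields the same declaration under a device compiler. The macro only centralises the attributes, which is why the inline form is a reasonable choice where no such macro exists.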
Looks good. Could you please add the header `// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.` to every file that's missing a copyright header, and replace the old headers with this new one?
Done, added it to all new/moved files. Will add this in the future as well.
Proposed changes
This PR adds support for running grouped gemm operations on RDNA3/4 using WMMA instructions. The PR contains:
- … `GridwiseGemm_wmma_cshuffle_v3`
- … `profile_grouped_gemm_impl` function to accept a parameter that makes the test fail if no supported instances could be found. Previously the test would silently pass. The parameter is optional and defaults to the old behaviour so that old tests do not break
- … `GridwiseGemm_wmma_cshuffle_v3::Run()` interface to allow passing in a custom `Block2CTile` map, which was necessary to handle the non-uniform dimensions of grouped gemm
- … (`FORCE_DISABLE_XDL` and `FORCE_DISABLE_WMMA`)

Other algorithm variants for grouped gemm will be added as follow-up PRs.
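The opt-in failure behaviour described for `profile_grouped_gemm_impl` can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `InstanceSketch`, `profile_sketch` and the parameter name `fail_if_no_supported_instances` are hypothetical, and the real function's signature is not shown in this description.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for one kernel instance seen by the profiler.
struct InstanceSketch
{
    bool supported; // can this instance run the given problem?
    bool correct;   // did its result match the reference?
};

// Returns true when the test should pass.
// The flag defaults to false, which keeps the old behaviour:
// a run with zero supported instances still passes.
bool profile_sketch(const std::vector<InstanceSketch>& instances,
                    bool fail_if_no_supported_instances = false)
{
    int num_supported = 0;
    bool pass         = true;

    for(const auto& inst : instances)
    {
        if(!inst.supported)
            continue; // unsupported instances are skipped, not failed
        ++num_supported;
        pass = pass && inst.correct;
    }

    // New opt-in behaviour: nothing ran, so the test did not verify anything.
    if(num_supported == 0 && fail_if_no_supported_instances)
        return false;

    return pass;
}
```

Defaulting the flag to `false` means existing call sites compile and behave as before, while new tests can pass `true` to catch the case where every instance was skipped.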
Checklist
Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.
- … `clang-format` on all changed files
Discussion
If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered