[codegen][gpu] Use HWFC filter layout for SDXL convolution Ops #19701
Comments
Just to update on possible output of this task (and input for the im2col pipeline), I used this conv op
and used
(with minor changes to the pass)
Then manually modified it for the desired layout to
IR1 goes to IGEMM TileAndFuse, while IR2 and IR3 currently go to VectorDistribute, which also targets intrinsics for them.
Thanks for leaving details regarding the
Changes in IR3 look reasonable: basically swapping the filter's last two dimensions, as well as the affine map.
Do we want to do
Talked with @Max191 about this, who indicated we should support both
For
The progress as of now is to get one flavor implemented and tested with SDXL, and to make sure the transpose can be const-folded. Then implement and check the other layout.
Could we count this as done?
In terms of implementation, the pass is done. But I'd also use this issue to track
With the two PRs above, the pass will be exposed end to end as we expected.
We may also still want to investigate the
I think once we land the patches that Jerry listed above, we can close this issue, but we should open another issue tracking the performance comparison for the 2 layout options.
@Max191 Agreed. Please refer to #20105 for the new follow-up ticket. Since all work tracked by this ticket is done, I'm closing it as finished. Relevant work for this ticket includes:
Context
For NHWC convolution, we want the filter in hwfc layout, making the convolution equivalent to a `linalg.conv_2d_nhwc_hwfc` op. As of right now, there is no op for this, so it will need to be represented as a `linalg.generic` op. The reason we want this layout is so that we can load nice contiguous vectors along the inner reduction dimension (C dimension).

Implementation
The conversion that needs to be supported includes `nhwc_hwcf -> nhwc_hwfc`. The other nhwc convolution op (`nhwc_fhwc`) doesn't need to be supported for now, since the dumped result from the `sdxl-scripts` repo `<alibaba_fp16>` branch includes only `conv_2d_nhwc_hwcf` ops (taking `configured_compiled_unet_run_forward$async_dispatch_151.mlir` as an example).

The implementation should mimic an existing pass, `ConvertConvFilterToChannelsLast.cpp`, but simpler since only a single input needs to be considered now.

Output
`linalg.conv_2d_nhwc_hwfc` op in `linalg.generic` form
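To make the intended equivalence concrete, here is a minimal pure-Python sketch (not the IREE pass itself) of what the hwfc generic form would compute. It assumes a stride-1, unpadded, undilated convolution; the names `conv_nhwc`, `hwcf_map`, `hwfc_map`, and `transpose_hwcf_to_hwfc` are hypothetical and only stand in for the filter's indexing map and the filter transpose.

```python
from itertools import product

# Hypothetical stand-ins for the filter's indexing map in a linalg.generic:
# they map loop indices (kh, kw, c, f) to a position in the filter tensor.
def hwcf_map(kh, kw, c, f):
    return (kh, kw, c, f)

def hwfc_map(kh, kw, c, f):
    # Same loops, but the last two filter dimensions are swapped.
    return (kh, kw, f, c)

def transpose_hwcf_to_hwfc(filt):
    """Swap the last two filter dims: [KH][KW][C][F] -> [KH][KW][F][C]."""
    return [[[list(per_f) for per_f in zip(*cf)] for cf in row] for row in filt]

def conv_nhwc(inp, filt, filter_map, num_filters):
    """Assumed stride-1, unpadded conv over an NHWC input (nested lists)."""
    n_, h_, w_, c_ = len(inp), len(inp[0]), len(inp[0][0]), len(inp[0][0][0])
    kh_, kw_ = len(filt), len(filt[0])
    oh_, ow_ = h_ - kh_ + 1, w_ - kw_ + 1
    out = [[[[0] * num_filters for _ in range(ow_)] for _ in range(oh_)]
           for _ in range(n_)]
    loops = product(range(n_), range(oh_), range(ow_), range(num_filters),
                    range(kh_), range(kw_), range(c_))
    for n, oh, ow, f, kh, kw, c in loops:
        i, j, k, l = filter_map(kh, kw, c, f)
        out[n][oh][ow][f] += inp[n][oh + kh][ow + kw][c] * filt[i][j][k][l]
    return out

if __name__ == "__main__":
    inp = [[[[h + 2 * w + 3 * c for c in range(2)] for w in range(3)]
            for h in range(3)]]
    hwcf = [[[[kh + 2 * kw + 3 * c + 5 * f for f in range(3)]
              for c in range(2)] for kw in range(2)] for kh in range(2)]
    hwfc = transpose_hwcf_to_hwfc(hwcf)
    # Both layouts should produce the same result once the map is swapped too.
    print(conv_nhwc(inp, hwcf, hwcf_map, 3) == conv_nhwc(inp, hwfc, hwfc_map, 3))
```

In the hwfc layout, `filt[kh][kw][f]` is a single run of `C` values, which is the contiguity along the reduction dimension that the Context section asks for. If the filter is a constant, the transpose only needs to happen once rather than per convolution.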
Thanks to @Max191 for assisting with scope of the task.