Use Linear Layout to describe 2D block loads (#3708)
This PR introduces a new linear layout in the Triton XPU Load to LLVM
lowering for block loads. I split the creation of the layouts out of the
larger PR and focused on using them to compute the `(x,y)` offsets for
the 2D block load instructions, so that the correctness of the layout
can be verified. The shuffle vectors are still generated using the
existing loop variables.
The layout describes the block load in terms of three input parameters:
* `offset` which is the 1D offset into the loaded data for a single DPAS
invocation inside a sub-group
* `iteration` which identifies the DPAS invocation when multiple DPAS
invocations share a single load
* `load` which identifies the load index when multiple loads occur for a
given operand
The output of the layout function identifies the global `(x,y)` tensor
coordinate within a given load. The layout was designed so that the DPAS
layout and the load layout can be composed to go from `offset`,
`iteration`, `load` to `block`, `warp`, `lane`, `register`, or vice versa.
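
To make the mapping concrete, here is a minimal sketch of how a linear layout turns `(offset, iteration, load)` into an `(x,y)` coordinate: each set bit of an input selects a basis vector, and the selected vectors are XORed together. This is plain Python for illustration, not the Triton `LinearLayout` class; the helper `apply_layout`, the table `TOY_BASES`, and the 4x4 tile shape are all assumptions and do not come from this PR.

```python
# Hypothetical bases: one (x, y) vector per input bit.
# Assumed shape: a 4x4 tile per DPAS invocation, 2 invocations per load
# placed side by side in x, and 2 loads stacked in y.
TOY_BASES = {
    "offset": [(1, 0), (2, 0), (0, 1), (0, 2)],  # 4 bits -> 16 elements
    "iteration": [(4, 0)],                       # 1 bit  -> 2 DPAS tiles
    "load": [(0, 4)],                            # 1 bit  -> 2 loads
}


def apply_layout(bases, **inputs):
    """XOR the basis vectors selected by the set bits of every input."""
    x = y = 0
    for name, value in inputs.items():
        vectors = bases[name]
        if not 0 <= value < (1 << len(vectors)):
            raise ValueError(f"{name}={value} is out of range for this layout")
        for bit, (bx, by) in enumerate(vectors):
            if (value >> bit) & 1:
                x ^= bx
                y ^= by
    return x, y


# offset 6 = 0b0110 selects (2, 0) and (0, 1); iteration 1 adds (4, 0);
# load 1 adds (0, 4).
print(apply_layout(TOY_BASES, offset=6, iteration=1, load=1))  # (6, 5)
```

Because every basis vector in this toy example occupies a distinct bit, XOR behaves like addition here; that is a property of the chosen bases, not of linear layouts in general.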
Currently the block load / tile layout is used within the existing loop
structure, although the layout was designed so that the 2D block loads
can eventually be generated by iterating over the layout parameters.
The existing loop structure is still in place, and debug info can be
enabled to print both the previously generated values and the linear
layout values for easy comparison. Next, I plan to generate the shuffle
vectors by composing the DPAS layout with the load layout.
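
The cross-check between the loop-generated values and the layout values can be pictured with the following sketch. It is plain Python rather than the actual lowering code, and the tile sizes, loop order, and helper names (`loop_coords`, `layout_coords`, `mismatches`) are assumptions chosen for illustration.

```python
# Hypothetical sizes (powers of two): 4x4 tile, 2 DPAS iterations per load,
# 2 loads per operand.
TILE_W, TILE_H, NUM_ITERS, NUM_LOADS = 4, 4, 2, 2


def loop_coords():
    """Coordinates as a nested loop structure might compute them."""
    coords = {}
    for load in range(NUM_LOADS):
        for iteration in range(NUM_ITERS):
            for row in range(TILE_H):
                for col in range(TILE_W):
                    offset = row * TILE_W + col
                    x = iteration * TILE_W + col
                    y = load * TILE_H + row
                    coords[(offset, iteration, load)] = (x, y)
    return coords


def layout_coords():
    """The same coordinates derived from per-bit basis vectors via XOR."""
    bases = {
        "offset": [(1, 0), (2, 0), (0, 1), (0, 2)],
        "iteration": [(4, 0)],
        "load": [(0, 4)],
    }
    coords = {}
    for load in range(NUM_LOADS):
        for iteration in range(NUM_ITERS):
            for offset in range(TILE_W * TILE_H):
                x = y = 0
                values = (("offset", offset), ("iteration", iteration), ("load", load))
                for name, value in values:
                    for bit, (bx, by) in enumerate(bases[name]):
                        if (value >> bit) & 1:
                            x ^= bx
                            y ^= by
                coords[(offset, iteration, load)] = (x, y)
    return coords


def mismatches():
    """Keys where the loop-derived and layout-derived coordinates differ."""
    loops, layout = loop_coords(), layout_coords()
    return [key for key in loops if loops[key] != layout[key]]


print(mismatches())  # an empty list means both paths agree
```

Printing both values side by side for any mismatching key is the kind of output the debug mode described above is meant to make easy to inspect.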
The linear layout is used by default but can be disabled via a flag for debugging.
cc #3008
supersedes #3487