[IterativeTilingAndFusionPass] Wrap linalg.ops in a loop even if the shape is smaller than min tiling size #332

Open
@dchigarev

When the shape of a linalg operation is smaller than or equal to the minimal tile size (32), the operation is left untouched. This is a problem because our GPU pipeline expects a for-loop (which later describes the launch grid) after the IterativeTilingAndFusion pass. Without a loop, the pipeline breaks.

For stability, I would expect such operations to be wrapped in a single-iteration for-loop so that the pipeline keeps working in these corner cases:

func.func @linalg_matmul(%arg0: tensor<32x32xf16>, %arg1: tensor<32x32xf16>,
                         %arg2: tensor<32x32xf16>) -> tensor<32x32xf16> {
  %0 = linalg.matmul ins(%arg0, %arg1 : tensor<32x32xf16>, tensor<32x32xf16>)
                     outs(%arg2 : tensor<32x32xf16>) -> tensor<32x32xf16>
  return %0 : tensor<32x32xf16>
}

// Expected output (a tiling for loop consisting of one iteration):
func.func @linalg_matmul(%arg0: tensor<32x32xf16>, %arg1: tensor<32x32xf16>, %arg2: tensor<32x32xf16>) -> tensor<32x32xf16> {
  %0 = scf.forall (%arg3, %arg4) = (0, 0) to (32, 32) step (32, 32) shared_outs(%arg5 = %arg2) -> (tensor<32x32xf16>) {
    %extracted_slice = tensor.extract_slice %arg0[%arg3, 0] [32, 32] [1, 1] : tensor<32x32xf16> to tensor<32x32xf16>
    %extracted_slice_0 = tensor.extract_slice %arg1[0, %arg4] [32, 32] [1, 1] : tensor<32x32xf16> to tensor<32x32xf16>
    %extracted_slice_1 = tensor.extract_slice %arg5[%arg3, %arg4] [32, 32] [1, 1] : tensor<32x32xf16> to tensor<32x32xf16>
    %1 = linalg.matmul ins(%extracted_slice, %extracted_slice_0 : tensor<32x32xf16>, tensor<32x32xf16>) outs(%extracted_slice_1 : tensor<32x32xf16>) -> tensor<32x32xf16>
    scf.forall.in_parallel {
      tensor.parallel_insert_slice %1 into %arg5[%arg3, %arg4] [32, 32] [1, 1] : tensor<32x32xf16> into tensor<32x32xf16>
    }
  }
  return %0 : tensor<32x32xf16>
}
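
To make the expected loop shape concrete, here is a rough Python sketch of the arithmetic behind the example above. It is only an illustration under my own assumptions, not the pass's actual code: the helper names `forall_bounds`, `iteration_counts` and `slice_sizes` are hypothetical, and clamping the slice size to the dimension for shapes below 32 is my guess at the desired behavior.

```python
import math

MIN_TILE = 32  # minimal tile size mentioned in this issue


def forall_bounds(shape, tile=MIN_TILE):
    """Hypothetical helper: (lower, upper, step) per dimension.

    The loop is always emitted, even when a dimension is <= tile.
    """
    return [(0, dim, tile) for dim in shape]


def iteration_counts(bounds):
    """Number of iterations per dimension: ceil((upper - lower) / step)."""
    return [math.ceil((ub - lb) / step) for lb, ub, step in bounds]


def slice_sizes(shape, tile=MIN_TILE):
    """Assumed slice size per dimension: the tile, clamped to the dimension."""
    return [min(dim, tile) for dim in shape]


if __name__ == "__main__":
    for shape in [(32, 32), (16, 8), (64, 32)]:
        bounds = forall_bounds(shape)
        print(shape, bounds, iteration_counts(bounds), slice_sizes(shape))
```

For the 32x32 case this gives bounds `(0, 32, 32)` in each dimension and a single iteration, matching the `scf.forall` in the expected output. A smaller shape such as 16x8 would still produce exactly one iteration per dimension.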

P.S. This is not critical, since real-life workloads are unlikely to contain ops with such small shapes.
