Conversation

@cmx-Y (Contributor) commented May 9, 2025

This PR reproduces the methods proposed in the Exo paper to accelerate matrix multiplication on Gemmini.

  1. This pass already implements the tile_outer_loops, fission_inner_blocks, and replace_gemmini_calls optimizations described in Exo. Support for lift_config and on-chip memory management will be added in the future.
  2. Currently, the pass only supports input dimensions that are multiples of 16; tail cases are not yet handled (see the guard sketch below).
  3. While developing this pass, we observed that the LegalizeForLLVM phase requires generating ConstantOp, so the mvin operation does not support dynamic shapes (e.g., memrefs with dynamic dimensions). Moreover, when a shape exceeds 16, the existing pipeline is likely to hit bugs. This pass, in contrast, handles dynamic shapes properly.
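As a rough illustration of the dimension constraint in item 2, a minimal sketch of the kind of guard such a pattern might apply before rewriting (the kTileSize constant and the surrounding pattern are assumptions for illustration, not the actual code in this PR):

// Hypothetical check inside the pattern's matchAndRewrite: reject
// static dimensions that are not multiples of the 16x16 Gemmini tile;
// dynamic dimensions are resolved at runtime via memref::DimOp instead.
constexpr int64_t kTileSize = 16; // assumed Gemmini systolic-array size
for (int64_t dim : ATy.getShape()) {
  if (!ShapedType::isDynamic(dim) && dim % kTileSize != 0)
    return failure(); // tail cases are not handled yet
}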

@linuxlonelyeagle (Member) left a comment:

I left a few brief comments.

linalg.fill
ins(%1 : i8)
outs(%mem1 : memref<32x32xi8>)
linalg.matmul
@linuxlonelyeagle (Member) commented:

Please add the content to be checked.

MatMulOptimize.cpp
MatMulVectorization.cpp
MatMulGemmini.cpp
MatMulParallelVectorization.cpp
@linuxlonelyeagle (Member) commented:

I think we can rearrange the C++ files here; use dictionary (lexicographic) order.
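For reference, dictionary order of the four files above would be:

MatMulGemmini.cpp
MatMulOptimize.cpp
MatMulParallelVectorization.cpp
MatMulVectorization.cpp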

ShapedType ATy = A.getType().cast<ShapedType>();
Type eleTy = ATy.getElementType();
ShapedType BTy = B.getType().cast<ShapedType>();
// ShapedType CTy = C.getType().cast<ShapedType>();
@linuxlonelyeagle (Member) commented:

Can we delete the commented-out code?

Value B = op->getOperand(1);
Value C = op->getOperand(2);
// Get shape of input and output
ShapedType ATy = A.getType().cast<ShapedType>();
@linuxlonelyeagle (Member) commented:

Use mlir::cast<ShapedType>(A.getType()).
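A minimal sketch of the suggested change, replacing the deprecated member cast with the free-function form:

// Before: member-function cast, deprecated in recent MLIR.
ShapedType ATy = A.getType().cast<ShapedType>();
// After: free-function form suggested by the reviewer.
ShapedType ATy = mlir::cast<ShapedType>(A.getType());
ShapedType BTy = mlir::cast<ShapedType>(B.getType());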

ShapedType BTy = B.getType().cast<ShapedType>();
// ShapedType CTy = C.getType().cast<ShapedType>();

auto ctx = op->getContext();
@linuxlonelyeagle (Member) commented:

It's best not to use auto.
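Spelled out, assuming the usual MLIR API (Operation::getContext() returns the owning context):

// Name the result type explicitly instead of using auto.
MLIRContext *ctx = op->getContext();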

auto ctx = op->getContext();
// Some constants.
const Value c0 =
rewriter.create<arith::ConstantOp>(loc, rewriter.getIndexAttr(0));
@linuxlonelyeagle (Member) commented:

Do not use const with Value.
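That is, the constant-building lines would drop the qualifier, e.g.:

// Value is a lightweight SSA handle; keep it non-const.
Value c0 =
    rewriter.create<arith::ConstantOp>(loc, rewriter.getIndexAttr(0));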

// This algorithm does not use the column A index.
// const Value aCol = rewriter.create<memref::DimOp>(loc, A, c1);
const Value bRow = rewriter.create<memref::DimOp>(loc, B, c0);
const Value bCol = rewriter.create<memref::DimOp>(loc, B, c1);
@linuxlonelyeagle (Member) commented:

The same applies here: drop the const qualifier on these Value declarations.

auto forI = rewriter.create<affine::AffineForOp> (
loc, 0, rewriter.getIndexAttr(ATy.getDimSize(0)).getInt(), 1, std::nullopt);
// auto forI = rewriter.create<affine::AffineForOp>(
// loc, ValueRange{c0}, rewriter.getDimIdentityMap(),
@linuxlonelyeagle (Member) commented:

Can we delete the commented-out code here?

SmallVector<affine::AffineForOp, 6> band;
band.push_back(forI);
band.push_back(forJ);
band.push_back(forK);
@linuxlonelyeagle (Member) commented:

SmallVector<affine::AffineForOp, 6> band = {forI, forJ, forK};
