
Conversation


@Yssx-g Yssx-g commented Nov 27, 2025

Added Llama-mode matmul for the decode stage. Implemented the handwritten-style next-matmul-llama.mlir and the corresponding pass, MatMulLlamaOptimize.cpp. Also added the test case matmul-vectorization-llama.mlir. The pass is registered in buddy-opt and can be invoked directly via the -matmul-vectorization-llama option.
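For reference, a minimal invocation sketch. Only the `-matmul-vectorization-llama` option comes from this PR; the input file path and output file name are hypothetical placeholders:

```shell
# Apply the Llama-mode matmul vectorization pass with buddy-opt.
# NOTE: the input path and output name below are illustrative only;
# the -matmul-vectorization-llama option is the one added by this PR.
buddy-opt matmul-vectorization-llama.mlir \
  -matmul-vectorization-llama \
  -o matmul-llama-vectorized.mlir
```

The transformed IR can then be lowered and run through the usual buddy-opt/LLVM pipeline used for the other matmul vectorization tests.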
Below is the performance comparison for the DeepSeek-R1 decode-stage use case, with the upper part showing BLIS mode and the lower part showing Llama mode.
[Screenshot PixPin_2025-11-28_00-34-56: performance comparison, BLIS mode (upper) vs. Llama mode (lower)]
That's all.

