
Conversation


@GuoningHuang GuoningHuang commented Oct 27, 2025

This PR introduces an optimized implementation of LayerNorm. It provides both the frontend operator fusion and the corresponding lowering pass to support end-to-end execution.

- Implemented frontend LayerNorm fusion, combining the chain of operations (reduce_sum, rsqrt, mul, add, etc.) into a single fused operator.
- Added a lowering pass from the fused Norm operator to optimized, vectorized IR.
- Verified correctness with test_norm.py.
- Can be integrated directly into the existing Deepseek R1 inference pipeline.
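As a reference for what the fusion collapses, the unfused LayerNorm op chain named above can be sketched in plain NumPy (this is a hypothetical illustration of the standard LayerNorm computation, not code from this PR; `layernorm_reference`, `gamma`, and `beta` are illustrative names):

```python
import numpy as np

def layernorm_reference(x, gamma, beta, eps=1e-5):
    """Unfused LayerNorm: the op chain (reduce_sum, rsqrt, mul, add)
    that the frontend pass collapses into a single fused operator."""
    n = x.shape[-1]
    mean = np.sum(x, axis=-1, keepdims=True) / n             # reduce_sum
    var = np.sum((x - mean) ** 2, axis=-1, keepdims=True) / n
    inv_std = 1.0 / np.sqrt(var + eps)                       # rsqrt
    return (x - mean) * inv_std * gamma + beta               # mul + add
```

Each intermediate (mean, var, inv_std) is a separate tensor in the unfused graph; fusing them avoids materializing those temporaries between kernel launches.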
before opt: [screenshot]
after opt: [screenshot, 2025-11-05]

Comment on lines +2082 to +2115
# Could choose to use scf.parallel instead to implement loop_b or loop_i:
# loop_b_parallel = scf.ParallelOp([], [c0.result], [dim1_index.result], [c1.result], [])
# blk = loop_b_parallel.region.blocks.append()
# with ir.InsertionPoint(blk):
#     b = blk.add_argument(ir.IndexType.get(), ir.Location.unknown())
loop_b = scf.ForOp(c0.result, dim1_index.result, c1.result, [])
with ir.InsertionPoint(loop_b.body):
    b = loop_b.induction_variable
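For readers without the MLIR Python bindings at hand, the loop nest built above iterates the batch dimension and normalizes one row per iteration. A plain-Python rendering of that loop structure (a hypothetical sketch of the semantics, not the actual lowered IR; `lowered_norm_loop` is an illustrative name and gamma/beta scaling is omitted) might look like:

```python
import numpy as np

def lowered_norm_loop(x, eps=1e-5):
    # Mirrors scf.ForOp(c0, dim1_index, c1): the induction variable b
    # walks the batch dimension, and the loop body normalizes row b.
    out = np.empty_like(x)
    for b in range(x.shape[0]):
        row = x[b]
        mean = row.sum() / row.size
        var = ((row - mean) ** 2).sum() / row.size
        out[b] = (row - mean) / np.sqrt(var + eps)
    return out
```

Since the iterations are independent, the loop could equally be expressed with scf.parallel, as the commented-out code suggests; scf.for keeps the lowering simpler at the cost of serializing the batch dimension.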
