[RFC] Constant tensors caching pass #183

Open · wants to merge 4 commits into main

Conversation

niuxiaog (Contributor):

Related to issue #146.

niuxiaog added the documentation and ready to review labels on Jul 24, 2024.
Review thread on this excerpt from the RFC:

```mlir
  %feature3 = linalg.matmul(%feature2, %foldedWeight2)
  return %feature3
}
compiletime_fold(%weight1: tensor<*xbf16>) -> %foldedWeight0, %foldedWeight1: tensor<*xbf16>, tensor<*xbf16> {
```
Contributor:
compiletime_fold will be a problem because:

  1. The kernel binary will contain the whole tensor to be folded, which is too large. If we only want to use runtime_fold, the binary size is wasted, and this is not friendly to kernel caching.
  2. For a GPU device, we may want to do the folding on the CPU. We shouldn't put all three functions in the same module to achieve that.

If we want to support direct compile-time folding, I suggest following the direction of section 2.1 to implement it.
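For context, a minimal sketch of the single-module layout under discussion, written in the RFC's pseudocode style (function bodies elided; the exact signatures are illustrative, not the RFC's final form):

```mlir
// All three functions in one module. compiletime_fold bakes the literal
// weight tensors into the compiled artifact, so the binary carries them
// even when only runtime_fold and compute are actually used.
compiletime_fold(%weight0: tensor<*xbf16>) -> %foldedWeight0: tensor<*xbf16> { ... }
runtime_fold(%weight1: tensor<*xbf16>) -> %foldedWeight1: tensor<*xbf16> { ... }
compute(%feature0: tensor<*xbf16>, %foldedWeight0: tensor<*xbf16>, %foldedWeight1: tensor<*xbf16>) -> tensor<*xbf16> { ... }
```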

niuxiaog (author):
For the compile-time-available tensors, the integration can choose to:

  • lower them to arith.constant, which is not suggested;
  • put them into the module's argument list and mark them as compiletime_const_args; or
  • put them into the module's argument list and mark them as runtime_const_args.

With the first two choices, the tensors are folded by compiletime_fold; with the third, by runtime_fold. For the last two choices there will be no large literal tensors in the kernel.
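A minimal sketch of the second and third options, again in the RFC's pseudocode style; the attribute encoding (constant-argument indices recorded on the entry function) is an assumption for illustration, not the final design:

```mlir
// Hypothetical marking: %weight0 (argument index 1) is available at
// compile time, %weight1 (index 2) only at runtime; neither appears
// in the module as a large literal tensor.
entry(%feature0: tensor<*xbf16>, %weight0: tensor<*xbf16>, %weight1: tensor<*xbf16>) -> tensor<*xbf16>
    attributes {compiletime_const_args = [1], runtime_const_args = [2]} {
  ...
}
```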

niuxiaog (author):
> We shouldn't put all three functions in the same module to achieve that.

I'm not sure whether we can generate a new module in the pass pipeline. If we can, shall we put compiletime_fold in one module, and runtime_fold and compute in another?

Contributor:
> I'm not sure whether we can generate a new module in the pass pipeline. If we can, shall we put compiletime_fold in one module, and runtime_fold and compute in another?

I think so. But my current thinking is that we can support compiletime_fold in the future, when there is a demand for it.

niuxiaog (author):
OK, I will put all folding operations into runtime_fold.
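The resulting shape then has only two functions; a minimal sketch in the RFC's pseudocode style (the tensor.pack steps inside runtime_fold are illustrative examples of folded operations, not a fixed part of the design):

```mlir
// All folding happens once in runtime_fold; compute reuses the cached results.
runtime_fold(%weight0: tensor<*xbf16>, %weight1: tensor<*xbf16>)
    -> %foldedWeight0, %foldedWeight1: tensor<*xbf16>, tensor<*xbf16> {
  %foldedWeight0 = tensor.pack %weight0 ...   // e.g. pre-pack a matmul weight
  %foldedWeight1 = tensor.pack %weight1 ...
  return %foldedWeight0, %foldedWeight1
}
compute(%feature0: tensor<*xbf16>, %foldedWeight0: tensor<*xbf16>, %foldedWeight1: tensor<*xbf16>) -> tensor<*xbf16> {
  %feature1 = linalg.matmul(%feature0, %foldedWeight0)
  %feature2 = linalg.matmul(%feature1, %foldedWeight1)
  return %feature2
}
```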

@lmontigny:

Waiting on the OV integration to see how they handle constant weights. With TPP -> use it in the IR as a constant. To discuss with OV in iteration 4.

@lmontigny left a comment:
RFC LGTM.
Do we have an idea of the performance impact of this pass on some examples?

niuxiaog (author):
> RFC LGTM. Do we have an idea of the performance impact of this pass on some examples?

Ideally, the execution time of the operations on matmul weights, such as tensor.pack and linalg.transpose, will be saved. The improvement ratio will depend on the actual model. We will collect performance data once everything is ready.
