Fix the bugs about operator registration by PyTorch Dispatcher #2786
Conversation
Code Review
This pull request aims to fix bugs related to PyTorch Dispatcher operator registration. The changes primarily involve refactoring the rotary_embedding
operator to be an in-place operation and adjusting its registration and usage accordingly. While the changes in the Python test and usage files seem correct, I've found a few critical issues in the C++ implementation and a significant issue in the Python ops files that could lead to runtime errors or incorrect behavior. Specifically, operator definitions are missing, there's a potential for a crash due to unchecked access to an optional value, and the Python wrappers for the custom op no longer preserve the original tensor shapes, which is likely to break downstream code.
@Yikun The draft is ready, please help take a look at it.
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
78344d6 to 7078cb8
csrc/torch_binding.cpp (Outdated)
cmd.Run();
return {query_dst, key_dst};
query.copy_(query_dst);
Some inputs to this kernel are strided. We originally implemented this rotary embedding to avoid the extra memory reordering triggered by contiguous(). Although the changes in this PR align the torch schema with vLLM's implementation, they may bring a large regression in end-to-end scenarios (strided tensor -> contiguous tensor -> strided tensor).
These changes bring no performance advantage compared with torch_npu._npu_rotary_embedding. If we want to adopt this in a real workload, I do not suggest these changes.
Got it, thank you. I will restore all the changes and keep it as it is.
I remember we won't compile the CPU version of vLLM, right @Yikun @wangxiyuan? If we don't compile the CPU version, this shouldn't happen. And if we do have to compile a CPU version of vLLM, I suggest we just adopt another name for this kernel, or add an overload version of rope.
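For illustration only, a minimal sketch of the two alternatives mentioned above (a different operator name, or an overload of the existing one). The operator names and the schema here are assumptions, not code from this repo:

```cpp
#include <torch/library.h>

// Hypothetical sketch: two ways to add the Ascend rope kernel without
// re-def()'ing a schema that the vLLM CPU build already owns in "_C".
TORCH_LIBRARY_FRAGMENT(_C, ops) {
  // Option 1: a different operator name.
  ops.def(
      "rotary_embedding_ascend(Tensor positions, Tensor query, Tensor key, "
      "int head_size, Tensor cos_sin_cache, bool is_neox) -> (Tensor, Tensor)");

  // Option 2: an overload of the existing name, written as "name.overload".
  ops.def(
      "rotary_embedding.ascend(Tensor positions, Tensor query, Tensor key, "
      "int head_size, Tensor cos_sin_cache, bool is_neox) -> (Tensor, Tensor)");
}
```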
This pull request has conflicts, please resolve those before we can evaluate the pull request.
We recommend installing the vLLM CPU version, as described in the docs: https://vllm-ascend.readthedocs.io/en/latest/installation.html#setup-vllm-and-vllm-ascend.
Thank you for your helpful advice; I have followed it to update this PR. Please take a look at it, thank you.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Codecov Report
❌ Patch coverage is 41.93%, which is below the target coverage of 80.00%. You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files:
@@ Coverage Diff @@
## main #2786 +/- ##
==========================================
+ Coverage 74.76% 75.03% +0.26%
==========================================
Files 150 154 +4
Lines 20891 21290 +399
==========================================
+ Hits 15620 15974 +354
- Misses 5271 5316 +45
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
**Background:** There are two principles about operator registration in PyTorch:
- The same namespace can only be registered once by `TORCH_LIBRARY`.
- An operator signature can only be registered once by `def`.

Considering that all custom operators defined in the current repo are used only by Ascend (rather than vLLM defining a common operator schema that every accelerator then implements for its own hardware, which would be conducive to modular functional abstraction), we can rename the operator registration namespace to an Ascend-specific namespace.

Signed-off-by: FFFrog <[email protected]>
LGTM
…project#2786) Related ISSUE: vllm-project#2742 - vLLM version: main - vLLM main: vllm-project/vllm@f592b31 Signed-off-by: FFFrog <[email protected]> Signed-off-by: offline0806 <[email protected]>
Background:
There are two principles about operator registration in PyTorch:
- The same namespace can only be registered once by `TORCH_LIBRARY`.
- An operator signature can only be registered once by `def`.
Considering that all custom operators defined in the current repo are used only by Ascend (rather than vLLM defining a common operator schema that every accelerator then implements for its own hardware, which would be conducive to functional abstraction), we can rename the operator registration namespace to an Ascend-specific namespace (_C_ascend); a sketch of this registration follows below.
Related ISSUE: #2742
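For reference, a minimal sketch of what the Ascend-specific registration could look like. The schema below is illustrative, assumed from vLLM's in-place rope signature, and is not copied from csrc/torch_binding.cpp:

```cpp
#include <torch/library.h>

// Sketch only: claiming an Ascend-specific namespace avoids both conflicts,
// since "_C_ascend" is TORCH_LIBRARY'd exactly once (here) and its schemas
// cannot collide with whatever the vLLM CPU build registers under "_C".
TORCH_LIBRARY(_C_ascend, ops) {
  // Illustrative schema; the real signature lives in csrc/torch_binding.cpp.
  ops.def(
      "rotary_embedding(Tensor positions, Tensor! query, Tensor! key, "
      "int head_size, Tensor cos_sin_cache, bool is_neox) -> ()");
}
// Python callers then reach the op as torch.ops._C_ascend.rotary_embedding(...).
```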