Conversation

@FFFrog (Contributor) commented Sep 5, 2025

Background:

There are two principles about operator registration in PyTorch

  • The same namespace can only be registered once by TORCH_LIBRARY
  • An operator signature can only be registered once by def

All custom operators defined in the current repo are used only by Ascend; they are not a common operator schema defined by vLLM that every accelerator would follow and implement on its own hardware (an approach that would be conducive to functional abstraction).

Therefore, we can rename the operator registration namespace to an Ascend-specific namespace (_C_ascend).
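A minimal sketch of what registration under the Ascend-specific namespace could look like, assuming an illustrative operator name, schema, and kernel (none of these are taken verbatim from this PR):

```cpp
#include <ATen/ATen.h>
#include <torch/library.h>

// Hypothetical kernel entry point; the real Ascend implementation is not shown here.
void rotary_embedding_impl(at::Tensor &positions, at::Tensor &query,
                           at::Tensor &key, int64_t head_size,
                           at::Tensor &cos_sin_cache, bool is_neox) {
  TORCH_CHECK(false, "illustrative stub only");
}

// Principle 1: a namespace may be opened with TORCH_LIBRARY only once.
// Principle 2: each operator schema may be def()'d only once.
TORCH_LIBRARY(_C_ascend, m) {
  m.def("rotary_embedding(Tensor positions, Tensor! query, Tensor! key, "
        "int head_size, Tensor cos_sin_cache, bool is_neox) -> ()");
}

// Bind the kernel for the backend's dispatch key; out-of-tree backends such as
// torch_npu commonly use PrivateUse1.
TORCH_LIBRARY_IMPL(_C_ascend, PrivateUse1, m) {
  m.impl("rotary_embedding", &rotary_embedding_impl);
}
```

With this in place, the op is reachable from Python as torch.ops._C_ascend.rotary_embedding, without colliding with vLLM's own _C namespace.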

Related ISSUE: #2742

@FFFrog marked this pull request as draft September 5, 2025 09:39
@gemini-code-assist bot left a comment

Code Review

This pull request aims to fix bugs related to PyTorch Dispatcher operator registration. The changes primarily involve refactoring the rotary_embedding operator to be an in-place operation and adjusting its registration and usage accordingly. While the changes in the Python test and usage files seem correct, I've found a few critical issues in the C++ implementation and a significant issue in the Python ops files that could lead to runtime errors or incorrect behavior. Specifically, operator definitions are missing, there's a potential for a crash due to unchecked access to an optional value, and the Python wrappers for the custom op no longer preserve the original tensor shapes, which is likely to break downstream code.

@FFFrog (Author) commented Sep 5, 2025

@Yikun The draft is ready; please take a look.

github-actions bot commented Sep 5, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

```cpp
cmd.Run();
return {query_dst, key_dst};

query.copy_(query_dst);
```
@ganyi1996ppo (Collaborator) commented Sep 8, 2025

We originally implemented this rotary embedding precisely to avoid the additional memory reordering that contiguous() triggers on its inputs. Although the changes in this PR align the torch schema with vLLM's implementation, they may bring a large regression in end-to-end scenarios (strided tensor -> contiguous tensor -> strided tensor).
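To make the concern concrete, a rough sketch of the round trip being described, with assumed function names and no real kernel (illustrative only, not the code in this repo):

```cpp
#include <torch/torch.h>

#include <tuple>

// Out-of-place schema: strided inputs are first made contiguous, the kernel
// writes into the new tensors, and the caller copies the results back into
// the original strided storage (strided tensor -> contiguous tensor -> strided tensor).
std::tuple<torch::Tensor, torch::Tensor> rope_out_of_place(
    const torch::Tensor &query, const torch::Tensor &key) {
  auto query_dst = query.contiguous();  // extra copy when query is strided
  auto key_dst = key.contiguous();      // extra copy when key is strided
  // ... a rotary-embedding kernel would write into query_dst / key_dst here ...
  return {query_dst, key_dst};
}

void apply_rope(torch::Tensor &query, torch::Tensor &key) {
  auto [q, k] = rope_out_of_place(query, key);
  query.copy_(q);  // copy back into the original (possibly strided) layout
  key.copy_(k);
}
```

An in-place kernel that writes directly into the strided query/key storage avoids both the contiguous() copies and the copy_() back.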

Collaborator:

These changes bring no performance advantage compared with torch_npu._npu_rotary_embedding. If we are looking to adopt this in a real workload, I do not suggest these changes.

Contributor (Author):

Got it, thank you. I will restore all the changes and keep it as it is.

@ganyi1996ppo (Collaborator) commented:
Background:

There are two principles about operator registration in PyTorch

  • The same namespace can only be registered once by TORCH_LIBRARY
  • An operator signature can only be registered once by def

How to fix:

  • For the first problem, we can use TORCH_LIBRARY_FRAGMENT to add more operators to the same namespace.
  • For the second problem, the best way to fix it is to define all the general operator schemas in vLLM instead of in every plugin repo.

Related ISSUE: #2742

I remember we don't compile the CPU version of vLLM, right @Yikun @wangxiyuan? If we don't compile the CPU version, this shouldn't happen. And if we do have to compile a CPU version of vLLM, I suggest we just adopt another name for this kernel, or add an overload version of rope.
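For reference, a minimal sketch of the TORCH_LIBRARY_FRAGMENT approach mentioned above, using a hypothetical operator name: the namespace is opened once with TORCH_LIBRARY, and any other translation unit can extend it with a fragment:

```cpp
#include <torch/library.h>

// Another .cpp file can add operator schemas to the already-opened
// "_C_ascend" namespace; only the original TORCH_LIBRARY block is limited
// to a single occurrence.
TORCH_LIBRARY_FRAGMENT(_C_ascend, m) {
  m.def("my_extra_op(Tensor x) -> Tensor");  // hypothetical additional schema
}
```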

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@Yikun (Collaborator) commented Sep 11, 2025

I remember we don't compile the CPU version of vLLM, right @Yikun @wangxiyuan? If we don't compile the CPU version, this shouldn't happen.

We recommend installing the vLLM CPU version, per the docs: https://vllm-ascend.readthedocs.io/en/latest/installation.html#setup-vllm-and-vllm-ascend.

@FFFrog (Author) commented Sep 11, 2025

I remember we don't compile the CPU version of vLLM, right @Yikun @wangxiyuan? If we don't compile the CPU version, this shouldn't happen. And if we do have to compile a CPU version of vLLM, I suggest we just adopt another name for this kernel, or add an overload version of rope.

Thank you for the helpful advice. I have followed it and updated this PR; please take a look.

@wangxiyuan (Collaborator):
`from vllm.utils import weak_ref_tensors`
weak_ref_tensors is called from vLLM, which uses torch.ops._C; we should use our own ops instead.

This pull request has conflicts, please resolve those before we can evaluate the pull request.

codecov bot commented Sep 12, 2025

Codecov Report

❌ Patch coverage is 41.93548% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.03%. Comparing base (1bbb20e) to head (124d910).
⚠️ Report is 33 commits behind head on main.

Files with missing lines       Patch %   Lines
vllm_ascend/utils.py           23.07%    10 Missing ⚠️
vllm_ascend/ops/__init__.py     0.00%     8 Missing ⚠️

❌ Your patch status has failed because the patch coverage (41.93%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2786      +/-   ##
==========================================
+ Coverage   74.76%   75.03%   +0.26%     
==========================================
  Files         150      154       +4     
  Lines       20891    21290     +399     
==========================================
+ Hits        15620    15974     +354     
- Misses       5271     5316      +45     
Flag        Coverage Δ
unittests   75.03% <41.93%> (+0.26%) ⬆️

Flags with carried forward coverage won't be shown.


@wangxiyuan added the ready (read for review) and ready-for-test (start test by label for PR) labels Sep 12, 2025
**Background:**

There are two principles about operator registration in PyTorch
- The same namespace can only be registered once by `TORCH_LIBRARY`
- An operator signature can only be registered once by `def`

All custom operators defined in the current repo are used only by Ascend; they are not a common operator pattern defined by vLLM that every accelerator would follow and implement on its own hardware (which would be conducive to modular functional abstraction).

Therefore, we can rename the operator registration namespace to an Ascend-specific namespace.

Signed-off-by: FFFrog <[email protected]>
@FFFrog changed the title from "[WIP] Fix the bugs about operator registration by PyTorch Dispatcher" to "Fix the bugs about operator registration by PyTorch Dispatcher" Sep 12, 2025
@wangxiyuan added and removed the ready-for-test (start test by label for PR) label Sep 12, 2025
@ganyi1996ppo (Collaborator):
LGTM

@Yikun merged commit e57cca9 into vllm-project:main Sep 13, 2025 (30 of 31 checks passed)
offline893 pushed a commit to offline893/vllm-ascend that referenced this pull request Sep 16, 2025
…project#2786)

- vLLM version: main
- vLLM main: vllm-project/vllm@f592b31

Signed-off-by: FFFrog <[email protected]>
Signed-off-by: offline0806 <[email protected]>