Fix the bugs about operator registration by PyTorch Dispatcher #2786
Conversation
Code Review
This pull request aims to fix bugs related to PyTorch Dispatcher operator registration. The changes primarily involve refactoring the rotary_embedding
operator to be an in-place operation and adjusting its registration and usage accordingly. While the changes in the Python test and usage files seem correct, I've found a few critical issues in the C++ implementation and a significant issue in the Python ops files that could lead to runtime errors or incorrect behavior. Specifically, operator definitions are missing, there's a potential for a crash due to unchecked access to an optional value, and the Python wrappers for the custom op no longer preserve the original tensor shapes, which is likely to break downstream code.
@Yikun The draft is ready, please help take a look at it.
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
78344d6 to 7078cb8
csrc/torch_binding.cpp (Outdated)
cmd.Run();
return {query_dst, key_dst};
query.copy_(query_dst);
Some inputs to this kernel are strided. We originally implemented this rotary embedding to avoid the extra memory reordering triggered by contiguous(). Although the changes in this PR align the torch schema with vLLM's implementation, they may bring a large regression in end-to-end scenarios (strided tensor -> contiguous tensor -> strided tensor).
These changes bring no performance advantage compared with torch_npu._npu_rotary_embedding. If we want to adopt this in a real workload, I do not suggest these changes.
Got it, thank you. I will restore all the changes and keep it as it is.
I remember we won't compile the CPU version of vLLM, right @Yikun @wangxiyuan? If we don't compile the CPU version, this shouldn't happen. And if we do have to compile a CPU version of vLLM, I suggest we just adopt another name for this kernel, or add an overload version of rope.
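For illustration only, a minimal sketch of the two alternatives mentioned above (a different operator name, or an overload of the existing one). The operator names and the schema here are assumptions, not code from this repo:

```cpp
#include <torch/library.h>

// Hypothetical sketch: two ways to add the Ascend rope kernel without
// re-def()'ing a schema that the vLLM CPU build already owns in "_C".
TORCH_LIBRARY_FRAGMENT(_C, ops) {
  // Option 1: a different operator name.
  ops.def(
      "rotary_embedding_ascend(Tensor positions, Tensor query, Tensor key, "
      "int head_size, Tensor cos_sin_cache, bool is_neox) -> (Tensor, Tensor)");

  // Option 2: an overload of the existing name, written as "name.overload".
  ops.def(
      "rotary_embedding.ascend(Tensor positions, Tensor query, Tensor key, "
      "int head_size, Tensor cos_sin_cache, bool is_neox) -> (Tensor, Tensor)");
}
```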
This pull request has conflicts, please resolve those before we can evaluate the pull request.
We recommend installing the vLLM CPU version, as described in the docs: https://vllm-ascend.readthedocs.io/en/latest/installation.html#setup-vllm-and-vllm-ascend.
Thank you for your helpful advice; I have followed it to update this PR. Please take a look at it, thank you.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Codecov Report
❌ Patch coverage is 41.93%, which is below the target coverage of 80.00%. You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files:
@@ Coverage Diff @@
## main #2786 +/- ##
==========================================
+ Coverage 74.76% 75.03% +0.26%
==========================================
Files 150 154 +4
Lines 20891 21290 +399
==========================================
+ Hits 15620 15974 +354
- Misses 5271 5316 +45
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
**Background:** There are two principles about operator registration in PyTorch:
- The same namespace can only be registered once by `TORCH_LIBRARY`.
- An operator signature can only be registered once by `def`.

Considering that all custom operators defined in the current repo are used only by Ascend (rather than vLLM defining a common operator schema that every accelerator then implements for its own hardware, which would be conducive to modular functional abstraction), we can rename the operator registration namespace to an Ascend-specific namespace.

Signed-off-by: FFFrog <[email protected]>
LGTM
…project#2786) Related ISSUE: vllm-project#2742 - vLLM version: main - vLLM main: vllm-project/vllm@f592b31 Signed-off-by: FFFrog <[email protected]> Signed-off-by: offline0806 <[email protected]>
Background:
There are two principles about operator registration in PyTorch:
- The same namespace can only be registered once by `TORCH_LIBRARY`.
- An operator signature can only be registered once by `def`.
Considering that all custom operators defined in the current repo are used only by Ascend (rather than vLLM defining a common operator schema that every accelerator then implements for its own hardware, which would be conducive to functional abstraction), we can rename the operator registration namespace to an Ascend-specific namespace (_C_ascend); a sketch of this registration follows below.
Related ISSUE: #2742
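For reference, a minimal sketch of what the Ascend-specific registration could look like. The schema below is illustrative, assumed from vLLM's in-place rope signature, and is not copied from csrc/torch_binding.cpp:

```cpp
#include <torch/library.h>

// Sketch only: claiming an Ascend-specific namespace avoids both conflicts,
// since "_C_ascend" is TORCH_LIBRARY'd exactly once (here) and its schemas
// cannot collide with whatever the vLLM CPU build registers under "_C".
TORCH_LIBRARY(_C_ascend, ops) {
  // Illustrative schema; the real signature lives in csrc/torch_binding.cpp.
  ops.def(
      "rotary_embedding(Tensor positions, Tensor! query, Tensor! key, "
      "int head_size, Tensor cos_sin_cache, bool is_neox) -> ()");
}
// Python callers then reach the op as torch.ops._C_ascend.rotary_embedding(...).
```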