
Conversation

zhaozx-cn
Contributor

@zhaozx-cn commented Sep 5, 2025

What this PR does / why we need it?

Fix qwen torchair attention PrefillCacheHit

Does this PR introduce any user-facing change?

How was this patch tested?

vLLM version: v0.10.1.1
vLLM main: vllm-project/vllm@e599e2c

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request fixes an issue with PrefillCacheHit in the torchair attention mechanism. The changes involve correctly updating the instance's key/value cache and slicing the block_tables to remove padding before passing it to the attention kernel. While the changes appear to fix the immediate issue, I have a suggestion to make the cache update logic more robust.
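
To picture the block_tables trimming described above, here is a minimal, self-contained sketch; the tensor values, the padded width, and the `num_active_blocks` name are illustrative assumptions rather than the code in this PR:

```python
import torch

# Hypothetical padded block table: [num_requests, max_blocks_per_request].
# Unused slots are padded (here with 0), which the attention kernel in the
# prefill-cache-hit path should not walk.
block_tables = torch.tensor([
    [3, 7, 0, 0],   # request 0: 2 blocks in use
    [5, 2, 9, 0],   # request 1: 3 blocks in use
])
num_active_blocks = 3  # widest row actually needed by this batch (assumed known)

# Slice off the trailing padding columns before handing the table to the kernel.
trimmed_block_tables = block_tables[:, :num_active_blocks]
print(trimmed_block_tables)
# tensor([[3, 7, 0],
#         [5, 2, 9]])
```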

Comment on lines +377 to +379
if attn_metadata.attn_state == AscendAttentionState.PrefillCacheHit:
self.key_cache = key_cache
self.value_cache = value_cache
Contributor


Severity: high

The self.key_cache and self.value_cache are used by the PrefillCacheHit attention state. This change correctly updates them when the state is PrefillCacheHit. However, the cache update in lines 375-376 happens for any state that has a kv_cache (e.g., DecodeOnly as well). To make the logic more robust and prevent potential issues if other states start using self.key_cache in the future, it would be better to update self.key_cache and self.value_cache unconditionally whenever the cache is modified. This ensures that self.key_cache and self.value_cache always reflect the latest state of the cache tensors passed into this forward pass.

            self.key_cache = key_cache
            self.value_cache = value_cache

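To make the suggestion concrete, here is a small runnable sketch of the conditional versus unconditional update; `ToyAttention` and its arguments are made-up illustrations, not the project's classes:

```python
import torch

class ToyAttention:
    """Toy stand-in for an attention impl that keeps references to its KV cache."""

    def __init__(self):
        self.key_cache = None
        self.value_cache = None

    def forward(self, key_cache, value_cache, is_prefill_cache_hit):
        # Conditional refresh, as in the PR: only the PrefillCacheHit branch
        # stores the references on the instance.
        if is_prefill_cache_hit:
            self.key_cache = key_cache
            self.value_cache = value_cache
        # The reviewer's hardening would assign self.key_cache/self.value_cache
        # here unconditionally, so any state that later reads them always sees
        # the tensors used in this forward pass.


attn = ToyAttention()
kc, vc = torch.zeros(4, 8), torch.zeros(4, 8)
attn.forward(kc, vc, is_prefill_cache_hit=False)
assert attn.key_cache is None  # reference goes stale when the branch is skipped
```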

github-actions bot commented Sep 5, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, so reviewers and future developers can understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.


codecov bot commented Sep 5, 2025

Codecov Report

❌ Patch coverage is 0% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.75%. Comparing base (1bbb20e) to head (99cfc07).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| vllm_ascend/torchair/torchair_attention.py | 0.00% | 5 Missing ⚠️ |

❌ Your patch check has failed because the patch coverage (0.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #2787      +/-   ##
==========================================
- Coverage   74.76%   74.75%   -0.02%     
==========================================
  Files         150      150              
  Lines       20891    20896       +5     
==========================================
  Hits        15620    15620              
- Misses       5271     5276       +5     
```
| Flag | Coverage Δ |
| --- | --- |
| unittests | 74.75% <0.00%> (-0.02%) ⬇️ |

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@wangxiyuan added the ready (read for review) and ready-for-test (start test by label for PR) labels on Sep 10, 2025
@wangxiyuan merged commit b9a0a75 into vllm-project:main on Sep 11, 2025
19 of 20 checks passed
yiz-liu pushed a commit to linfeng-yuan/vllm-ascend that referenced this pull request Sep 12, 2025
### What this PR does / why we need it?
Fix qwen torchair attention PrefillCacheHit
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
vLLM version: v0.10.1.1
vLLM main:
vllm-project/vllm@e599e2c

- vLLM version: main
- vLLM main:
vllm-project/vllm@0b9a612

Signed-off-by: zhaozixin <[email protected]>
Co-authored-by: zhaozixin <[email protected]>
Signed-off-by: Yizhou Liu <[email protected]>
offline893 pushed a commit to offline893/vllm-ascend that referenced this pull request Sep 16, 2025
### What this PR does / why we need it?
Fix qwen torchair attention PrefillCacheHit
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
vLLM version: v0.10.1.1
vLLM main:
vllm-project/vllm@e599e2c

- vLLM version: main
- vLLM main:
vllm-project/vllm@0b9a612

Signed-off-by: zhaozixin <[email protected]>
Co-authored-by: zhaozixin <[email protected]>
Signed-off-by: offline0806 <[email protected]>
Labels: ready (read for review), ready-for-test (start test by label for PR)
Projects: None yet
Participants: 2