fix qwen torchair attention PrefillCacheHit #2787

Conversation
Signed-off-by: zhaozixin <[email protected]>
Code Review

This pull request fixes an issue with the PrefillCacheHit attention state in the torchair attention backend. The changes update the instance's key/value cache references and slice block_tables to remove padding before passing it to the attention kernel (a sketch of that slicing step follows below). While the changes appear to fix the immediate issue, I have a suggestion to make the cache update logic more robust.
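As a minimal illustration of the padding slice described above (the tensor shapes and the `num_actual_seqs` name are assumptions for this sketch, not taken from the PR's diff):

```python
import torch

# Hypothetical sketch: torchair graph mode pads block_tables up to a fixed
# batch size, so the fix trims it back to the real sequence count before
# the attention kernel sees it. Names and shapes here are illustrative only.
padded_block_tables = torch.zeros(8, 16, dtype=torch.int32)  # padded to batch 8
num_actual_seqs = 3                                          # live requests this step
block_tables = padded_block_tables[:num_actual_seqs]         # drop the padding rows
assert block_tables.shape == (3, 16)
```

The cache-update half of the change guards the assignment on the attention state: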
```python
if attn_metadata.attn_state == AscendAttentionState.PrefillCacheHit:
    self.key_cache = key_cache
    self.value_cache = value_cache
```
The self.key_cache and self.value_cache attributes are used by the PrefillCacheHit attention state. This change correctly updates them when the state is PrefillCacheHit. However, the cache update in lines 375-376 happens for any state that has a kv_cache (e.g., DecodeOnly as well). To make the logic more robust and prevent potential issues if other states start using self.key_cache in the future, it would be better to update self.key_cache and self.value_cache unconditionally whenever the cache is modified. This ensures that self.key_cache and self.value_cache always reflect the latest state of the cache tensors passed into this forward pass.
```python
self.key_cache = key_cache
self.value_cache = value_cache
```
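For context, a pared-down sketch of how the reviewer's unconditional update would sit in the attention backend (the class and method shape here are assumptions for illustration, not the actual vllm-ascend source):

```python
from typing import Optional

import torch


class TorchairAttentionSketch:
    """Hypothetical stand-in for the real torchair attention backend."""

    def __init__(self) -> None:
        self.key_cache: Optional[torch.Tensor] = None
        self.value_cache: Optional[torch.Tensor] = None

    def forward(self, key_cache: torch.Tensor, value_cache: torch.Tensor) -> None:
        # Reviewer's suggestion: mirror the incoming cache tensors onto the
        # instance unconditionally, so any attention state (PrefillCacheHit,
        # DecodeOnly, ...) that later reads self.key_cache / self.value_cache
        # sees the tensors from this forward pass rather than a stale pair.
        self.key_cache = key_cache
        self.value_cache = value_cache
```

The trade-off is favorable: the assignments are cheap reference updates, not copies, and making them unconditional removes a hidden coupling between the update site and the set of states that read these attributes.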
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
Codecov Report

❌ Patch coverage is 0.00%.
❌ Your patch check has failed because the patch coverage (0.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

```
@@            Coverage Diff             @@
##             main    #2787      +/-   ##
==========================================
- Coverage   74.76%   74.75%   -0.02%
==========================================
  Files         150      150
  Lines       20891    20896       +5
==========================================
  Hits        15620    15620
- Misses       5271     5276       +5
==========================================
```

Flags with carried forward coverage won't be shown. View the full report in Codecov by Sentry.
What this PR does / why we need it?
Fix qwen torchair attention PrefillCacheHit
Does this PR introduce any user-facing change?
How was this patch tested?
vLLM version: v0.10.1.1
vLLM main: vllm-project/vllm@e599e2c