replace npu_incre_flash_attention with npu_fused_infer_attention_score #2792
base: main
Conversation
Code Review
This pull request replaces the deprecated npu_incre_flash_attention with the newer npu_fused_infer_attention_score function. This is a good maintenance update. The new function call appears to be mostly correct, but I've identified a potential issue with how the atten_mask parameter is being passed, which could lead to incorrect behavior. My review includes a specific suggestion to address this.
```python
num_key_value_heads=self.num_kv_heads,
input_layout='BSH',
block_size=block_size)
atten_mask=attn_metadata.attn_mask,
```
For consistency and correctness, atten_mask should be sourced from decode_meta like the other parameters in this function call (e.g., block_table and seq_lens). In the DecodeOnly attention state, an attention mask is typically not required, and the previous API (npu_incre_flash_attention) did not accept one. Using decode_meta.attn_mask ensures that None is passed, preventing an unintended mask from being applied, which could happen if attn_metadata.attn_mask is not None.
Suggested change:

```diff
- atten_mask=attn_metadata.attn_mask,
+ atten_mask=decode_meta.attn_mask,
```
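For illustration only, here is a minimal sketch of the decode-path call with the suggestion applied. The surrounding names (decode_meta with block_table, seq_lens, and attn_mask; the head counts, scale, and block size) follow the diff context or are assumed, and the exact keyword names of npu_fused_infer_attention_score should be verified against the installed torch_npu version:

```python
import torch_npu


def fused_decode_attention(query, key_cache, value_cache, decode_meta,
                           num_heads, num_kv_heads, scale, block_size):
    """Sketch of the decode path with the reviewer's suggestion applied.

    decode_meta is assumed to expose block_table, seq_lens and attn_mask;
    in the DecodeOnly state attn_mask is expected to be None.
    """
    # npu_fused_infer_attention_score returns (attention_out, softmax_lse);
    # only the attention output is needed here.
    attn_output, _ = torch_npu.npu_fused_infer_attention_score(
        query,
        key_cache,
        value_cache,
        num_heads=num_heads,
        num_key_value_heads=num_kv_heads,
        input_layout='BSH',
        scale=scale,
        block_table=decode_meta.block_table,
        block_size=block_size,
        actual_seq_lengths_kv=decode_meta.seq_lens,
        atten_mask=decode_meta.attn_mask,
    )
    return attn_output
```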
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Please note that FIA might not support the 300I Duo platform, so we need a branch here.
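A possible shape for that branch, sketched under assumptions: the platform check is_310p stands in for whatever SoC check the project actually uses, the legacy keyword names (scale_value, actual_seq_lengths) are per the public torch_npu docs and worth double-checking, and the FIA path reuses the fused_decode_attention sketch above. None of this is taken verbatim from the PR:

```python
import torch_npu

from vllm_ascend.utils import is_310p  # assumed helper name for the 300I Duo / 310P check


def decode_attention(query, key_cache, value_cache, decode_meta,
                     num_heads, num_kv_heads, scale, block_size):
    if is_310p():
        # 300I Duo (310P): keep the legacy kernel, which returns the output tensor directly.
        return torch_npu.npu_incre_flash_attention(
            query, key_cache, value_cache,
            num_heads=num_heads,
            num_key_value_heads=num_kv_heads,
            input_layout='BSH',
            scale_value=scale,
            block_table=decode_meta.block_table,
            block_size=block_size,
            actual_seq_lengths=decode_meta.seq_lens)
    # Other platforms: use the FIA path sketched above.
    return fused_decode_attention(query, key_cache, value_cache, decode_meta,
                                  num_heads, num_kv_heads, scale, block_size)
```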
Signed-off-by: p00465316 <[email protected]>
Codecov Report

❌ Patch coverage is 0.00%.
❌ Your patch check has failed because the patch coverage (0.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #2792      +/-   ##
==========================================
+ Coverage   72.99%   75.39%   +2.39%
==========================================
  Files         153      155       +2
  Lines       21331    21123     -208
==========================================
+ Hits        15571    15925     +354
+ Misses       5760     5198     -562
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
This pull request has conflicts; please resolve them before we can evaluate the pull request.
What this PR does / why we need it?
The npu_incre_flash_attention interface is no longer maintained and has been replaced by npu_fused_infer_attention_score.
Does this PR introduce any user-facing change?
No
How was this patch tested?