
[GPU] Fix paged_attention mixed-stage internal buffer layout mismatch (with regression test)#36380

Open
sungeunk wants to merge 2 commits into openvinotoolkit:master from sungeunk:188245_pa_opt_func

@sungeunk

Contributor

Root cause

  • The micro-SDPA vs. non-micro decision made at internal buffer allocation time could diverge from the decision made at execution time. For mixed-stage paged attention, this caused an internal buffer count mismatch (expected 7 buffers, got 4).
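To make the failure mode concrete, here is a minimal C++ sketch under a simplified model. Everything in it is hypothetical: `Stage`, `PAParams`, `alloc_use_micro`, `exec_use_micro`, `run`, and the buffer counts are not taken from the OpenVINO GPU plugin, and the token_type_ids condition is only an assumed example of how the two decisions could disagree. Allocation sizes the internal buffers from one predicate while execution validates against another, so any input on which they disagree raises a count mismatch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical, simplified model of the failure; none of these names come
// from the OpenVINO GPU plugin.
enum class Stage { PREFILL, GENERATE, MIXED };

struct PAParams {
    Stage stage;
    bool has_token_type_ids;
};

// Placeholder buffer counts for the two kernel paths (illustrative only).
constexpr std::size_t kMicroBuffers = 2;
constexpr std::size_t kNonMicroBuffers = 3;

std::size_t buffer_count(bool micro) {
    return micro ? kMicroBuffers : kNonMicroBuffers;
}

// Allocation-time gating (assumed): ignores token_type_ids.
bool alloc_use_micro(const PAParams& p) {
    return p.stage != Stage::GENERATE;
}

// Execution-time gating (assumed): also rejects micro-SDPA when
// token_type_ids are present, so the two predicates can diverge.
bool exec_use_micro(const PAParams& p) {
    return p.stage != Stage::GENERATE && !p.has_token_type_ids;
}

// Allocate from one decision, execute with the other, and validate the
// layout the way a kernel would before binding its internal buffers.
void run(const PAParams& p) {
    std::size_t allocated = buffer_count(alloc_use_micro(p));
    std::size_t expected = buffer_count(exec_use_micro(p));
    if (allocated != expected) {
        throw std::runtime_error("internal buffer count mismatch: expected " +
                                 std::to_string(expected) + ", got " +
                                 std::to_string(allocated));
    }
}
```

With these placeholder values, a mixed-stage request with token_type_ids allocates for the micro path but executes the non-micro path, so `run` throws "expected 3, got 2"; the real kernel's counts (7 and 4) differ, but the mechanism is the same.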

Fix

  • Fixed the micro-SDPA gating for paged_attention_opt internal buffers so that allocation consistently follows the runtime (execution-time) decision.
  • Added a regression test for mixed-stage paged attention with token_type_ids that catches the internal buffer layout mismatch.
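The fix pattern can be sketched as making the allocation-time gate delegate to the runtime gate, paired with a test that sweeps every configuration looking for disagreement. This is a hypothetical C++17 sketch, not the PR's code: `runtime_use_micro`, `old_alloc_use_micro`, `fixed_alloc_use_micro`, `find_divergence`, and their gating conditions are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <optional>

// Hypothetical names; a sketch of the fix pattern, not the plugin code.
enum class Stage { PREFILL, GENERATE, MIXED };

struct PAParams {
    Stage stage;
    bool has_token_type_ids;
};

using Gate = bool (*)(const PAParams&);

// Runtime gating (assumed conditions): the decision the kernel acts on.
bool runtime_use_micro(const PAParams& p) {
    return p.stage != Stage::GENERATE && !p.has_token_type_ids;
}

// Old allocation-time gating (assumed): misses the token_type_ids check.
bool old_alloc_use_micro(const PAParams& p) {
    return p.stage != Stage::GENERATE;
}

// Fixed allocation-time gating: delegate to the runtime decision so the
// two can never diverge.
bool fixed_alloc_use_micro(const PAParams& p) {
    return runtime_use_micro(p);
}

// Regression-style sweep over every stage / token_type_ids combination:
// return the first configuration where the allocation gate disagrees with
// the runtime gate, or std::nullopt if they always agree.
std::optional<PAParams> find_divergence(Gate alloc_gate) {
    for (Stage s : {Stage::PREFILL, Stage::GENERATE, Stage::MIXED}) {
        for (bool tt : {false, true}) {
            PAParams p{s, tt};
            if (alloc_gate(p) != runtime_use_micro(p)) return p;
        }
    }
    return std::nullopt;
}
```

Here `find_divergence(old_alloc_use_micro)` returns a diverging configuration, while `find_divergence(fixed_alloc_use_micro)` returns `std::nullopt`. Routing both phases through one predicate removes the whole class of mismatch rather than patching a single input combination.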

Tickets:

  • 188245

AI Assistance:

  • AI assistance used: yes
  • AI: fix
  • User: code review

sungeunk requested review from a team as code owners on June 12, 2026 13:32
sungeunk added the category: GPU (OpenVINO GPU plugin) label on Jun 12, 2026