NPUW: Support batched prefill scoring via row-by-row unrolling#36375

Draft
dylanneve1 wants to merge 1 commit into openvinotoolkit:master from dylanneve1:dneve/npuw-llm-batch-unroll

dylanneve1 (Member) commented Jun 12, 2026

Details:

The NPUW LLM pipeline compiles its prefill and generate models with a static batch size of 1, so a batched [N, seq] input (e.g. GenAI's TextRerankPipeline scoring N documents in a single infer call) fails with a shape mismatch in pad_position_ids().

This change accepts batch > 1 at the LLMInferRequest boundary and unrolls it. Each row is sliced as a zero-copy ROI view and scored in its own prefill pass over the unchanged batch-1 compiled models, with the KV-cache state reset between rows. The per-row logits are then aggregated into one [N, ...] output tensor. Because batch rows are independent, the results are identical to per-row inference. Batched generation remains unsupported and is rejected with a clear error.
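For reviewers who want the control flow at a glance, here is a minimal, standard-library-only sketch of the unrolling described above. It is not the NPUW code: `RowView`, `Batch1Prefill`, `unrolled_prefill` and `reject_batched_generate` are hypothetical names, and the toy model only mimics a stateful batch-1 prefill so that the effect of resetting state between rows is visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical non-owning view over one row of a row-major [N, seq] buffer.
// It stands in for a zero-copy ROI tensor: no token data is copied.
struct RowView {
    const std::int64_t* data;
    std::size_t size;
};

// Hypothetical stand-in for a batch-1 compiled prefill model with a KV-cache.
// The "cache" is a running sum, so stale state would visibly change the logits.
struct Batch1Prefill {
    std::size_t vocab = 4;
    std::int64_t kv_state = 0;

    void reset_state() { kv_state = 0; }

    std::vector<float> infer(RowView row) {
        for (std::size_t i = 0; i < row.size; ++i) {
            kv_state += row.data[i];
        }
        std::vector<float> logits(vocab);
        for (std::size_t v = 0; v < vocab; ++v) {
            logits[v] = static_cast<float>(kv_state + static_cast<std::int64_t>(v));
        }
        return logits;
    }
};

// Score a [batch, seq] input row by row and return logits laid out as [batch, vocab].
std::vector<float> unrolled_prefill(Batch1Prefill& model,
                                    const std::vector<std::int64_t>& input_ids,
                                    std::size_t batch,
                                    std::size_t seq) {
    if (input_ids.size() != batch * seq) {
        throw std::invalid_argument("input_ids size does not match [batch, seq]");
    }
    std::vector<float> out;
    out.reserve(batch * model.vocab);
    for (std::size_t b = 0; b < batch; ++b) {
        model.reset_state();  // rows are independent: drop the previous row's state
        const RowView row{input_ids.data() + b * seq, seq};  // view, not a copy
        const std::vector<float> logits = model.infer(row);
        assert(logits.size() == model.vocab);
        out.insert(out.end(), logits.begin(), logits.end());
    }
    return out;
}

// Batched generation is out of scope in this sketch, as in the PR description.
void reject_batched_generate(std::size_t batch) {
    if (batch > 1) {
        throw std::invalid_argument("batched generation is not supported; use batch size 1");
    }
}
```

In this toy setup, skipping `reset_state()` would let row 0's tokens leak into row 1's logits, which is why the reset sits at the top of the loop rather than after it.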

Tickets:

  • ticket-id

AI Assistance:

  • AI assistance used: no / yes
  • If yes, summarize how AI was used and what human validation was performed (build/tests/manual checks).

github-actions (bot) added the labels "category: NPU" (OpenVINO NPU plugin) and "category: NPUW" (NPUW plugin) on Jun 12, 2026