NPUW: Support batched prefill scoring via row-by-row unrolling by dylanneve1 · Pull Request #36375 · openvinotoolkit/openvino

dylanneve1 · 2026-06-12T09:14:18Z

Details:

The NPUW LLM pipeline compiles its prefill and generate models with a static batch size of 1, so a batched [N, seq] input (e.g. GenAI's TextRerankPipeline scoring N documents in a single infer call) fails with a shape mismatch in pad_position_ids().

This change accepts batch > 1 at the LLMInferRequest boundary and unrolls it: each row is sliced as a zero-copy ROI view and scored in its own prefill pass over the unchanged batch-1 compiled models, with the KV-cache state reset between rows. The per-row logits are then aggregated into a single [N, ...] output tensor. Because batch rows are independent, the results are identical to per-row inference. Batched generation remains unsupported and is rejected with a clear error.
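The unrolling described above can be illustrated with a small, self-contained sketch. It is not the NPUW implementation: `Batch1Model`, `score_batched`, and the toy "logits" are hypothetical stand-ins, and a Python `memoryview` slice plays the role of the zero-copy ROI view. The toy model deliberately makes its output depend on the accumulated KV-cache, so that skipping the reset between rows would visibly change the result.

```python
from array import array
from typing import List


class Batch1Model:
    """Hypothetical stand-in for a prefill model compiled with static batch 1."""

    def __init__(self) -> None:
        self.kv_cache: List[int] = []  # tokens accumulated since the last reset

    def reset_state(self) -> None:
        self.kv_cache.clear()

    def prefill(self, row) -> List[float]:
        # Toy "logits" that depend on everything in the cache, so stale
        # state from a previous row would change the output.
        self.kv_cache.extend(row)
        return [float(sum(self.kv_cache)), float(len(self.kv_cache))]


def score_batched(model: Batch1Model, input_ids: array,
                  batch: int, seq_len: int) -> List[List[float]]:
    """Score a flattened [batch, seq_len] input one row at a time."""
    if len(input_ids) != batch * seq_len:
        raise ValueError("input_ids does not match [batch, seq_len]")
    flat = memoryview(input_ids)  # slicing a memoryview does not copy data
    outputs: List[List[float]] = []
    for r in range(batch):
        row = flat[r * seq_len:(r + 1) * seq_len]  # zero-copy row view
        model.reset_state()                        # rows must not share KV state
        outputs.append(model.prefill(row))         # one batch-1 prefill pass
    return outputs                                 # aggregated [batch, ...] result


ids = array("q", [1, 2, 3, 4, 5, 6])               # two rows of three tokens
print(score_batched(Batch1Model(), ids, batch=2, seq_len=3))
```

Resetting the state inside the loop is what makes the batched result equal to scoring each row in a separate call; in this toy model, removing the reset would let the second row's output include the first row's tokens.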

Tickets:

ticket-id

AI Assistance:

AI assistance used: no / yes
If yes, summarize how AI was used and what human validation was performed (build/tests/manual checks).

github-actions bot added the labels "category: NPU" (OpenVINO NPU plugin) and "category: NPUW" (NPUW plugin) on Jun 12, 2026

dylanneve1 wants to merge 1 commit into openvinotoolkit:master from dylanneve1:dneve/npuw-llm-batch-unroll
