Fix prompt tokens causing empty transcription output #428
sborisov88 wants to merge 1 commit into argmaxinc:main
Conversation
When `promptTokens` are provided in `DecodingOptions`, the prefill cache is disabled (known limitation). This causes the decoding loop to start at `tokenIndex = 0`, where `startOfPreviousToken` is fed to the model. The model then predicts EOT or produces a low-confidence prediction, triggering the early termination checks (`sampleResult.completed` or `firstTokenLogProbThreshold`) and breaking the loop immediately, resulting in empty transcription text.

Two fixes:
1. `isFirstToken` now points to the first actually decoded token after the prompt (`max(prefilledIndex, initialPromptIndex)`) instead of `tokenIndex == 0` during prompt prefill.
2. `sampleResult.completed` (EOT) is ignored during the prefill phase, since the model is being force-fed prompt tokens and its predictions are not meaningful for early stopping.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
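The failure mode described above can be sketched as a standalone index walk. This is a hedged sketch, not WhisperKit source: the names mirror `TextDecoder.decodeText`, but the values (`initialPromptIndex = 4`) and the simplified checks are illustrative assumptions.

```swift
// Hedged sketch (not WhisperKit source): simplified index bookkeeping showing
// why prompt tokens without a prefill cache trip the early-exit checks.
let prefilledIndex = 0       // prefill KV cache disabled when promptTokens are set
let initialPromptIndex = 4   // assumed: startOfTranscript + 3 prompt tokens

for tokenIndex in prefilledIndex..<6 {
    let isPrefill = tokenIndex < initialPromptIndex - 1  // predictions discarded here
    let isFirstTokenOld = tokenIndex == prefilledIndex   // pre-fix: fires at index 0
    if isFirstTokenOld && isPrefill {
        // The model is being force-fed a prompt token, so its prediction is
        // typically low-confidence (or EOT) -> loop breaks with no decoded text.
        print("early exit possible at tokenIndex \(tokenIndex) during prefill")
    }
}
```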
Pull request overview
Fixes an edge case in TextDecoder.decodeText where providing promptTokens can cause decoding to terminate during prompt prefill, producing an empty transcription.
Changes:
- Adjusts how “first token” detection is computed in the main decoding loop when prompts are present.
- Skips early termination on EOT while processing prompt tokens (prefill) to avoid stopping based on meaningless predictions during forced-token prefill.
```diff
 let isLastPrefillToken = tokenIndex == initialPromptIndex - 1
-let isFirstToken = tokenIndex == prefilledIndex
+let isInPrefillPhase = isPrefill || isLastPrefillToken // tokenIndex < initialPromptIndex
+let isFirstToken = tokenIndex == max(prefilledIndex, initialPromptIndex) // First actually decoded token (after prompt)
```
isFirstToken looks off by one relative to how nextTokenLogProb is computed. In this loop, the first decoded token after the prompt is the token predicted when tokenIndex == initialPromptIndex - 1 (see the debug log using tokenIndex + 1). With the current tokenIndex == max(prefilledIndex, initialPromptIndex), the firstTokenLogProbThreshold check will fire one iteration late (and won’t fire at all when initialPromptIndex == 1, breaking the existing firstTokenLogProbThreshold fallback behavior). Consider basing isFirstToken on max(prefilledIndex, initialPromptIndex - 1) (or equivalently tokenIndex + 1 == max(prefilledIndex + 1, initialPromptIndex)) so it aligns with the first sampled token after the prompt.
Suggested change:
```diff
-let isFirstToken = tokenIndex == max(prefilledIndex, initialPromptIndex) // First actually decoded token (after prompt)
+let isFirstToken = tokenIndex == max(prefilledIndex, initialPromptIndex - 1) // First actually decoded token (after prompt)
```
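The off-by-one the review describes can be checked with a small comparison of the two formulas. This is an illustrative sketch under the review's framing (the token sampled at iteration `tokenIndex` lands at position `tokenIndex + 1`); the values of `initialPromptIndex` are assumptions.

```swift
// Illustrative comparison: at which tokenIndex does each "first token" check fire?
// Per the review, the first post-prompt token is sampled at initialPromptIndex - 1.
let prefilledIndex = 0
for initialPromptIndex in [1, 4] {  // 1 = no prompt; 4 = 3 prompt tokens (assumed)
    let suggested = max(prefilledIndex, initialPromptIndex - 1)  // reviewer's proposal
    let asWritten = max(prefilledIndex, initialPromptIndex)      // PR as written
    print("initialPromptIndex=\(initialPromptIndex): sample at \(suggested), PR checks at \(asWritten)")
    // The PR's check runs one iteration late, and per the review it does not
    // fire at all in the no-prompt case (initialPromptIndex == 1).
}
```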
```diff
 let isSegmentCompleted =
-    sampleResult.completed ||
+    (!isInPrefillPhase && sampleResult.completed) ||
     currentTokens.count >= Constants.maxTokenContext - 1 ||
     isFirstTokenLogProbTooLow
```
Skipping sampleResult.completed when isInPrefillPhase also skips EOT termination on the last prompt token iteration (tokenIndex == initialPromptIndex - 1), which is exactly when the first real decoded token is sampled and appended. That can change normal decoding behavior (including the no-prompt case where initialPromptIndex == 1) by continuing past an EOT and potentially producing extra/garbage tokens. It should be enough to skip the EOT check only while the model’s predictions are being ignored (i.e., when isPrefill is true / tokenIndex < initialPromptIndex - 1), and allow EOT termination again for the boundary iteration that produces the first decoded token.
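The narrower condition the review suggests can be sketched with stubbed loop state. This is a hedged sketch of the proposal, not the actual `TextDecoder` code: the stand-in values for `isPrefill`, the sampled EOT flag, and the other checks are assumptions chosen to exercise the boundary iteration.

```swift
// Hedged sketch of the reviewer's suggestion: suppress the EOT check only while
// predictions are discarded (isPrefill == true), so EOT can still terminate
// decoding on the boundary iteration that samples the first real token.
// Stubbed values stand in for the real loop state.
let isPrefill = false                 // boundary: tokenIndex == initialPromptIndex - 1
let sampleCompleted = true            // model predicted EOT
let maxTokenContextReached = false
let isFirstTokenLogProbTooLow = false

let isSegmentCompleted =
    (!isPrefill && sampleCompleted) ||  // EOT honored once prefill proper ends
    maxTokenContextReached ||
    isFirstTokenLogProbTooLow
print(isSegmentCompleted)  // true: normal EOT termination is preserved
```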
@sborisov88 Could you please add a non-trivial test case that gets fixed by this PR? (e.g. a short audio with a keyterm that the model gets wrong even with prompting but gets right after this fix)
Summary
When `promptTokens` are provided in `DecodingOptions` (e.g., via the `prompt` parameter in the OpenAI-compatible server API), transcription returns empty text.

Root cause: When `promptTokens` are set, the prefill KV cache is disabled (known TODO on line 354). This causes the decoding loop to start at `tokenIndex = 0` with `startOfPreviousToken` as input. The model then either:
- predicts EOT (`sampleResult.completed = true`) → loop breaks immediately
- produces a low-confidence first token, so `firstTokenLogProbThreshold` (-1.5 default) triggers → loop breaks

Both paths result in empty transcription output.
Reproduction
Fix
Two changes in `TextDecoder.swift`:

1. `isFirstToken` now correctly points to the first actually decoded token after the prompt (`max(prefilledIndex, initialPromptIndex)`) instead of `tokenIndex == prefilledIndex`, which fires at the first prompt token during prefill.
2. `sampleResult.completed` (EOT check) is skipped during the prefill phase. Since the model is being force-fed prompt tokens, its predictions during prefill are not meaningful for early stopping decisions.

Test plan
- `whisper-large-v3-turbo` model via local server API