8 changes: 6 additions & 2 deletions Sources/WhisperKit/Core/TextDecoder.swift
@@ -764,7 +764,8 @@ open class TextDecoder: TextDecoding, WhisperMLModel {

     let isPrefill = tokenIndex < initialPromptIndex - 1 // Prefill stops at the last token of the initial prompt
     let isLastPrefillToken = tokenIndex == initialPromptIndex - 1
-    let isFirstToken = tokenIndex == prefilledIndex
+    let isInPrefillPhase = isPrefill || isLastPrefillToken // tokenIndex < initialPromptIndex
+    let isFirstToken = tokenIndex == max(prefilledIndex, initialPromptIndex) // First actually decoded token (after prompt)
Copilot AI Feb 22, 2026

isFirstToken looks off by one relative to how nextTokenLogProb is computed. In this loop, the first decoded token after the prompt is the token predicted when tokenIndex == initialPromptIndex - 1 (see the debug log using tokenIndex + 1). With the current tokenIndex == max(prefilledIndex, initialPromptIndex), the firstTokenLogProbThreshold check will fire one iteration late (and won’t fire at all when initialPromptIndex == 1, breaking the existing firstTokenLogProbThreshold fallback behavior). Consider basing isFirstToken on max(prefilledIndex, initialPromptIndex - 1) (or equivalently tokenIndex + 1 == max(prefilledIndex + 1, initialPromptIndex)) so it aligns with the first sampled token after the prompt.

Suggested change
-    let isFirstToken = tokenIndex == max(prefilledIndex, initialPromptIndex) // First actually decoded token (after prompt)
+    let isFirstToken = tokenIndex == max(prefilledIndex, initialPromptIndex - 1) // First actually decoded token (after prompt)

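To make the off-by-one concrete, here is a small standalone sketch (hypothetical values for `prefilledIndex` and `initialPromptIndex`; not code from the PR) comparing the two predicates. The token sampled during iteration `tokenIndex` is appended at position `tokenIndex + 1`, so the first token decoded after a 3-token prompt is sampled when `tokenIndex == 2`:

```swift
// Hypothetical, self-contained sketch of the loop indices (not WhisperKit code).
// With prefilledIndex = 0 and initialPromptIndex = 3, the prompt occupies
// positions 0..2 and the first decoded token is sampled at tokenIndex == 2.
let prefilledIndex = 0
let initialPromptIndex = 3

for tokenIndex in 0..<5 {
    let currentCheck = tokenIndex == max(prefilledIndex, initialPromptIndex)       // PR as written
    let suggestedCheck = tokenIndex == max(prefilledIndex, initialPromptIndex - 1) // review suggestion
    print("tokenIndex \(tokenIndex): current=\(currentCheck) suggested=\(suggestedCheck)")
}
// suggestedCheck fires at tokenIndex == 2, the iteration that actually samples
// the first decoded token; currentCheck only fires at tokenIndex == 3.
```

With `initialPromptIndex == 1` (the no-prompt case) the suggested predicate still fires at `tokenIndex == 0`, whereas the PR's version would skip the first-token check entirely when `prefilledIndex == 0`.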
// Check if current index is part of the initial prompt
if tokenIndex < initialPromptIndex {
@@ -854,8 +855,11 @@ open class TextDecoder: TextDecoding, WhisperMLModel {
     } else {
         false
     }
+    // During prefill phase (processing prompt tokens), skip early termination checks:
+    // - The model is being force-fed prompt tokens, so EOT predictions and low log probs are expected
+    // - Early stopping should only apply to actually decoded tokens after the prompt
     let isSegmentCompleted =
-        sampleResult.completed ||
+        (!isInPrefillPhase && sampleResult.completed) ||
         currentTokens.count >= Constants.maxTokenContext - 1 ||
         isFirstTokenLogProbTooLow
Comment on lines 861 to 864
Copilot AI Feb 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipping sampleResult.completed when isInPrefillPhase also skips EOT termination on the last prompt token iteration (tokenIndex == initialPromptIndex - 1), which is exactly when the first real decoded token is sampled and appended. That can change normal decoding behavior (including the no-prompt case where initialPromptIndex == 1) by continuing past an EOT and potentially producing extra/garbage tokens. It should be enough to skip the EOT check only while the model’s predictions are being ignored (i.e., when isPrefill is true / tokenIndex < initialPromptIndex - 1), and allow EOT termination again for the boundary iteration that produces the first decoded token.
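A minimal sketch of the reviewer's suggestion (hypothetical helper, not the PR's code): gate the EOT check on `isPrefill` alone, so the boundary iteration that samples the first decoded token can still terminate on EOT:

```swift
// Hypothetical helper illustrating the suggested gating (not WhisperKit API).
// EOT-based completion is ignored only while the model's prediction is being
// discarded (strict prefill: tokenIndex < initialPromptIndex - 1); the boundary
// iteration that samples the first real decoded token can still terminate.
func isSegmentCompleted(
    sampleCompleted: Bool,
    isPrefill: Bool, // tokenIndex < initialPromptIndex - 1
    tokenCount: Int,
    maxTokenContext: Int,
    isFirstTokenLogProbTooLow: Bool
) -> Bool {
    (!isPrefill && sampleCompleted)
        || tokenCount >= maxTokenContext - 1
        || isFirstTokenLogProbTooLow
}

// An EOT prediction during strict prefill is ignored...
assert(!isSegmentCompleted(sampleCompleted: true, isPrefill: true,
                           tokenCount: 10, maxTokenContext: 448,
                           isFirstTokenLogProbTooLow: false))
// ...but an EOT on a genuinely decoded token still ends the segment.
assert(isSegmentCompleted(sampleCompleted: true, isPrefill: false,
                          tokenCount: 10, maxTokenContext: 448,
                          isFirstTokenLogProbTooLow: false))
```

This preserves the intent of the PR (no early termination while force-feeding prompt tokens) without suppressing EOT on the first sampled token.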
