fix: Do not apply chat template to completions #2718
Conversation
Walkthrough

Introduces a builder-based preprocessing flow in OpenAIPreprocessor, adding builder/apply_template/gather_tokens methods and refactoring request handling to produce PreprocessedRequest via PreprocessedRequestBuilder. Centralizes EOS/stop handling and backend_instance_id extraction. Removes debug logging in BasicReasoningParser and adjusts logging string interpolation in reasoning::mod without changing behavior.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Caller
    participant OAI as OpenAIPreprocessor
    participant B as PreprocessedRequestBuilder
    participant T as Tokenizer/Template
    Caller->>OAI: preprocess (chat/completion request)
    OAI->>OAI: builder(request)
    OAI->>T: apply_template(request)
    T-->>OAI: formatted_prompt?
    OAI->>B: gather_tokens(request, builder, formatted_prompt)
    Note over OAI,B: Centralized EOS/stop handling and nvext extraction
    OAI->>B: build()
    B-->>OAI: PreprocessedRequest (+annotations)
    OAI-->>Caller: PreprocessedRequest
```
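To make the diagram concrete, here is a minimal sketch of the chat-side call order it describes. The method names (builder, apply_template, gather_tokens, build) come from the walkthrough above; the exact signatures, error type, and return shape are assumptions, not the actual dynamo API.

```rust
use std::collections::HashMap;

// Sketch only: call order of the new builder-based flow, with assumed signatures.
fn preprocess_chat_sketch(
    oai: &OpenAIPreprocessor,
    request: &NvCreateChatCompletionRequest,
) -> anyhow::Result<(PreprocessedRequest, HashMap<String, String>)> {
    // Seed the builder with EOS/stop tokens, sampling options and nvext fields.
    let mut builder = oai.builder(request)?;
    // The chat path renders the chat template; the completions path skips this (None).
    let formatted_prompt = oai.apply_template(request)?;
    // Tokenize and collect any requested annotations (formatted prompt, token ids).
    let annotations = oai.gather_tokens(request, &mut builder, formatted_prompt)?;
    Ok((builder.build()?, annotations))
}
```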
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45–75 minutes
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/llm/src/preprocessor.rs (1)
247-320: Bug: gather_tokens panics for text-single completions (formatted_prompt is None)

In NvCreateCompletionRequest.generate you call:
```rust
let annotations = self.gather_tokens(&request, &mut builder, None)?;
```
But in TextInput::Single you do:
```rust
let formatted_prompt = formatted_prompt.expect("...unreachable");
```
That makes completions with a single text prompt panic. Use the provided formatted_prompt when present (chat), else the raw text (completions).
Apply this focused refactor:
```diff
-TextInput::Single(_) => {
-    let formatted_prompt = formatted_prompt.expect("Could not find a prompt. The paired match statements earlier should make this unreachable");
-    let encoding = self.tokenizer.encode(&formatted_prompt)?;
-
-    if request.has_annotation(ANNOTATION_FORMATTED_PROMPT) {
-        annotations.insert(
-            ANNOTATION_FORMATTED_PROMPT.to_string(),
-            formatted_prompt,
-        );
-    }
+TextInput::Single(text) => {
+    // Use chat-formatted prompt when provided; otherwise use the raw text
+    let prompt_for_tokenization = formatted_prompt.unwrap_or(text);
+    let encoding = self.tokenizer.encode(&prompt_for_tokenization)?;
+
+    if request.has_annotation(ANNOTATION_FORMATTED_PROMPT) {
+        annotations.insert(
+            ANNOTATION_FORMATTED_PROMPT.to_string(),
+            prompt_for_tokenization.clone(),
+        );
+    }
     if request.has_annotation(ANNOTATION_TOKEN_IDS) {
         annotations.insert(
             ANNOTATION_TOKEN_IDS.to_string(),
             serde_json::to_string(encoding.token_ids())?,
         );
     }
     builder.token_ids(encoding.token_ids().to_vec());
 }
```

Optional: For batch text inputs, consider using encode_batch to avoid per-item overhead and potential thread-safety concerns with parallel encode calls:
```diff
-TextInput::Batch(texts) => {
-    let token_batches: Vec<Vec<u32>> = texts
-        .par_iter()
-        .map(|text| {
-            self.tokenizer
-                .encode(text)
-                .map(|encoded| encoded.token_ids().to_vec())
-        })
-        .collect::<Result<Vec<_>>>()?;
-    builder.batch_token_ids(Some(token_batches));
-    builder.token_ids(vec![]);
-}
+TextInput::Batch(texts) => {
+    // Synchronous path; preprocess_request is not async, so blocking is OK here.
+    let encodings = self
+        .tokenizer
+        .encode_batch(&texts.iter().map(|s| s.as_str()).collect::<Vec<_>>())?;
+    let token_batches = encodings
+        .into_iter()
+        .map(|e| e.token_ids().to_vec())
+        .collect::<Vec<_>>();
+    builder.batch_token_ids(Some(token_batches));
+    builder.token_ids(vec![]);
+}
```

Note: If encode_batch is not available or has a different signature, ignore this optional diff.
🧹 Nitpick comments (4)
lib/parsers/src/reasoning/mod.rs (1)
127-128: Minor: consider structured fields for better observability

Using structured fields helps with log filtering and avoids string formatting costs. Optional change:
```diff
-    tracing::warn!(
-        "Unknown reasoning parser type '{name}', falling back to Basic Reasoning Parser",
-    );
+    tracing::warn!(
+        parser_name = %name,
+        "Unknown reasoning parser type, falling back to Basic Reasoning Parser",
+    );
```

Similarly, the debug above could be:
```diff
-    tracing::debug!("Selected reasoning parser: {name}");
+    tracing::debug!(parser_name = %name, "Selected reasoning parser");
```

lib/llm/src/preprocessor.rs (3)
566-573: Annotations emission pathway looks good

Converting HashMap<String, String> into Annotated events before chaining works. Consider adding token_ids for batch text/tokens when requested, mirroring the embeddings path. Optional.
Also applies to: 623-627
335-355: Consistency: embeddings path uses encode_batch via spawn_blocking; consider aligning

Embeddings use spawn_blocking + encode_batch. For large chat/completion batches, using encode_batch (synchronously) will likely be faster and simpler than rayon-parallel per-item encodes. See the optional diff in the gather_tokens comment.
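If the two paths were aligned, batch tokenization could be pushed onto a blocking thread the same way the embeddings path does. A rough sketch only: the Tokenizer::encode_batch signature and the token_ids() accessor are assumptions, and the real code may differ.

```rust
use std::sync::Arc;

// Sketch: offload batch tokenization to a blocking thread, mirroring embeddings.
async fn tokenize_batch_sketch(
    tokenizer: Arc<Tokenizer>,
    texts: Vec<String>,
) -> anyhow::Result<Vec<Vec<u32>>> {
    tokio::task::spawn_blocking(move || {
        let refs: Vec<&str> = texts.iter().map(String::as_str).collect();
        // encode_batch is assumed to encode all inputs in one call.
        let encodings = tokenizer.encode_batch(&refs)?;
        Ok(encodings
            .into_iter()
            .map(|e| e.token_ids().to_vec())
            .collect())
    })
    .await?
}
```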
56-64: Naming nit: ANNOTATION_* constants are fine; unify with other modules if any variations exist

If other modules already define or consume these keys, ensure we’re not drifting on naming (e.g., "formatted_prompt" vs "formattedPrompt"). If drift exists, add a small adapter or constants in a shared place.
Would you like me to scan the repo for other ANNOTATION_* keys and flag inconsistencies?
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
- lib/bindings/python/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
- lib/llm/src/preprocessor.rs (5 hunks)
- lib/parsers/src/reasoning/base_parser.rs (0 hunks)
- lib/parsers/src/reasoning/mod.rs (1 hunks)
💤 Files with no reviewable changes (1)
- lib/parsers/src/reasoning/base_parser.rs
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-06-24T20:59:35.725Z
Learnt from: ishandhanani
PR: ai-dynamo/dynamo#1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is placed in the common section after the match statement on `request.prompt_input_type()`, meaning it applies to both `PromptInput::Tokens` and `PromptInput::Text` request types.
Applied to files:
lib/llm/src/preprocessor.rs
📚 Learning: 2025-08-25T22:04:45.179Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2700
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:19-28
Timestamp: 2025-08-25T22:04:45.179Z
Learning: The response_generator() method exists on multiple request types in the codebase: NvCreateChatCompletionRequest (for chat completions) and NvCreateCompletionRequest (for text completions). When making signature changes, it's important to distinguish between these different object types as they have separate implementations and call sites.
Applied to files:
lib/llm/src/preprocessor.rs
🧬 Code graph analysis (1)
lib/llm/src/preprocessor.rs (3)
- lib/llm/src/protocols/common/preprocessor.rs (2): builder (71-73), builder (108-110)
- lib/llm/src/protocols/openai/completions.rs (5): builder (223-225), annotations (102-106), nvext (86-88), nvext (134-136), nvext (195-197)
- lib/llm/src/protocols/openai.rs (2): nvext (43-43), nvext (53-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (8)
lib/parsers/src/reasoning/mod.rs (1)
120-120: LGTM: switched to captured identifier formatting in tracing macro

Using "Selected reasoning parser: {name}" is fine and keeps the message concise.
lib/llm/src/preprocessor.rs (7)
28-28: Import looks correct

Importing PreprocessedRequestBuilder from common::preprocessor aligns with the new builder flow.
160-165: Good refactor: centralized preprocess path via builder/apply_template/gather_tokens

This makes the flow explicit and easier to test. Returning both PreprocessedRequest and annotations is clear.
181-201: EOS/stop handling is sensible but verify desired semantics with ignore_eos=true
- You ensure model EOS IDs are included in hidden stop tokens.
- You only set builder.eos_token_ids when ignore_eos is false.
Double-check downstream behavior expects no eos_token_ids at all when ignore_eos=true (as opposed to passing them and letting the engine ignore). If the engine relies on absence to skip an early stop, this is correct; otherwise you may need to pass them and let the engine branch.
Do you want me to scan downstream consumers for how they interpret eos_token_ids and ignore_eos?
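For reference, the semantics in question reduce to a branch like this sketch; the builder method and surrounding plumbing are simplified and assumed, not the actual code.

```rust
// Sketch: model EOS ids always join the hidden stop-token set, but are only
// surfaced via builder.eos_token_ids(...) when the request does not set ignore_eos.
fn apply_eos_sketch(
    builder: &mut PreprocessedRequestBuilder,
    model_eos_ids: &[u32],
    hidden_stop_ids: &mut Vec<u32>,
    ignore_eos: bool,
) {
    for id in model_eos_ids {
        if !hidden_stop_ids.contains(id) {
            hidden_stop_ids.push(*id);
        }
    }
    if !ignore_eos {
        builder.eos_token_ids(model_eos_ids.to_vec());
    }
}
```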
213-245: Template gating matches PR goal (no chat template for completions)

apply_template only renders a template for single-text inputs and is invoked from the chat path. When nvext.use_raw_prompt is true but raw_prompt is missing, you warn and fall back to render, which is reasonable.
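A condensed view of that gating, as described above. The nvext accessors (use_raw_prompt as an Option<bool>, a raw_prompt() getter) and the render callback are assumptions about the shape of the code, not exact signatures.

```rust
// Sketch: use_raw_prompt short-circuits template rendering when a raw prompt
// is actually present; otherwise warn and fall back to rendering.
fn apply_template_sketch(
    request: &NvCreateChatCompletionRequest,
    render: impl Fn(&NvCreateChatCompletionRequest) -> anyhow::Result<String>,
) -> anyhow::Result<Option<String>> {
    let use_raw = request
        .nvext()
        .and_then(|ext| ext.use_raw_prompt) // assumed Option<bool> field
        .unwrap_or(false);
    if use_raw {
        if let Some(raw) = request.raw_prompt() {
            // assumed getter
            return Ok(Some(raw));
        }
        tracing::warn!("use_raw_prompt set but no raw prompt found; rendering template");
    }
    Ok(Some(render(request)?))
}
```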
612-615: Completion path now correctly bypasses templates; depends on gather_tokens fix

builder(...); gather_tokens(..., None) achieves “no chat template for completions.” After applying the gather_tokens patch above, single-text completions will tokenize the raw prompt safely.
456-468: Nice touch: attach LLM metrics as an annotation without clobbering existing events

The conditional set of event/comment only when empty avoids overriding error events.
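In sketch form, the guard amounts to the following; the event/comment slots on the annotated response and the annotation key are assumptions based on the description, not the real field names.

```rust
// Sketch: only attach the LLM-metrics annotation when the event/comment slots
// are empty, so a previously recorded error event is never overwritten.
fn attach_metrics_sketch(
    event: &mut Option<String>,
    comment: &mut Option<Vec<String>>,
    metrics_json: String,
) {
    if event.is_none() && comment.is_none() {
        *event = Some("llm_metrics".to_string()); // hypothetical annotation key
        *comment = Some(vec![metrics_json]);
    }
}
```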
612-618: Potential ISL undercount for batched inputs

response_generator.update_isl(common_request.token_ids.len() as u32) will be zero for batched requests (since token_ids is empty and batch_token_ids is set). If ISL matters for batching, you may need to sum the batch_token_ids lengths, or leave it at zero by design. Confirm the expected behavior.
I can add a small helper to compute input token count across both single and batch forms if needed.
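If that helper is wanted, something along these lines would cover both shapes. A sketch only: the token_ids / batch_token_ids field shapes are assumed from the comments above.

```rust
// Sketch: compute input sequence length for both single and batched requests.
// Assumes `token_ids: Vec<u32>` and `batch_token_ids: Option<Vec<Vec<u32>>>`.
fn input_token_count(token_ids: &[u32], batch_token_ids: Option<&Vec<Vec<u32>>>) -> u32 {
    match batch_token_ids {
        Some(batches) => batches.iter().map(|b| b.len() as u32).sum(),
        None => token_ids.len() as u32,
    }
}
```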
Force-pushed from 7a970be to f20d1e3
Signed-off-by: Graham King <[email protected]>
Signed-off-by: Graham King <[email protected]>
Because we are no longer applying the prompt template. Signed-off-by: Graham King <[email protected]>
Force-pushed from f20d1e3 to 2eaf6fe
/ok to test 2eaf6fe
Signed-off-by: Graham King <[email protected]>
Signed-off-by: Graham King <[email protected]> Signed-off-by: nnshah1 <[email protected]>
Summary by CodeRabbit
New Features
Refactor
Chores