
Conversation

@grahamking (Contributor) commented Aug 26, 2025

Summary by CodeRabbit

  • New Features

    • Enhanced prompt templating with support for raw prompts or rendered templates.
    • Unified preprocessing across chat and completion requests for consistent behavior.
  • Refactor

    • Centralized stop/EOS handling and tokenization into a single flow, improving reliability and batch performance.
    • Streamlined generation paths to produce a common preprocessed request.
  • Chores

    • Reduced runtime logging in reasoning parsing to cut noise.
    • Clearer log messages when selecting or falling back to a reasoning parser.

coderabbitai bot (Contributor) commented Aug 26, 2025

Walkthrough

Introduces a builder-based preprocessing flow in OpenAIPreprocessor, adding builder/apply_template/gather_tokens methods and refactoring request handling to produce PreprocessedRequest via PreprocessedRequestBuilder. Centralizes EOS/stop handling and backend_instance_id extraction. Removes debug logging in BasicReasoningParser and adjusts logging string interpolation in reasoning::mod without changing behavior.
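As a quick orientation, here is a minimal, self-contained Rust sketch of that three-step builder path. Every type, method body, and the toy tokenizer below is an illustrative stand-in, not the repository's actual implementation, which lives in lib/llm/src/preprocessor.rs and lib/llm/src/protocols/common/preprocessor.rs.

use std::collections::HashMap;

// All names below are stand-ins sketched from the walkthrough above.
#[derive(Default)]
struct PreprocessedRequestBuilder {
    token_ids: Vec<u32>,
    eos_token_ids: Vec<u32>,
}

struct PreprocessedRequest {
    token_ids: Vec<u32>,
    eos_token_ids: Vec<u32>,
}

impl PreprocessedRequestBuilder {
    fn build(self) -> PreprocessedRequest {
        PreprocessedRequest {
            token_ids: self.token_ids,
            eos_token_ids: self.eos_token_ids,
        }
    }
}

struct PreprocessorSketch {
    model_eos_ids: Vec<u32>,
}

impl PreprocessorSketch {
    // Step 1: seed a builder with request-independent defaults (EOS ids here;
    // the real code also extracts nvext fields such as backend_instance_id).
    fn builder(&self) -> PreprocessedRequestBuilder {
        PreprocessedRequestBuilder {
            eos_token_ids: self.model_eos_ids.clone(),
            ..Default::default()
        }
    }

    // Step 2 (chat path only): render the chat template for the request.
    fn apply_template(&self, raw: &str) -> Option<String> {
        Some(format!("<|user|>{raw}<|assistant|>"))
    }

    // Step 3: tokenize whichever prompt was chosen and record annotations
    // (formatted prompt, token ids) for the caller.
    fn gather_tokens(
        &self,
        builder: &mut PreprocessedRequestBuilder,
        prompt: &str,
    ) -> HashMap<String, String> {
        let token_ids: Vec<u32> = prompt.bytes().map(u32::from).collect(); // toy tokenizer
        let mut annotations = HashMap::new();
        annotations.insert("formatted_prompt".to_string(), prompt.to_string());
        builder.token_ids = token_ids;
        annotations
    }

    // Chat passes the rendered template; completions pass None and tokenize raw text.
    fn preprocess(&self, raw: &str, is_chat: bool) -> (PreprocessedRequest, HashMap<String, String>) {
        let mut builder = self.builder();
        let formatted = if is_chat { self.apply_template(raw) } else { None };
        let prompt = formatted.as_deref().unwrap_or(raw);
        let annotations = self.gather_tokens(&mut builder, prompt);
        (builder.build(), annotations)
    }
}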

Changes

  • LLM Preprocessing Refactor (lib/llm/src/preprocessor.rs): Adds PreprocessedRequestBuilder integration. Introduces builder, apply_template, and gather_tokens. Refactors the preprocess paths for NvCreateChatCompletionRequest and NvCreateCompletionRequest to use the builder flow. Centralizes EOS/stop handling, nvext backend_instance_id extraction, and tokenization based on formatted prompts.
  • Reasoning Parser Logging Cleanup (lib/parsers/src/reasoning/base_parser.rs): Removes the tracing alias and all debug log calls in BasicReasoningParser methods; logic unchanged.
  • Reasoning Module Logging Format (lib/parsers/src/reasoning/mod.rs): Updates logging macros to use inline named placeholders ({name}) for the selected/unknown parser messages; control flow unchanged.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Caller
  participant OAI as OpenAIPreprocessor
  participant B as PreprocessedRequestBuilder
  participant T as Tokenizer/Template

  Caller->>OAI: preprocess (chat/completion request)
  OAI->>OAI: builder(request)
  OAI->>T: apply_template(request)
  T-->>OAI: formatted_prompt?
  OAI->>B: gather_tokens(request, builder, formatted_prompt)
  Note over OAI,B: Centralized EOS/stop handling and nvext extraction
  OAI->>B: build()
  B-->>OAI: PreprocessedRequest (+annotations)
  OAI-->>Caller: PreprocessedRequest

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45–75 minutes

Poem

I nibbled through tokens, neat and small,
Built a burrow—now a Builder for all.
EOS seeds gathered, prompts aligned,
Logs grew quiet, the paths refined.
Thump-thump! I preprocess with cheer—
A rabbit’s flow, crisp and clear. 🥕✨


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/llm/src/preprocessor.rs (1)

247-320: Bug: gather_tokens panics for text-single completions (formatted_prompt is None)

In NvCreateCompletionRequest.generate you call:

  • let annotations = self.gather_tokens(&request, &mut builder, None)?;

But in TextInput::Single you do:

  • let formatted_prompt = formatted_prompt.expect("...unreachable");

That makes completions with a single text prompt panic. Use the provided formatted_prompt when present (chat), else the raw text (completions).

Apply this focused refactor:

-                        TextInput::Single(_) => {
-                            let formatted_prompt = formatted_prompt.expect("Could not find a prompt. The paired match statements earlier should make this unreachable");
-                            let encoding = self.tokenizer.encode(&formatted_prompt)?;
-
-                            if request.has_annotation(ANNOTATION_FORMATTED_PROMPT) {
-                                annotations.insert(
-                                    ANNOTATION_FORMATTED_PROMPT.to_string(),
-                                    formatted_prompt,
-                                );
-                            }
+                        TextInput::Single(text) => {
+                            // Use chat-formatted prompt when provided; otherwise use the raw text
+                            let prompt_for_tokenization = formatted_prompt.unwrap_or(text);
+                            let encoding = self.tokenizer.encode(&prompt_for_tokenization)?;
+
+                            if request.has_annotation(ANNOTATION_FORMATTED_PROMPT) {
+                                annotations.insert(
+                                    ANNOTATION_FORMATTED_PROMPT.to_string(),
+                                    prompt_for_tokenization.clone(),
+                                );
+                            }
 
                             if request.has_annotation(ANNOTATION_TOKEN_IDS) {
                                 annotations.insert(
                                     ANNOTATION_TOKEN_IDS.to_string(),
                                     serde_json::to_string(encoding.token_ids())?,
                                 );
                             }
 
                             builder.token_ids(encoding.token_ids().to_vec());
                         }

Optional: For batch text inputs, consider using encode_batch to avoid per-item overhead and potential thread-safety concerns with parallel encode calls:

-                        TextInput::Batch(texts) => {
-                            let token_batches: Vec<Vec<u32>> = texts
-                                .par_iter()
-                                .map(|text| {
-                                    self.tokenizer
-                                        .encode(text)
-                                        .map(|encoded| encoded.token_ids().to_vec())
-                                })
-                                .collect::<Result<Vec<_>>>()?;
-                            builder.batch_token_ids(Some(token_batches));
-                            builder.token_ids(vec![]);
-                        }
+                        TextInput::Batch(texts) => {
+                            // Synchronous path; preprocess_request is not async, so blocking is OK here.
+                            let encodings = self
+                                .tokenizer
+                                .encode_batch(&texts.iter().map(|s| s.as_str()).collect::<Vec<_>>())?;
+                            let token_batches = encodings
+                                .into_iter()
+                                .map(|e| e.token_ids().to_vec())
+                                .collect::<Vec<_>>();
+                            builder.batch_token_ids(Some(token_batches));
+                            builder.token_ids(vec![]);
+                        }

Note: If encode_batch is not available or has different signature, ignore this optional diff.

🧹 Nitpick comments (4)
lib/parsers/src/reasoning/mod.rs (1)

127-128: Minor: consider structured fields for better observability

Using structured fields helps with log filtering and avoids string formatting costs. Optional change:

- tracing::warn!(
-     "Unknown reasoning parser type '{name}', falling back to Basic Reasoning Parser",
- );
+ tracing::warn!(
+     parser_name = %name,
+     "Unknown reasoning parser type, falling back to Basic Reasoning Parser",
+ );

Similarly, the debug above could be:

- tracing::debug!("Selected reasoning parser: {name}");
+ tracing::debug!(parser_name = %name, "Selected reasoning parser");
lib/llm/src/preprocessor.rs (3)

566-573: Annotations emission pathway looks good

Converting HashMap<String, String> into Annotated events before chaining works. Consider adding token_ids for batch text/token inputs when requested, mirroring the embeddings path. Optional.

Also applies to: 623-627
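For illustration, here is a hedged sketch of that conversion, using a simplified stand-in for the Annotated event type (the real type's fields may differ):

use std::collections::HashMap;

// Simplified stand-in for the real Annotated event type; field names are assumptions.
struct Annotated<T> {
    data: Option<T>,
    event: Option<String>,
    comment: Option<Vec<String>>,
}

// Turn the preprocessor's annotation map into events that can be chained
// ahead of the response stream.
fn annotations_to_events(
    annotations: HashMap<String, String>,
) -> impl Iterator<Item = Annotated<String>> {
    annotations.into_iter().map(|(name, value)| Annotated {
        data: None,
        event: Some(name),
        comment: Some(vec![value]),
    })
}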


335-355: Consistency: embeddings path uses encode_batch via spawn_blocking; consider aligning

Embeddings use spawn_blocking + encode_batch. For large chat/completion batches, using encode_batch (synchronously) will likely be faster and simpler than rayon-parallel per-item encodes. See optional diff in the gather_tokens comment.
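For reference, a minimal sketch of that embeddings-style pattern, using stand-in tokenizer types and assuming tokio and anyhow are available; the real encode_batch signature may differ:

use std::sync::Arc;

// Stand-in tokenizer types; the real tokenizer and its encode_batch API may differ.
struct Tokenizer;
struct Encoding(Vec<u32>);

impl Tokenizer {
    fn encode_batch(&self, texts: &[&str]) -> anyhow::Result<Vec<Encoding>> {
        Ok(texts
            .iter()
            .map(|t| Encoding(t.bytes().map(u32::from).collect()))
            .collect())
    }
}

// Move the blocking batch tokenization onto tokio's blocking pool, mirroring
// how the embeddings path is described above.
async fn encode_texts(
    tokenizer: Arc<Tokenizer>,
    texts: Vec<String>,
) -> anyhow::Result<Vec<Vec<u32>>> {
    let encodings = tokio::task::spawn_blocking(move || {
        let refs: Vec<&str> = texts.iter().map(String::as_str).collect();
        tokenizer.encode_batch(&refs)
    })
    .await??;
    Ok(encodings.into_iter().map(|e| e.0).collect())
}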


56-64: Naming nit: ANNOTATION_* constants are fine; unify with other modules if any variations exist

If other modules already define or consume these keys, ensure we’re not drifting on naming (e.g., "formatted_prompt" vs "formattedPrompt"). If drift exists, add a small adapter or constants in a shared place.

Would you like me to scan the repo for other ANNOTATION_* keys and flag inconsistencies?

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 36df35e and 7a970be.

⛔ Files ignored due to path filters (1)
  • lib/bindings/python/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • lib/llm/src/preprocessor.rs (5 hunks)
  • lib/parsers/src/reasoning/base_parser.rs (0 hunks)
  • lib/parsers/src/reasoning/mod.rs (1 hunks)
💤 Files with no reviewable changes (1)
  • lib/parsers/src/reasoning/base_parser.rs
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-06-24T20:59:35.725Z
Learnt from: ishandhanani
PR: ai-dynamo/dynamo#1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is placed in the common section after the match statement on `request.prompt_input_type()`, meaning it applies to both `PromptInput::Tokens` and `PromptInput::Text` request types.

Applied to files:

  • lib/llm/src/preprocessor.rs
📚 Learning: 2025-08-25T22:04:45.179Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2700
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:19-28
Timestamp: 2025-08-25T22:04:45.179Z
Learning: The response_generator() method exists on multiple request types in the codebase: NvCreateChatCompletionRequest (for chat completions) and NvCreateCompletionRequest (for text completions). When making signature changes, it's important to distinguish between these different object types as they have separate implementations and call sites.

Applied to files:

  • lib/llm/src/preprocessor.rs
🧬 Code graph analysis (1)
lib/llm/src/preprocessor.rs (3)
lib/llm/src/protocols/common/preprocessor.rs (2)
  • builder (71-73)
  • builder (108-110)
lib/llm/src/protocols/openai/completions.rs (5)
  • builder (223-225)
  • annotations (102-106)
  • nvext (86-88)
  • nvext (134-136)
  • nvext (195-197)
lib/llm/src/protocols/openai.rs (2)
  • nvext (43-43)
  • nvext (53-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (8)
lib/parsers/src/reasoning/mod.rs (1)

120-120: LGTM: switched to captured identifier formatting in tracing macro

Using "Selected reasoning parser: {name}" is fine and keeps the message concise.

lib/llm/src/preprocessor.rs (7)

28-28: Import looks correct

Importing PreprocessedRequestBuilder from common::preprocessor aligns with the new builder flow.


160-165: Good refactor: centralized preprocess path via builder/apply_template/gather_tokens

This makes the flow explicit and easier to test. Returning both PreprocessedRequest and annotations is clear.


181-201: EOS/stop handling is sensible but verify desired semantics with ignore_eos=true

  • You ensure model EOS IDs are included in hidden stop tokens.
  • You only set builder.eos_token_ids when ignore_eos is false.

Double-check that downstream consumers expect no eos_token_ids at all when ignore_eos=true (as opposed to receiving them and letting the engine ignore them). If the engine relies on absence to skip an early stop, this is correct; otherwise you may need to pass them and let the engine branch.

Do you want me to scan downstream consumers for how they interpret eos_token_ids and ignore_eos?
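For reference, a small sketch of the gating as described in the two bullets above; all names are assumptions, and the final comment reflects the "absence skips early stop" reading being questioned here:

// Sketch of the described preprocessor-side gating; names are assumptions.
fn set_eos_fields(
    model_eos_ids: &[u32],
    ignore_eos: bool,
    hidden_stop_ids: &mut Vec<u32>,
    builder_eos_token_ids: &mut Option<Vec<u32>>,
) {
    // Model EOS ids always join the hidden stop set.
    for id in model_eos_ids {
        if !hidden_stop_ids.contains(id) {
            hidden_stop_ids.push(*id);
        }
    }
    // eos_token_ids is populated only when the client did not set ignore_eos;
    // downstream is assumed to treat its absence as "skip EOS-based early stop".
    if !ignore_eos {
        *builder_eos_token_ids = Some(model_eos_ids.to_vec());
    }
}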


213-245: Template gating matches PR goal (no chat template for completions)

apply_template only renders a template for single-text inputs and is invoked from the chat path. When nvext.use_raw_prompt is true but raw_prompt is missing, you warn and fall back to rendering the template, which is reasonable.
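A compact sketch of that fallback logic, as a hypothetical free-standing helper (the real method works directly on the request and nvext types):

// Hypothetical helper mirroring the described use_raw_prompt fallback.
fn choose_prompt(
    use_raw_prompt: bool,
    raw_prompt: Option<String>,
    render_template: impl FnOnce() -> String,
) -> String {
    match (use_raw_prompt, raw_prompt) {
        // Caller asked for the raw prompt and supplied one: skip the chat template.
        (true, Some(raw)) => raw,
        // Asked for raw but none was provided: warn and fall back to rendering.
        (true, None) => {
            tracing::warn!("use_raw_prompt set but no raw_prompt provided; rendering template");
            render_template()
        }
        // Default chat path: render the template.
        _ => render_template(),
    }
}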


612-615: Completion path now correctly bypasses templates; depends on gather_tokens fix

builder(...); gather_tokens(..., None) achieves “no chat template for completions.” After applying the gather_tokens patch above, single-text completions will tokenize the raw prompt safely.


456-468: Nice touch: attach LLM metrics as an annotation without clobbering existing events

Setting event/comment only when they are empty avoids overriding error events.


612-618: Potential ISL undercount for batched inputs

response_generator.update_isl(common_request.token_ids.len() as u32) will record an ISL of zero for batched requests (since token_ids is empty and batch_token_ids is set). If ISL matters for batching, you may need to sum the batch_token_ids lengths or leave it as zero by design. Confirm the expected behavior.

I can add a small helper to compute input token count across both single and batch forms if needed.
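For example, a small helper along those lines; the field names are assumed to match the PreprocessedRequest fields mentioned above:

// Count input tokens whether the request carries a single token_ids vec
// or per-item batch_token_ids.
fn input_token_count(token_ids: &[u32], batch_token_ids: Option<&[Vec<u32>]>) -> usize {
    match batch_token_ids {
        Some(batches) if !batches.is_empty() => batches.iter().map(Vec::len).sum(),
        _ => token_ids.len(),
    }
}

Assuming batch_token_ids is an Option<Vec<Vec<u32>>>, this could be called as input_token_count(&common_request.token_ids, common_request.batch_token_ids.as_deref()) before update_isl.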

Because we are no longer applying the prompt template.

Signed-off-by: Graham King <[email protected]>
@grahamking force-pushed the gk-completions-no-template branch from f20d1e3 to 2eaf6fe on September 2, 2025 16:35

copy-pr-bot bot commented Sep 2, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@alec-flowers (Contributor) commented:

/ok to test 2eaf6fe

@alec-flowers merged commit 2422b83 into main Sep 2, 2025
14 of 15 checks passed
@alec-flowers deleted the gk-completions-no-template branch September 2, 2025 18:37
dillon-cullinan pushed a commit that referenced this pull request Sep 5, 2025
nnshah1 pushed a commit that referenced this pull request Sep 8, 2025