
Conversation

Contributor

@qimcis qimcis commented Aug 18, 2025

Overview:

Aligns OpenAI response IDs with distributed trace IDs

Details:

Replaces random UUID generation with the trace ID carried by the request context, so that OpenAI API response IDs (chatcmpl-*, cmpl-*) match the distributed tracing identifiers.
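
A minimal sketch of the idea (names are illustrative, not the repo's actual code):

    // Before: each response minted a fresh random ID, unrelated to the trace.
    // let id = format!("chatcmpl-{}", uuid::Uuid::new_v4());

    // After (sketch): reuse the ID already carried by the request context, so the
    // OpenAI response ID and the distributed trace ID are the same string.
    fn make_response_id(ctx_id: &str) -> String {
        format!("chatcmpl-{ctx_id}")
    }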

Where should the reviewer start?

  • lib/llm/src/protocols/openai/chat_completions/delta.rs: New request_id parameter in response_generator()
  • lib/llm/src/http/service/openai.rs: Removed UUID generation, using request.id()
  • lib/llm/src/engines.rs: Updated response generator calls with context IDs

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

  • New Features

    • Responses now use consistent, context-derived IDs (chatcmpl-/cmpl-) for better traceability.
    • Embeddings requests honor incoming headers to derive request IDs, improving request correlation.
  • Refactor

    • Centralized per-request ID handling across engines and services.
    • Logging aligned to use a single request identifier for clearer observability.
  • Tests

    • Updated tests to pass context-based IDs to response generators.
  • Style

    • Minor whitespace formatting adjustment.


copy-pr-bot bot commented Aug 18, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


👋 Hi qimcis! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added the external-contribution (Pull request is from an external contributor) and feat labels Aug 18, 2025
@qimcis
Contributor Author

qimcis commented Aug 18, 2025

Thank you for reviewing, and please let me know if I misunderstood the feature request at all! CC: @rmccorm4

Contributor

coderabbitai bot commented Aug 18, 2025

Walkthrough

Per-request identification is refactored to propagate context-derived IDs through HTTP handlers, preprocessors, engines, and OpenAI delta generators. Delta generator APIs now accept a request_id string, and logging uses a MistralRS-specific request ID. The embeddings HTTP handler signature now includes headers for request ID derivation. One Python file has a cosmetic change.

Changes

Cohort / File(s) Summary
Request ID propagation in engines
lib/engines/mistralrs/src/lib.rs
Centralizes per-request identification: introduces mistralrs_request_id for logging and NormalRequest, derives OpenAI-style IDs from ctx.id(), updates warmup and response wiring to use context-based IDs.
Echo engine wiring
lib/llm/src/engines.rs
Initializes ctx earlier and passes ctx.id().to_string() into request.response_generator(...) for chat/completions paths.
HTTP service ID derivation
lib/llm/src/http/service/openai.rs
Completions now reuse request.id(); embeddings handler signature adds HeaderMap, derives request_id via headers/tracing, and sets Context with that ID.
Preprocessor context-aware generator
lib/llm/src/preprocessor.rs
response_generator is called with context.id().to_string() for both chat and completion requests; downstream flow unchanged.
OpenAI chat delta generator API
lib/llm/src/protocols/openai/chat_completions/delta.rs
Adds logging import; NvCreateChatCompletionRequest::response_generator(request_id: String); DeltaGenerator::new(..., request_id: String); IDs now chatcmpl-<request_id>.
OpenAI completion delta generator API
lib/llm/src/protocols/openai/completions/delta.rs
response_generator(request_id: String); DeltaGenerator::new(..., request_id: String); IDs now cmpl-<request_id>; adds logging import.
Tests updated for context ID
lib/llm/tests/http-service.rs
CounterEngine path passes ctx.id().to_string() into response_generator; streaming assertions unchanged.
Cosmetic formatting
components/backends/sglang/src/dynamo/sglang/common/sgl_utils.py
Trailing spaces added on a list initialization line; no semantic change.
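
The embeddings change above is described as deriving the request ID from incoming headers. A rough sketch of that pattern, assuming an x-request-id header and a UUID fallback (neither is confirmed to be the repo's exact logic):

    use axum::http::HeaderMap;
    use uuid::Uuid;

    // Prefer an inbound x-request-id header so upstream trace IDs line up;
    // fall back to a freshly generated UUID when the client supplies none.
    fn derive_request_id(headers: &HeaderMap) -> String {
        headers
            .get("x-request-id")
            .and_then(|v| v.to_str().ok())
            .map(str::to_owned)
            .unwrap_or_else(|| Uuid::new_v4().to_string())
    }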

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant HTTP Service
  participant Preprocessor
  participant Engine
  participant Runtime
  participant DeltaGen

  Client->>HTTP Service: Request (headers)
  HTTP Service->>HTTP Service: Derive request_id (tracing/headers or context)
  HTTP Service->>Preprocessor: Request + Context(request_id)
  Preprocessor->>DeltaGen: response_generator(request_id)
  Preprocessor->>Engine: NormalRequest(id = mistralrs_request_id)
  Engine->>Runtime: Execute (id = mistralrs_request_id)
  Runtime-->>Engine: Tokens/Finish
  Engine-->>DeltaGen: Stream signals
  DeltaGen-->>HTTP Service: OpenAI-formatted chunks (id = chatcmpl-/cmpl-<request_id>)
  HTTP Service-->>Client: Streamed response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Poem

A thump of a paw, a tick of an ID,
I hop with a context, consistent and free.
From headers to engines the numbers align,
chatcmpl carrots in a tidy line.
Streams burble softly—trace me, you’ll see—
Request by request, I log where I be. 🥕


Copy link
Contributor

coderabbitai bot commented Aug 18, 2025

Walkthrough

Introduces per-request ID propagation across OpenAI chat/completions paths: response_generator now accepts a request/context-derived ID, constructors updated to use deterministic IDs (chatcmpl-<request_id>/cmpl-<request_id>), call sites adjusted in engines, preprocessor, HTTP service, and tests. One unrelated formatting change in a Python utility.

Changes

Cohort / File(s) Summary
OpenAI DeltaGenerator API
lib/llm/src/protocols/openai/chat_completions/delta.rs, lib/llm/src/protocols/openai/completions/delta.rs
Response generator now requires a request_id String; DeltaGenerator::new gains request_id parameter; IDs switch from UUIDs to chatcmpl-<request_id>/cmpl-<request_id>; minor imports added.
Engines and call sites
lib/engines/mistralrs/src/lib.rs, lib/llm/src/engines.rs
Replace internal/UUID IDs with ctx.id().to_string(); pass Some(ctx.id().to_string()) or id String into response_generator across chat/completion paths; adjust last user message extraction in chat path.
Preprocessor
lib/llm/src/preprocessor.rs
OpenAIPreprocessor now calls response_generator with context-derived id for both chat and completion requests.
HTTP service (OpenAI)
lib/llm/src/http/service/openai.rs
Completions/embeddings use existing request.id() instead of generating new UUIDs; embeddings path no longer wraps request with a new Context for id.
Tests
lib/llm/tests/http-service.rs
Tests updated to pass optional/context id to response_generator.
Formatting only
components/backends/sglang/src/dynamo/sglang/common/sgl_utils.py
Trailing spaces added; no behavioral change.
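
In rough terms, the delta-generator API moves from minting an ID internally to accepting one from the caller. A simplified sketch (the real constructor also takes the model name and generator options):

    pub struct DeltaGenerator {
        id: String, // e.g. "chatcmpl-<request_id>"; other fields elided
    }

    impl DeltaGenerator {
        // Before: the constructor generated "chatcmpl-<uuid>" internally.
        // After (sketch): the caller threads the context-derived request_id
        // through, so every chunk in the stream carries the trace-aligned ID.
        pub fn new(request_id: String) -> Self {
            Self {
                id: format!("chatcmpl-{request_id}"),
            }
        }
    }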

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant HTTP as HTTP Service (OpenAI)
  participant Pre as Preprocessor
  participant Eng as Engine
  participant DG as DeltaGenerator

  Client->>HTTP: Request (chat/completions)
  HTTP->>HTTP: Derive request_id = request.id()
  HTTP->>Pre: Forward request + request_id/context
  Pre->>Eng: Build request, pass request_id
  Eng->>DG: response_generator(request_id)
  DG-->>Eng: Stream deltas (id: chatcmpl-<id>/cmpl-<id>)
  Eng-->>HTTP: Tokens / finish signals
  HTTP-->>Client: Streamed response (with deterministic id)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Poem

A hop, a hop, a thread-bound gleam,
I tag each token in the stream.
chatcmpl tails, cmpl trails—so neat!
IDs in lockstep, thump-thump beat.
With every nibble, I align—
A rabbit’s trace, deterministic, fine. 🐇✨


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (6)
lib/engines/mistralrs/src/lib.rs (2)

391-393: Typo in log message: “Unknow” → “Unknown”

Minor nit, but worth fixing in logs.

Apply this diff:

-                                tracing::warn!(request_id, stop_reason = s, "Unknow stop reason");
+                                tracing::warn!(request_id, stop_reason = s, "Unknown stop reason");

Also applies to: 587-589


592-595: Propagate finish_reason into the streamed completion choice

You compute finish_reason but don't pass it to create_choice, so clients always see None in the SSE chunks. Pass it through.

Apply this diff:

-                        let inner = response_generator.create_choice(0, Some(from_assistant), None, None);
+                        let inner = response_generator.create_choice(0, Some(from_assistant), finish_reason, None);
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)

21-21: Remove unused import

The logging import isn’t used in this module.

-use dynamo_runtime::logging;
lib/llm/src/protocols/openai/completions/delta.rs (3)

18-18: Remove unused logging import.

use dynamo_runtime::logging; is not used in this file and will trigger an unused import warning (or error under deny-warnings). Please remove it.

-use dynamo_runtime::logging;

23-30: API change looks good; consider accepting Into<String> for ergonomics.

The new response_generator(&self, request_id: String) matches the PR goal. Consider accepting impl Into<String> to avoid forcing callers to allocate when they already have a stringy type; this also aligns better with flexible call sites.

-    pub fn response_generator(&self, request_id: String) -> DeltaGenerator {
+    pub fn response_generator(&self, request_id: impl Into<String>) -> DeltaGenerator {
         let options = DeltaGeneratorOptions {
             enable_usage: true,
             enable_logprobs: self.inner.logprobs.unwrap_or(0) > 0,
         };
-
-        DeltaGenerator::new(self.inner.model.clone(), options, request_id)
+        let request_id = request_id.into();
+        DeltaGenerator::new(self.inner.model.clone(), options, &request_id)
     }

Note: This pairs with the new(..., request_id: &str) suggestion below.


51-52: Prefer &str to avoid an extra allocation in new.

new(model, options, request_id: &str) avoids an unnecessary String move and makes it usable with both String and &str callers.

-    pub fn new(model: String, options: DeltaGeneratorOptions, request_id: String) -> Self {
+    pub fn new(model: String, options: DeltaGeneratorOptions, request_id: &str) -> Self {
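
As a standalone illustration of the ergonomic difference (hypothetical helper functions, not repo code):

    fn takes_string(id: String) -> String { format!("cmpl-{id}") }
    fn takes_into(id: impl Into<String>) -> String { format!("cmpl-{}", id.into()) }

    fn main() {
        let ctx_id = "abc123";
        // A `String` parameter forces the caller to allocate explicitly:
        let a = takes_string(ctx_id.to_string());
        // `impl Into<String>` accepts &str and String alike:
        let b = takes_into(ctx_id);
        assert_eq!(a, b);
    }
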
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 844f881 and 88eecce.

📒 Files selected for processing (8)
  • components/backends/sglang/src/dynamo/sglang/common/sgl_utils.py (1 hunks)
  • lib/engines/mistralrs/src/lib.rs (3 hunks)
  • lib/llm/src/engines.rs (2 hunks)
  • lib/llm/src/http/service/openai.rs (2 hunks)
  • lib/llm/src/preprocessor.rs (2 hunks)
  • lib/llm/src/protocols/openai/chat_completions/delta.rs (3 hunks)
  • lib/llm/src/protocols/openai/completions/delta.rs (3 hunks)
  • lib/llm/tests/http-service.rs (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (6)
lib/llm/tests/http-service.rs (2)
lib/bindings/python/examples/openai_service/server.py (1)
  • generator (36-55)
lib/llm/src/block_manager/storage/cuda.rs (1)
  • ctx (412-414)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
lib/llm/src/protocols/openai/completions/delta.rs (2)
  • response_generator (23-30)
  • new (51-82)
lib/llm/src/preprocessor.rs (2)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
  • response_generator (32-40)
lib/llm/src/protocols/openai/completions/delta.rs (1)
  • response_generator (23-30)
lib/llm/src/protocols/openai/completions/delta.rs (1)
lib/llm/src/protocols/openai/chat_completions/delta.rs (2)
  • response_generator (32-40)
  • new (85-116)
lib/engines/mistralrs/src/lib.rs (2)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
  • response_generator (32-40)
lib/llm/src/protocols/openai/completions/delta.rs (1)
  • response_generator (23-30)
lib/llm/src/engines.rs (1)
lib/llm/src/block_manager/storage/cuda.rs (1)
  • ctx (412-414)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/2496/merge) by qimcis.
components/backends/sglang/src/dynamo/sglang/common/sgl_utils.py

[error] 68-68: Black formatting failed during pre-commit (pre-commit run --show-diff-on-failure --color=always --all-files). 1 file reformatted: components/backends/sglang/src/dynamo/sglang/common/sgl_utils.py. Re-run pre-commit to apply changes.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
🔇 Additional comments (6)
lib/engines/mistralrs/src/lib.rs (1)

543-546: Confirm NormalRequest::id is a String

It looks like NormalRequest is imported from the external mistralrs crate, and you’re assigning

id: ctx.id().to_string()

in lib/engines/mistralrs/src/lib.rs (lines 543-546). Previously, next_request_id() was used here, so please:

  • Verify that mistralrs::NormalRequest::id is indeed a String.
  • If it expects a different type (e.g., a numeric ID), either revert to next_request_id() or adjust the conversion accordingly to avoid compile-time or runtime mismatches.
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)

103-107: LGTM: Deterministic chatcmpl-<request_id>

Using a stable chatcmpl-<request_id> aligns with the PR goal and improves traceability across systems.

lib/llm/src/http/service/openai.rs (1)

218-221: Non-functional note: request_id capture for Completions is consistent

Capturing request_id from Context and threading it into annotations aligns with the PR objective.

lib/llm/src/protocols/openai/completions/delta.rs (1)

71-75: ID format change to cmpl-<request_id> is aligned with the PR objective.

Using a deterministic ID derived from the context/request matches the distributed tracing requirement.

lib/llm/src/preprocessor.rs (2)

496-499: Propagating context ID to the generator: LGTM.

Passing context.id().to_string() into response_generator correctly aligns completion IDs with the distributed trace.


554-556: Same here: LGTM.

Consistently propagates the trace/request ID for standard completions as well.

@qimcis qimcis force-pushed the request-id-alignment-2248 branch from 35ead1d to 25975f7 on August 19, 2025 05:04
@grahamking
Contributor

Thanks @qimcis !

Contributor

@nnshah1 nnshah1 left a comment


LGTM - thanks for picking this up!

@grahamking
Contributor

We upgraded Rust to edition 2024 on Friday which caused a lot of churn. I moved your commits over to #2695 and rebased. You still own the commits. Closing this one.

@grahamking grahamking closed this Aug 25, 2025
@grahamking
Contributor

Humm, but I think if I merge the other PR I will get the credit, not you. Do you want to copy from branch pr-2496 over to here, or use it as a guideline to rebase? Then it should be ready to merge.

@grahamking grahamking reopened this Aug 25, 2025
@grahamking grahamking requested a review from hhzhang16 as a code owner August 25, 2025 19:25
@qimcis qimcis force-pushed the request-id-alignment-2248 branch from 9008fe8 to aeffc25 on August 26, 2025 05:48
@pull-request-size pull-request-size bot added size/L and removed size/M labels Aug 26, 2025
@qimcis
Contributor Author

qimcis commented Aug 26, 2025

Should be good to go now!! Let me know if there's anything else to fix

@grahamking
Contributor

Should be good to go now!! Let me know if there's anything else to fix

Excellent. Looks like a few more clippy to go.

@grahamking
Contributor

Now cargo fmt

@grahamking grahamking merged commit a485ab7 into ai-dynamo:main Aug 26, 2025
11 checks passed
hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025
ayushag-nv pushed a commit that referenced this pull request Aug 27, 2025
jasonqinzhou pushed a commit that referenced this pull request Aug 30, 2025
KrishnanPrash pushed a commit that referenced this pull request Sep 2, 2025
nnshah1 pushed a commit that referenced this pull request Sep 8, 2025