feat: align OpenAI response IDs with distributed trace IDs #2496
Conversation
👋 Hi qimcis! Thank you for contributing to ai-dynamo/dynamo.
Thank you for reviewing, and please let me know if I misunderstood the feature request at all! CC: @rmccorm4
Walkthrough
Per-request identification is refactored to propagate context-derived IDs through HTTP handlers, preprocessors, engines, and OpenAI delta generators. Delta generator APIs now accept a request_id string, and logging uses a MistralRS-specific request ID. The embeddings HTTP handler signature now includes headers for request ID derivation. One Python file has a cosmetic change.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant HTTP as HTTP Service
    participant Pre as Preprocessor
    participant Eng as Engine
    participant RT as Runtime
    participant DG as DeltaGen

    Client->>HTTP: Request (headers)
    HTTP->>HTTP: Derive request_id (tracing/headers or context)
    HTTP->>Pre: Request + Context(request_id)
    Pre->>DG: response_generator(request_id)
    Pre->>Eng: NormalRequest(id = mistralrs_request_id)
    Eng->>RT: Execute (id = mistralrs_request_id)
    RT-->>Eng: Tokens/Finish
    Eng-->>DG: Stream signals
    DG-->>HTTP: OpenAI-formatted chunks (id = chatcmpl-/cmpl-<request_id>)
    HTTP-->>Client: Streamed response
```
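To make the diagrammed flow concrete, here is a minimal, runnable Rust sketch of the ID threading described above. `Context`, `DeltaGenerator`, and `response_generator` are simplified stand-ins for illustration, not the crate's actual types or signatures.

```rust
// Hedged sketch: a context-derived request_id flows from the request context
// into the delta generator, which prefixes it to form the OpenAI response ID.
struct Context {
    request_id: String,
}

impl Context {
    fn id(&self) -> &str {
        &self.request_id
    }
}

struct DeltaGenerator {
    id: String,
}

impl DeltaGenerator {
    // Mirrors the deterministic `chatcmpl-<request_id>` scheme from the PR.
    fn new(request_id: String) -> Self {
        Self { id: format!("chatcmpl-{request_id}") }
    }
}

fn response_generator(ctx: &Context) -> DeltaGenerator {
    DeltaGenerator::new(ctx.id().to_string())
}

fn main() {
    let ctx = Context { request_id: "trace-4f2a9c".to_string() };
    let generator = response_generator(&ctx);
    // Every streamed chunk for this request now carries the trace-aligned ID.
    assert_eq!(generator.id, "chatcmpl-trace-4f2a9c");
}
```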
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Walkthrough
Introduces per-request ID propagation across OpenAI chat/completions paths: response_generator now accepts a request/context-derived ID, constructors are updated to use deterministic IDs (chatcmpl-<request_id>/cmpl-<request_id>), and call sites are adjusted in engines, the preprocessor, the HTTP service, and tests. One unrelated formatting change in a Python utility.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant HTTP as HTTP Service (OpenAI)
    participant Pre as Preprocessor
    participant Eng as Engine
    participant DG as DeltaGenerator

    Client->>HTTP: Request (chat/completions)
    HTTP->>HTTP: Derive request_id = request.id()
    HTTP->>Pre: Forward request + request_id/context
    Pre->>Eng: Build request, pass request_id
    Eng->>DG: response_generator(request_id)
    DG-->>Eng: Stream deltas (id: chatcmpl-<id>/cmpl-<id>)
    Eng-->>HTTP: Tokens / finish signals
    HTTP-->>Client: Streamed response (with deterministic id)
```
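The "derive request_id" step in both diagrams can be pictured with a small stand-alone sketch. The `x-request-id` header name and the plain `HashMap` headers type are assumptions for illustration only; the actual handler derives the ID from tracing/header state or the request context.

```rust
// Hedged sketch of header-or-context request ID derivation.
use std::collections::HashMap;

fn derive_request_id(headers: &HashMap<String, String>, context_id: &str) -> String {
    headers
        .get("x-request-id") // hypothetical header name, not confirmed by the PR
        .cloned()
        .unwrap_or_else(|| context_id.to_string()) // fall back to the context ID
}

fn main() {
    let mut headers = HashMap::new();
    headers.insert("x-request-id".to_string(), "abc123".to_string());
    assert_eq!(derive_request_id(&headers, "ctx-1"), "abc123");
    assert_eq!(derive_request_id(&HashMap::new(), "ctx-1"), "ctx-1");
}
```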
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 8
🧹 Nitpick comments (6)
lib/engines/mistralrs/src/lib.rs (2)
391-393: Typo in log message: "Unknow" → "Unknown". Minor nit, but worth fixing in logs.
Apply this diff:
```diff
- tracing::warn!(request_id, stop_reason = s, "Unknow stop reason");
+ tracing::warn!(request_id, stop_reason = s, "Unknown stop reason");
```

Also applies to: 587-589
592-595: Propagate finish_reason into the streamed completion choice. You compute finish_reason but don't pass it to create_choice, so clients always see None in the SSE chunks. Pass it through.

Apply this diff:

```diff
- let inner = response_generator.create_choice(0, Some(from_assistant), None, None);
+ let inner = response_generator.create_choice(0, Some(from_assistant), finish_reason, None);
```
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)

21-21: Remove unused import. The logging import isn't used in this module.

```diff
-use dynamo_runtime::logging;
```
lib/llm/src/protocols/openai/completions/delta.rs (3)
18-18: Remove unused `logging` import. `use dynamo_runtime::logging;` is not used in this file and will trigger an unused import warning (or error under deny-warnings). Please remove it.

```diff
-use dynamo_runtime::logging;
```
23-30: API change looks good; consider accepting `impl Into<String>` for ergonomics. The new `response_generator(&self, request_id: String)` matches the PR goal. Consider accepting `impl Into<String>` to avoid forcing callers to allocate when they already have a stringy type; this also aligns better with flexible call sites.

```diff
-    pub fn response_generator(&self, request_id: String) -> DeltaGenerator {
+    pub fn response_generator(&self, request_id: impl Into<String>) -> DeltaGenerator {
         let options = DeltaGeneratorOptions {
             enable_usage: true,
             enable_logprobs: self.inner.logprobs.unwrap_or(0) > 0,
         };
-
-        DeltaGenerator::new(self.inner.model.clone(), options, request_id)
+        let request_id = request_id.into();
+        DeltaGenerator::new(self.inner.model.clone(), options, &request_id)
     }
```

Note: This pairs with the `new(..., request_id: &str)` suggestion below.
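As a stand-alone illustration of the ergonomics point (using a hypothetical `make_id` helper, not the crate's API), an `impl Into<String>` parameter accepts both borrowed and owned strings without forcing callers to allocate up front:

```rust
// `impl Into<String>` lets callers pass &str, String, etc.; the callee decides
// when to materialize the owned value.
fn make_id(prefix: &str, request_id: impl Into<String>) -> String {
    format!("{prefix}-{}", request_id.into())
}

fn main() {
    let owned = String::from("req-42");
    assert_eq!(make_id("chatcmpl", "req-42"), "chatcmpl-req-42"); // &str caller
    assert_eq!(make_id("cmpl", owned), "cmpl-req-42"); // String caller, moved in
}
```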
51-52: Prefer `&str` to avoid an extra allocation in `new`. `new(model, options, request_id: &str)` avoids an unnecessary `String` move and makes it usable with both `String` and `&str` callers.

```diff
-    pub fn new(model: String, options: DeltaGeneratorOptions, request_id: String) -> Self {
+    pub fn new(model: String, options: DeltaGeneratorOptions, request_id: &str) -> Self {
```
📒 Files selected for processing (8)
- components/backends/sglang/src/dynamo/sglang/common/sgl_utils.py (1 hunks)
- lib/engines/mistralrs/src/lib.rs (3 hunks)
- lib/llm/src/engines.rs (2 hunks)
- lib/llm/src/http/service/openai.rs (2 hunks)
- lib/llm/src/preprocessor.rs (2 hunks)
- lib/llm/src/protocols/openai/chat_completions/delta.rs (3 hunks)
- lib/llm/src/protocols/openai/completions/delta.rs (3 hunks)
- lib/llm/tests/http-service.rs (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (6)

lib/llm/tests/http-service.rs (2)
- lib/bindings/python/examples/openai_service/server.py (1): generator (36-55)
- lib/llm/src/block_manager/storage/cuda.rs (1): ctx (412-414)

lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
- lib/llm/src/protocols/openai/completions/delta.rs (2): response_generator (23-30), new (51-82)

lib/llm/src/preprocessor.rs (2)
- lib/llm/src/protocols/openai/chat_completions/delta.rs (1): response_generator (32-40)
- lib/llm/src/protocols/openai/completions/delta.rs (1): response_generator (23-30)

lib/llm/src/protocols/openai/completions/delta.rs (1)
- lib/llm/src/protocols/openai/chat_completions/delta.rs (2): response_generator (32-40), new (85-116)

lib/engines/mistralrs/src/lib.rs (2)
- lib/llm/src/protocols/openai/chat_completions/delta.rs (1): response_generator (32-40)
- lib/llm/src/protocols/openai/completions/delta.rs (1): response_generator (23-30)

lib/llm/src/engines.rs (1)
- lib/llm/src/block_manager/storage/cuda.rs (1): ctx (412-414)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/2496/merge) by qimcis.
components/backends/sglang/src/dynamo/sglang/common/sgl_utils.py
[error] 68-68: Black formatting failed during pre-commit (pre-commit run --show-diff-on-failure --color=always --all-files). 1 file reformatted: components/backends/sglang/src/dynamo/sglang/common/sgl_utils.py. Re-run pre-commit to apply changes.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: Build and Test - dynamo
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
🔇 Additional comments (6)
lib/engines/mistralrs/src/lib.rs (1)

543-546: Confirm `NormalRequest::id` is a `String`. It looks like `NormalRequest` is imported from the external `mistralrs` crate, and you're assigning `id: ctx.id().to_string()` in lib/engines/mistralrs/src/lib.rs (lines 543-546). Previously, `next_request_id()` was used here, so please:
- Verify that `mistralrs::NormalRequest::id` is indeed a `String`.
- If it expects a different type (e.g., a numeric ID), either revert to `next_request_id()` or adjust the conversion accordingly to avoid compile-time or runtime mismatches.
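One lightweight way to encode that check is a compile-time accessor that pins the field's type; `Req` below is a local stand-in for illustration, not the actual `mistralrs::NormalRequest`:

```rust
struct Req {
    id: String, // stand-in field; the real type must be verified in mistralrs
}

// Fails to compile if `id` is ever anything other than a String.
fn assert_id_is_string(r: &Req) -> &String {
    &r.id
}

fn main() {
    let r = Req { id: "ctx-1".to_string() };
    let _ = assert_id_is_string(&r);
}
```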
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)

103-107: LGTM: Deterministic chatcmpl-<request_id>. Using a stable chatcmpl-<request_id> aligns with the PR goal and improves traceability across systems.
lib/llm/src/http/service/openai.rs (1)

218-221: Non-functional note: request_id capture for Completions is consistent. Capturing request_id from Context and threading it into annotations aligns with the PR objective.
lib/llm/src/protocols/openai/completions/delta.rs (1)

71-75: ID format change to `cmpl-<request_id>` is aligned with the PR objective. Using a deterministic ID derived from the context/request matches the distributed tracing requirement.
lib/llm/src/preprocessor.rs (2)

496-499: Propagating context ID to the generator: LGTM. Passing `context.id().to_string()` into `response_generator` correctly aligns completion IDs with the distributed trace.

554-556: Same here: LGTM. Consistently propagates the trace/request ID for standard completions as well.
components/backends/sglang/src/dynamo/sglang/common/sgl_utils.py: outdated review comment (resolved).
Force-pushed from 35ead1d to 25975f7.
Thanks @qimcis!
LGTM - thanks for picking this up!
We upgraded Rust to edition 2024 on Friday, which caused a lot of churn. I moved your commits over to #2695 and rebased. You still own the commits. Closing this one.
Hmm, but I think if I merge the other PR I will get the credit, not you. Do you want to copy from branch
Force-pushed from 9008fe8 to aeffc25.
Should be good to go now!! Let me know if there's anything else to fix
Excellent. Looks like a few more
Signed-off-by: Hannah Zhang <[email protected]>
Signed-off-by: ayushag <[email protected]>
Signed-off-by: Jason Zhou <[email protected]>
Signed-off-by: Krishnan Prashanth <[email protected]>
Signed-off-by: nnshah1 <[email protected]>
Overview:
Aligns OpenAI response IDs with distributed trace IDs
Details:
Replaces random UUID generation with consistent trace IDs from request context so that OpenAI API responses (chatcmpl-, cmpl-) match distributed tracing identifiers.
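As a before/after illustration (not the actual diff; the exact old call is assumed), the change swaps a random per-response UUID for the context-derived ID:

```rust
// Before (assumed): a fresh random ID, unrelated to any trace:
//   let id = format!("chatcmpl-{}", uuid::Uuid::new_v4());
// After: reuse the context/trace-derived request ID.
fn response_id(context_id: &str) -> String {
    format!("chatcmpl-{context_id}")
}

fn main() {
    // The same ID now appears in both the API response and the trace spans.
    assert_eq!(response_id("4f2a9c"), "chatcmpl-4f2a9c");
}
```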
Where should the reviewer start?
- lib/llm/src/protocols/openai/chat_completions/delta.rs: New request_id parameter in response_generator()
- lib/llm/src/http/service/openai.rs: Removed UUID generation, using request.id()
- lib/llm/src/engines.rs: Updated response generator calls with context IDs

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
- chatcmpl-<ID> with other notions of request ID throughout #2248

Summary by CodeRabbit
- New Features
- Refactor
- Tests
- Style