feat: add chat_template_kwargs param to v1/chat/completion #3016
Conversation
Signed-off-by: Chi McIsaac <[email protected]>
Walkthrough

Adds an optional chat_template_kwargs field to NvCreateChatCompletionRequest across construction sites, tests, and protocol types, and wires it into template rendering by exposing it via OAIChatLikeRequest and merging it into the Jinja context. Defaults to None and is skipped in serialization when absent.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Client
    participant HTTP as HTTP Service
    participant Proto as OpenAI Protocol
    participant Pre as Preprocessor
    participant Tpl as Template Renderer
    Client->>HTTP: NvCreateChatCompletionRequest { ..., chat_template_kwargs? }
    HTTP->>Proto: Forward request (struct includes chat_template_kwargs)
    Proto->>Pre: Build OAIChatLikeRequest view
    Pre->>Tpl: render(messages, system, context)
    note over Pre,Tpl: New: pass optional chat_template_kwargs
    Tpl->>Tpl: Merge base context + chat_template_kwargs (kwargs override)
    Tpl-->>Pre: Rendered prompt
    Pre-->>Proto: Prepared inputs
    Proto-->>HTTP: Completion response
    HTTP-->>Client: Stream/response
```
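The merge step in the diagram (base render context plus request-level chat_template_kwargs, with the kwargs taking precedence) can be modeled with plain maps. The sketch below is a simplified, std-only illustration of the precedence rule; merge_context and the string-valued maps are stand-ins for illustration, not the actual minijinja-based implementation:

```rust
use std::collections::HashMap;

/// Simplified model of the context merge: the base render context is
/// built first, then any request-level chat_template_kwargs are layered
/// on top so they take precedence. (The real renderer spreads values
/// into a minijinja context; this only illustrates the precedence rule.)
fn merge_context(
    base: &HashMap<String, String>,
    kwargs: Option<&HashMap<String, String>>,
) -> HashMap<String, String> {
    let mut ctx = base.clone();
    if let Some(extra) = kwargs {
        for (k, v) in extra {
            // Request-level value overrides the base entry, if any.
            ctx.insert(k.clone(), v.clone());
        }
    }
    ctx
}

fn main() {
    let mut base = HashMap::new();
    base.insert("add_generation_prompt".to_string(), "true".to_string());

    let mut kwargs = HashMap::new();
    kwargs.insert("add_generation_prompt".to_string(), "false".to_string());
    kwargs.insert("enable_thinking".to_string(), "false".to_string());

    let merged = merge_context(&base, Some(&kwargs));
    assert_eq!(merged["add_generation_prompt"], "false"); // kwargs win
    assert_eq!(merged["enable_thinking"], "false");

    // Without kwargs the base context is unchanged.
    assert_eq!(merge_context(&base, None)["add_generation_prompt"], "true");
}
```

This is also why the reserved-key question raised in the review matters: with this merge order, a request can shadow built-in context names unless they are filtered.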
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks: 2 passed, 1 warning.
Actionable comments posted: 2
🧹 Nitpick comments (7)
lib/llm/src/protocols/openai/chat_completions.rs (1)

Lines 57-60: Field addition looks good; consider a small type alias for readability and future reuse. The default/skip-serializing serde config is correct. To reduce repetition of the long type across modules, consider introducing a ChatTemplateKwargs alias and using it here.

```diff
+// Near other use statements or in a common module:
+type ChatTemplateKwargs = std::collections::HashMap<String, serde_json::Value>;
 // ...
-    pub chat_template_kwargs: Option<std::collections::HashMap<String, serde_json::Value>>,
+    pub chat_template_kwargs: Option<ChatTemplateKwargs>,
```

lib/llm/src/protocols/openai/responses.rs (1)
Lines 176-192: Defaulting chat_template_kwargs to None is fine; flag for future propagation. If NvCreateResponse ever accepts per-request template kwargs, remember to propagate them here rather than forcing None.
lib/llm/src/preprocessor/prompt/template/oai.rs (1)
Lines 226-232: Be explicit about reserved-key overrides (messages/tools/bos/eos/add_generation_prompt). Merging extra kwargs last means request-level data can override built-ins. If this is intentional, add a brief comment. If not, filter reserved keys or namespace the kwargs (e.g., request.*) to avoid foot-guns.

```diff
-    let ctx = if let Some(kwargs) = req.chat_template_kwargs() {
-        let extra = Value::from_serialize(kwargs);
-        context! { ..ctx, ..extra }
+    let ctx = if let Some(kwargs) = req.chat_template_kwargs() {
+        // Optional: prevent overriding reserved context names
+        // let reserved = ["messages","tools","bos_token","eos_token","unk_token","add_generation_prompt"];
+        // let filtered = kwargs.iter().filter(|(k,_)| !reserved.contains(&k.as_str()))
+        //     .collect::<std::collections::HashMap<_,_>>();
+        // let extra = Value::from_serialize(&filtered);
+        let extra = Value::from_serialize(kwargs);
+        context! { ..ctx, ..extra } // request-level values take precedence
     } else {
         ctx
     };
```

lib/llm/tests/preprocessor.rs (1)
Lines 269-275: Consider adding a precedence test for chat_template_kwargs. Add a small test asserting that a kwarg (e.g., add_generation_prompt=false) or a custom key is visible in the rendered template and that precedence works as expected.

Example (new test function):

```rust
#[tokio::test]
async fn test_chat_template_kwargs_precedence() {
    // Build an MDC/formatter as in other tests
    let mdc = make_mdc_from_repo(
        "tests/data/sample-models",
        "meta-llama/Llama-3.1-70B-Instruct",
        "1605565",
        Some(vec![PromptContextMixin::Llama3DateTime]),
    )
    .await;
    let formatter = match PromptFormatter::from_mdc(&mdc).unwrap() {
        PromptFormatter::OAI(f) => f,
    };
    let mut req = Request::from(SINGLE_CHAT_MESSAGE, None, None, mdc.slug().to_string());
    // Inject a kwarg and ensure it's respected
    let mut map = std::collections::HashMap::new();
    map.insert("custom_var".to_string(), serde_json::json!("custom"));
    req.chat_template_kwargs = Some(map);
    let rendered = formatter.render(&req).unwrap();
    assert!(rendered.contains("custom")); // adapt to your template
}
```

lib/llm/tests/http-service.rs (1)
Lines 767-773: Add minimal coverage for non-None and BYOT JSON paths. To ensure deserialization and wiring don't regress, add one positive test that sends chat_template_kwargs via the BYOT JSON client and succeeds (no need to validate templating here).

Example addition (outside these ranges) to the generic BYOT test:

```rust
// After the first successful BYOT streaming case
let request_with_kwargs = serde_json::json!({
    "model": "foo",
    "messages": [{ "role": "user", "content": "Hi" }],
    "stream": true,
    "max_tokens": 50,
    "chat_template_kwargs": { "echo": "on", "n": 3, "flag": true }
});
let result = generic_byot_client.chat_stream(request_with_kwargs).await;
assert!(result.is_ok(), "chat_template_kwargs should deserialize and be accepted");
```

Also applies to: 807-813, 848-854
lib/llm/tests/test_common_ext.rs (2)
Lines 331-332: Assert it skips serialization when None. Strengthen the serialization test to ensure the field is omitted when None.

Add (outside these lines) in test_serialization_preserves_structure after computing json:

```rust
assert!(json.get("chat_template_kwargs").is_none(), "Field should be skipped when None");
```
Lines 63-72: Add a roundtrip test for Some(...). Cover the positive case to catch schema regressions (types and serde attributes).

New test (outside these lines):

```rust
#[test]
fn test_chat_template_kwargs_roundtrip() {
    use dynamo_llm::protocols::openai::chat_completions::NvCreateChatCompletionRequest;
    let req_json = serde_json::json!({
        "model": "test-model",
        "messages": [{ "role": "user", "content": "Hello" }],
        "chat_template_kwargs": { "echo": "on", "n": 3, "flag": true }
    });
    let request: NvCreateChatCompletionRequest = serde_json::from_value(req_json).unwrap();
    // Serialize back and verify presence and contents
    let out = serde_json::to_value(&request).unwrap();
    assert!(out.get("chat_template_kwargs").is_some());
    assert_eq!(out["chat_template_kwargs"]["echo"], "on");
    assert_eq!(out["chat_template_kwargs"]["n"], 3);
    assert_eq!(out["chat_template_kwargs"]["flag"], true);
}
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
lib/llm/src/entrypoint/input/batch.rs (1 hunks)
lib/llm/src/entrypoint/input/text.rs (1 hunks)
lib/llm/src/http/service/openai.rs (2 hunks)
lib/llm/src/preprocessor/prompt.rs (2 hunks)
lib/llm/src/preprocessor/prompt/template/oai.rs (2 hunks)
lib/llm/src/protocols/openai/chat_completions.rs (1 hunks)
lib/llm/src/protocols/openai/responses.rs (1 hunks)
lib/llm/tests/http-service.rs (3 hunks)
lib/llm/tests/preprocessor.rs (1 hunks)
lib/llm/tests/test_common_ext.rs (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
lib/llm/src/entrypoint/input/batch.rs (1)
  lib/llm/tests/http_metrics.rs: None (17-17)
lib/llm/src/preprocessor/prompt.rs (1)
  lib/llm/src/preprocessor/prompt/template/oai.rs: chat_template_kwargs (130-132)
lib/llm/src/entrypoint/input/text.rs (1)
  lib/llm/tests/http_metrics.rs: None (17-17)
lib/llm/src/preprocessor/prompt/template/oai.rs (1)
  lib/llm/src/preprocessor/prompt.rs: chat_template_kwargs (74-76)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (6)
lib/llm/src/entrypoint/input/text.rs (1)
Lines 121-126: LGTM. Initializing chat_template_kwargs to None keeps behavior unchanged for the CLI path.
lib/llm/src/http/service/openai.rs (2)
Lines 1318-1329: LGTM (tests). Adding chat_template_kwargs: None in the empty-messages test maintains parity with the struct definition.

Lines 1340-1356: LGTM (tests). Same here; request construction reflects the new optional field.
lib/llm/src/entrypoint/input/batch.rs (1)
Lines 226-233: LGTM. Batch path initializes the new field explicitly; no behavior change.
lib/llm/tests/http-service.rs (1)
Lines 767-773: LGTM: explicit None preserves backward compatibility. Setting chat_template_kwargs: None in all constructions is correct and keeps existing behavior unchanged.

Also applies to: 807-813, 848-854
lib/llm/tests/test_common_ext.rs (1)
Lines 63-72: LGTM: explicit None in struct initializers. The additions align with the new optional field and keep tests stable.

Also applies to: 327-333, 372-383
I don't think

@qimcis Thanks for sending this. Could you:
Signed-off-by: Chi McIsaac <[email protected]>
Force-pushed 06e66f6 to 02702b5 (Compare)
Signed-off-by: Chi McIsaac <[email protected]>
Should be good to go now @grahamking
Signed-off-by: Chi McIsaac <[email protected]>
/ok to test a5611c8

@grahamking, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

/ok to test 15f50b4
@grahamking
Can be approved and accepted.
lgtm !
/ok to test d0ed18a
Signed-off-by: Chi McIsaac <[email protected]>
Signed-off-by: Kristen Kelleher <[email protected]>
Overview:

Add chat_template_kwargs to v1/chat/completions to pass per-request kwargs into the chat template, allowing for request-level behavior control without separate replicas.

Details:
Tested with Qwen3
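For reference, a request using the new field might look like the fragment below. The model name and the enable_thinking key are illustrative (enable_thinking is a variable read by Qwen3's chat template); any key your template reads can be passed the same way:

```json
{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    { "role": "user", "content": "Hello" }
  ],
  "stream": false,
  "chat_template_kwargs": { "enable_thinking": false }
}
```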
Where should the reviewer start?
lib/llm/src/protocols/openai/chat_completions.rs
lib/llm/src/preprocessor/prompt.rs
lib/llm/src/preprocessor/prompt/template/oai.rs
Related Issues: