feat: support tool calling with custom chat template in multimodality vllm example #3450
base: main
Conversation
👋 Hi cchen777! Thank you for contributing to ai-dynamo/dynamo. Just a reminder: The 🚀
b69babd to 9f2d8a5 (Compare)
Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 4
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
deploy/docker-compose.yml (7 hunks)
examples/multimodal/components/processor.py (12 hunks)
examples/multimodal/components/worker.py (3 hunks)
examples/multimodal/launch/agg.sh (3 hunks)
examples/multimodal/utils/args.py (8 hunks)
examples/multimodal/utils/chat_processor.py (2 hunks)
examples/multimodal/utils/protocol.py (3 hunks)
lib/bindings/python/rust/parsers.rs (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
examples/multimodal/utils/args.py (3)
  - lib/bindings/python/rust/parsers.rs (2): get_reasoning_parser_names (18-20), get_tool_parser_names (12-14)
  - lib/bindings/python/src/dynamo/_core.pyi (4): DistributedRuntime (33-63), block_size (607-611), block_size (630-634), namespace (40-44)
  - components/src/dynamo/vllm/ports.py (4): DynamoPortRange (23-37), PortAllocationRequest (49-64), PortMetadata (41-45), allocate_and_reserve_port_block (77-106)
examples/multimodal/components/processor.py (4)
  - lib/bindings/python/src/dynamo/_core.pyi (12): ModelInput (826-828), ModelRuntimeConfig (452-456), ModelType (830-832), register_llm (846-860), Client (144-185), round_robin (175-179), get (1275-1276), endpoint (105-109), client (132-136), wait_for_instances (160-167), block_size (607-611), block_size (630-634)
  - lib/bindings/python/rust/lib.rs (6): register_llm (212-282), _core (130-193), round_robin (777-811), endpoint (627-633), client (702-716), wait_for_instances (748-757)
  - lib/bindings/python/rust/parsers.rs (2): parse_tool_calls_py (35-58), tool_calls (47-50)
  - examples/multimodal/utils/chat_processor.py (1): ChatProcessor (119-268)
lib/bindings/python/rust/parsers.rs (2)
  - lib/parsers/src/tool_calling/parsers.rs (2): detect_and_parse_tool_call (80-104), get_available_tool_parsers (40-42)
  - lib/bindings/python/rust/lib.rs (18): new (379-404), new (1021-1025), m (138-138), m (139-139), m (140-140), m (141-141), m (142-142), m (143-143), m (144-144), m (145-145), m (146-146), m (147-147), m (148-148), m (149-149), m (150-150), m (151-151), m (152-152), m (153-153)
examples/multimodal/components/worker.py (6)
  - examples/multimodal/components/publisher.py (1): StatLoggerFactory (147-185)
  - examples/multimodal/utils/args.py (5): Config (29-50), base_parse_args (65-175), configure_ports (202-229), overwrite_args (232-267), parse_endpoint (53-62)
  - examples/multimodal/utils/image_loader.py (1): ImageLoader (31-107)
  - examples/multimodal/utils/model.py (1): construct_mm_data (43-69)
  - examples/multimodal/utils/protocol.py (2): MyRequestOutput (165-188), vLLMMultimodalRequest (155-162)
  - lib/bindings/python/src/dynamo/nixl_connect/__init__.py (2): Descriptor (723-972), begin_read (577-627)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3450/merge) by cchen777.
examples/multimodal/utils/chat_processor.py
[error] 1-1: Black: reformatted file by this hook.
examples/multimodal/utils/args.py
[error] 60-60: Ruff: Undefined name 'sys' (F821).
[error] 60-60: Ruff: Undefined name 'sys' (F821).
examples/multimodal/components/processor.py
[error] 1-1: isort: files were modified by this hook.
[error] 1-1: Black: reformatted file by this hook.
lib/bindings/python/rust/parsers.rs
[error] 1-1: Trailing whitespace fixed by pre-commit hook.
examples/multimodal/utils/protocol.py
[error] 1-1: isort: files were modified by this hook.
[error] 1-1: Black: reformatted file by this hook.
examples/multimodal/components/worker.py
[error] 1-1: isort: files were modified by this hook.
🪛 Ruff (0.13.3)
examples/multimodal/utils/args.py
164-167: Avoid specifying long messages outside the exception class (TRY003)
examples/multimodal/components/processor.py
304-304: Avoid specifying long messages outside the exception class (TRY003)
395-395: Using .strip() with multi-character strings is misleading (B005)
427-427: Unpacked variable normal_text is never used; prefix it with an underscore or any other dummy variable pattern (RUF059)
461-461: Do not catch blind exception: Exception (BLE001)
examples/multimodal/utils/protocol.py
26-26: Unused noqa directive (non-enabled: F401); remove unused noqa directive (RUF100)
33-33: Redefinition of unused MultiModalUUIDDict from line 25; remove definition: MultiModalUUIDDict (F811)
33-33: Unused noqa directive (non-enabled: F401); remove unused noqa directive (RUF100)
examples/multimodal/components/worker.py
284-286: Avoid specifying long messages outside the exception class (TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build and Test - dynamo
- GitHub Check: clippy (launch/dynamo-run)
# Handle both streaming (with "data: " prefix) and non-streaming responses
if response.startswith("data: "):
    response = json.loads(response.lstrip("data: "))
else:
    response = json.loads(response)
    # Convert non-streaming format (message) to streaming format (delta)
    if "choices" in response and "message" in response["choices"][0]:
        message_content = response["choices"][0]["message"]["content"]
        response["choices"][0]["delta"] = {"content": message_content, "role": "assistant"}
        del response["choices"][0]["message"]
        response["object"] = "chat.completion.chunk"

# Buffer chunks and accumulate content when tool calling is configured
if (
    self.tool_call_parser
    and raw_request.tools
    and "choices" in response
    and len(response["choices"]) > 0
):
    choice = response["choices"][0]

    # Buffer this chunk
    buffered_chunks.append(response)

    # Accumulate delta content
    if "delta" in choice and choice["delta"].get("content"):
        accumulated_content += choice["delta"]["content"]

    # Parse when we hit the end (finish_reason is set)
    finish_reason = choice.get("finish_reason")
    if finish_reason == "stop":
        if accumulated_content:
            logger.info(f"Attempting to parse accumulated tool calls (length={len(accumulated_content)}) with parser: {self.tool_call_parser}")
            try:
                tool_calls, normal_text = parse_tool_calls_py(accumulated_content, self.tool_call_parser)
                logger.info(f"Parse result: {len(tool_calls) if tool_calls else 0} tool calls found")

                if tool_calls:
                    # Convert tool calls to OpenAI format
                    tool_call_chunks = []
                    for idx, tc in enumerate(tool_calls):
                        tool_call_chunks.append({
                            "index": idx,
                            "id": tc["id"],
                            "type": tc["type"],
                            "function": {
                                "name": tc["function"]["name"],
                                "arguments": tc["function"]["arguments"]
                            }
                        })

                    # Clear content from ALL buffered chunks (per OpenAI spec)
                    for buffered_chunk in buffered_chunks:
                        if "choices" in buffered_chunk and len(buffered_chunk["choices"]) > 0:
                            buffered_choice = buffered_chunk["choices"][0]
                            if "delta" in buffered_choice:
                                buffered_choice["delta"]["content"] = ""
                            elif "message" in buffered_choice:
                                buffered_choice["message"]["content"] = ""

                    # Add tool_calls to the final chunk
                    if "delta" in choice:
                        choice["delta"]["tool_calls"] = tool_call_chunks
                    elif "message" in choice:
                        choice["message"]["tool_calls"] = tool_call_chunks

                    choice["finish_reason"] = "tool_calls"
                    logger.info(f"Cleared content from {len(buffered_chunks)} chunks and added {len(tool_calls)} tool call(s) to final chunk")
            except Exception as e:
                logger.warning(f"Failed to parse tool calls: {e}", exc_info=True)
                # Continue with original response if parsing fails

        # Yield all buffered chunks now that we've processed them
        for chunk in buffered_chunks:
            yield chunk
        buffered_chunks = []
else:
    # No tool calling, yield immediately
Re-emit strings (not dicts) and reset buffered state.
Once you call json.loads, the code now yields dicts, but generate() must stream strings ("data: ...\n\n" for SSE or plain JSON). Returning dicts breaks the HTTP stack immediately. Also, accumulated_content is never cleared after flushing, so subsequent responses inherit stale text. Please keep the original string alongside the parsed object, re-serialize before yielding (respecting streaming vs. non-streaming), and reset accumulated_content when you drain buffered_chunks; a sketch follows the AI prompt below.
🧰 Tools
🪛 Ruff (0.13.3)
395-395: Using .strip() with multi-character strings is misleading (B005)
427-427: Unpacked variable normal_text is never used; prefix it with an underscore or any other dummy variable pattern (RUF059)
461-461: Do not catch blind exception: Exception (BLE001)
🤖 Prompt for AI Agents
In examples/multimodal/components/processor.py around lines 393 to 470, the
handler currently json.loads incoming responses and yields dicts, and never
clears accumulated_content after flushing; fix by preserving the original raw
string for each buffered chunk and re-serializing before yielding so callers
still receive strings (include the "data: " SSE prefix when the original chunk
had it, or plain JSON otherwise), and after you drain buffered_chunks reset
accumulated_content = "" (and ensure buffered_chunks = [] as you already do) so
subsequent responses don't inherit stale content; keep parsing/annotation logic
operating on the parsed dicts but always convert back to the original string
format when emitting.
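A minimal sketch of one way to apply that fix, assuming the same streaming generator and the buffered_chunks / accumulated_content variables from the snippet above; the had_sse_prefix flag and the tuple-based buffer are illustrative, not part of the PR:

# Sketch only (not the PR's code): keep the raw string alongside the parsed dict
# so generate() can keep yielding strings, and reset state after flushing.
raw = response  # original string received from the engine
had_sse_prefix = raw.startswith("data: ")
parsed = json.loads(raw.removeprefix("data: ")) if had_sse_prefix else json.loads(raw)
# str.removeprefix also avoids the Ruff B005 warning that lstrip("data: ") triggers.

buffered_chunks.append((parsed, had_sse_prefix))
choices = parsed.get("choices") or [{}]
if choices[0].get("delta", {}).get("content"):
    accumulated_content += choices[0]["delta"]["content"]

if choices[0].get("finish_reason") == "stop":
    # ... tool-call parsing and chunk rewriting as in the PR ...
    for chunk, was_sse in buffered_chunks:
        payload = json.dumps(chunk)
        # Re-serialize so callers still receive strings, preserving the SSE framing.
        yield f"data: {payload}\n\n" if was_sse else payload
    buffered_chunks = []
    accumulated_content = ""  # reset so the next response doesn't inherit stale text

Buffering (parsed, was_sse) pairs keeps the original framing recoverable without re-inspecting the string on the way out.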
 from publisher import StatLoggerFactory
-from utils.args import (
+from examples.pinterest.multimodal.utils.args import (
     Config,
     base_parse_args,
     configure_ports,
     overwrite_args,
     parse_endpoint,
 )
-from utils.image_loader import ImageLoader
-from utils.model import construct_mm_data
-from utils.protocol import MyRequestOutput, vLLMMultimodalRequest
+from examples.multimodal.utils.image_loader import ImageLoader
+from examples.multimodal.utils.model import construct_mm_data
+from examples.pinterest.multimodal.utils.protocol import MyRequestOutput, vLLMMultimodalRequest
Fix module path for shared utils.
These imports must come from examples.multimodal, not examples.pinterest. As written, python3 components/worker.py raises ModuleNotFoundError. Please point them back to the multimodal package.
-from examples.pinterest.multimodal.utils.args import (
+from examples.multimodal.utils.args import (
Config,
base_parse_args,
configure_ports,
overwrite_args,
parse_endpoint,
)
-from examples.multimodal.utils.image_loader import ImageLoader
-from examples.multimodal.utils.model import construct_mm_data
-from examples.pinterest.multimodal.utils.protocol import MyRequestOutput, vLLMMultimodalRequest
+from examples.multimodal.utils.image_loader import ImageLoader
+from examples.multimodal.utils.model import construct_mm_data
+from examples.multimodal.utils.protocol import MyRequestOutput, vLLMMultimodalRequest
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
from publisher import StatLoggerFactory
from examples.multimodal.utils.args import (
    Config,
    base_parse_args,
    configure_ports,
    overwrite_args,
    parse_endpoint,
)
from examples.multimodal.utils.image_loader import ImageLoader
from examples.multimodal.utils.model import construct_mm_data
from examples.multimodal.utils.protocol import MyRequestOutput, vLLMMultimodalRequest
🤖 Prompt for AI Agents
In examples/multimodal/components/worker.py around lines 27 to 38, the imports
are incorrectly pointing to examples.pinterest.multimodal.utils; update those
import paths to use examples.multimodal.utils (e.g., import Config,
base_parse_args, configure_ports, overwrite_args, parse_endpoint from
examples.multimodal.utils.args) so the modules are resolved when running python3
components/worker.py; ensure all affected import lines are changed consistently
to the examples.multimodal package.
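If the example is also meant to stay runnable as a plain script (python3 components/worker.py) in addition to the package path, one purely illustrative option, not something the PR or the review proposes, is a guarded import fallback:

# Illustrative sketch: prefer the package imports, fall back to the script-local
# layout used when running `python3 components/worker.py` from examples/multimodal/.
try:
    from examples.multimodal.utils.args import Config, base_parse_args, parse_endpoint
    from examples.multimodal.utils.protocol import MyRequestOutput, vLLMMultimodalRequest
except ImportError:
    from utils.args import Config, base_parse_args, parse_endpoint  # type: ignore[no-redef]
    from utils.protocol import MyRequestOutput, vLLMMultimodalRequest  # type: ignore[no-redef]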
import argparse
import json
import logging
import os
import socket
import sys
import time
from typing import Callable, List, Optional, Tuple

from vllm.config import KVTransferConfig
from vllm.distributed.kv_events import KVEventsConfig
from vllm.engine.arg_utils import AsyncEngineArgs

from dynamo._core import get_reasoning_parser_names, get_tool_parser_names
from dynamo.runtime import DistributedRuntime
from dynamo.vllm.ports import (
    DynamoPortRange,
    PortAllocationRequest,
    PortMetadata,
    allocate_and_reserve_port_block,
)
Import sys before using sys.exit.
parse_endpoint still calls sys.exit, but sys is no longer imported, leading to NameError in error-handling paths. Re-add the import.
-import argparse
-import logging
-import os
-import socket
+import argparse
+import logging
+import os
+import socket
+import sys
🤖 Prompt for AI Agents
In examples/multimodal/utils/args.py around lines 4 to 21, the function
parse_endpoint calls sys.exit on error but sys is not imported, causing a
NameError in error paths; add import sys to the top-level imports (near the
other stdlib imports like os and socket) so sys.exit is available, and ensure
linting/order matches project style.
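For context, a hypothetical shape of the error path the comment refers to; the real parse_endpoint (lines 53-62 of utils/args.py) is not shown here, so the names and messages below are placeholders:

import logging
import sys  # required for the sys.exit call below

def parse_endpoint(endpoint: str):
    # Expects "namespace.component.endpoint"; exits on malformed input.
    parts = endpoint.split(".")
    if len(parts) != 3:
        logging.error("Invalid endpoint format: %s", endpoint)
        sys.exit(1)  # NameError here if `import sys` was dropped
    return parts[0], parts[1], parts[2]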
from vllm.multimodal.inputs import MultiModalUUIDDict  # noqa: F401
from vllm.multimodal.inputs import MultiModalDataDict  # noqa: F401
from vllm.outputs import CompletionOutput
from vllm.sampling_params import SamplingParams
from vllm.sequence import PromptLogprobs, RequestMetrics

import dynamo.nixl_connect as connect

from vllm.multimodal.inputs import MultiModalUUIDDict  # noqa: F401
Remove duplicated MultiModalUUIDDict import.
MultiModalUUIDDict is imported twice (lines 25 & 33), triggering Ruff's F811 redefinition error and leaving an unused noqa behind. This fails pre-commit. Drop the duplicate by keeping a single import (you can combine it with MultiModalDataDict if needed).
-from vllm.multimodal.inputs import MultiModalUUIDDict # noqa: F401
-from vllm.multimodal.inputs import MultiModalDataDict # noqa: F401
+from vllm.multimodal.inputs import MultiModalDataDict, MultiModalUUIDDict # noqa: F401
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.13.3)
25-25: Unused noqa directive (non-enabled: F401); remove unused noqa directive (RUF100)
26-26: Unused noqa directive (non-enabled: F401); remove unused noqa directive (RUF100)
33-33: Redefinition of unused MultiModalUUIDDict from line 25; remove definition: MultiModalUUIDDict (F811)
33-33: Unused noqa directive (non-enabled: F401); remove unused noqa directive (RUF100)
🤖 Prompt for AI Agents
In examples/multimodal/utils/protocol.py around lines 25 to 34, there is a
duplicate import of MultiModalUUIDDict (lines 25 and 33) causing a Ruff
F811/redefinition error and an unnecessary noqa; remove the duplicate import by
keeping a single import statement (combine MultiModalUUIDDict with
MultiModalDataDict on one line if desired) and delete the extra import and its
noqa.
9f2d8a5 to 3d5602f (Compare)
Overview:
This PR adds tool calling support to the multimodal example.
Details:
The current example has no way to provide a chat template and lacks support for tool calling pre- and post-processing. Since multimodal input is not well supported in the Rust implementation, we need a dedicated Processor to handle the request. This may not be the ideal implementation, but it should shed some light on future work.
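As a rough illustration of the preprocessing half (applying a user-supplied chat template together with tool definitions), here is a hedged sketch using the Hugging Face tokenizer API; the model name, template path, and tool schema are placeholders, and the PR's actual ChatProcessor wiring may differ:

from transformers import AutoTokenizer

# Placeholder model and template path; the example's real values may differ.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
custom_template = open("my_tool_chat_template.jinja").read()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

# apply_chat_template accepts an explicit template and a tool list; the rendered
# prompt is what the processor would hand to the worker alongside the image data.
prompt = tokenizer.apply_chat_template(
    messages,
    chat_template=custom_template,
    tools=tools,
    tokenize=False,
    add_generation_prompt=True,
)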
Test in the following details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)