Conversation
@cchen777 cchen777 commented Oct 7, 2025

Overview:

This PR adds tool-calling support to the multimodal example:

  1. Supports text, image, and text + image modes, each with optional tool calling.
  2. Swaps in a custom chat template when the incoming request carries tool definitions.

Details:

The current example has no way to provide a chat template, and it lacks support for tool-calling pre- and post-processing. Since multimodal input is not yet well supported in the Rust implementation, we use a dedicated Processor to handle these requests. This may not be the ideal implementation, but it should shed some light on a future one.
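
To make the template swap concrete, here is a minimal sketch of the selection logic; the helper name select_chat_template is hypothetical, and the real code lives in examples/multimodal/components/processor.py:

from pathlib import Path
from typing import Optional

def select_chat_template(request: dict, custom_template_path: Optional[str]) -> Optional[str]:
    """Pick the Jinja chat template for this request.

    If the request carries tool definitions and a custom template was supplied
    (e.g. via --custom-jinja-template), read and use the custom template so the
    tool schema is rendered into the prompt. Returning None means the
    tokenizer's built-in chat template applies.
    """
    if request.get("tools") and custom_template_path:
        return Path(custom_template_path).read_text()
    return None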

Tested with the following steps:

  1. Build, prepare files, and start the dynamo services (frontend, processor, worker)
# build image & download chat template
$ container/build.sh --framework vllm
$ wget https://raw.githubusercontent.com/vllm-project/vllm/main/examples/tool_chat_template_hermes.jinja -O ./tool_chat_template_hermes.jinja

# setup etcd / nats
$ docker compose -f deploy/docker-compose.yml up -d

# prepare env and launch services
$ ./container/run.sh --framework vllm -it --mount-workspace
(venv) root@xxx:/workspace# cd examples/multimodal
(venv) root@xxx:/workspace/examples/multimodal# launch/agg.sh --model Qwen/Qwen2.5-VL-7B-Instruct --dyn-tool-call-parser hermes --custom-jinja-template /workspace/tool_chat_template_hermes.jinja
...
2025-10-05T07:41:07.263931Z  INFO modelexpress_common::providers::huggingface: Downloaded model files for Qwen/Qwen2.5-VL-7B-Instruct
2025-10-05T07:41:07.264018Z  INFO dynamo_llm::hub: ModelExpress download completed successfully for model: Qwen/Qwen2.5-VL-7B-Instruct
2025-10-05T07:41:07.478725Z  INFO processor.init: Starting to serve the dyn://dynamo.processor.generate endpoint...   
2025-10-05T07:41:07.548685Z  INFO dynamo_llm::discovery::watcher: added model model_name="Qwen/Qwen2.5-VL-7B-Instruct" namespace="dynamo"
  2. Send requests covering the different input combinations (a sketch of the tool-calling payload follows Test 1's output below)
$ bash examples/multimodal/launch/test-requests.sh
================================================================================
🚀 COMPREHENSIVE MULTIMODAL API TEST SUITE 🚀
================================================================================

================================================================================
🧪 TEST: 1️⃣ 🖼️ 💬 🔧 Image + Text + Tool Calling
================================================================================
✅ Status Code: 200

📥 Response:
{
  "id": "8c7200077834439e9562b142888ef27d",
  "choices": [
    {
      "index": 0,
      "message": {
        "tool_calls": [
          {
            "id": "call-c8f4e6c5-dbe6-41ad-b58a-ca7794ffbba4",
            "type": "function",
            "function": {
              "name": "describe_image",
              "arguments": "{\"objects\":[\"bus\"],\"scene\":\"foggy street\"}"
            }
          }
        ],
        "role": "assistant",
        "reasoning_content": null
      },
      "finish_reason": "tool_calls"
    }
  ],
  "created": 1759653971,
  "model": "Qwen/Qwen2.5-VL-7B-Instruct",
  "object": "chat.completion",
  "usage": null
}

💬 Message Summary
Role: assistant

🛠️  Tool Calls
  [call-c8f4e6c5-dbe6-41ad-b58a-ca7794ffbba4] 🔧 Function: describe_image
      📋 Arguments: {"objects":["bus"],"scene":"foggy street"}

✅ Test '1️⃣ 🖼️ 💬 🔧 Image + Text + Tool Calling' PASSED
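
For reference, the Test 1 payload follows the standard OpenAI chat-completions shape. A minimal Python reconstruction is below; the endpoint port, image URL, and describe_image schema are assumptions, and the exact payload lives in examples/multimodal/launch/test-requests.sh:

import requests

# Illustrative Test-1-style request: image + text + a tool definition (assumed values).
payload = {
    "model": "Qwen/Qwen2.5-VL-7B-Instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image using the describe_image tool."},
                {"type": "image_url", "image_url": {"url": "http://example.com/bus.jpg"}},
            ],
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "describe_image",
                "description": "Describe the scene and objects in an image.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "scene": {"type": "string"},
                        "objects": {"type": "array", "items": {"type": "string"}},
                    },
                    "required": ["scene", "objects"],
                },
            },
        }
    ],
}

resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
print(resp.json()["choices"][0]["message"].get("tool_calls"))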

================================================================================
🧪 TEST: 2️⃣ 💬 🔧 Text + Tool Calling
================================================================================
✅ Status Code: 200

📥 Response:
{
  "id": "072fb779f02c4428aec33b73850891ed",
  "choices": [
    {
      "index": 0,
      "message": {
        "tool_calls": [
          {
            "id": "call-62aacb99-1fe7-464d-b9da-0bcd9d43421b",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\":\"San Francisco, CA\",\"unit\":\"fahrenheit\"}"
            }
          }
        ],
        "role": "assistant",
        "reasoning_content": null
      },
      "finish_reason": "tool_calls"
    }
  ],
  "created": 1759653973,
  "model": "Qwen/Qwen2.5-VL-7B-Instruct",
  "object": "chat.completion",
  "usage": null
}

💬 Message Summary
Role: assistant

🛠️  Tool Calls
  [call-62aacb99-1fe7-464d-b9da-0bcd9d43421b] 🔧 Function: get_weather
      📋 Arguments: {"location":"San Francisco, CA","unit":"fahrenheit"}

✅ Test '2️⃣ 💬 🔧 Text + Tool Calling' PASSED

================================================================================
🧪 TEST: 3️⃣ 🖼️ 💬 Image + Text
================================================================================
✅ Status Code: 200

📥 Response:
{
  "id": "c3dffe54365e44bcbccf4ad8e1eb293e",
  "choices": [
    {
      "index": 0,
      "message": {
        "content": "The image shows a public transit bus on a street, with the display at the front of the bus reading \"OUT OF SERVICE.\" The bus appears to be stationary and is positioned near a set of tracks, suggesting it may be part of a streetcar or trolley system. The scene is set during what looks like early morning or late evening, as the lighting is dim and there's a misty or foggy atmosphere. Trees and buildings are visible in the background, indicating an urban or suburban setting.",
        "role": "assistant",
        "reasoning_content": null
      },
      "finish_reason": "stop"
    }
  ],
  "created": 1759653974,
  "model": "Qwen/Qwen2.5-VL-7B-Instruct",
  "object": "chat.completion",
  "usage": null
}

💬 Message Summary
Role: assistant
📝 Content: The image shows a public transit bus on a street, with the display at the front of the bus reading "OUT OF SERVICE." The bus appears to be stationary and is positioned near a set of tracks, suggesting...

✅ Test '3️⃣ 🖼️ 💬 Image + Text' PASSED

================================================================================
🧪 TEST: 4️⃣ 💬 Text Only
================================================================================
✅ Status Code: 200

📥 Response:
{
  "id": "8fee249c19f44f3d8c99f4385b3b599d",
  "choices": [
    {
      "index": 0,
      "message": {
        "content": "Neural networks are a type of machine learning algorithm that is inspired by the structure and function of the human brain. They are made up of interconnected nodes, or \"neurons,\" which process information and pass it along to other neurons.\n\nIn a neural network, data is fed into the input layer, which then passes it on to the hidden layers. These hidden layers perform various computations on the data, such as identifying patterns or features within the data. The output layer then takes the results from the hidden layers and produces an output, which can be used for tasks like classification or regression.\n\nThe key idea behind neural networks is that they can learn from data and improve their performance over time through a process called training. During training, the network adjusts the weights of its connections based on the error between its predicted output and the actual output. This allows the network to gradually improve its ability to make accurate predictions.\n\nOverall, neural networks are powerful tools for solving complex problems, especially those involving large amounts of data and multiple variables. They have been successfully applied to a wide range of tasks, including image recognition, natural language processing, and speech recognition.",
        "role": "assistant",
        "reasoning_content": null
      },
      "finish_reason": "stop"
    }
  ],
  "created": 1759653977,
  "model": "Qwen/Qwen2.5-VL-7B-Instruct",
  "object": "chat.completion",
  "usage": null
}

💬 Message Summary
Role: assistant
📝 Content: Neural networks are a type of machine learning algorithm that is inspired by the structure and function of the human brain. They are made up of interconnected nodes, or "neurons," which process inform...

✅ Test '4️⃣ 💬 Text Only' PASSED

================================================================================
🧪 TEST: 5️⃣ 🖼️ Image Only
================================================================================
✅ Status Code: 200

📥 Response:
{
  "id": "6958a63efc364c01b3ff562a7b35bc34",
  "choices": [
    {
      "index": 0,
      "message": {
        "content": "The image shows a bus with the sign \"OUT OF SERVICE\" displayed on its front digital display. The bus appears to be stationary, and it is located on a street that seems to have a light layer of fog or mist, creating a moody atmosphere. The surroundings include trees, a sidewalk, and some buildings in the background. The bus number \"0870\" is visible at the bottom of the front display. This scene could suggest early morning or late evening hours when public transportation might not be in operation.",
        "role": "assistant",
        "reasoning_content": null
      },
      "finish_reason": "stop"
    }
  ],
  "created": 1759653985,
  "model": "Qwen/Qwen2.5-VL-7B-Instruct",
  "object": "chat.completion",
  "usage": null
}

💬 Message Summary
Role: assistant
📝 Content: The image shows a bus with the sign "OUT OF SERVICE" displayed on its front digital display. The bus appears to be stationary, and it is located on a street that seems to have a light layer of fog or ...

✅ Test '5️⃣ 🖼️ Image Only' PASSED

================================================================================
🧪 TEST: 6️⃣ 💬 🔧 📡 Text + Tool Calling (STREAMING)
================================================================================
✅ Status Code: 200

📥 Response (STREAMING - SSE format):

📡 Stream Chunks:


🛠️  Tool Calls Detected:
  🔧 Function: get_weather
     📋 Arguments: {"unit":"fahrenheit","location":"San Francisco, CA"}
  [DONE]


📊 Summary:
  Total chunks: 31

✅ Test '6️⃣ 💬 🔧 📡 Text + Tool Calling (STREAMING)' PASSED
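
For context on what the script is summarizing: in the OpenAI-compatible streaming format, chunks are chat.completion.chunk objects delivered as SSE "data: ..." lines. With the buffering approach in this PR, content deltas are cleared and the parsed tool calls ride on the final chunk. A representative final chunk, illustrative rather than captured from this run, looks like:

data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"tool_calls": [{"index": 0, "id": "call-...", "type": "function", "function": {"name": "get_weather", "arguments": "{\"location\":\"San Francisco, CA\",\"unit\":\"fahrenheit\"}"}}]}, "finish_reason": "tool_calls"}]}

data: [DONE]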

================================================================================
🧪 TEST: 7️⃣ 🖼️ 💬 📡 Image + Text (STREAMING)
================================================================================
✅ Status Code: 200

📥 Response (STREAMING - SSE format):

📡 Stream Chunks:
The image shows a public transit bus on a street during what appears to be early morning or late evening, as the lighting suggests it's either dawn or dusk. The bus has an "OUT OF SERVICE" sign displayed on its front digital display. The surroundings include trees, a sidewalk, and a residential area with houses in the background. The road appears wet, possibly from recent rain, and there are visible tram tracks on the street, indicating that the area may have a light rail system. The overall atmosphere is quiet and misty.  [DONE]


📊 Summary:
  Total chunks: 109
  Content length: 527 characters

📝 Full Content:
The image shows a public transit bus on a street during what appears to be early morning or late evening, as the lighting suggests it's either dawn or dusk. The bus has an "OUT OF SERVICE" sign displayed on its front digital display. The surroundings include trees, a sidewalk, and a residential area with houses in the background. The road appears wet, possibly from recent rain, and there are visible tram tracks on the street, indicating that the area may have a light rail system. The overall atmosphere is quiet and misty.

✅ Test '7️⃣ 🖼️ 💬 📡 Image + Text (STREAMING)' PASSED

================================================================================
🧪 TEST: 8️⃣ 💬 📡 Text Only (STREAMING)
================================================================================
✅ Status Code: 200

📥 Response (STREAMING - SSE format):

📡 Stream Chunks:
Neural networks are a type of machine learning algorithm that is inspired by the structure and function of the human brain. They are made up of interconnected nodes, or "neurons," which are organized into layers. The input data is fed into the first layer, which then passes it on to the next layer, and so on, until the final output is produced.Each neuron in a neural network performs a mathematical operation on the inputs it receives from the previous layer, and then passes its output to the neurons in the next layer. This process is repeated until the final output is generated.The key feature of neural networks is their ability to learn from data. During training, the network is presented with a set of input-output pairs, and it adjusts the weights of the connections between the neurons based on how well it performs. Over time, the network learns to recognize patterns in the data and make predictions about new, unseen data.There are many different types of neural networks, including feedforward networks, recurrent networks, and convolutional networks, each with their own unique characteristics and applications. However, all neural networks share the same basic principle: they use a series of interconnected nodes to process and analyze data, and they can be trained to perform a wide range of tasks, such as image recognition, natural language processing, and predictive modeling.  [DONE]


📊 Summary:
  Total chunks: 268
  Content length: 1400 characters

📝 Full Content:
Neural networks are a type of machine learning algorithm that is inspired by the structure and function of the human brain. They are made up of interconnected nodes, or "neurons," which are organized into layers. The input data is fed into the first layer, which then passes it on to the next layer, and so on, until the final output is produced.Each neuron in a neural network performs a mathematical operation on the inputs it receives from the previous layer, and then passes its output to the neurons in the next layer. This process is repeated until the final output is generated.The key feature of neural networks is their ability to learn from data. During training, the network is presented with a set of input-output pairs, and it adjusts the weights of the connections between the neurons based on how well it performs. Over time, the network learns to recognize patterns in the data and make predictions about new, unseen data.There are many different types of neural networks, including feedforward networks, recurrent networks, and convolutional networks, each with their own unique characteristics and applications. However, all neural networks share the same basic principle: they use a series of interconnected nodes to process and analyze data, and they can be trained to perform a wide range of tasks, such as image recognition, natural language processing, and predictive modeling.

✅ Test '8️⃣ 💬 📡 Text Only (STREAMING)' PASSED

================================================================================
🎉 ALL TESTS COMPLETED 🎉
================================================================================

Where should the reviewer start?

  • examples/multimodal/components/processor.py
  • examples/multimodal/components/worker.py
  • examples/multimodal/utils/args.py
  • examples/multimodal/utils/protocol.py
  • examples/multimodal/utils/chat_processor.py
  • examples/multimodal/launch/agg.sh
  • lib/bindings/python/rust/parsers.rs
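
For reviewers unfamiliar with the new binding in parsers.rs, the processor consumes it roughly as sketched below. The calls mirror the excerpts in the review comments further down; the import path for parse_tool_calls_py and the hermes-style input string are assumptions:

from dynamo._core import get_tool_parser_names, parse_tool_calls_py

# Names accepted by --dyn-tool-call-parser (e.g. "hermes").
print(get_tool_parser_names())

# Parse the accumulated model output into structured tool calls; processor.py
# unpacks the same (tool_calls, normal_text) pair.
raw = '<tool_call>{"name": "get_weather", "arguments": {"location": "SF"}}</tool_call>'
tool_calls, normal_text = parse_tool_calls_py(raw, "hermes")
for tc in tool_calls:
    print(tc["function"]["name"], tc["function"]["arguments"])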

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • Closes GitHub issue: #xxx

@cchen777 cchen777 requested review from a team as code owners October 7, 2025 00:04

copy-pr-bot bot commented Oct 7, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


github-actions bot commented Oct 7, 2025

👋 Hi cchen777! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added external-contribution Pull request is from an external contributor feat labels Oct 7, 2025
@cchen777 cchen777 force-pushed the pins/multimodal-tool-calling branch from b69babd to 9f2d8a5 Compare October 7, 2025 00:05

coderabbitai bot commented Oct 7, 2025

Pre-merge checks

✅ Passed checks (3 passed)

  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.
  • Description Check: ✅ Passed. The pull request description adheres to the repository's template by providing clearly labeled Overview, Details, Where should the reviewer start, and Related Issues sections, each populated with relevant content and formatted as specified.
  • Title Check: ✅ Passed. The title clearly and concisely summarizes the main enhancement, adding tool-calling support with a custom chat template to the multimodal vLLM example, and aligns with the PR objectives without extraneous details.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c888bf and 9f2d8a5.

📒 Files selected for processing (8)
  • deploy/docker-compose.yml (7 hunks)
  • examples/multimodal/components/processor.py (12 hunks)
  • examples/multimodal/components/worker.py (3 hunks)
  • examples/multimodal/launch/agg.sh (3 hunks)
  • examples/multimodal/utils/args.py (8 hunks)
  • examples/multimodal/utils/chat_processor.py (2 hunks)
  • examples/multimodal/utils/protocol.py (3 hunks)
  • lib/bindings/python/rust/parsers.rs (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
examples/multimodal/utils/args.py (3)
lib/bindings/python/rust/parsers.rs (2)
  • get_reasoning_parser_names (18-20)
  • get_tool_parser_names (12-14)
lib/bindings/python/src/dynamo/_core.pyi (4)
  • DistributedRuntime (33-63)
  • block_size (607-611)
  • block_size (630-634)
  • namespace (40-44)
components/src/dynamo/vllm/ports.py (4)
  • DynamoPortRange (23-37)
  • PortAllocationRequest (49-64)
  • PortMetadata (41-45)
  • allocate_and_reserve_port_block (77-106)
examples/multimodal/components/processor.py (4)
lib/bindings/python/src/dynamo/_core.pyi (12)
  • ModelInput (826-828)
  • ModelRuntimeConfig (452-456)
  • ModelType (830-832)
  • register_llm (846-860)
  • Client (144-185)
  • round_robin (175-179)
  • get (1275-1276)
  • endpoint (105-109)
  • client (132-136)
  • wait_for_instances (160-167)
  • block_size (607-611)
  • block_size (630-634)
lib/bindings/python/rust/lib.rs (6)
  • register_llm (212-282)
  • _core (130-193)
  • round_robin (777-811)
  • endpoint (627-633)
  • client (702-716)
  • wait_for_instances (748-757)
lib/bindings/python/rust/parsers.rs (2)
  • parse_tool_calls_py (35-58)
  • tool_calls (47-50)
examples/multimodal/utils/chat_processor.py (1)
  • ChatProcessor (119-268)
lib/bindings/python/rust/parsers.rs (2)
lib/parsers/src/tool_calling/parsers.rs (2)
  • detect_and_parse_tool_call (80-104)
  • get_available_tool_parsers (40-42)
lib/bindings/python/rust/lib.rs (18)
  • new (379-404)
  • new (1021-1025)
  • m (138-138)
  • m (139-139)
  • m (140-140)
  • m (141-141)
  • m (142-142)
  • m (143-143)
  • m (144-144)
  • m (145-145)
  • m (146-146)
  • m (147-147)
  • m (148-148)
  • m (149-149)
  • m (150-150)
  • m (151-151)
  • m (152-152)
  • m (153-153)
examples/multimodal/components/worker.py (6)
examples/multimodal/components/publisher.py (1)
  • StatLoggerFactory (147-185)
examples/multimodal/utils/args.py (5)
  • Config (29-50)
  • base_parse_args (65-175)
  • configure_ports (202-229)
  • overwrite_args (232-267)
  • parse_endpoint (53-62)
examples/multimodal/utils/image_loader.py (1)
  • ImageLoader (31-107)
examples/multimodal/utils/model.py (1)
  • construct_mm_data (43-69)
examples/multimodal/utils/protocol.py (2)
  • MyRequestOutput (165-188)
  • vLLMMultimodalRequest (155-162)
lib/bindings/python/src/dynamo/nixl_connect/__init__.py (2)
  • Descriptor (723-972)
  • begin_read (577-627)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3450/merge) by cchen777.
examples/multimodal/utils/chat_processor.py

[error] 1-1: Black: reformatted file by this hook.

examples/multimodal/utils/args.py

[error] 60-60: Ruff: Undefined name 'sys' (F821).


[error] 60-60: Ruff: Undefined name 'sys' (F821).

examples/multimodal/components/processor.py

[error] 1-1: isort: files were modified by this hook.


[error] 1-1: Black: reformatted file by this hook.

lib/bindings/python/rust/parsers.rs

[error] 1-1: Trailing whitespace fixed by pre-commit hook.

examples/multimodal/utils/protocol.py

[error] 1-1: isort: files were modified by this hook.


[error] 1-1: Black: reformatted file by this hook.

examples/multimodal/components/worker.py

[error] 1-1: isort: files were modified by this hook.

🪛 Ruff (0.13.3)
examples/multimodal/utils/args.py

164-167: Avoid specifying long messages outside the exception class

(TRY003)

examples/multimodal/components/processor.py

304-304: Avoid specifying long messages outside the exception class

(TRY003)


395-395: Using .strip() with multi-character strings is misleading

(B005)


427-427: Unpacked variable normal_text is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


461-461: Do not catch blind exception: Exception

(BLE001)

examples/multimodal/utils/protocol.py

26-26: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


33-33: Redefinition of unused MultiModalUUIDDict from line 25

Remove definition: MultiModalUUIDDict

(F811)


33-33: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

examples/multimodal/components/worker.py

284-286: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: clippy (launch/dynamo-run)

Comment on lines +393 to +470
# Handle both streaming (with "data: " prefix) and non-streaming responses
if response.startswith("data: "):
    response = json.loads(response.lstrip("data: "))
else:
    response = json.loads(response)
    # Convert non-streaming format (message) to streaming format (delta)
    if "choices" in response and "message" in response["choices"][0]:
        message_content = response["choices"][0]["message"]["content"]
        response["choices"][0]["delta"] = {"content": message_content, "role": "assistant"}
        del response["choices"][0]["message"]
        response["object"] = "chat.completion.chunk"

# Buffer chunks and accumulate content when tool calling is configured
if (
    self.tool_call_parser
    and raw_request.tools
    and "choices" in response
    and len(response["choices"]) > 0
):
    choice = response["choices"][0]

    # Buffer this chunk
    buffered_chunks.append(response)

    # Accumulate delta content
    if "delta" in choice and choice["delta"].get("content"):
        accumulated_content += choice["delta"]["content"]

    # Parse when we hit the end (finish_reason is set)
    finish_reason = choice.get("finish_reason")
    if finish_reason == "stop":
        if accumulated_content:
            logger.info(f"Attempting to parse accumulated tool calls (length={len(accumulated_content)}) with parser: {self.tool_call_parser}")
            try:
                tool_calls, normal_text = parse_tool_calls_py(accumulated_content, self.tool_call_parser)
                logger.info(f"Parse result: {len(tool_calls) if tool_calls else 0} tool calls found")

                if tool_calls:
                    # Convert tool calls to OpenAI format
                    tool_call_chunks = []
                    for idx, tc in enumerate(tool_calls):
                        tool_call_chunks.append({
                            "index": idx,
                            "id": tc["id"],
                            "type": tc["type"],
                            "function": {
                                "name": tc["function"]["name"],
                                "arguments": tc["function"]["arguments"]
                            }
                        })

                    # Clear content from ALL buffered chunks (per OpenAI spec)
                    for buffered_chunk in buffered_chunks:
                        if "choices" in buffered_chunk and len(buffered_chunk["choices"]) > 0:
                            buffered_choice = buffered_chunk["choices"][0]
                            if "delta" in buffered_choice:
                                buffered_choice["delta"]["content"] = ""
                            elif "message" in buffered_choice:
                                buffered_choice["message"]["content"] = ""

                    # Add tool_calls to the final chunk
                    if "delta" in choice:
                        choice["delta"]["tool_calls"] = tool_call_chunks
                    elif "message" in choice:
                        choice["message"]["tool_calls"] = tool_call_chunks

                    choice["finish_reason"] = "tool_calls"
                    logger.info(f"Cleared content from {len(buffered_chunks)} chunks and added {len(tool_calls)} tool call(s) to final chunk")
            except Exception as e:
                logger.warning(f"Failed to parse tool calls: {e}", exc_info=True)
                # Continue with original response if parsing fails

        # Yield all buffered chunks now that we've processed them
        for chunk in buffered_chunks:
            yield chunk
        buffered_chunks = []
else:
    # No tool calling, yield immediately

⚠️ Potential issue | 🔴 Critical

Re-emit strings (not dicts) and reset buffered state.

Once you call json.loads, the code now yields dicts, but generate() must stream strings ("data: ...\n\n" for SSE or plain JSON). Returning dicts breaks the HTTP stack immediately. Also, accumulated_content is never cleared after flushing, so subsequent responses inherit stale text. Please keep the original string alongside the parsed object, re-serialize before yielding (respecting streaming vs. non-streaming), and reset accumulated_content when you drain buffered_chunks.

🧰 Tools
🪛 Ruff (0.13.3)

395-395: Using .strip() with multi-character strings is misleading

(B005)


427-427: Unpacked variable normal_text is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


461-461: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In examples/multimodal/components/processor.py around lines 393 to 470, the
handler currently json.loads incoming responses and yields dicts, and never
clears accumulated_content after flushing; fix by preserving the original raw
string for each buffered chunk and re-serializing before yielding so callers
still receive strings (include the "data: " SSE prefix when the original chunk
had it, or plain JSON otherwise), and after you drain buffered_chunks reset
accumulated_content = "" (and ensure buffered_chunks = [] as you already do) so
subsequent responses don't inherit stale content; keep parsing/annotation logic
operating on the parsed dicts but always convert back to the original string
format when emitting.
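
A minimal sketch of that suggested pattern, reusing the variable names from the excerpt above; buffering (parsed, was_sse) pairs and the exact SSE framing are assumptions:

import json

def emit_chunk(parsed: dict, was_sse: bool) -> str:
    # Re-serialize a parsed chunk into the string form generate() must yield.
    body = json.dumps(parsed)
    return f"data: {body}\n\n" if was_sse else body

# Inside the finish_reason == "stop" branch, after annotating the chunks:
#     for parsed, was_sse in buffered_chunks:
#         yield emit_chunk(parsed, was_sse)
#     buffered_chunks = []
#     accumulated_content = ""  # reset so the next response starts clean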

Comment on lines 27 to 38
 from publisher import StatLoggerFactory
-from utils.args import (
+from examples.pinterest.multimodal.utils.args import (
     Config,
     base_parse_args,
     configure_ports,
     overwrite_args,
     parse_endpoint,
 )
-from utils.image_loader import ImageLoader
-from utils.model import construct_mm_data
-from utils.protocol import MyRequestOutput, vLLMMultimodalRequest
+from examples.multimodal.utils.image_loader import ImageLoader
+from examples.multimodal.utils.model import construct_mm_data
+from examples.pinterest.multimodal.utils.protocol import MyRequestOutput, vLLMMultimodalRequest


⚠️ Potential issue | 🔴 Critical

Fix module path for shared utils.

These imports must come from examples.multimodal, not examples.pinterest. As written, python3 components/worker.py raises ModuleNotFoundError. Please point them back to the multimodal package.

-from examples.pinterest.multimodal.utils.args import (
+from examples.multimodal.utils.args import (
     Config,
     base_parse_args,
     configure_ports,
     overwrite_args,
     parse_endpoint,
 )
-from examples.multimodal.utils.image_loader import ImageLoader
-from examples.multimodal.utils.model import construct_mm_data
-from examples.pinterest.multimodal.utils.protocol import MyRequestOutput, vLLMMultimodalRequest
+from examples.multimodal.utils.image_loader import ImageLoader
+from examples.multimodal.utils.model import construct_mm_data
+from examples.multimodal.utils.protocol import MyRequestOutput, vLLMMultimodalRequest
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Replace:

from publisher import StatLoggerFactory
from utils.args import (
from examples.pinterest.multimodal.utils.args import (
    Config,
    base_parse_args,
    configure_ports,
    overwrite_args,
    parse_endpoint,
)
from utils.image_loader import ImageLoader
from utils.model import construct_mm_data
from utils.protocol import MyRequestOutput, vLLMMultimodalRequest
from examples.multimodal.utils.image_loader import ImageLoader
from examples.multimodal.utils.model import construct_mm_data
from examples.pinterest.multimodal.utils.protocol import MyRequestOutput, vLLMMultimodalRequest

With:

from publisher import StatLoggerFactory
from examples.multimodal.utils.args import (
    Config,
    base_parse_args,
    configure_ports,
    overwrite_args,
    parse_endpoint,
)
from examples.multimodal.utils.image_loader import ImageLoader
from examples.multimodal.utils.model import construct_mm_data
from examples.multimodal.utils.protocol import MyRequestOutput, vLLMMultimodalRequest
🤖 Prompt for AI Agents
In examples/multimodal/components/worker.py around lines 27 to 38, the imports
are incorrectly pointing to examples.pinterest.multimodal.utils; update those
import paths to use examples.multimodal.utils (e.g., import Config,
base_parse_args, configure_ports, overwrite_args, parse_endpoint from
examples.multimodal.utils.args) so the modules are resolved when running python3
components/worker.py; ensure all affected import lines are changed consistently
to the examples.multimodal package.

Comment on lines 4 to +21
import argparse
import json
import logging
import os
import socket
import sys
import time
from typing import Callable, List, Optional, Tuple

from vllm.config import KVTransferConfig
from vllm.distributed.kv_events import KVEventsConfig
from vllm.engine.arg_utils import AsyncEngineArgs

from dynamo._core import get_reasoning_parser_names, get_tool_parser_names
from dynamo.runtime import DistributedRuntime
from dynamo.vllm.ports import (
    DynamoPortRange,
    PortAllocationRequest,
    PortMetadata,
    allocate_and_reserve_port_block,
)

⚠️ Potential issue | 🟠 Major

Import sys before using sys.exit.

parse_endpoint still calls sys.exit, but sys is no longer imported, leading to NameError in error handling paths. Re-add the import.

-import argparse
-import logging
-import os
-import socket
+import argparse
+import logging
+import os
+import socket
+import sys
🤖 Prompt for AI Agents
In examples/multimodal/utils/args.py around lines 4 to 21, the function
parse_endpoint calls sys.exit on error but sys is not imported, causing a
NameError in error paths; add import sys to the top-level imports (near the
other stdlib imports like os and socket) so sys.exit is available, and ensure
linting/order matches project style.

Comment on lines 25 to 34
from vllm.multimodal.inputs import MultiModalUUIDDict # noqa: F401
from vllm.multimodal.inputs import MultiModalDataDict # noqa: F401
from vllm.outputs import CompletionOutput
from vllm.sampling_params import SamplingParams
from vllm.sequence import PromptLogprobs, RequestMetrics

import dynamo.nixl_connect as connect

from vllm.multimodal.inputs import MultiModalUUIDDict # noqa: F401


⚠️ Potential issue | 🟠 Major

Remove duplicated MultiModalUUIDDict import.

MultiModalUUIDDict is imported twice (Lines 25 & 33), triggering Ruff’s F811/redefinition error and leaving an unused noqa behind. This fails pre-commit. Drop the duplicate by keeping a single import (you can combine it with MultiModalDataDict if needed).

-from vllm.multimodal.inputs import MultiModalUUIDDict  # noqa: F401
-from vllm.multimodal.inputs import MultiModalDataDict  # noqa: F401
+from vllm.multimodal.inputs import MultiModalDataDict, MultiModalUUIDDict  # noqa: F401

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.13.3)

25-25: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


26-26: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


33-33: Redefinition of unused MultiModalUUIDDict from line 25

Remove definition: MultiModalUUIDDict

(F811)


33-33: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

🤖 Prompt for AI Agents
In examples/multimodal/utils/protocol.py around lines 25 to 34, there is a
duplicate import of MultiModalUUIDDict (lines 25 and 33) causing a Ruff
F811/redefinition error and an unnecessary noqa; remove the duplicate import by
keeping a single import statement (combine MultiModalUUIDDict with
MultiModalDataDict on one line if desired) and delete the extra import and its
noqa.

@cchen777 cchen777 force-pushed the pins/multimodal-tool-calling branch from 9f2d8a5 to 3d5602f Compare October 7, 2025 00:13
@cchen777 cchen777 changed the title feat: suppor tool calling with custom chat template in multimodality vllm example feat: support tool calling with custom chat template in multimodality vllm example Oct 7, 2025