feat: support tool calling with custom chat template in multimodality vllm example #3450
base: main
Conversation
👋 Hi cchen777! Thank you for contributing to ai-dynamo/dynamo. Just a reminder: The 🚀
b69babd to 9f2d8a5 (Compare)
Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 4
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
deploy/docker-compose.yml (7 hunks)
examples/multimodal/components/processor.py (12 hunks)
examples/multimodal/components/worker.py (3 hunks)
examples/multimodal/launch/agg.sh (3 hunks)
examples/multimodal/utils/args.py (8 hunks)
examples/multimodal/utils/chat_processor.py (2 hunks)
examples/multimodal/utils/protocol.py (3 hunks)
lib/bindings/python/rust/parsers.rs (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
examples/multimodal/utils/args.py (3)
  - lib/bindings/python/rust/parsers.rs (2): get_reasoning_parser_names (18-20), get_tool_parser_names (12-14)
  - lib/bindings/python/src/dynamo/_core.pyi (4): DistributedRuntime (33-63), block_size (607-611), block_size (630-634), namespace (40-44)
  - components/src/dynamo/vllm/ports.py (4): DynamoPortRange (23-37), PortAllocationRequest (49-64), PortMetadata (41-45), allocate_and_reserve_port_block (77-106)
examples/multimodal/components/processor.py (4)
  - lib/bindings/python/src/dynamo/_core.pyi (12): ModelInput (826-828), ModelRuntimeConfig (452-456), ModelType (830-832), register_llm (846-860), Client (144-185), round_robin (175-179), get (1275-1276), endpoint (105-109), client (132-136), wait_for_instances (160-167), block_size (607-611), block_size (630-634)
  - lib/bindings/python/rust/lib.rs (6): register_llm (212-282), _core (130-193), round_robin (777-811), endpoint (627-633), client (702-716), wait_for_instances (748-757)
  - lib/bindings/python/rust/parsers.rs (2): parse_tool_calls_py (35-58), tool_calls (47-50)
  - examples/multimodal/utils/chat_processor.py (1): ChatProcessor (119-268)
lib/bindings/python/rust/parsers.rs (2)
  - lib/parsers/src/tool_calling/parsers.rs (2): detect_and_parse_tool_call (80-104), get_available_tool_parsers (40-42)
  - lib/bindings/python/rust/lib.rs (18): new (379-404), new (1021-1025), m (138-138), m (139-139), m (140-140), m (141-141), m (142-142), m (143-143), m (144-144), m (145-145), m (146-146), m (147-147), m (148-148), m (149-149), m (150-150), m (151-151), m (152-152), m (153-153)
examples/multimodal/components/worker.py (6)
  - examples/multimodal/components/publisher.py (1): StatLoggerFactory (147-185)
  - examples/multimodal/utils/args.py (5): Config (29-50), base_parse_args (65-175), configure_ports (202-229), overwrite_args (232-267), parse_endpoint (53-62)
  - examples/multimodal/utils/image_loader.py (1): ImageLoader (31-107)
  - examples/multimodal/utils/model.py (1): construct_mm_data (43-69)
  - examples/multimodal/utils/protocol.py (2): MyRequestOutput (165-188), vLLMMultimodalRequest (155-162)
  - lib/bindings/python/src/dynamo/nixl_connect/__init__.py (2): Descriptor (723-972), begin_read (577-627)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3450/merge) by cchen777.
examples/multimodal/utils/chat_processor.py
[error] 1-1: Black: reformatted file by this hook.
examples/multimodal/utils/args.py
[error] 60-60: Ruff: Undefined name 'sys' (F821).
[error] 60-60: Ruff: Undefined name 'sys' (F821).
examples/multimodal/components/processor.py
[error] 1-1: isort: files were modified by this hook.
[error] 1-1: Black: reformatted file by this hook.
lib/bindings/python/rust/parsers.rs
[error] 1-1: Trailing whitespace fixed by pre-commit hook.
examples/multimodal/utils/protocol.py
[error] 1-1: isort: files were modified by this hook.
[error] 1-1: Black: reformatted file by this hook.
examples/multimodal/components/worker.py
[error] 1-1: isort: files were modified by this hook.
🪛 Ruff (0.13.3)
examples/multimodal/utils/args.py
164-167: Avoid specifying long messages outside the exception class (TRY003)
examples/multimodal/components/processor.py
304-304: Avoid specifying long messages outside the exception class (TRY003)
395-395: Using .strip() with multi-character strings is misleading (B005)
427-427: Unpacked variable normal_text is never used; prefix it with an underscore or any other dummy variable pattern (RUF059)
461-461: Do not catch blind exception: Exception (BLE001)
examples/multimodal/utils/protocol.py
26-26: Unused noqa directive (non-enabled: F401); remove unused noqa directive (RUF100)
33-33: Redefinition of unused MultiModalUUIDDict from line 25; remove definition: MultiModalUUIDDict (F811)
33-33: Unused noqa directive (non-enabled: F401); remove unused noqa directive (RUF100)
examples/multimodal/components/worker.py
284-286: Avoid specifying long messages outside the exception class (TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build and Test - dynamo
- GitHub Check: clippy (launch/dynamo-run)
# Handle both streaming (with "data: " prefix) and non-streaming responses
if response.startswith("data: "):
    response = json.loads(response.lstrip("data: "))
else:
    response = json.loads(response)
    # Convert non-streaming format (message) to streaming format (delta)
    if "choices" in response and "message" in response["choices"][0]:
        message_content = response["choices"][0]["message"]["content"]
        response["choices"][0]["delta"] = {"content": message_content, "role": "assistant"}
        del response["choices"][0]["message"]
        response["object"] = "chat.completion.chunk"

# Buffer chunks and accumulate content when tool calling is configured
if (
    self.tool_call_parser
    and raw_request.tools
    and "choices" in response
    and len(response["choices"]) > 0
):
    choice = response["choices"][0]

    # Buffer this chunk
    buffered_chunks.append(response)

    # Accumulate delta content
    if "delta" in choice and choice["delta"].get("content"):
        accumulated_content += choice["delta"]["content"]

    # Parse when we hit the end (finish_reason is set)
    finish_reason = choice.get("finish_reason")
    if finish_reason == "stop":
        if accumulated_content:
            logger.info(f"Attempting to parse accumulated tool calls (length={len(accumulated_content)}) with parser: {self.tool_call_parser}")
            try:
                tool_calls, normal_text = parse_tool_calls_py(accumulated_content, self.tool_call_parser)
                logger.info(f"Parse result: {len(tool_calls) if tool_calls else 0} tool calls found")

                if tool_calls:
                    # Convert tool calls to OpenAI format
                    tool_call_chunks = []
                    for idx, tc in enumerate(tool_calls):
                        tool_call_chunks.append({
                            "index": idx,
                            "id": tc["id"],
                            "type": tc["type"],
                            "function": {
                                "name": tc["function"]["name"],
                                "arguments": tc["function"]["arguments"]
                            }
                        })

                    # Clear content from ALL buffered chunks (per OpenAI spec)
                    for buffered_chunk in buffered_chunks:
                        if "choices" in buffered_chunk and len(buffered_chunk["choices"]) > 0:
                            buffered_choice = buffered_chunk["choices"][0]
                            if "delta" in buffered_choice:
                                buffered_choice["delta"]["content"] = ""
                            elif "message" in buffered_choice:
                                buffered_choice["message"]["content"] = ""

                    # Add tool_calls to the final chunk
                    if "delta" in choice:
                        choice["delta"]["tool_calls"] = tool_call_chunks
                    elif "message" in choice:
                        choice["message"]["tool_calls"] = tool_call_chunks

                    choice["finish_reason"] = "tool_calls"
                    logger.info(f"Cleared content from {len(buffered_chunks)} chunks and added {len(tool_calls)} tool call(s) to final chunk")
            except Exception as e:
                logger.warning(f"Failed to parse tool calls: {e}", exc_info=True)
                # Continue with original response if parsing fails

        # Yield all buffered chunks now that we've processed them
        for chunk in buffered_chunks:
            yield chunk
        buffered_chunks = []
else:
    # No tool calling, yield immediately
Re-emit strings (not dicts) and reset buffered state.
Once you call json.loads, the code now yields dicts, but generate() must stream strings ("data: ...\n\n" for SSE or plain JSON). Returning dicts breaks the HTTP stack immediately. Also, accumulated_content is never cleared after flushing, so subsequent responses inherit stale text. Please keep the original string alongside the parsed object, re-serialize before yielding (respecting streaming vs. non-streaming), and reset accumulated_content when you drain buffered_chunks; a sketch follows the AI prompt below.
🧰 Tools
🪛 Ruff (0.13.3)
395-395: Using .strip() with multi-character strings is misleading (B005)
427-427: Unpacked variable normal_text is never used; prefix it with an underscore or any other dummy variable pattern (RUF059)
461-461: Do not catch blind exception: Exception (BLE001)
🤖 Prompt for AI Agents
In examples/multimodal/components/processor.py around lines 393 to 470, the
handler currently json.loads incoming responses and yields dicts, and never
clears accumulated_content after flushing; fix by preserving the original raw
string for each buffered chunk and re-serializing before yielding so callers
still receive strings (include the "data: " SSE prefix when the original chunk
had it, or plain JSON otherwise), and after you drain buffered_chunks reset
accumulated_content = "" (and ensure buffered_chunks = [] as you already do) so
subsequent responses don't inherit stale content; keep parsing/annotation logic
operating on the parsed dicts but always convert back to the original string
format when emitting.
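A minimal sketch of one way to apply that fix, assuming the same streaming generator and the buffered_chunks / accumulated_content variables from the snippet above; the had_sse_prefix flag and the tuple-based buffer are illustrative, not part of the PR:

# Sketch only (not the PR's code): keep the raw string alongside the parsed dict
# so generate() can keep yielding strings, and reset state after flushing.
raw = response  # original string received from the engine
had_sse_prefix = raw.startswith("data: ")
parsed = json.loads(raw.removeprefix("data: ")) if had_sse_prefix else json.loads(raw)
# str.removeprefix also avoids the Ruff B005 warning that lstrip("data: ") triggers.

buffered_chunks.append((parsed, had_sse_prefix))
choices = parsed.get("choices") or [{}]
if choices[0].get("delta", {}).get("content"):
    accumulated_content += choices[0]["delta"]["content"]

if choices[0].get("finish_reason") == "stop":
    # ... tool-call parsing and chunk rewriting as in the PR ...
    for chunk, was_sse in buffered_chunks:
        payload = json.dumps(chunk)
        # Re-serialize so callers still receive strings, preserving the SSE framing.
        yield f"data: {payload}\n\n" if was_sse else payload
    buffered_chunks = []
    accumulated_content = ""  # reset so the next response doesn't inherit stale text

Buffering (parsed, was_sse) pairs keeps the original framing recoverable without re-inspecting the string on the way out.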
 from publisher import StatLoggerFactory
-from utils.args import (
+from examples.pinterest.multimodal.utils.args import (
     Config,
     base_parse_args,
     configure_ports,
     overwrite_args,
     parse_endpoint,
 )
-from utils.image_loader import ImageLoader
-from utils.model import construct_mm_data
-from utils.protocol import MyRequestOutput, vLLMMultimodalRequest
+from examples.multimodal.utils.image_loader import ImageLoader
+from examples.multimodal.utils.model import construct_mm_data
+from examples.pinterest.multimodal.utils.protocol import MyRequestOutput, vLLMMultimodalRequest
Fix module path for shared utils.
These imports must come from examples.multimodal, not examples.pinterest. As written, python3 components/worker.py raises ModuleNotFoundError. Please point them back to the multimodal package.
-from examples.pinterest.multimodal.utils.args import (
+from examples.multimodal.utils.args import (
Config,
base_parse_args,
configure_ports,
overwrite_args,
parse_endpoint,
)
-from examples.multimodal.utils.image_loader import ImageLoader
-from examples.multimodal.utils.model import construct_mm_data
-from examples.pinterest.multimodal.utils.protocol import MyRequestOutput, vLLMMultimodalRequest
+from examples.multimodal.utils.image_loader import ImageLoader
+from examples.multimodal.utils.model import construct_mm_data
+from examples.multimodal.utils.protocol import MyRequestOutput, vLLMMultimodalRequest
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
from publisher import StatLoggerFactory
from examples.multimodal.utils.args import (
    Config,
    base_parse_args,
    configure_ports,
    overwrite_args,
    parse_endpoint,
)
from examples.multimodal.utils.image_loader import ImageLoader
from examples.multimodal.utils.model import construct_mm_data
from examples.multimodal.utils.protocol import MyRequestOutput, vLLMMultimodalRequest
🤖 Prompt for AI Agents
In examples/multimodal/components/worker.py around lines 27 to 38, the imports
are incorrectly pointing to examples.pinterest.multimodal.utils; update those
import paths to use examples.multimodal.utils (e.g., import Config,
base_parse_args, configure_ports, overwrite_args, parse_endpoint from
examples.multimodal.utils.args) so the modules are resolved when running python3
components/worker.py; ensure all affected import lines are changed consistently
to the examples.multimodal package.
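If the example is also meant to stay runnable as a plain script (python3 components/worker.py) in addition to the package path, one purely illustrative option, not something the PR or the review proposes, is a guarded import fallback:

# Illustrative sketch: prefer the package imports, fall back to the script-local
# layout used when running `python3 components/worker.py` from examples/multimodal/.
try:
    from examples.multimodal.utils.args import Config, base_parse_args, parse_endpoint
    from examples.multimodal.utils.protocol import MyRequestOutput, vLLMMultimodalRequest
except ImportError:
    from utils.args import Config, base_parse_args, parse_endpoint  # type: ignore[no-redef]
    from utils.protocol import MyRequestOutput, vLLMMultimodalRequest  # type: ignore[no-redef]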
import argparse
import json
import logging
import os
import socket
import sys
import time
from typing import Callable, List, Optional, Tuple

from vllm.config import KVTransferConfig
from vllm.distributed.kv_events import KVEventsConfig
from vllm.engine.arg_utils import AsyncEngineArgs

from dynamo._core import get_reasoning_parser_names, get_tool_parser_names
from dynamo.runtime import DistributedRuntime
from dynamo.vllm.ports import (
    DynamoPortRange,
    PortAllocationRequest,
    PortMetadata,
    allocate_and_reserve_port_block,
)
Import sys before using sys.exit.
parse_endpoint still calls sys.exit, but sys is no longer imported, leading to NameError in error-handling paths. Re-add the import.
-import argparse
-import logging
-import os
-import socket
+import argparse
+import logging
+import os
+import socket
+import sys
🤖 Prompt for AI Agents
In examples/multimodal/utils/args.py around lines 4 to 21, the function
parse_endpoint calls sys.exit on error but sys is not imported, causing a
NameError in error paths; add import sys to the top-level imports (near the
other stdlib imports like os and socket) so sys.exit is available, and ensure
linting/order matches project style.
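For context, a hypothetical shape of the error path the comment refers to; the real parse_endpoint (lines 53-62 of utils/args.py) is not shown here, so the names and messages below are placeholders:

import logging
import sys  # required for the sys.exit call below

def parse_endpoint(endpoint: str):
    # Expects "namespace.component.endpoint"; exits on malformed input.
    parts = endpoint.split(".")
    if len(parts) != 3:
        logging.error("Invalid endpoint format: %s", endpoint)
        sys.exit(1)  # NameError here if `import sys` was dropped
    return parts[0], parts[1], parts[2]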
from vllm.multimodal.inputs import MultiModalUUIDDict  # noqa: F401
from vllm.multimodal.inputs import MultiModalDataDict  # noqa: F401
from vllm.outputs import CompletionOutput
from vllm.sampling_params import SamplingParams
from vllm.sequence import PromptLogprobs, RequestMetrics

import dynamo.nixl_connect as connect

from vllm.multimodal.inputs import MultiModalUUIDDict  # noqa: F401
Remove duplicated MultiModalUUIDDict import.
MultiModalUUIDDict is imported twice (lines 25 & 33), triggering Ruff's F811 redefinition error and leaving an unused noqa behind. This fails pre-commit. Drop the duplicate by keeping a single import (you can combine it with MultiModalDataDict if needed).
-from vllm.multimodal.inputs import MultiModalUUIDDict # noqa: F401
-from vllm.multimodal.inputs import MultiModalDataDict # noqa: F401
+from vllm.multimodal.inputs import MultiModalDataDict, MultiModalUUIDDict # noqa: F401
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.13.3)
25-25: Unused noqa directive (non-enabled: F401); remove unused noqa directive (RUF100)
26-26: Unused noqa directive (non-enabled: F401); remove unused noqa directive (RUF100)
33-33: Redefinition of unused MultiModalUUIDDict from line 25; remove definition: MultiModalUUIDDict (F811)
33-33: Unused noqa directive (non-enabled: F401); remove unused noqa directive (RUF100)
🤖 Prompt for AI Agents
In examples/multimodal/utils/protocol.py around lines 25 to 34, there is a
duplicate import of MultiModalUUIDDict (lines 25 and 33) causing a Ruff
F811/redefinition error and an unnecessary noqa; remove the duplicate import by
keeping a single import statement (combine MultiModalUUIDDict with
MultiModalDataDict on one line if desired) and delete the extra import and its
noqa.
9f2d8a5 to 3d5602f (Compare)
Overview:
This PR adds tool calling support to the multimodal example.
Details:
The current example has no way to provide a chat template and lacks support for tool calling pre- and post-processing. Since multimodal input is not well supported in the Rust implementation, we need a dedicated Processor to handle the request. This may not be the ideal implementation, but it should shed some light on future work.
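As a rough illustration of the preprocessing half (applying a user-supplied chat template together with tool definitions), here is a hedged sketch using the Hugging Face tokenizer API; the model name, template path, and tool schema are placeholders, and the PR's actual ChatProcessor wiring may differ:

from transformers import AutoTokenizer

# Placeholder model and template path; the example's real values may differ.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
custom_template = open("my_tool_chat_template.jinja").read()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

# apply_chat_template accepts an explicit template and a tool list; the rendered
# prompt is what the processor would hand to the worker alongside the image data.
prompt = tokenizer.apply_chat_template(
    messages,
    chat_template=custom_template,
    tools=tools,
    tokenize=False,
    add_generation_prompt=True,
)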
Test in the following details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)