feat: preserve multimodal MCP tool results#689
Conversation
Signed-off-by: Nabin Mulepati <[email protected]>
Review: PR #689 — feat: preserve multimodal MCP tool resultsSummaryWidens MCP tool result handling so that image content from MCP tools survives the generation loop instead of being flattened to text. The core change is in FindingsCorrectness
Style / conventions
Tests
Performance
Security
VerdictLooks good to merge. The design (canonical Suggested before merge:
None of these are blocking. |
Greptile SummaryThis PR replaces the string-only MCP tool result serialization with a coercion pipeline that preserves image payloads as canonical
|
| Filename | Overview |
|---|---|
| packages/data-designer-engine/src/data_designer/engine/mcp/io.py | Replaces string-only serialization with a multi-branch coercion pipeline that preserves MCP image payloads as canonical image_url data URI blocks; all edge cases (dict, bare object, Pydantic model, raw base64 URL) are covered by the new helper functions and well-tested. |
| packages/data-designer-engine/src/data_designer/engine/mcp/registry.py | Widens MCPToolResult.content from str to `str |
| packages/data-designer-engine/src/data_designer/engine/models/utils.py | Widens ChatMessage.as_tool signature to accept `str |
| packages/data-designer-engine/tests/engine/mcp/test_mcp_io.py | Comprehensive coercion test suite covering text-only, image dict/object/Pydantic, mixed ordering, canonical passthrough, raw base64 normalization, data URI stripping, and rejection of malformed image blocks. |
| packages/data-designer-engine/tests/engine/mcp/test_mcp_facade.py | Adds an end-to-end facade test verifying multimodal tool result content is preserved verbatim through process_completion_response. |
| packages/data-designer-engine/tests/engine/models/test_facade.py | Adds generation-loop integration test confirming multimodal tool-result content survives the full turn boundary and arrives unchanged in the follow-up completion call. |
| packages/data-designer-engine/tests/engine/models/clients/test_anthropic_translation.py | Adds a mixed-blocks-with-data-uri parametrize case ensuring the Anthropic adapter translates image_url data URI tool-result blocks to Anthropic's native image/base64 source format. |
| packages/data-designer-engine/tests/engine/models/clients/test_openai_compatible.py | Adds a test verifying that OpenAI-compatible clients forward canonical multimodal tool-result content unchanged to the API payload. |
| packages/data-designer-engine/tests/engine/models/test_model_utils.py | Adds a unit test confirming ChatMessage.as_tool round-trips multimodal content through both the object field and to_dict(). |
Sequence Diagram
sequenceDiagram
participant MCPServer as MCP Server
participant IOService as MCPIOService
participant Coerce as _coerce_tool_result_content
participant MCPResult as MCPToolResult
participant Facade as MCPFacade
participant Model as ModelFacade
participant Adapter as Provider Adapter<br/>(OpenAI / Anthropic)
MCPServer-->>IOService: call_tool → raw result
IOService->>Coerce: coerce content
alt text only
Coerce-->>IOService: str
else image or mixed
Coerce-->>IOService: list[image_url / text blocks]
end
IOService-->>MCPResult: "MCPToolResult(content: str | list)"
MCPResult-->>Facade: process_completion_response
Facade-->>Model: "ChatMessage(role=tool, content=str|list)"
Model->>Adapter: completion(messages)
Note over Adapter: OpenAI: passthrough list unchanged<br/>Anthropic: translate image_url → image/base64
Adapter-->>Model: ChatCompletionResponse
Reviews (4): Last reviewed commit: "Merge branch 'main' into codex/issue-607..." | Re-trigger Greptile
johnnygreco
left a comment
There was a problem hiding this comment.
Nice work on this one, @nabinchha — the multimodal result path is much clearer after this change.
Summary
This PR preserves MCP image/tool-result content as ordered multimodal blocks through MCP execution and into provider adapters. The implementation matches the PR intent overall: MCP image outputs become canonical image_url data URI blocks, OpenAI-compatible adapters pass them through, and Anthropic translation gets provider-specific coverage.
Findings
I left one inline warning on mcp/io.py around the base64/MIME heuristic.
What Looks Good
The ordered text/image preservation is nicely covered from low-level MCP coercion through the ModelFacade tool loop. The Anthropic and OpenAI-compatible adapter tests pin the provider-boundary behavior well. The architecture note is also updated in the right place, which helps keep the MCP subsystem map honest.
Verdict
Needs changes — I’d address the non-image MIME coercion before merge so generic tool/resource payloads don’t get misrouted as image content.
This review was generated by an AI assistant.
| return isinstance(item, dict) and item.get("type") == "image_url" | ||
|
|
||
|
|
||
| def _has_base64_image_payload(item: Any) -> bool: |
There was a problem hiding this comment.
Could we tighten this image detection a bit? Right now _has_base64_image_payload() treats any item with data plus mimeType/mime_type/media_type as an image payload, and _coerce_image_mime_type() accepts any provided MIME string unchanged. That means a generic structured MCP/tool result such as {"type": "resource", "data": "...", "mimeType": "application/json"} or {"base64": "...", "media_type": "application/pdf"} can be turned into an image_url block with a non-image data URI.
Could we gate the generic base64/data detection on image/*, and reserve the clear MCPToolError path for explicit type == "image" content with a non-image MIME? Non-image structured payloads could then keep the existing JSON/text fallback instead of being sent to provider adapters as invalid image content.
There was a problem hiding this comment.
Addressed in 362f1394. Generic data/base64 payload detection now only promotes content to an image block when the MIME metadata or data URI is image/*; non-image resource payloads such as application/json/application/pdf fall back to JSON/text. Explicit type == "image" content now validates provided MIME types and raises MCPToolError for non-image MIME values. Added regression coverage for both fallback cases and the explicit-image error path. Verified with the MCP coercion tests (56 passed) and the focused PR suite (243 passed).
There was a problem hiding this comment.
Looks good to me, thanks for tightening this up. Small non-blocking note: this now means bare b64_json/base64 payloads without MIME metadata fall back to JSON/text instead of image auto-detection. That seems reasonable if MIME/data URI is the contract; if you want the shorthand to keep working, magic-byte detection could be added back for that case. Your call, not blocking from my side.
Signed-off-by: Nabin Mulepati <[email protected]>
johnnygreco
left a comment
There was a problem hiding this comment.
Looks good to me. Nabin addressed the MIME-detection issue with focused tests, and I left one non-blocking note in the thread about the bare b64_json/base64 behavior being his call.
eric-tramel
left a comment
There was a problem hiding this comment.
Detailed review against #607 / PRD request:
Requested changes
- P2:
packages/data-designer-engine/src/data_designer/engine/mcp/io.pyaccepts explicit image MIME metadata before validating thatdatais actually base64. Example:{"type": "image", "data": "not-base64!!!", "mimeType": "image/png"}becomesdata:image/png;base64,not-base64!!!. Since #607 makes MCP imagedataa base64 payload and asks for clear provider-boundary behavior, please validate after stripping any data URI and raiseMCPToolErrorbefore building theimage_urlblock. - P3:
packages/data-designer-engine/src/data_designer/engine/mcp/io.pylets malformedimage_urlblocks pass through unchanged whenimage_urlis not a dict orurlis not a string. That weakens the canonical internal block contract from #607. Either normalize common shorthand into{"image_url": {"url": ...}}or reject it asMCPToolErrorat MCP coercion time. - P3:
architecture/mcp.mdsays “explicit base64 image payloads” are preserved, but the implementation only promotes genericbase64/b64_jsonpayloads when image MIME metadata or an image data URI is present. That behavior is reasonable, but the architecture note should say so precisely.
Spec alignment
The core #607 request is otherwise implemented correctly: MCPToolResult.content and ChatMessage.as_tool() are widened, MCP ImageContent becomes ordered image_url data URI blocks, mixed text/image order is preserved, MCPFacade stays generic, OpenAI-compatible payloads pass through, and Anthropic lowers blocks at the adapter edge.
Verification
Focused PR suite passed locally: 243 passed. Ruff on changed files and git diff --check also passed.
Signed-off-by: Nabin Mulepati <[email protected]>
|
Addressed the follow-up review in
Verified:
|
|
Committed and pushed the follow-up fixes in Summary of what was addressed from review
Verified locally:
Re-requesting review now. |
📋 Summary
Preserves multimodal MCP tool results through the generation loop so image outputs from MCP tools can be passed back to VLM-capable providers instead of being flattened into text. This keeps MCP result handling generic while letting provider adapters lower canonical content blocks at the API boundary.
🔗 Related Issue
Closes #607
🔄 Changes
image_urldata URI blocks.🧪 Testing
PYTHONPATH=packages/data-designer-config/src:packages/data-designer-engine/src:packages/data-designer/src uv run --group dev pytest packages/data-designer-engine/tests/engine/mcp/test_mcp_io.py packages/data-designer-engine/tests/engine/mcp/test_mcp_facade.py packages/data-designer-engine/tests/engine/models/test_facade.py packages/data-designer-engine/tests/engine/models/test_model_utils.py packages/data-designer-engine/tests/engine/models/clients/test_anthropic_translation.py packages/data-designer-engine/tests/engine/models/clients/test_openai_compatible.py -q(240 passed)✅ Checklist