feat: preserve multimodal MCP tool results #689

Merged
nabinchha merged 4 commits into main from codex/issue-607-mcp-multimodal
May 20, 2026
@nabinchha
Contributor

📋 Summary

Preserves multimodal MCP tool results through the generation loop so image outputs from MCP tools can be passed back to VLM-capable providers instead of being flattened into text. This keeps MCP result handling generic while letting provider adapters lower canonical content blocks at the API boundary.

🔗 Related Issue

Closes #607

🔄 Changes

  • Widen MCP tool result and tool-message content to allow ordered content block lists.
  • Replace string-only MCP result serialization with coercion that preserves text blocks and converts MCP image/base64 payloads into canonical image_url data URI blocks.
  • Add facade and provider adapter coverage for multimodal tool results, including OpenAI-compatible passthrough and Anthropic translation behavior.
  • Document that MCP image results require VLM-capable provider support in the MCP architecture notes.
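
As a rough illustration of the coercion described in these changes, the sketch below converts MCP-style image items into canonical image_url data URI blocks while keeping text order, and collapses text-only results back to a string. The names `build_image_url_block` and `coerce_items` are hypothetical, and the newline join for text-only results is an assumption; the actual helpers in io.py may differ.

```python
from __future__ import annotations

from typing import Any


def build_image_url_block(data: str, mime_type: str) -> dict[str, Any]:
    """Wrap base64 image data in an OpenAI-style image_url block (hypothetical helper)."""
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{data}"},
    }


def coerce_items(items: list[dict[str, Any]]) -> str | list[dict[str, Any]]:
    """Preserve ordered text/image blocks; collapse text-only results to a string."""
    blocks: list[dict[str, Any]] = []
    for item in items:
        if item.get("type") == "image":
            blocks.append(build_image_url_block(item["data"], item["mimeType"]))
        else:
            blocks.append({"type": "text", "text": str(item.get("text", ""))})
    if all(block["type"] == "text" for block in blocks):
        # Assumption: text-only results are joined with newlines.
        return "\n".join(block["text"] for block in blocks)
    return blocks
```

With this shape, a result containing only text keeps returning a plain string, so string-only consumers would be unaffected, while any image item switches the whole result to an ordered block list.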

🧪 Testing

  • Focused suite passes: PYTHONPATH=packages/data-designer-config/src:packages/data-designer-engine/src:packages/data-designer/src uv run --group dev pytest packages/data-designer-engine/tests/engine/mcp/test_mcp_io.py packages/data-designer-engine/tests/engine/mcp/test_mcp_facade.py packages/data-designer-engine/tests/engine/models/test_facade.py packages/data-designer-engine/tests/engine/models/test_model_utils.py packages/data-designer-engine/tests/engine/models/clients/test_anthropic_translation.py packages/data-designer-engine/tests/engine/models/clients/test_openai_compatible.py -q (240 passed)
  • Unit tests added/updated
  • E2E tests added/updated (N/A - MCP/provider adapter unit coverage only)

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

@nabinchha nabinchha requested a review from a team as a code owner May 20, 2026 17:13
@github-actions
Contributor

Review: PR #689 — feat: preserve multimodal MCP tool results

Summary

Widens MCP tool result handling so that image content from MCP tools survives the generation loop instead of being flattened to text. The core change is in packages/data-designer-engine/src/data_designer/engine/mcp/io.py: _serialize_tool_result_content (string-only) is replaced by _coerce_tool_result_content (str | list[dict[str, Any]]). MCP image content and explicit base64 payloads are normalized into canonical OpenAI-style image_url data-URI blocks; text-only paths still collapse to a single string for backwards compatibility. MCPToolResult.content and ChatMessage.as_tool are widened to accept the multimodal shape, and the existing OpenAI-compatible / Anthropic adapters take over translation at the API boundary. Architecture docs (architecture/mcp.md) note the VLM-provider requirement. Tests cover the coercion matrix, facade pass-through, OpenAI passthrough, Anthropic translation of image_url data URIs, and the model-utils widening.

Findings

Correctness

  • Backwards-compatible default path is preserved. Pure-text MCP results still return str (no change to MCPToolResult.content for the common case), so existing string consumers and serializers don't see new shapes. facade.py:288 is the only consumer that passes result.content straight into ChatMessage.as_tool, which is now widened — no leftover string-concatenation sites that would silently break on a list.
  • Image-URL pre-canonical blocks: in _coerce_image_url_block (io.py:528-538), when block["image_url"] is not a dict, the block is returned unchanged. That preserves whatever malformed shape arrived rather than failing at the MCP boundary. It's likely fine since the provider adapter will reject it loudly downstream, but it's inconsistent with the rest of the function which raises MCPToolError on bad payloads. Worth either validating here or leaving a comment that this is an intentional pass-through.
  • _has_base64_image_payload can over-match. A non-image dict that happens to contain both data and one of mimeType|mime_type|media_type (e.g., a future MCP resource content type with a non-image MIME type) would be classified as an image and fed into _build_image_url_block, which forces data:<mime>;base64,<data> regardless of whether <mime> is actually an image type. Today's MCP content types make this unlikely, but a cheap defense is to require mime_type.startswith("image/") (matching the existing guard in _extract_mime_type_from_data_uri). Same reasoning would tighten b64_json detection too.
  • MIME precedence inside _coerce_image_mime_type (io.py:551-562) prefers an explicit mime_type argument over the data-URI's declared MIME. If callers pass both and they disagree, the explicit one wins silently and the data URI gets rebuilt with a different MIME than originally encoded. Not a bug per se, but a brief comment ("explicit mime_type wins by design") would prevent confused future readers.
  • Behavior change: malformed images now raise. _serialize_tool_result_content previously degraded gracefully by stringifying anything; _coerce_tool_result_content raises MCPToolError for missing/invalid image data. This is captured by test_coerce_content_image_without_mime_type_fails_clearly. It is the right call — silent text-degradation of broken images is worse than failing — but the change in failure mode is worth calling out in the PR description / release notes since it affects users running with MCP servers that return non-conformant image content.
  • Recursion depth. _coerce_tool_result_content_item does not recurse into nested lists; a list element that is itself a list falls through to _build_text_block(str(...)). This matches the pre-PR behavior and the actual MCP content schema (flat lists), so no concern.
  • Type widening on MCPToolResult.content. Public-ish dataclass; consumers outside this repo (plugins) that pattern-match on isinstance(result.content, str) will continue to work for text but won't see image content. Acceptable for a feature addition. The PR description correctly flags VLM-provider support as a prerequisite.
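
For consumers affected by the type widening noted above, one possible way to handle both shapes is a small helper that flattens `str | list[dict]` content back to text. This is only a sketch: `content_to_text` and the `[image]` placeholder are hypothetical and not part of the PR.

```python
from __future__ import annotations

from typing import Any


def content_to_text(content: str | list[dict[str, Any]]) -> str:
    """Flatten widened tool-result content to text (hypothetical consumer-side helper)."""
    if isinstance(content, str):
        return content
    parts: list[str] = []
    for block in content:
        if block.get("type") == "text":
            parts.append(str(block.get("text", "")))
        elif block.get("type") == "image_url":
            # Assumption: a placeholder is good enough for text-only consumers.
            parts.append("[image]")
    return "\n".join(parts)
```

A plugin that previously checked `isinstance(result.content, str)` could call a helper like this instead, so image results are at least acknowledged rather than skipped.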

Style / conventions

  • Module-level pre-compiled regex _DATA_URI_MIME_TYPE_RE and SPDX/from __future__ import annotations already in place — fits project conventions.
  • Helper functions are appropriately small, type-annotated, and use modern syntax. Keeps with STYLEGUIDE.md.
  • Architecture doc update in architecture/mcp.md is concise and correctly placed under MCPFacade.
  • No relative imports introduced; lazy heavy imports unchanged.
  • One nit: _get_content_field_from_dump tries by_alias=True then {}. Pydantic v2's model_dump accepts by_alias natively, so the TypeError retry is dead code for pydantic v2 callers. It's harmless and lets non-pydantic dump methods slip through, so leaving it is fine — but the comment-less double-try is a tiny readability cost.

Tests

  • Coverage is good: text/string/dict/list, image dict (type=image), bare b64_json, media_type-named field, raw data URI strip, canonical image_url passthrough, raw-base64 image_url normalization, bare object via getattr, mixed text+image order preservation, and real mcp.types.TextContent / ImageContent round-trip.
  • test_coerce_content_list_with_none_preserves_existing_string_fallback documents the [None] -> "None" quirk explicitly — good regression anchor.
  • Facade-level test test_generate_preserves_multimodal_mcp_tool_results_between_turns exercises the loop end-to-end via stubs, which is the right level for this change.
  • Anthropic-translation test covers the new mixed-blocks-with-data-uri parametrization through the existing image_url -> {type:image, source:base64} translator. Good integration with existing translator.
  • One gap: no test asserts that is_error=True results with multimodal content are still propagated. Probably orthogonal but worth a one-line param.

Performance

  • For a text-only result, the new path adds one extra dict-allocation per content item (building a {"type":"text","text":...} block before joining and discarding). That's negligible for typical MCP payloads (tens of items at most), but worth noting if this becomes hot. The previous path appended directly to a string list. Not actionable.
  • Image format auto-detection (detect_image_format) is only invoked when MIME type is missing AND not derivable from the data URI, so cost is bounded to genuinely-malformed inputs.

Security

  • Base64 decoding uses decode_base64_image with validate=True (per image_helpers.py), which rejects malformed input cleanly.
  • Output is a data: URI, so no untrusted URL fetching is introduced. Matches the existing model that uses image_url blocks elsewhere.
  • No secret-leak risk — base64 image bytes come from MCP server responses, not env/credentials.
  • Worth a follow-up consideration (out of scope here): a bound on image size before propagating to a provider, since malicious/buggy MCP tools could return huge base64 blobs that blow up provider context.
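
The size bound suggested as a follow-up could be approximated without decoding the payload, since base64 encodes 3 bytes per 4 characters. The sketch below is hypothetical: `MAX_IMAGE_BYTES`, `estimate_decoded_size`, and `check_image_size` are illustrative names, the 5 MiB limit is arbitrary, and the estimate assumes padded base64 with no embedded whitespace.

```python
MAX_IMAGE_BYTES = 5 * 1024 * 1024  # hypothetical limit, not taken from the PR


def estimate_decoded_size(b64_payload: str) -> int:
    """Estimate decoded byte length from a padded base64 string without decoding it."""
    padding = len(b64_payload) - len(b64_payload.rstrip("="))
    return len(b64_payload) * 3 // 4 - padding


def check_image_size(data_uri: str, limit: int = MAX_IMAGE_BYTES) -> int:
    """Raise ValueError if the image in a data URI exceeds the limit; return its size."""
    payload = data_uri.split(",", 1)[-1]
    size = estimate_decoded_size(payload)
    if size > limit:
        raise ValueError(f"image payload is {size} bytes, limit is {limit}")
    return size
```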

Verdict

Looks good to merge. The design (canonical image_url blocks at MCP boundary, provider adapters lower at the API boundary) is consistent with the project's "errors normalize at boundaries" principle and reuses existing translation paths cleanly. Test coverage is solid for the coercion matrix and integrates with both adapter dialects.

Suggested before merge:

  1. Either validate or comment the silent passthrough in _coerce_image_url_block when image_url is not a dict.
  2. Consider tightening _has_base64_image_payload to require image/ MIME prefix (defends against future MCP content types containing data+mimeType).
  3. Note in the PR body / release notes that malformed image content now raises MCPToolError instead of being stringified — visible behavior change for MCP servers returning non-conformant payloads.

None of these are blocking.

@greptile-apps
Contributor

greptile-apps Bot commented May 20, 2026

Greptile Summary

This PR replaces the string-only MCP tool result serialization with a coercion pipeline that preserves image payloads as canonical image_url data URI blocks, allowing VLM-capable provider adapters to handle them at the API boundary rather than flattening them to text.

  • _coerce_tool_result_content (and its item-level helper) covers dicts, bare Pydantic objects, mixed-type lists, raw base64 URLs, existing data URIs, and every fallback path, with validation side effects that raise MCPToolError early on malformed inputs.
  • MCPToolResult.content and ChatMessage.as_tool are widened from str to str | list[dict[str, Any]]; OpenAI-compatible clients pass the list through unchanged while the Anthropic adapter translates image_url data URI blocks to its native image/base64 source format.
  • 240 unit tests cover the full coercion matrix including ordering preservation, MIME detection fallback, data URI stripping, and rejection of invalid shapes.
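
To make the Anthropic translation step concrete, here is a minimal sketch of lowering a canonical image_url data URI block into Anthropic's image block with a base64 source. The function name `lower_block_for_anthropic` and the regex are assumptions for illustration; the project's actual translator is not shown in this conversation.

```python
from __future__ import annotations

import re
from typing import Any

# Matches "data:<mime>;base64,<payload>" and captures both parts.
_DATA_URI_RE = re.compile(r"^data:(?P<mime>[\w.+-]+/[\w.+-]+);base64,(?P<data>.+)$", re.DOTALL)


def lower_block_for_anthropic(block: dict[str, Any]) -> dict[str, Any]:
    """Translate an image_url data URI block; pass other blocks through (hypothetical)."""
    if block.get("type") != "image_url":
        return block
    match = _DATA_URI_RE.match(block["image_url"]["url"])
    if match is None:
        raise ValueError("expected a base64 data URI in image_url.url")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": match["mime"],
            "data": match["data"],
        },
    }
```

An OpenAI-compatible adapter would skip this step entirely and forward the block list as-is.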

Confidence Score: 5/5

Safe to merge — the change is additive, all existing text-only paths are preserved, and the new image coercion paths are fully validated before any block reaches a provider adapter.

The coercion logic is well-isolated in private helpers, each branch is covered by a focused test, and the type widening is backward-compatible. No existing callers are broken: text-only results still return str, and the new list path is only taken when image blocks are present.

No files require special attention.

Important Files Changed

Filename Overview
packages/data-designer-engine/src/data_designer/engine/mcp/io.py Replaces string-only serialization with a multi-branch coercion pipeline that preserves MCP image payloads as canonical image_url data URI blocks; all edge cases (dict, bare object, Pydantic model, raw base64 URL) are covered by the new helper functions and well-tested.
packages/data-designer-engine/src/data_designer/engine/mcp/registry.py Widens MCPToolResult.content from str to str | list[dict[str, Any]].
packages/data-designer-engine/src/data_designer/engine/models/utils.py Widens ChatMessage.as_tool signature to accept str | list[dict[str, Any]].
packages/data-designer-engine/tests/engine/mcp/test_mcp_io.py Comprehensive coercion test suite covering text-only, image dict/object/Pydantic, mixed ordering, canonical passthrough, raw base64 normalization, data URI stripping, and rejection of malformed image blocks.
packages/data-designer-engine/tests/engine/mcp/test_mcp_facade.py Adds an end-to-end facade test verifying multimodal tool result content is preserved verbatim through process_completion_response.
packages/data-designer-engine/tests/engine/models/test_facade.py Adds generation-loop integration test confirming multimodal tool-result content survives the full turn boundary and arrives unchanged in the follow-up completion call.
packages/data-designer-engine/tests/engine/models/clients/test_anthropic_translation.py Adds a mixed-blocks-with-data-uri parametrize case ensuring the Anthropic adapter translates image_url data URI tool-result blocks to Anthropic's native image/base64 source format.
packages/data-designer-engine/tests/engine/models/clients/test_openai_compatible.py Adds a test verifying that OpenAI-compatible clients forward canonical multimodal tool-result content unchanged to the API payload.
packages/data-designer-engine/tests/engine/models/test_model_utils.py Adds a unit test confirming ChatMessage.as_tool round-trips multimodal content through both the object field and to_dict().

Sequence Diagram

sequenceDiagram
    participant MCPServer as MCP Server
    participant IOService as MCPIOService
    participant Coerce as _coerce_tool_result_content
    participant MCPResult as MCPToolResult
    participant Facade as MCPFacade
    participant Model as ModelFacade
    participant Adapter as Provider Adapter<br/>(OpenAI / Anthropic)

    MCPServer-->>IOService: call_tool → raw result
    IOService->>Coerce: coerce content
    alt text only
        Coerce-->>IOService: str
    else image or mixed
        Coerce-->>IOService: list[image_url / text blocks]
    end
    IOService-->>MCPResult: "MCPToolResult(content: str | list)"
    MCPResult-->>Facade: process_completion_response
    Facade-->>Model: "ChatMessage(role=tool, content=str|list)"
    Model->>Adapter: completion(messages)
    Note over Adapter: OpenAI: passthrough list unchanged<br/>Anthropic: translate image_url → image/base64
    Adapter-->>Model: ChatCompletionResponse

Reviews (4): Last reviewed commit: "Merge branch 'main' into codex/issue-607..."

Contributor

@johnnygreco johnnygreco left a comment

Nice work on this one, @nabinchha — the multimodal result path is much clearer after this change.

Summary

This PR preserves MCP image/tool-result content as ordered multimodal blocks through MCP execution and into provider adapters. The implementation matches the PR intent overall: MCP image outputs become canonical image_url data URI blocks, OpenAI-compatible adapters pass them through, and Anthropic translation gets provider-specific coverage.

Findings

I left one inline warning on mcp/io.py around the base64/MIME heuristic.

What Looks Good

The ordered text/image preservation is nicely covered from low-level MCP coercion through the ModelFacade tool loop. The Anthropic and OpenAI-compatible adapter tests pin the provider-boundary behavior well. The architecture note is also updated in the right place, which helps keep the MCP subsystem map honest.

Verdict

Needs changes — I’d address the non-image MIME coercion before merge so generic tool/resource payloads don’t get misrouted as image content.


This review was generated by an AI assistant.

return isinstance(item, dict) and item.get("type") == "image_url"


def _has_base64_image_payload(item: Any) -> bool:
Contributor

Could we tighten this image detection a bit? Right now _has_base64_image_payload() treats any item with data plus mimeType/mime_type/media_type as an image payload, and _coerce_image_mime_type() accepts any provided MIME string unchanged. That means a generic structured MCP/tool result such as {"type": "resource", "data": "...", "mimeType": "application/json"} or {"base64": "...", "media_type": "application/pdf"} can be turned into an image_url block with a non-image data URI.

Could we gate the generic base64/data detection on image/*, and reserve the clear MCPToolError path for explicit type == "image" content with a non-image MIME? Non-image structured payloads could then keep the existing JSON/text fallback instead of being sent to provider adapters as invalid image content.

Contributor Author

Addressed in 362f1394. Generic data/base64 payload detection now only promotes content to an image block when the MIME metadata or data URI is image/*; non-image resource payloads such as application/json/application/pdf fall back to JSON/text. Explicit type == "image" content now validates provided MIME types and raises MCPToolError for non-image MIME values. Added regression coverage for both fallback cases and the explicit-image error path. Verified with the MCP coercion tests (56 passed) and the focused PR suite (243 passed).
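
The gating described in this reply can be sketched as follows. This is a hedged illustration, not the code from 362f1394: `looks_like_image_payload` and the key tuples are hypothetical names, and the rule that explicit MIME metadata takes priority over a data URI prefix is an assumption.

```python
from __future__ import annotations

from typing import Any

_MIME_KEYS = ("mimeType", "mime_type", "media_type")
_DATA_KEYS = ("data", "base64", "b64_json")


def looks_like_image_payload(item: Any) -> bool:
    """Promote generic base64 payloads only when they are declared as image/*."""
    if not isinstance(item, dict):
        return False
    data = next((item[k] for k in _DATA_KEYS if isinstance(item.get(k), str)), None)
    if data is None:
        return False
    mime = next((item[k] for k in _MIME_KEYS if isinstance(item.get(k), str)), None)
    if mime is not None:
        # Assumption: explicit MIME metadata decides; non-image types fall back to text.
        return mime.startswith("image/")
    # Without MIME metadata, only an image data URI prefix counts.
    return data.startswith("data:image/")
```

Under this rule a payload such as `{"data": "...", "mimeType": "application/json"}` stays on the JSON/text fallback path, and a bare `b64_json` value with no MIME metadata is not promoted either.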

Contributor

Looks good to me, thanks for tightening this up. Small non-blocking note: this now means bare b64_json/base64 payloads without MIME metadata fall back to JSON/text instead of image auto-detection. That seems reasonable if MIME/data URI is the contract; if you want the shorthand to keep working, magic-byte detection could be added back for that case. Your call, not blocking from my side.

@nabinchha nabinchha requested a review from johnnygreco May 20, 2026 17:40
johnnygreco previously approved these changes May 20, 2026
Contributor

@johnnygreco johnnygreco left a comment

Looks good to me. Nabin addressed the MIME-detection issue with focused tests, and I left one non-blocking note in the thread about the bare b64_json/base64 behavior being his call.

Contributor

@eric-tramel eric-tramel left a comment

Detailed review against #607 / PRD request:

Requested changes

  • P2: packages/data-designer-engine/src/data_designer/engine/mcp/io.py accepts explicit image MIME metadata before validating that data is actually base64. Example: {"type": "image", "data": "not-base64!!!", "mimeType": "image/png"} becomes data:image/png;base64,not-base64!!!. Since #607 makes MCP image data a base64 payload and asks for clear provider-boundary behavior, please validate after stripping any data URI and raise MCPToolError before building the image_url block.
  • P3: packages/data-designer-engine/src/data_designer/engine/mcp/io.py lets malformed image_url blocks pass through unchanged when image_url is not a dict or url is not a string. That weakens the canonical internal block contract from #607. Either normalize common shorthand into {"image_url": {"url": ...}} or reject it as MCPToolError at MCP coercion time.
  • P3: architecture/mcp.md says “explicit base64 image payloads” are preserved, but the implementation only promotes generic base64 / b64_json payloads when image MIME metadata or an image data URI is present. That behavior is reasonable, but the architecture note should say so precisely.
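
The P2 request (validate the stripped payload before building the block) could look roughly like the sketch below. `MCPToolError` is redefined here as a local stand-in so the example is self-contained, and `strip_data_uri` and `validated_image_block` are hypothetical names; only the standard-library `base64.b64decode(..., validate=True)` call is real API.

```python
from __future__ import annotations

import base64
import binascii
from typing import Any


class MCPToolError(Exception):
    """Local stand-in for the project's MCPToolError (assumption for this sketch)."""


def strip_data_uri(value: str) -> str:
    """Return the payload after the comma if the value is a data URI."""
    if value.startswith("data:") and "," in value:
        return value.split(",", 1)[1]
    return value


def validated_image_block(data: str, mime_type: str) -> dict[str, Any]:
    """Build an image_url block only after strict base64 validation."""
    payload = strip_data_uri(data)
    if not payload:
        raise MCPToolError("image content has an empty base64 payload")
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise MCPToolError("image content is not valid base64") from exc
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{payload}"},
    }
```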

Spec alignment

The core #607 request is otherwise implemented correctly: MCPToolResult.content and ChatMessage.as_tool() are widened, MCP ImageContent becomes ordered image_url data URI blocks, mixed text/image order is preserved, MCPFacade stays generic, OpenAI-compatible payloads pass through, and Anthropic lowers blocks at the adapter edge.

Verification

Focused PR suite passed locally: 243 passed. Ruff on changed files and git diff --check also passed.

@nabinchha
Contributor Author

Addressed the follow-up review in df2a7b3a:

  • Explicit MCP image content now validates the stripped base64 payload before constructing an image_url data URI, so malformed values like not-base64!!! raise MCPToolError.
  • MCP image_url shorthand strings are normalized to canonical { "image_url": { "url": ... } } blocks, while malformed image_url blocks now raise MCPToolError during MCP coercion.
  • The MCP architecture note now specifies that generic base64 payloads are only promoted when they carry image/* MIME metadata or an image data URI prefix.
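
As an illustration of the image_url normalization in the second point, the sketch below accepts the string shorthand, passes canonical blocks through, and rejects everything else. `normalize_image_url_block` is a hypothetical name and `MCPToolError` is a local stand-in; the code in df2a7b3a may be structured differently.

```python
from __future__ import annotations

from typing import Any


class MCPToolError(Exception):
    """Local stand-in for the project's MCPToolError (assumption for this sketch)."""


def normalize_image_url_block(block: dict[str, Any]) -> dict[str, Any]:
    """Normalize image_url shorthand to the canonical block shape or raise."""
    image_url = block.get("image_url")
    if isinstance(image_url, str):
        # Shorthand: {"type": "image_url", "image_url": "<url>"}
        return {"type": "image_url", "image_url": {"url": image_url}}
    if isinstance(image_url, dict) and isinstance(image_url.get("url"), str):
        return block
    raise MCPToolError("malformed image_url block: expected a URL string or {'url': str}")
```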

Verified:

  • packages/data-designer-engine/tests/engine/mcp/test_mcp_io.py: 63 passed
  • Focused PR suite: 250 passed

@nabinchha
Contributor Author

Committed and pushed the follow-up fixes in df2a7b3a.

Summary of what was addressed from review 4331448240:

  • Explicit MCP image content now validates stripped base64 before constructing image_url data URI blocks.
  • image_url string shorthand is normalized to canonical block shape, while malformed image_url blocks raise MCPToolError during MCP coercion.
  • architecture/mcp.md now states that generic base64 payloads are promoted only with image/* MIME metadata or image data URI prefixes.

Verified locally:

  • packages/data-designer-engine/tests/engine/mcp/test_mcp_io.py: 63 passed
  • Focused PR suite: 250 passed

Re-requesting review now.

@nabinchha nabinchha merged commit a83968f into main May 20, 2026
50 checks passed
@nabinchha nabinchha deleted the codex/issue-607-mcp-multimodal branch May 20, 2026 20:49
Development

Successfully merging this pull request may close these issues.

Preserve multimodal MCP tool-call results through model provider adapters

3 participants