[codex] add multimodal support across providers and unify media retries by clifton · Pull Request #37 · clifton/rstructor

clifton · 2026-02-13T00:31:30Z

Summary

This PR extends first-class multimodal structured extraction support to the other major providers in rstructor (OpenAI, Anthropic, Grok), which previously worked only through Gemini, and fixes retry parity on Gemini's media path.

Problem

Before this change, materialize_with_media only worked in the Gemini backend. For OpenAI, Anthropic, and Grok, media inputs were accepted by the public API but effectively ignored because provider request serialization only emitted text content.

Additionally, Gemini's media path bypassed the retry-with-conversation-history flow that text materialization uses, which made recovery after validation failures less reliable.

User Impact

- Users can now call materialize_with_media(...) consistently across OpenAI, Anthropic, Grok, and Gemini.
- Structured multimodal extraction works with both inline bytes and URI/URL media, mapped to each provider's native request format.
- Gemini media calls now get the same retry behavior as text calls, improving robustness for schema-constrained outputs.

Root Cause

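- Provider request models for OpenAI, Grok, and Anthropic declared message content as content: String, so requests could only carry text.
- The shared retry utility accepted only prompt text and always built the initial messages as ChatMessage::user(prompt), so callers could not seed it with media-bearing messages.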

Fix

Added a new shared retry utility entry point that accepts initial message history:
- generate_with_retry_with_initial_messages(...)
- Existing generate_with_retry_with_history(...) now delegates to it.
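The PR text does not include the new function bodies. As a rough illustration of the delegation described above, here is a std-only sketch in which the text entry point seeds a one-message history and hands off to the general loop, which appends the failed reply and the validation error before each retry. The `ChatMessage` shape, the `generate`/`validate` closures, and the retry wording are all assumptions; rstructor's real signatures will differ.

```rust
// Hypothetical sketch: function names follow the PR, but the signatures,
// `ChatMessage` shape, closures, and retry wording are assumptions.

#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
enum Role {
    User,
    Assistant,
}

#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
struct ChatMessage {
    role: Role,
    text: String,
}

impl ChatMessage {
    fn user(text: impl Into<String>) -> Self {
        ChatMessage { role: Role::User, text: text.into() }
    }

    fn assistant(text: impl Into<String>) -> Self {
        ChatMessage { role: Role::Assistant, text: text.into() }
    }
}

/// General loop: starts from whatever history the caller supplies
/// (a single text prompt, or a media-bearing first message).
fn generate_with_retry_with_initial_messages<T>(
    initial_messages: Vec<ChatMessage>,
    max_retries: usize,
    mut generate: impl FnMut(&[ChatMessage]) -> String,
    validate: impl Fn(&str) -> Result<T, String>,
) -> Result<T, String> {
    let mut messages = initial_messages;
    let mut last_err = String::new();
    // One initial attempt plus up to `max_retries` retries.
    for _ in 0..=max_retries {
        let raw = generate(&messages);
        match validate(&raw) {
            Ok(value) => return Ok(value),
            Err(err) => {
                // Feed the bad reply and the validation error back to the model.
                messages.push(ChatMessage::assistant(raw));
                messages.push(ChatMessage::user(format!(
                    "Validation failed: {err}. Return corrected output."
                )));
                last_err = err;
            }
        }
    }
    Err(last_err)
}

/// Text-only entry point: seed a one-message history and delegate.
fn generate_with_retry_with_history<T>(
    prompt: &str,
    max_retries: usize,
    generate: impl FnMut(&[ChatMessage]) -> String,
    validate: impl Fn(&str) -> Result<T, String>,
) -> Result<T, String> {
    generate_with_retry_with_initial_messages(
        vec![ChatMessage::user(prompt)],
        max_retries,
        generate,
        validate,
    )
}

fn main() {
    let mut calls = 0;
    let result = generate_with_retry_with_history(
        "Return a number",
        2,
        |history: &[ChatMessage]| {
            calls += 1;
            // Fake model: wrong on the first attempt, correct once it sees the error.
            if history.len() == 1 { "not a number".to_string() } else { "42".to_string() }
        },
        |raw: &str| raw.trim().parse::<u32>().map_err(|e| e.to_string()),
    );
    println!("{result:?} after {calls} calls"); // Ok(42) after 2 calls
}
```

Keeping the loop keyed on a message list rather than a prompt string is what lets a media-bearing first message flow through unchanged.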
Implemented provider-native multimodal content serialization:
- OpenAI/Grok: mixed content parts with type: text and type: image_url.
- Anthropic: content blocks with type: text and type: image + source (base64 or url).
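For reference, the two wire shapes can be compared side by side. The sketch below hand-builds the JSON (to stay dependency-free) for a text part plus an image part. The field names follow the public OpenAI Chat Completions and Anthropic Messages content formats, while `ContentPart`, `ImageSource`, `openai_part`, `anthropic_part`, and `content_array` are hypothetical names, not rstructor's types. Because OpenAI-compatible image parts take a single URL, inline base64 is wrapped in a data URL on that side.

```rust
// Hypothetical sketch: type and function names are illustrative, not rstructor's.
// JSON field names follow the public OpenAI Chat Completions and Anthropic
// Messages content formats.

enum ImageSource {
    Base64 { media_type: String, data: String },
    Url(String),
}

enum ContentPart {
    Text(String),
    Image(ImageSource),
}

/// Minimal JSON string literal escaping.
fn json_str(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// OpenAI / Grok: `text` and `image_url` parts; images are always a URL.
fn openai_part(part: &ContentPart) -> String {
    match part {
        ContentPart::Text(t) => format!(r#"{{"type":"text","text":{}}}"#, json_str(t)),
        ContentPart::Image(src) => {
            let url = match src {
                ImageSource::Base64 { media_type, data } => {
                    format!("data:{media_type};base64,{data}")
                }
                ImageSource::Url(u) => u.clone(),
            };
            format!(r#"{{"type":"image_url","image_url":{{"url":{}}}}}"#, json_str(&url))
        }
    }
}

/// Anthropic: `text` and `image` blocks with a `source` of type `base64` or `url`.
fn anthropic_part(part: &ContentPart) -> String {
    match part {
        ContentPart::Text(t) => format!(r#"{{"type":"text","text":{}}}"#, json_str(t)),
        ContentPart::Image(ImageSource::Base64 { media_type, data }) => format!(
            r#"{{"type":"image","source":{{"type":"base64","media_type":{},"data":{}}}}}"#,
            json_str(media_type),
            json_str(data)
        ),
        ContentPart::Image(ImageSource::Url(u)) => format!(
            r#"{{"type":"image","source":{{"type":"url","url":{}}}}}"#,
            json_str(u)
        ),
    }
}

/// A message's `content` becomes a JSON array of parts instead of a plain string.
fn content_array(parts: &[ContentPart], to_json: fn(&ContentPart) -> String) -> String {
    let items: Vec<String> = parts.iter().map(to_json).collect();
    format!("[{}]", items.join(","))
}

fn main() {
    let parts = vec![
        ContentPart::Text("Describe this chart".into()),
        ContentPart::Image(ImageSource::Url("https://example.com/chart.png".into())),
    ];
    println!("{}", content_array(&parts, openai_part));
    println!("{}", content_array(&parts, anthropic_part));
}
```

Production code would more typically serialize typed structs (for example with serde) than format strings; the string form here only makes the two shapes easy to compare.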
Updated each provider backend to override materialize_with_media(...) and invoke the retry flow with media-bearing initial messages.
Updated Gemini's materialize_with_media(...) to use the retry-with-history path (parity fix).
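Routing media calls through the shared retry path matters because the attachment has to survive every retry. A minimal sketch, assuming a simplified message type with an attached media list: the media entry point only builds the first user message and then reuses the same loop as text, so the image-bearing message stays at the head of the history on each attempt. `Media`, `ChatMessage`, `retry_from`, and this `materialize_with_media` signature are hypothetical, not rstructor's API.

```rust
// Hypothetical sketch: `Media`, `ChatMessage`, `retry_from`, and this
// `materialize_with_media` signature are illustrative, not rstructor's API.

#[allow(dead_code)]
#[derive(Clone, Debug)]
struct Media {
    mime_type: String,
    uri: String,
}

#[allow(dead_code)]
#[derive(Clone, Debug)]
struct ChatMessage {
    role: &'static str,
    text: String,
    media: Vec<Media>, // empty for text-only turns
}

impl ChatMessage {
    fn user(text: impl Into<String>, media: Vec<Media>) -> Self {
        ChatMessage { role: "user", text: text.into(), media }
    }

    fn assistant(text: impl Into<String>) -> Self {
        ChatMessage { role: "assistant", text: text.into(), media: Vec::new() }
    }
}

/// Stand-in for the shared retry loop: on failure, append the bad reply and
/// the error, then try again with the grown history.
fn retry_from<T>(
    mut messages: Vec<ChatMessage>,
    max_attempts: usize,
    mut call: impl FnMut(&[ChatMessage]) -> String,
    validate: impl Fn(&str) -> Result<T, String>,
) -> Result<T, String> {
    let mut last_err = String::from("no attempts made");
    for _ in 0..max_attempts {
        let raw = call(&messages);
        match validate(&raw) {
            Ok(value) => return Ok(value),
            Err(err) => {
                messages.push(ChatMessage::assistant(raw));
                messages.push(ChatMessage::user(format!("Validation error: {err}"), Vec::new()));
                last_err = err;
            }
        }
    }
    Err(last_err)
}

/// Media entry point: build the media-bearing first message, then reuse the
/// same retry path as text. That message stays at index 0, so every retry
/// still carries the attachment.
fn materialize_with_media<T>(
    prompt: &str,
    media: Vec<Media>,
    call: impl FnMut(&[ChatMessage]) -> String,
    validate: impl Fn(&str) -> Result<T, String>,
) -> Result<T, String> {
    retry_from(vec![ChatMessage::user(prompt, media)], 3, call, validate)
}

fn main() {
    let image = Media { mime_type: "image/png".into(), uri: "https://example.com/cats.png".into() };
    let mut media_per_attempt = Vec::new();
    let result = materialize_with_media(
        "How many cats are in the image?",
        vec![image],
        |history: &[ChatMessage]| {
            media_per_attempt.push(history[0].media.len());
            // Fake model: malformed first reply, valid once it sees the error.
            if history.len() == 1 { "two-ish".to_string() } else { "2".to_string() }
        },
        |raw: &str| raw.parse::<u8>().map_err(|e| e.to_string()),
    );
    println!("{result:?}, media per attempt: {media_per_attempt:?}"); // Ok(2), media per attempt: [1, 1]
}
```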
Added a shared media_to_url(...) helper for normalized data-URL/URI handling in compatible providers.
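A media_to_url-style helper can be sketched as follows, assuming media arrives either as inline bytes with a MIME type or as a URI. Inline bytes are encoded into an RFC 2397 `data:` URL and URIs pass through untouched. The `Media` enum and both function signatures are assumptions, not the PR's code, and base64 is hand-rolled only to keep the example dependency-free.

```rust
// Hypothetical sketch of a `media_to_url`-style helper. The `Media` enum and
// function signatures are assumptions; only the `data:` URL format (RFC 2397)
// is standard. Base64 is hand-rolled solely to keep this dependency-free.

enum Media {
    Inline { mime_type: String, bytes: Vec<u8> },
    Uri { uri: String },
}

const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard (RFC 4648) base64 with `=` padding.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group (missing bytes count as 0).
        let b0 = u32::from(chunk[0]);
        let b1 = u32::from(chunk.get(1).copied().unwrap_or(0));
        let b2 = u32::from(chunk.get(2).copied().unwrap_or(0));
        let n = (b0 << 16) | (b1 << 8) | b2;
        // Emit four 6-bit indices; pad with '=' where input bytes were missing.
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Inline bytes become a `data:<mime>;base64,<payload>` URL; URIs pass through.
fn media_to_url(media: &Media) -> String {
    match media {
        Media::Inline { mime_type, bytes } => {
            format!("data:{mime_type};base64,{}", base64_encode(bytes))
        }
        Media::Uri { uri } => uri.clone(),
    }
}

fn main() {
    // First four bytes of the PNG signature.
    let png = Media::Inline { mime_type: "image/png".into(), bytes: vec![0x89, 0x50, 0x4E, 0x47] };
    let remote = Media::Uri { uri: "https://example.com/photo.jpg".into() };
    println!("{}", media_to_url(&png)); // data:image/png;base64,iVBORw==
    println!("{}", media_to_url(&remote)); // https://example.com/photo.jpg
}
```

A production implementation would more typically rely on an existing base64 crate than on a hand-written encoder.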
Updated model enums to include the latest model IDs found in current provider docs and API model lists:
- OpenAI: gpt-5.2-chat-latest, gpt-5.2-codex
- Gemini: gemini-2.5-flash-image, gemini-2.0-flash-lite-001
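One way to keep model IDs and enum variants in sync is to generate both directions (enum to string and string to enum) from a single table. The macro below is a hypothetical sketch: rstructor's actual enum and variant names are not shown in the PR, and only the four ID strings come from the list above.

```rust
// Hypothetical sketch: enum and variant names are illustrative; only the
// model ID strings come from the PR description.

macro_rules! model_enum {
    ($name:ident { $($variant:ident => $id:literal),+ $(,)? }) => {
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        pub enum $name { $($variant),+ }

        impl $name {
            /// API model ID sent to the provider.
            pub fn as_str(self) -> &'static str {
                match self { $($name::$variant => $id),+ }
            }
        }

        impl std::str::FromStr for $name {
            type Err = String;

            /// Parse a model ID string back into the enum.
            fn from_str(s: &str) -> Result<Self, Self::Err> {
                match s {
                    $($id => Ok($name::$variant),)+
                    other => Err(format!("unknown {} id: {}", stringify!($name), other)),
                }
            }
        }
    };
}

model_enum!(OpenAIModel {
    Gpt52ChatLatest => "gpt-5.2-chat-latest",
    Gpt52Codex => "gpt-5.2-codex",
});

model_enum!(GeminiModel {
    Gemini25FlashImage => "gemini-2.5-flash-image",
    Gemini20FlashLite001 => "gemini-2.0-flash-lite-001",
});

fn main() {
    let id = OpenAIModel::Gpt52ChatLatest.as_str();
    let parsed: GeminiModel = "gemini-2.5-flash-image".parse().unwrap();
    println!("{id} / {parsed:?}"); // gpt-5.2-chat-latest / Gemini25FlashImage
}
```

Defining each ID once means adding a newly discovered model is a one-line change that updates both directions together.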
Expanded docs/examples for multimodal usage across providers.

Validation

cargo fmt --all
cargo clippy --all-targets --all-features -- -D warnings
cargo test --all-features --no-run
cargo test --all-features --test openai_multimodal_tests -- --nocapture
cargo test --all-features --test anthropic_multimodal_tests -- --nocapture
cargo test --all-features --test grok_multimodal_tests -- --nocapture
cargo test --all-features --test gemini_multimodal_tests -- --nocapture
cargo test --all-features --test model_string_test
cargo test --all-features test_generate_with_retry_with_initial_messages -- --nocapture

All checks passed locally.

clifton added 4 commits on February 12, 2026 at 18:31:
- 687dc50 add multimodal support across providers and unify media retries
- 7fa8287 refactor media serialization into shared backend module
- a09686a refactor media materialization retry flow into shared helper
- 01bf371 merge origin/main and resolve gemini multimodal test conflict

clifton marked this pull request as ready for review on February 13, 2026 at 15:11.

Follow-up commit:
- d4a074f refactor: unify OpenAI-compatible request shaping

clifton merged commit 4ea98ce into main on Feb 13, 2026
8 checks passed

clifton deleted the universal-multi-modal branch February 13, 2026 15:18

clifton merged 5 commits into main from universal-multi-modal.
