
Add multimodal embeddings support #351

Open
virgildotcodes wants to merge 1 commit into laravel:0.x from virgildotcodes:gemini-multimodal-embeddings


@virgildotcodes commented Apr 5, 2026

Closes #308

Summary

Laravel AI currently supports only text inputs for embeddings. Prism has added multimodal embeddings support, but the Laravel SDK does not expose it cleanly and has a few provider-specific gaps.

This PR adds support for the following multimodal embeddings input types:

  • images
  • audio
  • documents
  • video

It also rejects unsupported provider / model / input combinations early, preserves the original media source when mapping inputs to Prism, and avoids fetching remote media when generating embeddings cache keys.

Examples

Gemini multimodal embeddings

```php
use Laravel\Ai\Embeddings;
use Laravel\Ai\Files\Image;

// Embed a local image with Gemini's multimodal embedding model.
$response = Embeddings::for([
    Image::fromPath('/path/to/image.png'),
])->generate(provider: 'gemini', model: 'gemini-embedding-2-preview');
```

Voyage AI image embeddings

```php
use Laravel\Ai\Embeddings;
use Laravel\Ai\Files\Image;

// Embed a remote image with Voyage AI, requesting 1024-dimensional vectors.
$response = Embeddings::for([
    Image::fromUrl('https://example.com/image.png'),
])->dimensions(1024)->generate(
    provider: 'voyageai',
    model: 'voyage-multimodal-3',
);
```

Changes

  • widen embeddings inputs to accept text, images, audio, documents, and video
  • add explicit validation for unsupported provider / input combinations
  • resolve Gemini provider-backed files to file URIs before sending them to Prism
  • preserve remote, local, stored, and base64 media sources when converting embeddings inputs
  • avoid remote fetches when building cache keys for remote embeddings inputs
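The cache-key change can be illustrated with a standalone sketch. It is not the code in this PR: `MediaInput` and `embeddingsCacheKey()` are hypothetical names, and the sketch assumes each input records its source kind (remote, local, stored, or base64) alongside its URL, path, storage key, or payload. The key is derived from those descriptors plus provider, model, and dimensions, so computing it never issues an HTTP request.

```php
<?php

// Hypothetical input descriptor (not the SDK's class): records where the
// media bytes live instead of holding the bytes themselves.
final class MediaInput
{
    public function __construct(
        public readonly string $type,           // 'image', 'audio', 'document', or 'video'
        public readonly string $source,         // 'remote', 'local', 'stored', or 'base64'
        public readonly string $value,          // URL, file path, storage key, or base64 payload
        public readonly ?string $mimeType = null,
    ) {}
}

/**
 * Hypothetical helper: derive a deterministic cache key from the request
 * parameters and input descriptors. Remote media contributes only its URL,
 * so no network fetch is needed to compute the key.
 *
 * @param array<int, string|MediaInput> $inputs
 */
function embeddingsCacheKey(array $inputs, string $provider, string $model, ?int $dimensions = null): string
{
    $parts = [$provider, $model, $dimensions];

    foreach ($inputs as $input) {
        // Tag plain text so a text input never collides with a media descriptor.
        $parts[] = is_string($input)
            ? ['text', $input]
            : [$input->type, $input->source, $input->value, $input->mimeType];
    }

    // JSON keeps the structure unambiguous; SHA-256 keeps the key short.
    return 'embeddings:' . hash('sha256', json_encode($parts, JSON_THROW_ON_ERROR));
}
```

One trade-off of keying by descriptor: if the resource behind a URL (or, in this sketch, a local path) changes, the cached embedding is reused until the entry expires.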

Notes

  • Gemini multimodal embeddings require gemini-embedding-2-preview
  • Voyage AI currently supports text and image embeddings inputs
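The early validation can be sketched as a lookup table checked before any provider call. This is a hypothetical illustration, not the SDK's implementation: the table, the `'*'` model wildcard, and `assertEmbeddingInputsSupported()` are invented names. It encodes only the two notes above, assumes `gemini-embedding-2-preview` accepts every media type this PR adds, treats any other provider or model as text-only, and does not distinguish Voyage AI's text-only models from its multimodal ones.

```php
<?php

// Hypothetical capability table (not the SDK's) encoding the notes above.
// '*' is the fallback for models of a provider that are not listed by name.
const EMBEDDING_INPUT_SUPPORT = [
    'gemini' => [
        'gemini-embedding-2-preview' => ['text', 'image', 'audio', 'document', 'video'],
        '*' => ['text'],
    ],
    'voyageai' => [
        '*' => ['text', 'image'],
    ],
];

/**
 * Hypothetical guard: throw before any provider call if an input type is not
 * supported by the chosen provider/model pair. Unknown providers are treated
 * as text-only.
 *
 * @param list<string> $inputTypes e.g. ['text', 'image']
 */
function assertEmbeddingInputsSupported(string $provider, string $model, array $inputTypes): void
{
    $models = EMBEDDING_INPUT_SUPPORT[$provider] ?? [];
    $allowed = $models[$model] ?? $models['*'] ?? ['text'];

    // Collect every requested type that the provider/model pair does not list.
    $unsupported = array_values(array_diff(array_unique($inputTypes), $allowed));

    if ($unsupported !== []) {
        throw new InvalidArgumentException(sprintf(
            'Provider [%s] with model [%s] does not support embeddings inputs of type: %s.',
            $provider,
            $model,
            implode(', ', $unsupported),
        ));
    }
}
```

Failing fast this way surfaces an unsupported combination as a clear local exception rather than as a provider-specific API error.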


Development

Successfully merging this pull request may close these issues.

Feature Request: Support multimodal embeddings (like gemini-embeddings-002)
