Multimodal (Vision) support #175

@liwii

Some OpenAI LLMs now support image inputs, so it would be great if we could support evaluation with image inputs as well.

Ref: https://platform.openai.com/docs/guides/vision
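
For reference, a local image can be turned into the base64 data URI format mentioned below by encoding its bytes (a minimal sketch; the file path and MIME type here are placeholders, not part of the proposal):

```python
import base64

# Hypothetical local file; any JPEG/PNG works the same way.
with open("parrot.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

# Data URI in the format documented in the OpenAI vision guide.
image_url = f"data:image/jpeg;base64,{encoded}"
```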

The goal is to update the EvalClient interface and allow metrics that take image inputs, like the following:

```python
prompts = ["What is in the image?", "What is in the image?", ...]
generated_outputs = ["A green parrot flying away from...", "A huge robot working for...", ...]
image_urls = [
    # Link to the image or a base64-encoded image
    "https://...",
    "data:image/jpeg;base64,b656...",
    ...
]

results = langcheck.metrics.answer_relevance_with_input_image(
    generated_outputs=generated_outputs,
    prompts=prompts,
    image_urls=image_urls,
    eval_model=eval_client
)
```

There are probably more things to discuss before actually shipping this feature, but we can prototype it and run some metrics first.
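
As a rough idea of what the EvalClient side could look like: the OpenAI chat completions API accepts image inputs as `image_url` content parts, so an OpenAI-backed client could build the evaluation prompt along these lines (a minimal sketch; the function name, prompt wording, and model choice are assumptions, not the final design):

```python
from openai import OpenAI

client = OpenAI()

def evaluate_with_image(prompt: str, generated_output: str, image_url: str) -> str:
    """Hypothetical helper: ask a vision-capable model to judge answer relevance."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed; any vision-capable model would work
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": f"Given the image, is this answer relevant to the question?\n"
                                f"Question: {prompt}\nAnswer: {generated_output}",
                    },
                    # Accepts both https:// URLs and data:image/...;base64,... URIs
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    )
    return response.choices[0].message.content
```

A batched metric like `answer_relevance_with_input_image` could then loop over `prompts`, `generated_outputs`, and `image_urls` in parallel and aggregate the per-example judgments into scores.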
