Multimodal (Vision) support #175

@liwii

Some OpenAI LLMs now support image inputs, so it would be great if we could support evaluation with image inputs as well.

Ref: https://platform.openai.com/docs/guides/vision
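
For reference, a local image can be turned into the base64 data URI format mentioned below by encoding its bytes (a minimal sketch; the file path and MIME type here are placeholders, not part of the proposal):

```python
import base64

# Hypothetical local file; any JPEG/PNG works the same way.
with open("parrot.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

# Data URI in the format documented in the OpenAI vision guide.
image_url = f"data:image/jpeg;base64,{encoded}"
```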

The goal is to update the EvalClient interface and allow metrics that take image inputs, like the following:

```python
prompts = ["What is in the image?", "What is in the image?", ...]
generated_outputs = ["A green parrot flying away from...", "A huge robot working for...", ...]
image_urls = [
    # Link to the image or a base64-encoded image
    "https://...",
    "data:image/jpeg;base64,b656...",
    ...
]

results = langcheck.metrics.answer_relevance_with_input_image(
    generated_outputs=generated_outputs,
    prompts=prompts,
    image_urls=image_urls,
    eval_model=eval_client
)
```

There are probably more things to discuss before actually shipping this feature, but we can prototype it and run some metrics first.
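
As a rough idea of what the EvalClient side could look like: the OpenAI chat completions API accepts image inputs as `image_url` content parts, so an OpenAI-backed client could build the evaluation prompt along these lines (a minimal sketch; the function name, prompt wording, and model choice are assumptions, not the final design):

```python
from openai import OpenAI

client = OpenAI()

def evaluate_with_image(prompt: str, generated_output: str, image_url: str) -> str:
    """Hypothetical helper: ask a vision-capable model to judge answer relevance."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed; any vision-capable model would work
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": f"Given the image, is this answer relevant to the question?\n"
                                f"Question: {prompt}\nAnswer: {generated_output}",
                    },
                    # Accepts both https:// URLs and data:image/...;base64,... URIs
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    )
    return response.choices[0].message.content
```

A batched metric like `answer_relevance_with_input_image` could then loop over `prompts`, `generated_outputs`, and `image_urls` in parallel and aggregate the per-example judgments into scores.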
