Add multimodal attachments to chat #2662

## Summary

Make chat multimodal so users can attach images, use the camera, and upload files directly in conversations.

## Problem

Chat is currently too text-centric for many real user workflows. Users need to send screenshots, photos, camera captures, PDFs, documents, and other files so the agent can inspect, summarize, answer questions, and take actions with that context.

Without multimodal chat, users have to describe visual/file context manually or leave OpenHuman to process documents elsewhere.

## Solution (optional)

Add multimodal input support to chat:
- Image upload from disk.
- Camera capture where supported.
- File upload for common document formats.
- Attachment previews in the composer and message history.
- Backend/core handling for file metadata, storage, memory ingestion, and model/tool routing.
- Clear errors for unsupported file types, oversized files, or unavailable vision/file models.

Start with desktop chat, then make the implementation reusable for mobile/web if applicable.
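
As a starting point for the error cases listed above, here is a minimal sketch of a pre-upload check. Everything in it is an assumption for illustration: the names (`AttachmentMeta`, `validateAttachment`), the 20 MiB limit, and the MIME allow-list are hypothetical and not taken from the OpenHuman codebase.

```typescript
// Hypothetical types and limits, for illustration only.
type AttachmentKind = "image" | "document";

interface AttachmentMeta {
  name: string;
  mimeType: string;
  sizeBytes: number;
}

type ValidationResult =
  | { ok: true; kind: AttachmentKind }
  | { ok: false; reason: "unsupported_type" | "too_large" | "no_vision_model" };

// Assumed size cap; the real limit is still to be decided.
const MAX_BYTES = 20 * 1024 * 1024;

// Assumed allow-list covering PDFs, text files, CSVs, and .docx documents.
const DOCUMENT_TYPES = new Set<string>([
  "application/pdf",
  "text/plain",
  "text/csv",
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
]);

function classify(mimeType: string): AttachmentKind | null {
  if (mimeType.startsWith("image/")) return "image";
  if (DOCUMENT_TYPES.has(mimeType)) return "document";
  return null;
}

// Returns either the attachment kind or a machine-readable failure reason
// that the composer can map to a user-facing error message.
function validateAttachment(
  file: AttachmentMeta,
  visionAvailable: boolean,
): ValidationResult {
  const kind = classify(file.mimeType);
  if (kind === null) return { ok: false, reason: "unsupported_type" };
  if (file.sizeBytes > MAX_BYTES) return { ok: false, reason: "too_large" };
  if (kind === "image" && !visionAvailable) {
    return { ok: false, reason: "no_vision_model" };
  }
  return { ok: true, kind };
}
```

Returning a tagged result instead of throwing keeps each failure state explicit, which should make the error paths easier to cover in unit tests.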

## Acceptance criteria

- [ ] **Image upload** — Users can attach images to chat and the agent can reason over them when a vision-capable model/tool is available.
- [ ] **Camera capture** — Users can capture an image from camera/webcam where supported and send it into chat.
- [ ] **File upload** — Users can upload common files such as PDFs, text files, docs, CSVs, and images.
- [ ] **Attachment preview** — Composer and message history show clear attachment previews, file names, sizes, and upload state.
- [ ] **Model/tool routing** — Attachments are routed to the correct vision, document, memory, or file-processing path.
- [ ] **Memory integration** — Uploaded files can be stored/ingested into memory when appropriate, with user-visible status.
- [ ] **Failure states** — Unsupported formats, oversized files, upload errors, and missing model capability are shown clearly.
- [ ] **Privacy controls** — The UI states, for each attachment, whether the file stays local, is sent to a cloud model, or is stored in memory.
- [ ] **Regression safety** — Unit/E2E coverage verifies image upload, file upload, camera capture fallback, and error states.
- [ ] **Diff coverage ≥ 80%** — the implementing PR meets the changed-lines coverage gate (Vitest + cargo-llvm-cov, enforced by [`.github/workflows/coverage.yml`](../../.github/workflows/coverage.yml)) when code changes are involved.
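
To make the model/tool routing and memory integration criteria more concrete, one possible shape for the routing decision is sketched below. The route names, the `RoutableAttachment` type, and the `ingestToMemory` opt-in flag are all hypothetical; the actual paths depend on how the backend/core ends up being structured.

```typescript
// Hypothetical route names mirroring the paths named in the criteria.
type Route = "vision" | "document" | "memory";

interface RoutableAttachment {
  mimeType: string;
  // Assumed user-controlled flag: whether this file should be ingested into memory.
  ingestToMemory: boolean;
}

// Picks one processing path from the MIME type and optionally adds
// memory ingestion. Assumes the attachment already passed type validation,
// so anything that is not an image is treated as a document.
function routeAttachment(attachment: RoutableAttachment): Route[] {
  const routes: Route[] = [];
  if (attachment.mimeType.startsWith("image/")) {
    routes.push("vision");
  } else {
    routes.push("document");
  }
  if (attachment.ingestToMemory) {
    routes.push("memory");
  }
  return routes;
}
```

Returning a list rather than a single route lets one attachment be both processed and ingested, and gives the UI something to display for the user-visible memory status.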

## Related

- Prior user request: support multimodal chat with images, camera, and files.

