Skip to content

Hilderin/llm-openai-vision-mcp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 

Repository files navigation

llm-openai-vision-mcp

Local MCP server for analyzing images using the OpenAI vision API. Designed for OpenCode agents to inspect screenshots, diagrams, UI renders, and other visual assets during development.

What it does

The server exposes an MCP tool so agents can submit an image file path along with acceptance criteria and get back a detailed vision-based analysis — all locally, no external services beyond the OpenAI API.

How it works

Image file on disk → base64 encode → OpenAI Chat Completions (vision model)
                                         ↓
Agent query + acceptance criteria → structured prompt → PASS/FAIL verdict per criterion

The image is read from disk, base64-encoded, and sent to OpenAI's vision model (default gpt-4o). The prompt includes acceptance criteria, an optional expected outcome, and optional known tolerances. The model returns a detailed analysis with PASS/FAIL for each criterion.

MCP tools

Tool What it does
analyze_image Analyze an image against acceptance criteria using a vision model

analyze_image parameters

Parameter Required Description
path Yes Absolute path to the image file on disk
acceptance_criteria Yes Description of what to look for and validate
expected No Expected outcome or reference description
known_tolerances No Known tolerances or acceptable deviations

Quick start

OPENAI_API_KEY=sk-... uv run --directory tools/vision-mcp python -m vision_mcp.server

OpenCode config example

Add to your opencode.json:

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "vision": {
      "type": "local",
      "command": ["uv", "run", "--directory", "tools/vision-mcp", "python", "-m", "vision_mcp.server"],
      "enabled": true,
      "environment": {
        "OPENAI_API_KEY": "${OPENAI_API_KEY}",
        "VISION_MODEL": "gpt-4o"
      }
    }
  }
}

Built with curiosity, Python and a lot of AI.

About

Give your AI eyes - an MCP server that lets agents visually inspect images via OpenAI's vision API, with pass/fail verdicts against your criteria.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages