Valet is an AI Inference Gateway that orchestrates Ollama, vLLM, cloud providers, and vision services into a unified, production-ready platform.
"Keep using the inference engines you love. Valet just makes them work together."
| Component | Description | Port |
|---|---|---|
| valet-gateway | AI Inference Gateway - routes LLM requests | 9300 |
| valet-visual | Vision Services - detection & segmentation | 9400 |
```bash
# Clone the repository
git clone https://github.com/languageseed/valet-gateway.git
cd valet-gateway

# Start everything with GPU support
docker-compose up -d

# Check health
curl http://localhost:9300/health
curl http://localhost:9400/health

# Chat with an LLM (OpenAI-compatible API)
curl http://localhost:9300/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mistral-small3.2", "messages": [{"role": "user", "content": "Hello!"}]}'
```

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                                   CLIENTS                                   │
│         Applications • AI Agents • CLI Tools • Web Apps • Pipelines         │
└─────────────────────────────────────┬───────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                            VALET GATEWAY (:9300)                            │
│  • OpenAI-Compatible API             • Rate Limiting & Quotas               │
│  • Intelligent Routing               • Request Queuing (Priority)           │
│  • Health-Based Load Balancing       • Prometheus Metrics                   │
│  • Cloud Overflow                    • OpenTelemetry Tracing                │
└───────┬──────────────┬──────────────┬──────────────┬────────────────────────┘
        │              │              │              │
        ▼              ▼              ▼              ▼
   ┌─────────┐    ┌─────────┐   ┌───────────┐  ┌───────────────┐
   │ Ollama  │    │  vLLM   │   │   Cloud   │  │ Valet Visual  │
   │ (Local) │    │ (Local) │   │   APIs    │  │    (:9400)    │
   └─────────┘    └─────────┘   └───────────┘  └───────────────┘
```
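
The chat request from the curl example can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes the gateway is running on the default port 9300 and that the response follows the OpenAI chat completion schema (`choices[0].message.content`). The helper names `build_chat_request` and `chat` are illustrative and not part of Valet.

```python
import json
import urllib.request

# Default gateway address from the component table; change it if yours differs.
GATEWAY_URL = "http://localhost:9300"


def build_chat_request(prompt: str, model: str = "mistral-small3.2") -> urllib.request.Request:
    """Build the same POST request as the curl example (no network access here)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{GATEWAY_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, model: str = "mistral-small3.2", timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text.

    Assumption: the response body uses the OpenAI chat completion layout.
    """
    request = build_chat_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the stack running, `print(chat("Hello!"))` should print the model's reply. Because the API is OpenAI-compatible, an existing OpenAI client library pointed at `http://localhost:9300/v1` is likely to work as well, though that is not shown here.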
- OpenAI-Compatible API - Drop-in replacement for OpenAI
- Multi-Backend Routing - Ollama, vLLM, cloud providers
- Cloud Overflow - Automatic failover to Mistral AI, OpenRouter
- GPU Cluster - Load balance across multiple GPUs
- Queue System - Priority, vision, batch, chat queues
- Rate Limiting - Per-client limits with quotas
- Observability - Prometheus metrics, OpenTelemetry tracing
- Admin UI - SvelteKit dashboard
- Object Detection - YOLO11, DocLayout-YOLO, YOLO-World, GroundingDINO
- Segmentation - SAM, SAM2, SAM3 (Meta's latest)
- Dynamic Loading - Load/unload models on demand
- Service Profiles - Pre-configured model combinations
- VRAM Management - Optimize GPU memory usage
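
To make the "Health-Based Load Balancing" and "Cloud Overflow" items more concrete, here is a small sketch of the general idea: prefer a healthy local backend with spare capacity, and fall back to a cloud provider only when none is available. This is an illustration under stated assumptions, not Valet's actual routing logic. The `Backend` fields, the `capacity` limit, and the `pick_backend` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Backend:
    """Hypothetical view of one inference backend (not Valet's real data model)."""
    name: str        # illustrative label, e.g. "ollama", "vllm", "cloud"
    is_cloud: bool   # True for cloud providers, False for local engines
    healthy: bool    # result of the most recent health check
    in_flight: int   # requests currently being served
    capacity: int    # assumed maximum number of concurrent requests


def _available(backend: Backend) -> bool:
    # A backend can take work if it is healthy and below its concurrency limit.
    return backend.healthy and backend.in_flight < backend.capacity


def _load(backend: Backend) -> float:
    # Fraction of capacity in use; only called on available backends,
    # so capacity is at least 1 and the division is safe.
    return backend.in_flight / backend.capacity


def pick_backend(backends: list[Backend]) -> Optional[Backend]:
    """Pick the least-loaded available local backend, overflowing to cloud
    when no local backend is available. Returns None if nothing can serve."""
    local = [b for b in backends if not b.is_cloud and _available(b)]
    if local:
        return min(local, key=_load)
    cloud = [b for b in backends if b.is_cloud and _available(b)]
    return min(cloud, key=_load) if cloud else None
```

Checking local backends first keeps traffic on local GPUs whenever possible, so cloud usage in this sketch happens only as overflow. A real gateway would likely also weigh model availability, queue priority, and per-client quotas.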
Copy the example environment file and customize:
```bash
cp valet-gateway/env.example valet-gateway/.env
# Edit .env with your settings
```

For local development, run each component separately:

```bash
# Gateway development
cd valet-gateway
pip install -e ".[dev]"
python -m src.main

# Visual development
cd valet-visual
pip install -r requirements.txt
python app.py

# UI development
cd valet-gateway/ui
npm install && npm run dev
```

MIT License - see LICENSE

See CONTRIBUTING.md