Valet - AI Inference Gateway

Valet is an AI Inference Gateway that orchestrates Ollama, vLLM, cloud providers, and vision services into a unified, production-ready platform.

"Keep using the inference engines you love. Valet just makes them work together."

What's Included

Component      Description                                  Port
valet-gateway  AI Inference Gateway - routes LLM requests   9300
valet-visual   Vision Services - detection & segmentation   9400

Quick Start

# Clone the repository
git clone https://github.com/languageseed/valet-gateway.git
cd valet-gateway

# Start everything with GPU support
docker-compose up -d

# Check health
curl http://localhost:9300/health
curl http://localhost:9400/health

# Chat with an LLM (OpenAI-compatible API)
curl http://localhost:9300/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mistral-small3.2", "messages": [{"role": "user", "content": "Hello!"}]}'
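The same request can be made from Python. The following is a minimal sketch using only the standard library; the function names `build_chat_request` and `chat` are illustrative, and the response parsing assumes the gateway returns the usual OpenAI chat completion shape, which this README does not spell out.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:9300"  # gateway port from "What's Included"


def build_chat_request(prompt, model="mistral-small3.2", base_url=GATEWAY_URL):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the text of the first reply.

    Assumption: the response body follows the OpenAI layout
    (choices[0].message.content).
    """
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the stack running, `print(chat("Hello!"))` should print the model's reply.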

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                                   CLIENTS                                   │
│         Applications • AI Agents • CLI Tools • Web Apps • Pipelines         │
└─────────────────────────────────────┬───────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                            VALET GATEWAY (:9300)                            │
│  • OpenAI-Compatible API          • Rate Limiting & Quotas                  │
│  • Intelligent Routing             • Request Queuing (Priority)             │
│  • Health-Based Load Balancing     • Prometheus Metrics                     │
│  • Cloud Overflow                  • OpenTelemetry Tracing                  │
└───────┬──────────────┬──────────────┬──────────────┬────────────────────────┘
        │              │              │              │
        ▼              ▼              ▼              ▼
   ┌─────────┐   ┌─────────┐   ┌───────────┐   ┌───────────────┐
   │ Ollama  │   │  vLLM   │   │   Cloud   │   │ Valet Visual  │
   │ (Local) │   │ (Local) │   │   APIs    │   │    (:9400)    │
   └─────────┘   └─────────┘   └───────────┘   └───────────────┘
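To make the routing idea in the diagram concrete, here is a small sketch of health-based load balancing with cloud overflow. It is an illustration only, not Valet's actual routing code: the `Backend` fields, the `capacity` limit, and the "least loaded first" rule are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    healthy: bool
    in_flight: int  # requests currently being served
    capacity: int   # hypothetical per-backend concurrency limit
    is_cloud: bool = False


def pick_backend(backends):
    """Prefer the least-loaded healthy local backend; overflow to cloud.

    Returns None when no healthy backend is available.
    """
    healthy = [b for b in backends if b.healthy]
    local = [b for b in healthy if not b.is_cloud and b.in_flight < b.capacity]
    if local:
        # Compare relative load so small and large backends are treated fairly.
        return min(local, key=lambda b: b.in_flight / b.capacity)
    cloud = [b for b in healthy if b.is_cloud]
    return cloud[0] if cloud else None
```

A request is sent to a cloud provider only when every local backend is either unhealthy or at its limit, which mirrors the "Cloud Overflow" box in the diagram.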

Features

Valet Gateway

OpenAI-Compatible API - Drop-in replacement for the OpenAI API
Multi-Backend Routing - Routes requests to Ollama, vLLM, and cloud providers
Cloud Overflow - Automatic failover to Mistral AI or OpenRouter
GPU Cluster - Load balancing across multiple GPUs
Queue System - Priority, vision, batch, chat queues
Rate Limiting - Per-client limits with quotas
Observability - Prometheus metrics, OpenTelemetry tracing
Admin UI - SvelteKit dashboard
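Per-client rate limiting is commonly built on a token bucket. The sketch below shows that general technique; it is not taken from Valet's source, and the `rate`, `burst`, and `allow_request` names are hypothetical.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = now - self.updated
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


buckets = {}  # one bucket per client id


def allow_request(client_id, rate=1.0, burst=2, clock=time.monotonic):
    """Return True if this client may send a request right now."""
    if client_id not in buckets:
        buckets[client_id] = TokenBucket(rate, burst, clock)
    return buckets[client_id].allow()
```

Each client gets its own bucket, so one noisy client exhausts only its own allowance.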

Valet Visual

Object Detection - YOLO11, DocLayout-YOLO, YOLO-World, GroundingDINO
Segmentation - SAM, SAM2, SAM3 (Meta's latest)
Dynamic Loading - Load/unload models on demand
Service Profiles - Pre-configured model combinations
VRAM Management - Optimize GPU memory usage
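One way to combine dynamic loading with VRAM management is to keep models within a fixed memory budget and evict the least recently used one when a new model does not fit. The sketch below illustrates that idea under stated assumptions: the `ModelCache` class, the budget, and the model sizes are hypothetical and do not describe Valet Visual's actual implementation.

```python
from collections import OrderedDict


class ModelCache:
    """Track loaded models against a VRAM budget, evicting the LRU model first."""

    def __init__(self, vram_budget_mb):
        self.budget = vram_budget_mb
        self.loaded = OrderedDict()  # name -> size in MB, least recently used first

    def use(self, name, size_mb):
        """Mark a model as in use and return the names evicted to make room."""
        if size_mb > self.budget:
            raise ValueError(f"{name} needs {size_mb} MB, budget is {self.budget} MB")
        if name in self.loaded:
            self.loaded.move_to_end(name)  # already loaded: refresh recency only
            return []
        evicted = []
        while sum(self.loaded.values()) + size_mb > self.budget:
            victim, _ = self.loaded.popitem(last=False)
            evicted.append(victim)
        self.loaded[name] = size_mb
        return evicted
```

A real service would call its model load and unload routines where this sketch only updates the bookkeeping.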

Documentation

Configuration

Copy the example environment file and customize:

cp valet-gateway/env.example valet-gateway/.env
# Edit .env with your settings
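For scripts that need to read the resulting settings, a `.env` file can be parsed with a few lines of Python. This is a minimal sketch that handles only plain `KEY=VALUE` lines; the helper name `parse_env` is illustrative, and the keys actually defined in `env.example` are not listed in this README.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings
```

For example, `parse_env(open("valet-gateway/.env").read())` returns the settings as a dictionary of strings.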

Development

# Each block below starts from the repository root.

# Gateway development
cd valet-gateway
pip install -e ".[dev]"
python -m src.main

# Visual development
cd valet-visual
pip install -r requirements.txt
python app.py

# UI development
cd valet-gateway/ui
npm install && npm run dev

License

MIT License - see LICENSE

Contributing

See CONTRIBUTING.md
