Valet - AI Inference Gateway

Valet is an AI Inference Gateway that orchestrates Ollama, vLLM, cloud providers, and vision services into a unified, production-ready platform.

"Keep using the inference engines you love. Valet just makes them work together."

What's Included

Component       Description                                  Port
valet-gateway   AI Inference Gateway - routes LLM requests   9300
valet-visual    Vision Services - detection & segmentation   9400

Quick Start

# Clone the repository
git clone https://github.com/languageseed/valet-gateway.git
cd valet-gateway

# Start everything with GPU support
docker-compose up -d

# Check health
curl http://localhost:9300/health
curl http://localhost:9400/health

# Chat with an LLM (OpenAI-compatible API)
curl http://localhost:9300/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mistral-small3.2", "messages": [{"role": "user", "content": "Hello!"}]}'

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                                  CLIENTS                                     │
│         Applications • AI Agents • CLI Tools • Web Apps • Pipelines         │
└─────────────────────────────────────┬───────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                           VALET GATEWAY (:9300)                              │
│  • OpenAI-Compatible API          • Rate Limiting & Quotas                  │
│  • Intelligent Routing             • Request Queuing (Priority)             │
│  • Health-Based Load Balancing     • Prometheus Metrics                     │
│  • Cloud Overflow                  • OpenTelemetry Tracing                  │
└───────┬──────────────┬──────────────┬──────────────┬────────────────────────┘
        │              │              │              │
        ▼              ▼              ▼              ▼
   ┌─────────┐   ┌─────────┐   ┌───────────┐   ┌───────────────┐
   │ Ollama  │   │  vLLM   │   │   Cloud   │   │ Valet Visual  │
   │ (Local) │   │ (Local) │   │   APIs    │   │    (:9400)    │
   └─────────┘   └─────────┘   └───────────┘   └───────────────┘

Features

Valet Gateway

  • OpenAI-Compatible API - Drop-in replacement for the OpenAI API
  • Multi-Backend Routing - Ollama, vLLM, cloud providers
  • Cloud Overflow - Automatic failover to Mistral AI, OpenRouter
  • GPU Cluster - Load balance across multiple GPUs
  • Queue System - Priority, vision, batch, chat queues
  • Rate Limiting - Per-client request limits and quotas (see the retry sketch after this list)
  • Observability - Prometheus metrics, OpenTelemetry tracing
  • Admin UI - SvelteKit dashboard
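
How a client should react to the gateway's rate limiting is not specified here, so the following is a sketch under the assumption that an exceeded quota surfaces as HTTP 429, the convention for OpenAI-style APIs; verify against your deployment:

# Sketch: backing off when the gateway rate-limits a client.
# ASSUMPTION: quota violations return HTTP 429, as OpenAI-style APIs conventionally do.
import time
import requests

def chat(prompt: str, retries: int = 3) -> str:
    payload = {
        "model": "mistral-small3.2",
        "messages": [{"role": "user", "content": prompt}],
    }
    for attempt in range(retries):
        resp = requests.post(
            "http://localhost:9300/v1/chat/completions",
            json=payload,
            timeout=60,
        )
        if resp.status_code == 429:          # rate limited: exponential backoff
            time.sleep(2 ** attempt)
            continue
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]
    raise RuntimeError("still rate limited after retries")

print(chat("Hello!"))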

Valet Visual

  • Object Detection - YOLO11, DocLayout-YOLO, YOLO-World, GroundingDINO (see the request sketch after this list)
  • Segmentation - SAM, SAM2, SAM3 (Meta's latest)
  • Dynamic Loading - Load/unload models on demand
  • Service Profiles - Pre-configured model combinations
  • VRAM Management - Optimize GPU memory usage
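
The request below is illustrative only: the route, form fields, and response shape are assumptions made for the sketch, not the documented valet-visual API; consult the service's own docs for the real endpoints.

# HYPOTHETICAL sketch of an object-detection call against valet-visual.
# The /detect route and the "file"/"model" fields are invented for illustration;
# check the valet-visual documentation for the actual endpoint and schema.
import requests

with open("photo.jpg", "rb") as f:
    resp = requests.post(
        "http://localhost:9400/detect",   # assumed route
        files={"file": f},
        data={"model": "yolo11"},         # assumed parameter
        timeout=60,
    )
resp.raise_for_status()
print(resp.json())  # response shape depends on the service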

Documentation

Configuration

Copy the example environment file and customize:

cp valet-gateway/env.example valet-gateway/.env
# Edit .env with your settings
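
The authoritative key list lives in env.example; the fragment below is a hypothetical illustration, assuming the usual pattern of per-provider API keys for the cloud-overflow backends named above:

# HYPOTHETICAL .env fragment - variable names are illustrative only;
# copy the real keys from env.example.
MISTRAL_API_KEY=your-mistral-key
OPENROUTER_API_KEY=your-openrouter-key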

Development

# Gateway development
cd valet-gateway
pip install -e ".[dev]"
python -m src.main

# Visual development
cd valet-visual
pip install -r requirements.txt
python app.py

# UI development
cd valet-gateway/ui
npm install && npm run dev

License

MIT License - see LICENSE

Contributing

See CONTRIBUTING.md
