diff --git a/CLAUDE.md b/CLAUDE.md
index 7d859040..36916745 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -9,8 +9,8 @@ Full documentation available at: https://thushan.github.io/olla/
## Commands
```bash
-make ready # Run before commit (test-short + test-race + fmt + lint + align)
-make ready-tools # Check code with tools only (fmt + lint + align)
+make ready # Run before commit (test-short + test-race + fmt + vet + lint + align)
+make ready-tools # Check code with tools only (fmt + vet + lint + align)
make test # Run all tests
make test-race # Run tests with race detection
make test-stress # Run comprehensive stress tests
@@ -107,9 +107,17 @@ olla/
- `config.yaml` - Main configuration
- `internal/app/handlers/server_routes.go` - Route registration & API setup
- `internal/app/handlers/handler_proxy.go` - Request routing logic
+- `internal/app/handlers/handler_translation.go` - Translation handler with passthrough logic
- `internal/adapter/proxy/sherpa/service.go` - Sherpa proxy implementation
- `internal/adapter/proxy/olla/service.go` - Olla proxy implementation
- `internal/adapter/translator/` - API translation layer (OpenAI ↔ Provider formats)
+- `internal/adapter/translator/types.go` - PassthroughCapable interface and translator types
+- `internal/adapter/translator/anthropic/` - Anthropic translator implementation
+- `internal/adapter/stats/translator_collector.go` - Translator metrics collector
+- `internal/core/constants/translator.go` - TranslatorMode and FallbackReason constants
+- `internal/core/ports/stats.go` - StatsCollector interface with translator tracking
+- `internal/core/domain/profile_config.go` - AnthropicSupportConfig for backend profiles
+- `config/profiles/*.yaml` - Backend profiles with `anthropic_support` sections
- `internal/version/version.go` - Version information embedded at build time
- `/test/scripts/logic/test-model-routing.sh` - Test routing & headers
@@ -121,6 +129,7 @@ olla/
- `/internal/status/endpoints` - Endpoints status details
- `/internal/status/models` - Models status details
- `/internal/stats/models` - Model statistics
+- `/internal/stats/translators` - Translator statistics
- `/internal/process` - Process statistics
- `/version` - Version information
@@ -135,12 +144,20 @@ olla/
### Translator Endpoints
Dynamically registered based on configured translators (e.g., Anthropic Messages API)
+- `/olla/anthropic/v1/messages` - Anthropic Messages API (POST) - supports passthrough and translation modes
+- `/olla/anthropic/v1/models` - List models in Anthropic format (GET)
+- `/olla/anthropic/v1/messages/count_tokens` - Token count estimation (POST)
+
## Response Headers
- `X-Olla-Endpoint`: Backend name
- `X-Olla-Model`: Model used
- `X-Olla-Backend-Type`: ollama/openai/openai-compatible/lm-studio/vllm/sglang/llamacpp/lemonade
- `X-Olla-Request-ID`: Request ID
- `X-Olla-Response-Time`: Total processing time
+- `X-Olla-Mode`: Translator mode used; set to `passthrough` on Anthropic translator requests forwarded in native format, absent when translation mode is used
+- `X-Olla-Routing-Strategy`: Routing strategy used (when model routing is active)
+- `X-Olla-Routing-Decision`: Routing decision made (routed/fallback/rejected)
+- `X-Olla-Routing-Reason`: Human-readable reason for routing decision
## Testing
@@ -181,7 +198,9 @@ Always run `make ready` before committing changes.
- **Application Layer** (`internal/app`): HTTP handlers, middleware, and services
### Key Components
-- **Translator Layer**: Enables API format translation (e.g., OpenAI ↔ Anthropic)
+- **Translator Layer**: Enables API format translation (e.g., OpenAI ↔ Anthropic) with passthrough optimisation for backends with native support
+- **Passthrough Mode**: When the available backends natively support the Anthropic Messages API (vLLM, llama.cpp, LM Studio, Ollama), requests bypass translation entirely
+- **Translator Metrics**: Thread-safe per-translator statistics tracking passthrough/translation rates, fallback reasons, latency, and streaming breakdown (`internal/adapter/stats/translator_collector.go`)
- **Proxy Engines**: Choose Sherpa (simple) or Olla (high-performance)
- **Load Balancing**: Priority-based recommended for production
- **Version Management**: Build-time version injection via `internal/version`
@@ -191,4 +210,15 @@ Always run `make ready` before committing changes.
- Australian English for comments and documentation
- Comment on **why** rather than **what**
- Always run `make ready` before committing
-- Use `make help` to see all available commands
\ No newline at end of file
+- Use `make help` to see all available commands
+
+## SUB-AGENT DELEGATION
+
+CRITICAL: Always delegate tasks to the appropriate subagent. Do NOT perform work directly in the main context.
+
+- Code Review → Use the appropriate language subagent (e.g. Go Architect) or reviewer subagent
+- Code changes → Use the appropriate language subagent (e.g. Go Architect) or implementer subagent
+- Research/exploration → Use the explore subagent
+- Testing → Use the test subagent
+
+Only use the main context for orchestration and task decomposition.
\ No newline at end of file
diff --git a/assets/diagrams/features.excalidraw.png b/assets/diagrams/features.excalidraw.png
index 4522b1a5..bb65b870 100644
Binary files a/assets/diagrams/features.excalidraw.png and b/assets/diagrams/features.excalidraw.png differ
diff --git a/config/config.yaml b/config/config.yaml
index 753c0863..ef66cb59 100644
--- a/config/config.yaml
+++ b/config/config.yaml
@@ -130,12 +130,15 @@ model_registry:
translators:
#####
- # !Experimental! v0.0.20+
- # Anthropic translation is very early stages of development, so please let us know
- # if you come across issues or have feedback.
+ # Anthropic Messages API Translation (v0.0.20+)
+ # Enabled by default. Still actively being improved - please report any issues or feedback.
#####
anthropic:
- enabled: false
+ enabled: true
+ # passthrough_enabled only applies when enabled=true
+ # When true: Forwards requests directly to backends with native Anthropic support (optimal performance)
+ # When false: Always translates Anthropic ↔ OpenAI format (useful for debugging/testing)
+ passthrough_enabled: true
max_message_size: 10485760 # 10MB - Anthropic API limit
# !! WARNING: Do not enable inspector in production without reviewing data privacy !!
# Anthropic messages may contain sensitive user data.
diff --git a/config/profiles/llamacpp.yaml b/config/profiles/llamacpp.yaml
index 82895046..6dbd5fb7 100644
--- a/config/profiles/llamacpp.yaml
+++ b/config/profiles/llamacpp.yaml
@@ -16,6 +16,16 @@ routing:
# API compatibility
api:
openai_compatible: true
+
+ # Anthropic Messages API support (b4847+)
+ # llama.cpp is the ONLY backend that supports full token counting via /v1/messages/count_tokens
+ # This enables accurate prompt token estimation without making actual inference requests
+ anthropic_support:
+ enabled: true
+ messages_path: /v1/messages
+ token_count: true
+ min_version: "b4847"
+
paths:
# Model management (OpenAI-compatible)
- /v1/models # 4: list models (typically returns single model)
diff --git a/config/profiles/lmstudio.yaml b/config/profiles/lmstudio.yaml
index 06dbc1a9..92ca1971 100644
--- a/config/profiles/lmstudio.yaml
+++ b/config/profiles/lmstudio.yaml
@@ -14,6 +14,16 @@ routing:
# API compatibility
api:
openai_compatible: true
+
+ # Anthropic Messages API support (v0.4.1+)
+ # Added specifically for Claude Code integration, enabling native Anthropic API support
+ # without requiring translation middleware
+ anthropic_support:
+ enabled: true
+ messages_path: /v1/messages
+ token_count: false
+ min_version: "0.4.1"
+
paths:
- /v1/models # 0: health check & models
- /v1/chat/completions # 1: chat completions
diff --git a/config/profiles/ollama.yaml b/config/profiles/ollama.yaml
index a8c47cad..4d082b90 100644
--- a/config/profiles/ollama.yaml
+++ b/config/profiles/ollama.yaml
@@ -12,6 +12,19 @@ routing:
# API compatibility
api:
openai_compatible: true
+
+ # Anthropic Messages API support (v0.14.0+)
+ # UNSUPPORTED:
+ # - /v1/messages/count_tokens
+ # [11-01-2026]: https://docs.ollama.com/api/anthropic-compatibility#not-supported
+ anthropic_support:
+ enabled: true
+ messages_path: /v1/messages
+ token_count: false
+ min_version: "0.14.0"
+ limitations:
+ - token_counting_404
+
paths:
- / # 0: health check
- /api/generate # 1: text completion
diff --git a/config/profiles/vllm.yaml b/config/profiles/vllm.yaml
index 0f729ab5..59d0308b 100644
--- a/config/profiles/vllm.yaml
+++ b/config/profiles/vllm.yaml
@@ -13,6 +13,18 @@ routing:
# API compatibility
api:
openai_compatible: true
+
+ # Anthropic Messages API support (v0.11.1+)
+ # vLLM v0.11.1+ natively supports the Anthropic Messages API, allowing direct forwarding
+ # of Anthropic-format requests without translation overhead
+ anthropic_support:
+ enabled: true
+ messages_path: /v1/messages
+ token_count: false
+ min_version: "0.11.1"
+ limitations:
+ - no_token_counting
+
paths:
# Health and system endpoints
- /health # 0: health check (vLLM-specific endpoint)
diff --git a/docs/content/api-reference/anthropic.md b/docs/content/api-reference/anthropic.md
index 2cc76a1f..f8071827 100644
--- a/docs/content/api-reference/anthropic.md
+++ b/docs/content/api-reference/anthropic.md
@@ -15,10 +15,14 @@ The Anthropic translator accepts requests in Anthropic Messages API format at `/
**Key Features**:
- ✅ Full Anthropic Messages API compatibility
+- ✅ **Passthrough mode** for backends with native Anthropic support (vLLM, llama.cpp, LM Studio, Ollama)
+- ✅ **Translation mode** for OpenAI-compatible backends without native support
+- ✅ Automatic fallback from passthrough to translation when needed
- ✅ Streaming via Server-Sent Events (SSE)
- ✅ Tool use (function calling)
- ✅ Works with all OpenAI-compatible backends
- ✅ Zero backend changes required
+- ✅ Translator metrics for observability (passthrough/translation rates, latency, fallback tracking)
- ⚠️ **Vision Support**: Image content blocks accepted but not yet processed
- ⛔ **Async Support**: Asynchronous workflows are not supported
@@ -31,6 +35,37 @@ The Anthropic translator accepts requests in Anthropic Messages API format at `/
## How it Works
+Olla supports two modes for handling Anthropic API requests:
+
+### Passthrough Mode (Preferred)
+
+When a backend natively supports the Anthropic Messages API, requests are forwarded directly without any translation overhead.
+
+```mermaid
+sequenceDiagram
+ participant Client as Claude Code
+ participant Olla as Olla (Passthrough)
+ participant Backend as Anthropic-Compatible Backend
+
+ Client->>Olla: POST /olla/anthropic/v1/messages<br/>(Anthropic format)
+
+ Note over Olla: 1. Detect native Anthropic support
+ Note over Olla: 2. Forward request as-is
+
+ Olla->>Backend: POST /v1/messages<br/>(Anthropic format - unchanged)
+ Backend->>Olla: Response (Anthropic format)
+
+ Olla->>Client: Response (Anthropic format - unchanged)
+```
+
+**Compatible backends**: vLLM (v0.11.1+), llama.cpp (b4847+), LM Studio (v0.4.1+), Ollama (v0.14.0+)
+
+**Observability**: Responses include `X-Olla-Mode: passthrough` header.
+
+### Translation Mode (Fallback)
+
+When the available backends do not all support the native Anthropic format, requests are translated to OpenAI format and responses are translated back.
+
```mermaid
sequenceDiagram
participant Client as Claude Code
@@ -51,16 +86,9 @@ sequenceDiagram
Olla->>Client: Response (Anthropic format)
```
-**Translation Process**:
-
-1. Client sends Anthropic-formatted request
-2. Olla translates request to OpenAI format
-3. Request routed through standard Olla pipeline (load balancing, health checks)
-4. Backend processes request (unaware of original format)
-5. Olla translates OpenAI response back to Anthropic format
-6. Client receives Anthropic-formatted response
+**Mode Selection**: Olla automatically selects the best mode based on available backend capabilities. No client-side configuration is required.
-For detailed explanation, see [API Translation Concept](../concepts/api-translation.md).
+For detailed explanation of both modes, see [API Translation Concept](../concepts/api-translation.md).
## Endpoints Overview
@@ -600,6 +628,13 @@ All responses include standard Olla headers:
| `X-Olla-Model` | Actual model used | `llama4:latest` |
| `X-Olla-Backend-Type` | Backend type | `ollama` |
| `X-Olla-Response-Time` | Total processing time | `1.234s` |
+| `X-Olla-Mode` | Translator mode (only present for passthrough) | `passthrough` |
+
+!!! tip "Detecting Passthrough Mode"
+ When passthrough mode is active, the `X-Olla-Mode: passthrough` header is included in the response. When translation mode is used, this header is absent. This allows monitoring and debugging to distinguish between the two modes.
+
+!!! info "Translator Statistics"
+ For aggregate translator metrics including passthrough rates, success rates, fallback reasons, and latency data, query the [`GET /internal/stats/translators`](system.md#get-internalstatstranslators) endpoint.
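
As a rough sketch of that check from the client side, the Go program below sends one Messages request and inspects `X-Olla-Mode`. It assumes Olla is listening on the default `http://localhost:40114` and that no API key is required. The model name and the `usedPassthrough` helper are placeholders for illustration, not part of Olla.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// usedPassthrough reports whether Olla served a request in passthrough mode.
// Olla only sets X-Olla-Mode for passthrough; translation mode omits it.
func usedPassthrough(h http.Header) bool {
	return h.Get("X-Olla-Mode") == "passthrough"
}

func main() {
	// "llama4:latest" is a placeholder; use a model your backends actually serve.
	body := []byte(`{"model":"llama4:latest","max_tokens":64,"messages":[{"role":"user","content":"Hello"}]}`)
	resp, err := http.Post("http://localhost:40114/olla/anthropic/v1/messages",
		"application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	fmt.Println("endpoint:   ", resp.Header.Get("X-Olla-Endpoint"))
	fmt.Println("passthrough:", usedPassthrough(resp.Header))
}
```

Treating a missing header as translation mode mirrors the documented behaviour, so the helper needs no separate "unknown" state.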
## Authentication
@@ -681,6 +716,7 @@ Errors follow Anthropic API format:
- Stop sequences
- Temperature, top_p, top_k parameters
- Content blocks (text, tool_use, tool_result)
+- **Passthrough mode** for backends with native Anthropic support (zero translation overhead)
**Tool Choice Mapping**:
@@ -707,12 +743,13 @@ Errors follow Anthropic API format:
## Configuration
-Enable Anthropic translation in `config.yaml`:
+Anthropic translation is enabled by default. To customise, edit `config.yaml`:
```yaml
translators:
anthropic:
- enabled: true # Enable Anthropic API translator
+ enabled: true # Enabled by default
+ passthrough_enabled: true # Forward directly to backends with native Anthropic support (default)
max_message_size: 10485760 # Max request size (10MB)
# Standard Olla configuration
@@ -730,8 +767,43 @@ discovery:
| Option | Type | Default | Description |
|--------|------|---------|-------------|
-| `enabled` | boolean | `false` | Enable Anthropic translator |
+| `enabled` | boolean | `true` | Enable Anthropic translator (enabled by default) |
| `max_message_size` | integer | `10485760` | Max request size in bytes (10MB) |
+| `passthrough_enabled` | boolean | `true` | Passthrough optimisation mode. When `true` (default), requests are forwarded directly to backends with native Anthropic support for zero translation overhead. When `false`, all requests go through translation regardless of backend capabilities. Only applies when `enabled: true`. Individual backends must also declare `anthropic_support` in their profile. |
+
+### Passthrough Configuration
+
+Passthrough mode requires two things to be active:
+
+1. The `passthrough_enabled` field must be set to `true` in the translator configuration
+2. Backend profiles must declare native Anthropic support via `anthropic_support.enabled: true`
+
+```yaml
+translators:
+ anthropic:
+ enabled: true
+ passthrough_enabled: true # Required to enable passthrough mode
+```
+
+When `passthrough_enabled` is `true` (the default), Olla forwards requests directly to backends with native Anthropic support. Set `passthrough_enabled` to `false` to force all requests through the translation pipeline regardless of backend capabilities, which can be useful for debugging or testing the translation layer.
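
Because both flags default to `true`, a Go config loader cannot rely on the zero value of `bool` (which is `false`) to mean "unset". A minimal sketch of one way to honour these defaults uses pointer booleans. The struct and function names are hypothetical rather than Olla's actual config types, and JSON decoding stands in for YAML so the example needs only the standard library. It only answers whether configuration permits passthrough; backend profiles still decide whether it is used.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// anthropicTranslatorConfig is a hypothetical mirror of translators.anthropic.
// Pointer booleans distinguish "not set" (nil) from an explicit false.
type anthropicTranslatorConfig struct {
	Enabled            *bool `json:"enabled"`
	PassthroughEnabled *bool `json:"passthrough_enabled"`
	MaxMessageSize     int64 `json:"max_message_size"`
}

func boolOr(p *bool, def bool) bool {
	if p == nil {
		return def
	}
	return *p
}

// passthroughActive applies the documented defaults: both flags default to true,
// and passthrough_enabled has no effect when the translator is disabled.
func passthroughActive(c anthropicTranslatorConfig) bool {
	return boolOr(c.Enabled, true) && boolOr(c.PassthroughEnabled, true)
}

func parseConfig(raw string) (anthropicTranslatorConfig, error) {
	var c anthropicTranslatorConfig
	if err := json.Unmarshal([]byte(raw), &c); err != nil {
		return c, err
	}
	if c.MaxMessageSize == 0 {
		c.MaxMessageSize = 10485760 // documented 10MB default
	}
	return c, nil
}

func main() {
	for _, raw := range []string{`{}`, `{"passthrough_enabled": false}`, `{"enabled": false}`} {
		c, err := parseConfig(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-32s passthrough=%v max_message_size=%d\n", raw, passthroughActive(c), c.MaxMessageSize)
	}
}
```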
+
+**Backends with native Anthropic support**:
+
+| Backend | Profile | Min Version | Notes |
+|---------|---------|-------------|-------|
+| vLLM | `config/profiles/vllm.yaml` | v0.11.1+ | No token counting |
+| llama.cpp | `config/profiles/llamacpp.yaml` | b4847+ | Supports token counting |
+| LM Studio | `config/profiles/lmstudio.yaml` | v0.4.1+ | No token counting |
+| Ollama | `config/profiles/ollama.yaml` | v0.14.0+ | No token counting |
+
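+To disable passthrough for a specific backend, set `anthropic_support.enabled: false` in its profile. Because passthrough requires every available endpoint to declare support, this forces translation mode whenever that backend is among the candidate endpoints: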
+
+```yaml
+# config/profiles/vllm.yaml (custom override)
+api:
+ anthropic_support:
+ enabled: false # Force translation mode for this backend
+```
## Performance Considerations
diff --git a/docs/content/api-reference/overview.md b/docs/content/api-reference/overview.md
index 80bcea73..242083d5 100644
--- a/docs/content/api-reference/overview.md
+++ b/docs/content/api-reference/overview.md
@@ -19,10 +19,14 @@ If you ever need to remember the port, think - what's the port, 4 OLLA?!
## API Sections
### [System Endpoints](system.md)
-Internal endpoints for health monitoring and system status.
+Internal endpoints for health monitoring, system status, and statistics.
- `/internal/health` - Health check endpoint
- `/internal/status` - System status and statistics
+- `/internal/status/endpoints` - Endpoint status details
+- `/internal/status/models` - Model registry status
+- `/internal/stats/models` - Model usage statistics
+- `/internal/stats/translators` - Translator usage and performance statistics
- `/internal/process` - Process information
### [Unified Models API](models.md)
@@ -88,13 +92,16 @@ Anthropic-compatible API endpoints for Claude clients.
**Endpoints**:
- `POST /olla/anthropic/v1/messages` - Create a message (chat)
- `GET /olla/anthropic/v1/models` - List available models
+- `POST /olla/anthropic/v1/messages/count_tokens` - Estimate token count
**Features**:
- Full Anthropic Messages API v1 support
-- Automatic translation to OpenAI format
+- **Passthrough mode** for backends with native Anthropic support (vLLM, llama.cpp, LM Studio, Ollama)
+- Automatic fallback to translation mode when needed
- Streaming with Server-Sent Events
- Tool use (function calling)
- Vision support (multi-modal)
+- Translator metrics for observability
**Use With**:
- Claude Code
@@ -102,7 +109,7 @@ Anthropic-compatible API endpoints for Claude clients.
- Crush CLI
- Any Anthropic API client
-See [API Translation](../concepts/api-translation.md) for how translation works.
+See [API Translation](../concepts/api-translation.md) for how passthrough and translation modes work.
## Authentication
@@ -145,6 +152,7 @@ All responses include:
| `X-Olla-Routing-Strategy` | Routing strategy used (when model routing is active) |
| `X-Olla-Routing-Decision` | Routing decision made (routed/fallback/rejected) |
| `X-Olla-Routing-Reason` | Human-readable reason for routing decision |
+| `X-Olla-Mode` | Translator mode (`passthrough` when native format used; absent for translation mode) |
### Provider Metrics (Debug Logs)
diff --git a/docs/content/api-reference/system.md b/docs/content/api-reference/system.md
index 03935cb7..3c1bd1b2 100644
--- a/docs/content/api-reference/system.md
+++ b/docs/content/api-reference/system.md
@@ -12,6 +12,7 @@ Internal endpoints for health monitoring, system status, and process information
| GET | `/internal/status/endpoints` | Detailed endpoint status |
| GET | `/internal/status/models` | Model registry status |
| GET | `/internal/stats/models` | Model usage statistics |
+| GET | `/internal/stats/translators` | Translator usage and performance statistics |
| GET | `/internal/process` | Process information and metrics |
---
@@ -235,6 +236,110 @@ curl -X GET http://localhost:40114/internal/process
| `connections` | object | Connection pool stats |
| `runtime` | object | Runtime information |
+## GET /internal/stats/translators
+
+Translator usage and performance statistics. Provides per-translator metrics and an aggregate summary, useful for monitoring API translation behaviour, passthrough efficiency, and fallback reasons.
+
+### Request
+
+```bash
+curl -X GET http://localhost:40114/internal/stats/translators
+```
+
+### Response
+
+```json
+{
+ "timestamp": "2026-02-13T10:30:00Z",
+ "translators": [
+ {
+ "translator_name": "anthropic",
+ "total_requests": 1500,
+ "successful_requests": 1450,
+ "failed_requests": 50,
+ "success_rate": "96.7%",
+ "passthrough_rate": "80.0%",
+ "passthrough_requests": 1200,
+ "translation_requests": 300,
+ "streaming_requests": 800,
+ "non_streaming_requests": 700,
+ "fallback_no_compatible_endpoints": 5,
+ "fallback_translator_does_not_support_passthrough": 0,
+ "fallback_cannot_passthrough": 295,
+ "average_latency": "245ms"
+ }
+ ],
+ "summary": {
+ "total_translators": 1,
+ "active_translators": 1,
+ "total_requests": 1500,
+ "overall_success_rate": "96.7%",
+ "total_passthrough": 1200,
+ "total_translations": 300,
+ "overall_passthrough_rate": "80.0%",
+ "total_streaming": 800,
+ "total_non_streaming": 700
+ }
+}
+```
+
+### Response Fields
+
+#### Top-level
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `timestamp` | string | Current timestamp in RFC3339 format |
+| `translators` | array | Per-translator statistics, sorted by request count (most active first) |
+| `summary` | object | Aggregate statistics across all translators |
+
+#### Translator Entry (`translators[]`)
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `translator_name` | string | Translator identifier (e.g., "anthropic") |
+| `total_requests` | integer | Total requests processed by this translator |
+| `successful_requests` | integer | Requests that completed successfully |
+| `failed_requests` | integer | Requests that failed |
+| `success_rate` | string | Human-readable success percentage |
+| `passthrough_rate` | string | Human-readable passthrough percentage |
+| `passthrough_requests` | integer | Requests forwarded directly in native format |
+| `translation_requests` | integer | Requests that required format conversion |
+| `streaming_requests` | integer | Streaming (SSE) requests |
+| `non_streaming_requests` | integer | Non-streaming requests |
+| `fallback_no_compatible_endpoints` | integer | Fallbacks due to no healthy endpoints available |
+| `fallback_translator_does_not_support_passthrough` | integer | Fallbacks because the translator lacks passthrough capability |
+| `fallback_cannot_passthrough` | integer | Fallbacks because not every available endpoint declares native support |
+| `average_latency` | string | Human-readable average request latency |
+
+#### Summary
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `total_translators` | integer | Total number of registered translators |
+| `active_translators` | integer | Translators that have processed at least one request |
+| `total_requests` | integer | Total requests across all translators |
+| `overall_success_rate` | string | Aggregate success percentage |
+| `total_passthrough` | integer | Total passthrough requests across all translators |
+| `total_translations` | integer | Total translation requests across all translators |
+| `overall_passthrough_rate` | string | Aggregate passthrough percentage |
+| `total_streaming` | integer | Total streaming requests across all translators |
+| `total_non_streaming` | integer | Total non-streaming requests across all translators |
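
To consume this endpoint programmatically, a sketch like the following decodes a subset of the response into Go structs. `encoding/json` ignores unknown fields, so it tolerates new fields being added. The type and function names are hypothetical, and only the fields documented above are assumed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// translatorEntry mirrors a subset of each element of "translators".
type translatorEntry struct {
	Name                     string `json:"translator_name"`
	TotalRequests            int64  `json:"total_requests"`
	PassthroughRequests      int64  `json:"passthrough_requests"`
	TranslationRequests      int64  `json:"translation_requests"`
	PassthroughRate          string `json:"passthrough_rate"`
	FallbackNoEndpoints      int64  `json:"fallback_no_compatible_endpoints"`
	FallbackTranslatorNoPass int64  `json:"fallback_translator_does_not_support_passthrough"`
	FallbackCannotPass       int64  `json:"fallback_cannot_passthrough"`
	AverageLatency           string `json:"average_latency"`
}

// translatorStats mirrors the top-level response.
type translatorStats struct {
	Timestamp   time.Time         `json:"timestamp"` // RFC3339, parsed by time.Time
	Translators []translatorEntry `json:"translators"`
	Summary     struct {
		TotalRequests          int64  `json:"total_requests"`
		OverallPassthroughRate string `json:"overall_passthrough_rate"`
	} `json:"summary"`
}

func decodeStats(data []byte) (translatorStats, error) {
	var s translatorStats
	err := json.Unmarshal(data, &s)
	return s, err
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get("http://localhost:40114/internal/stats/translators")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	data, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	s, err := decodeStats(data)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, t := range s.Translators {
		fmt.Printf("%s: %d requests, passthrough %s, cannot_passthrough fallbacks %d, avg %s\n",
			t.Name, t.TotalRequests, t.PassthroughRate, t.FallbackCannotPass, t.AverageLatency)
	}
}
```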
+
+### Key Metrics for Monitoring
+
+**Passthrough Rate**: A high `passthrough_rate` indicates backends are being used optimally in their native format, avoiding translation overhead.
+
+**Fallback Reasons**: The three `fallback_*` fields help diagnose why passthrough is not being used:
+
+- `fallback_no_compatible_endpoints` -- No healthy endpoints available (operational issue, check health endpoint)
+- `fallback_cannot_passthrough` -- One or more available endpoints do not declare native support for the translator's format (configuration issue)
+- `fallback_translator_does_not_support_passthrough` -- Expected for translators without passthrough capability
+
+**Success Rate**: A declining `success_rate` may indicate backend issues or incompatible request formats.
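
The rate strings and fallback triage can be reproduced from the raw counters. This sketch assumes one-decimal rounding, which matches the sample response (1450 of 1500 gives `96.7%`). Olla's own formatting code may differ, and `percent` and `dominantFallback` are illustrative helpers, not Olla functions.

```go
package main

import "fmt"

// percent formats n/total to one decimal place, like "96.7%" in the response.
// Olla's exact rounding is an assumption here.
func percent(n, total int64) string {
	if total == 0 {
		return "0.0%"
	}
	return fmt.Sprintf("%.1f%%", float64(n)*100/float64(total))
}

// dominantFallback picks the largest fallback counter and pairs it with the
// triage hint from "Fallback Reasons" above. Ties favour the operational cause.
func dominantFallback(noEndpoints, notCapable, cannot int64) (reason, hint string) {
	switch {
	case noEndpoints == 0 && notCapable == 0 && cannot == 0:
		return "none", "no fallbacks recorded"
	case noEndpoints >= notCapable && noEndpoints >= cannot:
		return "no_compatible_endpoints", "operational: check endpoint health via /internal/status/endpoints"
	case cannot >= notCapable:
		return "cannot_passthrough", "configuration: check anthropic_support in backend profiles"
	default:
		return "translator_does_not_support_passthrough", "expected for translators without passthrough capability"
	}
}

func main() {
	fmt.Println("success rate:    ", percent(1450, 1500))
	fmt.Println("passthrough rate:", percent(1200, 1500))
	reason, hint := dominantFallback(5, 0, 295)
	fmt.Println(reason, "-", hint)
}
```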
+
+---
+
## Rate Limits
System endpoints have elevated rate limits:
diff --git a/docs/content/concepts/api-translation.md b/docs/content/concepts/api-translation.md
index f07bcec5..c8fb4e47 100644
--- a/docs/content/concepts/api-translation.md
+++ b/docs/content/concepts/api-translation.md
@@ -243,6 +243,106 @@ The translator performs semantic mapping for tool choice parameters:
---
+## Passthrough Mode
+
+### What is Passthrough?
+
+Passthrough mode is an optimisation that bypasses the translation pipeline entirely when a backend natively supports the incoming request format. For example, vLLM (v0.11.1+), llama.cpp (b4847+), LM Studio (v0.4.1+), and Ollama (v0.14.0+) all natively support the Anthropic Messages API. When Olla detects a compatible backend, it forwards the request directly without any Anthropic-to-OpenAI-and-back conversion.
+
+**Key Benefit**: Zero translation overhead -- requests are forwarded as-is, preserving the original wire format.
+
+### How It Works
+
+```mermaid
+flowchart TD
+ A[Client sends Anthropic request] --> B{Backend supports native Anthropic?}
+ B -->|Yes| C[Passthrough Mode]
+ B -->|No| D[Translation Mode]
+ C --> E[Forward request directly to backend]
+ E --> F[Backend processes in native Anthropic format]
+ F --> G[Response returned as-is]
+ D --> H[Translate Anthropic → OpenAI]
+ H --> I[Route to backend]
+ I --> J[Translate OpenAI → Anthropic]
+ J --> G
+```
+
+**Decision Flow**:
+
+1. Request arrives at `/olla/anthropic/v1/messages`
+2. Olla checks whether the translator implements `PassthroughCapable`
+3. If yes, checks whether `passthrough_enabled` is `true` in the translator config
+4. If yes, checks available endpoints against their profile configurations
+5. If **all** endpoints' profiles have `anthropic_support.enabled: true`, passthrough mode is used
+6. If any endpoint does not support passthrough, falls back to translation mode automatically
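
The decision flow above can be sketched in Go as a simplified model. Several parts are assumptions. `passthroughCapable` stands in for Olla's `PassthroughCapable` interface, whose real method signatures may differ. Endpoints carry only their profile's `anthropic_support.enabled` flag. The placement of the "no healthy endpoints" check and the choice not to record `passthrough_enabled: false` as a fallback are guesses, not confirmed behaviour.

```go
package main

import "fmt"

type mode string

const (
	modePassthrough mode = "passthrough"
	modeTranslation mode = "translation"
)

// Fallback reasons as named in "Fallback Behaviour"; "" means no fallback.
const (
	reasonNone                    = ""
	reasonNoCompatibleEndpoints   = "no_compatible_endpoints"
	reasonTranslatorNoPassthrough = "translator_does_not_support_passthrough"
	reasonCannotPassthrough       = "cannot_passthrough"
)

// endpoint is a hypothetical view of a healthy endpoint and its profile flag.
type endpoint struct {
	name             string
	anthropicSupport bool // api.anthropic_support.enabled in the backend profile
}

// passthroughCapable is a simplified stand-in for Olla's PassthroughCapable.
type passthroughCapable interface {
	CanPassthrough(endpoints []endpoint) bool
}

type anthropicTranslator struct{}

// CanPassthrough requires every endpoint to declare native support (step 5).
func (anthropicTranslator) CanPassthrough(endpoints []endpoint) bool {
	for _, ep := range endpoints {
		if !ep.anthropicSupport {
			return false
		}
	}
	return true
}

// legacyTranslator does not implement passthroughCapable.
type legacyTranslator struct{}

// selectMode walks steps 2 to 6 in order and reports the mode plus any fallback reason.
func selectMode(translator any, passthroughEnabled bool, healthy []endpoint) (mode, string) {
	pc, ok := translator.(passthroughCapable)
	if !ok {
		return modeTranslation, reasonTranslatorNoPassthrough
	}
	if !passthroughEnabled {
		// Assumption: an explicit opt-out is not counted as a fallback.
		return modeTranslation, reasonNone
	}
	if len(healthy) == 0 {
		return modeTranslation, reasonNoCompatibleEndpoints
	}
	if !pc.CanPassthrough(healthy) {
		return modeTranslation, reasonCannotPassthrough
	}
	return modePassthrough, reasonNone
}

func main() {
	vllm := endpoint{name: "vllm", anthropicSupport: true}
	openai := endpoint{name: "openai-compatible", anthropicSupport: false}
	fmt.Println(selectMode(anthropicTranslator{}, true, []endpoint{vllm}))
	fmt.Println(selectMode(anthropicTranslator{}, true, []endpoint{vllm, openai}))
	fmt.Println(selectMode(legacyTranslator{}, true, []endpoint{vllm}))
}
```

The all-or-nothing check in `CanPassthrough` is what makes a single non-native endpoint in the pool push every request to translation.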
+
+### Passthrough vs Translation Comparison
+
+| Aspect | Passthrough | Translation |
+|--------|-------------|-------------|
+| **Overhead** | Near zero | ~1-5ms per request |
+| **Backend requirement** | Native Anthropic support | OpenAI-compatible |
+| **Request modification** | None (forwarded as-is) | Full format conversion |
+| **Response modification** | None | Full format conversion |
+| **Streaming** | Native SSE format | SSE format conversion |
+| **Response header** | `X-Olla-Mode: passthrough` | No `X-Olla-Mode` header |
+| **Feature support** | Backend-dependent | Translation-dependent |
+
+### Compatible Backends
+
+Backends that support passthrough (native Anthropic Messages API):
+
+| Backend | Min Version | Token Counting | Profile Config |
+|---------|-------------|----------------|----------------|
+| vLLM | v0.11.1+ | No | `config/profiles/vllm.yaml` |
+| llama.cpp | b4847+ | Yes | `config/profiles/llamacpp.yaml` |
+| LM Studio | v0.4.1+ | No | `config/profiles/lmstudio.yaml` |
+| Ollama | v0.14.0+ | No | `config/profiles/ollama.yaml` |
+
+### Backend Profile Configuration
+
+Passthrough is configured in each backend's profile YAML under the `api.anthropic_support` section:
+
+```yaml
+# Example: config/profiles/vllm.yaml
+api:
+ anthropic_support:
+ enabled: true # Enable native Anthropic support
+ messages_path: /v1/messages # Backend path for Messages API
+ token_count: false # Whether token counting is supported
+ min_version: "0.11.1" # Minimum backend version required
+ limitations: # Optional known limitations
+ - no_token_counting
+```
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `enabled` | boolean | Whether the backend supports native Anthropic format |
+| `messages_path` | string | Backend path for the Messages API (e.g., `/v1/messages`) |
+| `token_count` | boolean | Whether the backend supports `/v1/messages/count_tokens` |
+| `min_version` | string | Minimum backend version with Anthropic support |
+| `limitations` | list | Known limitations (e.g., `no_token_counting`, `token_counting_404`) |
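
A hypothetical Go mirror of these fields, with a helper deciding whether a backend's `count_tokens` endpoint can be trusted, might look like the following. Olla's real type is `AnthropicSupportConfig` in `internal/core/domain/profile_config.go` and may differ. JSON stands in for YAML to keep the sketch dependency-free, and the rule in `canForwardCountTokens` is an illustrative assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// anthropicSupport is a hypothetical mirror of api.anthropic_support.
type anthropicSupport struct {
	Enabled      bool     `json:"enabled"`
	MessagesPath string   `json:"messages_path"`
	TokenCount   bool     `json:"token_count"`
	MinVersion   string   `json:"min_version"`
	Limitations  []string `json:"limitations"`
}

func parseSupport(raw string) (anthropicSupport, error) {
	var a anthropicSupport
	err := json.Unmarshal([]byte(raw), &a)
	return a, err
}

func (a anthropicSupport) hasLimitation(name string) bool {
	for _, l := range a.Limitations {
		if l == name {
			return true
		}
	}
	return false
}

// canForwardCountTokens only trusts /v1/messages/count_tokens when the profile
// opts in and lists no limitation that rules it out.
func (a anthropicSupport) canForwardCountTokens() bool {
	return a.Enabled && a.TokenCount &&
		!a.hasLimitation("no_token_counting") && !a.hasLimitation("token_counting_404")
}

func main() {
	profiles := map[string]string{
		"llamacpp": `{"enabled":true,"messages_path":"/v1/messages","token_count":true,"min_version":"b4847"}`,
		"ollama":   `{"enabled":true,"messages_path":"/v1/messages","token_count":false,"min_version":"0.14.0","limitations":["token_counting_404"]}`,
	}
	for name, raw := range profiles {
		a, err := parseSupport(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: passthrough=%v count_tokens=%v (min %s)\n",
			name, a.Enabled, a.canForwardCountTokens(), a.MinVersion)
	}
}
```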
+
+### Fallback Behaviour
+
+When passthrough is not possible, Olla falls back to translation mode automatically. The fallback reason is tracked in translator metrics:
+
+| Fallback Reason | Description |
+|----------------|-------------|
+| `no_compatible_endpoints` | No healthy endpoints available |
+| `translator_does_not_support_passthrough` | Translator lacks `PassthroughCapable` interface |
+| `cannot_passthrough` | Not every available endpoint declares native Anthropic support |
+
+### Observability
+
+Passthrough mode is observable through:
+
+- **Response header**: `X-Olla-Mode: passthrough` (only set when passthrough is used)
+- **Translator stats endpoint**: `GET /internal/stats/translators` exposes passthrough vs translation request counts, success rates, fallback reason breakdowns, and latency data per translator (see [System Endpoints](../api-reference/system.md#get-internalstatstranslators))
+- **Debug logs**: Log entries indicate which mode was selected and why
+
+---
+
## Architecture
### Translation Layer Position
@@ -258,8 +358,8 @@ The translator performs semantic mapping for tool choice parameters:
│ │ │ │ │
│ ▼ ▼ ▼ │
│ /olla/openai/* /olla/anthropic/* Load Balancer │
-│ (pass-through) (translate) Health Checks │
-│ Connection Pool │
+│ (pass-through) (passthrough or Health Checks │
+│ translate) Connection Pool │
└─────────────────────────────────────────────────────────────┘
```
@@ -268,6 +368,7 @@ The translator performs semantic mapping for tool choice parameters:
- Translation is **optional** and **transparent**
- Native endpoints bypass translation entirely
- Translated endpoints use the same backend infrastructure
+- **Passthrough mode** bypasses translation when backends natively support the format
- No impact on native endpoint performance
### Where Translation Happens
@@ -278,27 +379,51 @@ Translation occurs in the **adapter layer** of Olla:
internal/
├── adapter/
│ ├── translator/
+│ │ ├── types.go # PassthroughCapable interface, ProfileLookup
│ │ └── anthropic/
│ │ ├── request.go # Request translation
│ │ ├── response.go # Response translation
│ │ ├── streaming.go # SSE translation
│ │ ├── tools.go # Tool/function translation
+│ │ ├── passthrough.go # Passthrough support (CanPassthrough, PreparePassthrough)
│ │ └── translator.go # Main translator
+│ ├── stats/
+│ │ └── translator_collector.go # Translator metrics (passthrough/translation rates)
│ └── proxy/
│ ├── sherpa/ # Uses translator
│ └── olla/ # Uses translator
+├── core/
+│ ├── constants/
+│ │ └── translator.go # TranslatorMode, FallbackReason constants
+│ ├── domain/
+│ │ └── profile_config.go # AnthropicSupportConfig
+│ └── ports/
+│ └── stats.go # TranslatorRequestEvent, TranslatorStats
+├── app/
+│ └── handlers/
+│ └── handler_translation.go # Passthrough/translation decision logic
```
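
The tree above places translator metrics in `translator_collector.go`. As an illustration of thread-safe per-translator counting (not Olla's actual code), a single mutex guarding a map of counters is enough. `requestEvent` is a simplified, hypothetical stand-in for `TranslatorRequestEvent`, and the real collector may use atomics, track success counts, or use a different layout.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// requestEvent is a hypothetical, simplified stand-in for TranslatorRequestEvent.
type requestEvent struct {
	Translator     string
	Passthrough    bool
	Streaming      bool
	FallbackReason string // empty when passthrough was used
	Latency        time.Duration
}

// counters holds a subset of the fields reported per translator.
type counters struct {
	Total        int64
	Passthrough  int64
	Translation  int64
	Streaming    int64
	Fallbacks    map[string]int64
	TotalLatency time.Duration
}

// collector serialises all updates through one mutex: simple and race-free.
type collector struct {
	mu    sync.Mutex
	stats map[string]*counters
}

func newCollector() *collector {
	return &collector{stats: make(map[string]*counters)}
}

func (c *collector) record(ev requestEvent) {
	c.mu.Lock()
	defer c.mu.Unlock()
	s, ok := c.stats[ev.Translator]
	if !ok {
		s = &counters{Fallbacks: make(map[string]int64)}
		c.stats[ev.Translator] = s
	}
	s.Total++
	if ev.Passthrough {
		s.Passthrough++
	} else {
		s.Translation++
	}
	if ev.Streaming {
		s.Streaming++
	}
	if ev.FallbackReason != "" {
		s.Fallbacks[ev.FallbackReason]++
	}
	s.TotalLatency += ev.Latency
}

// snapshot returns a deep copy so callers never share the live map.
func (c *collector) snapshot(name string) counters {
	c.mu.Lock()
	defer c.mu.Unlock()
	s, ok := c.stats[name]
	if !ok {
		return counters{}
	}
	cp := *s
	cp.Fallbacks = make(map[string]int64, len(s.Fallbacks))
	for k, v := range s.Fallbacks {
		cp.Fallbacks[k] = v
	}
	return cp
}

// simulate records n events concurrently; every fifth one falls back to translation.
func simulate(c *collector, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			ev := requestEvent{Translator: "anthropic", Passthrough: i%5 != 0,
				Streaming: i%2 == 0, Latency: 10 * time.Millisecond}
			if !ev.Passthrough {
				ev.FallbackReason = "cannot_passthrough"
			}
			c.record(ev)
		}(i)
	}
	wg.Wait()
}

func main() {
	c := newCollector()
	simulate(c, 100)
	s := c.snapshot("anthropic")
	fmt.Printf("total=%d passthrough=%d translation=%d fallbacks=%v avg=%s\n",
		s.Total, s.Passthrough, s.Translation, s.Fallbacks, s.TotalLatency/time.Duration(s.Total))
}
```

A single mutex keeps the sketch easy to verify with `go test -race`. A busier implementation might shard locks or use atomic counters instead.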
-**Process**:
+**Process (Translation Mode)**:
1. Request arrives at `/olla/anthropic/v1/messages`
-2. Handler invokes Anthropic translator
-3. Translator converts request to OpenAI format
+2. Handler checks if passthrough is possible (see below)
+3. If not, translator converts request to OpenAI format
4. Proxy routes to backend (standard Olla routing)
5. Backend responds in OpenAI format
6. Translator converts response to Anthropic format
7. Response returned to client
+**Process (Passthrough Mode)**:
+
+1. Request arrives at `/olla/anthropic/v1/messages`
+2. Handler checks if translator implements `PassthroughCapable`
+3. `CanPassthrough()` checks endpoint profiles for `anthropic_support.enabled: true`
+4. If compatible, `PreparePassthrough()` extracts model name and target path
+5. Request forwarded directly to backend without any format conversion
+6. Backend responds in native Anthropic format
+7. Response returned to client as-is
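
Steps 4 and 5 can be sketched as follows. Only the `model` field is decoded for routing, and the original request bytes are what get forwarded. `preparePassthrough` and `passthroughTarget` are hypothetical names, and the real `PreparePassthrough()` signature may differ. Defaulting to `/v1/messages` when `messages_path` is empty is an assumption based on the bundled profiles.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// passthroughTarget is a hypothetical result: the model used for routing and
// the backend path taken from the profile's messages_path.
type passthroughTarget struct {
	Model string
	Path  string
}

// preparePassthrough reads only the "model" field; the body itself is not re-encoded.
func preparePassthrough(body []byte, messagesPath string) (passthroughTarget, error) {
	var req struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return passthroughTarget{}, fmt.Errorf("invalid Anthropic request: %w", err)
	}
	if req.Model == "" {
		return passthroughTarget{}, errors.New("missing model field")
	}
	if messagesPath == "" {
		messagesPath = "/v1/messages" // path used by all bundled profiles
	}
	return passthroughTarget{Model: req.Model, Path: messagesPath}, nil
}

func main() {
	// "qwen3:8b" is a placeholder model name.
	body := []byte(`{"model":"qwen3:8b","max_tokens":128,"messages":[{"role":"user","content":"Hi"}]}`)
	target, err := preparePassthrough(body, "/v1/messages")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The original bytes, not a re-encoded copy, are what would be sent upstream.
	fmt.Printf("route model %q to %s with %d unchanged bytes\n", target.Model, target.Path, len(body))
}
```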
+
### Memory Optimisation
The translator uses buffer pooling to minimise memory allocations:
@@ -363,6 +488,7 @@ Easy to add new translations:
- Request translation: 0.5-2ms per request
- Response translation: 1-5ms per request
- Streaming: ~0.1-0.5ms per chunk
+- **Passthrough mode**: Near-zero overhead (no translation)
**Memory Usage**:
@@ -370,7 +496,7 @@ Easy to add new translations:
- Proportional to content size for vision models
- Buffer pool reduces allocation overhead
-**Recommendation**: Use native endpoints when translation isn't needed for maximum performance.
+**Recommendation**: Where possible, serve Anthropic traffic from backends with native Anthropic support (vLLM, llama.cpp, LM Studio, Ollama) so that passthrough mode applies automatically with zero translation overhead. When translation isn't needed at all, use native endpoints for maximum performance.
### Feature Parity
@@ -406,18 +532,24 @@ Streaming translation requires:
## Configuration
-### Enable Translation
+### Translation Configuration
+
+Anthropic translation is enabled by default. To customise:
```yaml
translators:
anthropic:
- enabled: true # Enable Anthropic translator
+ enabled: true # Enabled by default
max_message_size: 10485760 # Max request size (10MB)
+ passthrough_enabled: true # Enable passthrough for backends with native Anthropic support
```
+!!! note "`passthrough_enabled` Optimisation Flag"
+ The `passthrough_enabled` field controls whether passthrough mode is active. When `true` (the default), Olla forwards requests directly to backends whose profiles declare `anthropic_support.enabled: true`, with zero translation overhead. Set to `false` to force all requests through the translation pipeline regardless of backend capabilities. This only applies when `enabled: true` -- when the translator is disabled, `passthrough_enabled` has no effect.
+
### Disable Translation
-To use native endpoints only:
+To disable translation and use native endpoints only:
```yaml
translators:
@@ -425,7 +557,7 @@ translators:
enabled: false
```
-Anthropic endpoints will return 404 when disabled.
+Anthropic endpoints will return 404 when disabled. By default, translation is enabled.
### Performance Tuning
@@ -593,6 +725,24 @@ discovery:
3. Verify `max_message_size` isn't too restrictive
4. Check logs for specific validation errors
+### Passthrough Not Activating
+
+**Issue**: Requests are being translated instead of using passthrough mode
+
+**Possible Causes**:
+
+- `passthrough_enabled` is `false` in the translator config
+- Backend profile does not declare `api.anthropic_support.enabled: true`
+- Not all healthy endpoints support native Anthropic format
+
+**Solutions**:
+
+1. Verify `passthrough_enabled: true` in your translator config (this is the default)
+2. Check the backend profile for `anthropic_support.enabled: true`
+3. Check the `X-Olla-Mode` response header to confirm mode selection
+4. Enable debug logging to see detailed mode selection reasoning
+5. See [Anthropic Translation Setup](../integrations/api-translation/anthropic.md#passthrough-not-working) for detailed troubleshooting
+
### Streaming Issues
**Issue**: Streaming responses are incomplete or malformed
diff --git a/docs/content/concepts/health-checking.md b/docs/content/concepts/health-checking.md
index 33a0888e..3c552af2 100644
--- a/docs/content/concepts/health-checking.md
+++ b/docs/content/concepts/health-checking.md
@@ -561,6 +561,7 @@ Olla provides health and status information through its internal endpoints:
- `/internal/status` - Detailed status information
- `/internal/status/endpoints` - Endpoint health details
- `/internal/stats/models` - Model usage statistics
+- `/internal/stats/translators` - Translator usage and performance statistics
These can be integrated with external monitoring systems to track:
@@ -568,6 +569,7 @@ These can be integrated with external monitoring systems to track:
2. Health check latency trends
3. Failure rates by endpoint
4. Circuit breaker state changes
+5. Translator passthrough efficiency and fallback reasons
## Next Steps
diff --git a/docs/content/concepts/profile-system.md b/docs/content/concepts/profile-system.md
index be1c4536..53539d6f 100644
--- a/docs/content/concepts/profile-system.md
+++ b/docs/content/concepts/profile-system.md
@@ -274,6 +274,42 @@ models:
- First match sets context size
- No match uses platform default
+### Anthropic Support {#anthropic-support}
+
+The `api.anthropic_support` section declares native Anthropic Messages API support for a backend. When present and enabled, the translator layer can skip the Anthropic-to-OpenAI conversion and forward requests directly (passthrough mode).
+
+```yaml
+api:
+ anthropic_support:
+ enabled: true # Backend natively supports Anthropic Messages API
+ messages_path: /v1/messages # Path for the Messages API on the backend
+ token_count: false # Whether /v1/messages/count_tokens is supported
+ min_version: "0.11.1" # Minimum backend version required
+ limitations: # Known limitations
+ - no_token_counting
+```
+
+**Configuration Fields**:
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `enabled` | boolean | Yes | Whether the backend supports native Anthropic format |
+| `messages_path` | string | Yes | Backend path for the Anthropic Messages API (e.g., `/v1/messages`) |
+| `token_count` | boolean | No | Whether the backend supports the token counting endpoint |
+| `min_version` | string | No | Minimum backend version with Anthropic support |
+| `limitations` | list | No | Known limitations (e.g., `no_token_counting`, `token_counting_404`) |
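
The following Python sketch is a rough pre-flight check for a custom profile's `anthropic_support` section. It reuses the field names and types from the table above. Two parts are assumptions rather than Olla's actual validation rules: flagging unknown fields, and requiring `messages_path` only when `enabled` is `true`. The function name is hypothetical. It takes an already-parsed mapping, so you can feed it the output of any YAML loader.

```python
from typing import Any, Dict, List

# Field names and types mirror the table above.
FIELD_TYPES = {
    "enabled": bool,
    "messages_path": str,
    "token_count": bool,
    "min_version": str,
    "limitations": list,
}


def validate_anthropic_support(section: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in an api.anthropic_support mapping."""
    problems: List[str] = []
    if "enabled" not in section:
        problems.append("missing required field: enabled")
    # Assumption: messages_path only matters when native support is switched on.
    if section.get("enabled") is True and "messages_path" not in section:
        problems.append("missing required field: messages_path")
    for field, value in section.items():
        expected = FIELD_TYPES.get(field)
        if expected is None:
            problems.append(f"unknown field: {field}")
        elif not isinstance(value, expected):
            problems.append(f"{field} should be {expected.__name__}")
    path = section.get("messages_path")
    if isinstance(path, str) and not path.startswith("/"):
        problems.append("messages_path should start with '/'")
    return problems
```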
+
+**Backends with native Anthropic support**:
+
+| Backend | Min Version | Token Counting | Notes |
+|---------|-------------|----------------|-------|
+| vLLM | v0.11.1+ | No | High-performance inference |
+| llama.cpp | b4847+ | Yes | Only backend with full token counting |
+| LM Studio | v0.4.1+ | No | Desktop inference |
+| Ollama | v0.14.0+ | No | Popular local inference |
+
+When a client sends a request to `/olla/anthropic/v1/messages`, `passthrough_enabled` is `true` (the default), and every healthy endpoint's profile has `anthropic_support.enabled: true`, Olla bypasses format translation and forwards the request directly. This is referred to as **passthrough mode** and has near-zero overhead. See [API Translation](api-translation.md#passthrough-mode) for details.
+
## Complete Profile Structure
```yaml
@@ -292,6 +328,14 @@ routing:
# API configuration
api:
openai_compatible: true # Supports OpenAI API format
+
+ # Native Anthropic Messages API support (optional)
+ anthropic_support:
+ enabled: true # Enable passthrough for Anthropic requests
+ messages_path: /v1/messages # Backend path for Messages API
+ token_count: false # Token counting support
+ min_version: "0.14.0" # Minimum version required
+
paths: # Allowed API paths (allowlist)
- /
- /api/generate
diff --git a/docs/content/configuration/examples.md b/docs/content/configuration/examples.md
index 8ff2bb96..1546c012 100644
--- a/docs/content/configuration/examples.md
+++ b/docs/content/configuration/examples.md
@@ -973,6 +973,79 @@ discovery:
- "*-72b*"
```
+## Anthropic Translation with Passthrough
+
+Configuration for Anthropic API translation with passthrough mode for backends with native Anthropic support:
+
+```yaml
+server:
+ host: "localhost"
+ port: 40114
+
+proxy:
+ engine: "olla"
+ profile: "streaming"
+ load_balancer: "priority"
+
+# Enable Anthropic translator with passthrough optimisation
+translators:
+ anthropic:
+ enabled: true
+ # passthrough_enabled only applies when enabled=true
+ # When true: Forwards requests directly to backends with native Anthropic support (optimal performance)
+ # When false: Always translates Anthropic ↔ OpenAI format (useful for debugging/testing)
+ passthrough_enabled: true
+ max_message_size: 10485760 # 10MB
+
+discovery:
+ type: "static"
+ static:
+ endpoints:
+ # Ollama v0.14.0+ supports native Anthropic API (passthrough eligible)
+ - url: "http://localhost:11434"
+ name: "local-ollama"
+ type: "ollama"
+ priority: 100
+
+ # vLLM v0.11.1+ supports native Anthropic API (passthrough eligible)
+ - url: "http://vllm-server:8000"
+ name: "vllm-prod"
+ type: "vllm"
+ priority: 80
+
+logging:
+ level: "info"
+ format: "json"
+```
+
+### Anthropic Translation Only (No Passthrough)
+
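+Force all Anthropic requests through the translation pipeline. This is useful for debugging or for comparing output against passthrough. The SGLang and LiteLLM endpoints below lack native Anthropic support, so their requests would be translated even with `passthrough_enabled: true`: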
+
+```yaml
+translators:
+ anthropic:
+ enabled: true
+ passthrough_enabled: false # Force translation mode for all requests
+ max_message_size: 10485760
+
+discovery:
+ type: "static"
+ static:
+ endpoints:
+ # SGLang does not have native Anthropic support - uses translation
+ - url: "http://localhost:30000"
+ name: "sglang-server"
+ type: "sglang"
+ priority: 100
+
+ # LiteLLM gateway - uses translation
+ - url: "http://localhost:4000"
+ name: "litellm-gateway"
+ type: "litellm"
+ priority: 50
+```
+
## Environment Variables Override
Example showing environment variable overrides:
diff --git a/docs/content/configuration/practices/monitoring.md b/docs/content/configuration/practices/monitoring.md
index f83449cf..4a694f0f 100644
--- a/docs/content/configuration/practices/monitoring.md
+++ b/docs/content/configuration/practices/monitoring.md
@@ -15,7 +15,8 @@ This guide covers monitoring and observability for Olla deployments.
> # /internal/status - Detailed status
> # /internal/status/endpoints - Endpoint details
> # /internal/stats/models - Model statistics
->
+> # /internal/stats/translators - Translator statistics
+>
> logging:
> level: "info"
> format: "json"
@@ -278,6 +279,9 @@ Key panels for Grafana:
5. **Success Rate**: `1 - (rate(errors) / rate(requests))`
6. **Token Generation Speed**: `olla_tokens_per_second` (from provider metrics)
7. **Token Usage**: `olla_prompt_tokens` + `olla_completion_tokens`
+8. **Translator Passthrough Rate**: `olla_translator_passthrough_rate` per translator (from `/internal/stats/translators`)
+9. **Translator Fallback Reasons**: Breakdown of `fallback_*` counters per translator
+10. **Translator Latency**: `average_latency` per translator (from `/internal/stats/translators`)
## Provider Metrics
@@ -302,6 +306,172 @@ Olla automatically extracts performance metrics from LLM provider responses:
See [Provider Metrics Documentation](../../concepts/provider-metrics.md) for configuration details.
+## Translator Metrics
+
+Olla tracks comprehensive metrics for API translation requests, providing visibility into passthrough vs translation usage, fallback behaviour, and performance.
+
+### Available Translator Metrics
+
+Translator metrics are collected per-translator (e.g., "anthropic") and include:
+
+| Metric | Type | Description |
+|--------|------|-------------|
+| `total_requests` | Counter | Total requests processed |
+| `successful_requests` | Counter | Requests that completed successfully |
+| `failed_requests` | Counter | Requests that failed |
+| `passthrough_requests` | Counter | Requests forwarded directly (native format) |
+| `translation_requests` | Counter | Requests that required format conversion |
+| `streaming_requests` | Counter | Streaming (SSE) requests |
+| `non_streaming_requests` | Counter | Non-streaming requests |
+| `fallback_no_compatible_endpoints` | Counter | Fallbacks due to no healthy endpoints |
+| `fallback_translator_does_not_support_passthrough` | Counter | Fallbacks due to translator lacking passthrough |
+| `fallback_cannot_passthrough` | Counter | Fallbacks because not every healthy backend declares native Anthropic support |
+| `avg_latency_ms` | Gauge | Average request latency in milliseconds |
+| `total_latency_ms` | Counter | Cumulative latency across all requests |
+
+### Key Metrics to Track
+
+**Passthrough Efficiency**: Monitor the ratio of `passthrough_requests` to `translation_requests`. A high passthrough rate indicates backends are being used optimally.
+
+**Fallback Reasons**: Track `fallback_*` counters to understand why passthrough isn't being used:
+
+- `fallback_no_compatible_endpoints` - No healthy endpoints available (operational issue)
+- `fallback_cannot_passthrough` - Backends don't declare native Anthropic support (configuration issue)
+- `fallback_translator_does_not_support_passthrough` - Expected for translators without passthrough capability
+
+**Success Rate**: Compare `successful_requests` vs `failed_requests` to detect translation issues.
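
The following Python sketch derives both indicators from raw counter values. It computes passthrough efficiency as the share of passthrough requests among all passthrough and translation requests, which matches the `passthrough_rate` in the endpoint example below (1200 of 1500 = 80.0%). It groups the `fallback_*` counters using the operational/configuration/expected labels listed above. The helper names are hypothetical.

```python
from typing import Dict

# Counter names follow the metrics table; category labels follow the list above.
FALLBACK_CATEGORIES = {
    "fallback_no_compatible_endpoints": "operational",
    "fallback_cannot_passthrough": "configuration",
    "fallback_translator_does_not_support_passthrough": "expected",
}


def passthrough_efficiency(counters: Dict[str, int]) -> float:
    """Share of requests served via passthrough, from 0.0 to 1.0."""
    passthrough = counters.get("passthrough_requests", 0)
    translated = counters.get("translation_requests", 0)
    total = passthrough + translated
    return passthrough / total if total else 0.0


def fallback_breakdown(counters: Dict[str, int]) -> Dict[str, int]:
    """Sum fallback counters by likely cause."""
    totals: Dict[str, int] = {}
    for name, category in FALLBACK_CATEGORIES.items():
        totals[category] = totals.get(category, 0) + counters.get(name, 0)
    return totals
```

A large `configuration` total usually points at profiles that need `anthropic_support`. A rising `operational` total points at endpoint health.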
+
+### Response Header Observability
+
+The `X-Olla-Mode: passthrough` response header is included when passthrough mode is active. This allows external monitoring tools to track mode usage:
+
+```bash
+# Print the X-Olla-Mode header for a request (no output means translation mode)
+curl -s -o /dev/null -D - http://localhost:40114/olla/anthropic/v1/messages \
+  -H "Content-Type: application/json" \
+  -d '{"model":"llama4:latest","max_tokens":10,"messages":[{"role":"user","content":"Hi"}]}' \
+  | grep -i x-olla-mode
+```
+
+### Translator Stats HTTP Endpoint
+
+The `/internal/stats/translators` endpoint exposes all translator metrics via HTTP, making them easy to query from scripts, monitoring tools, and dashboards.
+
+```bash
+# Query translator statistics
+curl -s http://localhost:40114/internal/stats/translators | jq .
+```
+
+The response includes per-translator statistics and an aggregate summary:
+
+```json
+{
+ "timestamp": "2026-02-13T10:30:00Z",
+ "translators": [
+ {
+ "translator_name": "anthropic",
+ "total_requests": 1500,
+ "successful_requests": 1450,
+ "failed_requests": 50,
+ "success_rate": "96.7%",
+ "passthrough_rate": "80.0%",
+ "passthrough_requests": 1200,
+ "translation_requests": 300,
+ "streaming_requests": 800,
+ "non_streaming_requests": 700,
+ "fallback_no_compatible_endpoints": 5,
+ "fallback_translator_does_not_support_passthrough": 0,
+ "fallback_cannot_passthrough": 295,
+ "average_latency": "245ms"
+ }
+ ],
+ "summary": {
+ "total_translators": 1,
+ "active_translators": 1,
+ "total_requests": 1500,
+ "overall_success_rate": "96.7%",
+ "total_passthrough": 1200,
+ "total_translations": 300,
+ "overall_passthrough_rate": "80.0%",
+ "total_streaming": 800,
+ "total_non_streaming": 700
+ }
+}
+```
+
+Translators are sorted by request count (most active first), and all rates and latencies use human-readable formatting.
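
Because rates and latencies arrive as strings, scripts need to convert them before doing arithmetic. The Python sketch below does that with the standard library only. It assumes `average_latency` uses Go-style duration strings such as `245ms`, `1.5s` or `1m30s`, which matches the example above but is not confirmed for all values. The function names are hypothetical.

```python
import json
import re
from typing import Dict

# Assumed Go-style duration units, converted to milliseconds.
_UNIT_MS = {"ns": 1e-6, "us": 1e-3, "µs": 1e-3, "ms": 1.0, "s": 1000.0, "m": 60_000.0, "h": 3_600_000.0}
_DURATION_PART = re.compile(r"([0-9]+(?:\.[0-9]+)?)(ns|us|µs|ms|s|m|h)")


def parse_percent(value: str) -> float:
    """Convert a string such as "96.7%" to 96.7."""
    return float(value.strip().rstrip("%"))


def parse_duration_ms(value: str) -> float:
    """Convert a duration string such as "245ms" or "1m30s" to milliseconds."""
    parts = _DURATION_PART.findall(value.strip())
    if not parts:
        raise ValueError(f"unrecognised duration: {value!r}")
    return sum(float(number) * _UNIT_MS[unit] for number, unit in parts)


def numeric_translator_stats(payload: str) -> Dict[str, Dict[str, float]]:
    """Map each translator name to numeric rates and latency."""
    data = json.loads(payload)
    return {
        t["translator_name"]: {
            "success_rate": parse_percent(t["success_rate"]),
            "passthrough_rate": parse_percent(t["passthrough_rate"]),
            "average_latency_ms": parse_duration_ms(t["average_latency"]),
        }
        for t in data.get("translators", [])
    }
```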
+
+#### Monitoring with the Translator Stats Endpoint
+
+**Watch passthrough efficiency in real-time:**
+
+```bash
+watch -n 10 'curl -s http://localhost:40114/internal/stats/translators | jq ".summary.overall_passthrough_rate"'
+```
+
+**Check fallback reasons for a specific translator:**
+
+```bash
+curl -s http://localhost:40114/internal/stats/translators | \
+ jq '.translators[] | select(.translator_name == "anthropic") | {
+ passthrough_rate,
+ fallback_no_compatible_endpoints,
+ fallback_cannot_passthrough,
+ fallback_translator_does_not_support_passthrough
+ }'
+```
+
+**Alert on low success rate:**
+
+```bash
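+#!/bin/bash
+# check_translator_health.sh
+if ! STATS=$(curl -sf http://localhost:40114/internal/stats/translators); then
+  echo "CRITICAL: unable to fetch translator stats"
+  exit 2
+fi
+SUCCESS_RATE=$(echo "$STATS" | jq -r '.summary.overall_success_rate' | tr -d '%')
+
+# Guard against empty or non-numeric values so the check never silently reports OK
+if ! [[ "$SUCCESS_RATE" =~ ^[0-9]+(\.[0-9]+)?$ ]]; then
+  echo "UNKNOWN: unexpected success rate value '$SUCCESS_RATE'"
+  exit 3
+fi
+
+if (( $(echo "$SUCCESS_RATE < 95" | bc -l) )); then
+  echo "WARNING: Translator success rate is $SUCCESS_RATE%"
+  exit 1
+fi
+echo "OK: Translator success rate is $SUCCESS_RATE%"
+exit 0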
+```
+
+**Scrape for Prometheus:**
+
+Add the translator stats endpoint to your Prometheus exporter alongside the status endpoint:
+
+```python
+# Add to prometheus_exporter.py (requires the `requests` and `prometheus_client` packages)
+import requests
+from prometheus_client import Gauge
+
+translator_requests = Gauge('olla_translator_requests_total', 'Total translator requests', ['translator'])
+translator_passthrough_rate = Gauge('olla_translator_passthrough_rate', 'Passthrough rate (percent)', ['translator'])
+translator_success_rate = Gauge('olla_translator_success_rate', 'Success rate (percent)', ['translator'])
+
+def collect_translator_metrics():
+    resp = requests.get('http://localhost:40114/internal/stats/translators', timeout=5)
+    resp.raise_for_status()
+    data = resp.json()
+    for t in data['translators']:
+        name = t['translator_name']
+        translator_requests.labels(translator=name).set(t['total_requests'])
+        # Rates arrive as strings such as "80.0%", so strip the suffix before converting
+        pt_rate = float(t['passthrough_rate'].rstrip('%'))
+        translator_passthrough_rate.labels(translator=name).set(pt_rate)
+        sr = float(t['success_rate'].rstrip('%'))
+        translator_success_rate.labels(translator=name).set(sr)
+```
+
+See the [System Endpoints API Reference](../../api-reference/system.md#get-internalstatstranslators) for the complete response field reference.
+
+### Implementation Details
+
+Translator metrics are collected using thread-safe `xsync` counters in `internal/adapter/stats/translator_collector.go`. Metrics are recorded at all decision points in the translation handler (`internal/app/handlers/handler_translation.go`), including:
+
+- Early exits (body read errors, transform errors)
+- Endpoint lookup failures
+- Passthrough mode selection
+- Translation mode fallback with reason tracking
+- Request completion (success or failure)
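
Olla implements this collector in Go with lock-free `xsync` counters. The Python sketch below deliberately swaps in a single `threading.Lock` to show the same idea: one event per completed request updates the per-translator counters from the metrics table. It is not a port of `translator_collector.go`. The `TranslatorEvent` fields and the class and method names are hypothetical.

```python
import threading
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TranslatorEvent:
    # Hypothetical event shape, loosely modelled on the decision points listed above.
    translator: str
    mode: str  # "passthrough" or "translation"
    streaming: bool
    success: bool
    latency_ms: float
    fallback_reason: Optional[str] = None


class TranslatorCollector:
    """Thread-safe per-translator counters (a lock-based stand-in for xsync counters)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counters: Dict[str, Counter] = {}
        self._latency: Dict[str, float] = {}

    def record(self, event: TranslatorEvent) -> None:
        with self._lock:
            c = self._counters.setdefault(event.translator, Counter())
            c["total_requests"] += 1
            c["successful_requests" if event.success else "failed_requests"] += 1
            c[f"{event.mode}_requests"] += 1
            c["streaming_requests" if event.streaming else "non_streaming_requests"] += 1
            if event.fallback_reason:
                c[f"fallback_{event.fallback_reason}"] += 1
            self._latency[event.translator] = self._latency.get(event.translator, 0.0) + event.latency_ms

    def snapshot(self, translator: str) -> Dict[str, float]:
        # Copy under the lock, then derive the average outside it.
        with self._lock:
            stats = dict(self._counters.get(translator, Counter()))
            latency = self._latency.get(translator, 0.0)
        total = stats.get("total_requests", 0)
        stats["total_latency_ms"] = latency
        stats["avg_latency_ms"] = latency / total if total else 0.0
        return stats
```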
+
## Health Monitoring
### Endpoint Health
@@ -580,6 +750,8 @@ Production monitoring setup:
- [ ] Resource monitoring
- [ ] Circuit breaker alerts
- [ ] Capacity planning metrics
+- [ ] Translator metrics tracking (passthrough/translation rates, fallback reasons)
+- [ ] `X-Olla-Mode` header monitoring for passthrough efficiency
## Next Steps
diff --git a/docs/content/configuration/reference.md b/docs/content/configuration/reference.md
index 36643caa..43a44046 100644
--- a/docs/content/configuration/reference.md
+++ b/docs/content/configuration/reference.md
@@ -38,6 +38,7 @@ server: # HTTP server configuration
proxy: # Proxy engine settings
discovery: # Endpoint discovery
model_registry: # Model management
+translators: # API translation (e.g., Anthropic ↔ OpenAI)
logging: # Logging configuration
engineering: # Debug features
```
@@ -531,6 +532,96 @@ Routing decisions are exposed via response headers:
| `X-Olla-Routing-Decision` | Action taken (routed/fallback/rejected) |
| `X-Olla-Routing-Reason` | Human-readable reason for decision |
+## Translators Configuration
+
+API translation settings. Translators enable clients designed for one API format to work with backends that use a different format.
+
+> :memo: **Anthropic Translation** (v0.0.20+)
+> Enabled by default. Still actively being improved -- please report any issues or feedback.
+
+### Anthropic Translator
+
+The Anthropic translator enables Claude-compatible clients (Claude Code, OpenCode, Crush CLI) to work with OpenAI-compatible backends.
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `enabled` | bool | `true` | Master switch for the Anthropic translator. When `false`, the `/olla/anthropic/v1/*` endpoints are not registered and return 404. |
+| `passthrough_enabled` | bool | `true` | Optimisation mode (only applies when `enabled: true`). When `true`, requests are forwarded directly to backends with native Anthropic support for zero translation overhead. When `false`, all requests go through the Anthropic-to-OpenAI translation pipeline regardless of backend capabilities. |
+| `max_message_size` | int | `10485760` | Maximum request body size in bytes (10MB default). |
+
+#### Two-Level Control: `enabled` + `passthrough_enabled`
+
+The Anthropic translator uses a two-level configuration model:
+
+1. **`enabled`** is the master switch. When `false`, the translator is completely disabled and the `passthrough_enabled` setting has no effect. It is `true` by default.
+2. **`passthrough_enabled`** is the optimisation flag. It only takes effect when `enabled: true`.
+
+With the translator enabled, passthrough mode also requires native Anthropic support on the backend side. Both conditions must be true for passthrough to activate:
+
+- `translators.anthropic.passthrough_enabled: true` (global configuration)
+- Every healthy endpoint's backend profile has `api.anthropic_support.enabled: true` (per-backend profile)
+
+If either condition is false, Olla falls back to translation mode automatically.
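
The Python sketch below resolves the two-level model from a parsed config mapping, applying the defaults in the table above. It is a hypothetical helper, not Olla's loader. The `mode` values are illustrative labels: `auto` means passthrough is used where backends allow it and translation elsewhere.

```python
from typing import Any, Dict

# Defaults as documented in the Anthropic Translator table.
DEFAULTS = {"enabled": True, "passthrough_enabled": True, "max_message_size": 10_485_760}


def effective_anthropic_settings(config: Dict[str, Any]) -> Dict[str, Any]:
    """Merge translators.anthropic over the defaults and derive the effective mode."""
    section = (config.get("translators") or {}).get("anthropic") or {}
    settings = {**DEFAULTS, **section}
    if not settings["enabled"]:
        # Master switch off: passthrough_enabled has no effect and endpoints return 404.
        settings["passthrough_enabled"] = False
        settings["mode"] = "disabled"
    elif settings["passthrough_enabled"]:
        settings["mode"] = "auto"
    else:
        settings["mode"] = "translation"
    return settings
```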
+
+#### Examples
+
+**Enable translator with passthrough (recommended for production)**:
+
+```yaml
+translators:
+ anthropic:
+ enabled: true
+ passthrough_enabled: true # Forward directly to backends with native Anthropic support
+ max_message_size: 10485760 # 10MB
+```
+
+**Enable translator with translation only (useful for debugging/testing)**:
+
+```yaml
+translators:
+ anthropic:
+ enabled: true
+ passthrough_enabled: false # Always translate Anthropic ↔ OpenAI format
+ max_message_size: 10485760
+```
+
+**Disable translator entirely**:
+
+```yaml
+translators:
+ anthropic:
+ enabled: false
+ # passthrough_enabled has no effect when enabled=false
+ passthrough_enabled: true
+```
+
+#### Performance Implications
+
+| Mode | Overhead | When Used |
+|------|----------|-----------|
+| **Passthrough** | Near-zero (~0ms) | `passthrough_enabled: true` and backend has native Anthropic support |
+| **Translation** | ~1-5ms per request | `passthrough_enabled: false`, or backend lacks native Anthropic support |
+| **Disabled** | N/A | `enabled: false` -- endpoints return 404 |
+
+#### Detecting the Active Mode
+
+Check the `X-Olla-Mode` response header:
+
+- `X-Olla-Mode: passthrough` -- passthrough mode was used
+- Header absent -- translation mode was used
+
+#### Inspector (Development Only)
+
+> :no_entry: **Do not enable in production** -- logs full request/response bodies including potentially sensitive user data.
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `inspector.enabled` | bool | `false` | Enable request/response logging |
+| `inspector.output_dir` | string | `"logs/inspector/anthropic"` | Directory for log output |
+| `inspector.session_header` | string | `"X-Session-ID"` | Header for session grouping |
+
+See [Anthropic Inspector](../notes/anthropic-inspector.md) for details.
+
## Logging Configuration
Application logging settings.
@@ -693,6 +784,16 @@ model_registry:
cache_ttl: 10m
custom_rules: []
+translators:
+ anthropic:
+ enabled: true
+ passthrough_enabled: true
+ max_message_size: 10485760 # 10MB
+ inspector:
+ enabled: false
+ output_dir: "logs/inspector/anthropic"
+ session_header: "X-Session-ID"
+
logging:
level: "info"
format: "json"
diff --git a/docs/content/integrations/api-translation/anthropic.md b/docs/content/integrations/api-translation/anthropic.md
index 43ace233..e6c233f6 100644
--- a/docs/content/integrations/api-translation/anthropic.md
+++ b/docs/content/integrations/api-translation/anthropic.md
@@ -67,25 +67,53 @@ Olla's Anthropic API Translation enables Claude-compatible clients (Claude Code,
## Architecture
+Olla supports two modes for handling Anthropic API requests, selected automatically based on backend capabilities:
+
+### Passthrough Mode (When Available)
+
+```
+┌──────────────────┐       ┌────────── Olla ─────────────┐       ┌─────────────────────┐
+│ Claude Code      │       │ /olla/anthropic/v1/*        │       │ vLLM (v0.11.1+)     │
+│ OpenCode         │──────▶│                             │──────▶│ llama.cpp (b4847+)  │
+│ Crush CLI        │       │ 1. Validate request         │       │ LM Studio (v0.4.1+) │
+│                  │       │ 2. Detect native support    │       │ Ollama (v0.14.0+)   │
+│ (Anthropic API)  │       │ 3. Forward as-is            │       │                     │
+│                  │◀──────│                             │◀──────│ (Native Anthropic)  │
+└──────────────────┘       └─────────────────────────────┘       └─────────────────────┘
+```
+
+**Passthrough Flow** (zero translation overhead):
+
+1. Client sends Anthropic Messages API request
+2. Olla detects backend has native Anthropic support (via profile config)
+3. Request forwarded directly to backend without any format conversion
+4. Backend processes request natively
+5. Response returned to client as-is
+6. Response includes `X-Olla-Mode: passthrough` header
+
+### Translation Mode (Fallback)
+
```
┌──────────────────┐ ┌────────── Olla ─────────────┐ ┌─────────────────┐
-│ Claude Code │ │ /olla/anthropic/v1/* │ │ Ollama │
-│ OpenCode │──────▶│ │──────▶│ LM Studio │
-│ Crush CLI │ │ 1. Validate request │ │ vLLM │
-│ │ │ 2. Translate to OpenAI │ │ llama.cpp │
+│ Claude Code │ │ /olla/anthropic/v1/* │ │ SGLang │
+│ OpenCode │──────▶│ │──────▶│ LiteLLM │
+│ Crush CLI │ │ 1. Validate request │ │ OpenAI-compatible│
+│ │ │ 2. Translate to OpenAI │ │ │
│ (Anthropic API) │ │ 3. Route to backend │ │ (OpenAI API) │
│ │◀──────│ 4. Translate response │◀──────│ │
│ │ │ 5. Return Anthropic format │ │ │
└──────────────────┘ └─────────────────────────────┘ └─────────────────┘
+```
+
+**Translation Flow** (automatic fallback):
-Request Flow:
1. Client sends Anthropic Messages API request
-2. Olla translates to OpenAI Chat Completions format
-3. Standard Olla routing (load balancing, health checks, failover)
-4. Backend processes request (unaware of Anthropic origin)
-5. Olla translates OpenAI response back to Anthropic format
-6. Client receives Anthropic-formatted response
-```
+2. Olla determines that passthrough isn't possible (disabled in config, or not every healthy backend declares native Anthropic support)
+3. Olla translates to OpenAI Chat Completions format
+4. Standard Olla routing (load balancing, health checks, failover)
+5. Backend processes request (unaware of Anthropic origin)
+6. Olla translates OpenAI response back to Anthropic format
+7. Client receives Anthropic-formatted response
## What Gets Translated
@@ -214,16 +242,18 @@ data: {"type":"message_stop"}
## Configuration
-### Enable Translation
+### Translation Configuration
-Edit your `config.yaml`:
+Anthropic translation is enabled by default. Edit your `config.yaml` to customise:
```yaml
-# Enable Anthropic API translation
+# Anthropic Messages API Translation (v0.0.20+)
+# Enabled by default. Still actively being improved -- please report any issues or feedback.
translators:
anthropic:
- enabled: true # Enable Anthropic translator
+ enabled: true # Enabled by default
max_message_size: 10485760 # Max request size (10MB)
+ passthrough_enabled: true # Enable passthrough for backends with native Anthropic support
# Standard Olla configuration
discovery:
@@ -244,8 +274,9 @@ discovery:
| Option | Type | Default | Description |
|--------|------|---------|-------------|
-| `enabled` | boolean | `false` | Enable Anthropic API translation |
+| `enabled` | boolean | `true` | Enable Anthropic API translation (enabled by default) |
| `max_message_size` | integer | `10485760` | Maximum request size in bytes (10MB default) |
+| `passthrough_enabled` | boolean | `true` | Passthrough optimisation mode. When `true` (default), requests are forwarded directly to backends that declare `anthropic_support` in their profile, with zero translation overhead. When `false`, all requests use translation regardless of backend capabilities. Only applies when `enabled: true`. |
### Multiple Backends
@@ -296,15 +327,11 @@ proxy:
## Quick Start
-### 1. Enable Translation
+### 1. Configure Endpoints
-Create or edit `config.yaml`:
+Anthropic translation is enabled by default. Create or edit `config.yaml` to configure your backends:
```yaml
-translators:
- anthropic:
- enabled: true
-
discovery:
type: static
static:
@@ -487,6 +514,21 @@ curl -X POST http://localhost:40114/olla/anthropic/v1/messages \
## Supported Backends
+### Passthrough-Compatible Backends (Native Anthropic Support)
+
+These backends natively support the Anthropic Messages API and benefit from passthrough mode (zero translation overhead):
+
+| Backend | Min Version | Token Counting | Config Section |
+|---------|-------------|----------------|----------------|
+| **vLLM** | v0.11.1+ | No | `api.anthropic_support` in `config/profiles/vllm.yaml` |
+| **llama.cpp** | b4847+ | Yes | `api.anthropic_support` in `config/profiles/llamacpp.yaml` |
+| **LM Studio** | v0.4.1+ | No | `api.anthropic_support` in `config/profiles/lmstudio.yaml` |
+| **Ollama** | v0.14.0+ | No | `api.anthropic_support` in `config/profiles/ollama.yaml` |
+
+When using these backends, Olla automatically detects native Anthropic support and forwards requests directly. You can verify passthrough mode is active by checking the `X-Olla-Mode: passthrough` response header.
+
+### Translation-Compatible Backends (OpenAI Format)
+
All OpenAI-compatible backends work through translation:
### Local Backends
@@ -673,11 +715,11 @@ Translation quality depends on backend capabilities:
**Symptom**: 404 errors on `/olla/anthropic/v1/*` endpoints
**Solutions**:
-1. Check translation is enabled:
+1. Check translation is enabled (it is by default, but may have been disabled):
```bash
grep -A 3 "translators:" config.yaml
```
- Should show `enabled: true`
+ Should show `enabled: true` (or not be present, as it defaults to `true`)
2. Verify Olla is running:
```bash
@@ -848,6 +890,99 @@ Translation quality depends on backend capabilities:
}
```
+### Passthrough Not Working
+
+**Symptom**: Requests are being translated (no `X-Olla-Mode: passthrough` header) even though your backend supports native Anthropic format.
+
+**Solutions**:
+
+1. **Check `passthrough_enabled` is `true` in config**:
+ ```bash
+ grep -A 5 "translators:" config.yaml
+ ```
+ Should show:
+ ```yaml
+ translators:
+ anthropic:
+ enabled: true
+ passthrough_enabled: true
+ ```
+
+2. **Check backend profile has `anthropic_support.enabled: true`**:
+ Each backend type declares native Anthropic support in its profile YAML under `api.anthropic_support.enabled`. Verify the profile for your backend type (e.g., `config/profiles/vllm.yaml`, `config/profiles/ollama.yaml`) has:
+ ```yaml
+ api:
+ anthropic_support:
+ enabled: true
+ ```
+
+3. **Both conditions must be true** for passthrough to activate:
+ - `translators.anthropic.passthrough_enabled: true` (global config)
+ - Backend profile `api.anthropic_support.enabled: true` (per-backend)
+
+ If either is `false`, Olla falls back to translation mode.
+
+4. **Check all endpoints support passthrough**:
+ Passthrough mode requires that all healthy endpoints' profiles declare `anthropic_support.enabled: true`. If any endpoint lacks native support, Olla falls back to translation for consistency.
+
+5. **Enable debug logging** to see mode selection:
+ ```yaml
+ logging:
+ level: "debug"
+ ```
+ Look for log entries indicating passthrough or translation mode selection.
+
+### Forcing Translation Mode
+
+**Scenario**: You want to force translation mode for debugging or testing, even though backends support native Anthropic format.
+
+**Solution**: Set `passthrough_enabled: false`:
+```yaml
+translators:
+ anthropic:
+ enabled: true
+ passthrough_enabled: false # Force translation mode for all requests
+```
+
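+Alternatively, set `anthropic_support.enabled: false` in a backend type's profile. Because passthrough requires every healthy endpoint to declare native support, this forces translation for all Anthropic requests while any endpoint of that type is healthy, not only for requests served by that backend type:
+```yaml
+# config/profiles/vllm.yaml (custom override)
+api:
+  anthropic_support:
+    enabled: false # vLLM no longer declares native Anthropic support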
+```
+
+### Detecting Which Mode Was Used
+
+**Scenario**: You need to know whether a request used passthrough or translation mode.
+
+**Solution**: Check the `X-Olla-Mode` response header:
+
+```bash
+curl -s -D - http://localhost:40114/olla/anthropic/v1/messages \
+ -H "Content-Type: application/json" \
+ -d '{"model":"llama3.2:latest","max_tokens":10,"messages":[{"role":"user","content":"hi"}]}' \
+ 2>&1 | grep -i "x-olla-mode"
+```
+
+- **`X-Olla-Mode: passthrough`** -- request was forwarded directly to the backend
+- **Header absent** -- request went through the Anthropic-to-OpenAI translation pipeline
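
For scripted checks, the same header test can be done from Python with the standard library. The sketch below assumes Olla is reachable at `http://localhost:40114`. The `OLLA_URL` constant and both function names are hypothetical. The header comparison is case-insensitive because HTTP/2 transmits header names in lowercase.

```python
import json
import urllib.request
from typing import Mapping

OLLA_URL = "http://localhost:40114/olla/anthropic/v1/messages"  # adjust host/port as needed


def mode_from_headers(headers: Mapping[str, str]) -> str:
    """Return "passthrough" if X-Olla-Mode says so, otherwise "translation"."""
    for name, value in headers.items():
        if name.lower() == "x-olla-mode" and value.strip().lower() == "passthrough":
            return "passthrough"
    return "translation"


def probe_mode(model: str, url: str = OLLA_URL) -> str:
    """Send a tiny Messages API request and report which mode handled it."""
    body = json.dumps({
        "model": model,
        "max_tokens": 10,
        "messages": [{"role": "user", "content": "hi"}],
    }).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return mode_from_headers(dict(resp.headers.items()))
```

With Olla running, `print(probe_mode("llama3.2:latest"))` prints the mode for that model.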
+
+For aggregate statistics across all requests, query the translator stats endpoint:
+
+```bash
+curl -s http://localhost:40114/internal/stats/translators | jq '.translators[] | {
+ translator_name,
+ passthrough_rate,
+ success_rate,
+ total_requests,
+ fallback_no_compatible_endpoints,
+ fallback_cannot_passthrough
+}'
+```
+
+See [Translator Stats API Reference](../../api-reference/system.md#get-internalstatstranslators) for full response details.
+
### High Latency
**Symptom**: Slow response times compared to direct backend access
@@ -880,16 +1015,58 @@ Translation quality depends on backend capabilities:
docker logs olla | grep -i "response_time"
```
+## Backend Profile Configuration (Passthrough)
+
+Native Anthropic support is declared in each backend's profile YAML file under the `api.anthropic_support` section. This is how Olla knows which backends can receive Anthropic requests directly.
+
+### Example Profile Configuration
+
+```yaml
+# config/profiles/vllm.yaml (excerpt)
+api:
+ anthropic_support:
+ enabled: true # Enable native Anthropic support
+ messages_path: /v1/messages # Backend path for Messages API
+ token_count: false # Token counting not supported
+ min_version: "0.11.1" # Minimum vLLM version required
+ limitations:
+ - no_token_counting
+```
+
+### Configuration Fields
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `enabled` | boolean | Yes | Whether the backend supports native Anthropic format |
+| `messages_path` | string | Yes | Backend path for the Anthropic Messages API |
+| `token_count` | boolean | No | Whether `/v1/messages/count_tokens` is supported |
+| `min_version` | string | No | Minimum backend version with Anthropic support |
+| `limitations` | list | No | Known limitations (e.g., `no_token_counting`) |
+
+### Disabling Passthrough
+
+To force translation mode for a specific backend type, create a custom profile override:
+
+```yaml
+# config/profiles/vllm.yaml (custom override to disable passthrough)
+name: vllm
+api:
+ anthropic_support:
+ enabled: false
+```
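Conceptually, the profile lookup in this change (`GetAnthropicSupport`) returns `nil` when a profile is missing or has no `anthropic_support` section, and a section with `enabled: false` likewise rules passthrough out. The sketch below is a simplified, hypothetical model of that decision. The real handler checks further passthrough conditions and records reasons such as `FallbackReasonNoCompatibleEndpoints`; the types and the reason string used here are illustrative only.

```go
package main

import "fmt"

// anthropicSupport is a cut-down stand-in for a profile's
// api.anthropic_support section (hypothetical type).
type anthropicSupport struct {
	Enabled      bool
	MessagesPath string
}

// profiles maps endpoint type to support config. A missing key models a
// profile without an anthropic_support section (lookup returns nil).
type profiles map[string]*anthropicSupport

// chooseMode picks passthrough for the first endpoint type whose profile
// enables Anthropic support; otherwise it falls back to translation.
func chooseMode(p profiles, endpointTypes []string) (mode, endpointType, reason string) {
	for _, t := range endpointTypes {
		if s := p[t]; s != nil && s.Enabled && s.MessagesPath != "" {
			return "passthrough", t, ""
		}
	}
	return "translation", "", "no_compatible_endpoints"
}

func main() {
	p := profiles{
		"vllm":   {Enabled: false, MessagesPath: "/v1/messages"}, // overridden as above
		"ollama": {Enabled: true, MessagesPath: "/v1/messages"},
	}
	fmt.Println(chooseMode(p, []string{"vllm"}))
	fmt.Println(chooseMode(p, []string{"vllm", "ollama"}))
}
```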
+
## Performance Notes
### Translation Overhead
Typical overhead per request:
+
+- **Passthrough mode**: Near-zero overhead (no translation)
- **Request translation**: ~0.5-2ms
- **Response translation**: ~1-5ms
- **Streaming**: ~0.1-0.5ms per chunk
-**Total overhead**: Usually <5ms, negligible compared to model inference time.
+**Total overhead**: Usually <5ms for translation mode, negligible compared to model inference time. Passthrough mode eliminates translation overhead entirely.
### Memory Usage
@@ -937,7 +1114,9 @@ Typical overhead per request:
## Related Documentation
- **[Anthropic Messages API Reference](../../api-reference/anthropic.md)** - Complete API documentation
-- **[API Translation Concept](../../concepts/api-translation.md)** - How translation works
+- **[API Translation Concept](../../concepts/api-translation.md)** - How translation and passthrough work
+- **[Profile System](../../concepts/profile-system.md)** - Backend profile configuration (including `anthropic_support`)
+- **[Monitoring](../../configuration/practices/monitoring.md)** - Translator metrics and observability
- **[Claude Code Integration](../frontend/claude-code.md)** - Set up Claude Code
- **[OpenCode Integration](../frontend/opencode.md)** - Set up OpenCode
- **[Crush CLI Integration](../frontend/crush-cli.md)** - Set up Crush CLI
diff --git a/docs/content/integrations/backend/llamacpp.md b/docs/content/integrations/backend/llamacpp.md
index 8d08ca0d..52f8457f 100644
--- a/docs/content/integrations/backend/llamacpp.md
+++ b/docs/content/integrations/backend/llamacpp.md
@@ -37,6 +37,8 @@ keywords: [llamacpp, llama.cpp, Olla proxy, GGUF models, CPU inference, slot man
Tokenisation API
Code Infill (FIM Support)
GGUF Exclusive Format
+ Native Anthropic Messages API (b4847+)
+ Anthropic Token Counting (b4847+)
@@ -154,6 +156,33 @@ discovery:
priority: 60
```
+## Anthropic Messages API Support
+
+llama.cpp b4847+ natively supports the Anthropic Messages API, enabling Olla to forward Anthropic-format requests directly without translation overhead (passthrough mode). Notably, llama.cpp is the **only backend with token counting support** via `/v1/messages/count_tokens`, so clients can obtain accurate prompt token counts without running an inference request.
+
+When Olla detects that a llama.cpp endpoint supports native Anthropic format (via the `anthropic_support` section in `config/profiles/llamacpp.yaml`), it will bypass the Anthropic-to-OpenAI translation pipeline and forward requests directly to `/v1/messages` on the backend.
+
+**Profile configuration** (from `config/profiles/llamacpp.yaml`):
+
+```yaml
+api:
+ anthropic_support:
+ enabled: true
+ messages_path: /v1/messages
+ token_count: true
+ min_version: "b4847"
+```
+
+**Key details**:
+
+- Minimum llama.cpp version: **b4847**
+- Token counting (`/v1/messages/count_tokens`): **Supported** (unique among backends)
+- Passthrough mode is automatic -- no client-side configuration needed
+- Responses include `X-Olla-Mode: passthrough` header when passthrough is active
+- Falls back to translation mode if passthrough conditions are not met
+
+For more information, see [API Translation](../../concepts/api-translation.md#passthrough-mode) and [Anthropic API Reference](../../api-reference/anthropic.md).
+
## Endpoints Supported
The following 9 inference endpoints are proxied by the llama.cpp integration profile:
diff --git a/docs/content/integrations/backend/lmstudio.md b/docs/content/integrations/backend/lmstudio.md
index 1dd710e1..cebfd924 100644
--- a/docs/content/integrations/backend/lmstudio.md
+++ b/docs/content/integrations/backend/lmstudio.md
@@ -32,6 +32,7 @@ keywords: LM Studio, Olla, LLM proxy, local inference, OpenAI compatible, model
Model Unification
Model Detection & Normalisation
OpenAI API Compatibility
+ Native Anthropic Messages API (v0.4.1+)
@@ -118,6 +119,33 @@ discovery:
priority: 50
```
+## Anthropic Messages API Support
+
+LM Studio v0.4.1+ natively supports the Anthropic Messages API, enabling Olla to forward Anthropic-format requests directly without translation overhead (passthrough mode). LM Studio added this support specifically for Claude Code integration.
+
+When Olla detects that an LM Studio endpoint supports native Anthropic format (via the `anthropic_support` section in `config/profiles/lmstudio.yaml`), it will bypass the Anthropic-to-OpenAI translation pipeline and forward requests directly to `/v1/messages` on the backend.
+
+**Profile configuration** (from `config/profiles/lmstudio.yaml`):
+
+```yaml
+api:
+ anthropic_support:
+ enabled: true
+ messages_path: /v1/messages
+ token_count: false
+ min_version: "0.4.1"
+```
+
+**Key details**:
+
+- Minimum LM Studio version: **v0.4.1**
+- Token counting (`/v1/messages/count_tokens`): Not supported
+- Passthrough mode is automatic -- no client-side configuration needed
+- Responses include `X-Olla-Mode: passthrough` header when passthrough is active
+- Falls back to translation mode if passthrough conditions are not met
+
+For more information, see [API Translation](../../concepts/api-translation.md#passthrough-mode) and [Anthropic API Reference](../../api-reference/anthropic.md).
+
## Endpoints Supported
The following endpoints are supported by the LM Studio integration profile:
diff --git a/docs/content/integrations/backend/ollama.md b/docs/content/integrations/backend/ollama.md
index ca8dd95f..ccabc319 100644
--- a/docs/content/integrations/backend/ollama.md
+++ b/docs/content/integrations/backend/ollama.md
@@ -33,6 +33,7 @@ keywords: ollama integration, ollama proxy, olla ollama, ollama configuration, o
Model Detection & Normalisation
OpenAI API Compatibility
GGUF Model Support
+ Native Anthropic Messages API (v0.14.0+)
@@ -139,6 +140,38 @@ discovery:
!!! note "Authentication Not Supported"
Olla does not currently support authentication headers for endpoints. If your Ollama server requires authentication, you'll need to use a reverse proxy or wait for this feature to be added.
+## Anthropic Messages API Support
+
+Ollama v0.14.0+ natively supports the Anthropic Messages API, enabling Olla to forward Anthropic-format requests directly without translation overhead (passthrough mode).
+
+When Olla detects that an Ollama endpoint supports native Anthropic format (via the `anthropic_support` section in `config/profiles/ollama.yaml`), it will bypass the Anthropic-to-OpenAI translation pipeline and forward requests directly to `/v1/messages` on the backend.
+
+**Profile configuration** (from `config/profiles/ollama.yaml`):
+
+```yaml
+api:
+ anthropic_support:
+ enabled: true
+ messages_path: /v1/messages
+ token_count: false
+ min_version: "0.14.0"
+ limitations:
+ - token_counting_404
+```
+
+**Key details**:
+
+- Minimum Ollama version: **v0.14.0**
+- Token counting (`/v1/messages/count_tokens`): Not supported (returns 404)
+- Passthrough mode is automatic -- no client-side configuration needed
+- Responses include `X-Olla-Mode: passthrough` header when passthrough is active
+- Falls back to translation mode if passthrough conditions are not met
+
+!!! note "Ollama Anthropic Compatibility"
+ For details on Ollama's Anthropic compatibility, see the [Ollama Anthropic compatibility documentation](https://docs.ollama.com/api/anthropic-compatibility).
+
+For more information, see [API Translation](../../concepts/api-translation.md#passthrough-mode) and [Anthropic API Reference](../../api-reference/anthropic.md).
+
## Endpoints Supported
The following endpoints are supported by the Ollama integration profile:
diff --git a/docs/content/integrations/backend/vllm.md b/docs/content/integrations/backend/vllm.md
index 3cbd85cf..9610c872 100644
--- a/docs/content/integrations/backend/vllm.md
+++ b/docs/content/integrations/backend/vllm.md
@@ -35,6 +35,7 @@ keywords: vLLM, Olla proxy, LLM inference, GPU optimization, PagedAttention, ten
Prometheus Metrics
Tokenisation API
Reranking API
+ Native Anthropic Messages API (v0.11.1+)
@@ -124,6 +125,35 @@ proxy:
load_balancer: "least-connections"
```
+## Anthropic Messages API Support
+
+vLLM v0.11.1+ natively supports the Anthropic Messages API, enabling Olla to forward Anthropic-format requests directly without translation overhead (passthrough mode).
+
+When Olla detects that a vLLM endpoint supports native Anthropic format (via the `anthropic_support` section in `config/profiles/vllm.yaml`), it will bypass the Anthropic-to-OpenAI translation pipeline and forward requests directly to `/v1/messages` on the backend.
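The forwarded URL is, in effect, the endpoint's base URL combined with the profile's `messages_path`. A minimal sketch of that join using Go's `net/url.JoinPath` (Go 1.19+), assuming a plain base-plus-path combination. `passthroughURL` is a hypothetical helper, and Olla's actual URL handling may differ.

```go
package main

import (
	"fmt"
	"net/url"
)

// passthroughURL joins an endpoint base URL with a profile's messages_path,
// normalising duplicate or trailing slashes between the two parts.
func passthroughURL(endpointURL, messagesPath string) (string, error) {
	return url.JoinPath(endpointURL, messagesPath)
}

func main() {
	u, err := passthroughURL("http://localhost:8000", "/v1/messages")
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```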
+
+**Profile configuration** (from `config/profiles/vllm.yaml`):
+
+```yaml
+api:
+ anthropic_support:
+ enabled: true
+ messages_path: /v1/messages
+ token_count: false
+ min_version: "0.11.1"
+ limitations:
+ - no_token_counting
+```
+
+**Key details**:
+
+- Minimum vLLM version: **v0.11.1**
+- Token counting (`/v1/messages/count_tokens`): Not supported
+- Passthrough mode is automatic -- no client-side configuration needed
+- Responses include `X-Olla-Mode: passthrough` header when passthrough is active
+- Falls back to translation mode if passthrough conditions are not met
+
+For more information, see [API Translation](../../concepts/api-translation.md#passthrough-mode) and [Anthropic API Reference](../../api-reference/anthropic.md).
+
## Endpoints Supported
The following endpoints are supported by the vLLM integration profile:
diff --git a/docs/content/integrations/frontend/claude-code.md b/docs/content/integrations/frontend/claude-code.md
index 24bce91b..c0e3a510 100644
--- a/docs/content/integrations/frontend/claude-code.md
+++ b/docs/content/integrations/frontend/claude-code.md
@@ -200,7 +200,7 @@ proxy:
response_timeout: 1800s # 30 min for long generations
read_timeout: 600s
-# Anthropic API translation (disabled by default)
+# Anthropic Messages API Translation (enabled by default)
translators:
anthropic:
enabled: true
diff --git a/docs/content/usage.md b/docs/content/usage.md
index 17d61f8c..a98a93a0 100644
--- a/docs/content/usage.md
+++ b/docs/content/usage.md
@@ -200,15 +200,10 @@ Use Claude Code, OpenCode, and Crush CLI with local models through Anthropic API
### Quick Setup
-**1. Enable Translation**
+**1. Configure Backends** (translation is enabled by default)
```yaml
-# config.yaml
-translators:
- anthropic:
- enabled: true
- max_message_size: 10485760 # 10MB
-
+# config.yaml - Anthropic translation is enabled by default
discovery:
static:
endpoints:
diff --git a/internal/adapter/proxy/olla/service_leak_test.go b/internal/adapter/proxy/olla/service_leak_test.go
index f61b4e1e..572f1c61 100644
--- a/internal/adapter/proxy/olla/service_leak_test.go
+++ b/internal/adapter/proxy/olla/service_leak_test.go
@@ -357,6 +357,10 @@ func (m *mockStatsCollector) GetModelStats() map[string]ports.ModelStats { retur
func (m *mockStatsCollector) GetModelEndpointStats() map[string]map[string]ports.EndpointModelStats {
return nil
}
+func (m *mockStatsCollector) RecordTranslatorRequest(event ports.TranslatorRequestEvent) {}
+func (m *mockStatsCollector) GetTranslatorStats() map[string]ports.TranslatorStats {
+ return nil
+}
func (m *mockStatsCollector) GetProxyStats() ports.ProxyStats { return ports.ProxyStats{} }
func (m *mockStatsCollector) GetEndpointStats() map[string]ports.EndpointStats { return nil }
func (m *mockStatsCollector) GetSecurityStats() ports.SecurityStats { return ports.SecurityStats{} }
diff --git a/internal/adapter/registry/profile/factory.go b/internal/adapter/registry/profile/factory.go
index b79f6ca8..f0e211eb 100644
--- a/internal/adapter/registry/profile/factory.go
+++ b/internal/adapter/registry/profile/factory.go
@@ -165,3 +165,23 @@ func (f *Factory) buildPrefixLookup() {
func (f *Factory) GetLoader() *ProfileLoader {
return f.loader
}
+
+// GetAnthropicSupport implements translator.ProfileLookup interface.
+// Returns the Anthropic support configuration for the given endpoint type,
+// or nil if the profile doesn't exist or doesn't declare Anthropic support.
+func (f *Factory) GetAnthropicSupport(endpointType string) *domain.AnthropicSupportConfig {
+ f.mu.RLock()
+ defer f.mu.RUnlock()
+
+ profile, exists := f.loader.GetProfile(endpointType)
+ if !exists {
+ return nil
+ }
+
+ config := profile.GetConfig()
+ if config == nil || config.API.AnthropicSupport == nil {
+ return nil
+ }
+
+ return config.API.AnthropicSupport
+}
diff --git a/internal/adapter/stats/collector.go b/internal/adapter/stats/collector.go
index e56f2f25..99d7ca93 100644
--- a/internal/adapter/stats/collector.go
+++ b/internal/adapter/stats/collector.go
@@ -55,6 +55,9 @@ type Collector struct {
// Model statistics tracking
modelCollector *ModelCollector
+ // Translator statistics tracking
+ translatorCollector *TranslatorCollector
+
// Using xsync.Counter for better performance under high contention
totalRequests *xsync.Counter
successfulRequests *xsync.Counter
@@ -99,6 +102,7 @@ func NewCollectorWithConfig(logger logger.StyledLogger, modelConfig *ModelCollec
endpoints: xsync.NewMap[string, *endpointData](),
lastCleanup: time.Now().UnixNano(),
modelCollector: NewModelCollectorWithConfig(modelConfig),
+ translatorCollector: NewTranslatorCollector(),
totalRequests: xsync.NewCounter(),
successfulRequests: xsync.NewCounter(),
failedRequests: xsync.NewCounter(),
@@ -399,3 +403,13 @@ func (c *Collector) GetModelStats() map[string]ports.ModelStats {
func (c *Collector) GetModelEndpointStats() map[string]map[string]ports.EndpointModelStats {
return c.modelCollector.GetModelEndpointStats()
}
+
+// Translator-specific tracking methods
+
+func (c *Collector) RecordTranslatorRequest(event ports.TranslatorRequestEvent) {
+ c.translatorCollector.Record(event)
+}
+
+func (c *Collector) GetTranslatorStats() map[string]ports.TranslatorStats {
+ return c.translatorCollector.GetStats()
+}
diff --git a/internal/adapter/stats/translator_collector.go b/internal/adapter/stats/translator_collector.go
new file mode 100644
index 00000000..c7ff843c
--- /dev/null
+++ b/internal/adapter/stats/translator_collector.go
@@ -0,0 +1,143 @@
+package stats
+
+import (
+ "github.com/puzpuzpuz/xsync/v4"
+ "github.com/thushan/olla/internal/core/constants"
+ "github.com/thushan/olla/internal/core/ports"
+)
+
+// TranslatorCollector tracks translator-specific statistics
+type TranslatorCollector struct {
+ translators *xsync.Map[string, *translatorData]
+}
+
+// translatorData holds all metrics for a single translator
+type translatorData struct {
+ // Total request counts
+ totalRequests *xsync.Counter
+ successfulRequests *xsync.Counter
+ failedRequests *xsync.Counter
+
+ // Mode breakdown
+ passthroughRequests *xsync.Counter
+ translationRequests *xsync.Counter
+
+ // Streaming breakdown
+ streamingRequests *xsync.Counter
+ nonStreamingRequests *xsync.Counter
+
+ // Fallback reasons
+ fallbackNoCompatibleEndpoints *xsync.Counter
+ fallbackTranslatorDoesNotSupportPassthrough *xsync.Counter
+ fallbackCannotPassthrough *xsync.Counter
+
+ // Performance metrics: totalLatency sums successful-request latencies in milliseconds
+ totalLatency *xsync.Counter
+ name string // translator name
+}
+
+// NewTranslatorCollector creates a new TranslatorCollector
+func NewTranslatorCollector() *TranslatorCollector {
+ return &TranslatorCollector{
+ translators: xsync.NewMap[string, *translatorData](),
+ }
+}
+
+// Record processes a translator request event and updates metrics
+func (tc *TranslatorCollector) Record(event ports.TranslatorRequestEvent) {
+ data := tc.getOrInit(event.TranslatorName)
+
+ // Update total counts
+ data.totalRequests.Inc()
+ if event.Success {
+ data.totalLatency.Add(event.Latency.Milliseconds())
+ data.successfulRequests.Inc()
+ } else {
+ data.failedRequests.Inc()
+ }
+
+ // Update mode
+ switch event.Mode {
+ case constants.TranslatorModePassthrough:
+ data.passthroughRequests.Inc()
+ case constants.TranslatorModeTranslation:
+ data.translationRequests.Inc()
+ }
+
+ // Update streaming
+ if event.IsStreaming {
+ data.streamingRequests.Inc()
+ } else {
+ data.nonStreamingRequests.Inc()
+ }
+
+ // Update fallback reason (only when mode is translation)
+ if event.Mode == constants.TranslatorModeTranslation {
+ switch event.FallbackReason {
+ case constants.FallbackReasonNone:
+ // No fallback occurred, nothing to track
+ case constants.FallbackReasonNoCompatibleEndpoints:
+ data.fallbackNoCompatibleEndpoints.Inc()
+ case constants.FallbackReasonTranslatorDoesNotSupportPassthrough:
+ data.fallbackTranslatorDoesNotSupportPassthrough.Inc()
+ case constants.FallbackReasonCannotPassthrough:
+ data.fallbackCannotPassthrough.Inc()
+ }
+ }
+}
+
+// GetStats returns aggregated statistics for all translators
+func (tc *TranslatorCollector) GetStats() map[string]ports.TranslatorStats {
+ result := make(map[string]ports.TranslatorStats)
+
+ tc.translators.Range(func(name string, data *translatorData) bool {
+ total := data.totalRequests.Value()
+ successful := data.successfulRequests.Value()
+ totalLatency := data.totalLatency.Value()
+
+ var avgLatency int64
+ if successful > 0 {
+ avgLatency = totalLatency / successful
+ }
+
+ result[name] = ports.TranslatorStats{
+ TranslatorName: name,
+ TotalRequests: total,
+ SuccessfulRequests: successful,
+ FailedRequests: data.failedRequests.Value(),
+ PassthroughRequests: data.passthroughRequests.Value(),
+ TranslationRequests: data.translationRequests.Value(),
+ StreamingRequests: data.streamingRequests.Value(),
+ NonStreamingRequests: data.nonStreamingRequests.Value(),
+ FallbackNoCompatibleEndpoints: data.fallbackNoCompatibleEndpoints.Value(),
+ FallbackTranslatorDoesNotSupportPassthrough: data.fallbackTranslatorDoesNotSupportPassthrough.Value(),
+ FallbackCannotPassthrough: data.fallbackCannotPassthrough.Value(),
+ AverageLatency: avgLatency,
+ TotalLatency: totalLatency,
+ }
+ return true
+ })
+
+ return result
+}
+
+// getOrInit returns the existing data for a translator, creating it on first use
+func (tc *TranslatorCollector) getOrInit(translatorName string) *translatorData {
+ data, _ := tc.translators.LoadOrCompute(translatorName, func() (*translatorData, bool) {
+ return &translatorData{
+ name: translatorName,
+ totalRequests: xsync.NewCounter(),
+ successfulRequests: xsync.NewCounter(),
+ failedRequests: xsync.NewCounter(),
+ passthroughRequests: xsync.NewCounter(),
+ translationRequests: xsync.NewCounter(),
+ streamingRequests: xsync.NewCounter(),
+ nonStreamingRequests: xsync.NewCounter(),
+ fallbackNoCompatibleEndpoints: xsync.NewCounter(),
+ fallbackTranslatorDoesNotSupportPassthrough: xsync.NewCounter(),
+ fallbackCannotPassthrough: xsync.NewCounter(),
+ totalLatency: xsync.NewCounter(),
+ }, false
+ })
+ return data
+}
diff --git a/internal/adapter/stats/translator_collector_test.go b/internal/adapter/stats/translator_collector_test.go
new file mode 100644
index 00000000..9058abe6
--- /dev/null
+++ b/internal/adapter/stats/translator_collector_test.go
@@ -0,0 +1,797 @@
+package stats
+
+import (
+ "sync"
+ "testing"
+ "time"
+
+ "github.com/thushan/olla/internal/core/constants"
+ "github.com/thushan/olla/internal/core/ports"
+)
+
+// TestTranslatorCollector_RecordPassthrough verifies passthrough request tracking
+func TestTranslatorCollector_RecordPassthrough(t *testing.T) {
+ collector := NewTranslatorCollector()
+
+ // Record multiple passthrough requests
+ for i := 0; i < 5; i++ {
+ event := ports.TranslatorRequestEvent{
+ TranslatorName: "anthropic",
+ Model: "claude-3-5-sonnet-20241022",
+ Mode: constants.TranslatorModePassthrough,
+ FallbackReason: constants.FallbackReasonNone,
+ Success: true,
+ Latency: 100 * time.Millisecond,
+ IsStreaming: false,
+ }
+ collector.Record(event)
+ }
+
+ // Verify stats
+ stats := collector.GetStats()
+ if len(stats) != 1 {
+ t.Fatalf("Expected 1 translator, got %d", len(stats))
+ }
+
+ anthropicStats, exists := stats["anthropic"]
+ if !exists {
+ t.Fatal("Anthropic translator stats not found")
+ }
+
+ if anthropicStats.TotalRequests != 5 {
+ t.Errorf("Expected 5 total requests, got %d", anthropicStats.TotalRequests)
+ }
+ if anthropicStats.SuccessfulRequests != 5 {
+ t.Errorf("Expected 5 successful requests, got %d", anthropicStats.SuccessfulRequests)
+ }
+ if anthropicStats.FailedRequests != 0 {
+ t.Errorf("Expected 0 failed requests, got %d", anthropicStats.FailedRequests)
+ }
+ if anthropicStats.PassthroughRequests != 5 {
+ t.Errorf("Expected 5 passthrough requests, got %d", anthropicStats.PassthroughRequests)
+ }
+ if anthropicStats.TranslationRequests != 0 {
+ t.Errorf("Expected 0 translation requests, got %d", anthropicStats.TranslationRequests)
+ }
+
+ // Verify no fallback reasons recorded for passthrough
+ if anthropicStats.FallbackNoCompatibleEndpoints != 0 {
+ t.Errorf("Expected 0 fallback no compatible endpoints, got %d", anthropicStats.FallbackNoCompatibleEndpoints)
+ }
+ if anthropicStats.FallbackTranslatorDoesNotSupportPassthrough != 0 {
+ t.Errorf("Expected 0 fallback translator does not support passthrough, got %d", anthropicStats.FallbackTranslatorDoesNotSupportPassthrough)
+ }
+ if anthropicStats.FallbackCannotPassthrough != 0 {
+ t.Errorf("Expected 0 fallback cannot passthrough, got %d", anthropicStats.FallbackCannotPassthrough)
+ }
+}
+
+// TestTranslatorCollector_RecordTranslationWithFallback verifies translation mode and fallback reason tracking
+func TestTranslatorCollector_RecordTranslationWithFallback(t *testing.T) {
+ collector := NewTranslatorCollector()
+
+ // Test each fallback reason
+ testCases := []struct {
+ name string
+ fallbackReason constants.TranslatorFallbackReason
+ count int
+ }{
+ {
+ name: "no compatible endpoints",
+ fallbackReason: constants.FallbackReasonNoCompatibleEndpoints,
+ count: 3,
+ },
+ {
+ name: "translator does not support passthrough",
+ fallbackReason: constants.FallbackReasonTranslatorDoesNotSupportPassthrough,
+ count: 2,
+ },
+ {
+ name: "cannot passthrough",
+ fallbackReason: constants.FallbackReasonCannotPassthrough,
+ count: 4,
+ },
+ }
+
+ totalTranslations := 0
+ for _, tc := range testCases {
+ for i := 0; i < tc.count; i++ {
+ event := ports.TranslatorRequestEvent{
+ TranslatorName: "anthropic",
+ Model: "claude-3-5-sonnet-20241022",
+ Mode: constants.TranslatorModeTranslation,
+ FallbackReason: tc.fallbackReason,
+ Success: true,
+ Latency: 150 * time.Millisecond,
+ IsStreaming: false,
+ }
+ collector.Record(event)
+ totalTranslations++
+ }
+ }
+
+ // Verify stats
+ stats := collector.GetStats()
+ anthropicStats := stats["anthropic"]
+
+ if anthropicStats.TotalRequests != int64(totalTranslations) {
+ t.Errorf("Expected %d total requests, got %d", totalTranslations, anthropicStats.TotalRequests)
+ }
+ if anthropicStats.TranslationRequests != int64(totalTranslations) {
+ t.Errorf("Expected %d translation requests, got %d", totalTranslations, anthropicStats.TranslationRequests)
+ }
+ if anthropicStats.PassthroughRequests != 0 {
+ t.Errorf("Expected 0 passthrough requests, got %d", anthropicStats.PassthroughRequests)
+ }
+
+ // Verify fallback reason counters
+ if anthropicStats.FallbackNoCompatibleEndpoints != 3 {
+ t.Errorf("Expected 3 fallback no compatible endpoints, got %d", anthropicStats.FallbackNoCompatibleEndpoints)
+ }
+ if anthropicStats.FallbackTranslatorDoesNotSupportPassthrough != 2 {
+ t.Errorf("Expected 2 fallback translator does not support passthrough, got %d", anthropicStats.FallbackTranslatorDoesNotSupportPassthrough)
+ }
+ if anthropicStats.FallbackCannotPassthrough != 4 {
+ t.Errorf("Expected 4 fallback cannot passthrough, got %d", anthropicStats.FallbackCannotPassthrough)
+ }
+}
+
+// TestTranslatorCollector_ConcurrentAccess verifies thread-safety under concurrent load
+func TestTranslatorCollector_ConcurrentAccess(t *testing.T) {
+ collector := NewTranslatorCollector()
+
+ const numGoroutines = 100
+ const requestsPerGoroutine = 50
+
+ var wg sync.WaitGroup
+
+ // Launch goroutines recording passthrough events
+ for i := 0; i < numGoroutines/2; i++ {
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ for j := 0; j < requestsPerGoroutine; j++ {
+ event := ports.TranslatorRequestEvent{
+ TranslatorName: "anthropic",
+ Model: "claude-3-5-sonnet-20241022",
+ Mode: constants.TranslatorModePassthrough,
+ FallbackReason: constants.FallbackReasonNone,
+ Success: true,
+ Latency: 100 * time.Millisecond,
+ IsStreaming: false,
+ }
+ collector.Record(event)
+ }
+ }()
+ }
+
+ // Launch goroutines recording translation events with fallback
+ for i := 0; i < numGoroutines/2; i++ {
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ for j := 0; j < requestsPerGoroutine; j++ {
+ event := ports.TranslatorRequestEvent{
+ TranslatorName: "anthropic",
+ Model: "claude-3-5-sonnet-20241022",
+ Mode: constants.TranslatorModeTranslation,
+ FallbackReason: constants.FallbackReasonNoCompatibleEndpoints,
+ Success: true,
+ Latency: 150 * time.Millisecond,
+ IsStreaming: true,
+ }
+ collector.Record(event)
+ }
+ }()
+ }
+
+ wg.Wait()
+
+ // Verify final counts
+ stats := collector.GetStats()
+ anthropicStats := stats["anthropic"]
+
+ expectedTotal := int64(numGoroutines * requestsPerGoroutine)
+ if anthropicStats.TotalRequests != expectedTotal {
+ t.Errorf("Expected %d total requests, got %d", expectedTotal, anthropicStats.TotalRequests)
+ }
+
+ expectedPassthrough := int64(numGoroutines/2) * int64(requestsPerGoroutine)
+ if anthropicStats.PassthroughRequests != expectedPassthrough {
+ t.Errorf("Expected %d passthrough requests, got %d", expectedPassthrough, anthropicStats.PassthroughRequests)
+ }
+
+ expectedTranslation := int64(numGoroutines/2) * int64(requestsPerGoroutine)
+ if anthropicStats.TranslationRequests != expectedTranslation {
+ t.Errorf("Expected %d translation requests, got %d", expectedTranslation, anthropicStats.TranslationRequests)
+ }
+
+ if anthropicStats.FallbackNoCompatibleEndpoints != expectedTranslation {
+ t.Errorf("Expected %d fallback no compatible endpoints, got %d", expectedTranslation, anthropicStats.FallbackNoCompatibleEndpoints)
+ }
+
+ // Verify success count
+ if anthropicStats.SuccessfulRequests != expectedTotal {
+ t.Errorf("Expected %d successful requests, got %d", expectedTotal, anthropicStats.SuccessfulRequests)
+ }
+}
+
+// TestTranslatorCollector_PassthroughRate verifies passthrough rate calculation
+func TestTranslatorCollector_PassthroughRate(t *testing.T) {
+ testCases := []struct {
+ name string
+ passthroughCount int
+ translationCount int
+ expectedPassthrough int64
+ expectedTranslation int64
+ }{
+ {
+ name: "all passthrough",
+ passthroughCount: 10,
+ translationCount: 0,
+ expectedPassthrough: 10,
+ expectedTranslation: 0,
+ },
+ {
+ name: "all translation",
+ passthroughCount: 0,
+ translationCount: 10,
+ expectedPassthrough: 0,
+ expectedTranslation: 10,
+ },
+ {
+ name: "50/50 split",
+ passthroughCount: 5,
+ translationCount: 5,
+ expectedPassthrough: 5,
+ expectedTranslation: 5,
+ },
+ {
+ name: "70/30 split",
+ passthroughCount: 7,
+ translationCount: 3,
+ expectedPassthrough: 7,
+ expectedTranslation: 3,
+ },
+ {
+ name: "zero requests",
+ passthroughCount: 0,
+ translationCount: 0,
+ expectedPassthrough: 0,
+ expectedTranslation: 0,
+ },
+ }
+
+ for _, tc := range testCases {
+ t.Run(tc.name, func(t *testing.T) {
+ collector := NewTranslatorCollector()
+
+ // Record passthrough requests
+ for i := 0; i < tc.passthroughCount; i++ {
+ event := ports.TranslatorRequestEvent{
+ TranslatorName: "anthropic",
+ Model: "claude-3-5-sonnet-20241022",
+ Mode: constants.TranslatorModePassthrough,
+ FallbackReason: constants.FallbackReasonNone,
+ Success: true,
+ Latency: 100 * time.Millisecond,
+ IsStreaming: false,
+ }
+ collector.Record(event)
+ }
+
+ // Record translation requests
+ for i := 0; i < tc.translationCount; i++ {
+ event := ports.TranslatorRequestEvent{
+ TranslatorName: "anthropic",
+ Model: "claude-3-5-sonnet-20241022",
+ Mode: constants.TranslatorModeTranslation,
+ FallbackReason: constants.FallbackReasonNoCompatibleEndpoints,
+ Success: true,
+ Latency: 150 * time.Millisecond,
+ IsStreaming: false,
+ }
+ collector.Record(event)
+ }
+
+ // Verify stats
+ stats := collector.GetStats()
+
+ // Handle zero requests case
+ if tc.passthroughCount == 0 && tc.translationCount == 0 {
+ if len(stats) != 0 {
+ t.Errorf("Expected 0 translators for zero requests, got %d", len(stats))
+ }
+ return
+ }
+
+ anthropicStats := stats["anthropic"]
+
+ if anthropicStats.PassthroughRequests != tc.expectedPassthrough {
+ t.Errorf("Expected %d passthrough requests, got %d", tc.expectedPassthrough, anthropicStats.PassthroughRequests)
+ }
+ if anthropicStats.TranslationRequests != tc.expectedTranslation {
+ t.Errorf("Expected %d translation requests, got %d", tc.expectedTranslation, anthropicStats.TranslationRequests)
+ }
+
+ // Calculate and verify passthrough rate
+ total := tc.expectedPassthrough + tc.expectedTranslation
+ expectedRate := float64(0)
+ if total > 0 {
+ expectedRate = float64(tc.expectedPassthrough) / float64(total) * 100
+ }
+
+ actualRate := float64(0)
+ if anthropicStats.TotalRequests > 0 {
+ actualRate = float64(anthropicStats.PassthroughRequests) / float64(anthropicStats.TotalRequests) * 100
+ }
+
+ if actualRate != expectedRate {
+ t.Errorf("Expected passthrough rate %.2f%%, got %.2f%%", expectedRate, actualRate)
+ }
+ })
+ }
+}
+
+// TestTranslatorCollector_MultipleTranslators verifies stats are tracked separately per translator
+func TestTranslatorCollector_MultipleTranslators(t *testing.T) {
+ collector := NewTranslatorCollector()
+
+ translators := []struct {
+ name string
+ passthroughReqs int
+ translationReqs int
+ }{
+ {"anthropic", 10, 5},
+ {"openai", 8, 12},
+ {"ollama", 15, 3},
+ }
+
+ // Record events for each translator
+ for _, translator := range translators {
+ // Passthrough requests
+ for i := 0; i < translator.passthroughReqs; i++ {
+ event := ports.TranslatorRequestEvent{
+ TranslatorName: translator.name,
+ Model: "test-model",
+ Mode: constants.TranslatorModePassthrough,
+ FallbackReason: constants.FallbackReasonNone,
+ Success: true,
+ Latency: 100 * time.Millisecond,
+ IsStreaming: false,
+ }
+ collector.Record(event)
+ }
+
+ // Translation requests
+ for i := 0; i < translator.translationReqs; i++ {
+ event := ports.TranslatorRequestEvent{
+ TranslatorName: translator.name,
+ Model: "test-model",
+ Mode: constants.TranslatorModeTranslation,
+ FallbackReason: constants.FallbackReasonCannotPassthrough,
+ Success: true,
+ Latency: 150 * time.Millisecond,
+ IsStreaming: false,
+ }
+ collector.Record(event)
+ }
+ }
+
+ // Verify GetStats returns all translators
+ stats := collector.GetStats()
+ if len(stats) != len(translators) {
+ t.Fatalf("Expected %d translators, got %d", len(translators), len(stats))
+ }
+
+ // Verify stats are tracked separately
+ for _, translator := range translators {
+ translatorStats, exists := stats[translator.name]
+ if !exists {
+ t.Errorf("Stats not found for translator %s", translator.name)
+ continue
+ }
+
+ expectedTotal := int64(translator.passthroughReqs + translator.translationReqs)
+ if translatorStats.TotalRequests != expectedTotal {
+ t.Errorf("Translator %s: expected %d total requests, got %d", translator.name, expectedTotal, translatorStats.TotalRequests)
+ }
+
+ if translatorStats.PassthroughRequests != int64(translator.passthroughReqs) {
+ t.Errorf("Translator %s: expected %d passthrough requests, got %d", translator.name, translator.passthroughReqs, translatorStats.PassthroughRequests)
+ }
+
+ if translatorStats.TranslationRequests != int64(translator.translationReqs) {
+ t.Errorf("Translator %s: expected %d translation requests, got %d", translator.name, translator.translationReqs, translatorStats.TranslationRequests)
+ }
+
+ // Verify fallback counter for translation requests
+ if translatorStats.FallbackCannotPassthrough != int64(translator.translationReqs) {
+ t.Errorf("Translator %s: expected %d fallback cannot passthrough, got %d", translator.name, translator.translationReqs, translatorStats.FallbackCannotPassthrough)
+ }
+ }
+}
+
+// TestTranslatorCollector_StreamingVsNonStreaming verifies streaming mode tracking
+func TestTranslatorCollector_StreamingVsNonStreaming(t *testing.T) {
+ collector := NewTranslatorCollector()
+
+ // Record streaming requests
+ for i := 0; i < 7; i++ {
+ event := ports.TranslatorRequestEvent{
+ TranslatorName: "anthropic",
+ Model: "claude-3-5-sonnet-20241022",
+ Mode: constants.TranslatorModePassthrough,
+ FallbackReason: constants.FallbackReasonNone,
+ Success: true,
+ Latency: 100 * time.Millisecond,
+ IsStreaming: true,
+ }
+ collector.Record(event)
+ }
+
+ // Record non-streaming requests
+ for i := 0; i < 3; i++ {
+ event := ports.TranslatorRequestEvent{
+ TranslatorName: "anthropic",
+ Model: "claude-3-5-sonnet-20241022",
+ Mode: constants.TranslatorModePassthrough,
+ FallbackReason: constants.FallbackReasonNone,
+ Success: true,
+ Latency: 100 * time.Millisecond,
+ IsStreaming: false,
+ }
+ collector.Record(event)
+ }
+
+ // Verify streaming breakdown
+ stats := collector.GetStats()
+ anthropicStats := stats["anthropic"]
+
+ if anthropicStats.StreamingRequests != 7 {
+ t.Errorf("Expected 7 streaming requests, got %d", anthropicStats.StreamingRequests)
+ }
+ if anthropicStats.NonStreamingRequests != 3 {
+ t.Errorf("Expected 3 non-streaming requests, got %d", anthropicStats.NonStreamingRequests)
+ }
+ if anthropicStats.TotalRequests != 10 {
+ t.Errorf("Expected 10 total requests, got %d", anthropicStats.TotalRequests)
+ }
+}
+
+// TestTranslatorCollector_SuccessVsError verifies success and error tracking
+func TestTranslatorCollector_SuccessVsError(t *testing.T) {
+ collector := NewTranslatorCollector()
+
+ // Record successful requests
+ for i := 0; i < 8; i++ {
+ event := ports.TranslatorRequestEvent{
+ TranslatorName: "anthropic",
+ Model: "claude-3-5-sonnet-20241022",
+ Mode: constants.TranslatorModePassthrough,
+ FallbackReason: constants.FallbackReasonNone,
+ Success: true,
+ Latency: 100 * time.Millisecond,
+ IsStreaming: false,
+ }
+ collector.Record(event)
+ }
+
+ // Record failed requests
+ for i := 0; i < 2; i++ {
+ event := ports.TranslatorRequestEvent{
+ TranslatorName: "anthropic",
+ Model: "claude-3-5-sonnet-20241022",
+ Mode: constants.TranslatorModePassthrough,
+ FallbackReason: constants.FallbackReasonNone,
+ Success: false,
+ Latency: 50 * time.Millisecond,
+ IsStreaming: false,
+ }
+ collector.Record(event)
+ }
+
+ // Verify success/failure breakdown
+ stats := collector.GetStats()
+ anthropicStats := stats["anthropic"]
+
+ if anthropicStats.SuccessfulRequests != 8 {
+ t.Errorf("Expected 8 successful requests, got %d", anthropicStats.SuccessfulRequests)
+ }
+ if anthropicStats.FailedRequests != 2 {
+ t.Errorf("Expected 2 failed requests, got %d", anthropicStats.FailedRequests)
+ }
+ if anthropicStats.TotalRequests != 10 {
+ t.Errorf("Expected 10 total requests, got %d", anthropicStats.TotalRequests)
+ }
+}
+
+// TestTranslatorCollector_LatencyTracking verifies latency calculation
+func TestTranslatorCollector_LatencyTracking(t *testing.T) {
+ collector := NewTranslatorCollector()
+
+ // Record requests with different latencies
+ latencies := []time.Duration{
+ 100 * time.Millisecond,
+ 200 * time.Millisecond,
+ 150 * time.Millisecond,
+ 250 * time.Millisecond,
+ }
+
+ totalLatency := int64(0)
+ for _, latency := range latencies {
+ event := ports.TranslatorRequestEvent{
+ TranslatorName: "anthropic",
+ Model: "claude-3-5-sonnet-20241022",
+ Mode: constants.TranslatorModePassthrough,
+ FallbackReason: constants.FallbackReasonNone,
+ Success: true,
+ Latency: latency,
+ IsStreaming: false,
+ }
+ collector.Record(event)
+ totalLatency += latency.Milliseconds()
+ }
+
+ // Verify latency stats
+ stats := collector.GetStats()
+ anthropicStats := stats["anthropic"]
+
+ if anthropicStats.TotalLatency != totalLatency {
+ t.Errorf("Expected total latency %dms, got %dms", totalLatency, anthropicStats.TotalLatency)
+ }
+
+ expectedAvg := totalLatency / int64(len(latencies))
+ if anthropicStats.AverageLatency != expectedAvg {
+ t.Errorf("Expected average latency %dms, got %dms", expectedAvg, anthropicStats.AverageLatency)
+ }
+}
+
+// TestTranslatorCollector_LatencyWithFailures verifies latency calculation includes only successful requests
+func TestTranslatorCollector_LatencyWithFailures(t *testing.T) {
+ collector := NewTranslatorCollector()
+
+ // Record successful requests
+ for i := 0; i < 3; i++ {
+ event := ports.TranslatorRequestEvent{
+ TranslatorName: "anthropic",
+ Model: "claude-3-5-sonnet-20241022",
+ Mode: constants.TranslatorModePassthrough,
+ FallbackReason: constants.FallbackReasonNone,
+ Success: true,
+ Latency: 100 * time.Millisecond,
+ IsStreaming: false,
+ }
+ collector.Record(event)
+ }
+
+ // Record failed request with higher latency (should NOT be included in total)
+ event := ports.TranslatorRequestEvent{
+ TranslatorName: "anthropic",
+ Model: "claude-3-5-sonnet-20241022",
+ Mode: constants.TranslatorModePassthrough,
+ FallbackReason: constants.FallbackReasonNone,
+ Success: false,
+ Latency: 500 * time.Millisecond,
+ IsStreaming: false,
+ }
+ collector.Record(event)
+
+ // Verify latency calculation
+ stats := collector.GetStats()
+ anthropicStats := stats["anthropic"]
+
+ // Total latency includes only successful requests (3 * 100 = 300)
+ expectedTotal := int64(300)
+ if anthropicStats.TotalLatency != expectedTotal {
+ t.Errorf("Expected total latency %dms, got %dms", expectedTotal, anthropicStats.TotalLatency)
+ }
+
+ // Average latency is calculated using successful requests only (300 / 3 = 100)
+ expectedAvg := expectedTotal / 3
+ if anthropicStats.AverageLatency != expectedAvg {
+ t.Errorf("Expected average latency %dms, got %dms", expectedAvg, anthropicStats.AverageLatency)
+ }
+}
+
+// TestTranslatorCollector_ZeroLatency verifies zero latency handling
+func TestTranslatorCollector_ZeroLatency(t *testing.T) {
+ collector := NewTranslatorCollector()
+
+ // Record request with zero latency
+ event := ports.TranslatorRequestEvent{
+ TranslatorName: "anthropic",
+ Model: "claude-3-5-sonnet-20241022",
+ Mode: constants.TranslatorModePassthrough,
+ FallbackReason: constants.FallbackReasonNone,
+ Success: true,
+ Latency: 0,
+ IsStreaming: false,
+ }
+ collector.Record(event)
+
+ // Verify stats
+ stats := collector.GetStats()
+ anthropicStats := stats["anthropic"]
+
+ if anthropicStats.TotalLatency != 0 {
+ t.Errorf("Expected total latency 0ms, got %dms", anthropicStats.TotalLatency)
+ }
+ if anthropicStats.AverageLatency != 0 {
+ t.Errorf("Expected average latency 0ms, got %dms", anthropicStats.AverageLatency)
+ }
+}
+
+// TestTranslatorCollector_ZeroSuccessfulRequests verifies latency calculation with zero successful requests
+func TestTranslatorCollector_ZeroSuccessfulRequests(t *testing.T) {
+ collector := NewTranslatorCollector()
+
+ // Record only failed requests
+ for i := 0; i < 3; i++ {
+ event := ports.TranslatorRequestEvent{
+ TranslatorName: "anthropic",
+ Model: "claude-3-5-sonnet-20241022",
+ Mode: constants.TranslatorModePassthrough,
+ FallbackReason: constants.FallbackReasonNone,
+ Success: false,
+ Latency: 100 * time.Millisecond,
+ IsStreaming: false,
+ }
+ collector.Record(event)
+ }
+
+ // Verify average latency is 0 when there are no successful requests
+ stats := collector.GetStats()
+ anthropicStats := stats["anthropic"]
+
+ if anthropicStats.SuccessfulRequests != 0 {
+ t.Errorf("Expected 0 successful requests, got %d", anthropicStats.SuccessfulRequests)
+ }
+ if anthropicStats.AverageLatency != 0 {
+ t.Errorf("Expected average latency 0ms (no successful requests), got %dms", anthropicStats.AverageLatency)
+ }
+}
+
+// TestTranslatorCollector_AllFallbackReasons verifies tracking of all fallback reasons
+func TestTranslatorCollector_AllFallbackReasons(t *testing.T) {
+ collector := NewTranslatorCollector()
+
+ // Record translation with each fallback reason
+ fallbackReasons := []constants.TranslatorFallbackReason{
+ constants.FallbackReasonNoCompatibleEndpoints,
+ constants.FallbackReasonTranslatorDoesNotSupportPassthrough,
+ constants.FallbackReasonCannotPassthrough,
+ }
+
+ for _, reason := range fallbackReasons {
+ event := ports.TranslatorRequestEvent{
+ TranslatorName: "anthropic",
+ Model: "claude-3-5-sonnet-20241022",
+ Mode: constants.TranslatorModeTranslation,
+ FallbackReason: reason,
+ Success: true,
+ Latency: 100 * time.Millisecond,
+ IsStreaming: false,
+ }
+ collector.Record(event)
+ }
+
+ // Verify each fallback reason is counted
+ stats := collector.GetStats()
+ anthropicStats := stats["anthropic"]
+
+ if anthropicStats.FallbackNoCompatibleEndpoints != 1 {
+ t.Errorf("Expected 1 fallback no compatible endpoints, got %d", anthropicStats.FallbackNoCompatibleEndpoints)
+ }
+ if anthropicStats.FallbackTranslatorDoesNotSupportPassthrough != 1 {
+ t.Errorf("Expected 1 fallback translator does not support passthrough, got %d", anthropicStats.FallbackTranslatorDoesNotSupportPassthrough)
+ }
+ if anthropicStats.FallbackCannotPassthrough != 1 {
+ t.Errorf("Expected 1 fallback cannot passthrough, got %d", anthropicStats.FallbackCannotPassthrough)
+ }
+}
+
+// TestTranslatorCollector_FallbackReasonOnlyForTranslation verifies fallback reasons are only recorded for translation mode
+func TestTranslatorCollector_FallbackReasonOnlyForTranslation(t *testing.T) {
+ collector := NewTranslatorCollector()
+
+ // Record passthrough request with fallback reason (should be ignored)
+ event := ports.TranslatorRequestEvent{
+ TranslatorName: "anthropic",
+ Model: "claude-3-5-sonnet-20241022",
+ Mode: constants.TranslatorModePassthrough,
+ FallbackReason: constants.FallbackReasonNoCompatibleEndpoints, // Should be ignored
+ Success: true,
+ Latency: 100 * time.Millisecond,
+ IsStreaming: false,
+ }
+ collector.Record(event)
+
+ // Verify no fallback reason recorded for passthrough
+ stats := collector.GetStats()
+ anthropicStats := stats["anthropic"]
+
+ if anthropicStats.PassthroughRequests != 1 {
+ t.Errorf("Expected 1 passthrough request, got %d", anthropicStats.PassthroughRequests)
+ }
+ if anthropicStats.FallbackNoCompatibleEndpoints != 0 {
+ t.Errorf("Expected 0 fallback (passthrough mode), got %d", anthropicStats.FallbackNoCompatibleEndpoints)
+ }
+}
+
+// TestTranslatorCollector_EmptyStats verifies empty state
+func TestTranslatorCollector_EmptyStats(t *testing.T) {
+ collector := NewTranslatorCollector()
+
+ // Verify empty stats
+ stats := collector.GetStats()
+ if len(stats) != 0 {
+ t.Errorf("Expected 0 translators, got %d", len(stats))
+ }
+}
+
+// TestTranslatorCollector_MixedModesConcurrent verifies thread-safety with mixed modes
+func TestTranslatorCollector_MixedModesConcurrent(t *testing.T) {
+ collector := NewTranslatorCollector()
+
+ const numGoroutines = 100
+ const requestsPerGoroutine = 20
+
+ var wg sync.WaitGroup
+
+ // Launch goroutines with mixed modes, streaming states, and success states
+ for i := 0; i < numGoroutines; i++ {
+ wg.Add(1)
+ go func(id int) {
+ defer wg.Done()
+ for j := 0; j < requestsPerGoroutine; j++ {
+ mode := constants.TranslatorModePassthrough
+ fallback := constants.FallbackReasonNone
+ if (id+j)%2 == 0 {
+ mode = constants.TranslatorModeTranslation
+ fallback = constants.FallbackReasonNoCompatibleEndpoints
+ }
+
+ event := ports.TranslatorRequestEvent{
+ TranslatorName: "anthropic",
+ Model: "claude-3-5-sonnet-20241022",
+ Mode: mode,
+ FallbackReason: fallback,
+ Success: (id+j)%3 != 0, // roughly 2/3 success rate
+ Latency: time.Duration(100+id+j) * time.Millisecond,
+ IsStreaming: (id+j)%2 == 0,
+ }
+ collector.Record(event)
+ }
+ }(i)
+ }
+
+ wg.Wait()
+
+ // Verify final stats
+ stats := collector.GetStats()
+ anthropicStats := stats["anthropic"]
+
+ expectedTotal := int64(numGoroutines * requestsPerGoroutine)
+ if anthropicStats.TotalRequests != expectedTotal {
+ t.Errorf("Expected %d total requests, got %d", expectedTotal, anthropicStats.TotalRequests)
+ }
+
+ // Verify total equals sum of successful and failed
+ if anthropicStats.TotalRequests != anthropicStats.SuccessfulRequests+anthropicStats.FailedRequests {
+ t.Errorf("Total requests (%d) should equal successful (%d) + failed (%d)",
+ anthropicStats.TotalRequests, anthropicStats.SuccessfulRequests, anthropicStats.FailedRequests)
+ }
+
+ // Verify total equals sum of passthrough and translation
+ if anthropicStats.TotalRequests != anthropicStats.PassthroughRequests+anthropicStats.TranslationRequests {
+ t.Errorf("Total requests (%d) should equal passthrough (%d) + translation (%d)",
+ anthropicStats.TotalRequests, anthropicStats.PassthroughRequests, anthropicStats.TranslationRequests)
+ }
+
+ // Verify total equals sum of streaming and non-streaming
+ if anthropicStats.TotalRequests != anthropicStats.StreamingRequests+anthropicStats.NonStreamingRequests {
+ t.Errorf("Total requests (%d) should equal streaming (%d) + non-streaming (%d)",
+ anthropicStats.TotalRequests, anthropicStats.StreamingRequests, anthropicStats.NonStreamingRequests)
+ }
+}
diff --git a/internal/adapter/translator/anthropic/passthrough_test.go b/internal/adapter/translator/anthropic/passthrough_test.go
new file mode 100644
index 00000000..ad9654e7
--- /dev/null
+++ b/internal/adapter/translator/anthropic/passthrough_test.go
@@ -0,0 +1,634 @@
+package anthropic
+
+import (
+ "encoding/json"
+ "net/http"
+ "testing"
+
+ "github.com/stretchr/testify/assert"
+ "github.com/stretchr/testify/require"
+ "github.com/thushan/olla/internal/adapter/translator"
+ "github.com/thushan/olla/internal/config"
+ "github.com/thushan/olla/internal/core/domain"
+)
+
+// mockProfileLookup is a test double that implements translator.ProfileLookup
+// Allows configuring AnthropicSupportConfig per endpoint type for testing
+type mockProfileLookup struct {
+ configs map[string]*domain.AnthropicSupportConfig
+}
+
+// GetAnthropicSupport returns the configured AnthropicSupportConfig for the given endpoint type
+func (m *mockProfileLookup) GetAnthropicSupport(endpointType string) *domain.AnthropicSupportConfig {
+ if m.configs == nil {
+ return nil
+ }
+ return m.configs[endpointType]
+}
+
+// newMockProfileLookup creates a new mock profile lookup with empty configuration
+func newMockProfileLookup() *mockProfileLookup {
+ return &mockProfileLookup{
+ configs: make(map[string]*domain.AnthropicSupportConfig),
+ }
+}
+
+// withSupport adds AnthropicSupportConfig for a specific endpoint type
+func (m *mockProfileLookup) withSupport(endpointType string, cfg *domain.AnthropicSupportConfig) *mockProfileLookup {
+ m.configs[endpointType] = cfg
+ return m
+}
+
+// TestCanPassthrough tests the CanPassthrough method with various endpoint configurations
+func TestCanPassthrough(t *testing.T) {
+ tests := []struct {
+ name string
+ passthroughEnabled bool
+ endpoints []*domain.Endpoint
+ profileLookup translator.ProfileLookup
+ want bool
+ description string
+ }{
+ {
+ name: "all_endpoints_support_anthropic",
+ passthroughEnabled: true,
+ endpoints: []*domain.Endpoint{
+ {Name: "vllm-1", Type: "vllm"},
+ {Name: "vllm-2", Type: "vllm"},
+ },
+ profileLookup: newMockProfileLookup().withSupport("vllm", &domain.AnthropicSupportConfig{
+ Enabled: true,
+ MessagesPath: "/v1/messages",
+ }),
+ want: true,
+ description: "should return true when all endpoints support Anthropic passthrough",
+ },
+ {
+ name: "mixed_endpoints_some_support_some_dont",
+ passthroughEnabled: true,
+ endpoints: []*domain.Endpoint{
+ {Name: "vllm-1", Type: "vllm"},
+ {Name: "ollama-1", Type: "ollama"},
+ },
+ profileLookup: newMockProfileLookup().
+ withSupport("vllm", &domain.AnthropicSupportConfig{
+ Enabled: true,
+ MessagesPath: "/v1/messages",
+ }).
+ withSupport("ollama", nil), // ollama has no Anthropic support
+ want: false,
+ description: "should return false when some endpoints don't support Anthropic",
+ },
+ {
+ name: "no_endpoints_support_anthropic",
+ passthroughEnabled: true,
+ endpoints: []*domain.Endpoint{
+ {Name: "ollama-1", Type: "ollama"},
+ {Name: "ollama-2", Type: "ollama"},
+ },
+ profileLookup: newMockProfileLookup().withSupport("ollama", nil),
+ want: false,
+ description: "should return false when no endpoints support Anthropic",
+ },
+ {
+ name: "passthrough_disabled",
+ passthroughEnabled: false,
+ endpoints: []*domain.Endpoint{
+ {Name: "vllm-1", Type: "vllm"},
+ },
+ profileLookup: newMockProfileLookup().withSupport("vllm", &domain.AnthropicSupportConfig{
+ Enabled: true,
+ MessagesPath: "/v1/messages",
+ }),
+ want: false,
+ description: "should return false when passthrough is disabled even if endpoints support it",
+ },
+ {
+ name: "empty_endpoints_list",
+ passthroughEnabled: true,
+ endpoints: []*domain.Endpoint{},
+ profileLookup: newMockProfileLookup(),
+ want: false,
+ description: "should return false when endpoints list is empty",
+ },
+ {
+ name: "nil_anthropic_support_config",
+ passthroughEnabled: true,
+ endpoints: []*domain.Endpoint{
+ {Name: "custom-1", Type: "custom"},
+ },
+ profileLookup: newMockProfileLookup(), // no config for "custom"
+ want: false,
+ description: "should return false when AnthropicSupportConfig is nil",
+ },
+ {
+ name: "anthropic_support_disabled",
+ passthroughEnabled: true,
+ endpoints: []*domain.Endpoint{
+ {Name: "vllm-1", Type: "vllm"},
+ },
+ profileLookup: newMockProfileLookup().withSupport("vllm", &domain.AnthropicSupportConfig{
+ Enabled: false, // explicitly disabled
+ MessagesPath: "/v1/messages",
+ }),
+ want: false,
+ description: "should return false when AnthropicSupport.Enabled is false",
+ },
+ {
+ name: "multiple_endpoints_all_support",
+ passthroughEnabled: true,
+ endpoints: []*domain.Endpoint{
+ {Name: "vllm-1", Type: "vllm"},
+ {Name: "vllm-2", Type: "vllm"},
+ {Name: "sglang-1", Type: "sglang"},
+ {Name: "litellm-1", Type: "litellm"},
+ },
+ profileLookup: newMockProfileLookup().
+ withSupport("vllm", &domain.AnthropicSupportConfig{
+ Enabled: true,
+ MessagesPath: "/v1/messages",
+ }).
+ withSupport("sglang", &domain.AnthropicSupportConfig{
+ Enabled: true,
+ MessagesPath: "/v1/messages",
+ }).
+ withSupport("litellm", &domain.AnthropicSupportConfig{
+ Enabled: true,
+ MessagesPath: "/v1/messages",
+ }),
+ want: true,
+ description: "should return true when multiple different endpoint types all support Anthropic",
+ },
+ {
+ name: "single_endpoint_without_support",
+ passthroughEnabled: true,
+ endpoints: []*domain.Endpoint{
+ {Name: "vllm-1", Type: "vllm"},
+ {Name: "vllm-2", Type: "vllm"},
+ {Name: "vllm-3", Type: "vllm"},
+ {Name: "unsupported-1", Type: "unsupported"},
+ },
+ profileLookup: newMockProfileLookup().
+ withSupport("vllm", &domain.AnthropicSupportConfig{
+ Enabled: true,
+ MessagesPath: "/v1/messages",
+ }),
+ // "unsupported" type has no config (returns nil)
+ want: false,
+ description: "should return false if even one endpoint doesn't support passthrough",
+ },
+ }
+
+ for _, tt := range tests {
+ t.Run(tt.name, func(t *testing.T) {
+ cfg := config.AnthropicTranslatorConfig{
+ Enabled: true,
+ MaxMessageSize: 10 << 20, // 10MB
+ PassthroughEnabled: tt.passthroughEnabled,
+ }
+
+ translator := NewTranslator(createTestLogger(), cfg)
+ result := translator.CanPassthrough(tt.endpoints, tt.profileLookup)
+
+ assert.Equal(t, tt.want, result, tt.description)
+ })
+ }
+}
+
+// TestPreparePassthrough tests the PreparePassthrough method with various request scenarios
+func TestPreparePassthrough(t *testing.T) {
+ tests := []struct {
+ name string
+ requestBody string
+ maxMsgSize int64
+ wantErr bool
+ errContains string
+ validateFunc func(t *testing.T, result *translator.PassthroughRequest)
+ description string
+ }{
+ {
+ name: "valid_anthropic_request",
+ requestBody: `{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 1024,
+ "messages": [
+ {"role": "user", "content": "Hello, world!"}
+ ]
+ }`,
+ maxMsgSize: 10 << 20,
+ wantErr: false,
+ validateFunc: func(t *testing.T, result *translator.PassthroughRequest) {
+ assert.NotNil(t, result)
+ assert.Equal(t, "/v1/messages", result.TargetPath)
+ assert.Equal(t, "claude-3-5-sonnet-20241022", result.ModelName)
+ assert.False(t, result.IsStreaming)
+ assert.NotEmpty(t, result.Body)
+
+ // Verify body is preserved unchanged
+ var req AnthropicRequest
+ err := json.Unmarshal(result.Body, &req)
+ require.NoError(t, err)
+ assert.Equal(t, "claude-3-5-sonnet-20241022", req.Model)
+ assert.Equal(t, 1024, req.MaxTokens)
+ assert.Len(t, req.Messages, 1)
+ },
+ description: "should successfully prepare valid Anthropic request",
+ },
+ {
+ name: "invalid_json",
+ requestBody: `{invalid json`,
+ maxMsgSize: 10 << 20,
+ wantErr: true,
+ errContains: "invalid Anthropic request",
+ description: "should return error for invalid JSON",
+ },
+ {
+ name: "missing_model_field",
+ requestBody: `{
+ "max_tokens": 1024,
+ "messages": [
+ {"role": "user", "content": "Hello"}
+ ]
+ }`,
+ maxMsgSize: 10 << 20,
+ wantErr: true,
+ errContains: "model field is required",
+ description: "should return error when model field is missing",
+ },
+ {
+ name: "empty_messages_array",
+ requestBody: `{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 1024,
+ "messages": []
+ }`,
+ maxMsgSize: 10 << 20,
+ wantErr: true,
+ errContains: "at least one message is required",
+ description: "should return error when messages array is empty",
+ },
+ {
+ name: "request_exceeds_max_message_size",
+ requestBody: `{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 1024,
+ "messages": [
+ {"role": "user", "content": "This is a test message"}
+ ]
+ }`,
+ maxMsgSize: 50, // Very small limit
+ wantErr: true,
+ errContains: "request body exceeds maximum size",
+ description: "should return error when request exceeds max_message_size",
+ },
+ {
+ name: "empty_body",
+ requestBody: ``,
+ maxMsgSize: 10 << 20,
+ wantErr: true,
+ errContains: "invalid Anthropic request",
+ description: "should return error for empty body",
+ },
+ {
+ name: "streaming_enabled",
+ requestBody: `{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 1024,
+ "stream": true,
+ "messages": [
+ {"role": "user", "content": "Stream this response"}
+ ]
+ }`,
+ maxMsgSize: 10 << 20,
+ wantErr: false,
+ validateFunc: func(t *testing.T, result *translator.PassthroughRequest) {
+ assert.True(t, result.IsStreaming)
+ assert.Equal(t, "claude-3-5-sonnet-20241022", result.ModelName)
+ },
+ description: "should correctly extract streaming flag when true",
+ },
+ {
+ name: "streaming_disabled",
+ requestBody: `{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 1024,
+ "stream": false,
+ "messages": [
+ {"role": "user", "content": "Don't stream this"}
+ ]
+ }`,
+ maxMsgSize: 10 << 20,
+ wantErr: false,
+ validateFunc: func(t *testing.T, result *translator.PassthroughRequest) {
+ assert.False(t, result.IsStreaming)
+ },
+ description: "should correctly extract streaming flag when false",
+ },
+ {
+ name: "streaming_not_specified",
+ requestBody: `{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 1024,
+ "messages": [
+ {"role": "user", "content": "No stream field"}
+ ]
+ }`,
+ maxMsgSize: 10 << 20,
+ wantErr: false,
+ validateFunc: func(t *testing.T, result *translator.PassthroughRequest) {
+ assert.False(t, result.IsStreaming) // defaults to false
+ },
+ description: "should default streaming to false when not specified",
+ },
+ {
+ name: "target_path_is_v1_messages",
+ requestBody: `{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 1024,
+ "messages": [
+ {"role": "user", "content": "Test"}
+ ]
+ }`,
+ maxMsgSize: 10 << 20,
+ wantErr: false,
+ validateFunc: func(t *testing.T, result *translator.PassthroughRequest) {
+ assert.Equal(t, "/v1/messages", result.TargetPath)
+ },
+ description: "should set target path to /v1/messages",
+ },
+ {
+ name: "body_preserved_unchanged",
+ requestBody: `{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 2048,
+ "temperature": 0.7,
+ "system": "You are a helpful assistant",
+ "messages": [
+ {"role": "user", "content": "Complex request"}
+ ]
+ }`,
+ maxMsgSize: 10 << 20,
+ wantErr: false,
+ validateFunc: func(t *testing.T, result *translator.PassthroughRequest) {
+ var req AnthropicRequest
+ err := json.Unmarshal(result.Body, &req)
+ require.NoError(t, err)
+
+ // Verify all fields preserved
+ assert.Equal(t, "claude-3-5-sonnet-20241022", req.Model)
+ assert.Equal(t, 2048, req.MaxTokens)
+ require.NotNil(t, req.Temperature) // guard the dereference below
+ assert.Equal(t, 0.7, *req.Temperature)
+ assert.NotNil(t, req.System)
+ },
+ description: "should preserve body unchanged with all fields intact",
+ },
+ {
+ name: "model_name_extraction",
+ requestBody: `{
+ "model": "claude-opus-4-20250514",
+ "max_tokens": 1024,
+ "messages": [
+ {"role": "user", "content": "Test model extraction"}
+ ]
+ }`,
+ maxMsgSize: 10 << 20,
+ wantErr: false,
+ validateFunc: func(t *testing.T, result *translator.PassthroughRequest) {
+ assert.Equal(t, "claude-opus-4-20250514", result.ModelName)
+ },
+ description: "should correctly extract model name from request",
+ },
+ {
+ name: "invalid_max_tokens_zero",
+ requestBody: `{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 0,
+ "messages": [
+ {"role": "user", "content": "Test"}
+ ]
+ }`,
+ maxMsgSize: 10 << 20,
+ wantErr: true,
+ errContains: "max_tokens must be at least 1",
+ description: "should return error for invalid max_tokens value",
+ },
+ {
+ name: "invalid_temperature_too_high",
+ requestBody: `{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 1024,
+ "temperature": 3.0,
+ "messages": [
+ {"role": "user", "content": "Test"}
+ ]
+ }`,
+ maxMsgSize: 10 << 20,
+ wantErr: true,
+ errContains: "temperature must be between 0 and 2",
+ description: "should return error for invalid temperature value",
+ },
+ {
+ name: "complex_request_with_tools",
+ requestBody: `{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 1024,
+ "messages": [
+ {"role": "user", "content": "What's the weather?"}
+ ],
+ "tools": [
+ {
+ "name": "get_weather",
+ "description": "Get weather information",
+ "input_schema": {
+ "type": "object",
+ "properties": {
+ "location": {"type": "string"}
+ }
+ }
+ }
+ ]
+ }`,
+ maxMsgSize: 10 << 20,
+ wantErr: false,
+ validateFunc: func(t *testing.T, result *translator.PassthroughRequest) {
+ var req AnthropicRequest
+ err := json.Unmarshal(result.Body, &req)
+ require.NoError(t, err)
+ require.Len(t, req.Tools, 1) // guard the index below
+ assert.Equal(t, "get_weather", req.Tools[0].Name)
+ },
+ description: "should handle complex request with tools",
+ },
+ }
+
+ for _, tt := range tests {
+ t.Run(tt.name, func(t *testing.T) {
+ cfg := config.AnthropicTranslatorConfig{
+ Enabled: true,
+ MaxMessageSize: tt.maxMsgSize,
+ PassthroughEnabled: true,
+ }
+
+ translator := NewTranslator(createTestLogger(), cfg)
+
+ // Pre-buffer the body bytes (as the handler does in production)
+ bodyBytes := []byte(tt.requestBody)
+
+ // Create HTTP request -- body is no longer read by PreparePassthrough
+ // but the request is still passed for header access (inspector, session ID)
+ req, err := http.NewRequest(http.MethodPost, "/olla/anthropic/v1/messages", http.NoBody)
+ require.NoError(t, err)
+
+ // Create a minimal mock ProfileLookup (unused by PreparePassthrough but required by the interface)
+ mockLookup := newMockProfileLookup()
+
+ // Execute PreparePassthrough with pre-buffered body
+ result, err := translator.PreparePassthrough(bodyBytes, req, mockLookup)
+
+ if tt.wantErr {
+ assert.Error(t, err, tt.description)
+ if tt.errContains != "" {
+ assert.Contains(t, err.Error(), tt.errContains, tt.description)
+ }
+ assert.Nil(t, result)
+ } else {
+ require.NoError(t, err, tt.description)
+ require.NotNil(t, result) // validateFunc dereferences result
+ if tt.validateFunc != nil {
+ tt.validateFunc(t, result)
+ }
+ }
+ })
+ }
+}
+
+// TestPreparePassthrough_OversizedBody tests that PreparePassthrough rejects bodies exceeding the configured limit.
+// With bodyBytes passed directly, the translator enforces its own size cap rather than relying on LimitReader.
+func TestPreparePassthrough_OversizedBody(t *testing.T) {
+ const maxSize = 100
+
+ cfg := config.AnthropicTranslatorConfig{
+ Enabled: true,
+ MaxMessageSize: maxSize,
+ PassthroughEnabled: true,
+ }
+
+ translator := NewTranslator(createTestLogger(), cfg)
+
+ // Build a body that exceeds maxSize
+ oversizedBody := make([]byte, maxSize+1)
+ for i := range oversizedBody {
+ oversizedBody[i] = 'x'
+ }
+
+ req, err := http.NewRequest(http.MethodPost, "/olla/anthropic/v1/messages", http.NoBody)
+ require.NoError(t, err)
+
+ mockLookup := newMockProfileLookup()
+
+ result, err := translator.PreparePassthrough(oversizedBody, req, mockLookup)
+
+ assert.Error(t, err)
+ assert.Contains(t, err.Error(), "request body exceeds maximum size")
+ assert.Nil(t, result)
+}
+
+// TestPreparePassthrough_WithInspector verifies that PreparePassthrough succeeds with inspector logging enabled
+func TestPreparePassthrough_WithInspector(t *testing.T) {
+ // Create a temporary directory for inspector output
+ tempDir := t.TempDir()
+
+ cfg := config.AnthropicTranslatorConfig{
+ Enabled: true,
+ MaxMessageSize: 10 << 20,
+ PassthroughEnabled: true,
+ Inspector: config.InspectorConfig{
+ Enabled: true,
+ OutputDir: tempDir,
+ SessionHeader: "X-Session-ID",
+ },
+ }
+
+ translator := NewTranslator(createTestLogger(), cfg)
+
+ bodyBytes := []byte(`{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 1024,
+ "messages": [
+ {"role": "user", "content": "Test with inspector"}
+ ]
+ }`)
+
+ req, err := http.NewRequest(http.MethodPost, "/olla/anthropic/v1/messages", http.NoBody)
+ require.NoError(t, err)
+ req.Header.Set("X-Session-ID", "test-session-123")
+
+ mockLookup := newMockProfileLookup()
+
+ result, err := translator.PreparePassthrough(bodyBytes, req, mockLookup)
+
+ assert.NoError(t, err)
+ assert.NotNil(t, result)
+ // Note: We don't verify file creation here as that's an implementation detail
+ // The important thing is that PreparePassthrough succeeds with inspector enabled
+}
+
+// TestCanPassthrough_Integration tests the integration between CanPassthrough and realistic endpoint scenarios
+func TestCanPassthrough_Integration(t *testing.T) {
+ t.Run("realistic_vllm_sglang_deployment", func(t *testing.T) {
+ // Simulate a realistic deployment with vLLM and SGLang backends
+ endpoints := []*domain.Endpoint{
+ {Name: "vllm-gpu-1", Type: "vllm"},
+ {Name: "vllm-gpu-2", Type: "vllm"},
+ {Name: "sglang-gpu-1", Type: "sglang"},
+ }
+
+ profileLookup := newMockProfileLookup().
+ withSupport("vllm", &domain.AnthropicSupportConfig{
+ Enabled: true,
+ MessagesPath: "/v1/messages",
+ }).
+ withSupport("sglang", &domain.AnthropicSupportConfig{
+ Enabled: true,
+ MessagesPath: "/v1/messages",
+ })
+
+ cfg := config.AnthropicTranslatorConfig{
+ Enabled: true,
+ MaxMessageSize: 10 << 20,
+ PassthroughEnabled: true,
+ }
+
+ translator := NewTranslator(createTestLogger(), cfg)
+ result := translator.CanPassthrough(endpoints, profileLookup)
+
+ assert.True(t, result, "should support passthrough for vLLM+SGLang deployment")
+ })
+
+ t.Run("mixed_deployment_with_ollama", func(t *testing.T) {
+ // Simulate a mixed deployment where not all backends support Anthropic
+ endpoints := []*domain.Endpoint{
+ {Name: "vllm-gpu-1", Type: "vllm"},
+ {Name: "ollama-cpu-1", Type: "ollama"}, // Ollama doesn't support Anthropic
+ }
+
+ profileLookup := newMockProfileLookup().
+ withSupport("vllm", &domain.AnthropicSupportConfig{
+ Enabled: true,
+ MessagesPath: "/v1/messages",
+ })
+ // Ollama has no Anthropic support (returns nil)
+
+ cfg := config.AnthropicTranslatorConfig{
+ Enabled: true,
+ MaxMessageSize: 10 << 20,
+ PassthroughEnabled: true,
+ }
+
+ translator := NewTranslator(createTestLogger(), cfg)
+ result := translator.CanPassthrough(endpoints, profileLookup)
+
+ assert.False(t, result, "should not support passthrough with mixed backend capabilities")
+ })
+}
diff --git a/internal/adapter/translator/anthropic/translator.go b/internal/adapter/translator/anthropic/translator.go
index 4d40e013..d963258c 100644
--- a/internal/adapter/translator/anthropic/translator.go
+++ b/internal/adapter/translator/anthropic/translator.go
@@ -3,11 +3,14 @@ package anthropic
import (
"bytes"
"encoding/json"
+ "fmt"
"net/http"
"github.com/thushan/olla/internal/adapter/inspector"
+ "github.com/thushan/olla/internal/adapter/translator"
"github.com/thushan/olla/internal/config"
"github.com/thushan/olla/internal/core/constants"
+ "github.com/thushan/olla/internal/core/domain"
"github.com/thushan/olla/internal/logger"
"github.com/thushan/olla/pkg/pool"
)
@@ -72,6 +75,13 @@ func (t *Translator) GetAPIPath() string {
return "/olla/anthropic/v1/messages"
}
+// MaxBodySize implements BodySizeLimiter so the handler can apply the
+// translator's configured limit when reading the request body, rather
+// than hardcoding a value.
+func (t *Translator) MaxBodySize() int64 {
+ return t.maxMessageSize
+}
+
// getSessionID extracts the session ID from the request using a fallback chain.
// this ensures we always have a valid session id for request tracking:
// 1. try the configured session header (for custom session management)
@@ -140,3 +150,80 @@ func (t *Translator) WriteError(w http.ResponseWriter, err error, statusCode int
t.logger.Error("Failed to write error response", "error", encErr)
}
}
+
+// CanPassthrough implements the PassthroughCapable interface. It reports whether
+// the request can be forwarded to backends without translation: true only when
+// passthrough is enabled and every endpoint declares native Anthropic support.
+func (t *Translator) CanPassthrough(endpoints []*domain.Endpoint, profileLookup translator.ProfileLookup) bool {
+ // Fast path: if passthrough is disabled, no need to check endpoints
+ if !t.config.PassthroughEnabled {
+ return false
+ }
+
+ // If we have no endpoints, cannot passthrough
+ if len(endpoints) == 0 {
+ return false
+ }
+
+ // Every endpoint must support native Anthropic passthrough. The request may be
+ // routed to any of them, so a single unsupported endpoint forces the
+ // translation path for the whole request.
+ for _, ep := range endpoints {
+ support := profileLookup.GetAnthropicSupport(ep.Type)
+
+ // If support is nil or explicitly disabled, cannot passthrough
+ if support == nil || !support.Enabled {
+ t.logger.Debug("Endpoint does not support Anthropic passthrough",
+ "endpoint", ep.Name,
+ "type", ep.Type)
+ return false
+ }
+ }
+
+ t.logger.Debug("All endpoints support Anthropic passthrough", "count", len(endpoints))
+ return true
+}
+
+// PreparePassthrough implements the PassthroughCapable interface.
+// It validates the already-buffered request body for direct forwarding and
+// returns the original body bytes, target path, model name, and streaming flag.
+// The ProfileLookup argument is unused for now; it is reserved for per-endpoint paths.
+func (t *Translator) PreparePassthrough(bodyBytes []byte, r *http.Request, _ translator.ProfileLookup) (*translator.PassthroughRequest, error) {
+ // Enforce the translator's size limit on the pre-buffered body.
+ // The handler applies its own LimitReader when reading, but we guard
+ // here as well so the translator's configured limit is authoritative.
+ if int64(len(bodyBytes)) > t.maxMessageSize {
+ return nil, fmt.Errorf("request body exceeds maximum size (%d bytes)", t.maxMessageSize)
+ }
+
+ // Validate the request structure
+ var anthropicReq AnthropicRequest
+ if err := json.Unmarshal(bodyBytes, &anthropicReq); err != nil {
+ return nil, fmt.Errorf("invalid Anthropic request: %w", err)
+ }
+
+ // Validate required fields and constraints
+ if err := anthropicReq.Validate(); err != nil {
+ return nil, fmt.Errorf("request validation failed: %w", err)
+ }
+
+ // Log request to inspector if enabled
+ if t.inspector.Enabled() {
+ sessionID := t.getSessionID(r)
+ if lerr := t.inspector.LogRequest(sessionID, anthropicReq.Model, bodyBytes); lerr != nil {
+ t.logger.Warn("Failed to log request to inspector", "error", lerr)
+ }
+ }
+
+ t.logger.Debug("Prepared request for passthrough",
+ "model", anthropicReq.Model,
+ "streaming", anthropicReq.Stream,
+ "body_size", len(bodyBytes))
+
+ return &translator.PassthroughRequest{
+ Body: bodyBytes,
+ TargetPath: "/v1/messages",
+ ModelName: anthropicReq.Model,
+ IsStreaming: anthropicReq.Stream,
+ }, nil
+}
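CanPassthrough is all-or-nothing: one endpoint without enabled Anthropic support sends the whole request down the translation path, because the request may be routed to any candidate. A stdlib-only sketch of that rule, using hypothetical stand-in types (`endpoint`, `supportConfig`, `mapLookup`) rather than the olla domain types, so it runs in isolation:

```go
package main

import "fmt"

// endpoint is a hypothetical stand-in for domain.Endpoint.
type endpoint struct {
	Name string
	Type string
}

// supportConfig is a hypothetical stand-in for domain.AnthropicSupportConfig.
type supportConfig struct {
	Enabled      bool
	MessagesPath string
}

// mapLookup is a hypothetical ProfileLookup backed by a map keyed on endpoint type.
type mapLookup map[string]*supportConfig

// GetAnthropicSupport returns nil for unknown types, mirroring a profile
// that declares no Anthropic support.
func (m mapLookup) GetAnthropicSupport(endpointType string) *supportConfig {
	return m[endpointType]
}

// canPassthrough applies the same rule as Translator.CanPassthrough: passthrough
// enabled, at least one endpoint, and every endpoint's profile enabled.
func canPassthrough(enabled bool, endpoints []endpoint, lookup mapLookup) bool {
	if !enabled || len(endpoints) == 0 {
		return false
	}
	for _, ep := range endpoints {
		support := lookup.GetAnthropicSupport(ep.Type)
		if support == nil || !support.Enabled {
			return false
		}
	}
	return true
}

func main() {
	lookup := mapLookup{
		"vllm":   {Enabled: true, MessagesPath: "/v1/messages"},
		"sglang": {Enabled: true, MessagesPath: "/v1/messages"},
	}
	homogeneous := []endpoint{{"vllm-gpu-1", "vllm"}, {"sglang-gpu-1", "sglang"}}
	mixed := []endpoint{{"vllm-gpu-1", "vllm"}, {"ollama-cpu-1", "ollama"}}

	fmt.Println(canPassthrough(true, homogeneous, lookup))  // true
	fmt.Println(canPassthrough(true, mixed, lookup))        // false: ollama has no entry
	fmt.Println(canPassthrough(false, homogeneous, lookup)) // false: passthrough disabled
}
```

The early return on the first unsupported endpoint matches the original loop, so cost is linear in the number of candidate endpoints.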
diff --git a/internal/adapter/translator/extract.go b/internal/adapter/translator/extract.go
new file mode 100644
index 00000000..1c202bd5
--- /dev/null
+++ b/internal/adapter/translator/extract.go
@@ -0,0 +1,41 @@
+package translator
+
+import (
+ "fmt"
+
+ "github.com/tidwall/gjson"
+)
+
+// ExtractModelName performs a lightweight extraction of the top-level "model"
+// field from a JSON request body. This exists to avoid a full unmarshal on the
+// hot path -- the handler needs the model name for endpoint filtering and
+// routing decisions before it knows whether passthrough or translation will be
+// used. A full TransformRequest parse is deferred to the translation path only.
+//
+// gjson.GetBytes scans forward to the first matching key without building an
+// intermediate map, which is cheaper than encoding/json.Unmarshal for one field.
+// It does not validate the whole document; full parsing happens later.
+func ExtractModelName(body []byte) (string, error) {
+ if len(body) == 0 {
+ return "", fmt.Errorf("empty request body")
+ }
+
+ result := gjson.GetBytes(body, "model")
+
+ if !result.Exists() {
+ return "", fmt.Errorf("model field is required (body may not be valid JSON)")
+ }
+
+ // gjson coerces non-string types via .String() (numbers become "123",
+ // arrays become their raw JSON). We only accept actual JSON strings.
+ if result.Type != gjson.String {
+ return "", fmt.Errorf("model field must be a string, got %s", result.Type)
+ }
+
+ model := result.Str
+ if model == "" {
+ return "", fmt.Errorf("model field must not be empty")
+ }
+
+ return model, nil
+}
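gjson does not validate the document it scans, so a malformed body can still yield a model name here; the full `json.Unmarshal` in PreparePassthrough, or TransformRequest on the translation path, remains the real validity check. If the gjson dependency were ever unwanted, the same single-field lookup can be sketched with `encoding/json`'s streaming Decoder. `extractModelStdlib` is a hypothetical helper, not part of olla; like gjson it stops at the first top-level `"model"` key and does not check bytes after it.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// extractModelStdlib is a hypothetical stdlib-only alternative to
// translator.ExtractModelName. It walks the top-level object token by token,
// skipping values of other keys, and stops at the first "model" key.
func extractModelStdlib(body []byte) (string, error) {
	dec := json.NewDecoder(bytes.NewReader(body))

	tok, err := dec.Token()
	if err != nil {
		return "", fmt.Errorf("invalid JSON: %w", err)
	}
	if delim, ok := tok.(json.Delim); !ok || delim != '{' {
		return "", errors.New("request body must be a JSON object")
	}

	for dec.More() {
		keyTok, err := dec.Token()
		if err != nil {
			return "", fmt.Errorf("invalid JSON: %w", err)
		}
		key, _ := keyTok.(string) // in key position Token always yields a string

		var raw json.RawMessage
		if err := dec.Decode(&raw); err != nil {
			return "", fmt.Errorf("invalid JSON: %w", err)
		}
		if key != "model" {
			continue // nested values are consumed as raw bytes, never interpreted
		}

		// Unmarshalling into a string rejects numbers, booleans, arrays and
		// objects; JSON null leaves model empty and is caught below.
		var model string
		if err := json.Unmarshal(raw, &model); err != nil {
			return "", errors.New("model field must be a string")
		}
		if model == "" {
			return "", errors.New("model field must not be empty")
		}
		return model, nil
	}
	return "", errors.New("model field is required")
}

func main() {
	model, err := extractModelStdlib([]byte(`{"max_tokens":1024,"model":"claude-3-haiku-20240307"}`))
	fmt.Println(model, err) // claude-3-haiku-20240307 <nil>
}
```

Decoding each skipped value into `json.RawMessage` still tokenises it, so this is expected to be slower than gjson on large `messages` arrays; it is shown for the control flow, not as a faster path.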
diff --git a/internal/adapter/translator/extract_test.go b/internal/adapter/translator/extract_test.go
new file mode 100644
index 00000000..49ca7623
--- /dev/null
+++ b/internal/adapter/translator/extract_test.go
@@ -0,0 +1,220 @@
+package translator
+
+import (
+ "encoding/json"
+ "testing"
+)
+
+func TestExtractModelName(t *testing.T) {
+ t.Parallel()
+
+ tests := []struct {
+ name string
+ body []byte
+ wantModel string
+ wantErr bool
+ }{
+ {
+ name: "standard anthropic request",
+ body: []byte(`{"model":"claude-3-opus-20240229","max_tokens":1024,"messages":[{"role":"user","content":"Hello"}]}`),
+ wantModel: "claude-3-opus-20240229",
+ },
+ {
+ name: "model field first",
+ body: []byte(`{"model":"gpt-4","temperature":0.7}`),
+ wantModel: "gpt-4",
+ },
+ {
+ name: "model field last",
+ body: []byte(`{"max_tokens":1024,"messages":[],"model":"llama-3.1-70b"}`),
+ wantModel: "llama-3.1-70b",
+ },
+ {
+ name: "model field in middle of large request",
+ body: []byte(`{"messages":[{"role":"user","content":"Tell me a story"}],"model":"claude-3-haiku-20240307","max_tokens":4096,"stream":true,"temperature":0.5}`),
+ wantModel: "claude-3-haiku-20240307",
+ },
+ {
+ name: "model with slashes and colons",
+ body: []byte(`{"model":"org/repo:latest","max_tokens":100}`),
+ wantModel: "org/repo:latest",
+ },
+ {
+ name: "nested model string in content does not match",
+ body: []byte(`{"messages":[{"role":"user","content":"{\"model\":\"wrong\"}"}],"model":"correct-model","max_tokens":100}`),
+ // gjson matches only the top-level "model" key, not the one inside content
+ wantModel: "correct-model",
+ },
+ {
+ name: "missing model field",
+ body: []byte(`{"max_tokens":1024,"messages":[]}`),
+ wantErr: true,
+ },
+ {
+ name: "empty model string",
+ body: []byte(`{"model":"","max_tokens":1024}`),
+ wantErr: true,
+ },
+ {
+ name: "model as number",
+ body: []byte(`{"model":42,"max_tokens":1024}`),
+ wantErr: true,
+ },
+ {
+ name: "model as boolean",
+ body: []byte(`{"model":true,"max_tokens":1024}`),
+ wantErr: true,
+ },
+ {
+ name: "model as array",
+ body: []byte(`{"model":["a","b"],"max_tokens":1024}`),
+ wantErr: true,
+ },
+ {
+ name: "model as object",
+ body: []byte(`{"model":{"name":"test"},"max_tokens":1024}`),
+ wantErr: true,
+ },
+ {
+ name: "model as null",
+ body: []byte(`{"model":null,"max_tokens":1024}`),
+ wantErr: true,
+ },
+ {
+ name: "invalid JSON",
+ body: []byte(`{invalid json`),
+ wantErr: true,
+ },
+ {
+ name: "empty body",
+ body: []byte{},
+ wantErr: true,
+ },
+ {
+ name: "nil body",
+ body: nil,
+ wantErr: true,
+ },
+ {
+ name: "just whitespace",
+ body: []byte(` `),
+ wantErr: true,
+ },
+ {
+ name: "empty JSON object",
+ body: []byte(`{}`),
+ wantErr: true,
+ },
+ {
+ name: "HTML error page",
+ body: []byte(`502 Bad Gateway`),
+ wantErr: true,
+ },
+ }
+
+ for _, tt := range tests {
+ t.Run(tt.name, func(t *testing.T) {
+ t.Parallel()
+
+ got, err := ExtractModelName(tt.body)
+ if tt.wantErr {
+ if err == nil {
+ t.Errorf("ExtractModelName() expected error, got model=%q", got)
+ }
+ return
+ }
+ if err != nil {
+ t.Errorf("ExtractModelName() unexpected error: %v", err)
+ return
+ }
+ if got != tt.wantModel {
+ t.Errorf("ExtractModelName() = %q, want %q", got, tt.wantModel)
+ }
+ })
+ }
+}
+
+// BenchmarkExtractModelName measures gjson-based model extraction on a typical request.
+func BenchmarkExtractModelName(b *testing.B) {
+ // Realistic Anthropic Messages API request body
+ body := []byte(`{
+ "model": "claude-3-opus-20240229",
+ "max_tokens": 4096,
+ "system": "You are a helpful assistant.",
+ "messages": [
+ {"role": "user", "content": "What is the capital of France?"}
+ ],
+ "stream": true,
+ "temperature": 0.7
+ }`)
+
+ b.ResetTimer()
+ b.ReportAllocs()
+ for i := 0; i < b.N; i++ {
+ _, _ = ExtractModelName(body)
+ }
+}
+
+// BenchmarkFullUnmarshal compares against a minimal struct unmarshal.
+// Note: This understates the real TransformRequest cost, which also includes:
+// - Field validation with custom rules
+// - Format conversion from Anthropic to OpenAI structure
+// - Message content transformation
+// - System prompt injection
+// - Buffer pool operations
+// The actual performance improvement over TransformRequest is larger than shown here.
+func BenchmarkFullUnmarshal(b *testing.B) {
+ type anthropicReq struct {
+ Model string `json:"model"`
+ MaxTokens int `json:"max_tokens"`
+ System interface{} `json:"system,omitempty"`
+ Messages []struct {
+ Content interface{} `json:"content"`
+ Role string `json:"role"`
+ } `json:"messages"`
+ Stream bool `json:"stream,omitempty"`
+ Temperature *float64 `json:"temperature,omitempty"`
+ }
+
+ body := []byte(`{
+ "model": "claude-3-opus-20240229",
+ "max_tokens": 4096,
+ "system": "You are a helpful assistant.",
+ "messages": [
+ {"role": "user", "content": "What is the capital of France?"}
+ ],
+ "stream": true,
+ "temperature": 0.7
+ }`)
+
+ b.ResetTimer()
+ b.ReportAllocs()
+ for i := 0; i < b.N; i++ {
+ var req anthropicReq
+ _ = json.Unmarshal(body, &req)
+ }
+}
+
+// BenchmarkExtractModelName_LargeBody measures extraction on a larger, multi-turn conversation.
+func BenchmarkExtractModelName_LargeBody(b *testing.B) {
+ body := []byte(`{
+ "model": "claude-3-opus-20240229",
+ "max_tokens": 4096,
+ "system": "You are a coding assistant. Help the user write Go code.",
+ "messages": [
+ {"role": "user", "content": "Write me a function that sorts a slice of integers"},
+ {"role": "assistant", "content": "Here is a function that sorts integers:\n\nfunc sortInts(nums []int) []int {\n\tsort.Ints(nums)\n\treturn nums\n}"},
+ {"role": "user", "content": "Now write a benchmark for it"},
+ {"role": "assistant", "content": "Here is a benchmark:\n\nfunc BenchmarkSortInts(b *testing.B) {\n\tnums := make([]int, 1000)\n\tfor i := range nums {\n\t\tnums[i] = rand.Intn(10000)\n\t}\n\tb.ResetTimer()\n\tfor i := 0; i < b.N; i++ {\n\t\tcopy := make([]int, len(nums))\n\t\tcopy(copy, nums)\n\t\tsortInts(copy)\n\t}\n}"},
+ {"role": "user", "content": "Good, now add error handling and make it generic"}
+ ],
+ "stream": true,
+ "temperature": 0.3
+ }`)
+
+ b.ResetTimer()
+ b.ReportAllocs()
+ for i := 0; i < b.N; i++ {
+ _, _ = ExtractModelName(body)
+ }
+}
diff --git a/internal/adapter/translator/types.go b/internal/adapter/translator/types.go
index 41b72a36..be14c879 100644
--- a/internal/adapter/translator/types.go
+++ b/internal/adapter/translator/types.go
@@ -4,6 +4,8 @@ import (
"context"
"io"
"net/http"
+
+ "github.com/thushan/olla/internal/core/domain"
)
// converts between api formats (e.g., anthropic → openai)
@@ -66,3 +68,96 @@ type TokenCountResponse struct {
type ModelsProvider interface {
GetModels(ctx context.Context) (interface{}, error)
}
+
+// PassthroughCapable is an optional interface for translators that can bypass
+// the translation pipeline entirely. When a backend natively speaks the same
+// wire format as the incoming request (e.g. a vLLM instance with Anthropic
+// Messages API support), the request can be forwarded directly -- avoiding
+// the marshalling overhead of Anthropic->OpenAI->Anthropic round-trips.
+//
+// The handler checks CanPassthrough first; if it returns true, it calls
+// PreparePassthrough to obtain the body and target path, then forwards the
+// request to the backend without any translation.
+//
+// This is intentionally a separate interface rather than a method on
+// RequestTranslator so that existing translators remain unaffected and the
+// passthrough decision is opt-in per translator.
+type PassthroughCapable interface {
+ // CanPassthrough inspects the available endpoints (via their profile
+ // configurations) and determines whether all backends can accept
+ // the request in its native format without translation.
+ //
+ // The profileLookup parameter provides access to per-endpoint-type
+ // AnthropicSupportConfig without creating a hard dependency on the
+ // profile registry. This keeps the translator layer decoupled from
+ // the infrastructure layer.
+ //
+ // Must be safe for concurrent use and must not mutate the endpoints slice.
+ CanPassthrough(endpoints []*domain.Endpoint, profileLookup ProfileLookup) bool
+
+ // PreparePassthrough validates the already-buffered request body for
+ // passthrough eligibility and returns a PassthroughRequest carrying the
+ // body, target backend path, model name, and streaming flag.
+ //
+ // bodyBytes is the raw request body already read by the handler. This
+ // avoids a redundant io.ReadAll inside the implementation -- the handler
+ // buffers the body once for model extraction and reuses it here.
+ //
+ // The returned TargetPath is the backend-relative path (e.g.
+ // "/v1/messages") that the proxy layer should use when forwarding.
+ //
+ // Returns an error if the body is invalid for passthrough (e.g.
+ // exceeds size limits or uses features the backend doesn't support).
+ PreparePassthrough(bodyBytes []byte, r *http.Request, profileLookup ProfileLookup) (*PassthroughRequest, error)
+}
+
+// PassthroughRequest holds the result of preparing a request for direct
+// forwarding to a backend. Separating this into its own struct (rather than
+// returning multiple values) makes it easier to extend in future phases --
+// for example, adding header overrides or endpoint filtering hints.
+type PassthroughRequest struct {
+
+ // TargetPath is the backend-relative API path (e.g. "/v1/messages").
+ // The proxy layer prepends any necessary prefixes.
+ TargetPath string
+
+ // ModelName is extracted from the request body for routing and
+ // observability (populates X-Olla-Model header).
+ ModelName string
+
+ // Body is the original, unmodified request body bytes. The caller
+ // should set r.Body = io.NopCloser(bytes.NewReader(Body)) before
+ // forwarding to the proxy pipeline.
+ Body []byte
+
+ // IsStreaming indicates whether the request has stream:true set,
+ // so the handler can select the appropriate response pipeline.
+ IsStreaming bool
+}
+
+// BodySizeLimiter is an optional interface for translators that declare their
+// maximum acceptable request body size. The handler uses this to apply a
+// per-translator limit when reading the body, rather than hardcoding a value.
+// If a translator does not implement this interface, the handler falls back
+// to a sensible default (10 MiB).
+type BodySizeLimiter interface {
+ MaxBodySize() int64
+}
+
+// ProfileLookup provides access to backend AnthropicSupportConfig without
+// coupling the translator layer to the profile registry implementation.
+// This interface lives in the translator package because it's consumed by
+// PassthroughCapable implementations -- the profile registry in the adapter
+// layer provides the concrete implementation.
+//
+// Designed to be easily mockable for testing: a single method, no side
+// effects, and a return value that's safe to compare against nil.
+type ProfileLookup interface {
+ // GetAnthropicSupport returns the AnthropicSupportConfig for the given
+ // endpoint type (e.g. "vllm", "sglang", "litellm"). Returns nil if the
+ // profile doesn't exist or doesn't declare Anthropic support.
+ //
+ // The endpointType parameter corresponds to domain.Endpoint.Type, which
+ // maps to the profile name loaded from config/profiles/*.yaml.
+ GetAnthropicSupport(endpointType string) *domain.AnthropicSupportConfig
+}
diff --git a/internal/app/handlers/application.go b/internal/app/handlers/application.go
index 7ae35042..ac75b931 100644
--- a/internal/app/handlers/application.go
+++ b/internal/app/handlers/application.go
@@ -81,6 +81,7 @@ type Application struct {
routeRegistry *router.RouteRegistry
converterFactory *converter.ConverterFactory
profileFactory profile.ProfileFactory
+ profileLookup translator.ProfileLookup
translatorRegistry *translator.Registry
server *http.Server
errCh chan error
@@ -159,6 +160,10 @@ func NewApplication(
logger.Info("Anthropic translator disabled via configuration")
}
+ // The profile factory satisfies translator.ProfileLookup through
+ // Factory.GetAnthropicSupport, so it is used directly without an adapter.
+ profileLookup := profileFactory
+
return &Application{
Config: cfg,
logger: logger,
@@ -171,6 +176,7 @@ func NewApplication(
securityAdapters: securityAdapters,
routeRegistry: routeRegistry,
profileFactory: profileFactory,
+ profileLookup: profileLookup,
converterFactory: converter.NewConverterFactory(),
translatorRegistry: translatorRegistry,
server: server,
@@ -199,6 +205,11 @@ func (a *Application) GetTranslatorRegistry() *translator.Registry {
return a.translatorRegistry
}
+// GetProfileLookup returns the ProfileLookup used to query backend profiles for Anthropic support.
+func (a *Application) GetProfileLookup() translator.ProfileLookup {
+ return a.profileLookup
+}
+
func (a *Application) RegisterRoutes() {
a.registerRoutes()
}
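`profileLookup := profileFactory` compiles only while the profile factory keeps a `GetAnthropicSupport` method with exactly the signature ProfileLookup expects. A common Go idiom pins that relationship with a compile-time assertion, so a signature change fails the build beside the implementation rather than at this wiring site. The sketch uses hypothetical stand-in types (`factory`, `profileLookup`, `supportConfig`); in olla the assertion would name the real profile factory type, which this diff does not show.

```go
package main

import "fmt"

// supportConfig is a hypothetical stand-in for domain.AnthropicSupportConfig.
type supportConfig struct{ Enabled bool }

// profileLookup mirrors the single-method translator.ProfileLookup interface.
type profileLookup interface {
	GetAnthropicSupport(endpointType string) *supportConfig
}

// factory is a hypothetical stand-in for the profile factory.
type factory struct {
	support map[string]*supportConfig
}

// GetAnthropicSupport returns nil for types with no declared support.
func (f *factory) GetAnthropicSupport(endpointType string) *supportConfig {
	return f.support[endpointType]
}

// Compile-time check: the build fails here if *factory stops satisfying profileLookup.
var _ profileLookup = (*factory)(nil)

func main() {
	var lookup profileLookup = &factory{support: map[string]*supportConfig{"vllm": {Enabled: true}}}
	fmt.Println(lookup.GetAnthropicSupport("vllm") != nil)   // true
	fmt.Println(lookup.GetAnthropicSupport("ollama") == nil) // true
}
```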
diff --git a/internal/app/handlers/handler_proxy.go b/internal/app/handlers/handler_proxy.go
index a8b0144c..ae5a0ce3 100644
--- a/internal/app/handlers/handler_proxy.go
+++ b/internal/app/handlers/handler_proxy.go
@@ -27,6 +27,8 @@ type proxyRequest struct {
query string
userAgent string
contentLength int64
+ hadError bool
+ isStreaming bool
}
func (a *Application) proxyHandler(w http.ResponseWriter, r *http.Request) {
diff --git a/internal/app/handlers/handler_stats_translators.go b/internal/app/handlers/handler_stats_translators.go
new file mode 100644
index 00000000..d1c8b708
--- /dev/null
+++ b/internal/app/handlers/handler_stats_translators.go
@@ -0,0 +1,161 @@
+package handlers
+
+import (
+ "encoding/json"
+ "net/http"
+ "sort"
+ "time"
+
+ "github.com/thushan/olla/internal/core/constants"
+ "github.com/thushan/olla/internal/core/ports"
+ "github.com/thushan/olla/pkg/format"
+)
+
+type TranslatorStatsResponse struct {
+ Timestamp time.Time `json:"timestamp"`
+ Translators []TranslatorStatsEntry `json:"translators"`
+ Summary TranslatorStatsSummary `json:"summary"`
+}
+
+type TranslatorStatsEntry struct {
+ TranslatorName string `json:"translator_name"`
+ SuccessRate string `json:"success_rate"`
+ PassthroughRate string `json:"passthrough_rate"`
+ AverageLatency string `json:"average_latency"`
+ TotalRequests int64 `json:"total_requests"`
+ SuccessfulRequests int64 `json:"successful_requests"`
+ FailedRequests int64 `json:"failed_requests"`
+ PassthroughRequests int64 `json:"passthrough_requests"`
+ TranslationRequests int64 `json:"translation_requests"`
+ StreamingRequests int64 `json:"streaming_requests"`
+ NonStreamingRequests int64 `json:"non_streaming_requests"`
+ FallbackNoCompatibleEndpoints int64 `json:"fallback_no_compatible_endpoints"`
+ FallbackTranslatorDoesNotSupportPassthrough int64 `json:"fallback_translator_does_not_support_passthrough"`
+ FallbackCannotPassthrough int64 `json:"fallback_cannot_passthrough"`
+}
+
+type TranslatorStatsSummary struct {
+ OverallSuccessRate string `json:"overall_success_rate"`
+ OverallPassthrough string `json:"overall_passthrough_rate"`
+ TotalTranslators int `json:"total_translators"`
+ ActiveTranslators int `json:"active_translators"`
+ TotalRequests int64 `json:"total_requests"`
+ TotalPassthrough int64 `json:"total_passthrough"`
+ TotalTranslations int64 `json:"total_translations"`
+ TotalStreaming int64 `json:"total_streaming"`
+ TotalNonStreaming int64 `json:"total_non_streaming"`
+}
+
+func (a *Application) translatorStatsHandler(w http.ResponseWriter, r *http.Request) {
+ statsCollector := a.statsCollector
+ if statsCollector == nil {
+ http.Error(w, "Stats collector not initialized", http.StatusServiceUnavailable)
+ return
+ }
+
+ translatorStats := statsCollector.GetTranslatorStats()
+
+ translators := a.buildTranslatorStats(translatorStats)
+ summary := a.buildTranslatorSummary(translatorStats)
+
+ response := TranslatorStatsResponse{
+ Timestamp: time.Now(),
+ Translators: translators,
+ Summary: summary,
+ }
+
+ w.Header().Set(constants.HeaderContentType, constants.ContentTypeJSON)
+ w.WriteHeader(http.StatusOK)
+ if err := json.NewEncoder(w).Encode(response); err != nil {
+ a.logger.Error("Failed to encode translator stats response", "error", err)
+ }
+}
+
+func (a *Application) buildTranslatorStats(translatorStats map[string]ports.TranslatorStats) []TranslatorStatsEntry {
+ translators := make([]TranslatorStatsEntry, 0, len(translatorStats))
+
+ for _, stats := range translatorStats {
+ successRate := float64(0)
+ if stats.TotalRequests > 0 {
+ successRate = float64(stats.SuccessfulRequests) / float64(stats.TotalRequests) * 100
+ }
+
+ passthroughRate := float64(0)
+ if stats.TotalRequests > 0 {
+ passthroughRate = float64(stats.PassthroughRequests) / float64(stats.TotalRequests) * 100
+ }
+
+ entry := TranslatorStatsEntry{
+ TranslatorName: stats.TranslatorName,
+ TotalRequests: stats.TotalRequests,
+ SuccessfulRequests: stats.SuccessfulRequests,
+ FailedRequests: stats.FailedRequests,
+ SuccessRate: format.Percentage(successRate),
+ PassthroughRequests: stats.PassthroughRequests,
+ TranslationRequests: stats.TranslationRequests,
+ PassthroughRate: format.Percentage(passthroughRate),
+ StreamingRequests: stats.StreamingRequests,
+ NonStreamingRequests: stats.NonStreamingRequests,
+ AverageLatency: format.Latency(stats.AverageLatency),
+ FallbackNoCompatibleEndpoints: stats.FallbackNoCompatibleEndpoints,
+ FallbackTranslatorDoesNotSupportPassthrough: stats.FallbackTranslatorDoesNotSupportPassthrough,
+ FallbackCannotPassthrough: stats.FallbackCannotPassthrough,
+ }
+
+ translators = append(translators, entry)
+ }
+
+ // Sort translators by total request count (most popular first)
+ sort.Slice(translators, func(i, j int) bool {
+ return translators[i].TotalRequests > translators[j].TotalRequests
+ })
+
+ return translators
+}
+
+func (a *Application) buildTranslatorSummary(translatorStats map[string]ports.TranslatorStats) TranslatorStatsSummary {
+ var totalRequests int64
+ var successfulRequests int64
+ var totalPassthrough int64
+ var totalTranslations int64
+ var totalStreaming int64
+ var totalNonStreaming int64
+
+ // Count translators with any requests as active
+ activeCount := 0
+
+ for _, stats := range translatorStats {
+ totalRequests += stats.TotalRequests
+ successfulRequests += stats.SuccessfulRequests
+ totalPassthrough += stats.PassthroughRequests
+ totalTranslations += stats.TranslationRequests
+ totalStreaming += stats.StreamingRequests
+ totalNonStreaming += stats.NonStreamingRequests
+
+ if stats.TotalRequests > 0 {
+ activeCount++
+ }
+ }
+
+ overallSuccessRate := float64(0)
+ if totalRequests > 0 {
+ overallSuccessRate = float64(successfulRequests) / float64(totalRequests) * 100
+ }
+
+ overallPassthroughRate := float64(0)
+ if totalRequests > 0 {
+ overallPassthroughRate = float64(totalPassthrough) / float64(totalRequests) * 100
+ }
+
+ return TranslatorStatsSummary{
+ TotalTranslators: len(translatorStats),
+ ActiveTranslators: activeCount,
+ TotalRequests: totalRequests,
+ OverallSuccessRate: format.Percentage(overallSuccessRate),
+ TotalPassthrough: totalPassthrough,
+ TotalTranslations: totalTranslations,
+ OverallPassthrough: format.Percentage(overallPassthroughRate),
+ TotalStreaming: totalStreaming,
+ TotalNonStreaming: totalNonStreaming,
+ }
+}
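buildTranslatorStats and buildTranslatorSummary guard every rate against a zero denominator, and the summary's overall rates are request-weighted (total successes over total requests), not an average of per-translator rates. A stdlib sketch of that arithmetic, assuming `format.Percentage` renders one decimal place such as "95.0%" (the format the handler test asserts); `stats` is a hypothetical trimmed-down ports.TranslatorStats. The name tie-break in the sort is an editorial suggestion, not in the original: map iteration order is random, so equal-volume entries (for example, idle translators) could otherwise swap places between responses.

```go
package main

import (
	"fmt"
	"sort"
)

// stats is a hypothetical subset of ports.TranslatorStats.
type stats struct {
	Name        string
	Total       int64
	Successful  int64
	Passthrough int64
}

// percentage is an assumed stand-in for format.Percentage (one decimal place).
func percentage(v float64) string { return fmt.Sprintf("%.1f%%", v) }

// rate returns part/total as a percentage, or 0 when total is zero.
func rate(part, total int64) float64 {
	if total == 0 {
		return 0
	}
	return float64(part) / float64(total) * 100
}

// summarize orders entries by request volume and computes request-weighted
// overall success and passthrough rates.
func summarize(all map[string]stats) ([]stats, string, string) {
	entries := make([]stats, 0, len(all))
	var total, succ, pass int64
	for _, s := range all {
		entries = append(entries, s)
		total += s.Total
		succ += s.Successful
		pass += s.Passthrough
	}
	sort.Slice(entries, func(i, j int) bool {
		if entries[i].Total != entries[j].Total {
			return entries[i].Total > entries[j].Total
		}
		// Tie-break by name so equal-volume entries keep a stable order.
		return entries[i].Name < entries[j].Name
	})
	return entries, percentage(rate(succ, total)), percentage(rate(pass, total))
}

func main() {
	entries, success, passthrough := summarize(map[string]stats{
		"anthropic": {Name: "anthropic", Total: 100, Successful: 95, Passthrough: 80},
		"openai":    {Name: "openai", Total: 50, Successful: 48, Passthrough: 50},
	})
	fmt.Println(entries[0].Name, success, passthrough) // anthropic 95.3% 86.7%
}
```

With the figures from TestTranslatorStatsHandler_BasicFunctionality, the weighted overall success rate is 143/150 (95.3%), whereas averaging the per-translator rates (95.0% and 96.0%) would give 95.5%.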
diff --git a/internal/app/handlers/handler_stats_translators_test.go b/internal/app/handlers/handler_stats_translators_test.go
new file mode 100644
index 00000000..58271011
--- /dev/null
+++ b/internal/app/handlers/handler_stats_translators_test.go
@@ -0,0 +1,559 @@
+package handlers
+
+import (
+ "context"
+ "encoding/json"
+ "net/http"
+ "net/http/httptest"
+ "net/url"
+ "sync"
+ "testing"
+ "time"
+
+ "github.com/stretchr/testify/assert"
+ "github.com/stretchr/testify/require"
+ "github.com/thushan/olla/internal/core/domain"
+ "github.com/thushan/olla/internal/core/ports"
+)
+
+// mockTranslatorStatsCollector implements ports.StatsCollector for translator stats testing
+type mockTranslatorStatsCollector struct {
+ translatorStats map[string]ports.TranslatorStats
+}
+
+func (m *mockTranslatorStatsCollector) RecordRequest(endpoint *domain.Endpoint, status string, latency time.Duration, bytes int64) {
+}
+func (m *mockTranslatorStatsCollector) RecordConnection(endpoint *domain.Endpoint, delta int) {}
+func (m *mockTranslatorStatsCollector) RecordSecurityViolation(violation ports.SecurityViolation) {
+}
+func (m *mockTranslatorStatsCollector) RecordDiscovery(endpoint *domain.Endpoint, success bool, latency time.Duration) {
+}
+func (m *mockTranslatorStatsCollector) RecordModelRequest(model string, endpoint *domain.Endpoint, status string, latency time.Duration, bytes int64) {
+}
+func (m *mockTranslatorStatsCollector) RecordModelError(model string, endpoint *domain.Endpoint, errorType string) {
+}
+func (m *mockTranslatorStatsCollector) GetModelStats() map[string]ports.ModelStats { return nil }
+func (m *mockTranslatorStatsCollector) GetModelEndpointStats() map[string]map[string]ports.EndpointModelStats {
+ return nil
+}
+func (m *mockTranslatorStatsCollector) RecordTranslatorRequest(event ports.TranslatorRequestEvent) {
+}
+func (m *mockTranslatorStatsCollector) GetProxyStats() ports.ProxyStats { return ports.ProxyStats{} }
+func (m *mockTranslatorStatsCollector) GetSecurityStats() ports.SecurityStats {
+ return ports.SecurityStats{}
+}
+func (m *mockTranslatorStatsCollector) GetConnectionStats() map[string]int64 { return nil }
+func (m *mockTranslatorStatsCollector) RecordModelTokens(model string, inputTokens, outputTokens int64) {
+}
+func (m *mockTranslatorStatsCollector) GetEndpointStats() map[string]ports.EndpointStats {
+ return nil
+}
+
+func (m *mockTranslatorStatsCollector) GetTranslatorStats() map[string]ports.TranslatorStats {
+ if m.translatorStats == nil {
+ return make(map[string]ports.TranslatorStats)
+ }
+ return m.translatorStats
+}
+
+// mockTranslatorEndpointRepository for translator stats testing
+type mockTranslatorEndpointRepository struct{}
+
+func (m *mockTranslatorEndpointRepository) GetAll(ctx context.Context) ([]*domain.Endpoint, error) {
+ return nil, nil
+}
+func (m *mockTranslatorEndpointRepository) GetHealthy(ctx context.Context) ([]*domain.Endpoint, error) {
+ return nil, nil
+}
+func (m *mockTranslatorEndpointRepository) GetRoutable(ctx context.Context) ([]*domain.Endpoint, error) {
+ return nil, nil
+}
+func (m *mockTranslatorEndpointRepository) UpdateEndpoint(ctx context.Context, endpoint *domain.Endpoint) error {
+ return nil
+}
+func (m *mockTranslatorEndpointRepository) Exists(ctx context.Context, endpointURL *url.URL) bool {
+ return false
+}
+
+// mockTranslatorModelRegistry for translator stats testing
+type mockTranslatorModelRegistry struct{}
+
+func (m *mockTranslatorModelRegistry) RegisterModel(ctx context.Context, endpointURL string, model *domain.ModelInfo) error {
+ return nil
+}
+func (m *mockTranslatorModelRegistry) RegisterModels(ctx context.Context, endpointURL string, models []*domain.ModelInfo) error {
+ return nil
+}
+func (m *mockTranslatorModelRegistry) GetModelsForEndpoint(ctx context.Context, endpointURL string) ([]*domain.ModelInfo, error) {
+ return nil, nil
+}
+func (m *mockTranslatorModelRegistry) GetEndpointsForModel(ctx context.Context, modelName string) ([]string, error) {
+ return nil, nil
+}
+func (m *mockTranslatorModelRegistry) IsModelAvailable(ctx context.Context, modelName string) bool {
+ return false
+}
+func (m *mockTranslatorModelRegistry) GetAllModels(ctx context.Context) (map[string][]*domain.ModelInfo, error) {
+ return nil, nil
+}
+func (m *mockTranslatorModelRegistry) RemoveEndpoint(ctx context.Context, endpointURL string) error {
+ return nil
+}
+func (m *mockTranslatorModelRegistry) GetStats(ctx context.Context) (domain.RegistryStats, error) {
+ return domain.RegistryStats{}, nil
+}
+func (m *mockTranslatorModelRegistry) ModelsToString(models []*domain.ModelInfo) string {
+ return ""
+}
+func (m *mockTranslatorModelRegistry) ModelsToStrings(models []*domain.ModelInfo) []string {
+ return nil
+}
+func (m *mockTranslatorModelRegistry) GetModelsByCapability(ctx context.Context, capability string) ([]*domain.UnifiedModel, error) {
+ return nil, nil
+}
+func (m *mockTranslatorModelRegistry) GetRoutableEndpointsForModel(ctx context.Context, modelName string, healthyEndpoints []*domain.Endpoint) ([]*domain.Endpoint, *domain.ModelRoutingDecision, error) {
+ return nil, nil, nil
+}
+func (m *mockTranslatorModelRegistry) GetEndpointModelMap(ctx context.Context) (map[string]*domain.EndpointModels, error) {
+ return nil, nil
+}
+
+// createTestTranslatorStatsApplication creates a minimal Application for translator stats testing
+func createTestTranslatorStatsApplication(translatorStats map[string]ports.TranslatorStats) *Application {
+ repo := &mockTranslatorEndpointRepository{}
+ stats := &mockTranslatorStatsCollector{translatorStats: translatorStats}
+ registry := &mockTranslatorModelRegistry{}
+
+ return &Application{
+ repository: repo,
+ statsCollector: stats,
+ modelRegistry: registry,
+ StartTime: time.Now(),
+ }
+}
+
+func TestTranslatorStatsHandler_BasicFunctionality(t *testing.T) {
+ translatorStats := map[string]ports.TranslatorStats{
+ "anthropic": {
+ TranslatorName: "anthropic",
+ TotalRequests: 100,
+ SuccessfulRequests: 95,
+ FailedRequests: 5,
+ PassthroughRequests: 80,
+ TranslationRequests: 20,
+ StreamingRequests: 60,
+ NonStreamingRequests: 40,
+ AverageLatency: 245,
+ FallbackNoCompatibleEndpoints: 10,
+ FallbackTranslatorDoesNotSupportPassthrough: 5,
+ FallbackCannotPassthrough: 5,
+ },
+ "openai": {
+ TranslatorName: "openai",
+ TotalRequests: 50,
+ SuccessfulRequests: 48,
+ FailedRequests: 2,
+ PassthroughRequests: 50,
+ TranslationRequests: 0,
+ StreamingRequests: 30,
+ NonStreamingRequests: 20,
+ AverageLatency: 150,
+ FallbackNoCompatibleEndpoints: 0,
+ FallbackTranslatorDoesNotSupportPassthrough: 0,
+ FallbackCannotPassthrough: 0,
+ },
+ }
+
+ app := createTestTranslatorStatsApplication(translatorStats)
+
+ req := httptest.NewRequest(http.MethodGet, "/internal/stats/translators", nil)
+ w := httptest.NewRecorder()
+
+ app.translatorStatsHandler(w, req)
+
+ assert.Equal(t, http.StatusOK, w.Code)
+ assert.Contains(t, w.Header().Get("Content-Type"), "application/json")
+
+ var response TranslatorStatsResponse
+ err := json.NewDecoder(w.Body).Decode(&response)
+ require.NoError(t, err)
+
+ // Verify response structure
+ assert.Len(t, response.Translators, 2)
+ assert.Equal(t, 2, response.Summary.TotalTranslators)
+ assert.Equal(t, 2, response.Summary.ActiveTranslators)
+ assert.Equal(t, int64(150), response.Summary.TotalRequests)
+ assert.Equal(t, int64(130), response.Summary.TotalPassthrough)
+ assert.Equal(t, int64(20), response.Summary.TotalTranslations)
+
+ // Verify translators are sorted by request count (anthropic first with 100 requests)
+ assert.Equal(t, "anthropic", response.Translators[0].TranslatorName)
+ assert.Equal(t, int64(100), response.Translators[0].TotalRequests)
+ assert.Equal(t, "openai", response.Translators[1].TranslatorName)
+ assert.Equal(t, int64(50), response.Translators[1].TotalRequests)
+
+ // Verify formatted values
+ assert.Equal(t, "95.0%", response.Translators[0].SuccessRate)
+ assert.Equal(t, "80.0%", response.Translators[0].PassthroughRate)
+ assert.Equal(t, "245ms", response.Translators[0].AverageLatency)
+}
+
+func TestTranslatorStatsHandler_EmptyTranslators(t *testing.T) {
+ app := createTestTranslatorStatsApplication(map[string]ports.TranslatorStats{})
+
+ req := httptest.NewRequest(http.MethodGet, "/internal/stats/translators", nil)
+ w := httptest.NewRecorder()
+
+ app.translatorStatsHandler(w, req)
+
+ assert.Equal(t, http.StatusOK, w.Code)
+
+ var response TranslatorStatsResponse
+ err := json.NewDecoder(w.Body).Decode(&response)
+ require.NoError(t, err)
+
+ // Verify empty response
+ assert.Empty(t, response.Translators)
+ assert.Equal(t, 0, response.Summary.TotalTranslators)
+ assert.Equal(t, 0, response.Summary.ActiveTranslators)
+ assert.Equal(t, int64(0), response.Summary.TotalRequests)
+
+ // Verify zero values are formatted correctly
+ assert.Equal(t, "0%", response.Summary.OverallSuccessRate)
+ assert.Equal(t, "0%", response.Summary.OverallPassthrough)
+}
+
+func TestTranslatorStatsHandler_NilStatsCollector(t *testing.T) {
+ app := &Application{
+ statsCollector: nil,
+ }
+
+ req := httptest.NewRequest(http.MethodGet, "/internal/stats/translators", nil)
+ w := httptest.NewRecorder()
+
+ app.translatorStatsHandler(w, req)
+
+ assert.Equal(t, http.StatusServiceUnavailable, w.Code)
+ assert.Contains(t, w.Body.String(), "Stats collector not initialized")
+}
+
+func TestTranslatorStatsHandler_SortingByRequestCount(t *testing.T) {
+ translatorStats := map[string]ports.TranslatorStats{
+ "low-usage": {
+ TranslatorName: "low-usage",
+ TotalRequests: 10,
+ SuccessfulRequests: 10,
+ },
+ "high-usage": {
+ TranslatorName: "high-usage",
+ TotalRequests: 1000,
+ SuccessfulRequests: 950,
+ },
+ "medium-usage": {
+ TranslatorName: "medium-usage",
+ TotalRequests: 100,
+ SuccessfulRequests: 95,
+ },
+ }
+
+ app := createTestTranslatorStatsApplication(translatorStats)
+
+ req := httptest.NewRequest(http.MethodGet, "/internal/stats/translators", nil)
+ w := httptest.NewRecorder()
+
+ app.translatorStatsHandler(w, req)
+
+ var response TranslatorStatsResponse
+ err := json.NewDecoder(w.Body).Decode(&response)
+ require.NoError(t, err)
+
+ // Verify sorting: highest request count first
+ assert.Equal(t, "high-usage", response.Translators[0].TranslatorName)
+ assert.Equal(t, int64(1000), response.Translators[0].TotalRequests)
+ assert.Equal(t, "medium-usage", response.Translators[1].TranslatorName)
+ assert.Equal(t, int64(100), response.Translators[1].TotalRequests)
+ assert.Equal(t, "low-usage", response.Translators[2].TranslatorName)
+ assert.Equal(t, int64(10), response.Translators[2].TotalRequests)
+}
+
+func TestTranslatorStatsHandler_SuccessRateCalculation(t *testing.T) {
+ translatorStats := map[string]ports.TranslatorStats{
+ "perfect": {
+ TranslatorName: "perfect",
+ TotalRequests: 100,
+ SuccessfulRequests: 100,
+ FailedRequests: 0,
+ },
+ "mixed": {
+ TranslatorName: "mixed",
+ TotalRequests: 100,
+ SuccessfulRequests: 75,
+ FailedRequests: 25,
+ },
+ "zero": {
+ TranslatorName: "zero",
+ TotalRequests: 0,
+ SuccessfulRequests: 0,
+ FailedRequests: 0,
+ },
+ }
+
+ app := createTestTranslatorStatsApplication(translatorStats)
+
+ req := httptest.NewRequest(http.MethodGet, "/internal/stats/translators", nil)
+ w := httptest.NewRecorder()
+
+ app.translatorStatsHandler(w, req)
+
+ var response TranslatorStatsResponse
+ err := json.NewDecoder(w.Body).Decode(&response)
+ require.NoError(t, err)
+
+ // Find each translator in response
+ statsMap := make(map[string]TranslatorStatsEntry)
+ for _, entry := range response.Translators {
+ statsMap[entry.TranslatorName] = entry
+ }
+
+ // Verify success rate formatting
+ assert.Equal(t, "100%", statsMap["perfect"].SuccessRate)
+ assert.Equal(t, "75.0%", statsMap["mixed"].SuccessRate)
+ assert.Equal(t, "0%", statsMap["zero"].SuccessRate)
+}
+
+func TestTranslatorStatsHandler_LatencyFormatting(t *testing.T) {
+ translatorStats := map[string]ports.TranslatorStats{
+ "fast": {
+ TranslatorName: "fast",
+ TotalRequests: 1,
+ AverageLatency: 5, // 5ms
+ },
+ "medium": {
+ TranslatorName: "medium",
+ TotalRequests: 1,
+ AverageLatency: 245, // 245ms
+ },
+ "slow": {
+ TranslatorName: "slow",
+ TotalRequests: 1,
+ AverageLatency: 367500, // 367.5s
+ },
+ "zero": {
+ TranslatorName: "zero",
+ TotalRequests: 1,
+ AverageLatency: 0,
+ },
+ }
+
+ app := createTestTranslatorStatsApplication(translatorStats)
+
+ req := httptest.NewRequest(http.MethodGet, "/internal/stats/translators", nil)
+ w := httptest.NewRecorder()
+
+ app.translatorStatsHandler(w, req)
+
+ var response TranslatorStatsResponse
+ err := json.NewDecoder(w.Body).Decode(&response)
+ require.NoError(t, err)
+
+ // Find each translator in response
+ statsMap := make(map[string]TranslatorStatsEntry)
+ for _, entry := range response.Translators {
+ statsMap[entry.TranslatorName] = entry
+ }
+
+ // Verify latency formatting
+ assert.Equal(t, "5ms", statsMap["fast"].AverageLatency)
+ assert.Equal(t, "245ms", statsMap["medium"].AverageLatency)
+ assert.Equal(t, "367.5s", statsMap["slow"].AverageLatency)
+ assert.Equal(t, "0ms", statsMap["zero"].AverageLatency)
+}
+
+func TestTranslatorStatsHandler_SummaryPassthroughRate(t *testing.T) {
+ translatorStats := map[string]ports.TranslatorStats{
+ "translator1": {
+ TranslatorName: "translator1",
+ TotalRequests: 100,
+ PassthroughRequests: 80,
+ TranslationRequests: 20,
+ },
+ "translator2": {
+ TranslatorName: "translator2",
+ TotalRequests: 50,
+ PassthroughRequests: 50,
+ TranslationRequests: 0,
+ },
+ }
+
+ app := createTestTranslatorStatsApplication(translatorStats)
+
+ req := httptest.NewRequest(http.MethodGet, "/internal/stats/translators", nil)
+ w := httptest.NewRecorder()
+
+ app.translatorStatsHandler(w, req)
+
+ var response TranslatorStatsResponse
+ err := json.NewDecoder(w.Body).Decode(&response)
+ require.NoError(t, err)
+
+ // Total: 150 requests, 130 passthrough = 86.666...% ≈ 86.7%
+ assert.Equal(t, int64(150), response.Summary.TotalRequests)
+ assert.Equal(t, int64(130), response.Summary.TotalPassthrough)
+ assert.Equal(t, "86.7%", response.Summary.OverallPassthrough)
+
+ // Individual passthrough rates
+ statsMap := make(map[string]TranslatorStatsEntry)
+ for _, entry := range response.Translators {
+ statsMap[entry.TranslatorName] = entry
+ }
+ assert.Equal(t, "80.0%", statsMap["translator1"].PassthroughRate)
+ assert.Equal(t, "100%", statsMap["translator2"].PassthroughRate)
+}
+
+func TestTranslatorStatsHandler_ActiveTranslatorCount(t *testing.T) {
+ translatorStats := map[string]ports.TranslatorStats{
+ "active1": {
+ TranslatorName: "active1",
+ TotalRequests: 100,
+ },
+ "active2": {
+ TranslatorName: "active2",
+ TotalRequests: 50,
+ },
+ "inactive": {
+ TranslatorName: "inactive",
+ TotalRequests: 0,
+ },
+ }
+
+ app := createTestTranslatorStatsApplication(translatorStats)
+
+ req := httptest.NewRequest(http.MethodGet, "/internal/stats/translators", nil)
+ w := httptest.NewRecorder()
+
+ app.translatorStatsHandler(w, req)
+
+ var response TranslatorStatsResponse
+ err := json.NewDecoder(w.Body).Decode(&response)
+ require.NoError(t, err)
+
+ // Verify total vs active count
+ assert.Equal(t, 3, response.Summary.TotalTranslators)
+ assert.Equal(t, 2, response.Summary.ActiveTranslators) // Only active1 and active2 have requests
+}
+
+func TestTranslatorStatsHandler_FallbackReasonBreakdown(t *testing.T) {
+ translatorStats := map[string]ports.TranslatorStats{
+ "translator": {
+ TranslatorName: "translator",
+ TotalRequests: 100,
+ FallbackNoCompatibleEndpoints: 15,
+ FallbackTranslatorDoesNotSupportPassthrough: 10,
+ FallbackCannotPassthrough: 5,
+ },
+ }
+
+ app := createTestTranslatorStatsApplication(translatorStats)
+
+ req := httptest.NewRequest(http.MethodGet, "/internal/stats/translators", nil)
+ w := httptest.NewRecorder()
+
+ app.translatorStatsHandler(w, req)
+
+ var response TranslatorStatsResponse
+ err := json.NewDecoder(w.Body).Decode(&response)
+ require.NoError(t, err)
+
+ // Verify all fallback fields are present
+ assert.Len(t, response.Translators, 1)
+ entry := response.Translators[0]
+ assert.Equal(t, int64(15), entry.FallbackNoCompatibleEndpoints)
+ assert.Equal(t, int64(10), entry.FallbackTranslatorDoesNotSupportPassthrough)
+ assert.Equal(t, int64(5), entry.FallbackCannotPassthrough)
+}
+
+func TestTranslatorStatsHandler_Concurrent(t *testing.T) {
+ translatorStats := map[string]ports.TranslatorStats{
+ "translator1": {
+ TranslatorName: "translator1",
+ TotalRequests: 100,
+ SuccessfulRequests: 95,
+ PassthroughRequests: 80,
+ AverageLatency: 150,
+ },
+ "translator2": {
+ TranslatorName: "translator2",
+ TotalRequests: 50,
+ SuccessfulRequests: 48,
+ PassthroughRequests: 50,
+ AverageLatency: 200,
+ },
+ "translator3": {
+ TranslatorName: "translator3",
+ TotalRequests: 25,
+ SuccessfulRequests: 20,
+ PassthroughRequests: 15,
+ AverageLatency: 300,
+ },
+ }
+
+ app := createTestTranslatorStatsApplication(translatorStats)
+
+ // Run 20 concurrent requests to stress test for race conditions
+ const numRequests = 20
+ var wg sync.WaitGroup
+ errors := make(chan error, numRequests)
+ results := make(chan int, numRequests)
+
+ for i := 0; i < numRequests; i++ {
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+
+ req := httptest.NewRequest(http.MethodGet, "/internal/stats/translators", nil)
+ w := httptest.NewRecorder()
+
+ app.translatorStatsHandler(w, req)
+
+ if w.Code != http.StatusOK {
+ errors <- assert.AnError
+ return
+ }
+
+ var response TranslatorStatsResponse
+ if err := json.NewDecoder(w.Body).Decode(&response); err != nil {
+ errors <- err
+ return
+ }
+
+ // Verify response integrity
+ if len(response.Translators) != 3 {
+ errors <- assert.AnError
+ return
+ }
+
+ if response.Summary.TotalTranslators != 3 {
+ errors <- assert.AnError
+ return
+ }
+
+ results <- w.Code
+ }()
+ }
+
+ wg.Wait()
+ close(errors)
+ close(results)
+
+ // Check for any errors
+ for err := range errors {
+ require.NoError(t, err, "Concurrent request failed")
+ }
+
+ // Verify all requests succeeded
+ successCount := 0
+ for range results {
+ successCount++
+ }
+ assert.Equal(t, numRequests, successCount, "All concurrent requests should succeed")
+}
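Review note: the stats handler tests above pin down the response format, but the formatting code itself is not part of this excerpt. Rates print as "95.0%" and collapse to "100%" when every request matched and to "0%" when there were no requests. Latencies print as "245ms" below one second and "367.5s" from one second up. Translators sort by request count, busiest first. The Go sketch below satisfies those assertions. It is only a sketch: `formatRate`, `formatLatency`, `sortByRequests` and the `entry` type are hypothetical names rather than Olla's helpers, and breaking ties by name is an added assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical, trimmed-down stand-in for TranslatorStatsEntry.
type entry struct {
	Name          string
	TotalRequests int64
}

// formatRate renders part/total as a percentage: "0%" with no requests,
// "100%" when every request matched, otherwise one decimal place.
func formatRate(part, total int64) string {
	if total <= 0 {
		return "0%"
	}
	if part >= total {
		return "100%"
	}
	return fmt.Sprintf("%.1f%%", float64(part)*100/float64(total))
}

// formatLatency renders milliseconds as "Nms" below one second and as
// seconds with one decimal place from one second upwards.
func formatLatency(ms int64) string {
	if ms < 1000 {
		return fmt.Sprintf("%dms", ms)
	}
	return fmt.Sprintf("%.1fs", float64(ms)/1000)
}

// sortByRequests orders entries busiest first. Ties fall back to the name
// so the output does not depend on map iteration order.
func sortByRequests(entries []entry) {
	sort.Slice(entries, func(i, j int) bool {
		if entries[i].TotalRequests != entries[j].TotalRequests {
			return entries[i].TotalRequests > entries[j].TotalRequests
		}
		return entries[i].Name < entries[j].Name
	})
}

func main() {
	fmt.Println(formatRate(95, 100), formatRate(130, 150), formatRate(0, 0))
	fmt.Println(formatLatency(245), formatLatency(367500))

	entries := []entry{{"low-usage", 10}, {"high-usage", 1000}, {"medium-usage", 100}}
	sortByRequests(entries)
	fmt.Println(entries[0].Name, entries[1].Name, entries[2].Name)
}
```

Keeping the edge cases ("0%", "100%") in the formatter keeps the JSON free of "NaN%" for idle translators. It also avoids noisy values like "100.0%".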
diff --git a/internal/app/handlers/handler_status_endpoints_test.go b/internal/app/handlers/handler_status_endpoints_test.go
index baca0648..99b8511b 100644
--- a/internal/app/handlers/handler_status_endpoints_test.go
+++ b/internal/app/handlers/handler_status_endpoints_test.go
@@ -84,6 +84,10 @@ func (m *mockStatusStatsCollector) GetModelStats() map[string]ports.ModelStats {
func (m *mockStatusStatsCollector) GetModelEndpointStats() map[string]map[string]ports.EndpointModelStats {
return nil
}
+func (m *mockStatusStatsCollector) RecordTranslatorRequest(event ports.TranslatorRequestEvent) {}
+func (m *mockStatusStatsCollector) GetTranslatorStats() map[string]ports.TranslatorStats {
+ return nil
+}
func (m *mockStatusStatsCollector) GetProxyStats() ports.ProxyStats { return ports.ProxyStats{} }
func (m *mockStatusStatsCollector) GetSecurityStats() ports.SecurityStats {
return ports.SecurityStats{}
diff --git a/internal/app/handlers/handler_translation.go b/internal/app/handlers/handler_translation.go
index 0a5607bc..a3ac2c20 100644
--- a/internal/app/handlers/handler_translation.go
+++ b/internal/app/handlers/handler_translation.go
@@ -7,74 +7,217 @@ import (
"fmt"
"io"
"net/http"
+ "time"
"github.com/thushan/olla/internal/adapter/translator"
"github.com/thushan/olla/internal/core/constants"
"github.com/thushan/olla/internal/core/domain"
+ "github.com/thushan/olla/internal/core/ports"
"github.com/thushan/olla/internal/util"
)
+// executePassthroughRequest handles requests that can be forwarded directly to backends
+// without translation (e.g. Anthropic API requests to vLLM with native Anthropic support).
+// bodyBytes is the pre-buffered request body from the handler, passed through to avoid re-reading.
+func (a *Application) executePassthroughRequest(
+ ctx context.Context,
+ w http.ResponseWriter,
+ r *http.Request,
+ bodyBytes []byte,
+ endpoints []*domain.Endpoint,
+ pr *proxyRequest,
+ trans translator.RequestTranslator,
+) {
+ // Get passthrough request details
+ passthroughTrans, ok := trans.(translator.PassthroughCapable)
+ if !ok {
+ // This should never happen since we checked the interface before calling this function
+ a.writeTranslatorError(w, trans, pr, fmt.Errorf("translator does not support passthrough"), http.StatusInternalServerError)
+ return
+ }
+
+ passthroughReq, err := passthroughTrans.PreparePassthrough(bodyBytes, r, a.profileLookup)
+ if err != nil {
+ a.writeTranslatorError(w, trans, pr, err, http.StatusBadRequest)
+ return
+ }
+
+ // Update proxy request details - capture streaming flag for accurate metrics
+ // (StreamingMs isn't populated in passthrough mode since we don't intercept the stream)
+ pr.isStreaming = passthroughReq.IsStreaming
+
+ pr.requestLogger.Info("using passthrough mode (native Anthropic support)",
+ "model", passthroughReq.ModelName,
+ "streaming", passthroughReq.IsStreaming,
+ "endpoints", len(endpoints))
+
+ // Set request body and path
+ r.Body = io.NopCloser(bytes.NewReader(passthroughReq.Body))
+ r.ContentLength = int64(len(passthroughReq.Body))
+ r.URL.Path = passthroughReq.TargetPath
+
+ // Add passthrough mode header for observability
+ w.Header().Set("X-Olla-Mode", "passthrough")
+
+ // Prepare context
+ ctx, r = a.prepareProxyContext(ctx, r, pr)
+
+ // Log request start
+ a.logRequestStart(pr, len(endpoints))
+
+ // Execute proxy
+ err = a.proxyService.ProxyRequestToEndpoints(ctx, w, r, endpoints, pr.stats, pr.requestLogger)
+
+ a.logRequestResult(pr, err)
+
+ if err != nil {
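+ pr.hadError = true // count as a failure in metrics even if the response already started
+ if w.Header().Get(constants.HeaderContentType) == "" { // only write an error body if the response hasn't started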
+ a.writeTranslatorError(w, trans, pr, fmt.Errorf("proxy error: %w", err), http.StatusBadGateway)
+ }
+ }
+
+ pr.stats.EndTime = time.Now()
+}
+
+// executeTranslationRequest handles the translation path where requests are converted
+// from the translator's native format (e.g. Anthropic) to OpenAI format for the backend
+func (a *Application) executeTranslationRequest(
+ ctx context.Context,
+ w http.ResponseWriter,
+ r *http.Request,
+ endpoints []*domain.Endpoint,
+ pr *proxyRequest,
+ trans translator.RequestTranslator,
+ transformedReq *translator.TransformedRequest,
+) {
+ // Capture streaming flag for metrics before proxying
+ pr.isStreaming = transformedReq.IsStreaming
+
+ // Serialize OpenAI request
+ openaiBody, err := json.Marshal(transformedReq.OpenAIRequest)
+ if err != nil {
+ a.writeTranslatorError(w, trans, pr, fmt.Errorf("failed to serialize request: %w", err), http.StatusInternalServerError)
+ return
+ }
+
+ r.Body = io.NopCloser(bytes.NewReader(openaiBody))
+ r.ContentLength = int64(len(openaiBody))
+
+ // Handle path translation if specified
+ if transformedReq.TargetPath != "" {
+ targetPath := util.StripPrefix(transformedReq.TargetPath, constants.DefaultOllaProxyPathPrefix)
+
+ if targetPath != transformedReq.TargetPath {
+ pr.requestLogger.Warn("TargetPath included proxy prefix, stripped it",
+ "translator", trans.Name(),
+ "proxy_prefix", constants.DefaultOllaProxyPathPrefix,
+ "original_target", transformedReq.TargetPath,
+ "corrected_target", targetPath)
+ }
+
+ pr.requestLogger.Debug("Path translation applied",
+ "original_path", r.URL.Path,
+ "target_path", targetPath,
+ "translator", trans.Name())
+ r.URL.Path = targetPath
+ } else if trans.Name() != "passthrough" {
+ // warn if translator might need path translation (passthrough can ignore)
+ pr.requestLogger.Warn("Translator did not set TargetPath, using original path",
+ "translator", trans.Name(),
+ "original_path", r.URL.Path,
+ "note", "This may cause routing issues if translation requires different endpoint")
+ }
+
+ a.logRequestStart(pr, len(endpoints))
+
+ // Execute proxy with appropriate response handling (streaming vs non-streaming)
+ var proxyErr error
+ if transformedReq.IsStreaming {
+ proxyErr = a.executeTranslatedStreamingRequest(ctx, w, r, endpoints, pr, trans)
+ } else {
+ proxyErr = a.executeTranslatedNonStreamingRequest(ctx, w, r, endpoints, pr, trans)
+ }
+
+ if proxyErr == nil {
+ pr.requestLogger.Debug("Translation request completed successfully",
+ "translator", trans.Name(),
+ "model", pr.model,
+ "path_translated", transformedReq.TargetPath != "",
+ "target_path", transformedReq.TargetPath,
+ "streaming", transformedReq.IsStreaming)
+ }
+
+ a.logRequestResult(pr, proxyErr)
+
+ if proxyErr != nil {
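+ pr.hadError = true // count as a failure in metrics even if the response already started
+ if w.Header().Get(constants.HeaderContentType) == "" { // only write an error body if the response hasn't started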
+ a.writeTranslatorError(w, trans, pr, fmt.Errorf("proxy error: %w", proxyErr), http.StatusBadGateway)
+ }
+ }
+
+ pr.stats.EndTime = time.Now()
+}
+
// generic handler for any translator (eg anthropic to openai and back)
func (a *Application) translationHandler(trans translator.RequestTranslator) http.HandlerFunc {
+ // Resolve body size limit once at registration time, not per-request.
+ // Translators that implement BodySizeLimiter declare their own max;
+ // others get a safe default.
+ var maxBodySize int64 = 10 << 20 // 10 MiB default
+ if limiter, ok := trans.(translator.BodySizeLimiter); ok {
+ maxBodySize = limiter.MaxBodySize()
+ }
+
return func(w http.ResponseWriter, r *http.Request) {
pr := a.initializeProxyRequest(r)
ctx, r := a.setupRequestContext(r, pr.stats)
- transformedReq, err := trans.TransformRequest(ctx, r)
+ // Buffer body once -- both passthrough and translation paths need it.
+ // Read maxBodySize+1 to detect oversized requests before JSON parsing
+ bodyBytes, err := io.ReadAll(io.LimitReader(r.Body, maxBodySize+1))
if err != nil {
a.writeTranslatorError(w, trans, pr, err, http.StatusBadRequest)
+ a.recordTranslatorMetrics(trans, pr, constants.TranslatorModeTranslation, constants.FallbackReasonNone)
return
}
- pr.model = transformedReq.ModelName
- pr.stats.Model = pr.model
+ // Explicitly check for oversized body (return 413 instead of confusing JSON parse error)
+ if int64(len(bodyBytes)) > maxBodySize {
+ a.writeTranslatorError(w, trans, pr,
+ fmt.Errorf("request body exceeds maximum size (%d bytes)", maxBodySize),
+ http.StatusRequestEntityTooLarge)
+ a.recordTranslatorMetrics(trans, pr, constants.TranslatorModeTranslation, constants.FallbackReasonNone)
+ return
+ }
- // serialize for proxy
- openaiBody, err := json.Marshal(transformedReq.OpenAIRequest)
+ // Lightweight model extraction via gjson -- avoids a full TransformRequest
+ // parse on the passthrough path, where the body would otherwise be parsed twice
+ // (once here for the model name, once in PreparePassthrough for validation).
+ modelName, err := translator.ExtractModelName(bodyBytes)
if err != nil {
- a.writeTranslatorError(w, trans, pr, fmt.Errorf("failed to serialize request"), http.StatusInternalServerError)
+ a.writeTranslatorError(w, trans, pr, err, http.StatusBadRequest)
+ a.recordTranslatorMetrics(trans, pr, constants.TranslatorModeTranslation, constants.FallbackReasonNone)
return
}
- r.Body = io.NopCloser(bytes.NewReader(openaiBody))
- r.ContentLength = int64(len(openaiBody))
-
- if transformedReq.TargetPath != "" {
- targetPath := util.StripPrefix(transformedReq.TargetPath, constants.DefaultOllaProxyPathPrefix)
-
- if targetPath != transformedReq.TargetPath {
- pr.requestLogger.Warn("TargetPath included proxy prefix, stripped it",
- "translator", trans.Name(),
- "proxy_prefix", constants.DefaultOllaProxyPathPrefix,
- "original_target", transformedReq.TargetPath,
- "corrected_target", targetPath)
- }
-
- pr.requestLogger.Debug("Path translation applied",
- "original_path", r.URL.Path,
- "target_path", targetPath,
- "translator", trans.Name())
- r.URL.Path = targetPath
- } else if trans.Name() != "passthrough" {
- // warn if translator might need path translation (passthrough can ignore)
- pr.requestLogger.Warn("Translator did not set TargetPath, using original path",
- "translator", trans.Name(),
- "original_path", r.URL.Path,
- "note", "This may cause routing issues if translation requires different endpoint")
- }
+ pr.model = modelName
+ pr.stats.Model = pr.model
- // run through proxy pipeline (inspector, security, routing)
+ // Run through proxy pipeline (inspector, security, routing)
a.analyzeRequest(ctx, r, pr)
// Get compatible endpoints for this request
endpoints, err := a.getCompatibleEndpoints(ctx, pr)
if err != nil {
a.writeTranslatorError(w, trans, pr, fmt.Errorf("no healthy endpoints available"), http.StatusServiceUnavailable)
+ a.recordTranslatorMetrics(trans, pr, constants.TranslatorModeTranslation, constants.FallbackReasonNoCompatibleEndpoints)
return
}
// OLLA-282: When no endpoints available, Olla hangs until timeout
- // make shure that we have at least one endpoint available
+ // make sure that we have at least one endpoint available
// prevents hanging when model routing fails to find compatible backends
if len(endpoints) == 0 {
pr.requestLogger.Warn("No endpoints available for model",
@@ -83,37 +226,47 @@ func (a *Application) translationHandler(trans translator.RequestTranslator) htt
a.writeTranslatorError(w, trans, pr,
fmt.Errorf("no healthy endpoints available for model: %s", pr.model),
http.StatusNotFound)
+ a.recordTranslatorMetrics(trans, pr, constants.TranslatorModeTranslation, constants.FallbackReasonNoCompatibleEndpoints)
return
}
- a.logRequestStart(pr, len(endpoints))
+ // Determine mode and fallback reason
+ var mode constants.TranslatorMode
+ var fallbackReason constants.TranslatorFallbackReason
- // Execute proxy request with appropriate response handling
- // streaming vs non-streaming need different handling
- var proxyErr error
- if transformedReq.IsStreaming {
- proxyErr = a.executeTranslatedStreamingRequest(ctx, w, r, endpoints, pr, trans)
+ // Check for passthrough capability
+ if passthroughTrans, ok := trans.(translator.PassthroughCapable); ok {
+ if a.profileLookup != nil && passthroughTrans.CanPassthrough(endpoints, a.profileLookup) {
+ // Passthrough mode -- bodyBytes goes directly to PreparePassthrough
+ // which validates without re-reading. No TransformRequest needed.
+ mode = constants.TranslatorModePassthrough
+ fallbackReason = constants.FallbackReasonNone
+
+ a.executePassthroughRequest(ctx, w, r, bodyBytes, endpoints, pr, trans)
+ a.recordTranslatorMetrics(trans, pr, mode, fallbackReason)
+ return
+ }
+ // Translation mode with fallback reason
+ mode = constants.TranslatorModeTranslation
+ fallbackReason = constants.FallbackReasonCannotPassthrough
} else {
- proxyErr = a.executeTranslatedNonStreamingRequest(ctx, w, r, endpoints, pr, trans)
+ // Translator doesn't support passthrough
+ mode = constants.TranslatorModeTranslation
+ fallbackReason = constants.FallbackReasonTranslatorDoesNotSupportPassthrough
}
- if proxyErr == nil {
- pr.requestLogger.Debug("Translation request completed successfully",
- "translator", trans.Name(),
- "model", pr.model,
- "path_translated", transformedReq.TargetPath != "",
- "target_path", transformedReq.TargetPath,
- "streaming", transformedReq.IsStreaming)
+ // Translation path only -- perform the full parse and format conversion.
+ // This is deferred to here so passthrough requests never pay the cost.
+ r.Body = io.NopCloser(bytes.NewReader(bodyBytes))
+ transformedReq, err := trans.TransformRequest(ctx, r)
+ if err != nil {
+ a.writeTranslatorError(w, trans, pr, err, http.StatusBadRequest)
+ a.recordTranslatorMetrics(trans, pr, mode, fallbackReason)
+ return
}
- a.logRequestResult(pr, proxyErr)
-
- if proxyErr != nil {
- // only write error if response hasn't started
- if w.Header().Get(constants.HeaderContentType) == "" {
- a.writeTranslatorError(w, trans, pr, fmt.Errorf("proxy error: %w", proxyErr), http.StatusBadGateway)
- }
- }
+ a.executeTranslationRequest(ctx, w, r, endpoints, pr, trans, transformedReq)
+ a.recordTranslatorMetrics(trans, pr, mode, fallbackReason)
}
}
@@ -490,6 +643,8 @@ func (a *Application) writeTranslatorError(
err error,
statusCode int,
) {
+ pr.hadError = true
+
pr.requestLogger.Error("Translation request failed",
"translator", trans.Name(),
"error", err.Error(),
@@ -578,6 +733,40 @@ func (a *Application) copyOllaHeaders(from headerGetter, to http.ResponseWriter)
}
}
+// recordTranslatorMetrics records metrics for translator requests
+func (a *Application) recordTranslatorMetrics(
+ trans translator.RequestTranslator,
+ pr *proxyRequest,
+ mode constants.TranslatorMode,
+ fallbackReason constants.TranslatorFallbackReason,
+) {
+ // Calculate latency from request stats
+ latency := time.Since(pr.stats.StartTime)
+ if !pr.stats.EndTime.IsZero() {
+ latency = pr.stats.EndTime.Sub(pr.stats.StartTime)
+ }
+
+ // Determine if request was successful (no error flag set)
+ success := !pr.hadError
+
+ // Use the streaming flag captured during request preparation rather than
+ // inferring from StreamingMs, which isn't populated in passthrough mode
+ isStreaming := pr.isStreaming
+
+ // Record the event
+ event := ports.TranslatorRequestEvent{
+ TranslatorName: trans.Name(),
+ Model: pr.model,
+ Mode: mode,
+ FallbackReason: fallbackReason,
+ Success: success,
+ Latency: latency,
+ IsStreaming: isStreaming,
+ }
+
+ a.statsCollector.RecordTranslatorRequest(event)
+}
+
// abstract header access for both response types
type headerGetter interface {
Header() http.Header
diff --git a/internal/app/handlers/handler_translation_passthrough_test.go b/internal/app/handlers/handler_translation_passthrough_test.go
new file mode 100644
index 00000000..cbc1a3bc
--- /dev/null
+++ b/internal/app/handlers/handler_translation_passthrough_test.go
@@ -0,0 +1,1702 @@
+package handlers
+
+import (
+ "bytes"
+ "context"
+ "encoding/json"
+ "fmt"
+ "io"
+ "net/http"
+ "net/http/httptest"
+ "net/url"
+ "sync"
+ "testing"
+ "time"
+
+ "github.com/stretchr/testify/assert"
+ "github.com/stretchr/testify/require"
+ "github.com/thushan/olla/internal/adapter/inspector"
+ "github.com/thushan/olla/internal/adapter/translator"
+ "github.com/thushan/olla/internal/config"
+ "github.com/thushan/olla/internal/core/constants"
+ "github.com/thushan/olla/internal/core/domain"
+ "github.com/thushan/olla/internal/core/ports"
+ "github.com/thushan/olla/internal/logger"
+)
+
+// TestTranslationHandler_PassthroughNonStreaming tests end-to-end passthrough for non-streaming requests
+func TestTranslationHandler_PassthroughNonStreaming(t *testing.T) {
+ // Setup mock backend that accepts Anthropic format
+ backendCalled := false
+ var receivedBody []byte
+ var receivedPath string
+
+ mockBackend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+ backendCalled = true
+ receivedPath = r.URL.Path
+
+ // Read body to verify it's unchanged
+ body, _ := io.ReadAll(r.Body)
+ receivedBody = body
+
+ // Return Anthropic format response
+ response := map[string]interface{}{
+ "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
+ "type": "message",
+ "role": "assistant",
+ "content": []map[string]interface{}{{"type": "text", "text": "Hello! How can I help you?"}},
+ "model": "claude-3-5-sonnet-20241022",
+ "usage": map[string]interface{}{
+ "input_tokens": 10,
+ "output_tokens": 20,
+ },
+ }
+
+ w.Header().Set(constants.HeaderContentType, constants.ContentTypeJSON)
+ w.Header().Set(constants.HeaderXOllaEndpoint, "test-backend")
+ w.Header().Set(constants.HeaderXOllaBackendType, "vllm")
+ w.Header().Set(constants.HeaderXOllaModel, "claude-3-5-sonnet-20241022")
+ w.WriteHeader(http.StatusOK)
+ json.NewEncoder(w).Encode(response)
+ }))
+ defer mockBackend.Close()
+
+ // Parse backend URL
+ backendURL, _ := url.Parse(mockBackend.URL)
+
+ // Setup endpoints with Anthropic support
+ endpoints := []*domain.Endpoint{
+ {
+ Name: "vllm-backend",
+ URL: backendURL,
+ URLString: mockBackend.URL,
+ Type: "vllm",
+ Status: domain.StatusHealthy,
+ },
+ }
+
+ // Create mock profile lookup that indicates Anthropic support
+ profileLookup := &mockPassthroughProfileLookup{
+ configs: map[string]*domain.AnthropicSupportConfig{
+ "vllm": {
+ Enabled: true,
+ MessagesPath: "/v1/messages",
+ },
+ },
+ }
+
+ // Create passthrough-capable translator
+ trans := &mockPassthroughTranslator{
+ name: "anthropic",
+ implementsErrorWriter: true,
+ passthroughEnabled: true,
+ profileLookup: profileLookup,
+ writeErrorFunc: func(w http.ResponseWriter, err error, statusCode int) {
+ w.Header().Set(constants.HeaderContentType, constants.ContentTypeJSON)
+ w.WriteHeader(statusCode)
+ json.NewEncoder(w).Encode(map[string]interface{}{"error": err.Error()})
+ },
+ }
+
+ // Create proxy service that forwards to backend
+ proxyService := &mockProxyService{
+ proxyFunc: func(ctx context.Context, w http.ResponseWriter, r *http.Request, eps []*domain.Endpoint, stats *ports.RequestStats, rlog logger.StyledLogger) error {
+ // Forward to backend
+ client := &http.Client{Timeout: 5 * time.Second}
+ backendReq, err := http.NewRequestWithContext(ctx, r.Method, eps[0].URLString+r.URL.Path, r.Body)
+ if err != nil {
+ return err
+ }
+ backendReq.Header = r.Header.Clone()
+
+ resp, err := client.Do(backendReq)
+ if err != nil {
+ return err
+ }
+ defer resp.Body.Close()
+
+ // Copy headers
+ for k, v := range resp.Header {
+ w.Header()[k] = v
+ }
+ w.WriteHeader(resp.StatusCode)
+
+ _, err = io.Copy(w, resp.Body)
+ return err
+ },
+ }
+
+ // Create discovery service that returns our endpoints
+ discoveryService := &mockDiscoveryServiceWithEndpoints{
+ endpoints: endpoints,
+ }
+
+ app := &Application{
+ logger: &mockStyledLogger{},
+ proxyService: proxyService,
+ statsCollector: &mockStatsCollector{},
+ repository: &mockEndpointRepository{getEndpointsFunc: func() []*domain.Endpoint { return endpoints }},
+ inspectorChain: inspector.NewChain(&mockStyledLogger{}),
+ profileFactory: &mockProfileFactory{},
+ profileLookup: profileLookup,
+ discoveryService: discoveryService,
+ Config: &config.Config{},
+ }
+
+ handler := app.translationHandler(trans)
+
+ // Create Anthropic request
+ anthropicReq := map[string]interface{}{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 1024,
+ "messages": []map[string]interface{}{
+ {"role": "user", "content": "Hello"},
+ },
+ }
+ reqBody, _ := json.Marshal(anthropicReq)
+
+ req := httptest.NewRequest(http.MethodPost, "/olla/anthropic/v1/messages", bytes.NewReader(reqBody))
+ req.Header.Set(constants.HeaderContentType, constants.ContentTypeJSON)
+
+ rec := httptest.NewRecorder()
+ handler.ServeHTTP(rec, req)
+
+ // Assertions
+ assert.True(t, backendCalled, "Backend should have been called")
+ assert.Equal(t, http.StatusOK, rec.Code)
+
+ // Verify passthrough mode header
+ assert.Equal(t, "passthrough", rec.Header().Get("X-Olla-Mode"), "Should have passthrough mode header")
+
+ // Verify request was passed through unchanged
+ var receivedReq map[string]interface{}
+ err := json.Unmarshal(receivedBody, &receivedReq)
+ require.NoError(t, err)
+ assert.Equal(t, "claude-3-5-sonnet-20241022", receivedReq["model"])
+ assert.Equal(t, float64(1024), receivedReq["max_tokens"])
+
+ // Verify path
+ assert.Equal(t, "/v1/messages", receivedPath)
+
+ // Verify response
+ var response map[string]interface{}
+ err = json.Unmarshal(rec.Body.Bytes(), &response)
+ require.NoError(t, err)
+ assert.Equal(t, "message", response["type"])
+ assert.Equal(t, "msg_01XFDUDYJgAACzvnptvVoYEL", response["id"])
+
+ // Verify X-Olla headers preserved
+ assert.NotEmpty(t, rec.Header().Get(constants.HeaderXOllaEndpoint))
+ assert.NotEmpty(t, rec.Header().Get(constants.HeaderXOllaBackendType))
+}
+
+// TestTranslationHandler_PassthroughStreaming tests end-to-end passthrough for streaming requests
+func TestTranslationHandler_PassthroughStreaming(t *testing.T) {
+ backendCalled := false
+ var receivedPath string
+
+ // Setup mock backend that returns SSE stream
+ mockBackend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+ backendCalled = true
+ receivedPath = r.URL.Path
+
+ // Return SSE stream in Anthropic format
+ w.Header().Set(constants.HeaderContentType, "text/event-stream")
+ w.Header().Set(constants.HeaderXOllaEndpoint, "test-backend")
+ w.Header().Set(constants.HeaderXOllaBackendType, "vllm")
+ w.WriteHeader(http.StatusOK)
+
+ // Write SSE events
+ events := []string{
+ `event: message_start
+data: {"type":"message_start","message":{"id":"msg_123","type":"message","role":"assistant"}}
+
+`,
+ `event: content_block_delta
+data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hello"}}
+
+`,
+ `event: message_delta
+data: {"type":"message_delta","delta":{"stop_reason":"end_turn"}}
+
+`,
+ `event: message_stop
+data: {"type":"message_stop"}
+
+`,
+ }
+
+ for _, event := range events {
+ fmt.Fprint(w, event)
+ if f, ok := w.(http.Flusher); ok {
+ f.Flush()
+ }
+ }
+ }))
+ defer mockBackend.Close()
+
+ backendURL, _ := url.Parse(mockBackend.URL)
+
+ endpoints := []*domain.Endpoint{
+ {
+ Name: "vllm-backend",
+ URL: backendURL,
+ URLString: mockBackend.URL,
+ Type: "vllm",
+ Status: domain.StatusHealthy,
+ },
+ }
+
+ profileLookup := &mockPassthroughProfileLookup{
+ configs: map[string]*domain.AnthropicSupportConfig{
+ "vllm": {
+ Enabled: true,
+ MessagesPath: "/v1/messages",
+ },
+ },
+ }
+
+ trans := &mockPassthroughTranslator{
+ name: "anthropic",
+ passthroughEnabled: true,
+ profileLookup: profileLookup,
+ transformRequestFunc: func(ctx context.Context, r *http.Request) (*translator.TransformedRequest, error) {
+ return &translator.TransformedRequest{
+ ModelName: "claude-3-5-sonnet-20241022",
+ IsStreaming: true,
+ }, nil
+ },
+ }
+
+ proxyService := &mockProxyService{
+ proxyFunc: func(ctx context.Context, w http.ResponseWriter, r *http.Request, eps []*domain.Endpoint, stats *ports.RequestStats, rlog logger.StyledLogger) error {
+ client := &http.Client{Timeout: 5 * time.Second}
+ backendReq, err := http.NewRequestWithContext(ctx, r.Method, eps[0].URLString+r.URL.Path, r.Body)
+ if err != nil {
+ return err
+ }
+
+ resp, err := client.Do(backendReq)
+ if err != nil {
+ return err
+ }
+ defer resp.Body.Close()
+
+ for k, v := range resp.Header {
+ w.Header()[k] = v
+ }
+ w.WriteHeader(resp.StatusCode)
+
+ _, err = io.Copy(w, resp.Body)
+ return err
+ },
+ }
+
+ discoveryService := &mockDiscoveryServiceWithEndpoints{
+ endpoints: endpoints,
+ }
+
+ app := &Application{
+ logger: &mockStyledLogger{},
+ proxyService: proxyService,
+ statsCollector: &mockStatsCollector{},
+ repository: &mockEndpointRepository{getEndpointsFunc: func() []*domain.Endpoint { return endpoints }},
+ inspectorChain: inspector.NewChain(&mockStyledLogger{}),
+ profileFactory: &mockProfileFactory{},
+ profileLookup: profileLookup,
+ discoveryService: discoveryService,
+ Config: &config.Config{},
+ }
+
+ handler := app.translationHandler(trans)
+
+ anthropicReq := map[string]interface{}{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 1024,
+ "stream": true,
+ "messages": []map[string]interface{}{
+ {"role": "user", "content": "Hello"},
+ },
+ }
+ reqBody, _ := json.Marshal(anthropicReq)
+
+ req := httptest.NewRequest("POST", "/olla/anthropic/v1/messages", bytes.NewReader(reqBody))
+ req.Header.Set(constants.HeaderContentType, constants.ContentTypeJSON)
+
+ rec := httptest.NewRecorder()
+ handler.ServeHTTP(rec, req)
+
+ // Assertions
+ assert.True(t, backendCalled, "Backend should have been called")
+ assert.Equal(t, http.StatusOK, rec.Code)
+
+ // Verify passthrough mode header
+ assert.Equal(t, "passthrough", rec.Header().Get("X-Olla-Mode"))
+
+ // Verify SSE content type
+ assert.Equal(t, "text/event-stream", rec.Header().Get(constants.HeaderContentType))
+
+ // Verify path
+ assert.Equal(t, "/v1/messages", receivedPath)
+
+ // Verify SSE events are passed through
+ body := rec.Body.String()
+ assert.Contains(t, body, "event: message_start")
+ assert.Contains(t, body, "event: content_block_delta")
+ assert.Contains(t, body, "event: message_delta")
+ assert.Contains(t, body, "event: message_stop")
+
+ // Verify X-Olla headers
+ assert.NotEmpty(t, rec.Header().Get(constants.HeaderXOllaEndpoint))
+}
+
+// TestTranslationHandler_PassthroughWithMultipleEndpoints tests passthrough when several Anthropic-capable endpoints are available
+func TestTranslationHandler_PassthroughWithMultipleEndpoints(t *testing.T) {
+ backendsCalled := make(map[string]bool)
+
+ // Setup multiple mock backends
+ backend1 := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+ backendsCalled["backend1"] = true
+ response := map[string]interface{}{"id": "msg_01", "type": "message", "role": "assistant"}
+ w.Header().Set(constants.HeaderContentType, constants.ContentTypeJSON)
+ w.Header().Set(constants.HeaderXOllaEndpoint, "vllm-1")
+ w.WriteHeader(http.StatusOK)
+ json.NewEncoder(w).Encode(response)
+ }))
+ defer backend1.Close()
+
+ backend2 := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+ backendsCalled["backend2"] = true
+ response := map[string]interface{}{"id": "msg_02", "type": "message", "role": "assistant"}
+ w.Header().Set(constants.HeaderContentType, constants.ContentTypeJSON)
+ w.Header().Set(constants.HeaderXOllaEndpoint, "vllm-2")
+ w.WriteHeader(http.StatusOK)
+ json.NewEncoder(w).Encode(response)
+ }))
+ defer backend2.Close()
+
+ backendURL1, _ := url.Parse(backend1.URL)
+ backendURL2, _ := url.Parse(backend2.URL)
+
+ endpoints := []*domain.Endpoint{
+ {
+ Name: "vllm-1",
+ URL: backendURL1,
+ URLString: backend1.URL,
+ Type: "vllm",
+ Status: domain.StatusHealthy,
+ },
+ {
+ Name: "vllm-2",
+ URL: backendURL2,
+ URLString: backend2.URL,
+ Type: "vllm",
+ Status: domain.StatusHealthy,
+ },
+ }
+
+ profileLookup := &mockPassthroughProfileLookup{
+ configs: map[string]*domain.AnthropicSupportConfig{
+ "vllm": {
+ Enabled: true,
+ MessagesPath: "/v1/messages",
+ },
+ },
+ }
+
+ trans := &mockPassthroughTranslator{
+ name: "anthropic",
+ passthroughEnabled: true,
+ profileLookup: profileLookup,
+ }
+
+ proxyService := &mockProxyService{
+ proxyFunc: func(ctx context.Context, w http.ResponseWriter, r *http.Request, eps []*domain.Endpoint, stats *ports.RequestStats, rlog logger.StyledLogger) error {
+ // Use first endpoint (simulating load balancer selection)
+ client := &http.Client{Timeout: 5 * time.Second}
+ backendReq, err := http.NewRequestWithContext(ctx, r.Method, eps[0].URLString+r.URL.Path, r.Body)
+ if err != nil {
+ return err
+ }
+
+ resp, err := client.Do(backendReq)
+ if err != nil {
+ return err
+ }
+ defer resp.Body.Close()
+
+ for k, v := range resp.Header {
+ w.Header()[k] = v
+ }
+ w.WriteHeader(resp.StatusCode)
+ _, err = io.Copy(w, resp.Body)
+ return err
+ },
+ }
+
+ discoveryService := &mockDiscoveryServiceWithEndpoints{
+ endpoints: endpoints,
+ }
+
+ app := &Application{
+ logger: &mockStyledLogger{},
+ proxyService: proxyService,
+ statsCollector: &mockStatsCollector{},
+ repository: &mockEndpointRepository{getEndpointsFunc: func() []*domain.Endpoint { return endpoints }},
+ inspectorChain: inspector.NewChain(&mockStyledLogger{}),
+ profileFactory: &mockProfileFactory{},
+ profileLookup: profileLookup,
+ discoveryService: discoveryService,
+ Config: &config.Config{},
+ }
+
+ handler := app.translationHandler(trans)
+
+ anthropicReq := map[string]interface{}{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 1024,
+ "messages": []map[string]interface{}{
+ {"role": "user", "content": "Hello"},
+ },
+ }
+ reqBody, _ := json.Marshal(anthropicReq)
+
+ req := httptest.NewRequest("POST", "/olla/anthropic/v1/messages", bytes.NewReader(reqBody))
+ rec := httptest.NewRecorder()
+ handler.ServeHTTP(rec, req)
+
+ // Verify at least one backend was called
+ assert.NotEmpty(t, backendsCalled, "At least one backend should be called")
+
+ // Verify passthrough mode
+ assert.Equal(t, "passthrough", rec.Header().Get("X-Olla-Mode"))
+ assert.Equal(t, http.StatusOK, rec.Code)
+}
+
+// TestTranslationHandler_FallbackToTranslation_MixedEndpoints tests fallback when endpoints have mixed support
+func TestTranslationHandler_FallbackToTranslation_MixedEndpoints(t *testing.T) {
+ translationUsed := false
+
+ endpoints := []*domain.Endpoint{
+ {Name: "vllm-1", Type: "vllm", Status: domain.StatusHealthy},
+ {Name: "ollama-1", Type: "ollama", Status: domain.StatusHealthy}, // No Anthropic support
+ }
+
+ profileLookup := &mockPassthroughProfileLookup{
+ configs: map[string]*domain.AnthropicSupportConfig{
+ "vllm": {
+ Enabled: true,
+ MessagesPath: "/v1/messages",
+ },
+ // ollama has no config (returns nil)
+ },
+ }
+
+ trans := &mockPassthroughTranslator{
+ name: "anthropic",
+ passthroughEnabled: true,
+ profileLookup: profileLookup,
+ transformRequestFunc: func(ctx context.Context, r *http.Request) (*translator.TransformedRequest, error) {
+ translationUsed = true
+ return &translator.TransformedRequest{
+ OpenAIRequest: map[string]interface{}{
+ "model": "claude-3-5-sonnet-20241022",
+ "messages": []interface{}{
+ map[string]interface{}{"role": "user", "content": "test"},
+ },
+ },
+ ModelName: "claude-3-5-sonnet-20241022",
+ IsStreaming: false,
+ TargetPath: "/v1/chat/completions",
+ }, nil
+ },
+ }
+
+ proxyService := &mockProxyService{
+ proxyFunc: func(ctx context.Context, w http.ResponseWriter, r *http.Request, eps []*domain.Endpoint, stats *ports.RequestStats, rlog logger.StyledLogger) error {
+ response := map[string]interface{}{
+ "id": "chatcmpl-123",
+ "object": "chat.completion",
+ "choices": []interface{}{},
+ }
+ w.Header().Set(constants.HeaderContentType, constants.ContentTypeJSON)
+ w.WriteHeader(http.StatusOK)
+ return json.NewEncoder(w).Encode(response)
+ },
+ }
+
+ discoveryService := &mockDiscoveryServiceWithEndpoints{
+ endpoints: endpoints,
+ }
+
+ app := &Application{
+ logger: &mockStyledLogger{},
+ proxyService: proxyService,
+ statsCollector: &mockStatsCollector{},
+ repository: &mockEndpointRepository{getEndpointsFunc: func() []*domain.Endpoint { return endpoints }},
+ inspectorChain: inspector.NewChain(&mockStyledLogger{}),
+ profileFactory: &mockProfileFactory{},
+ profileLookup: profileLookup,
+ discoveryService: discoveryService,
+ Config: &config.Config{},
+ }
+
+ handler := app.translationHandler(trans)
+
+ anthropicReq := map[string]interface{}{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 1024,
+ "messages": []map[string]interface{}{
+ {"role": "user", "content": "Hello"},
+ },
+ }
+ reqBody, _ := json.Marshal(anthropicReq)
+
+ req := httptest.NewRequest("POST", "/olla/anthropic/v1/messages", bytes.NewReader(reqBody))
+ rec := httptest.NewRecorder()
+ handler.ServeHTTP(rec, req)
+
+ // Verify translation mode was used (not passthrough)
+ assert.True(t, translationUsed, "Translation should be used when endpoints have mixed support")
+ assert.Equal(t, http.StatusOK, rec.Code)
+ assert.NotEqual(t, "passthrough", rec.Header().Get("X-Olla-Mode"), "Should not use passthrough mode")
+}
+
+// TestTranslationHandler_FallbackToTranslation_PassthroughDisabled tests fallback when passthrough is disabled
+func TestTranslationHandler_FallbackToTranslation_PassthroughDisabled(t *testing.T) {
+ translationUsed := false
+
+ endpoints := []*domain.Endpoint{
+ {Name: "vllm-1", Type: "vllm", Status: domain.StatusHealthy},
+ }
+
+ profileLookup := &mockPassthroughProfileLookup{
+ configs: map[string]*domain.AnthropicSupportConfig{
+ "vllm": {
+ Enabled: true,
+ MessagesPath: "/v1/messages",
+ },
+ },
+ }
+
+ // Translator with passthrough disabled
+ trans := &mockPassthroughTranslator{
+ name: "anthropic",
+ passthroughEnabled: false, // Disabled
+ profileLookup: profileLookup,
+ transformRequestFunc: func(ctx context.Context, r *http.Request) (*translator.TransformedRequest, error) {
+ translationUsed = true
+ return &translator.TransformedRequest{
+ OpenAIRequest: map[string]interface{}{
+ "model": "claude-3-5-sonnet-20241022",
+ "messages": []interface{}{
+ map[string]interface{}{"role": "user", "content": "test"},
+ },
+ },
+ ModelName: "claude-3-5-sonnet-20241022",
+ IsStreaming: false,
+ TargetPath: "/v1/chat/completions",
+ }, nil
+ },
+ }
+
+ proxyService := &mockProxyService{
+ proxyFunc: func(ctx context.Context, w http.ResponseWriter, r *http.Request, eps []*domain.Endpoint, stats *ports.RequestStats, rlog logger.StyledLogger) error {
+ response := map[string]interface{}{
+ "id": "chatcmpl-123",
+ "object": "chat.completion",
+ "choices": []interface{}{},
+ }
+ w.Header().Set(constants.HeaderContentType, constants.ContentTypeJSON)
+ w.WriteHeader(http.StatusOK)
+ return json.NewEncoder(w).Encode(response)
+ },
+ }
+
+ discoveryService := &mockDiscoveryServiceWithEndpoints{
+ endpoints: endpoints,
+ }
+
+ app := &Application{
+ logger: &mockStyledLogger{},
+ proxyService: proxyService,
+ statsCollector: &mockStatsCollector{},
+ repository: &mockEndpointRepository{getEndpointsFunc: func() []*domain.Endpoint { return endpoints }},
+ inspectorChain: inspector.NewChain(&mockStyledLogger{}),
+ profileFactory: &mockProfileFactory{},
+ profileLookup: profileLookup,
+ discoveryService: discoveryService,
+ Config: &config.Config{},
+ }
+
+ handler := app.translationHandler(trans)
+
+ anthropicReq := map[string]interface{}{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 1024,
+ "messages": []map[string]interface{}{
+ {"role": "user", "content": "Hello"},
+ },
+ }
+ reqBody, _ := json.Marshal(anthropicReq)
+
+ req := httptest.NewRequest("POST", "/olla/anthropic/v1/messages", bytes.NewReader(reqBody))
+ rec := httptest.NewRecorder()
+ handler.ServeHTTP(rec, req)
+
+ // Verify translation mode was used even though backend supports Anthropic
+ assert.True(t, translationUsed, "Translation should be used when passthrough is disabled")
+ assert.Equal(t, http.StatusOK, rec.Code)
+ assert.NotEqual(t, "passthrough", rec.Header().Get("X-Olla-Mode"))
+}
+
+// TestTranslationHandler_FallbackToTranslation_NoAnthropicSupport tests fallback for backends without support
+func TestTranslationHandler_FallbackToTranslation_NoAnthropicSupport(t *testing.T) {
+ translationUsed := false
+
+ endpoints := []*domain.Endpoint{
+ {Name: "ollama-1", Type: "ollama", Status: domain.StatusHealthy},
+ }
+
+ profileLookup := &mockPassthroughProfileLookup{
+ configs: map[string]*domain.AnthropicSupportConfig{
+ // ollama has no Anthropic support configured
+ },
+ }
+
+ trans := &mockPassthroughTranslator{
+ name: "anthropic",
+ passthroughEnabled: true,
+ profileLookup: profileLookup,
+ transformRequestFunc: func(ctx context.Context, r *http.Request) (*translator.TransformedRequest, error) {
+ translationUsed = true
+ return &translator.TransformedRequest{
+ OpenAIRequest: map[string]interface{}{
+ "model": "claude-3-5-sonnet-20241022",
+ "messages": []interface{}{
+ map[string]interface{}{"role": "user", "content": "test"},
+ },
+ },
+ ModelName: "claude-3-5-sonnet-20241022",
+ IsStreaming: false,
+ TargetPath: "/v1/chat/completions",
+ }, nil
+ },
+ }
+
+ proxyService := &mockProxyService{
+ proxyFunc: func(ctx context.Context, w http.ResponseWriter, r *http.Request, eps []*domain.Endpoint, stats *ports.RequestStats, rlog logger.StyledLogger) error {
+ response := map[string]interface{}{
+ "id": "chatcmpl-123",
+ "object": "chat.completion",
+ "choices": []interface{}{},
+ }
+ w.Header().Set(constants.HeaderContentType, constants.ContentTypeJSON)
+ w.WriteHeader(http.StatusOK)
+ return json.NewEncoder(w).Encode(response)
+ },
+ }
+
+ discoveryService := &mockDiscoveryServiceWithEndpoints{
+ endpoints: endpoints,
+ }
+
+ app := &Application{
+ logger: &mockStyledLogger{},
+ proxyService: proxyService,
+ statsCollector: &mockStatsCollector{},
+ repository: &mockEndpointRepository{getEndpointsFunc: func() []*domain.Endpoint { return endpoints }},
+ inspectorChain: inspector.NewChain(&mockStyledLogger{}),
+ profileFactory: &mockProfileFactory{},
+ profileLookup: profileLookup,
+ discoveryService: discoveryService,
+ Config: &config.Config{},
+ }
+
+ handler := app.translationHandler(trans)
+
+ anthropicReq := map[string]interface{}{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 1024,
+ "messages": []map[string]interface{}{
+ {"role": "user", "content": "Hello"},
+ },
+ }
+ reqBody, _ := json.Marshal(anthropicReq)
+
+ req := httptest.NewRequest("POST", "/olla/anthropic/v1/messages", bytes.NewReader(reqBody))
+ rec := httptest.NewRecorder()
+ handler.ServeHTTP(rec, req)
+
+ // Verify translation mode was used
+ assert.True(t, translationUsed, "Translation should be used when backend lacks Anthropic support")
+ assert.Equal(t, http.StatusOK, rec.Code)
+}
+
+// TestTranslationHandler_PassthroughErrorPreservation tests that errors are properly preserved in passthrough mode
+func TestTranslationHandler_PassthroughErrorPreservation(t *testing.T) {
+ // Setup mock backend that returns an error
+ mockBackend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+ w.Header().Set(constants.HeaderContentType, constants.ContentTypeJSON)
+ w.Header().Set(constants.HeaderXOllaEndpoint, "test-backend")
+ w.WriteHeader(http.StatusBadRequest)
+
+ errorResp := map[string]interface{}{
+ "type": "error",
+ "error": map[string]interface{}{
+ "type": "invalid_request_error",
+ "message": "Invalid model specified",
+ },
+ }
+ json.NewEncoder(w).Encode(errorResp)
+ }))
+ defer mockBackend.Close()
+
+ backendURL, _ := url.Parse(mockBackend.URL)
+
+ endpoints := []*domain.Endpoint{
+ {
+ Name: "vllm-backend",
+ URL: backendURL,
+ URLString: mockBackend.URL,
+ Type: "vllm",
+ Status: domain.StatusHealthy,
+ },
+ }
+
+ profileLookup := &mockPassthroughProfileLookup{
+ configs: map[string]*domain.AnthropicSupportConfig{
+ "vllm": {
+ Enabled: true,
+ MessagesPath: "/v1/messages",
+ },
+ },
+ }
+
+ trans := &mockPassthroughTranslator{
+ name: "anthropic",
+ passthroughEnabled: true,
+ profileLookup: profileLookup,
+ }
+
+ proxyService := &mockProxyService{
+ proxyFunc: func(ctx context.Context, w http.ResponseWriter, r *http.Request, eps []*domain.Endpoint, stats *ports.RequestStats, rlog logger.StyledLogger) error {
+ client := &http.Client{Timeout: 5 * time.Second}
+ backendReq, err := http.NewRequestWithContext(ctx, r.Method, eps[0].URLString+r.URL.Path, r.Body)
+ if err != nil {
+ return err
+ }
+
+ resp, err := client.Do(backendReq)
+ if err != nil {
+ return err
+ }
+ defer resp.Body.Close()
+
+ for k, v := range resp.Header {
+ w.Header()[k] = v
+ }
+ w.WriteHeader(resp.StatusCode)
+ _, err = io.Copy(w, resp.Body)
+ return err
+ },
+ }
+
+ discoveryService := &mockDiscoveryServiceWithEndpoints{
+ endpoints: endpoints,
+ }
+
+ app := &Application{
+ logger: &mockStyledLogger{},
+ proxyService: proxyService,
+ statsCollector: &mockStatsCollector{},
+ repository: &mockEndpointRepository{getEndpointsFunc: func() []*domain.Endpoint { return endpoints }},
+ inspectorChain: inspector.NewChain(&mockStyledLogger{}),
+ profileFactory: &mockProfileFactory{},
+ profileLookup: profileLookup,
+ discoveryService: discoveryService,
+ Config: &config.Config{},
+ }
+
+ handler := app.translationHandler(trans)
+
+ anthropicReq := map[string]interface{}{
+ "model": "invalid-model",
+ "max_tokens": 1024,
+ "messages": []map[string]interface{}{
+ {"role": "user", "content": "Hello"},
+ },
+ }
+ reqBody, _ := json.Marshal(anthropicReq)
+
+ req := httptest.NewRequest("POST", "/olla/anthropic/v1/messages", bytes.NewReader(reqBody))
+ rec := httptest.NewRecorder()
+ handler.ServeHTTP(rec, req)
+
+ // Verify error response is preserved
+ assert.Equal(t, http.StatusBadRequest, rec.Code)
+ assert.Equal(t, "passthrough", rec.Header().Get("X-Olla-Mode"))
+
+ var errorResp map[string]interface{}
+ err := json.Unmarshal(rec.Body.Bytes(), &errorResp)
+ require.NoError(t, err)
+ assert.Equal(t, "error", errorResp["type"])
+
+ errorObj, ok := errorResp["error"].(map[string]interface{})
+ require.True(t, ok, "error field should be a JSON object")
+ assert.Equal(t, "invalid_request_error", errorObj["type"])
+ assert.Contains(t, errorObj["message"], "Invalid model")
+}
+
+// TestTranslationHandler_ExistingTranslationTestsStillPass guards against regressions in the
+// standard translation flow now that passthrough support has been added
+func TestTranslationHandler_ExistingTranslationTestsStillPass(t *testing.T) {
+ // Run a basic translation request end to end through the handler
+ mockLogger := &mockStyledLogger{}
+ trans := &mockTranslator{
+ name: "test-translator",
+ implementsErrorWriter: true,
+ writeErrorFunc: func(w http.ResponseWriter, err error, statusCode int) {
+ w.Header().Set(constants.HeaderContentType, constants.ContentTypeJSON)
+ w.WriteHeader(statusCode)
+ json.NewEncoder(w).Encode(map[string]interface{}{
+ "error": err.Error(),
+ })
+ },
+ }
+
+ app := &Application{
+ logger: mockLogger,
+ proxyService: &mockProxyService{},
+ statsCollector: &mockStatsCollector{},
+ repository: &mockEndpointRepository{},
+ inspectorChain: inspector.NewChain(mockLogger),
+ profileFactory: &mockProfileFactory{},
+ discoveryService: &mockDiscoveryServiceForTranslation{},
+ Config: &config.Config{},
+ }
+
+ handler := app.translationHandler(trans)
+
+ reqBody := map[string]interface{}{
+ "model": "test-model",
+ "messages": []interface{}{
+ map[string]interface{}{
+ "role": "user",
+ "content": "Hello",
+ },
+ },
+ }
+ body, _ := json.Marshal(reqBody)
+ req := httptest.NewRequest("POST", "/test", bytes.NewReader(body))
+ req.Header.Set(constants.HeaderContentType, constants.ContentTypeJSON)
+
+ rec := httptest.NewRecorder()
+ handler.ServeHTTP(rec, req)
+
+ assert.Equal(t, http.StatusOK, rec.Code)
+ assert.Equal(t, constants.ContentTypeJSON, rec.Header().Get(constants.HeaderContentType))
+
+ // Verify X-Olla-* headers are preserved
+ assert.NotEmpty(t, rec.Header().Get(constants.HeaderXOllaRequestID))
+ assert.NotEmpty(t, rec.Header().Get(constants.HeaderXOllaEndpoint))
+}
+
+// Mock implementations for testing
+
+// mockPassthroughProfileLookup implements translator.ProfileLookup for testing
+type mockPassthroughProfileLookup struct {
+ configs map[string]*domain.AnthropicSupportConfig
+}
+
+func (m *mockPassthroughProfileLookup) GetAnthropicSupport(endpointType string) *domain.AnthropicSupportConfig {
+ if m.configs == nil {
+ return nil
+ }
+ return m.configs[endpointType]
+}
+
+// mockPassthroughTranslator implements both RequestTranslator and PassthroughCapable
+type mockPassthroughTranslator struct {
+ name string
+ transformRequestFunc func(ctx context.Context, r *http.Request) (*translator.TransformedRequest, error)
+ transformResponseFunc func(ctx context.Context, openaiResp interface{}, original *http.Request) (interface{}, error)
+ transformStreamingFunc func(ctx context.Context, openaiStream io.Reader, w http.ResponseWriter, original *http.Request) error
+ writeErrorFunc func(w http.ResponseWriter, err error, statusCode int)
+ implementsErrorWriter bool
+ passthroughEnabled bool
+ profileLookup translator.ProfileLookup
+}
+
+func (m *mockPassthroughTranslator) Name() string {
+ return m.name
+}
+
+func (m *mockPassthroughTranslator) TransformRequest(ctx context.Context, r *http.Request) (*translator.TransformedRequest, error) {
+ if m.transformRequestFunc != nil {
+ return m.transformRequestFunc(ctx, r)
+ }
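+ // Default implementation: extract the model name and stream flag from the body.
+ // Read and parse errors are deliberately ignored; a malformed body yields an empty model name.
+ body, _ := io.ReadAll(r.Body)
+ var req map[string]interface{}
+ _ = json.Unmarshal(body, &req)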
+
+ modelName := ""
+ if model, ok := req["model"].(string); ok {
+ modelName = model
+ }
+
+ isStreaming := false
+ if stream, ok := req["stream"].(bool); ok {
+ isStreaming = stream
+ }
+
+ return &translator.TransformedRequest{
+ OpenAIRequest: map[string]interface{}{
+ "model": modelName,
+ "messages": []interface{}{},
+ },
+ ModelName: modelName,
+ IsStreaming: isStreaming,
+ }, nil
+}
+
+func (m *mockPassthroughTranslator) TransformResponse(ctx context.Context, openaiResp interface{}, original *http.Request) (interface{}, error) {
+ if m.transformResponseFunc != nil {
+ return m.transformResponseFunc(ctx, openaiResp, original)
+ }
+ return map[string]interface{}{
+ "id": "mock-response-id",
+ "content": "mock response",
+ }, nil
+}
+
+func (m *mockPassthroughTranslator) TransformStreamingResponse(ctx context.Context, openaiStream io.Reader, w http.ResponseWriter, original *http.Request) error {
+ if m.transformStreamingFunc != nil {
+ return m.transformStreamingFunc(ctx, openaiStream, w, original)
+ }
+ w.Header().Set(constants.HeaderContentType, "text/event-stream")
+ _, err := io.Copy(w, openaiStream)
+ return err
+}
+
+func (m *mockPassthroughTranslator) WriteError(w http.ResponseWriter, err error, statusCode int) {
+ if m.implementsErrorWriter && m.writeErrorFunc != nil {
+ m.writeErrorFunc(w, err, statusCode)
+ return
+ }
+ panic("WriteError called on translator that doesn't implement ErrorWriter")
+}
+
+// CanPassthrough implements PassthroughCapable
+func (m *mockPassthroughTranslator) CanPassthrough(endpoints []*domain.Endpoint, profileLookup translator.ProfileLookup) bool {
+ if !m.passthroughEnabled {
+ return false
+ }
+
+ if len(endpoints) == 0 {
+ return false
+ }
+
+ // Passthrough requires every endpoint to support Anthropic natively;
+ // a single unsupported endpoint forces a fallback to translation
+ for _, endpoint := range endpoints {
+ cfg := profileLookup.GetAnthropicSupport(endpoint.Type)
+ if cfg == nil || !cfg.Enabled {
+ return false
+ }
+ }
+
+ return true
+}
+
+// PreparePassthrough implements PassthroughCapable
+func (m *mockPassthroughTranslator) PreparePassthrough(bodyBytes []byte, r *http.Request, profileLookup translator.ProfileLookup) (*translator.PassthroughRequest, error) {
+ var req map[string]interface{}
+ if err := json.Unmarshal(bodyBytes, &req); err != nil {
+ return nil, fmt.Errorf("invalid JSON: %w", err)
+ }
+
+ modelName := ""
+ if model, ok := req["model"].(string); ok {
+ modelName = model
+ }
+
+ isStreaming := false
+ if stream, ok := req["stream"].(bool); ok {
+ isStreaming = stream
+ }
+
+ return &translator.PassthroughRequest{
+ Body: bodyBytes,
+ TargetPath: "/v1/messages",
+ ModelName: modelName,
+ IsStreaming: isStreaming,
+ }, nil
+}
+
+// mockDiscoveryServiceWithEndpoints provides configured endpoints for testing
+type mockDiscoveryServiceWithEndpoints struct {
+ endpoints []*domain.Endpoint
+}
+
+func (m *mockDiscoveryServiceWithEndpoints) GetEndpoints(ctx context.Context) ([]*domain.Endpoint, error) {
+ return m.endpoints, nil
+}
+
+func (m *mockDiscoveryServiceWithEndpoints) GetHealthyEndpoints(ctx context.Context) ([]*domain.Endpoint, error) {
+ healthy := make([]*domain.Endpoint, 0, len(m.endpoints))
+ for _, ep := range m.endpoints {
+ if ep.Status == domain.StatusHealthy {
+ healthy = append(healthy, ep)
+ }
+ }
+ return healthy, nil
+}
+
+func (m *mockDiscoveryServiceWithEndpoints) RefreshEndpoints(ctx context.Context) error {
+ return nil
+}
+
+func (m *mockDiscoveryServiceWithEndpoints) UpdateEndpointStatus(ctx context.Context, endpoint *domain.Endpoint) error {
+ return nil
+}
+
+// ========== METRICS INTEGRATION TESTS ==========
+// These tests verify that translator metrics are properly recorded during HTTP request flows
+
+// mockStatsCollectorWithCapture is a standalone StatsCollector stub that captures RecordTranslatorRequest calls
+type mockStatsCollectorWithCapture struct {
+ recordedEvents []ports.TranslatorRequestEvent
+ mu sync.Mutex
+}
+
+func (m *mockStatsCollectorWithCapture) RecordRequest(endpoint *domain.Endpoint, status string, latency time.Duration, bytes int64) {
+}
+func (m *mockStatsCollectorWithCapture) RecordConnection(endpoint *domain.Endpoint, delta int) {}
+func (m *mockStatsCollectorWithCapture) RecordSecurityViolation(violation ports.SecurityViolation) {
+}
+func (m *mockStatsCollectorWithCapture) RecordDiscovery(endpoint *domain.Endpoint, success bool, latency time.Duration) {
+}
+func (m *mockStatsCollectorWithCapture) RecordModelRequest(model string, endpoint *domain.Endpoint, status string, latency time.Duration, bytes int64) {
+}
+func (m *mockStatsCollectorWithCapture) RecordModelError(model string, endpoint *domain.Endpoint, errorType string) {
+}
+func (m *mockStatsCollectorWithCapture) GetModelStats() map[string]ports.ModelStats { return nil }
+func (m *mockStatsCollectorWithCapture) GetModelEndpointStats() map[string]map[string]ports.EndpointModelStats {
+ return nil
+}
+func (m *mockStatsCollectorWithCapture) RecordTranslatorRequest(event ports.TranslatorRequestEvent) {
+ m.mu.Lock()
+ defer m.mu.Unlock()
+ m.recordedEvents = append(m.recordedEvents, event)
+}
+func (m *mockStatsCollectorWithCapture) GetTranslatorStats() map[string]ports.TranslatorStats {
+ return nil
+}
+func (m *mockStatsCollectorWithCapture) GetProxyStats() ports.ProxyStats { return ports.ProxyStats{} }
+func (m *mockStatsCollectorWithCapture) GetEndpointStats() map[string]ports.EndpointStats {
+ return nil
+}
+func (m *mockStatsCollectorWithCapture) GetSecurityStats() ports.SecurityStats {
+ return ports.SecurityStats{}
+}
+func (m *mockStatsCollectorWithCapture) GetConnectionStats() map[string]int64 { return nil }
+
+func (m *mockStatsCollectorWithCapture) getRecordedEvents() []ports.TranslatorRequestEvent {
+ m.mu.Lock()
+ defer m.mu.Unlock()
+ events := make([]ports.TranslatorRequestEvent, len(m.recordedEvents))
+ copy(events, m.recordedEvents)
+ return events
+}
+
+// TestTranslationHandler_MetricsRecordedForPassthrough verifies metrics are recorded for passthrough requests
+func TestTranslationHandler_MetricsRecordedForPassthrough(t *testing.T) {
+ // Setup mock backend
+ mockBackend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+ // Add small delay to ensure measurable latency
+ time.Sleep(1 * time.Millisecond)
+ response := map[string]interface{}{
+ "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
+ "type": "message",
+ "role": "assistant",
+ "content": []map[string]interface{}{{"type": "text", "text": "Hello"}},
+ "model": "claude-3-5-sonnet-20241022",
+ }
+ w.Header().Set(constants.HeaderContentType, constants.ContentTypeJSON)
+ w.WriteHeader(http.StatusOK)
+ json.NewEncoder(w).Encode(response)
+ }))
+ defer mockBackend.Close()
+
+ backendURL, _ := url.Parse(mockBackend.URL)
+ endpoints := []*domain.Endpoint{
+ {
+ Name: "vllm-backend",
+ URL: backendURL,
+ URLString: mockBackend.URL,
+ Type: "vllm",
+ Status: domain.StatusHealthy,
+ },
+ }
+
+ profileLookup := &mockPassthroughProfileLookup{
+ configs: map[string]*domain.AnthropicSupportConfig{
+ "vllm": {
+ Enabled: true,
+ MessagesPath: "/v1/messages",
+ },
+ },
+ }
+
+ trans := &mockPassthroughTranslator{
+ name: "anthropic",
+ passthroughEnabled: true,
+ profileLookup: profileLookup,
+ implementsErrorWriter: true,
+ }
+
+ proxyService := &mockProxyService{
+ proxyFunc: func(ctx context.Context, w http.ResponseWriter, r *http.Request, eps []*domain.Endpoint, stats *ports.RequestStats, rlog logger.StyledLogger) error {
+ client := &http.Client{Timeout: 5 * time.Second}
+ backendReq, err := http.NewRequestWithContext(ctx, r.Method, eps[0].URLString+r.URL.Path, r.Body)
+ if err != nil {
+ return err
+ }
+ resp, err := client.Do(backendReq)
+ if err != nil {
+ return err
+ }
+ defer resp.Body.Close()
+ for k, v := range resp.Header {
+ w.Header()[k] = v
+ }
+ w.WriteHeader(resp.StatusCode)
+ _, err = io.Copy(w, resp.Body)
+ return err
+ },
+ }
+
+ // Create stats collector that captures events
+ statsCollector := &mockStatsCollectorWithCapture{}
+
+ app := &Application{
+ logger: &mockStyledLogger{},
+ proxyService: proxyService,
+ statsCollector: statsCollector,
+ repository: &mockEndpointRepository{getEndpointsFunc: func() []*domain.Endpoint { return endpoints }},
+ inspectorChain: inspector.NewChain(&mockStyledLogger{}),
+ profileFactory: &mockProfileFactory{},
+ profileLookup: profileLookup,
+ discoveryService: &mockDiscoveryServiceWithEndpoints{endpoints: endpoints},
+ Config: &config.Config{},
+ }
+
+ handler := app.translationHandler(trans)
+
+ // Send non-streaming request
+ anthropicReq := map[string]interface{}{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 1024,
+ "messages": []map[string]interface{}{
+ {"role": "user", "content": "Hello"},
+ },
+ }
+ reqBody, _ := json.Marshal(anthropicReq)
+ req := httptest.NewRequest("POST", "/olla/anthropic/v1/messages", bytes.NewReader(reqBody))
+ req.Header.Set(constants.HeaderContentType, constants.ContentTypeJSON)
+
+ rec := httptest.NewRecorder()
+ handler.ServeHTTP(rec, req)
+
+ // Verify response is successful
+ assert.Equal(t, http.StatusOK, rec.Code)
+ assert.Equal(t, "passthrough", rec.Header().Get("X-Olla-Mode"))
+
+ // Verify metrics were recorded
+ events := statsCollector.getRecordedEvents()
+ require.Len(t, events, 1, "Expected exactly one translator metrics event")
+
+ event := events[0]
+ assert.Equal(t, "anthropic", event.TranslatorName)
+ assert.Equal(t, "claude-3-5-sonnet-20241022", event.Model)
+ assert.Equal(t, constants.TranslatorModePassthrough, event.Mode)
+ assert.Equal(t, constants.FallbackReasonNone, event.FallbackReason)
+ assert.True(t, event.Success)
+ assert.False(t, event.IsStreaming)
+ assert.Greater(t, event.Latency, time.Duration(0))
+}
+
+// TestTranslationHandler_MetricsRecordedForTranslation verifies metrics are recorded for translation requests
+func TestTranslationHandler_MetricsRecordedForTranslation(t *testing.T) {
+ // Setup endpoints WITHOUT Anthropic support (forces translation mode)
+ endpoints := []*domain.Endpoint{
+ {Name: "ollama-1", Type: "ollama", Status: domain.StatusHealthy},
+ }
+
+ profileLookup := &mockPassthroughProfileLookup{
+ configs: map[string]*domain.AnthropicSupportConfig{
+ // No Anthropic support for ollama
+ },
+ }
+
+ trans := &mockPassthroughTranslator{
+ name: "anthropic",
+ passthroughEnabled: true,
+ profileLookup: profileLookup,
+ transformRequestFunc: func(ctx context.Context, r *http.Request) (*translator.TransformedRequest, error) {
+ return &translator.TransformedRequest{
+ OpenAIRequest: map[string]interface{}{
+ "model": "claude-3-5-sonnet-20241022",
+ "messages": []interface{}{
+ map[string]interface{}{"role": "user", "content": "test"},
+ },
+ },
+ ModelName: "claude-3-5-sonnet-20241022",
+ IsStreaming: false,
+ TargetPath: "/v1/chat/completions",
+ }, nil
+ },
+ transformResponseFunc: func(ctx context.Context, openaiResp interface{}, original *http.Request) (interface{}, error) {
+ return map[string]interface{}{
+ "id": "msg_123",
+ "type": "message",
+ }, nil
+ },
+ implementsErrorWriter: true,
+ }
+
+ proxyService := &mockProxyService{
+ proxyFunc: func(ctx context.Context, w http.ResponseWriter, r *http.Request, eps []*domain.Endpoint, stats *ports.RequestStats, rlog logger.StyledLogger) error {
+ response := map[string]interface{}{
+ "id": "chatcmpl-123",
+ "object": "chat.completion",
+ "choices": []interface{}{},
+ }
+ w.Header().Set(constants.HeaderContentType, constants.ContentTypeJSON)
+ w.WriteHeader(http.StatusOK)
+ return json.NewEncoder(w).Encode(response)
+ },
+ }
+
+ statsCollector := &mockStatsCollectorWithCapture{}
+
+ app := &Application{
+ logger: &mockStyledLogger{},
+ proxyService: proxyService,
+ statsCollector: statsCollector,
+ repository: &mockEndpointRepository{getEndpointsFunc: func() []*domain.Endpoint { return endpoints }},
+ inspectorChain: inspector.NewChain(&mockStyledLogger{}),
+ profileFactory: &mockProfileFactory{},
+ profileLookup: profileLookup,
+ discoveryService: &mockDiscoveryServiceWithEndpoints{endpoints: endpoints},
+ Config: &config.Config{},
+ }
+
+ handler := app.translationHandler(trans)
+
+ anthropicReq := map[string]interface{}{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 1024,
+ "messages": []map[string]interface{}{
+ {"role": "user", "content": "Hello"},
+ },
+ }
+ reqBody, _ := json.Marshal(anthropicReq)
+ req := httptest.NewRequest("POST", "/olla/anthropic/v1/messages", bytes.NewReader(reqBody))
+ req.Header.Set(constants.HeaderContentType, constants.ContentTypeJSON)
+
+ rec := httptest.NewRecorder()
+ handler.ServeHTTP(rec, req)
+
+ // Verify response is successful
+ assert.Equal(t, http.StatusOK, rec.Code)
+
+ // Verify metrics were recorded for translation mode
+ events := statsCollector.getRecordedEvents()
+ require.Len(t, events, 1, "Expected exactly one translator metrics event")
+
+ event := events[0]
+ assert.Equal(t, "anthropic", event.TranslatorName)
+ assert.Equal(t, "claude-3-5-sonnet-20241022", event.Model)
+ assert.Equal(t, constants.TranslatorModeTranslation, event.Mode)
+ assert.Equal(t, constants.FallbackReasonCannotPassthrough, event.FallbackReason)
+ assert.True(t, event.Success)
+ assert.False(t, event.IsStreaming)
+}
+
+// TestTranslationHandler_MetricsRecordedForFallback verifies metrics capture fallback scenarios
+func TestTranslationHandler_MetricsRecordedForFallback(t *testing.T) {
+ // Test case: mixed endpoint support (some support Anthropic, some don't)
+ endpoints := []*domain.Endpoint{
+ {Name: "vllm-1", Type: "vllm", Status: domain.StatusHealthy},
+ {Name: "ollama-1", Type: "ollama", Status: domain.StatusHealthy}, // No Anthropic support
+ }
+
+ profileLookup := &mockPassthroughProfileLookup{
+ configs: map[string]*domain.AnthropicSupportConfig{
+ "vllm": {
+ Enabled: true,
+ MessagesPath: "/v1/messages",
+ },
+ // ollama has no config
+ },
+ }
+
+ trans := &mockPassthroughTranslator{
+ name: "anthropic",
+ passthroughEnabled: true,
+ profileLookup: profileLookup,
+ transformRequestFunc: func(ctx context.Context, r *http.Request) (*translator.TransformedRequest, error) {
+ return &translator.TransformedRequest{
+ OpenAIRequest: map[string]interface{}{
+ "model": "claude-3-5-sonnet-20241022",
+ "messages": []interface{}{map[string]interface{}{"role": "user", "content": "test"}},
+ },
+ ModelName: "claude-3-5-sonnet-20241022",
+ IsStreaming: false,
+ TargetPath: "/v1/chat/completions",
+ }, nil
+ },
+ transformResponseFunc: func(ctx context.Context, openaiResp interface{}, original *http.Request) (interface{}, error) {
+ return map[string]interface{}{"id": "msg_123", "type": "message"}, nil
+ },
+ implementsErrorWriter: true,
+ }
+
+ proxyService := &mockProxyService{
+ proxyFunc: func(ctx context.Context, w http.ResponseWriter, r *http.Request, eps []*domain.Endpoint, stats *ports.RequestStats, rlog logger.StyledLogger) error {
+ response := map[string]interface{}{"id": "chatcmpl-123", "object": "chat.completion", "choices": []interface{}{}}
+ w.Header().Set(constants.HeaderContentType, constants.ContentTypeJSON)
+ w.WriteHeader(http.StatusOK)
+ return json.NewEncoder(w).Encode(response)
+ },
+ }
+
+ statsCollector := &mockStatsCollectorWithCapture{}
+
+ app := &Application{
+ logger: &mockStyledLogger{},
+ proxyService: proxyService,
+ statsCollector: statsCollector,
+ repository: &mockEndpointRepository{getEndpointsFunc: func() []*domain.Endpoint { return endpoints }},
+ inspectorChain: inspector.NewChain(&mockStyledLogger{}),
+ profileFactory: &mockProfileFactory{},
+ profileLookup: profileLookup,
+ discoveryService: &mockDiscoveryServiceWithEndpoints{endpoints: endpoints},
+ Config: &config.Config{},
+ }
+
+ handler := app.translationHandler(trans)
+
+ anthropicReq := map[string]interface{}{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 1024,
+ "messages": []map[string]interface{}{
+ {"role": "user", "content": "Hello"},
+ },
+ }
+ reqBody, _ := json.Marshal(anthropicReq)
+ req := httptest.NewRequest("POST", "/olla/anthropic/v1/messages", bytes.NewReader(reqBody))
+ req.Header.Set(constants.HeaderContentType, constants.ContentTypeJSON)
+
+ rec := httptest.NewRecorder()
+ handler.ServeHTTP(rec, req)
+
+ assert.Equal(t, http.StatusOK, rec.Code)
+
+ // Verify fallback reason is recorded
+ events := statsCollector.getRecordedEvents()
+ require.Len(t, events, 1)
+
+ event := events[0]
+ assert.Equal(t, constants.TranslatorModeTranslation, event.Mode)
+ assert.Equal(t, constants.FallbackReasonCannotPassthrough, event.FallbackReason)
+ assert.True(t, event.Success)
+}
+
+// TestTranslationHandler_MetricsRecordedForStreamingVsNonStreaming verifies streaming flag is tracked
+func TestTranslationHandler_MetricsRecordedForStreamingVsNonStreaming(t *testing.T) {
+ mockBackend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+ // Add small delay to ensure measurable latency
+ time.Sleep(1 * time.Millisecond)
+ // Check if request is streaming based on request body
+ body, _ := io.ReadAll(r.Body)
+ var req map[string]interface{}
+ json.Unmarshal(body, &req)
+
+ if stream, ok := req["stream"].(bool); ok && stream {
+ // Return SSE stream
+ w.Header().Set(constants.HeaderContentType, "text/event-stream")
+ w.WriteHeader(http.StatusOK)
+ fmt.Fprint(w, "event: message_start\ndata: {\"type\":\"message_start\"}\n\n")
+ fmt.Fprint(w, "event: message_stop\ndata: {\"type\":\"message_stop\"}\n\n")
+ } else {
+ // Return JSON response
+ response := map[string]interface{}{"id": "msg_123", "type": "message"}
+ w.Header().Set(constants.HeaderContentType, constants.ContentTypeJSON)
+ w.WriteHeader(http.StatusOK)
+ json.NewEncoder(w).Encode(response)
+ }
+ }))
+ defer mockBackend.Close()
+
+ backendURL, _ := url.Parse(mockBackend.URL)
+ endpoints := []*domain.Endpoint{
+ {
+ Name: "vllm-backend",
+ URL: backendURL,
+ URLString: mockBackend.URL,
+ Type: "vllm",
+ Status: domain.StatusHealthy,
+ },
+ }
+
+ profileLookup := &mockPassthroughProfileLookup{
+ configs: map[string]*domain.AnthropicSupportConfig{
+ "vllm": {Enabled: true, MessagesPath: "/v1/messages"},
+ },
+ }
+
+ trans := &mockPassthroughTranslator{
+ name: "anthropic",
+ passthroughEnabled: true,
+ profileLookup: profileLookup,
+ implementsErrorWriter: true,
+ transformRequestFunc: func(ctx context.Context, r *http.Request) (*translator.TransformedRequest, error) {
+ body, _ := io.ReadAll(r.Body)
+ var req map[string]interface{}
+ json.Unmarshal(body, &req)
+
+ modelName := "claude-3-5-sonnet-20241022"
+ if model, ok := req["model"].(string); ok {
+ modelName = model
+ }
+
+ isStreaming := false
+ if stream, ok := req["stream"].(bool); ok {
+ isStreaming = stream
+ }
+
+ return &translator.TransformedRequest{
+ ModelName: modelName,
+ IsStreaming: isStreaming,
+ }, nil
+ },
+ }
+
+ proxyService := &mockProxyService{
+ proxyFunc: func(ctx context.Context, w http.ResponseWriter, r *http.Request, eps []*domain.Endpoint, stats *ports.RequestStats, rlog logger.StyledLogger) error {
+ client := &http.Client{Timeout: 5 * time.Second}
+ backendReq, err := http.NewRequest(r.Method, eps[0].URLString+r.URL.Path, r.Body)
+ if err != nil {
+ return err
+ }
+
+ resp, err := client.Do(backendReq)
+ if err != nil {
+ return err
+ }
+ defer resp.Body.Close()
+ for k, v := range resp.Header {
+ w.Header()[k] = v
+ }
+ w.WriteHeader(resp.StatusCode)
+ io.Copy(w, resp.Body)
+
+ return nil
+ },
+ }
+
+ statsCollector := &mockStatsCollectorWithCapture{}
+
+ app := &Application{
+ logger: &mockStyledLogger{},
+ proxyService: proxyService,
+ statsCollector: statsCollector,
+ repository: &mockEndpointRepository{getEndpointsFunc: func() []*domain.Endpoint { return endpoints }},
+ inspectorChain: inspector.NewChain(&mockStyledLogger{}),
+ profileFactory: &mockProfileFactory{},
+ profileLookup: profileLookup,
+ discoveryService: &mockDiscoveryServiceWithEndpoints{endpoints: endpoints},
+ Config: &config.Config{},
+ }
+
+ handler := app.translationHandler(trans)
+
+ // Test 1: Non-streaming request
+ nonStreamingReq := map[string]interface{}{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 1024,
+ "stream": false,
+ "messages": []map[string]interface{}{{"role": "user", "content": "Hello"}},
+ }
+ reqBody, _ := json.Marshal(nonStreamingReq)
+ req := httptest.NewRequest("POST", "/olla/anthropic/v1/messages", bytes.NewReader(reqBody))
+ rec := httptest.NewRecorder()
+ handler.ServeHTTP(rec, req)
+
+ assert.Equal(t, http.StatusOK, rec.Code)
+
+ // Test 2: Streaming request
+ streamingReq := map[string]interface{}{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 1024,
+ "stream": true,
+ "messages": []map[string]interface{}{{"role": "user", "content": "Hello"}},
+ }
+ reqBody2, _ := json.Marshal(streamingReq)
+ req2 := httptest.NewRequest("POST", "/olla/anthropic/v1/messages", bytes.NewReader(reqBody2))
+ rec2 := httptest.NewRecorder()
+ handler.ServeHTTP(rec2, req2)
+
+ assert.Equal(t, http.StatusOK, rec2.Code)
+
+ // Verify both events were recorded with correct streaming flag
+ events := statsCollector.getRecordedEvents()
+ require.Len(t, events, 2)
+
+ // First event should be non-streaming
+ assert.False(t, events[0].IsStreaming, "First request should be non-streaming")
+
+ // Second event should be streaming
+ assert.True(t, events[1].IsStreaming, "Second request should be streaming")
+}
+
+// TestTranslationHandler_MetricsRecordedForSuccessVsError verifies success/failure tracking
+func TestTranslationHandler_MetricsRecordedForSuccessVsError(t *testing.T) {
+ successBackend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+ // Add small delay to ensure measurable latency
+ time.Sleep(1 * time.Millisecond)
+ response := map[string]interface{}{"id": "msg_123", "type": "message"}
+ w.Header().Set(constants.HeaderContentType, constants.ContentTypeJSON)
+ w.WriteHeader(http.StatusOK)
+ json.NewEncoder(w).Encode(response)
+ }))
+ defer successBackend.Close()
+
+ errorBackend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+ errorResp := map[string]interface{}{
+ "type": "error",
+ "error": map[string]interface{}{
+ "type": "invalid_request_error",
+ "message": "Test error",
+ },
+ }
+ w.Header().Set(constants.HeaderContentType, constants.ContentTypeJSON)
+ w.WriteHeader(http.StatusBadRequest)
+ json.NewEncoder(w).Encode(errorResp)
+ }))
+ defer errorBackend.Close()
+
+ successURL, _ := url.Parse(successBackend.URL)
+ errorURL, _ := url.Parse(errorBackend.URL)
+
+ profileLookup := &mockPassthroughProfileLookup{
+ configs: map[string]*domain.AnthropicSupportConfig{
+ "vllm": {Enabled: true, MessagesPath: "/v1/messages"},
+ },
+ }
+
+ trans := &mockPassthroughTranslator{
+ name: "anthropic",
+ passthroughEnabled: true,
+ profileLookup: profileLookup,
+ implementsErrorWriter: true,
+ }
+
+ statsCollector := &mockStatsCollectorWithCapture{}
+
+ // Test 1: Successful request
+ successEndpoints := []*domain.Endpoint{
+ {Name: "success-backend", URL: successURL, URLString: successBackend.URL, Type: "vllm", Status: domain.StatusHealthy},
+ }
+
+ proxyService := &mockProxyService{
+ proxyFunc: func(ctx context.Context, w http.ResponseWriter, r *http.Request, eps []*domain.Endpoint, stats *ports.RequestStats, rlog logger.StyledLogger) error {
+ client := &http.Client{Timeout: 5 * time.Second}
+ backendReq, err := http.NewRequest(r.Method, eps[0].URLString+r.URL.Path, r.Body)
+ if err != nil {
+ return err
+ }
+ resp, err := client.Do(backendReq)
+ if err != nil {
+ return err
+ }
+ defer resp.Body.Close()
+ for k, v := range resp.Header {
+ w.Header()[k] = v
+ }
+ w.WriteHeader(resp.StatusCode)
+ io.Copy(w, resp.Body)
+ return nil
+ },
+ }
+
+ app := &Application{
+ logger: &mockStyledLogger{},
+ proxyService: proxyService,
+ statsCollector: statsCollector,
+ repository: &mockEndpointRepository{getEndpointsFunc: func() []*domain.Endpoint { return successEndpoints }},
+ inspectorChain: inspector.NewChain(&mockStyledLogger{}),
+ profileFactory: &mockProfileFactory{},
+ profileLookup: profileLookup,
+ discoveryService: &mockDiscoveryServiceWithEndpoints{endpoints: successEndpoints},
+ Config: &config.Config{},
+ }
+
+ handler := app.translationHandler(trans)
+
+ reqBody, _ := json.Marshal(map[string]interface{}{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 1024,
+ "messages": []map[string]interface{}{{"role": "user", "content": "Hello"}},
+ })
+
+ req := httptest.NewRequest("POST", "/olla/anthropic/v1/messages", bytes.NewReader(reqBody))
+ rec := httptest.NewRecorder()
+ handler.ServeHTTP(rec, req)
+
+ assert.Equal(t, http.StatusOK, rec.Code)
+
+ // Test 2: Error request
+ errorEndpoints := []*domain.Endpoint{
+ {Name: "error-backend", URL: errorURL, URLString: errorBackend.URL, Type: "vllm", Status: domain.StatusHealthy},
+ }
+
+ app2 := &Application{
+ logger: &mockStyledLogger{},
+ proxyService: proxyService,
+ statsCollector: statsCollector,
+ repository: &mockEndpointRepository{getEndpointsFunc: func() []*domain.Endpoint { return errorEndpoints }},
+ inspectorChain: inspector.NewChain(&mockStyledLogger{}),
+ profileFactory: &mockProfileFactory{},
+ profileLookup: profileLookup,
+ discoveryService: &mockDiscoveryServiceWithEndpoints{endpoints: errorEndpoints},
+ Config: &config.Config{},
+ }
+
+ handler2 := app2.translationHandler(trans)
+
+ reqBody2, _ := json.Marshal(map[string]interface{}{
+ "model": "claude-3-5-sonnet-20241022",
+ "max_tokens": 1024,
+ "messages": []map[string]interface{}{{"role": "user", "content": "Hello"}},
+ })
+
+ req2 := httptest.NewRequest("POST", "/olla/anthropic/v1/messages", bytes.NewReader(reqBody2))
+ rec2 := httptest.NewRecorder()
+ handler2.ServeHTTP(rec2, req2)
+
+ assert.Equal(t, http.StatusBadRequest, rec2.Code)
+
+ // Verify metrics recorded for both success and error
+ events := statsCollector.getRecordedEvents()
+ require.Len(t, events, 2)
+
+ // First event should be successful
+ assert.True(t, events[0].Success, "First request should be successful")
+
+ // Second event is also recorded as a success: the backend's 400 was relayed to the client unchanged,
+ // so from the handler's perspective the request was processed successfully
+ assert.True(t, events[1].Success, "Second request should be recorded as successful (handler relayed the backend error)")
+}
diff --git a/internal/app/handlers/handler_translation_test.go b/internal/app/handlers/handler_translation_test.go
index 8cf852b1..4c9ab002 100644
--- a/internal/app/handlers/handler_translation_test.go
+++ b/internal/app/handlers/handler_translation_test.go
@@ -524,6 +524,10 @@ func (m *mockStatsCollector) GetModelStats() map[string]ports.ModelStats { retur
func (m *mockStatsCollector) GetModelEndpointStats() map[string]map[string]ports.EndpointModelStats {
return nil
}
+func (m *mockStatsCollector) RecordTranslatorRequest(event ports.TranslatorRequestEvent) {}
+func (m *mockStatsCollector) GetTranslatorStats() map[string]ports.TranslatorStats {
+ return nil
+}
func (m *mockStatsCollector) GetProxyStats() ports.ProxyStats { return ports.ProxyStats{} }
func (m *mockStatsCollector) GetEndpointStats() map[string]ports.EndpointStats { return nil }
func (m *mockStatsCollector) GetSecurityStats() ports.SecurityStats { return ports.SecurityStats{} }
diff --git a/internal/app/handlers/server_routes.go b/internal/app/handlers/server_routes.go
index 3f191356..60636c18 100644
--- a/internal/app/handlers/server_routes.go
+++ b/internal/app/handlers/server_routes.go
@@ -34,6 +34,7 @@ func (a *Application) registerRoutes() {
a.routeRegistry.RegisterWithMethod("/internal/status/endpoints", a.endpointsStatusHandler, "Endpoints status", "GET")
a.routeRegistry.RegisterWithMethod("/internal/status/models", a.modelsStatusHandler, "Models status", "GET")
a.routeRegistry.RegisterWithMethod("/internal/stats/models", a.modelStatsHandler, "Model statistics", "GET")
+ a.routeRegistry.RegisterWithMethod("/internal/stats/translators", a.translatorStatsHandler, "Translator statistics", "GET")
a.routeRegistry.RegisterWithMethod("/internal/process", a.processStatsHandler, "Process status", "GET")
a.routeRegistry.RegisterWithMethod("/version", a.versionHandler, "Olla version information", "GET")
diff --git a/internal/config/config.go b/internal/config/config.go
index 55edd601..8d933093 100644
--- a/internal/config/config.go
+++ b/internal/config/config.go
@@ -121,8 +121,9 @@ func DefaultConfig() *Config {
},
Translators: TranslatorsConfig{
Anthropic: AnthropicTranslatorConfig{
- Enabled: false,
- MaxMessageSize: 10 << 20, // 10MB - Anthropic API limit,
+ Enabled: true,
+ PassthroughEnabled: true,
+ MaxMessageSize: 10 << 20, // 10MB - Anthropic API limit
Inspector: InspectorConfig{
Enabled: false,
OutputDir: "logs/inspector/anthropic",
@@ -330,6 +331,11 @@ func applyEnvOverrides(config *Config) {
config.Translators.Anthropic.MaxMessageSize = size
}
}
+ if val := os.Getenv("OLLA_TRANSLATORS_ANTHROPIC_PASSTHROUGH_ENABLED"); val != "" {
+ if enabled, err := strconv.ParseBool(val); err == nil {
+ config.Translators.Anthropic.PassthroughEnabled = enabled
+ }
+ }
}
// parseByteSize parses human-readable byte sizes like "100MB", "1GB"
diff --git a/internal/config/config_test.go b/internal/config/config_test.go
index c34704ca..11246bf3 100644
--- a/internal/config/config_test.go
+++ b/internal/config/config_test.go
@@ -619,12 +619,47 @@ func TestLoadConfig_WithTranslatorConfig(t *testing.T) {
}
}
+func TestLoadConfig_WithPassthroughEnabledEnvVar(t *testing.T) {
+ // Test that OLLA_TRANSLATORS_ANTHROPIC_PASSTHROUGH_ENABLED overrides config
+ testCases := []struct {
+ name string
+ envValue string
+ expected bool
+ }{
+ {"disable passthrough via env var", "false", false},
+ {"enable passthrough via env var", "true", true},
+ {"disable passthrough via 0", "0", false},
+ {"enable passthrough via 1", "1", true},
+ }
+
+ for _, tc := range testCases {
+ t.Run(tc.name, func(t *testing.T) {
+ // t.Setenv restores the previous value automatically when the subtest ends
+ t.Setenv("OLLA_TRANSLATORS_ANTHROPIC_PASSTHROUGH_ENABLED", tc.envValue)
+
+ cfg, err := Load()
+ if err != nil {
+ t.Fatalf("Load failed: %v", err)
+ }
+
+ if cfg.Translators.Anthropic.PassthroughEnabled != tc.expected {
+ t.Errorf("Expected PassthroughEnabled=%v from env var %q, got %v",
+ tc.expected, tc.envValue, cfg.Translators.Anthropic.PassthroughEnabled)
+ }
+ })
+ }
+}
+
func TestDefaultConfig_Translators(t *testing.T) {
cfg := DefaultConfig()
// Test Anthropic translator defaults
- if cfg.Translators.Anthropic.Enabled {
- t.Error("Expected Anthropic translator disabled by default")
+ if !cfg.Translators.Anthropic.Enabled {
+ t.Error("Expected Anthropic translator enabled by default")
+ }
+
+ if !cfg.Translators.Anthropic.PassthroughEnabled {
+ t.Error("Expected Anthropic translator passthrough enabled by default")
}
if cfg.Translators.Anthropic.Inspector.Enabled {
diff --git a/internal/config/types.go b/internal/config/types.go
index ed9c6e6c..f64268a7 100644
--- a/internal/config/types.go
+++ b/internal/config/types.go
@@ -181,6 +181,11 @@ type AnthropicTranslatorConfig struct {
Inspector InspectorConfig `yaml:"inspector"`
MaxMessageSize int64 `yaml:"max_message_size"`
Enabled bool `yaml:"enabled"`
+
+ // PassthroughEnabled controls whether requests can be forwarded directly
+ // to backends that natively support the Anthropic Messages API, bypassing
+ // the Anthropic-to-OpenAI translation pipeline (olla v0.0.23+).
+ PassthroughEnabled bool `yaml:"passthrough_enabled"`
}
// InspectorConfig holds configuration for request/response inspection
diff --git a/internal/core/constants/translator.go b/internal/core/constants/translator.go
new file mode 100644
index 00000000..8d734a53
--- /dev/null
+++ b/internal/core/constants/translator.go
@@ -0,0 +1,30 @@
+package constants
+
+// TranslatorMode represents how the request was handled by the translator
+type TranslatorMode string
+
+const (
+ // TranslatorModePassthrough indicates request was passed through natively (no translation)
+ TranslatorModePassthrough TranslatorMode = "passthrough"
+
+ // TranslatorModeTranslation indicates request was translated between formats
+ TranslatorModeTranslation TranslatorMode = "translation"
+)
+
+// TranslatorFallbackReason explains why passthrough wasn't used
+type TranslatorFallbackReason string
+
+const (
+ // FallbackReasonNone indicates no fallback occurred (passthrough succeeded)
+ FallbackReasonNone TranslatorFallbackReason = ""
+
+ // FallbackReasonNoCompatibleEndpoints means no endpoints support native format
+ FallbackReasonNoCompatibleEndpoints TranslatorFallbackReason = "no_compatible_endpoints"
+
+ // FallbackReasonTranslatorDoesNotSupportPassthrough means translator lacks passthrough capability
+ FallbackReasonTranslatorDoesNotSupportPassthrough TranslatorFallbackReason = "translator_does_not_support_passthrough"
+
+ // FallbackReasonCannotPassthrough means the passthrough check rejected the available endpoints (e.g. not every endpoint supports the native format)
+ //nolint:gosec // false positive: "passthrough" is not a credential
+ FallbackReasonCannotPassthrough TranslatorFallbackReason = "cannot_passthrough"
+)
diff --git a/internal/core/domain/profile_config.go b/internal/core/domain/profile_config.go
index f43e673f..33c5aa2a 100644
--- a/internal/core/domain/profile_config.go
+++ b/internal/core/domain/profile_config.go
@@ -46,10 +46,15 @@ type ProfileConfig struct {
} `yaml:"routing"`
API struct {
- ModelDiscoveryPath string `yaml:"model_discovery_path"`
- HealthCheckPath string `yaml:"health_check_path"`
- Paths []string `yaml:"paths"`
- OpenAICompatible bool `yaml:"openai_compatible"`
+ // AnthropicSupport declares whether this backend natively speaks the
+ // Anthropic Messages API. When present and enabled, the translator layer
+ // can skip the Anthropic-to-OpenAI conversion and forward requests directly.
+ // Nil means the backend has no native Anthropic support (the common case).
+ AnthropicSupport *AnthropicSupportConfig `yaml:"anthropic_support,omitempty"`
+ ModelDiscoveryPath string `yaml:"model_discovery_path"`
+ HealthCheckPath string `yaml:"health_check_path"`
+ Paths []string `yaml:"paths"`
+ OpenAICompatible bool `yaml:"openai_compatible"`
} `yaml:"api"`
Resources struct {
@@ -105,3 +110,67 @@ type ContextPattern struct {
Pattern string `yaml:"pattern"`
Context int64 `yaml:"context"`
}
+
+// AnthropicSupportConfig declares native Anthropic Messages API support for a
+// backend platform. This enables the passthrough optimisation: when a backend
+// natively understands the Anthropic wire format, requests can be forwarded
+// directly without the costly Anthropic-to-OpenAI-and-back translation.
+//
+// Example YAML (in a profile's api section):
+//
+// api:
+// anthropic_support:
+// enabled: true
+// messages_path: "/v1/messages"
+// token_count: true
+// min_version: "2023-06-01"
+// limitations:
+// - "no_extended_thinking"
+// - "max_tokens_4096"
+type AnthropicSupportConfig struct {
+ // MessagesPath is the backend path that accepts Anthropic Messages API
+ // requests (e.g. "/v1/messages"). Required when Enabled is true.
+ MessagesPath string `yaml:"messages_path"`
+
+ // MinVersion is the minimum anthropic-version header value the backend
+ // requires. If the incoming request specifies an older version, the
+ // translator falls back to the translation path. Use the standard
+ // Anthropic version date format (e.g. "2023-06-01").
+ MinVersion string `yaml:"min_version,omitempty"`
+
+ // Limitations lists Anthropic features this backend does NOT support.
+ // Used by CanPassthrough to decide whether a particular request can be
+ // sent directly or must go through translation instead.
+ // Common values: "no_extended_thinking", "no_tool_use", "no_vision",
+ // "max_tokens_4096".
+ Limitations []string `yaml:"limitations,omitempty"`
+
+ // Enabled controls whether passthrough is active for this backend.
+ // Defaults to false so existing profiles remain unaffected.
+ Enabled bool `yaml:"enabled"`
+
+ // TokenCount indicates the backend supports the Anthropic token counting
+ // endpoint. When true, token count requests can also be passed through.
+ TokenCount bool `yaml:"token_count,omitempty"`
+}
+
+// HasLimitation reports whether the backend declares a specific limitation.
+// Callers use this to check whether a request feature (e.g. extended thinking)
+// is unsupported before attempting passthrough.
+func (c *AnthropicSupportConfig) HasLimitation(limitation string) bool {
+ if c == nil {
+ return false
+ }
+ for _, l := range c.Limitations {
+ if l == limitation {
+ return true
+ }
+ }
+ return false
+}
+
+// SupportsPassthrough is a convenience check that the config is non-nil and
+// explicitly enabled. Safe to call on a nil receiver.
+func (c *AnthropicSupportConfig) SupportsPassthrough() bool {
+ return c != nil && c.Enabled
+}
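`SupportsPassthrough` and `HasLimitation` are both nil-safe, so a profile without an `anthropic_support` section can be checked without a nil guard. A sketch of how the two might combine into a per-request gate; `anthropicSupport` is a trimmed local stand-in for `domain.AnthropicSupportConfig`, and `canPassthrough` is hypothetical (it is not the translator's actual `CanPassthrough`):

```go
package main

import "fmt"

// anthropicSupport is a local stand-in for domain.AnthropicSupportConfig,
// trimmed to the fields this sketch needs.
type anthropicSupport struct {
	MessagesPath string
	Limitations  []string
	Enabled      bool
}

// HasLimitation mirrors the nil-safe method from the diff.
func (c *anthropicSupport) HasLimitation(limitation string) bool {
	if c == nil {
		return false
	}
	for _, l := range c.Limitations {
		if l == limitation {
			return true
		}
	}
	return false
}

// SupportsPassthrough mirrors the nil-safe method from the diff.
func (c *anthropicSupport) SupportsPassthrough() bool {
	return c != nil && c.Enabled
}

// canPassthrough is hypothetical: forward natively only when the backend opts
// in and does not declare a limitation for a feature the request uses.
func canPassthrough(cfg *anthropicSupport, usesExtendedThinking bool) bool {
	if !cfg.SupportsPassthrough() {
		return false
	}
	if usesExtendedThinking && cfg.HasLimitation("no_extended_thinking") {
		return false
	}
	return true
}

func main() {
	var ollama *anthropicSupport // nil: profile has no anthropic_support section
	vllm := &anthropicSupport{
		Enabled:      true,
		MessagesPath: "/v1/messages",
		Limitations:  []string{"no_extended_thinking"},
	}

	fmt.Println(canPassthrough(ollama, false)) // false
	fmt.Println(canPassthrough(vllm, false))   // true
	fmt.Println(canPassthrough(vllm, true))    // false
}
```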
diff --git a/internal/core/ports/stats.go b/internal/core/ports/stats.go
index 48886c52..728f6863 100644
--- a/internal/core/ports/stats.go
+++ b/internal/core/ports/stats.go
@@ -3,6 +3,7 @@ package ports
import (
"time"
+ "github.com/thushan/olla/internal/core/constants"
"github.com/thushan/olla/internal/core/domain"
)
@@ -18,6 +19,10 @@ type StatsCollector interface {
GetModelStats() map[string]ModelStats
GetModelEndpointStats() map[string]map[string]EndpointModelStats
+ // Translator-specific tracking
+ RecordTranslatorRequest(event TranslatorRequestEvent)
+ GetTranslatorStats() map[string]TranslatorStats
+
GetProxyStats() ProxyStats
GetEndpointStats() map[string]EndpointStats
GetSecurityStats() SecurityStats
@@ -75,3 +80,41 @@ type EndpointModelStats struct {
AverageLatency int64 `json:"avg_latency_ms"`
ConsecutiveErrors int `json:"consecutive_errors"`
}
+
+// TranslatorRequestEvent captures metrics for a single translator request
+type TranslatorRequestEvent struct {
+ TranslatorName string // e.g. "anthropic"
+ Model string // requested model
+ Mode constants.TranslatorMode // passthrough or translation
+ FallbackReason constants.TranslatorFallbackReason // why passthrough wasn't used
+ Success bool // whether request succeeded
+ IsStreaming bool // streaming vs non-streaming
+ Latency time.Duration // end-to-end request duration
+}
+
+// TranslatorStats aggregates metrics for a specific translator
+type TranslatorStats struct {
+ TranslatorName string `json:"translator_name"`
+
+ // Total request counts
+ TotalRequests int64 `json:"total_requests"`
+ SuccessfulRequests int64 `json:"successful_requests"`
+ FailedRequests int64 `json:"failed_requests"`
+
+ // Mode breakdown
+ PassthroughRequests int64 `json:"passthrough_requests"` // native format used
+ TranslationRequests int64 `json:"translation_requests"` // format conversion required
+
+ // Streaming breakdown
+ StreamingRequests int64 `json:"streaming_requests"`
+ NonStreamingRequests int64 `json:"non_streaming_requests"`
+
+ // Fallback reasons (when passthrough couldn't be used)
+ FallbackNoCompatibleEndpoints int64 `json:"fallback_no_compatible_endpoints"`
+ FallbackTranslatorDoesNotSupportPassthrough int64 `json:"fallback_translator_does_not_support_passthrough"`
+ FallbackCannotPassthrough int64 `json:"fallback_cannot_passthrough"`
+
+ // Performance metrics
+ AverageLatency int64 `json:"avg_latency_ms"`
+ TotalLatency int64 `json:"total_latency_ms"`
+}
diff --git a/makefile b/makefile
index 3092d123..1c1767c7 100644
--- a/makefile
+++ b/makefile
@@ -18,7 +18,7 @@ LDFLAGS := -ldflags "\
-X '$(PKG).Tool=$(TOOL)' \
-X '$(PKG).User=$(USER)'"
-.PHONY: run clean build test test-verbose test-short test-race test-cover bench version install-deps check-deps
+.PHONY: run clean build test test-verbose test-short test-race test-cover bench version install-deps check-deps vet
# Build the application with version info
build:
@@ -183,11 +183,11 @@ clean:
deps:
@go mod download && go mod tidy
-ready-tools: fmt lint align
+ready-tools: fmt vet lint align
@echo -e "\033[32mCode is clean for tests!\033[0m"
-# Make code ready for commit (test, test-race, fmt, lint, align)
-ready: test-short test-race fmt lint align
+# Make code ready for commit (test-short, test-race, fmt, vet, lint, align)
+ready: test-short test-race fmt vet lint align
@echo -e "\033[32mCode is ready for commit!\033[0m"
# Build binaries only (no archives) to ./build directory
@@ -240,6 +240,12 @@ fmt:
@go fmt ./...
@echo "Running go fmt...Done!"
+# Run go vet
+vet:
+ @echo "Running go vet..."
+ @go vet ./...
+ @echo "Running go vet...Done!"
+
# Run linter
lint:
@echo "Running golangci-lint..."
@@ -311,7 +317,7 @@ dev:
@go build $(LDFLAGS) -gcflags="all=-N -l" -o bin/olla-dev .
# Run full CI pipeline locally
-ci: deps fmt lint test-race test-cover build
+ci: deps fmt vet lint test-race test-cover build
@echo "CI pipeline completed successfully!"
# Docker compose up with local config
@@ -341,8 +347,8 @@ help:
@echo " dev - Build development binary (with debug symbols)"
@echo " clean - Clean build artifacts and logs"
@echo " deps - Download and tidy dependencies"
- @echo " ready - Make code ready for commit (test, fmt, lint, align)"
- @echo " ready-tools - Check code is ready with tools (fmt, lint, align)"
+ @echo " ready - Make code ready for commit (test, fmt, vet, lint, align)"
+ @echo " ready-tools - Check code is ready with tools (fmt, vet, lint, align)"
@echo " validate-linux - Build and test Linux binaries (AMD64 + ARM64)"
@echo " validate-darwin - Build and test macOS binaries (Intel + Apple Silicon)"
@echo " validate-windows- Build and test Windows binaries (AMD64 + ARM64)"
@@ -355,6 +361,7 @@ help:
@echo " release-test - Test full release (binaries + docker + archives)"
@echo " goreleaser-check- Check goreleaser configuration"
@echo " fmt - Format code"
+ @echo " vet - Run go vet static analysis"
@echo " lint - Run linter (requires golangci-lint)"
@echo " align - Run alignment checker (requires betteralign)"
@echo " install-deps - Install dependencies at pinned versions"
diff --git a/readme.md b/readme.md
index f0246006..1dbff76f 100644
--- a/readme.md
+++ b/readme.md
@@ -145,10 +145,7 @@ You can learn more about [OpenWebUI Ollama with Olla](https://thushan.github.io/
### 🤖 **Anthropic Message API / CLI Tools - Claude Code, OpenCode, Crush**
-> [!CAUTION]
-> Introduced in v0.0.20+, the Anthropic implementation is *experimental* and should be used with caution.
-
-You can use CLI tools with Olla by using the new Anthropic Message API at `/olla/anthropic` to run Claude Code with Local AI models you have on your machine.
+Olla's Anthropic Messages API translation (v0.0.20+) is **enabled by default**, allowing you to use CLI tools like Claude Code with local AI models on your machine via `/olla/anthropic`. It is still being actively improved, so please report any issues or feedback.
We have examples for: