62 commits
54d666c
add envresolver package
thushan May 15, 2026
f7fa69e
add file source support to envresolver
thushan May 15, 2026
1d1d591
add auth type constants
thushan May 15, 2026
9c5481e
add auth and headers fields to endpoint config
thushan May 15, 2026
fa05256
add precomputed auth fields to endpoint domain
thushan May 15, 2026
2987c4e
silence gosec on auth constants
thushan May 15, 2026
59d58dc
validate endpoint auth shape on load
thushan May 15, 2026
290b58a
resolve env and file references in endpoint auth
thushan May 15, 2026
41e458e
precompute endpoint auth headers on load
thushan May 15, 2026
895963d
pass endpoint to copyheaders
thushan May 15, 2026
041168d
inject endpoint auth in copyheaders
thushan May 15, 2026
390216b
inject endpoint custom headers in copyheaders
thushan May 15, 2026
acf438f
inject endpoint auth on health probes
thushan May 15, 2026
99558b8
classify auth failures as config error in health
thushan May 15, 2026
b70fe25
respect retry-after on 429 health responses
thushan May 15, 2026
9ee9a0a
skip retry on post once response body started
thushan May 15, 2026
fb59dc5
fix retry safety lint
thushan May 15, 2026
cb731b3
add proxy env and timeout to transports
thushan May 15, 2026
1811013
test proxy env and timeout on transports
thushan May 15, 2026
3ac99f0
cover auth across proxy handlers
thushan May 15, 2026
4ec48e6
gofmt new test files
thushan May 15, 2026
7659f96
add aimock auth fixture
thushan May 15, 2026
47fca59
add auth test scripts
thushan May 15, 2026
71b80aa
add authenticated local backend docs
thushan May 15, 2026
215dd4a
add experimental remote backend notes
thushan May 15, 2026
73fee36
link auth docs from backend overview
thushan May 15, 2026
c9d55ef
redact secrets in access log query strings
thushan May 15, 2026
94a3efd
hide endpoint url from json output
thushan May 15, 2026
7e59e67
add auth hint to backend profiles
thushan May 15, 2026
a018a5c
align endpoint models struct tags
thushan May 15, 2026
09bd42d
unwrap response writer in retry tracker
thushan May 15, 2026
0e3db60
apply endpoint auth on model discovery
thushan May 15, 2026
c5f5bab
persist rate-limited-until on endpoint updates
thushan May 15, 2026
5693c5b
drop proxy-from-env on outbound proxy transports
thushan May 15, 2026
74d2f8d
strip sensitive headers from upstream responses
thushan May 15, 2026
c9e4bbf
mark endpoint headers map as json-ignored
thushan May 15, 2026
f83380f
align domain struct fields
thushan May 15, 2026
c8ad0dc
drop proxy-from-env on health transport
thushan May 15, 2026
e780f45
strip endpoint-configured headers from upstream responses
thushan May 15, 2026
c51a205
local-first callout on the index page
thushan May 15, 2026
4fde4a7
document auth and headers in config reference
thushan May 15, 2026
143f9d9
link auth from vllm, llamacpp and lmstudio docs
thushan May 15, 2026
134873a
mention auth in configuration overview
thushan May 15, 2026
ed62416
add auth entries to the FAQ
thushan May 15, 2026
d281992
note response header stripping in security docs
thushan May 15, 2026
bf6ade4
fix em-dashes in auth FAQ and security doc
thushan May 15, 2026
9e81f0b
remove em-dashes from the auth docs
thushan May 15, 2026
a0104d1
fix doubled path in remote backend recipes
thushan May 15, 2026
6b43d5f
use bearer for litellm and clarify api_key in auth docs
thushan May 15, 2026
0d9a411
replace stale auth warning in openwebui docs
thushan May 15, 2026
fbc53b1
recommend auth block over headers map in lmdeploy docs
thushan May 15, 2026
10ad1e0
use a generic var name in api_key default example
thushan May 15, 2026
a97993f
check GetAll errors in repository tests
thushan May 15, 2026
06f3d90
drop stale phase reference in health auth test
thushan May 15, 2026
28ce2d3
URL-decode query keys before redaction check
thushan May 15, 2026
540681a
use LookupEnv to distinguish unset from empty in envresolver
thushan May 15, 2026
efb3c0f
use portable timestamp in auth-env-fatal.sh
thushan May 15, 2026
940fc48
refined descriptions
thushan May 15, 2026
532c473
bump compatible deps within go 1.24
thushan May 15, 2026
1189dfb
note go 1.24 constraint and pinned deps
thushan May 15, 2026
80ea503
actually round-trip in copyheaders auth test
thushan May 15, 2026
a4ca669
treat timeout as failure in auth-env-fatal
thushan May 15, 2026
15 changes: 14 additions & 1 deletion CLAUDE.md
@@ -217,7 +217,7 @@ Always run `make ready` before committing changes.
- **Version Management**: Build-time version injection via `internal/version`

### Development Guidelines
- Go 1.24+
- Go 1.24 (do not bump to 1.25; see Dependencies for held-back packages)
- Australian English for comments and documentation
- Comment on **why** rather than **what**
- Always run `make ready` before committing
@@ -237,6 +237,19 @@ Always run `make ready` before committing changes.

Do not add additional dependencies unless explicitly asked.

### Go 1.24 Compatibility Pins

Olla targets Go 1.24. Releases after the versions listed below raise their `go` directive to 1.25, so these packages are held back:

- `golang.org/x/sys` at v0.41.0 (v0.42.0+ requires Go 1.25)
- `golang.org/x/term` at v0.40.0
- `golang.org/x/text` at v0.34.0
- `golang.org/x/sync` at v0.19.0
- `golang.org/x/time` at v0.14.0
- `atomicgo.dev/keyboard` at v0.2.9

`go get -u ./...` will pull newer releases of these and silently raise the `go` directive in `go.mod` to 1.25. After running it, check `go.mod` and pin the affected packages back to the versions above, or use `go get -u=patch ./...` to limit upgrades to patch releases only.

## SUB-AGENT DELEGATION

CRITICAL: Always delegate tasks to the appropriate subagent. Do NOT perform work directly in the main context.
5 changes: 5 additions & 0 deletions config/profiles/litellm.yaml
@@ -45,6 +45,11 @@ characteristics:
max_concurrent_requests: 100 # LiteLLM handles high concurrency well
default_priority: 95 # High priority as a unified gateway
streaming_support: true
auth:
required: false # optional but common in production deployments
types:
- bearer # master key via Authorization: Bearer
- api_key # some versions accept x-goog-api-key or custom header

# Detection hints for auto-discovery
detection:
4 changes: 4 additions & 0 deletions config/profiles/llamacpp.yaml
@@ -68,6 +68,10 @@ characteristics:
default_priority: 95 # High priority for direct GGUF inference
streaming_support: true
single_model_server: true # important: One model per instance
auth:
required: false
types:
- bearer # enabled via --api-key flag

# Detection hints for auto-discovery
detection:
4 changes: 4 additions & 0 deletions config/profiles/ollama.yaml
@@ -45,6 +45,10 @@ characteristics:
max_concurrent_requests: 10
default_priority: 100
streaming_support: true
auth:
required: false
types:
- bearer # used by Ollama Cloud and protected remote instances

# Detection hints for auto-discovery
detection:
5 changes: 5 additions & 0 deletions config/profiles/openai-compatible.yaml
@@ -37,6 +37,11 @@ characteristics:
max_concurrent_requests: 20
default_priority: 50
streaming_support: true
auth:
required: false
types:
- bearer # standard Authorization: Bearer for most OpenAI-compatible APIs
- api_key # some backends use a custom header (set header: in auth config)

# Detection hints for auto-discovery
detection:
4 changes: 4 additions & 0 deletions config/profiles/vllm.yaml
@@ -65,6 +65,10 @@ characteristics:
max_concurrent_requests: 100
default_priority: 80
streaming_support: true
auth:
required: false
types:
- bearer # enabled via --api-key flag

# Detection hints for auto-discovery
detection:
192 changes: 192 additions & 0 deletions docs/content/configuration/endpoint-auth-remote.md
@@ -0,0 +1,192 @@
---
title: Remote Backend Auth (Experimental) - Cloud API Recipes
description: Experimental recipes for using Olla with remote cloud APIs like Ollama Cloud, OpenRouter, and Groq. Understand the limitations and caveats before use.
keywords: olla remote, cloud api, ollama cloud, openrouter, groq, experimental
---

# Remote Backend Auth (Experimental)

!!! warning "Not officially supported"
    Olla is designed for **local, self-hosted inference backends**. Remote cloud APIs are not
    a first-class use case. The recipes below work today for users who want to experiment, but
    we make no guarantees about continued compatibility, and issues specific to cloud providers
    will not be prioritised.

If you want to use hosted APIs, consider LiteLLM as an intermediary. It handles the
provider-specific quirks, and Olla then talks to LiteLLM as a local OpenAI-compatible endpoint.

## Why Cloud APIs Are Not First-Class

Cloud inference APIs have operational characteristics that Olla does not currently handle:

- **Rate limit headers** (`x-ratelimit-*`, `retry-after`): Olla does not parse or propagate
provider-specific rate limit signalling beyond honouring 429 for health state.
- **Path-prefix base URLs**: Some APIs require a base path in the URL
(e.g. `https://api.groq.com/openai/v1`). See below for how this interacts with health and
model discovery.
- **Cold-start latency**: Serverless-backed providers can have high first-token latency that
exceeds Olla's default health check timeouts.
- **Model namespacing**: Many cloud APIs use `provider/model-name` format. Olla's model
discovery and unification are tuned for local naming conventions.
- **No health endpoint**: Most cloud APIs do not expose a `/health` endpoint, so health checks
against `/v1/models` incur real API calls and may consume quota.

## URL Construction for Path-Prefixed Bases

Olla joins discovery paths onto the base URL path using `path.Join`. For a base like
`https://api.groq.com/openai/v1`, the default health or model path `/v1/models` is
joined as `/openai/v1/v1/models`, a doubled prefix that silently breaks health checks and
model discovery.

Set explicit absolute `health_check_url` and `model_url` values to bypass the join entirely.
`ResolveURLPath` returns absolute URLs as-is, so `https://api.groq.com/openai/v1/models`
goes to the wire unchanged. This only affects discovery; proxy-time URL building is
controlled separately by `preserve_path`.
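To see the doubling concretely, the sketch below mimics the behaviour described above using only the standard library: absolute URLs pass through unchanged, and relative paths are joined onto the base URL's path with `path.Join`. `resolveDiscoveryURL` is a hypothetical stand-in, not Olla's `ResolveURLPath`.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// resolveDiscoveryURL is a hypothetical stand-in for the described behaviour:
// absolute URLs are returned as-is, relative paths are joined onto the base
// URL's path with path.Join (which is what doubles a /v1 prefix).
func resolveDiscoveryURL(base, p string) (string, error) {
	ref, err := url.Parse(p)
	if err != nil {
		return "", err
	}
	if ref.IsAbs() {
		return ref.String(), nil
	}
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = path.Join(u.Path, p)
	return u.String(), nil
}

func main() {
	base := "https://api.groq.com/openai/v1"

	joined, _ := resolveDiscoveryURL(base, "/v1/models")
	fmt.Println(joined) // https://api.groq.com/openai/v1/v1/models

	explicit, _ := resolveDiscoveryURL(base, "https://api.groq.com/openai/v1/models")
	fmt.Println(explicit) // https://api.groq.com/openai/v1/models
}
```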

## What We Don't Promise

- Health check accuracy for cloud endpoints
- Correct model listing or unification across local and remote endpoints
- Retry or backoff behaviour that respects provider-specific rate limiting
- Compatibility with provider authentication changes

## Recipes

These configurations work at the time of writing. Treat them as starting points, not
production-tested deployments.

### Ollama Cloud

Ollama Cloud (`https://ollama.com`) accepts bearer authentication. Set your API key from
[ollama.com/settings/keys](https://ollama.com/settings/keys).

```yaml
discovery:
  static:
    endpoints:
      - url: "https://ollama.com"
        name: "ollama-cloud"
        type: "ollama"
        priority: 10 # lower than local instances
        check_interval: 60s # avoid hammering cloud health checks
        check_timeout: 10s
        auth:
          type: bearer
          token: "${OLLAMA_CLOUD_API_KEY}"
```

**Known limitations:**

- The Ollama Cloud API surface may differ from local Ollama. Model names include the namespace
(e.g. `hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF`).
- Health check hits `/`, which works on the Ollama Cloud base URL.

### OpenRouter

OpenRouter exposes an OpenAI-compatible API at `https://openrouter.ai/api/v1`. The `/api/v1`
prefix path means you need `preserve_path: true` to prevent Olla from stripping it.

```yaml
discovery:
  static:
    endpoints:
      - url: "https://openrouter.ai/api/v1"
        name: "openrouter"
        type: "openai-compatible"
        priority: 10
        preserve_path: true # required: prevents stripping the /api/v1 prefix
        health_check_url: "https://openrouter.ai/api/v1/models"
        model_url: "https://openrouter.ai/api/v1/models"
        check_interval: 120s
        check_timeout: 15s
        auth:
          type: bearer
          token: "${OPENROUTER_API_KEY}"
```

**Known limitations:**

- Health checks probe `/api/v1/models`, which incurs an API call. Set `check_interval` high
to avoid burning quota.
- OpenRouter uses optional `HTTP-Referer` and `X-Title` headers for app attribution. Use `headers:`
to set them:

  ```yaml
  headers:
    HTTP-Referer: "https://your-app.example.com"
    X-Title: "Your App Name"
  ```

- Model names include the provider prefix (e.g. `openai/gpt-4o`, `anthropic/claude-3-5-sonnet`).
These will not unify with local model names.

### Groq

Groq provides a fast OpenAI-compatible inference API.

```yaml
discovery:
  static:
    endpoints:
      - url: "https://api.groq.com/openai/v1"
        name: "groq"
        type: "openai-compatible"
        priority: 10
        preserve_path: true
        health_check_url: "https://api.groq.com/openai/v1/models"
        model_url: "https://api.groq.com/openai/v1/models"
        check_interval: 120s
        check_timeout: 10s
        auth:
          type: bearer
          token: "${GROQ_API_KEY}"
```

**Known limitations:**

- Same health check cost caveat as OpenRouter.
- Groq's rate limits are aggressive on the free tier. A misconfigured health interval can
exhaust rate limits before any inference requests are made.

## Mixing Local and Remote

You can combine local and remote endpoints. Set priorities so local endpoints are strongly
preferred and remote endpoints act as overflow:

```yaml
discovery:
  static:
    endpoints:
      # Local, always preferred
      - url: "http://localhost:8000"
        name: "local-vllm"
        type: "vllm"
        priority: 100

      # Remote fallback
      - url: "https://api.groq.com/openai/v1"
        name: "groq-fallback"
        type: "openai-compatible"
        priority: 5
        preserve_path: true
        health_check_url: "https://api.groq.com/openai/v1/models"
        model_url: "https://api.groq.com/openai/v1/models"
        check_interval: 120s
        auth:
          type: bearer
          token: "${GROQ_API_KEY}"
```

With `load_balancer: priority`, requests only reach the remote endpoint when all local
endpoints are unhealthy.
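The priority behaviour can be sketched as "pick the healthy endpoint with the highest priority". The `endpoint` type and `pickByPriority` function below are hypothetical, and Olla's balancer may handle ties and health transitions differently. The point is only that the priority 5 endpoint is chosen once every priority 100 endpoint is down.

```go
package main

import "fmt"

// endpoint is a hypothetical, minimal view of a configured backend.
type endpoint struct {
	Name     string
	Priority int // higher is preferred
	Healthy  bool
}

// pickByPriority returns the healthy endpoint with the highest priority.
// Ties go to the first one listed (an assumption of this sketch).
func pickByPriority(eps []endpoint) (endpoint, bool) {
	var best endpoint
	found := false
	for _, ep := range eps {
		if !ep.Healthy {
			continue
		}
		if !found || ep.Priority > best.Priority {
			best, found = ep, true
		}
	}
	return best, found
}

func main() {
	eps := []endpoint{
		{Name: "local-vllm", Priority: 100, Healthy: false},
		{Name: "groq-fallback", Priority: 5, Healthy: true},
	}
	if ep, ok := pickByPriority(eps); ok {
		fmt.Println(ep.Name) // groq-fallback
	}
}
```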

## Community Contributions

If you build cloud-specific profile YAML files or improve health check behaviour for cloud
APIs, PRs are welcome. See [Contributing](../development/contributing.md).

## See Also

- [Endpoint Authentication](endpoint-auth.md): auth configuration reference
- [Configuration Overview](overview.md): general configuration
- [LiteLLM Integration](../integrations/backend/litellm.md): recommended cloud API gateway