feat: Authentication for local backends #146

Open
thushan wants to merge 62 commits into main from feature/local-auth

Conversation

@thushan
Owner

@thushan thushan commented May 15, 2026

An attempt to address #132.

Adds per-endpoint auth for local backends. Three types: bearer, api_key, basic. Credentials come from inline strings, env vars (${VAR}), or _file siblings for Docker / k8s. Configured per endpoint, not per profile. Also, we're focused on local backends only.

auth:
  type: bearer
  token: "${VLLM_API_KEY}"

Works against vLLM / llama.cpp / LiteLLM with --api-key, or anything behind a bearer-auth reverse proxy. There's also a headers: map for custom outbound headers as an escape hatch.
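
For reviewers, a rough sketch of how the three auth types could map onto a single outbound header. This is not the code in this PR: the `AuthConfig` field names (`Header`, `Key`, `Username`, `Password`) and the `authHeader` helper are assumptions made for illustration.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// AuthConfig is a hypothetical mirror of the per-endpoint auth block.
// Field names other than Type and Token are assumptions.
type AuthConfig struct {
	Type     string // "bearer", "api_key" or "basic"
	Token    string // bearer token
	Header   string // api_key: header name to send
	Key      string // api_key: header value
	Username string // basic
	Password string // basic
}

// authHeader returns the header name and value to send upstream.
func authHeader(c AuthConfig) (name, value string, err error) {
	switch c.Type {
	case "bearer":
		if c.Token == "" {
			return "", "", fmt.Errorf("bearer auth requires a token")
		}
		return "Authorization", "Bearer " + c.Token, nil
	case "api_key":
		if c.Header == "" || c.Key == "" {
			return "", "", fmt.Errorf("api_key auth requires a header name and a key")
		}
		return c.Header, c.Key, nil
	case "basic":
		if c.Username == "" {
			return "", "", fmt.Errorf("basic auth requires a username")
		}
		creds := base64.StdEncoding.EncodeToString([]byte(c.Username + ":" + c.Password))
		return "Authorization", "Basic " + creds, nil
	default:
		return "", "", fmt.Errorf("unknown auth type %q", c.Type)
	}
}

func main() {
	name, value, err := authHeader(AuthConfig{Type: "bearer", Token: "secret"})
	fmt.Println(name, value, err)
}
```

Computing the header pair once at config load, rather than per request, keeps the request path to a single header set.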

The plumbing works for cloud backends too, but cloud-specific things (rate limits, model namespacing, path-prefix bases) aren't first-class concerns.

A few related things tagged along:

  • 401 / 403 from a health probe now show as config_error instead of dead
  • 429 honours Retry-After, doesn't trip the circuit breaker
  • POST retries skipped once response bytes have flushed (no double billing on RST mid-stream)
  • Upstream response headers get a strip list mirroring the request side, plus dynamic deny for any header named in the endpoint's auth or headers: config
  • HTTPS_PROXY removed from outbound proxy transports (kept on health)
  • Auth applied to model discovery too, not just proxy and health
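
To make the response-side strip list concrete, here is a minimal sketch of the idea: a static deny list plus a dynamic one built from the endpoint's own auth header and `headers:` keys. The `copyResponseHeaders` signature and the contents of `staticDeny` are assumptions, not the PR's actual list.

```go
package main

import (
	"fmt"
	"net/http"
)

// staticDeny is an illustrative strip list; the real list may differ.
var staticDeny = []string{"Authorization", "Proxy-Authorization"}

// copyResponseHeaders copies upstream headers to dst, skipping the static
// strip list and any header named by the endpoint's auth or headers config.
func copyResponseHeaders(dst, src http.Header, authHeaderName string, endpointHeaders map[string]string) {
	deny := make(map[string]struct{})
	for _, h := range staticDeny {
		deny[http.CanonicalHeaderKey(h)] = struct{}{}
	}
	if authHeaderName != "" {
		deny[http.CanonicalHeaderKey(authHeaderName)] = struct{}{}
	}
	for h := range endpointHeaders {
		deny[http.CanonicalHeaderKey(h)] = struct{}{}
	}
	for k, vals := range src {
		if _, blocked := deny[http.CanonicalHeaderKey(k)]; blocked {
			continue
		}
		for _, v := range vals {
			dst.Add(k, v)
		}
	}
}

func main() {
	src := http.Header{}
	src.Set("Content-Type", "application/json")
	src.Set("X-Api-Key", "leaked")

	dst := http.Header{}
	copyResponseHeaders(dst, src, "X-Api-Key", nil)
	fmt.Println(dst.Get("Content-Type"), dst.Get("X-Api-Key") == "")
}
```

The dynamic part matters because an api_key endpoint can use any header name, so a fixed list alone cannot know what to strip if the upstream echoes it back.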

An unresolved ${VAR} is fatal at startup, with a suitable error message. Better to fail at startup than later, we figured.
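
A sketch of what strict expansion could look like, assuming only the plain ${VAR} form. The `expandStrict` name, the regular expression and the choice to treat an explicitly empty variable as set are assumptions for illustration, not the behaviour of pkg/envresolver.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// placeholder matches ${VAR}; the ${VAR:-default} form is left out of this sketch.
var placeholder = regexp.MustCompile(`\$\{[A-Za-z_][A-Za-z0-9_]*\}`)

// expandStrict replaces each ${VAR} with its value and returns an error
// naming the first variable that is not set. os.LookupEnv is used so that
// an explicitly empty variable is not mistaken for an unset one.
func expandStrict(s string) (string, error) {
	var missing string
	out := placeholder.ReplaceAllStringFunc(s, func(m string) string {
		name := m[2 : len(m)-1] // strip "${" and "}"
		v, ok := os.LookupEnv(name)
		if !ok && missing == "" {
			missing = name
		}
		return v
	})
	if missing != "" {
		return "", fmt.Errorf("environment variable %s is not set", missing)
	}
	return out, nil
}

func main() {
	os.Setenv("DEMO_TOKEN", "abc123")
	out, err := expandStrict("Bearer ${DEMO_TOKEN}")
	fmt.Println(out, err)

	// A real startup path would log this and exit non-zero.
	_, err = expandStrict("${DEMO_UNSET_VARIABLE}")
	fmt.Println(err)
}
```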

Letting this simmer for a bit.

Summary by CodeRabbit

  • New Features

    • Per-endpoint outbound authentication (bearer, api_key, basic)
    • Environment-variable expansion and file-backed secrets for credentials
    • Endpoint-specific custom header injection and auth-aware health probes
    • HTTP 429 handling with Retry-After parsing and rate-limited scheduling
    • Proxy/request/response header handling updated to inject auth and strip sensitive upstream headers
    • Sensitive endpoint/auth fields omitted from JSON responses
  • Documentation

    • Endpoint auth guide, remote-backend recipes and updated integration/security docs

thushan added 30 commits May 15, 2026 15:40
Health probes now carry the same per-endpoint auth and custom headers
that proxy requests do. Without this every authenticated backend gets
marked unhealthy on its first probe because it returns 401.

injectEndpointAuth applies endpoint.Headers then AuthHeaderName/Value
with the same precedence as CopyHeaders, but skips the client-request
strip logic that is inappropriate for synthesised probe requests.

401 and 403 from a health probe now map to StatusConfigError rather than
StatusUnhealthy. This tells the operator the backend is alive but the
credentials are wrong, and avoids tripping the circuit breaker on a
problem that retrying can never fix.

429 maps to StatusRateLimited using the same non-routable, no-CB logic.

Both statuses appear in /internal/status/endpoints so the operator can
distinguish a misconfigured endpoint from a genuinely dead one.
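
The classification described in this commit can be sketched roughly as below. The `classify` helper, its return shape and the "2xx means healthy" rule are assumptions; only the 401/403 and 429 mappings come from the commit message.

```go
package main

import (
	"fmt"
	"net/http"
)

// Status mirrors the status names mentioned in the commit message.
// The string values for healthy/unhealthy are assumed.
type Status string

const (
	StatusHealthy     Status = "healthy"
	StatusUnhealthy   Status = "unhealthy"
	StatusConfigError Status = "config_error"
	StatusRateLimited Status = "rate_limited"
)

// classify maps a probe status code to a Status, whether the endpoint is
// routable, and whether the result should count towards the circuit breaker.
func classify(code int) (status Status, routable, countsTowardBreaker bool) {
	switch {
	case code == http.StatusUnauthorized || code == http.StatusForbidden:
		// Backend is alive but the credentials are wrong: retrying cannot help.
		return StatusConfigError, false, false
	case code == http.StatusTooManyRequests:
		// Policy response, not a failure.
		return StatusRateLimited, false, false
	case code >= 200 && code < 300:
		return StatusHealthy, true, false
	default:
		return StatusUnhealthy, false, true
	}
}

func main() {
	for _, code := range []int{200, 401, 403, 429, 503} {
		s, routable, cb := classify(code)
		fmt.Println(code, s, routable, cb)
	}
}
```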
When a health probe receives 429, the Retry-After header is parsed
(delay-seconds and HTTP-date formats per RFC 9110) and stored as
RateLimitedUntil on the endpoint. The scheduler skips any endpoint
whose window hasn't elapsed, preventing quota exhaustion on throttled
backends.

Missing or malformed Retry-After falls back to 30s with an Info log
so the operator can investigate without the system going silent.
429 never trips the circuit breaker — it's a policy response, not a failure.
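
A minimal sketch of the Retry-After handling described above, covering both RFC 9110 formats and the 30s fallback. The `rateLimitedUntil` helper and the fixed `demoNow` clock are assumptions for illustration; logging is omitted.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// fallbackDelay is used when Retry-After is missing or malformed.
const fallbackDelay = 30 * time.Second

// demoNow is a fixed clock so the example output is deterministic.
var demoNow = time.Date(2026, 5, 15, 12, 0, 0, 0, time.UTC)

// rateLimitedUntil turns a Retry-After value into an absolute deadline.
// It accepts delay-seconds ("120") or an HTTP-date.
func rateLimitedUntil(retryAfter string, now time.Time) time.Time {
	v := strings.TrimSpace(retryAfter)
	if secs, err := strconv.Atoi(v); err == nil && secs >= 0 {
		return now.Add(time.Duration(secs) * time.Second)
	}
	if t, err := http.ParseTime(v); err == nil {
		return t
	}
	return now.Add(fallbackDelay)
}

func main() {
	fmt.Println(rateLimitedUntil("120", demoNow).Sub(demoNow))
	fmt.Println(rateLimitedUntil("Fri, 15 May 2026 12:05:00 GMT", demoNow).Sub(demoNow))
	fmt.Println(rateLimitedUntil("soon", demoNow).Sub(demoNow))
	// prints 2m0s, 5m0s and 30s
}
```

A scheduler would then skip the endpoint while the current time is before the returned deadline.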
ExecuteWithRetry now suppresses failover for methods treated as non-idempotent
(POST, PATCH, DELETE) when the response has already been committed to
the client. Retrying after bytes are flushed would double-charge metered
APIs or send the client a corrupted response from two different backends.

GET, HEAD and OPTIONS are unconditionally retried — they are defined as
idempotent by RFC 9110. The responseStartedWriter wrapper tracks whether
WriteHeader or Write has been called without allocating on the hot path.
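
The retry guard can be sketched as follows. The `responseStartedWriter` name comes from the commit message, but its fields and the `canRetry` helper are assumptions, and this sketch makes no claim about allocation behaviour.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// responseStartedWriter records whether anything has been sent to the client.
type responseStartedWriter struct {
	http.ResponseWriter
	started bool
}

func (w *responseStartedWriter) WriteHeader(code int) {
	w.started = true
	w.ResponseWriter.WriteHeader(code)
}

func (w *responseStartedWriter) Write(p []byte) (int, error) {
	w.started = true
	return w.ResponseWriter.Write(p)
}

// alwaysRetryable lists the methods the commit message retries unconditionally.
// How other methods such as PUT are treated is an assumption of this sketch.
func alwaysRetryable(method string) bool {
	switch method {
	case http.MethodGet, http.MethodHead, http.MethodOptions:
		return true
	}
	return false
}

// canRetry reports whether failing over to another backend is still safe.
func canRetry(method string, responseStarted bool) bool {
	return alwaysRetryable(method) || !responseStarted
}

func main() {
	w := &responseStartedWriter{ResponseWriter: httptest.NewRecorder()}
	fmt.Println(canRetry(http.MethodPost, w.started)) // nothing sent yet

	w.Write([]byte("data: partial\n"))
	fmt.Println(canRetry(http.MethodPost, w.started)) // bytes already committed
	fmt.Println(canRetry(http.MethodGet, w.started))
	// prints true, false, true
}
```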
Five end-to-end bash scripts covering bearer, api_key, basic, headers-only,
and the missing-env-var fatal case. Each script spins up the Go mock backend,
starts Olla with a generated config, and asserts both the happy path (correct
credential → 200) and the failure path (wrong credential → non-200). The
env-fatal script has no backend at all — it proves Olla exits non-zero with
the endpoint name and missing variable in the error message.
@thushan thushan self-assigned this May 15, 2026
@thushan thushan added the documentation, enhancement, configuration and work in progress labels May 15, 2026
@coderabbitai

coderabbitai Bot commented May 15, 2026

Warning

Rate limit exceeded

@thushan has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 34 minutes and 26 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 48465ade-7e89-4647-9e76-543cd1c3fdb5

📥 Commits

Reviewing files that changed from the base of the PR and between 1189dfb and a4ca669.

📒 Files selected for processing (2)
  • internal/adapter/proxy/core/common_auth_test.go
  • test/scripts/auth/auth-env-fatal.sh

Walkthrough

Olla gains per-endpoint outbound authentication supporting bearer tokens, API keys, and HTTP Basic credentials. Configuration includes inline values, file-based secrets with environment variable expansion, and custom headers. Health checks inject credentials and detect rate-limits via HTTP 429 responses. Proxy request/response handlers apply endpoint-aware header injection and strip sensitive headers from upstream responses. Comprehensive documentation and integration tests validate all auth flows.

Changes

Per-Endpoint Authentication System

| Layer / File(s) | Summary |
|---|---|
| **Configuration types and authentication constants**<br>internal/config/types.go, internal/config/types_test.go, internal/core/constants/auth.go, internal/core/constants/auth_test.go, internal/core/domain/profile_config.go | New AuthConfig struct for per-endpoint credentials (bearer/api_key/basic with inline and _file variants); EndpointConfig gains Auth and Headers fields; exported auth constants (type names, HTTP header names, scheme prefixes); AuthHint type for profile capability hints; YAML round-trip test coverage. |
| **Authentication validation and credential resolution**<br>internal/adapter/discovery/auth.go, internal/adapter/discovery/auth_test.go | Shape validation for bearer/api_key/basic auth configs; credential resolution with ${VAR} environment expansion and _file secret loading; HTTP header name/value precomputation (Authorization for bearer/basic, custom header for api_key); test coverage for validation, resolution, env var missing/empty/conflict cases, and security (auth header value excluded from JSON). |
| **Endpoint discovery and config loading**<br>internal/adapter/discovery/repository.go, internal/adapter/discovery/repository_test.go, internal/adapter/discovery/http_client.go, internal/adapter/discovery/http_client_test.go, internal/core/domain/healthcheck.go, internal/core/domain/endpoint.go | StaticEndpointRepository validates, resolves, and precomputes auth during config loading; UpdateEndpoint persists RateLimitedUntil; HTTPModelDiscoveryClient injects endpoint headers and auth onto discovery requests; HealthCheckResult gains RateLimitedUntil field; Endpoint struct gains auth fields (AuthHeaderName, AuthHeaderValue, Headers, RateLimitedUntil) with json:"-" tags for security. |
| **Health checking with authentication and rate-limiting**<br>internal/adapter/health/client.go, internal/adapter/health/client_auth_test.go, internal/adapter/health/client_classify_test.go, internal/adapter/health/client_ratelimit_test.go, internal/adapter/health/client_transport_test.go, internal/adapter/health/checker.go, internal/adapter/health/types.go | Health client injects endpoint auth/headers onto probes; HTTP 401/403 classified as StatusConfigError (no circuit-breaker); HTTP 429 parsed for Retry-After (numeric/HTTP-date) with fallback backoff, setting RateLimitedUntil; health checker skips rate-limited endpoints; transport adds ResponseHeaderTimeout to prevent hanging; tests cover auth injection, missing auth, rate-limit parsing, transport config. |
| **Proxy request/response header handling**<br>internal/adapter/proxy/core/common.go, internal/adapter/proxy/core/common_auth_test.go, internal/adapter/proxy/core/common_test.go, internal/adapter/proxy/core/retry.go, internal/adapter/proxy/core/retry_safety_test.go, internal/adapter/proxy/olla/service.go, internal/adapter/proxy/olla/service_retry.go, internal/adapter/proxy/olla/service_transport_test.go, internal/adapter/proxy/sherpa/service.go, internal/adapter/proxy/sherpa/service_retry.go, internal/adapter/proxy/sherpa/service_transport_test.go | CopyHeaders expanded to accept endpoint; applies endpoint custom headers then endpoint auth (precedence); CopyResponseHeaders filters upstream response by denying static sensitive headers plus endpoint auth header and configured keys; Olla/Sherpa proxy services call endpoint-aware helpers; retry logic tracks response-start via responseStartedWriter, skipping retry for non-idempotent methods when bytes committed; transport configuration uses shared proxy constants. |
| **Query string sanitisation**<br>internal/app/middleware/logging.go, internal/app/middleware/logging_test.go | AccessLoggingMiddleware redacts values for sensitive query keys (api_key, token, password, access_token, secret, auth) with [REDACTED] marker; non-sensitive parameters unaffected; comprehensive test coverage. |
| **Environment variable expansion utility**<br>pkg/envresolver/resolver.go, pkg/envresolver/resolver_test.go, pkg/envresolver/platform_test.go | New pkg/envresolver package supporting ${VAR} and ${VAR:-default} placeholder expansion; Expand replaces unset with empty/default; ExpandStrict errors on missing vars; ExpandWithFile enforces value vs _file mutual exclusivity with file reading/trimming; tests cover all patterns and error cases. |
| **Profile authentication hints**<br>config/profiles/litellm.yaml, config/profiles/llamacpp.yaml, config/profiles/ollama.yaml, config/profiles/openai-compatible.yaml, config/profiles/vllm.yaml | Profile Characteristics.auth sections declare supported auth types (bearer, api_key, basic) and required flag for each backend platform profile. |
| **Comprehensive documentation**<br>docs/content/configuration/endpoint-auth.md, docs/content/configuration/endpoint-auth-remote.md, docs/content/configuration/overview.md, docs/content/configuration/reference.md, docs/content/configuration/practices/security.md, docs/content/faq.md, docs/content/index.md, docs/content/integrations/backend/llamacpp.md, docs/content/integrations/backend/lmdeploy.md, docs/content/integrations/backend/lmstudio.md, docs/content/integrations/backend/ollama.md, docs/content/integrations/backend/vllm.md, docs/content/integrations/frontend/openwebui.md, docs/mkdocs.yml | Endpoint authentication guide with bearer/api_key/basic examples, environment variable expansion, file-based secrets, Docker Compose/Kubernetes examples, header escapes, fail-fast validation, and provider recipes; remote backend auth (experimental) with cloud provider recipes (Ollama Cloud, OpenRouter, Groq) and limitations; FAQ entries on config_error, rate_limited, and env var handling; integration guides updated; overview links auth docs; home page clarifies local-first focus; nav updated. |
| **Integration tests and fixtures**<br>test/cmd/mockbackend/main.go, test/manual/config.auth.yaml, test/scripts/auth/auth-bearer.sh, test/scripts/auth/auth-basic.sh, test/scripts/auth/auth-api-key.sh, test/scripts/auth/auth-headers-only.sh, test/scripts/auth/auth-env-fatal.sh, test/scripts/auth/lib.sh, internal/app/handlers/handler_auth_routes_test.go, makefile | Mock backend server enforces header-based auth on /v1/models and /v1/chat/completions; bash integration test scripts validate bearer, api_key, basic, and header-only flows, plus environment variable fatal errors; shared test library with port-freeing and HTTP utilities; handler-layer regression tests verify auth endpoints flow through routes unchanged; makefile targets orchestrate tests. |

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly Related PRs

  • thushan/olla#71: Both PRs modify internal/adapter/proxy/core/common.go’s CopyHeaders logic (header-copy behavior).
  • thushan/olla#43: Refactors hardcoded HTTP/auth header strings to centralised constants, overlapping header-handling changes.
  • thushan/olla#99: Modifies proxy request-preparation paths; relates to header injection and proxy flow changes.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/adapter/discovery/repository_test.go`:
- Around line 871-873: The test currently ignores the error returned by
repo.GetAll() (eps, _ := repo.GetAll(context.Background())) and then indexes
into eps which can mask repository failures; update the calls to check the
returned error from GetAll (e.g., eps, err := repo.GetAll(context.Background());
if err != nil { t.Fatalf("GetAll failed: %v", err) } or use require.NoError(t,
err)) before accessing eps and similarly fix the other occurrence around the
eps/ep usage (lines referencing eps, ep) so the test fails with a clear message
on repo errors instead of panicking.

In `@internal/adapter/health/client_auth_test.go`:
- Around line 65-83: The comment in TestHealthProbe_MissingAuth is outdated
about future refinement; update or remove the "Phase 6 will refine this to
StatusConfigError" phrasing because determineStatus in client.go already maps
HTTP 401 to domain.StatusConfigError. Edit the comment in
TestHealthProbe_MissingAuth to state the current behavior (401 is mapped to
StatusConfigError) or strip the Phase 6 reference so the test comment accurately
reflects determineStatus's mapping.

In `@internal/app/middleware/logging.go`:
- Around line 229-237: The redaction loop currently compares the raw key string
k (from strings.Cut(pair, "=")) against sensitiveQueryKeys, which allows
percent-encoded keys to bypass detection; update the logic in the loop that sets
k (and uses pair) to URL-decode the key (use url.QueryUnescape or equivalent)
before doing the sensitive-key comparison in the sensitiveQueryKeys loop and use
the decoded key only for the equality checks (keep the original pair/value
intact for reconstruction), and handle any decode errors by treating the key as
non-sensitive (i.e., fall back to the original k) so existing behavior is
preserved when decoding fails.

In `@pkg/envresolver/resolver.go`:
- Around line 32-34: The current use of os.Getenv(name) in resolver.go (the v :=
os.Getenv(name); v != "" check) conflates unset and explicitly empty environment
variables; change the lookup in the resolver function to use os.LookupEnv(name)
so you can detect existence (if exists { return v }) and thus return an
explicitly set empty string, or if the behavior was intentional add a clarifying
comment on the same check explaining why empty values should fall through to
defaults; update the code paths that reference this branch (the os.Getenv(...)
check) accordingly.

In `@test/scripts/auth/auth-env-fatal.sh`:
- Line 32: The MISSING_VAR assignment uses date +%s%N which breaks on macOS
because %N (nanoseconds) is unsupported; change the assignment of MISSING_VAR
(currently "OLLA_TEST_MISSING_TOKEN_$(date +%s%N)") to use a cross-platform
timestamp (e.g., seconds-only: $(date +%s)) or another portable uniqueness
method so the script runs on macOS and Linux for the auth-env-fatal.sh test.
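
For the logging.go finding above, a sketch of redaction that decodes each query key before comparing it, so a percent-encoded key such as api%5Fkey cannot slip through. The `redactQuery` function and the map-based lookup are assumptions; the key list and the [REDACTED] marker are taken from the walkthrough.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// sensitiveQueryKeys lists the keys whose values are redacted in access logs.
var sensitiveQueryKeys = map[string]struct{}{
	"api_key": {}, "token": {}, "password": {},
	"access_token": {}, "secret": {}, "auth": {},
}

// redactQuery replaces the values of sensitive keys in a raw query string.
// Keys are URL-decoded for comparison only; the original key text is kept
// when the pair is rebuilt, and a decode failure falls back to the raw key.
func redactQuery(raw string) string {
	if raw == "" {
		return raw
	}
	pairs := strings.Split(raw, "&")
	for i, pair := range pairs {
		k, _, hasValue := strings.Cut(pair, "=")
		decoded, err := url.QueryUnescape(k)
		if err != nil {
			decoded = k
		}
		if _, sensitive := sensitiveQueryKeys[decoded]; sensitive && hasValue {
			pairs[i] = k + "=[REDACTED]"
		}
	}
	return strings.Join(pairs, "&")
}

func main() {
	fmt.Println(redactQuery("model=llama3&api%5Fkey=s3cret"))
	// prints model=llama3&api%5Fkey=[REDACTED]
}
```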

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: be3f5d1a-c529-46df-9223-4960b0a57c4b

📥 Commits

Reviewing files that changed from the base of the PR and between 6d6ac4d and 10ad1e0.

📒 Files selected for processing (71)
  • config/profiles/litellm.yaml
  • config/profiles/llamacpp.yaml
  • config/profiles/ollama.yaml
  • config/profiles/openai-compatible.yaml
  • config/profiles/vllm.yaml
  • docs/content/configuration/endpoint-auth-remote.md
  • docs/content/configuration/endpoint-auth.md
  • docs/content/configuration/overview.md
  • docs/content/configuration/practices/security.md
  • docs/content/configuration/reference.md
  • docs/content/faq.md
  • docs/content/index.md
  • docs/content/integrations/backend/llamacpp.md
  • docs/content/integrations/backend/lmdeploy.md
  • docs/content/integrations/backend/lmstudio.md
  • docs/content/integrations/backend/ollama.md
  • docs/content/integrations/backend/vllm.md
  • docs/content/integrations/frontend/openwebui.md
  • docs/mkdocs.yml
  • internal/adapter/discovery/auth.go
  • internal/adapter/discovery/auth_test.go
  • internal/adapter/discovery/http_client.go
  • internal/adapter/discovery/http_client_test.go
  • internal/adapter/discovery/repository.go
  • internal/adapter/discovery/repository_test.go
  • internal/adapter/health/checker.go
  • internal/adapter/health/client.go
  • internal/adapter/health/client_auth_test.go
  • internal/adapter/health/client_classify_test.go
  • internal/adapter/health/client_ratelimit_test.go
  • internal/adapter/health/client_transport_test.go
  • internal/adapter/health/types.go
  • internal/adapter/proxy/config/unified.go
  • internal/adapter/proxy/core/common.go
  • internal/adapter/proxy/core/common_auth_test.go
  • internal/adapter/proxy/core/common_test.go
  • internal/adapter/proxy/core/retry.go
  • internal/adapter/proxy/core/retry_safety_test.go
  • internal/adapter/proxy/olla/service.go
  • internal/adapter/proxy/olla/service_retry.go
  • internal/adapter/proxy/olla/service_transport_test.go
  • internal/adapter/proxy/sherpa/service.go
  • internal/adapter/proxy/sherpa/service_retry.go
  • internal/adapter/proxy/sherpa/service_transport_test.go
  • internal/app/handlers/handler_auth_routes_test.go
  • internal/app/middleware/logging.go
  • internal/app/middleware/logging_test.go
  • internal/config/types.go
  • internal/config/types_test.go
  • internal/core/constants/auth.go
  • internal/core/constants/auth_test.go
  • internal/core/domain/auth_hint_test.go
  • internal/core/domain/endpoint.go
  • internal/core/domain/endpoint_test.go
  • internal/core/domain/healthcheck.go
  • internal/core/domain/json_safety_test.go
  • internal/core/domain/model.go
  • internal/core/domain/profile_config.go
  • internal/logger/styled_plain.go
  • makefile
  • pkg/envresolver/platform_test.go
  • pkg/envresolver/resolver.go
  • pkg/envresolver/resolver_test.go
  • test/cmd/mockbackend/main.go
  • test/manual/config.auth.yaml
  • test/scripts/auth/auth-api-key.sh
  • test/scripts/auth/auth-basic.sh
  • test/scripts/auth/auth-bearer.sh
  • test/scripts/auth/auth-env-fatal.sh
  • test/scripts/auth/auth-headers-only.sh
  • test/scripts/auth/lib.sh

Comment thread internal/adapter/discovery/repository_test.go Outdated
Comment thread internal/adapter/health/client_auth_test.go
Comment thread internal/app/middleware/logging.go
Comment thread pkg/envresolver/resolver.go Outdated
Comment thread test/scripts/auth/auth-env-fatal.sh Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/scripts/auth/auth-env-fatal.sh`:
- Around line 111-116: The test currently treats any non-zero EXIT_CODE as a
successful "fail-fast" startup, which yields false positives when timeout (exit
code 124) occurs; update the conditional around EXIT_CODE to treat 124 as a
timeout failure instead of a valid non-zero failure: use EXIT_CODE to explicitly
detect timeout (124) and call fail with a message about the timeout, otherwise
if EXIT_CODE is a non-124 non-zero value call pass, and if EXIT_CODE is 0 call
fail that startup did not abort; reference EXIT_CODE and the pass/fail helpers
when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 13c6afdd-b511-4799-9c60-f9a82313c49c

📥 Commits

Reviewing files that changed from the base of the PR and between 10ad1e0 and 1189dfb.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (30)
  • CLAUDE.md
  • docs/content/development/setup.md
  • go.mod
  • internal/adapter/discovery/auth.go
  • internal/adapter/discovery/auth_test.go
  • internal/adapter/discovery/repository_test.go
  • internal/adapter/health/checker.go
  • internal/adapter/health/client.go
  • internal/adapter/health/client_auth_test.go
  • internal/adapter/health/client_ratelimit_test.go
  • internal/adapter/health/client_transport_test.go
  • internal/adapter/health/types.go
  • internal/adapter/proxy/core/common.go
  • internal/adapter/proxy/core/common_auth_test.go
  • internal/adapter/proxy/core/common_test.go
  • internal/adapter/proxy/core/retry.go
  • internal/adapter/proxy/core/retry_safety_test.go
  • internal/adapter/proxy/olla/service_transport_test.go
  • internal/adapter/proxy/sherpa/service_transport_test.go
  • internal/app/handlers/handler_auth_routes_test.go
  • internal/app/middleware/logging.go
  • internal/app/middleware/logging_test.go
  • internal/core/constants/auth.go
  • internal/core/domain/endpoint.go
  • internal/core/domain/json_safety_test.go
  • internal/core/domain/profile_config.go
  • pkg/envresolver/resolver.go
  • pkg/envresolver/resolver_test.go
  • test/cmd/mockbackend/main.go
  • test/scripts/auth/auth-env-fatal.sh
✅ Files skipped from review due to trivial changes (3)
  • docs/content/development/setup.md
  • go.mod
  • CLAUDE.md
🚧 Files skipped from review as they are similar to previous changes (23)
  • internal/adapter/health/types.go
  • internal/app/middleware/logging_test.go
  • internal/adapter/health/client_transport_test.go
  • internal/adapter/proxy/sherpa/service_transport_test.go
  • internal/core/domain/json_safety_test.go
  • internal/app/middleware/logging.go
  • internal/adapter/health/checker.go
  • internal/adapter/discovery/repository_test.go
  • test/cmd/mockbackend/main.go
  • internal/core/constants/auth.go
  • internal/adapter/proxy/olla/service_transport_test.go
  • internal/adapter/health/client_auth_test.go
  • pkg/envresolver/resolver_test.go
  • internal/core/domain/endpoint.go
  • internal/adapter/proxy/core/common_test.go
  • internal/adapter/proxy/core/retry.go
  • internal/adapter/discovery/auth_test.go
  • pkg/envresolver/resolver.go
  • internal/adapter/proxy/core/common.go
  • internal/app/handlers/handler_auth_routes_test.go
  • internal/adapter/health/client_ratelimit_test.go
  • internal/adapter/health/client.go
  • internal/core/domain/profile_config.go

Comment thread internal/adapter/proxy/core/common_auth_test.go
Comment thread test/scripts/auth/auth-env-fatal.sh Outdated

Labels

configuration Configuration bug or issue documentation Improvements or additions to documentation enhancement New feature or request work in progress We're working on it


1 participant