feat: Authentication for local backends#146
Conversation
Health probes now carry the same per-endpoint auth and custom headers that proxy requests do. Without this every authenticated backend gets marked unhealthy on its first probe because it returns 401. injectEndpointAuth applies endpoint.Headers then AuthHeaderName/Value with the same precedence as CopyHeaders, but skips the client-request strip logic that is inappropriate for synthesised probe requests.
401 and 403 from a health probe now map to StatusConfigError rather than StatusUnhealthy. This tells the operator the backend is alive but the credentials are wrong, and avoids tripping the circuit breaker on a problem that retrying can never fix. 429 maps to StatusRateLimited using the same non-routable, no-CB logic. Both statuses appear in /internal/status/endpoints so the operator can distinguish a misconfigured endpoint from a genuinely dead one.
When a health probe receives 429, the Retry-After header is parsed (delay-seconds and HTTP-date formats per RFC 9110) and stored as RateLimitedUntil on the endpoint. The scheduler skips any endpoint whose window hasn't elapsed, preventing quota exhaustion on throttled backends. Missing or malformed Retry-After falls back to 30s with an Info log so the operator can investigate without the system going silent. 429 never trips the circuit breaker — it's a policy response, not a failure.
ExecuteWithRetry now suppresses failover for non-idempotent methods (POST, PATCH, DELETE) when the response has already been committed to the client. Retrying after bytes are flushed would double-charge metered APIs or send the client a corrupted response from two different backends. GET, HEAD and OPTIONS are unconditionally retried — they are defined as idempotent by RFC 9110. The responseStartedWriter wrapper tracks whether WriteHeader or Write has been called without allocating on the hot path.
Five end-to-end bash scripts covering bearer, api_key, basic, headers-only, and the missing-env-var fatal case. Each script spins up the Go mock backend, starts Olla with a generated config, and asserts both the happy path (correct credential → 200) and the failure path (wrong credential → non-200). The env-fatal script has no backend at all — it proves Olla exits non-zero with the endpoint name and missing variable in the error message.
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughOlla gains per-endpoint outbound authentication supporting bearer tokens, API keys, and HTTP Basic credentials. Configuration includes inline values, file-based secrets with environment variable expansion, and custom headers. Health checks inject credentials and detect rate-limits via HTTP 429 responses. Proxy request/response handlers apply endpoint-aware header injection and strip sensitive headers from upstream responses. Comprehensive documentation and integration tests validate all auth flows. ChangesPer-Endpoint Authentication System
🎯 4 (Complex) | ⏱️ ~60 minutes Possibly Related PRs
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/adapter/discovery/repository_test.go`:
- Around line 871-873: The test currently ignores the error returned by
repo.GetAll() (eps, _ := repo.GetAll(context.Background())) and then indexes
into eps which can mask repository failures; update the calls to check the
returned error from GetAll (e.g., eps, err := repo.GetAll(context.Background());
if err != nil { t.Fatalf("GetAll failed: %v", err) } or use require.NoError(t,
err)) before accessing eps and similarly fix the other occurrence around the
eps/ep usage (lines referencing eps, ep) so the test fails with a clear message
on repo errors instead of panicking.
In `@internal/adapter/health/client_auth_test.go`:
- Around line 65-83: The comment in TestHealthProbe_MissingAuth is outdated
about future refinement; update or remove the "Phase 6 will refine this to
StatusConfigError" phrasing because determineStatus in client.go already maps
HTTP 401 to domain.StatusConfigError. Edit the comment in
TestHealthProbe_MissingAuth to state the current behavior (401 is mapped to
StatusConfigError) or strip the Phase 6 reference so the test comment accurately
reflects determineStatus's mapping.
In `@internal/app/middleware/logging.go`:
- Around line 229-237: The redaction loop currently compares the raw key string
k (from strings.Cut(pair, "=")) against sensitiveQueryKeys, which allows
percent-encoded keys to bypass detection; update the logic in the loop that sets
k (and uses pair) to URL-decode the key (use url.QueryUnescape or equivalent)
before doing the sensitive-key comparison in the sensitiveQueryKeys loop and use
the decoded key only for the equality checks (keep the original pair/value
intact for reconstruction), and handle any decode errors by treating the key as
non-sensitive (i.e., fall back to the original k) so existing behavior is
preserved when decoding fails.
In `@pkg/envresolver/resolver.go`:
- Around line 32-34: The current use of os.Getenv(name) in resolver.go (the v :=
os.Getenv(name); v != "" check) conflates unset and explicitly empty environment
variables; change the lookup in the resolver function to use os.LookupEnv(name)
so you can detect existence (if exists { return v }) and thus return an
explicitly set empty string, or if the behavior was intentional add a clarifying
comment on the same check explaining why empty values should fall through to
defaults; update the code paths that reference this branch (the os.Getenv(...)
check) accordingly.
In `@test/scripts/auth/auth-env-fatal.sh`:
- Line 32: The MISSING_VAR assignment uses date +%s%N which breaks on macOS
because %N (nanoseconds) is unsupported; change the assignment of MISSING_VAR
(currently "OLLA_TEST_MISSING_TOKEN_$(date +%s%N)") to use a cross-platform
timestamp (e.g., seconds-only: $(date +%s)) or another portable uniqueness
method so the script runs on macOS and Linux for the auth-env-fatal.sh test.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: be3f5d1a-c529-46df-9223-4960b0a57c4b
📒 Files selected for processing (71)
config/profiles/litellm.yamlconfig/profiles/llamacpp.yamlconfig/profiles/ollama.yamlconfig/profiles/openai-compatible.yamlconfig/profiles/vllm.yamldocs/content/configuration/endpoint-auth-remote.mddocs/content/configuration/endpoint-auth.mddocs/content/configuration/overview.mddocs/content/configuration/practices/security.mddocs/content/configuration/reference.mddocs/content/faq.mddocs/content/index.mddocs/content/integrations/backend/llamacpp.mddocs/content/integrations/backend/lmdeploy.mddocs/content/integrations/backend/lmstudio.mddocs/content/integrations/backend/ollama.mddocs/content/integrations/backend/vllm.mddocs/content/integrations/frontend/openwebui.mddocs/mkdocs.ymlinternal/adapter/discovery/auth.gointernal/adapter/discovery/auth_test.gointernal/adapter/discovery/http_client.gointernal/adapter/discovery/http_client_test.gointernal/adapter/discovery/repository.gointernal/adapter/discovery/repository_test.gointernal/adapter/health/checker.gointernal/adapter/health/client.gointernal/adapter/health/client_auth_test.gointernal/adapter/health/client_classify_test.gointernal/adapter/health/client_ratelimit_test.gointernal/adapter/health/client_transport_test.gointernal/adapter/health/types.gointernal/adapter/proxy/config/unified.gointernal/adapter/proxy/core/common.gointernal/adapter/proxy/core/common_auth_test.gointernal/adapter/proxy/core/common_test.gointernal/adapter/proxy/core/retry.gointernal/adapter/proxy/core/retry_safety_test.gointernal/adapter/proxy/olla/service.gointernal/adapter/proxy/olla/service_retry.gointernal/adapter/proxy/olla/service_transport_test.gointernal/adapter/proxy/sherpa/service.gointernal/adapter/proxy/sherpa/service_retry.gointernal/adapter/proxy/sherpa/service_transport_test.gointernal/app/handlers/handler_auth_routes_test.gointernal/app/middleware/logging.gointernal/app/middleware/logging_test.gointernal/config/types.gointernal/config/types_test.gointernal/core/constants/auth.gointernal/core/constants/auth_test.gointernal/core/domain/auth_hint_test.gointernal/core/domain/endpoint.gointernal/core/domain/endpoint_test.gointernal/core/domain/healthcheck.gointernal/core/domain/json_safety_test.gointernal/core/domain/model.gointernal/core/domain/profile_config.gointernal/logger/styled_plain.gomakefilepkg/envresolver/platform_test.gopkg/envresolver/resolver.gopkg/envresolver/resolver_test.gotest/cmd/mockbackend/main.gotest/manual/config.auth.yamltest/scripts/auth/auth-api-key.shtest/scripts/auth/auth-basic.shtest/scripts/auth/auth-bearer.shtest/scripts/auth/auth-env-fatal.shtest/scripts/auth/auth-headers-only.shtest/scripts/auth/lib.sh
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/scripts/auth/auth-env-fatal.sh`:
- Around line 111-116: The test currently treats any non-zero EXIT_CODE as a
successful "fail-fast" startup, which yields false positives when timeout (exit
code 124) occurs; update the conditional around EXIT_CODE to treat 124 as a
timeout failure instead of a valid non-zero failure: use EXIT_CODE to explicitly
detect timeout (124) and call fail with a message about the timeout, otherwise
if EXIT_CODE is a non-124 non-zero value call pass, and if EXIT_CODE is 0 call
fail that startup did not abort; reference EXIT_CODE and the pass/fail helpers
when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 13c6afdd-b511-4799-9c60-f9a82313c49c
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (30)
CLAUDE.mddocs/content/development/setup.mdgo.modinternal/adapter/discovery/auth.gointernal/adapter/discovery/auth_test.gointernal/adapter/discovery/repository_test.gointernal/adapter/health/checker.gointernal/adapter/health/client.gointernal/adapter/health/client_auth_test.gointernal/adapter/health/client_ratelimit_test.gointernal/adapter/health/client_transport_test.gointernal/adapter/health/types.gointernal/adapter/proxy/core/common.gointernal/adapter/proxy/core/common_auth_test.gointernal/adapter/proxy/core/common_test.gointernal/adapter/proxy/core/retry.gointernal/adapter/proxy/core/retry_safety_test.gointernal/adapter/proxy/olla/service_transport_test.gointernal/adapter/proxy/sherpa/service_transport_test.gointernal/app/handlers/handler_auth_routes_test.gointernal/app/middleware/logging.gointernal/app/middleware/logging_test.gointernal/core/constants/auth.gointernal/core/domain/endpoint.gointernal/core/domain/json_safety_test.gointernal/core/domain/profile_config.gopkg/envresolver/resolver.gopkg/envresolver/resolver_test.gotest/cmd/mockbackend/main.gotest/scripts/auth/auth-env-fatal.sh
✅ Files skipped from review due to trivial changes (3)
- docs/content/development/setup.md
- go.mod
- CLAUDE.md
🚧 Files skipped from review as they are similar to previous changes (23)
- internal/adapter/health/types.go
- internal/app/middleware/logging_test.go
- internal/adapter/health/client_transport_test.go
- internal/adapter/proxy/sherpa/service_transport_test.go
- internal/core/domain/json_safety_test.go
- internal/app/middleware/logging.go
- internal/adapter/health/checker.go
- internal/adapter/discovery/repository_test.go
- test/cmd/mockbackend/main.go
- internal/core/constants/auth.go
- internal/adapter/proxy/olla/service_transport_test.go
- internal/adapter/health/client_auth_test.go
- pkg/envresolver/resolver_test.go
- internal/core/domain/endpoint.go
- internal/adapter/proxy/core/common_test.go
- internal/adapter/proxy/core/retry.go
- internal/adapter/discovery/auth_test.go
- pkg/envresolver/resolver.go
- internal/adapter/proxy/core/common.go
- internal/app/handlers/handler_auth_routes_test.go
- internal/adapter/health/client_ratelimit_test.go
- internal/adapter/health/client.go
- internal/core/domain/profile_config.go
An attempt to address #132.
Adds per-endpoint auth for local backends. Three types: bearer, api_key, basic. Credentials come from inline strings, env vars (
${VAR}), or_filesiblings for Docker / k8s. Configurred per endpoint, not per profile. Also we're focused on local backends only.Works against vLLM / llama.cpp / LiteLLM with --api-key, or anything behind a bearer-auth reverse proxy. There's also a headers: map for custom outbound headers as an escape hatch.
The plumbing works for them but cloud-specific things (rate limits, model namespacing, path-prefix bases) aren't first-class concerns.
A few related things tagged along:
Unresolved
${VAR}is fatal at startup with a suitable error message at startup rather than later, we figured.Letting this simmer for a bit.
Summary by CodeRabbit
New Features
Documentation