Conversation
- fix memory leaks
- fix tests after refactor
- missing config values
- retries fail for non-sticky workloads
- balancer tweaks for mixed mode
- invalid type persistence for wrapper :(
- lint
- fix missing outcome for sticky fallback with no available nodes
- add tests for no available nodes
- hash the header sessionid to avoid length buffers; use net.SplitHostPort for ipv6; add and update tests
- Preserves sticky-session context keys and stickyStatsFn alongside the model-alias resolver and ContextModelAliasMapKey from main. Both sets of changes coexist cleanly.
- …g time into applyStickySessions() - incorrect unhealthy predicate issue
Walkthrough

This pull request adds a comprehensive sticky-session feature enabling per-conversation backend affinity for multi-turn LLM workloads to preserve KV-cache locality. The implementation includes a TTL-based in-memory session store, configurable key-source strategies, health-aware session purging, response header propagation, integration with existing health checks and discovery mechanisms, extensive test coverage, and detailed documentation.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant Proxy as Proxy Handler
    participant Balancer as Sticky Wrapper
    participant Cache as TTL Cache
    participant Backend as Backend
    participant Health as Health Checker
    participant Discovery as Discovery Service
    Client->>Proxy: POST /olla/proxy/chat (X-Olla-Session-ID)
    Proxy->>Proxy: analyzeRequest()
    Proxy->>Balancer: ComputeStickyKey(request)
    Balancer->>Balancer: Hash session_header/prefix/auth/IP
    Balancer->>Proxy: (key, source)
    Proxy->>Proxy: Store in context
    Proxy->>Balancer: Select(ctx, endpoints)
    Balancer->>Cache: Lookup(sticky_key)
    alt Cache Hit
        Cache-->>Balancer: pinned_url
        Balancer->>Balancer: Verify endpoint routable
        Balancer-->>Proxy: endpoint (outcome=hit)
    else Cache Miss
        Balancer->>Balancer: Select via inner selector
        Balancer->>Cache: Store(sticky_key, url, TTL)
        Balancer-->>Proxy: endpoint (outcome=miss)
    end
    Proxy->>Backend: Forward request
    Backend-->>Proxy: Response
    Proxy->>Proxy: SetStickySessionHeaders (from context)
    Proxy-->>Client: Response + sticky headers
    Health->>Backend: Health check
    Backend-->>Health: Unhealthy
    Health->>Discovery: OnEndpointUnhealthy callback
    Discovery->>Discovery: GetRoutable endpoints
    Discovery->>Balancer: PurgeDeadEndpoints(routable)
    Balancer->>Cache: Remove entries for dead URLs
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs
Suggested labels
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 8
🧹 Nitpick comments (12)
internal/core/constants/context.go (1)
14-18: Good use of a private key type for the new sticky context keys. Using `contextKey` avoids collisions with other packages. Note the pre-existing `ContextRoutePrefixKey`/`ContextRequestIdKey`/etc. are still untyped strings; consider migrating those to `contextKey` in a follow-up for consistency (out of scope for this PR).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/core/constants/context.go` around lines 14 - 18, The new sticky session keys correctly use the private contextKey type to avoid collisions; as a follow-up, migrate the existing untyped context keys (ContextRoutePrefixKey, ContextRequestIdKey, etc.) to the same contextKey type: change their declarations to contextKey("...") and update all uses of context.WithValue and ctx.Value to pass and compare the typed keys (e.g., ContextRoutePrefixKey, ContextRequestIdKey) so callers use the same typed key values instead of raw strings; ensure equality checks and type assertions continue to work with the new contextKey type across functions that access these keys.

makefile (1)
333-336: Minor: consider `python3` and comment-style consistency. Two small nits:

- Other targets (`test-script-integration` at line 329) use a `#` comment prefix, whereas line 333 uses `##`. Harmless, but inconsistent.
- Invoking `python` assumes a `python` shim is available; on many Linux distros only `python3` is installed by default. The existing `test-script-integration` target has the same issue, so this is pre-existing — feel free to leave as-is or normalise both at once.

The checkmake `minphony` warning about a missing `all` target is a false positive for this project.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@makefile` around lines 333 - 336, The Makefile target test-script-sticky uses an inconsistent comment style and calls python without guaranteeing python3; update the comment to use a single '#' like the other target and change the invocation in the test-script-sticky target to use python3 (and optionally normalize test-script-integration the same way) so the command becomes "cd test/scripts && python3 sticky/test-sticky-sessions.py $(ARGS)"; adjust only the comment prefix and the interpreter name for targets test-script-sticky (and test-script-integration if you choose to normalize both).

internal/core/domain/routing.go (1)
84-94: Consider typed constants for `Result`/`Source` values.

`Result` and `Source` are effectively small enums ("hit" | "miss" | "repin" | "disabled", and the five key-source strings) that are duplicated as string literals across `adapter/balancer/sticky.go`, `adapter/proxy/core/common.go`, and tests. Promoting them to named constants in `internal/core/constants/` (or typed string aliases here) would remove stringly-typed coupling and make the contract self-documenting. Not blocking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/core/domain/routing.go` around lines 84 - 94, Introduce typed string aliases and constants for StickyOutcome fields to avoid stringly-typed usage: define a type StickyResult string and constants (e.g. StickyResultHit, StickyResultMiss, StickyResultRepin, StickyResultDisabled) and a type StickySource string with constants for the five key-source values, either in this file near the StickyOutcome struct or in a new/internal/core/constants package; update all uses of literal strings in adapter/balancer/sticky.go, adapter/proxy/core/common.go, and tests to use these new constants (and change StickyOutcome.Result and .Source to use the typed aliases) so callers and tests compile against the single canonical set of values.

internal/config/config.go (1)
363-367: Consider adding env overrides for the other sticky knobs.

Only `OLLA_PROXY_STICKY_SESSIONS_ENABLED` is exposed via env. For operational parity with other settings (and because sticky sessions are marked experimental), it would be useful to also override `IdleTTLSeconds`, `MaxSessions`, and `KeySources` (comma-separated, same pattern as `OLLA_SERVER_TRUSTED_PROXY_CIDRS` on line 334). Not blocking for this PR.

Also, consistent with surrounding patterns, invalid boolean values are silently ignored here — that matches other env overrides in this function, so no change needed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/config/config.go` around lines 363 - 367, Add environment overrides for the remaining sticky session settings by reading OLLA_PROXY_STICKY_SESSIONS_IDLE_TTL_SECONDS (parse with strconv.Atoi and assign to config.Proxy.StickySessions.IdleTTLSeconds if parse succeeds), OLLA_PROXY_STICKY_SESSIONS_MAX_SESSIONS (parse with strconv.Atoi and assign to config.Proxy.StickySessions.MaxSessions), and OLLA_PROXY_STICKY_SESSIONS_KEY_SOURCES (split on commas, trim whitespace and assign the resulting slice to config.Proxy.StickySessions.KeySources), following the same nil/ignore-on-parse-error pattern and the comma-split pattern used for OLLA_SERVER_TRUSTED_PROXY_CIDRS to locate the appropriate block around the existing OLLA_PROXY_STICKY_SESSIONS_ENABLED handling.

internal/app/services/discovery.go (1)
104-117: Minor: shutdown-time log noise from `GetRoutable` on cancelled context.

During shutdown the unhealthy callback may be invoked after the context has been cancelled (e.g. if a final status transition is processed while `Stop` is in progress), in which case `GetRoutable(ctx)` returns `context.Canceled` and this emits a warning that isn't really actionable. Consider suppressing the log when `errors.Is(err, context.Canceled)` to keep shutdown quiet:

💡 Suggested guard

```diff
 routable, err := s.endpointRepo.GetRoutable(ctx)
 if err != nil {
-	s.logger.Warn("Failed to fetch routable endpoints for sticky session purge", "error", err)
+	if !errors.Is(err, context.Canceled) {
+		s.logger.Warn("Failed to fetch routable endpoints for sticky session purge", "error", err)
+	}
 	return
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/app/services/discovery.go` around lines 104 - 117, The unhealthy callback currently logs a warning when s.endpointRepo.GetRoutable(ctx) returns an error, which produces noisy warnings during shutdown when the context is canceled; update the callback (the function passed to s.healthChecker.SetUnhealthyCallback that uses purgeDeadFn.Load and calls s.endpointRepo.GetRoutable) to check if errors.Is(err, context.Canceled) and simply return without logging in that case, otherwise keep the existing s.logger.Warn call for real errors before invoking (*fn)(routable).

internal/config/types.go (1)
76-89: Consider adding a `Validate()` for `StickySessionConfig` for parity with the other config types.

Other config types in this file (`AnthropicTranslatorConfig`, `InspectorConfig`) expose `Validate()` and catch misconfigurations at startup. Per the snippet from `internal/adapter/balancer/sticky.go`, an unknown entry in `KeySources` is silently ignored by the `switch`, and invalid `PrefixHashBytes`/`IdleTTLSeconds` are clamped at runtime. A lightweight validator invoked from `Config.Validate()` would surface typos in `key_sources` at startup rather than producing an always-miss sticky store that clients see only via `/internal/stats/sticky`. Something like:

💡 Proposed validator

```go
func (c *StickySessionConfig) Validate() error {
	if !c.Enabled {
		return nil
	}
	allowed := map[string]struct{}{
		"session_header": {},
		"prefix_hash":    {},
		"auth_header":    {},
		"ip":             {},
	}
	for _, src := range c.KeySources {
		if _, ok := allowed[src]; !ok {
			return fmt.Errorf("sticky_sessions.key_sources contains unknown source %q", src)
		}
	}
	if c.PrefixHashBytes < 0 {
		return fmt.Errorf("sticky_sessions.prefix_hash_bytes must be non-negative, got %d", c.PrefixHashBytes)
	}
	if c.IdleTTLSeconds < 0 {
		return fmt.Errorf("sticky_sessions.idle_ttl_seconds must be non-negative, got %d", c.IdleTTLSeconds)
	}
	return nil
}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/config/types.go` around lines 76 - 89, Add a Validate method on StickySessionConfig and wire it into the top-level Config.Validate() so misconfigurations are caught at startup; specifically, implement StickySessionConfig.Validate() to (when Enabled) verify each entry in KeySources is one of the allowed values ("session_header", "prefix_hash", "auth_header", "ip") and return an error for unknown entries, and also ensure PrefixHashBytes and IdleTTLSeconds are non-negative (returning an error if not). Call this new StickySessionConfig.Validate() from the existing Config.Validate() path so typos in key_sources or invalid numeric values are surfaced early rather than silently ignored at runtime.

internal/app/handlers/handler_stats_sticky.go (1)
10-40: Optional: fold the two disabled-path branches into one helper to avoid drift.

Both `a.stickyStatsFn == nil` and `stats == nil` encode the exact same payload; inlining them twice makes it easy for the two paths to diverge on future edits (e.g. adding a version field).

♻️ Proposed tidy-up

```diff
 func (a *Application) stickyStatsHandler(w http.ResponseWriter, r *http.Request) {
 	w.Header().Set(constants.HeaderContentType, constants.ContentTypeJSON)
 	w.WriteHeader(http.StatusOK)
-	if a.stickyStatsFn == nil {
-		// Sticky sessions are disabled — return a stable JSON shape so callers
-		// can branch on the "enabled" field rather than on status codes.
-		if err := json.NewEncoder(w).Encode(struct {
-			Enabled bool `json:"enabled"`
-		}{Enabled: false}); err != nil {
-			a.logger.Error("Failed to encode sticky stats response", "error", err)
-		}
-		return
-	}
-
-	stats := a.stickyStatsFn()
-	if stats == nil {
-		// stickyStatsFn is wired but the wrapper was not created because sticky
-		// sessions are disabled in config — return the same stable shape as the
-		// nil-function path so callers always branch on "enabled", not status codes.
-		if err := json.NewEncoder(w).Encode(struct {
-			Enabled bool `json:"enabled"`
-		}{Enabled: false}); err != nil {
-			a.logger.Error("Failed to encode sticky stats response", "error", err)
-		}
-		return
-	}
-	if err := json.NewEncoder(w).Encode(stats); err != nil {
-		a.logger.Error("Failed to encode sticky stats response", "error", err)
-	}
+	var payload any = struct {
+		Enabled bool `json:"enabled"`
+	}{Enabled: false}
+	if a.stickyStatsFn != nil {
+		if stats := a.stickyStatsFn(); stats != nil {
+			payload = stats
+		}
+	}
+	if err := json.NewEncoder(w).Encode(payload); err != nil {
+		a.logger.Error("Failed to encode sticky stats response", "error", err)
+	}
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/app/handlers/handler_stats_sticky.go` around lines 10 - 40, The two disabled branches in stickyStatsHandler (when a.stickyStatsFn == nil and when stats == nil) duplicate the same JSON encoding logic; extract that payload+encoding into a small helper (e.g., a method like writeStickyDisabled(w http.ResponseWriter) or a package-level helper used by stickyStatsHandler) and replace both branches to call it, preserving the Content-Type header and the existing error logging via a.logger.Error on json.NewEncoder(w).Encode failures so both paths remain identical and cannot drift.

internal/app/handlers/application.go (1)
88-90: Minor: comment is slightly misleading.

The stated rationale ("so the handler layer does not need to import the balancer package") no longer holds — this file now imports `internal/adapter/balancer` on line 9 specifically to reference `*balancer.StickyStats`. The closure still usefully decouples `Application` from `ProxyServiceWrapper`, but the comment overstates the decoupling. Consider rewording to reflect the actual benefit (decoupling from the proxy service, not from the balancer package).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/app/handlers/application.go` around lines 88 - 90, The comment for the stickyStatsFn field is misleading about package-level decoupling; update the comment on the Application struct's stickyStatsFn to say the closure decouples Application from the proxy service implementation (e.g., ProxyServiceWrapper) rather than claiming it avoids importing the balancer package, and mention it returns *balancer.StickyStats to clarify the actual dependency on balancer.StickyStats.

internal/adapter/health/checker.go (1)
290-298: Consider logging when the unhealthy callback fires.

Unlike the recovery callback branch (lines 254-283) which logs the transition and callback outcome, the unhealthy branch fires silently. For an operator debugging why sticky sessions were unexpectedly purged, a single `Info`/`Debug` log — e.g. `"Endpoint became non-routable, triggering sticky session purge" endpoint=... was=... now=...` — would be valuable. Also note the timeout constant `DefaultRecoveryCallbackTimeout` is now used for both callback paths; a neutral name (e.g. `DefaultCallbackTimeout`) would better reflect its shared use.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/adapter/health/checker.go` around lines 290 - 298, The unhealthy-callback path currently fires silently; update the block that checks statusChanged && !newStatus.IsRoutable() && oldStatus.IsRoutable() to log the transition before launching the goroutine (mirror the recovery branch style) including endpoint identifier and old/new routable state, and ensure the goroutine logs the callback outcome/errors from c.unhealthyCallback.OnEndpointUnhealthy; also rename the shared DefaultRecoveryCallbackTimeout constant to a neutral name like DefaultCallbackTimeout and update usages (including in the OnEndpointUnhealthy invocation) so the timeout name reflects both callback paths.

internal/app/services/proxy.go (1)
276-278: Silent fallback when `discoverySvc` is nil.

If `s.discoverySvc` is unset when `applyStickySessions` runs, the purge-on-unhealthy hook is silently never registered, and there is no second chance: session entries for dead backends will only be reclaimed via sliding TTL or LRU pressure. In normal wiring this branch is unreachable, but a `logger.Warn` here would surface misconfiguration early (e.g. during tests or custom DI wiring) rather than producing confusing purge behaviour at runtime.

🔧 Proposed fix

```diff
 if s.discoverySvc != nil {
 	s.discoverySvc.SetPurgeDeadEndpointsFn(s.PurgeDeadEndpoints)
+} else {
+	s.logger.Warn("Sticky sessions enabled but discovery service is not wired; " +
+		"dead-backend purge will rely on TTL only")
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/app/services/proxy.go` around lines 276 - 278, applyStickySessions currently skips registering the purge hook when s.discoverySvc is nil without any indication; add a warning log to surface misconfiguration by emitting a logger warning when s.discoverySvc == nil in the same branch. Locate the block that checks s.discoverySvc in applyStickySessions (the code that calls s.discoverySvc.SetPurgeDeadEndpointsFn(s.PurgeDeadEndpoints)) and add a call to the service logger (e.g., s.logger.Warn or the module's logger) explaining that discoverySvc is unset and purge-on-unhealthy will not be registered so tests/DI issues are visible early.

internal/adapter/proxy/core/common_test.go (1)
747-752: Nit: test name overstates what the assertion guarantees.

`expectSession: ""` combined with `w.Header().Get(...)` can't distinguish between "header not written" and "header written as empty string" — both satisfy the assertion. If the intent is to verify the header is genuinely skipped (not written with an empty value), use `_, ok := w.Header()[constants.HeaderXOllaStickySession]` and assert `!ok` instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/adapter/proxy/core/common_test.go` around lines 747 - 752, The test "empty_result_skips_sticky_session_header" currently uses expectSession: "" and reads w.Header().Get(...), which cannot distinguish "not set" vs "set to empty"; modify the assertion to check header presence directly by using the header map lookup (e.g. _, ok := w.Header()[constants.HeaderXOllaStickySession]) and assert !ok instead of relying on expectSession, and update the test case (and any expectSession usage) to reflect that the header should be absent when outcome is &domain.StickyOutcome{Result: "", Source: "ip"}.

internal/adapter/balancer/sticky.go (1)
233-247: Consider hashing the IP host for consistency and privacy hygiene.

Every other key source hashes its input (`session_header`, `prefix_hash`, `auth_header`), but `ip` stores the plaintext remote host in the affinity key — e.g. `ip:192.168.1.42:llama3`. That's fine for correctness, but it leaks an IP (PII under GDPR/CCPA) into in-memory cache keys and anywhere they may later be logged/metricized, and it's slightly inconsistent with the rest of the cascade. An `fnv.New64a` pass would match the other sources and bound key length.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/adapter/balancer/sticky.go` around lines 233 - 247, The stickyKeyFromIP function currently returns plaintext host in the affinity key; change it to hash the host (e.g., using fnv.New64a) before composing the returned key so the function stickyKeyFromIP produces a bounded-length, hashed key like other sources (session_header, prefix_hash, auth_header) to avoid leaking raw IPs and keep consistency—compute the 64-bit FNV-1a of host and use its hex/decimal representation in place of the raw host when returning "ip:<hash>:<modelName>" while leaving the secondary return value "ip" unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CLAUDE.md`:
- Around line 164-166: Update the CLAUDE.md header documentation to include the
missing `none` key source (or explicitly note that the header is omitted when
the source is `none`) to match the possible values defined by
HeaderXOllaStickyKeySource in internal/core/constants/content.go and the
behavior of SetStickySessionHeaders; ensure the line listing key sources
includes `none` (or an explicit omission note) so the documentation and
constants/SetStickySessionHeaders behavior are consistent.
In `@internal/app/handlers/handler_proxy.go`:
- Around line 145-157: The read/restore logic can drop consumed bytes when
io.ReadAll returns partial data with an error; change the block that calls
io.ReadAll(io.LimitReader(r.Body, int64(limit)+1)) so that regardless of readErr
or snap length you always restore r.Body by wrapping the consumed snapshot with
io.NopCloser(io.MultiReader(bytes.NewReader(snap), r.Body)); set bodySnap = snap
(can be empty) and then handle/log readErr separately instead of skipping the
re-wrap; keep the same limit computation (cfg.PrefixHashBytes fallback to 512)
and continue to use the same symbols (bodySnap, r.Body, cfg.PrefixHashBytes,
io.ReadAll, io.LimitReader, io.NopCloser, io.MultiReader, bytes.NewReader) so
the proxy engine always sees a complete stream.
- Around line 187-203: The helper setStickyResponseHeaders duplicates logic in
core.SetStickySessionHeaders but unconditionally writes HeaderXOllaStickySession
causing empty header values; remove setStickyResponseHeaders (and
setStickyResponseHeadersFromRequest) and delegate to
core.SetStickySessionHeaders (which reads the outcome from the request context
and already guards outcome.Result != "") by importing the core package and
calling core.SetStickySessionHeaders(w, r) where these helpers were used; if you
cannot import core, change setStickyResponseHeaders to mirror core exactly by
only calling w.Header().Set(constants.HeaderXOllaStickySession, outcome.Result)
when outcome.Result != "" and keep the same checks for outcome.Source and
session header echoing to match core.SetStickySessionHeaders.
In `@test/scripts/integration/test-integration.py`:
- Around line 1437-1475: The test builds model_ids from self.models without
checking backend health, causing flakes; update the model selection to mirror
the discovery filtering used for passthrough_candidates/translation_candidates
by only including models from self.models whose metadata (olla.availability or
similar) contains at least one healthy/routable endpoint, then derive model_ids
from that filtered list (replace the current list comprehension that creates
model_ids). Use the same availability-check logic/keys as the other candidate
filters so model_a/model_b are guaranteed routable.
- Around line 1334-1347: The test's allow-list uses the wrong key source token;
update the valid_sources set in test-integration.py (where valid_sources =
{"session_header", "prefix_hash", "auth_hash", "ip", "none"}) to match the
actual value returned by ComputeStickyKey and documented in
constants.HeaderXOllaStickyKeySource by replacing "auth_hash" with "auth_header"
so the test recognizes the auth header source.
In `@test/scripts/sticky/README.md`:
- Line 73: The fenced code block in test/scripts/sticky/README.md for the
"expected-output" section is missing a language tag (MD040); update the opening
backticks from ``` to include a language such as text or plaintext (e.g., change
``` to ```text) so the block becomes fenced with a language identifier and
satisfies markdownlint.
- Line 12: Update the README entries that list key source values to use the
canonical value "auth_header" instead of the incorrect "auth_hash" so they match
the implementation in internal/adapter/balancer/sticky.go (which uses
auth_header) and the configuration examples; replace occurrences at the two
spots where the table and examples list key sources (the row containing
`X-Olla-Sticky-Key-Source` and the later example at line ~127) to prevent users
from specifying an unrecognized `key_sources` value.
In `@test/scripts/sticky/test-sticky-sessions.py`:
- Around line 46-48: The test currently treats "auth_hash" as a valid sticky key
source but the balancer and constants use "auth_header"; update the
VALID_KEY_SOURCES set to include "auth_header" instead of "auth_hash" and update
any guards/messages that reference "auth_hash" (notably the check and
user-facing string in _test_prefix_hash_hit) so they expect and report
"auth_header" to match internal/adapter/balancer/sticky.go and
constants.HeaderXOllaStickyKeySource.
---
Nitpick comments:
In `@internal/adapter/balancer/sticky.go`:
- Around line 233-247: The stickyKeyFromIP function currently returns plaintext
host in the affinity key; change it to hash the host (e.g., using fnv.New64a)
before composing the returned key so the function stickyKeyFromIP produces a
bounded-length, hashed key like other sources (session_header, prefix_hash,
auth_header) to avoid leaking raw IPs and keep consistency—compute the 64-bit
FNV-1a of host and use its hex/decimal representation in place of the raw host
when returning "ip:<hash>:<modelName>" while leaving the secondary return value
"ip" unchanged.
In `@internal/adapter/health/checker.go`:
- Around line 290-298: The unhealthy-callback path currently fires silently;
update the block that checks statusChanged && !newStatus.IsRoutable() &&
oldStatus.IsRoutable() to log the transition before launching the goroutine
(mirror the recovery branch style) including endpoint identifier and old/new
routable state, and ensure the goroutine logs the callback outcome/errors from
c.unhealthyCallback.OnEndpointUnhealthy; also rename the shared
DefaultRecoveryCallbackTimeout constant to a neutral name like
DefaultCallbackTimeout and update usages (including in the OnEndpointUnhealthy
invocation) so the timeout name reflects both callback paths.
In `@internal/adapter/proxy/core/common_test.go`:
- Around line 747-752: The test "empty_result_skips_sticky_session_header"
currently uses expectSession: "" and reads w.Header().Get(...), which cannot
distinguish "not set" vs "set to empty"; modify the assertion to check header
presence directly by using the header map lookup (e.g. _, ok :=
w.Header()[constants.HeaderXOllaStickySession]) and assert !ok instead of
relying on expectSession, and update the test case (and any expectSession usage)
to reflect that the header should be absent when outcome is
&domain.StickyOutcome{Result: "", Source: "ip"}.
In `@internal/app/handlers/application.go`:
- Around line 88-90: The comment for the stickyStatsFn field is misleading about
package-level decoupling; update the comment on the Application struct's
stickyStatsFn to say the closure decouples Application from the proxy service
implementation (e.g., ProxyServiceWrapper) rather than claiming it avoids
importing the balancer package, and mention it returns *balancer.StickyStats to
clarify the actual dependency on balancer.StickyStats.
In `@internal/app/handlers/handler_stats_sticky.go`:
- Around line 10-40: The two disabled branches in stickyStatsHandler (when
a.stickyStatsFn == nil and when stats == nil) duplicate the same JSON encoding
logic; extract that payload+encoding into a small helper (e.g., a method like
writeStickyDisabled(w http.ResponseWriter) or a package-level helper used by
stickyStatsHandler) and replace both branches to call it, preserving the
Content-Type header and the existing error logging via a.logger.Error on
json.NewEncoder(w).Encode failures so both paths remain identical and cannot
drift.
In `@internal/app/services/discovery.go`:
- Around line 104-117: The unhealthy callback currently logs a warning when
s.endpointRepo.GetRoutable(ctx) returns an error, which produces noisy warnings
during shutdown when the context is canceled; update the callback (the function
passed to s.healthChecker.SetUnhealthyCallback that uses purgeDeadFn.Load and
calls s.endpointRepo.GetRoutable) to check if errors.Is(err, context.Canceled)
and simply return without logging in that case, otherwise keep the existing
s.logger.Warn call for real errors before invoking (*fn)(routable).
In `@internal/app/services/proxy.go`:
- Around line 276-278: applyStickySessions currently skips registering the purge
hook when s.discoverySvc is nil without any indication; add a warning log to
surface misconfiguration by emitting a logger warning when s.discoverySvc == nil
in the same branch. Locate the block that checks s.discoverySvc in
applyStickySessions (the code that calls
s.discoverySvc.SetPurgeDeadEndpointsFn(s.PurgeDeadEndpoints)) and add a call to
the service logger (e.g., s.logger.Warn or the module's logger) explaining that
discoverySvc is unset and purge-on-unhealthy will not be registered so tests/DI
issues are visible early.
In `@internal/config/config.go`:
- Around line 363-367: Add environment overrides for the remaining sticky
session settings by reading OLLA_PROXY_STICKY_SESSIONS_IDLE_TTL_SECONDS (parse
with strconv.Atoi and assign to config.Proxy.StickySessions.IdleTTLSeconds if
parse succeeds), OLLA_PROXY_STICKY_SESSIONS_MAX_SESSIONS (parse with
strconv.Atoi and assign to config.Proxy.StickySessions.MaxSessions), and
OLLA_PROXY_STICKY_SESSIONS_KEY_SOURCES (split on commas, trim whitespace and
assign the resulting slice to config.Proxy.StickySessions.KeySources), following
the same nil/ignore-on-parse-error pattern and the comma-split pattern used for
OLLA_SERVER_TRUSTED_PROXY_CIDRS to locate the appropriate block around the
existing OLLA_PROXY_STICKY_SESSIONS_ENABLED handling.
In `@internal/config/types.go`:
- Around line 76-89: Add a Validate method on StickySessionConfig and wire it
into the top-level Config.Validate() so misconfigurations are caught at startup;
specifically, implement StickySessionConfig.Validate() to (when Enabled) verify
each entry in KeySources is one of the allowed values ("session_header",
"prefix_hash", "auth_header", "ip") and return an error for unknown entries, and
also ensure PrefixHashBytes and IdleTTLSeconds are non-negative (returning an
error if not). Call this new StickySessionConfig.Validate() from the existing
Config.Validate() path so typos in key_sources or invalid numeric values are
surfaced early rather than silently ignored at runtime.
In `@internal/core/constants/context.go`:
- Around line 14-18: The new sticky session keys correctly use the private
contextKey type to avoid collisions; as a follow-up, migrate the existing
untyped context keys (ContextRoutePrefixKey, ContextRequestIdKey, etc.) to the
same contextKey type: change their declarations to contextKey("...") and update
all uses of context.WithValue and ctx.Value to pass and compare the typed keys
(e.g., ContextRoutePrefixKey, ContextRequestIdKey) so callers use the same typed
key values instead of raw strings; ensure equality checks and type assertions
continue to work with the new contextKey type across functions that access these
keys.
In `@internal/core/domain/routing.go`:
- Around line 84-94: Introduce typed string aliases and constants for
StickyOutcome fields to avoid stringly-typed usage: define a type StickyResult
string and constants (e.g. StickyResultHit, StickyResultMiss, StickyResultRepin,
StickyResultDisabled) and a type StickySource string with constants for the five
key-source values, either in this file near the StickyOutcome struct or in a
new/internal/core/constants package; update all uses of literal strings in
adapter/balancer/sticky.go, adapter/proxy/core/common.go, and tests to use these
new constants (and change StickyOutcome.Result and .Source to use the typed
aliases) so callers and tests compile against the single canonical set of
values.
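A sketch of the typed aliases the prompt describes. The constant names come from the review; the string values ("hit", "miss", etc.) are assumptions based on the header values checked in the integration test below:

```go
package main

import "fmt"

// StickyResult names the routing outcome instead of a bare string.
type StickyResult string

const (
	StickyResultHit      StickyResult = "hit"
	StickyResultMiss     StickyResult = "miss"
	StickyResultRepin    StickyResult = "repin"
	StickyResultDisabled StickyResult = "disabled"
)

// StickySource names where the session key came from.
type StickySource string

const (
	StickySourceSessionHeader StickySource = "session_header"
	StickySourcePrefixHash    StickySource = "prefix_hash"
	StickySourceAuthHeader    StickySource = "auth_header"
	StickySourceIP            StickySource = "ip"
	StickySourceNone          StickySource = "none"
)

// StickyOutcome with the typed fields replacing the stringly-typed ones.
type StickyOutcome struct {
	Result StickyResult
	Source StickySource
}

func main() {
	o := StickyOutcome{Result: StickyResultHit, Source: StickySourceSessionHeader}
	fmt.Println(o.Result, o.Source)
}
```

Because the underlying type is still string, header writes only need a string(o.Result) conversion, while tests compare against the constants and fail to compile on a typo.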
In `@makefile`:
- Around line 333-336: The Makefile target test-script-sticky uses an
inconsistent comment style and calls python without guaranteeing python3; update
the comment to use a single '#' like the other target and change the invocation
in the test-script-sticky target to use python3 (and optionally normalize
test-script-integration the same way) so the command becomes "cd test/scripts &&
python3 sticky/test-sticky-sessions.py $(ARGS)"; adjust only the comment prefix
and the interpreter name for targets test-script-sticky (and
test-script-integration if you choose to normalize both).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ff8dede0-d813-495d-9d3d-ce518fcc1a4c
⛔ Files ignored due to path filters (1)

- `go.sum` is excluded by `!**/*.sum`

📒 Files selected for processing (36)

- `CLAUDE.md`
- `config/config.yaml`
- `docs/content/concepts/load-balancing.md`
- `docs/content/concepts/sticky-sessions.md`
- `docs/content/configuration/reference.md`
- `docs/mkdocs.yml`
- `go.mod`
- `internal/adapter/balancer/sticky.go`
- `internal/adapter/balancer/sticky_test.go`
- `internal/adapter/health/checker.go`
- `internal/adapter/health/checker_test.go`
- `internal/adapter/health/recovery_callback.go`
- `internal/adapter/proxy/core/common.go`
- `internal/adapter/proxy/core/common_test.go`
- `internal/adapter/proxy/olla/service.go`
- `internal/adapter/proxy/olla/service_retry.go`
- `internal/adapter/proxy/sherpa/service_retry.go`
- `internal/app/app.go`
- `internal/app/handlers/application.go`
- `internal/app/handlers/handler_proxy.go`
- `internal/app/handlers/handler_stats_sticky.go`
- `internal/app/handlers/handler_translation.go`
- `internal/app/handlers/server_routes.go`
- `internal/app/services/discovery.go`
- `internal/app/services/http.go`
- `internal/app/services/proxy.go`
- `internal/config/config.go`
- `internal/config/types.go`
- `internal/core/constants/content.go`
- `internal/core/constants/context.go`
- `internal/core/domain/routing.go`
- `makefile`
- `readme.md`
- `test/scripts/integration/test-integration.py`
- `test/scripts/sticky/README.md`
- `test/scripts/sticky/test-sticky-sessions.py`
```go
// setStickyResponseHeaders writes sticky session outcome headers so clients can observe
// affinity routing decisions. When the client provided an explicit session ID header,
// we echo it back so stateless clients can track their own session.
func (a *Application) setStickyResponseHeaders(w http.ResponseWriter, r *http.Request, outcome *balancer.StickyOutcome) {
	if outcome == nil {
		return
	}
	w.Header().Set(constants.HeaderXOllaStickySession, outcome.Result)
	if outcome.Source != "" && outcome.Source != "none" {
		w.Header().Set(constants.HeaderXOllaStickyKeySource, outcome.Source)
	}
	if outcome.Source == "session_header" {
		if sid := r.Header.Get(constants.HeaderXOllaSessionID); sid != "" {
			w.Header().Set(constants.HeaderXOllaSessionID, sid)
		}
	}
}
```
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain

🏁 Scripts executed against the repository (thushan/olla):

```shell
#!/bin/bash
# Find callers to understand migration impact.
rg -nP '\bsetStickyResponseHeaders(FromRequest)?\s*\(' --type=go -C1
rg -nA 15 'func.*SetStickySessionHeaders' --type=go
rg -n 'type StickyOutcome' --type=go
rg -n 'HeaderXOllaStickySession' --type=go -C2
rg -nA 20 'func TestSetStickySessionHeaders' --type=go
rg -nA 5 'expectSession:.*""' internal/adapter/proxy/core/common_test.go | head -40
sed -n '740,760p' internal/adapter/proxy/core/common_test.go
rg -n 'setStickyResponseHeaders' internal/app/handlers/handler_proxy_test.go
find internal/app/handlers -name '*test.go' -type f
rg -n 'setStickyResponseHeaders' internal/app/handlers --type=go
rg -n 'HeaderXOllaStickySession\|setStickyResponseHeaders' internal/app/handlers/handler_translation_test.go
rg -n 'X-Olla-Sticky' internal/app/handlers --type=go -C2
rg -n 'X-Olla-Sticky|HeaderXOllaStickySession' --type=go | head -20
head -30 internal/app/handlers/handler_proxy.go | grep -E '^import|^package'
sed -n '1,30p' internal/app/handlers/handler_proxy.go
rg -n 'internal/adapter/proxy/core' internal/app/handlers/handler_proxy.go
rg -n 'import.*core' internal/app/handlers/*.go | grep -v test
grep -n 'internal/adapter' internal/app/handlers/handler_proxy.go
```
Remove duplicate sticky-header helper and consolidate with core.SetStickySessionHeaders.
This helper duplicates internal/adapter/proxy/core/common.go:SetStickySessionHeaders but differs in a critical way: it calls w.Header().Set(HeaderXOllaStickySession, outcome.Result) unconditionally, so an outcome with an empty Result (sticky enabled but no key extracted this turn) produces a bare X-Olla-Sticky-Session: header. The core variant guards with if outcome.Result != "" before setting it, which is the tested and expected behaviour. Two code paths writing the same header with different rules invites test flakiness.
Replace both setStickyResponseHeaders and setStickyResponseHeadersFromRequest by delegating to core.SetStickySessionHeaders directly (it extracts the outcome from the request context internally, which this wrapper already does). If importing the core package is not preferred, align the guard conditions exactly with the core variant.
♻️ Suggested consolidation

```diff
-// setStickyResponseHeadersFromRequest reads the StickyOutcome from the request context
-// and writes the sticky session headers. Used by sub-handlers that have *http.Request
-// but not the outcome pointer directly. Must be called before w.WriteHeader().
-func (a *Application) setStickyResponseHeadersFromRequest(w http.ResponseWriter, r *http.Request) {
-	outcome, _ := r.Context().Value(constants.ContextStickyOutcomeKey).(*balancer.StickyOutcome)
-	a.setStickyResponseHeaders(w, r, outcome)
-}
-
-// setStickyResponseHeaders writes sticky session outcome headers so clients can observe
-// affinity routing decisions. When the client provided an explicit session ID header,
-// we echo it back so stateless clients can track their own session.
-func (a *Application) setStickyResponseHeaders(w http.ResponseWriter, r *http.Request, outcome *balancer.StickyOutcome) {
-	if outcome == nil {
-		return
-	}
-	w.Header().Set(constants.HeaderXOllaStickySession, outcome.Result)
-	if outcome.Source != "" && outcome.Source != "none" {
-		w.Header().Set(constants.HeaderXOllaStickyKeySource, outcome.Source)
-	}
-	if outcome.Source == "session_header" {
-		if sid := r.Header.Get(constants.HeaderXOllaSessionID); sid != "" {
-			w.Header().Set(constants.HeaderXOllaSessionID, sid)
-		}
-	}
-}
+// setStickyResponseHeadersFromRequest delegates to the shared core helper so
+// there is a single source of truth for sticky response-header semantics.
+func (a *Application) setStickyResponseHeadersFromRequest(w http.ResponseWriter, r *http.Request) {
+	core.SetStickySessionHeaders(w, r)
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/app/handlers/handler_proxy.go` around lines 187 - 203, The helper
setStickyResponseHeaders duplicates logic in core.SetStickySessionHeaders but
unconditionally writes HeaderXOllaStickySession causing empty header values;
remove setStickyResponseHeaders (and setStickyResponseHeadersFromRequest) and
delegate to core.SetStickySessionHeaders (which reads the outcome from the
request context and already guards outcome.Result != "") by importing the core
package and calling core.SetStickySessionHeaders(w, r) where these helpers were
used; if you cannot import core, change setStickyResponseHeaders to mirror core
exactly by only calling w.Header().Set(constants.HeaderXOllaStickySession,
outcome.Result) when outcome.Result != "" and keep the same checks for
outcome.Source and session header echoing to match core.SetStickySessionHeaders.
```python
# -- Test 7: model scoping — same session ID, different models pin separately
model_ids = [m.get("id", m.get("name", "")) for m in self.models
             if m.get("id", m.get("name", "")) and "embed" not in m.get("id", "").lower()]
if len(model_ids) >= 2:
    model_a = model_ids[0]
    model_b = model_ids[1]
    scope_sid = f"integration-scope-{int(time.time())}"
    scope_headers = {"X-Olla-Session-ID": scope_sid}

    # Establish a pin for model_a
    self._post("/olla/proxy/v1/chat/completions",
               {"model": model_a, "messages": [{"role": "user", "content": "hi"}], "max_tokens": 1},
               headers=scope_headers)
    # model_b with the same session ID must get its own independent pin (miss)
    rb_first = self._post("/olla/proxy/v1/chat/completions",
                          {"model": model_b, "messages": [{"role": "user", "content": "hi"}], "max_tokens": 1},
                          headers=scope_headers)
    # model_a follow-up must still hit
    ra_second = self._post("/olla/proxy/v1/chat/completions",
                           {"model": model_a, "messages": [{"role": "user", "content": "hi"}], "max_tokens": 1},
                           headers=scope_headers)

    ok = False
    detail = ""
    if (rb_first is not None and ra_second is not None
            and rb_first.status_code == 200 and ra_second.status_code == 200):
        rb_sticky = rb_first.headers.get("X-Olla-Sticky-Session", "")
        ra_sticky = ra_second.headers.get("X-Olla-Sticky-Session", "")
        # model_b first request should be a miss (new key because model differs)
        # model_a second request should be a hit (existing pin)
        ok = rb_sticky == "miss" and ra_sticky == "hit"
        detail = f"model_b first={rb_sticky} (want miss), model_a second={ra_sticky} (want hit)"
    else:
        detail = self._error_detail(rb_first or ra_second)

    self._print_result("Model-scoped keys: same session ID, different models pin separately", ok, detail)
    self.record(f"{phase}/model-scoping", ok, detail, phase)
else:
    self.pcolor(GREY, f"  [SKIP] Model scoping test skipped — only {len(model_ids)} model(s) available")
```
Model‑scoping test may flake on heterogeneous fleets.
model_ids is built from self.models without filtering by healthy availability, so model_ids[0]/model_ids[1] may not be routable at all (e.g. live only on an unhealthy backend). Consider restricting the pool to models whose olla.availability contains a healthy endpoint — similar to the discovery logic already used for passthrough_candidates/translation_candidates earlier in the file — so the test doesn't spuriously fail with 5xx.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/scripts/integration/test-integration.py` around lines 1437 - 1475, The
test builds model_ids from self.models without checking backend health, causing
flakes; update the model selection to mirror the discovery filtering used for
passthrough_candidates/translation_candidates by only including models from
self.models whose metadata (olla.availability or similar) contains at least one
healthy/routable endpoint, then derive model_ids from that filtered list
(replace the current list comprehension that creates model_ids). Use the same
availability-check logic/keys as the other candidate filters so model_a/model_b
are guaranteed routable.
Context from `test/scripts/sticky/README.md`: the "Expected Output" section opens a fenced code block with no language tag.
Add a language to the fenced code block (MD040).
markdownlint flags line 73 — the expected-output block has no language specified. Use text (or plaintext).
📝 Proposed fix

````diff
-```
+```text
 ========================================================================
                     Olla Sticky Session Test
````

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 73-73: Fenced code blocks should have a language specified (MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/scripts/sticky/README.md` at line 73, The fenced code block in
test/scripts/sticky/README.md for the "expected-output" section is missing a
language tag (MD040); update the opening backticks from ``` to include a
language such as text or plaintext (e.g., change ``` to ```text) so the block
becomes fenced with a language identifier and satisfies markdownlint.
Adds optional (still somewhat experimental) sticky session support so multi-turn conversations land on the same backend across turns. Big win on long prompts since the backend keeps its KV-cache warm instead of re-ingesting context every turn.
Good stuff:

- `/internal/stats/sticky` for observability. Stable `{"enabled":false}` shape when off.

Config (disabled by default):

Summary by CodeRabbit

New Features

- Sticky-session stats endpoint (`/internal/stats/sticky`) for monitoring session metrics

Documentation

Tests