
feat: Sticky Sessions#133

Merged
thushan merged 9 commits into main from feature/sticky-sessions
Apr 16, 2026

Conversation

@thushan
Owner

@thushan thushan commented Apr 16, 2026

Adds optional (still somewhat experimental) sticky session support so multi-turn conversations land on the same backend turn after turn. This is a big win on long prompts: the backend keeps its KV-cache warm instead of re-ingesting the full context on every turn.

Good stuff:

  • New StickySessionWrapper that decorates any existing load balancer. Priority / round-robin / least-connections all work unchanged when sticky is off.
  • Key derivation with a configurable cascade: X-Olla-Session-ID header, message-prefix hash, auth header, or client IP. First match wins.
  • ttlcache-backed store with sliding TTL and LRU cap. Bounded memory.
  • Dead-endpoint purge wired into the health checker. Only fires on routable-to-non-routable transitions, so Healthy -> Busy does not evict live sessions.
  • Response headers: X-Olla-Sticky-Session (hit/miss/repin/disabled), X-Olla-Sticky-Key-Source, and X-Olla-Session-ID echoed when the client sends one.
  • GET /internal/stats/sticky for observability. Stable {"enabled":false} shape when off.
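The first-match-wins cascade can be sketched roughly as follows. This is a simplified illustration, not the actual Olla code: inputs are not hashed here, the prefix_hash source is omitted, and the request struct is a stand-in for values read from *http.Request.

```go
package main

import "fmt"

// request is an illustrative stand-in for an incoming HTTP request.
type request struct {
	sessionHeader string
	authHeader    string
	clientIP      string
}

// computeStickyKey walks the configured sources in order and returns the
// first non-empty value as the affinity key; ("", "none") when nothing matches.
func computeStickyKey(sources []string, r request) (key, source string) {
	for _, s := range sources {
		var v string
		switch s {
		case "session_header":
			v = r.sessionHeader
		case "auth_header":
			v = r.authHeader
		case "ip":
			v = r.clientIP
		}
		if v != "" {
			return s + ":" + v, s // first match wins
		}
	}
	return "", "none"
}

func main() {
	// No session header, so the cascade falls through to the auth header.
	key, src := computeStickyKey(
		[]string{"session_header", "auth_header", "ip"},
		request{authHeader: "Bearer abc", clientIP: "10.0.0.1"},
	)
	fmt.Println(key, src) // prints: auth_header:Bearer abc auth_header
}
```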

Config (disabled by default):

  proxy:
    sticky_sessions:
      enabled: true
      idle_ttl_seconds: 600
      max_sessions: 10000
      key_sources: [session_header, prefix_hash, auth_header]
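Clients opt into a pinned conversation by sending the session header themselves and reusing the same id each turn. A minimal sketch; only the X-Olla-Session-ID header name comes from this PR, while the URL and session id are made up:

```go
package main

import (
	"fmt"
	"net/http"
)

// buildStickyRequest attaches X-Olla-Session-ID so Olla can derive the sticky
// key from the session_header source; url and sessionID are caller-supplied.
func buildStickyRequest(url, sessionID string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-Olla-Session-ID", sessionID)
	return req, nil
}

func main() {
	// Hypothetical local Olla endpoint; reuse "conv-42" on every turn so
	// each request lands on the pinned backend.
	req, err := buildStickyRequest("http://localhost:8080/olla/proxy/chat", "conv-42")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("X-Olla-Session-ID")) // prints: conv-42
}
```

The response then carries X-Olla-Sticky-Session (hit on subsequent turns) and echoes X-Olla-Session-ID back.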

Summary by CodeRabbit

  • New Features

    • Added sticky sessions feature to maintain backend affinity for multi-turn conversations, preserving KV cache across requests
    • Configuration options for session TTL, capacity limits, and key derivation strategies
    • New response headers indicate session affinity results (hit, miss, repin)
    • Internal stats endpoint (/internal/stats/sticky) for monitoring session metrics
  • Documentation

    • Comprehensive sticky sessions concept guide with configuration reference
    • Load balancing guidance recommending sticky sessions for multi-turn workloads
  • Tests

    • New test suite validating session pinning, header behaviour, and multi-backend scenarios

fix memory leaks

fix tests after refactor

missing config values

retries fail for non-sticky workloads

balancer tweaks for mixed mode

invalid type persistence for wrapper :(

lint

fix missing outcome for sticky fallback with no available nodes

add tests for no available nodes
hash the header sessionid to avoid length buffers

use net.SplitHostPort for ipv6

add and update tests
Preserves sticky-session context keys and stickyStatsFn alongside
model-alias resolver and ContextModelAliasMapKey from main. Both sets
of changes coexist cleanly.
…g time into applyStickySessions()

- incorrect unhealthy predicate issue
@thushan thushan self-assigned this Apr 16, 2026
@coderabbitai

coderabbitai Bot commented Apr 16, 2026

Warning

Rate limit exceeded

@thushan has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 38 minutes and 23 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 38 minutes and 23 seconds.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: eabf374f-901f-467c-8673-586981df2a21

📥 Commits

Reviewing files that changed from the base of the PR and between 4bbe95b and ed1609a.

📒 Files selected for processing (5)
  • CLAUDE.md
  • internal/app/handlers/handler_proxy.go
  • test/scripts/integration/test-integration.py
  • test/scripts/sticky/README.md
  • test/scripts/sticky/test-sticky-sessions.py

Walkthrough

This pull request adds a comprehensive sticky-session feature enabling per-conversation backend affinity for multi-turn LLM workloads to preserve KV-cache locality. The implementation includes a TTL-based in-memory session store, configurable key-source strategies, health-aware session purging, response header propagation, integration with existing health checks and discovery mechanisms, extensive test coverage, and detailed documentation.

Changes

  • Documentation (CLAUDE.md, readme.md, docs/content/concepts/sticky-sessions.md, docs/content/concepts/load-balancing.md, docs/content/configuration/reference.md, docs/mkdocs.yml, test/scripts/sticky/README.md): Added comprehensive documentation of the sticky-sessions feature, including usage guidance, configuration schema, response headers, environment variables, stats endpoint, and operational notes; updated navigation and feature lists.
  • Configuration (config/config.yaml, internal/config/types.go, internal/config/config.go): Introduced the proxy.sticky_sessions configuration block with fields for enabled state, TTL, LRU capacity, key sources, and hash input size; added environment variable override support for OLLA_PROXY_STICKY_SESSIONS_ENABLED.
  • Domain & Constants (internal/core/domain/routing.go, internal/core/constants/content.go, internal/core/constants/context.go): Added the StickyOutcome type with result/source fields, new header constants (X-Olla-Sticky-Session, X-Olla-Sticky-Key-Source, X-Olla-Session-ID), and context key constants for plumbing sticky state through the request lifecycle.
  • Balancer Core (internal/adapter/balancer/sticky.go, internal/adapter/balancer/sticky_test.go): Implemented StickySessionWrapper adapting an EndpointSelector with TTL-based session caching, sticky-key computation across multiple sources (session header, prefix hash, auth header, IP), cache lifecycle management, and a comprehensive test suite covering selection outcomes, TTL expiry, model scoping, and concurrent access.
  • Health Checker (internal/adapter/health/recovery_callback.go, internal/adapter/health/checker.go, internal/adapter/health/checker_test.go): Added an UnhealthyCallback interface and implementations to detect endpoint transitions from routable to non-routable states, enabling downstream purging of dead sessions; a new test validates that the callback triggers only on those transitions.
  • Proxy Response Headers (internal/adapter/proxy/core/common.go, internal/adapter/proxy/core/common_test.go, internal/adapter/proxy/olla/service.go, internal/adapter/proxy/olla/service_retry.go, internal/adapter/proxy/sherpa/service_retry.go): Added the SetStickySessionHeaders function to write sticky-session response headers based on the context outcome; integrated the call into the olla and sherpa proxy response pipelines with corresponding test coverage.
  • Proxy & Handler Logic (internal/app/handlers/handler_proxy.go, internal/app/handlers/handler_translation.go): Injected sticky-key computation and context storage into the proxy and translation handlers after request analysis; added response header writing for sticky outcomes before response commitment, including body snapshot/restore for prefix-hash key computation.
  • Handler Infrastructure (internal/app/handlers/application.go, internal/app/handlers/handler_stats_sticky.go, internal/app/handlers/server_routes.go): Added a stickyStatsFn field and setter to Application for injecting a stats provider; implemented the /internal/stats/sticky endpoint returning enabled status and cache metrics; registered the new internal GET route.
  • Service Wiring (internal/app/services/proxy.go, internal/app/services/discovery.go, internal/app/services/http.go, internal/app/app.go): Wired sticky-session wrapper creation into proxy service startup; added PurgeDeadEndpoints forwarding and a StickyStats accessor; connected the health checker's unhealthy callback to purge dead sessions via the discovery service; injected the stats function into the application handler.
  • Testing & Tooling (test/scripts/integration/test-integration.py, test/scripts/sticky/test-sticky-sessions.py, makefile): Added a standalone sticky-session integration test script validating session pinning, header echoing, affinity strategies, and the stats endpoint; extended the integration tester with phase and skip flags; added a test-script-sticky make target.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Proxy as Proxy Handler
    participant Balancer as Sticky Wrapper
    participant Cache as TTL Cache
    participant Backend as Backend
    participant Health as Health Checker
    participant Discovery as Discovery Service

    Client->>Proxy: POST /olla/proxy/chat (X-Olla-Session-ID)
    Proxy->>Proxy: analyzeRequest()
    Proxy->>Balancer: ComputeStickyKey(request)
    Balancer->>Balancer: Hash session_header/prefix/auth/IP
    Balancer->>Proxy: (key, source)
    Proxy->>Proxy: Store in context
    Proxy->>Balancer: Select(ctx, endpoints)
    Balancer->>Cache: Lookup(sticky_key)
    alt Cache Hit
        Cache-->>Balancer: pinned_url
        Balancer->>Balancer: Verify endpoint routable
        Balancer-->>Proxy: endpoint (outcome=hit)
    else Cache Miss
        Balancer->>Balancer: Select via inner selector
        Balancer->>Cache: Store(sticky_key, url, TTL)
        Balancer-->>Proxy: endpoint (outcome=miss)
    end
    Proxy->>Backend: Forward request
    Backend-->>Proxy: Response
    Proxy->>Proxy: SetStickySessionHeaders (from context)
    Proxy-->>Client: Response + sticky headers
    
    Health->>Backend: Health check
    Backend-->>Health: Unhealthy
    Health->>Discovery: OnEndpointUnhealthy callback
    Discovery->>Discovery: GetRoutable endpoints
    Discovery->>Balancer: PurgeDeadEndpoints(routable)
    Balancer->>Cache: Remove entries for dead URLs

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • PR #99: Both modify the proxy request/response pipeline in internal/adapter/proxy/olla/service.go and related sherpa retry handlers to add processing steps into the response path.
  • PR #50: Both extend health-checker callback machinery and wire discovery service purge/update callbacks for backend state transitions.
  • PR #107: Both modify internal/app/handlers/handler_proxy.go to inject preprocessing logic (prefix stripping vs. sticky-key injection) into the request handler pipeline.

Suggested labels

enhancement, documentation

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 37.78%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed: the title 'feat: Sticky Sessions' directly and clearly describes the main feature addition, sticky-session support preserving backend affinity across multi-turn conversations.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (12)
internal/core/constants/context.go (1)

14-18: Good use of a private key type for the new sticky context keys.

Using contextKey avoids collisions with other packages. Note the pre-existing ContextRoutePrefixKey/ContextRequestIdKey/etc. are still untyped strings; consider migrating those to contextKey in a follow-up for consistency (out of scope for this PR).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/constants/context.go` around lines 14 - 18, The new sticky
session keys correctly use the private contextKey type to avoid collisions; as a
follow-up, migrate the existing untyped context keys (ContextRoutePrefixKey,
ContextRequestIdKey, etc.) to the same contextKey type: change their
declarations to contextKey("...") and update all uses of context.WithValue and
ctx.Value to pass and compare the typed keys (e.g., ContextRoutePrefixKey,
ContextRequestIdKey) so callers use the same typed key values instead of raw
strings; ensure equality checks and type assertions continue to work with the
new contextKey type across functions that access these keys.
makefile (1)

333-336: Minor: consider python3 and a comment-style consistency.

Two small nits:

  1. Other targets (test-script-integration at line 329) use a # comment prefix, whereas line 333 uses ##. Harmless, but inconsistent.
  2. Invoking python assumes a python shim is available; on many Linux distros only python3 is installed by default. The existing test-script-integration target has the same issue, so this is pre-existing — feel free to leave as-is or normalise both at once.

The checkmake minphony warning about a missing all target is a false positive for this project.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@makefile` around lines 333 - 336, The Makefile target test-script-sticky uses
an inconsistent comment style and calls python without guaranteeing python3;
update the comment to use a single '#' like the other target and change the
invocation in the test-script-sticky target to use python3 (and optionally
normalize test-script-integration the same way) so the command becomes "cd
test/scripts && python3 sticky/test-sticky-sessions.py $(ARGS)"; adjust only the
comment prefix and the interpreter name for targets test-script-sticky (and
test-script-integration if you choose to normalize both).
internal/core/domain/routing.go (1)

84-94: Consider typed constants for Result/Source values.

Result and Source are effectively small enums ("hit" | "miss" | "repin" | "disabled", and the five key-source strings) that are duplicated as string literals across adapter/balancer/sticky.go, adapter/proxy/core/common.go, and tests. Promoting them to named constants in internal/core/constants/ (or typed string aliases here) would remove stringly-typed coupling and make the contract self-documenting. Not blocking.
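A sketch of what the suggested constants could look like; the type and constant names are the reviewer's proposal, not code that exists in the PR:

```go
package main

import "fmt"

// Hypothetical typed aliases for the sticky outcome values, replacing the
// string literals duplicated across the balancer, proxy core, and tests.
type StickyResult string
type StickySource string

const (
	StickyResultHit      StickyResult = "hit"
	StickyResultMiss     StickyResult = "miss"
	StickyResultRepin    StickyResult = "repin"
	StickyResultDisabled StickyResult = "disabled"
)

const (
	StickySourceSessionHeader StickySource = "session_header"
	StickySourcePrefixHash    StickySource = "prefix_hash"
	StickySourceAuthHeader    StickySource = "auth_header"
	StickySourceIP            StickySource = "ip"
	StickySourceNone          StickySource = "none"
)

func main() {
	// Callers compare against the typed constants instead of raw strings.
	fmt.Println(StickyResultHit, StickySourceIP) // prints: hit ip
}
```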

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/domain/routing.go` around lines 84 - 94, Introduce typed string
aliases and constants for StickyOutcome fields to avoid stringly-typed usage:
define a type StickyResult string and constants (e.g. StickyResultHit,
StickyResultMiss, StickyResultRepin, StickyResultDisabled) and a type
StickySource string with constants for the five key-source values, either in
this file near the StickyOutcome struct or in a new/internal/core/constants
package; update all uses of literal strings in adapter/balancer/sticky.go,
adapter/proxy/core/common.go, and tests to use these new constants (and change
StickyOutcome.Result and .Source to use the typed aliases) so callers and tests
compile against the single canonical set of values.
internal/config/config.go (1)

363-367: Consider adding env overrides for the other sticky knobs.

Only OLLA_PROXY_STICKY_SESSIONS_ENABLED is exposed via env. For operational parity with other settings (and because sticky sessions are marked experimental), it would be useful to also override IdleTTLSeconds, MaxSessions, and KeySources (comma-separated, same pattern as OLLA_SERVER_TRUSTED_PROXY_CIDRS on line 334). Not blocking for this PR.

Also, consistent with surrounding patterns, invalid boolean values are silently ignored here — that matches other env overrides in this function, so no change needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/config/config.go` around lines 363 - 367, Add environment overrides
for the remaining sticky session settings by reading
OLLA_PROXY_STICKY_SESSIONS_IDLE_TTL_SECONDS (parse with strconv.Atoi and assign
to config.Proxy.StickySessions.IdleTTLSeconds if parse succeeds),
OLLA_PROXY_STICKY_SESSIONS_MAX_SESSIONS (parse with strconv.Atoi and assign to
config.Proxy.StickySessions.MaxSessions), and
OLLA_PROXY_STICKY_SESSIONS_KEY_SOURCES (split on commas, trim whitespace and
assign the resulting slice to config.Proxy.StickySessions.KeySources), following
the same nil/ignore-on-parse-error pattern and the comma-split pattern used for
OLLA_SERVER_TRUSTED_PROXY_CIDRS to locate the appropriate block around the
existing OLLA_PROXY_STICKY_SESSIONS_ENABLED handling.
internal/app/services/discovery.go (1)

104-117: Minor: shutdown-time log noise from GetRoutable on cancelled context.

During shutdown the unhealthy callback may be invoked after the context has been cancelled (e.g. if a final status transition is processed while Stop is in progress), in which case GetRoutable(ctx) returns context.Canceled and this emits a warning that isn't really actionable. Consider suppressing the log when errors.Is(err, context.Canceled) to keep shutdown quiet:

💡 Suggested guard
 		routable, err := s.endpointRepo.GetRoutable(ctx)
 		if err != nil {
-			s.logger.Warn("Failed to fetch routable endpoints for sticky session purge", "error", err)
+			if !errors.Is(err, context.Canceled) {
+				s.logger.Warn("Failed to fetch routable endpoints for sticky session purge", "error", err)
+			}
 			return
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/app/services/discovery.go` around lines 104 - 117, The unhealthy
callback currently logs a warning when s.endpointRepo.GetRoutable(ctx) returns
an error, which produces noisy warnings during shutdown when the context is
canceled; update the callback (the function passed to
s.healthChecker.SetUnhealthyCallback that uses purgeDeadFn.Load and calls
s.endpointRepo.GetRoutable) to check if errors.Is(err, context.Canceled) and
simply return without logging in that case, otherwise keep the existing
s.logger.Warn call for real errors before invoking (*fn)(routable).
internal/config/types.go (1)

76-89: Consider adding a Validate() for StickySessionConfig for parity with the other config types.

Other config types in this file (AnthropicTranslatorConfig, InspectorConfig) expose Validate() and catch misconfigurations at startup. Per the snippet from internal/adapter/balancer/sticky.go, an unknown entry in KeySources is silently ignored by the switch, and invalid PrefixHashBytes/IdleTTLSeconds are clamped at runtime. A lightweight validator invoked from Config.Validate() would surface typos in key_sources at startup rather than producing an always-miss sticky store that clients see only via /internal/stats/sticky. Something like:

💡 Proposed validator
func (c *StickySessionConfig) Validate() error {
    if !c.Enabled {
        return nil
    }
    allowed := map[string]struct{}{
        "session_header": {}, "prefix_hash": {}, "auth_header": {}, "ip": {},
    }
    for _, src := range c.KeySources {
        if _, ok := allowed[src]; !ok {
            return fmt.Errorf("sticky_sessions.key_sources contains unknown source %q", src)
        }
    }
    if c.PrefixHashBytes < 0 {
        return fmt.Errorf("sticky_sessions.prefix_hash_bytes must be non-negative, got %d", c.PrefixHashBytes)
    }
    if c.IdleTTLSeconds < 0 {
        return fmt.Errorf("sticky_sessions.idle_ttl_seconds must be non-negative, got %d", c.IdleTTLSeconds)
    }
    return nil
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/config/types.go` around lines 76 - 89, Add a Validate method on
StickySessionConfig and wire it into the top-level Config.Validate() so
misconfigurations are caught at startup; specifically, implement
StickySessionConfig.Validate() to (when Enabled) verify each entry in KeySources
is one of the allowed values ("session_header", "prefix_hash", "auth_header",
"ip") and return an error for unknown entries, and also ensure PrefixHashBytes
and IdleTTLSeconds are non-negative (returning an error if not). Call this new
StickySessionConfig.Validate() from the existing Config.Validate() path so typos
in key_sources or invalid numeric values are surfaced early rather than silently
ignored at runtime.
internal/app/handlers/handler_stats_sticky.go (1)

10-40: Optional: fold the two disabled-path branches into one helper to avoid drift.

Both a.stickyStatsFn == nil and stats == nil encode the exact same payload; inlining them twice makes it easy for the two paths to diverge on future edits (e.g. adding a version field).

♻️ Proposed tidy-up
-func (a *Application) stickyStatsHandler(w http.ResponseWriter, r *http.Request) {
+func (a *Application) stickyStatsHandler(w http.ResponseWriter, r *http.Request) {
 	w.Header().Set(constants.HeaderContentType, constants.ContentTypeJSON)
 	w.WriteHeader(http.StatusOK)
 
-	if a.stickyStatsFn == nil {
-		// Sticky sessions are disabled — return a stable JSON shape so callers
-		// can branch on the "enabled" field rather than on status codes.
-		if err := json.NewEncoder(w).Encode(struct {
-			Enabled bool `json:"enabled"`
-		}{Enabled: false}); err != nil {
-			a.logger.Error("Failed to encode sticky stats response", "error", err)
-		}
-		return
-	}
-
-	stats := a.stickyStatsFn()
-	if stats == nil {
-		// stickyStatsFn is wired but the wrapper was not created because sticky
-		// sessions are disabled in config — return the same stable shape as the
-		// nil-function path so callers always branch on "enabled", not status codes.
-		if err := json.NewEncoder(w).Encode(struct {
-			Enabled bool `json:"enabled"`
-		}{Enabled: false}); err != nil {
-			a.logger.Error("Failed to encode sticky stats response", "error", err)
-		}
-		return
-	}
-	if err := json.NewEncoder(w).Encode(stats); err != nil {
-		a.logger.Error("Failed to encode sticky stats response", "error", err)
-	}
+	var payload any = struct {
+		Enabled bool `json:"enabled"`
+	}{Enabled: false}
+	if a.stickyStatsFn != nil {
+		if stats := a.stickyStatsFn(); stats != nil {
+			payload = stats
+		}
+	}
+	if err := json.NewEncoder(w).Encode(payload); err != nil {
+		a.logger.Error("Failed to encode sticky stats response", "error", err)
+	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/app/handlers/handler_stats_sticky.go` around lines 10 - 40, The two
disabled branches in stickyStatsHandler (when a.stickyStatsFn == nil and when
stats == nil) duplicate the same JSON encoding logic; extract that
payload+encoding into a small helper (e.g., a method like writeStickyDisabled(w
http.ResponseWriter) or a package-level helper used by stickyStatsHandler) and
replace both branches to call it, preserving the Content-Type header and the
existing error logging via a.logger.Error on json.NewEncoder(w).Encode failures
so both paths remain identical and cannot drift.
internal/app/handlers/application.go (1)

88-90: Minor: comment is slightly misleading.

The stated rationale ("so the handler layer does not need to import the balancer package") no longer holds — this file now imports internal/adapter/balancer on line 9 specifically to reference *balancer.StickyStats. The closure still usefully decouples Application from ProxyServiceWrapper, but the comment overstates the decoupling. Consider rewording to reflect the actual benefit (decoupling from the proxy service, not from the balancer package).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/app/handlers/application.go` around lines 88 - 90, The comment for
the stickyStatsFn field is misleading about package-level decoupling; update the
comment on the Application struct's stickyStatsFn to say the closure decouples
Application from the proxy service implementation (e.g., ProxyServiceWrapper)
rather than claiming it avoids importing the balancer package, and mention it
returns *balancer.StickyStats to clarify the actual dependency on
balancer.StickyStats.
internal/adapter/health/checker.go (1)

290-298: Consider logging when the unhealthy callback fires.

Unlike the recovery callback branch (lines 254-283) which logs the transition and callback outcome, the unhealthy branch fires silently. For an operator debugging why sticky sessions were unexpectedly purged, a single Info/Debug log — e.g. "Endpoint became non-routable, triggering sticky session purge" endpoint=... was=... now=... — would be valuable. Also note the timeout constant DefaultRecoveryCallbackTimeout is now used for both callback paths; a neutral name (e.g. DefaultCallbackTimeout) would better reflect its shared use.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/adapter/health/checker.go` around lines 290 - 298, The
unhealthy-callback path currently fires silently; update the block that checks
statusChanged && !newStatus.IsRoutable() && oldStatus.IsRoutable() to log the
transition before launching the goroutine (mirror the recovery branch style)
including endpoint identifier and old/new routable state, and ensure the
goroutine logs the callback outcome/errors from
c.unhealthyCallback.OnEndpointUnhealthy; also rename the shared
DefaultRecoveryCallbackTimeout constant to a neutral name like
DefaultCallbackTimeout and update usages (including in the OnEndpointUnhealthy
invocation) so the timeout name reflects both callback paths.
internal/app/services/proxy.go (1)

276-278: Silent fallback when discoverySvc is nil.

If s.discoverySvc is unset when applyStickySessions runs, the purge-on-unhealthy hook is silently never registered, and there is no second chance: session entries for dead backends will only be reclaimed via sliding TTL or LRU pressure. In normal wiring this branch is unreachable, but a logger.Warn here would surface misconfiguration early (e.g. during tests or custom DI wiring) rather than producing confusing purge behaviour at runtime.

🔧 Proposed fix
 	if s.discoverySvc != nil {
 		s.discoverySvc.SetPurgeDeadEndpointsFn(s.PurgeDeadEndpoints)
+	} else {
+		s.logger.Warn("Sticky sessions enabled but discovery service is not wired; " +
+			"dead-backend purge will rely on TTL only")
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/app/services/proxy.go` around lines 276 - 278, applyStickySessions
currently skips registering the purge hook when s.discoverySvc is nil without
any indication; add a warning log to surface misconfiguration by emitting a
logger warning when s.discoverySvc == nil in the same branch. Locate the block
that checks s.discoverySvc in applyStickySessions (the code that calls
s.discoverySvc.SetPurgeDeadEndpointsFn(s.PurgeDeadEndpoints)) and add a call to
the service logger (e.g., s.logger.Warn or the module's logger) explaining that
discoverySvc is unset and purge-on-unhealthy will not be registered so tests/DI
issues are visible early.
internal/adapter/proxy/core/common_test.go (1)

747-752: Nit: test name overstates what the assertion guarantees.

expectSession: "" combined with w.Header().Get(...) can't distinguish between "header not written" and "header written as empty string" — both satisfy the assertion. If the intent is to verify the header is genuinely skipped (not written with an empty value), use _, ok := w.Header()[constants.HeaderXOllaStickySession] and assert !ok instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/adapter/proxy/core/common_test.go` around lines 747 - 752, The test
"empty_result_skips_sticky_session_header" currently uses expectSession: "" and
reads w.Header().Get(...), which cannot distinguish "not set" vs "set to empty";
modify the assertion to check header presence directly by using the header map
lookup (e.g. _, ok := w.Header()[constants.HeaderXOllaStickySession]) and assert
!ok instead of relying on expectSession, and update the test case (and any
expectSession usage) to reflect that the header should be absent when outcome is
&domain.StickyOutcome{Result: "", Source: "ip"}.
internal/adapter/balancer/sticky.go (1)

233-247: Consider hashing the IP host for consistency and privacy hygiene.

Every other key source hashes its input (session_header, prefix_hash, auth_header), but ip stores the plaintext remote host in the affinity key — e.g. ip:192.168.1.42:llama3. That’s fine for correctness, but it leaks an IP (PII under GDPR/CCPA) into in‑memory cache keys and anywhere they may later be logged/metricized, and it’s slightly inconsistent with the rest of the cascade. An fnv.New64a pass would match the other sources and bound key length.
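A sketch of the suggested change; the function name mirrors the one under review, but the exact signature in the codebase may differ:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"net"
)

// stickyKeyFromIP hashes the client host with FNV-1a before composing the
// affinity key, so the key is bounded-length and never embeds a raw IP.
func stickyKeyFromIP(remoteAddr, modelName string) (string, string) {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		host = remoteAddr // no port present
	}
	h := fnv.New64a()
	h.Write([]byte(host))
	return fmt.Sprintf("ip:%x:%s", h.Sum64(), modelName), "ip"
}

func main() {
	key, src := stickyKeyFromIP("192.168.1.42:51234", "llama3")
	fmt.Println(key, src)
}
```

SplitHostPort also keeps IPv6 literals like [::1]:80 working, consistent with the "use net.SplitHostPort for ipv6" commit in this PR.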

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/adapter/balancer/sticky.go` around lines 233 - 247, The
stickyKeyFromIP function currently returns plaintext host in the affinity key;
change it to hash the host (e.g., using fnv.New64a) before composing the
returned key so the function stickyKeyFromIP produces a bounded-length, hashed
key like other sources (session_header, prefix_hash, auth_header) to avoid
leaking raw IPs and keep consistency—compute the 64-bit FNV-1a of host and use
its hex/decimal representation in place of the raw host when returning
"ip:<hash>:<modelName>" while leaving the secondary return value "ip" unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CLAUDE.md`:
- Around line 164-166: Update the CLAUDE.md header documentation to include the
missing `none` key source (or explicitly note that the header is omitted when
the source is `none`) to match the possible values defined by
HeaderXOllaStickyKeySource in internal/core/constants/content.go and the
behavior of SetStickySessionHeaders; ensure the line listing key sources
includes `none` (or an explicit omission note) so the documentation and
constants/SetStickySessionHeaders behavior are consistent.

In `@internal/app/handlers/handler_proxy.go`:
- Around line 145-157: The read/restore logic can drop consumed bytes when
io.ReadAll returns partial data with an error; change the block that calls
io.ReadAll(io.LimitReader(r.Body, int64(limit)+1)) so that regardless of readErr
or snap length you always restore r.Body by wrapping the consumed snapshot with
io.NopCloser(io.MultiReader(bytes.NewReader(snap), r.Body)); set bodySnap = snap
(can be empty) and then handle/log readErr separately instead of skipping the
re-wrap; keep the same limit computation (cfg.PrefixHashBytes fallback to 512)
and continue to use the same symbols (bodySnap, r.Body, cfg.PrefixHashBytes,
io.ReadAll, io.LimitReader, io.NopCloser, io.MultiReader, bytes.NewReader) so
the proxy engine always sees a complete stream.
- Around line 187-203: The helper setStickyResponseHeaders duplicates logic in
core.SetStickySessionHeaders but unconditionally writes HeaderXOllaStickySession
causing empty header values; remove setStickyResponseHeaders (and
setStickyResponseHeadersFromRequest) and delegate to
core.SetStickySessionHeaders (which reads the outcome from the request context
and already guards outcome.Result != "") by importing the core package and
calling core.SetStickySessionHeaders(w, r) where these helpers were used; if you
cannot import core, change setStickyResponseHeaders to mirror core exactly by
only calling w.Header().Set(constants.HeaderXOllaStickySession, outcome.Result)
when outcome.Result != "" and keep the same checks for outcome.Source and
session header echoing to match core.SetStickySessionHeaders.

In `@test/scripts/integration/test-integration.py`:
- Around line 1437-1475: The test builds model_ids from self.models without
checking backend health, causing flakes; update the model selection to mirror
the discovery filtering used for passthrough_candidates/translation_candidates
by only including models from self.models whose metadata (olla.availability or
similar) contains at least one healthy/routable endpoint, then derive model_ids
from that filtered list (replace the current list comprehension that creates
model_ids). Use the same availability-check logic/keys as the other candidate
filters so model_a/model_b are guaranteed routable.
- Around line 1334-1347: The test's allow-list uses the wrong key source token;
update the valid_sources set in test-integration.py (where valid_sources =
{"session_header", "prefix_hash", "auth_hash", "ip", "none"}) to match the
actual value returned by ComputeStickyKey and documented in
constants.HeaderXOllaStickyKeySource by replacing "auth_hash" with "auth_header"
so the test recognizes the auth header source.

In `@test/scripts/sticky/README.md`:
- Line 73: The fenced code block in test/scripts/sticky/README.md for the
"expected-output" section is missing a language tag (MD040); update the opening
backticks from ``` to include a language such as text or plaintext (e.g., change
``` to ```text) so the block becomes fenced with a language identifier and
satisfies markdownlint.
- Line 12: Update the README entries that list key source values to use the
canonical value "auth_header" instead of the incorrect "auth_hash" so they match
the implementation in internal/adapter/balancer/sticky.go (which uses
auth_header) and the configuration examples; replace occurrences at the two
spots where the table and examples list key sources (the row containing
`X-Olla-Sticky-Key-Source` and the later example at line ~127) to prevent users
from specifying an unrecognized `key_sources` value.

In `@test/scripts/sticky/test-sticky-sessions.py`:
- Around line 46-48: The test currently treats "auth_hash" as a valid sticky key
source but the balancer and constants use "auth_header"; update the
VALID_KEY_SOURCES set to include "auth_header" instead of "auth_hash" and update
any guards/messages that reference "auth_hash" (notably the check and
user-facing string in _test_prefix_hash_hit) so they expect and report
"auth_header" to match internal/adapter/balancer/sticky.go and
constants.HeaderXOllaStickyKeySource.

---

Nitpick comments:
In `@internal/adapter/balancer/sticky.go`:
- Around line 233-247: The stickyKeyFromIP function currently returns plaintext
host in the affinity key; change it to hash the host (e.g., using fnv.New64a)
before composing the returned key so the function stickyKeyFromIP produces a
bounded-length, hashed key like other sources (session_header, prefix_hash,
auth_header) to avoid leaking raw IPs and keep consistency—compute the 64-bit
FNV-1a of host and use its hex/decimal representation in place of the raw host
when returning "ip:<hash>:<modelName>" while leaving the secondary return value
"ip" unchanged.

In `@internal/adapter/health/checker.go`:
- Around line 290-298: The unhealthy-callback path currently fires silently;
update the block that checks statusChanged && !newStatus.IsRoutable() &&
oldStatus.IsRoutable() to log the transition before launching the goroutine
(mirror the recovery branch style) including endpoint identifier and old/new
routable state, and ensure the goroutine logs the callback outcome/errors from
c.unhealthyCallback.OnEndpointUnhealthy; also rename the shared
DefaultRecoveryCallbackTimeout constant to a neutral name like
DefaultCallbackTimeout and update usages (including in the OnEndpointUnhealthy
invocation) so the timeout name reflects both callback paths.

In `@internal/adapter/proxy/core/common_test.go`:
- Around line 747-752: The test "empty_result_skips_sticky_session_header"
currently uses expectSession: "" and reads w.Header().Get(...), which cannot
distinguish "not set" vs "set to empty"; modify the assertion to check header
presence directly by using the header map lookup (e.g. _, ok :=
w.Header()[constants.HeaderXOllaStickySession]) and assert !ok instead of
relying on expectSession, and update the test case (and any expectSession usage)
to reflect that the header should be absent when outcome is
&domain.StickyOutcome{Result: "", Source: "ip"}.

In `@internal/app/handlers/application.go`:
- Around line 88-90: The comment for the stickyStatsFn field is misleading about
package-level decoupling; update the comment on the Application struct's
stickyStatsFn to say the closure decouples Application from the proxy service
implementation (e.g., ProxyServiceWrapper) rather than claiming it avoids
importing the balancer package, and mention it returns *balancer.StickyStats to
clarify the actual dependency on balancer.StickyStats.

In `@internal/app/handlers/handler_stats_sticky.go`:
- Around line 10-40: The two disabled branches in stickyStatsHandler (when
a.stickyStatsFn == nil and when stats == nil) duplicate the same JSON encoding
logic; extract that payload+encoding into a small helper (e.g., a method like
writeStickyDisabled(w http.ResponseWriter) or a package-level helper used by
stickyStatsHandler) and replace both branches to call it, preserving the
Content-Type header and the existing error logging via a.logger.Error on
json.NewEncoder(w).Encode failures so both paths remain identical and cannot
drift.

In `@internal/app/services/discovery.go`:
- Around line 104-117: The unhealthy callback currently logs a warning when
s.endpointRepo.GetRoutable(ctx) returns an error, which produces noisy warnings
during shutdown when the context is canceled; update the callback (the function
passed to s.healthChecker.SetUnhealthyCallback that uses purgeDeadFn.Load and
calls s.endpointRepo.GetRoutable) to check if errors.Is(err, context.Canceled)
and simply return without logging in that case, otherwise keep the existing
s.logger.Warn call for real errors before invoking (*fn)(routable).

In `@internal/app/services/proxy.go`:
- Around line 276-278: applyStickySessions currently skips registering the purge
hook when s.discoverySvc is nil without any indication; add a warning log to
surface misconfiguration by emitting a logger warning when s.discoverySvc == nil
in the same branch. Locate the block that checks s.discoverySvc in
applyStickySessions (the code that calls
s.discoverySvc.SetPurgeDeadEndpointsFn(s.PurgeDeadEndpoints)) and add a call to
the service logger (e.g., s.logger.Warn or the module's logger) explaining that
discoverySvc is unset and purge-on-unhealthy will not be registered so tests/DI
issues are visible early.

In `@internal/config/config.go`:
- Around line 363-367: Add environment overrides for the remaining sticky
session settings by reading OLLA_PROXY_STICKY_SESSIONS_IDLE_TTL_SECONDS (parse
with strconv.Atoi and assign to config.Proxy.StickySessions.IdleTTLSeconds if
parse succeeds), OLLA_PROXY_STICKY_SESSIONS_MAX_SESSIONS (parse with
strconv.Atoi and assign to config.Proxy.StickySessions.MaxSessions), and
OLLA_PROXY_STICKY_SESSIONS_KEY_SOURCES (split on commas, trim whitespace and
assign the resulting slice to config.Proxy.StickySessions.KeySources), following
the same nil/ignore-on-parse-error pattern and the comma-split pattern used for
OLLA_SERVER_TRUSTED_PROXY_CIDRS to locate the appropriate block around the
existing OLLA_PROXY_STICKY_SESSIONS_ENABLED handling.

In `@internal/config/types.go`:
- Around line 76-89: Add a Validate method on StickySessionConfig and wire it
into the top-level Config.Validate() so misconfigurations are caught at startup;
specifically, implement StickySessionConfig.Validate() to (when Enabled) verify
each entry in KeySources is one of the allowed values ("session_header",
"prefix_hash", "auth_header", "ip") and return an error for unknown entries, and
also ensure PrefixHashBytes and IdleTTLSeconds are non-negative (returning an
error if not). Call this new StickySessionConfig.Validate() from the existing
Config.Validate() path so typos in key_sources or invalid numeric values are
surfaced early rather than silently ignored at runtime.
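A sketch of the proposed `Validate` method, assuming the four allowed key-source values listed above; field names mirror the prompt, but the real struct in `internal/config/types.go` may differ:

```go
package main

import "fmt"

// StickySessionConfig mirrors the fields named in the review comment.
type StickySessionConfig struct {
	Enabled         bool
	IdleTTLSeconds  int
	PrefixHashBytes int
	KeySources      []string
}

var allowedKeySources = map[string]bool{
	"session_header": true,
	"prefix_hash":    true,
	"auth_header":    true,
	"ip":             true,
}

// Validate surfaces typos in key_sources and negative numeric values at
// startup instead of leaving them silently ignored at runtime. Disabled
// configs are not checked.
func (c StickySessionConfig) Validate() error {
	if !c.Enabled {
		return nil
	}
	for _, s := range c.KeySources {
		if !allowedKeySources[s] {
			return fmt.Errorf("sticky_sessions: unknown key source %q", s)
		}
	}
	if c.PrefixHashBytes < 0 || c.IdleTTLSeconds < 0 {
		return fmt.Errorf("sticky_sessions: prefix_hash_bytes and idle_ttl_seconds must be non-negative")
	}
	return nil
}

func main() {
	good := StickySessionConfig{Enabled: true, KeySources: []string{"session_header", "auth_header"}}
	bad := StickySessionConfig{Enabled: true, KeySources: []string{"auth_hash"}}
	fmt.Println(good.Validate() == nil, bad.Validate() != nil) // prints: true true
}
```

Note that the common `auth_hash` typo, which several review comments flag in the test scripts, is exactly what this catches at startup.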

In `@internal/core/constants/context.go`:
- Around line 14-18: The new sticky session keys correctly use the private
contextKey type to avoid collisions; as a follow-up, migrate the existing
untyped context keys (ContextRoutePrefixKey, ContextRequestIdKey, etc.) to the
same contextKey type: change their declarations to contextKey("...") and update
all uses of context.WithValue and ctx.Value to pass and compare the typed keys
(e.g., ContextRoutePrefixKey, ContextRequestIdKey) so callers use the same typed
key values instead of raw strings; ensure equality checks and type assertions
continue to work with the new contextKey type across functions that access these
keys.

In `@internal/core/domain/routing.go`:
- Around line 84-94: Introduce typed string aliases and constants for
StickyOutcome fields to avoid stringly-typed usage: define a type StickyResult
string and constants (e.g. StickyResultHit, StickyResultMiss, StickyResultRepin,
StickyResultDisabled) and a type StickySource string with constants for the five
key-source values, either in this file near the StickyOutcome struct or in a
new/internal/core/constants package; update all uses of literal strings in
adapter/balancer/sticky.go, adapter/proxy/core/common.go, and tests to use these
new constants (and change StickyOutcome.Result and .Source to use the typed
aliases) so callers and tests compile against the single canonical set of
values.
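The typed-alias suggestion, sketched with the constant names proposed above (the merged code may have chosen different names):

```go
package main

import "fmt"

// StickyResult replaces the stringly-typed StickyOutcome.Result field so
// callers and tests compile against one canonical set of values.
type StickyResult string

const (
	StickyResultHit      StickyResult = "hit"
	StickyResultMiss     StickyResult = "miss"
	StickyResultRepin    StickyResult = "repin"
	StickyResultDisabled StickyResult = "disabled"
)

// StickySource covers the five key-source values.
type StickySource string

const (
	StickySourceSessionHeader StickySource = "session_header"
	StickySourcePrefixHash    StickySource = "prefix_hash"
	StickySourceAuthHeader    StickySource = "auth_header"
	StickySourceIP            StickySource = "ip"
	StickySourceNone          StickySource = "none"
)

type StickyOutcome struct {
	Result StickyResult
	Source StickySource
}

func main() {
	o := StickyOutcome{Result: StickyResultHit, Source: StickySourcePrefixHash}
	fmt.Println(o.Result, o.Source) // prints: hit prefix_hash
}
```

Untyped string literals still compare against these fields (`o.Result == "hit"` compiles), so the migration can be done incrementally.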

In `@makefile`:
- Around line 333-336: The Makefile target test-script-sticky uses an
inconsistent comment style and calls python without guaranteeing python3; update
the comment to use a single '#' like the other target and change the invocation
in the test-script-sticky target to use python3 (and optionally normalize
test-script-integration the same way) so the command becomes "cd test/scripts &&
python3 sticky/test-sticky-sessions.py $(ARGS)"; adjust only the comment prefix
and the interpreter name for targets test-script-sticky (and
test-script-integration if you choose to normalize both).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ff8dede0-d813-495d-9d3d-ce518fcc1a4c

📥 Commits

Reviewing files that changed from the base of the PR and between 76e05c0 and 4bbe95b.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (36)
  • CLAUDE.md
  • config/config.yaml
  • docs/content/concepts/load-balancing.md
  • docs/content/concepts/sticky-sessions.md
  • docs/content/configuration/reference.md
  • docs/mkdocs.yml
  • go.mod
  • internal/adapter/balancer/sticky.go
  • internal/adapter/balancer/sticky_test.go
  • internal/adapter/health/checker.go
  • internal/adapter/health/checker_test.go
  • internal/adapter/health/recovery_callback.go
  • internal/adapter/proxy/core/common.go
  • internal/adapter/proxy/core/common_test.go
  • internal/adapter/proxy/olla/service.go
  • internal/adapter/proxy/olla/service_retry.go
  • internal/adapter/proxy/sherpa/service_retry.go
  • internal/app/app.go
  • internal/app/handlers/application.go
  • internal/app/handlers/handler_proxy.go
  • internal/app/handlers/handler_stats_sticky.go
  • internal/app/handlers/handler_translation.go
  • internal/app/handlers/server_routes.go
  • internal/app/services/discovery.go
  • internal/app/services/http.go
  • internal/app/services/proxy.go
  • internal/config/config.go
  • internal/config/types.go
  • internal/core/constants/content.go
  • internal/core/constants/context.go
  • internal/core/domain/routing.go
  • makefile
  • readme.md
  • test/scripts/integration/test-integration.py
  • test/scripts/sticky/README.md
  • test/scripts/sticky/test-sticky-sessions.py

Comment thread CLAUDE.md
Comment thread internal/app/handlers/handler_proxy.go
Comment on lines +187 to +203
```go
// setStickyResponseHeaders writes sticky session outcome headers so clients can observe
// affinity routing decisions. When the client provided an explicit session ID header,
// we echo it back so stateless clients can track their own session.
func (a *Application) setStickyResponseHeaders(w http.ResponseWriter, r *http.Request, outcome *balancer.StickyOutcome) {
	if outcome == nil {
		return
	}
	w.Header().Set(constants.HeaderXOllaStickySession, outcome.Result)
	if outcome.Source != "" && outcome.Source != "none" {
		w.Header().Set(constants.HeaderXOllaStickyKeySource, outcome.Source)
	}
	if outcome.Source == "session_header" {
		if sid := r.Header.Get(constants.HeaderXOllaSessionID); sid != "" {
			w.Header().Set(constants.HeaderXOllaSessionID, sid)
		}
	}
}
```

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Find callers to understand migration impact.
rg -nP '\bsetStickyResponseHeaders(FromRequest)?\s*\(' --type=go -C1
```

Repository: thushan/olla

Length of output: 1929

🏁 Script executed:

```shell
rg -nA 15 'func.*SetStickySessionHeaders' --type=go
```

Length of output: 2619

🏁 Script executed:

```shell
rg -n 'type StickyOutcome' --type=go
```

Length of output: 199

🏁 Script executed:

```shell
rg -n 'HeaderXOllaStickySession' --type=go -C2
```

Length of output: 1928

🏁 Script executed:

```shell
rg -nA 20 'func TestSetStickySessionHeaders' --type=go
```

Length of output: 1780

🏁 Script executed:

```shell
rg -nA 5 'expectSession:.*""' internal/adapter/proxy/core/common_test.go | head -40
```

Length of output: 380

🏁 Script executed:

```shell
sed -n '740,760p' internal/adapter/proxy/core/common_test.go
```

Length of output: 671

🏁 Script executed:

```shell
rg -n 'setStickyResponseHeaders' internal/app/handlers/handler_proxy_test.go
```

Length of output: 139

🏁 Script executed:

```shell
find internal/app/handlers -name '*test.go' -type f
```

Length of output: 951

🏁 Script executed:

```shell
rg -n 'setStickyResponseHeaders' internal/app/handlers --type=go
```

Length of output: 1088

🏁 Script executed:

```shell
rg -n 'HeaderXOllaStickySession\|setStickyResponseHeaders' internal/app/handlers/handler_translation_test.go
```

Length of output: 38

🏁 Script executed:

```shell
rg -n 'X-Olla-Sticky' internal/app/handlers --type=go -C2
```

Length of output: 38

🏁 Script executed:

```shell
rg -n 'X-Olla-Sticky|HeaderXOllaStickySession' --type=go | head -20
```

Length of output: 928

🏁 Script executed:

```shell
head -30 internal/app/handlers/handler_proxy.go | grep -E '^import|^package'
```

Length of output: 81

🏁 Script executed:

```shell
sed -n '1,30p' internal/app/handlers/handler_proxy.go
```

Length of output: 772

🏁 Script executed:

```shell
rg -n 'internal/adapter/proxy/core' internal/app/handlers/handler_proxy.go
```

Length of output: 38

🏁 Script executed:

```shell
rg -n 'import.*core' internal/app/handlers/*.go | grep -v test
```

Length of output: 38

🏁 Script executed:

```shell
grep -n 'internal/adapter' internal/app/handlers/handler_proxy.go
```

Length of output: 111


Remove duplicate sticky-header helper and consolidate with core.SetStickySessionHeaders.

This helper duplicates internal/adapter/proxy/core/common.go:SetStickySessionHeaders but differs in a critical way: it calls w.Header().Set(HeaderXOllaStickySession, outcome.Result) unconditionally, so an outcome with an empty Result (sticky enabled but no key extracted this turn) produces a bare X-Olla-Sticky-Session: header. The core variant guards with if outcome.Result != "" before setting it, which is the tested and expected behaviour. Two code paths writing the same header with different rules invites test flakiness.

Replace both setStickyResponseHeaders and setStickyResponseHeadersFromRequest by delegating to core.SetStickySessionHeaders directly (it extracts the outcome from the request context internally, which this wrapper already does). If importing the core package is not preferred, align the guard conditions exactly with the core variant.

♻️ Suggested consolidation
```diff
-// setStickyResponseHeadersFromRequest reads the StickyOutcome from the request context
-// and writes the sticky session headers. Used by sub-handlers that have *http.Request
-// but not the outcome pointer directly. Must be called before w.WriteHeader().
-func (a *Application) setStickyResponseHeadersFromRequest(w http.ResponseWriter, r *http.Request) {
-	outcome, _ := r.Context().Value(constants.ContextStickyOutcomeKey).(*balancer.StickyOutcome)
-	a.setStickyResponseHeaders(w, r, outcome)
-}
-
-// setStickyResponseHeaders writes sticky session outcome headers so clients can observe
-// affinity routing decisions. When the client provided an explicit session ID header,
-// we echo it back so stateless clients can track their own session.
-func (a *Application) setStickyResponseHeaders(w http.ResponseWriter, r *http.Request, outcome *balancer.StickyOutcome) {
-	if outcome == nil {
-		return
-	}
-	w.Header().Set(constants.HeaderXOllaStickySession, outcome.Result)
-	if outcome.Source != "" && outcome.Source != "none" {
-		w.Header().Set(constants.HeaderXOllaStickyKeySource, outcome.Source)
-	}
-	if outcome.Source == "session_header" {
-		if sid := r.Header.Get(constants.HeaderXOllaSessionID); sid != "" {
-			w.Header().Set(constants.HeaderXOllaSessionID, sid)
-		}
-	}
-}
+// setStickyResponseHeadersFromRequest delegates to the shared core helper so
+// there is a single source of truth for sticky response-header semantics.
+func (a *Application) setStickyResponseHeadersFromRequest(w http.ResponseWriter, r *http.Request) {
+	core.SetStickySessionHeaders(w, r)
+}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/app/handlers/handler_proxy.go` around lines 187 - 203, The helper
setStickyResponseHeaders duplicates logic in core.SetStickySessionHeaders but
unconditionally writes HeaderXOllaStickySession causing empty header values;
remove setStickyResponseHeaders (and setStickyResponseHeadersFromRequest) and
delegate to core.SetStickySessionHeaders (which reads the outcome from the
request context and already guards outcome.Result != "") by importing the core
package and calling core.SetStickySessionHeaders(w, r) where these helpers were
used; if you cannot import core, change setStickyResponseHeaders to mirror core
exactly by only calling w.Header().Set(constants.HeaderXOllaStickySession,
outcome.Result) when outcome.Result != "" and keep the same checks for
outcome.Source and session header echoing to match core.SetStickySessionHeaders.

Comment thread test/scripts/integration/test-integration.py Outdated
Comment on lines +1437 to +1475
```python
# -- Test 7: model scoping — same session ID, different models pin separately
model_ids = [m.get("id", m.get("name", "")) for m in self.models
             if m.get("id", m.get("name", "")) and "embed" not in m.get("id", "").lower()]
if len(model_ids) >= 2:
    model_a = model_ids[0]
    model_b = model_ids[1]
    scope_sid = f"integration-scope-{int(time.time())}"
    scope_headers = {"X-Olla-Session-ID": scope_sid}

    # Establish a pin for model_a
    self._post("/olla/proxy/v1/chat/completions",
               {"model": model_a, "messages": [{"role": "user", "content": "hi"}], "max_tokens": 1},
               headers=scope_headers)
    # model_b with the same session ID must get its own independent pin (miss)
    rb_first = self._post("/olla/proxy/v1/chat/completions",
                          {"model": model_b, "messages": [{"role": "user", "content": "hi"}], "max_tokens": 1},
                          headers=scope_headers)
    # model_a follow-up must still hit
    ra_second = self._post("/olla/proxy/v1/chat/completions",
                           {"model": model_a, "messages": [{"role": "user", "content": "hi"}], "max_tokens": 1},
                           headers=scope_headers)

    ok = False
    detail = ""
    if (rb_first is not None and ra_second is not None
            and rb_first.status_code == 200 and ra_second.status_code == 200):
        rb_sticky = rb_first.headers.get("X-Olla-Sticky-Session", "")
        ra_sticky = ra_second.headers.get("X-Olla-Sticky-Session", "")
        # model_b first request should be a miss (new key because model differs)
        # model_a second request should be a hit (existing pin)
        ok = rb_sticky == "miss" and ra_sticky == "hit"
        detail = f"model_b first={rb_sticky} (want miss), model_a second={ra_sticky} (want hit)"
    else:
        detail = self._error_detail(rb_first or ra_second)

    self._print_result("Model-scoped keys: same session ID, different models pin separately", ok, detail)
    self.record(f"{phase}/model-scoping", ok, detail, phase)
else:
    self.pcolor(GREY, f" [SKIP] Model scoping test skipped — only {len(model_ids)} model(s) available")
```

⚠️ Potential issue | 🟡 Minor

Model-scoping test may flake on heterogeneous fleets.

model_ids is built from self.models without filtering by healthy availability, so model_ids[0]/model_ids[1] may not be routable at all (e.g. live only on an unhealthy backend). Consider restricting the pool to models whose olla.availability contains a healthy endpoint — similar to the discovery logic already used for passthrough_candidates/translation_candidates earlier in the file — so the test doesn't spuriously fail with 5xx.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/scripts/integration/test-integration.py` around lines 1437 - 1475, The
test builds model_ids from self.models without checking backend health, causing
flakes; update the model selection to mirror the discovery filtering used for
passthrough_candidates/translation_candidates by only including models from
self.models whose metadata (olla.availability or similar) contains at least one
healthy/routable endpoint, then derive model_ids from that filtered list
(replace the current list comprehension that creates model_ids). Use the same
availability-check logic/keys as the other candidate filters so model_a/model_b
are guaranteed routable.

Comment thread test/scripts/sticky/README.md Outdated

````markdown
## Expected Output

```
````

⚠️ Potential issue | 🟡 Minor

Add a language to the fenced code block (MD040).

markdownlint flags line 73 — the expected-output block has no language specified. Use text (or plaintext).

📝 Proposed fix
````diff
-```
+```text
 ========================================================================
   Olla Sticky Session Test
````
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 73-73: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/scripts/sticky/README.md` at line 73, The fenced code block in
test/scripts/sticky/README.md for the "expected-output" section is missing a
language tag (MD040); update the opening backticks from ``` to include a
language such as text or plaintext (e.g., change ``` to ```text) so the block
becomes fenced with a language identifier and satisfies markdownlint.

Comment thread test/scripts/sticky/test-sticky-sessions.py Outdated
@thushan thushan merged commit 7d153be into main Apr 16, 2026
8 checks passed
@thushan thushan deleted the feature/sticky-sessions branch April 16, 2026 23:48
@coderabbitai coderabbitai Bot mentioned this pull request Apr 23, 2026

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant