223 changes: 223 additions & 0 deletions .claude/skills/test-sticky-sessions.md
@@ -0,0 +1,223 @@
---
name: test-sticky-sessions
description: >
Runs the Olla sticky session integration test harness end-to-end across all
provider-scoped routes. Trigger when the user asks to: verify sticky sessions
work, run the sticky session integration test, test provider-route affinity,
or check whether the providerProxyHandler bug fix is holding.
Delegable to Sonnet — does not require Opus.
---

# Sticky Session Integration Test

This skill exercises sticky session affinity across **all** provider-scoped
routes that AIMock can serve:

| Route | Request path | Status |
|---|---|---|
| Main proxy | `/olla/proxy/v1/chat/completions` | tested |
| openai-compatible | `/olla/openai-compatible/v1/chat/completions` | tested (primary regression target) |
| openai | `/olla/openai/v1/chat/completions` | tested |
| vllm | `/olla/vllm/v1/chat/completions` | tested |
| sglang | `/olla/sglang/v1/chat/completions` | tested |
| llamacpp | `/olla/llamacpp/v1/chat/completions` | tested |
| lmstudio | `/olla/lmstudio/v1/chat/completions` | tested |
| lm-studio (alt prefix) | `/olla/lm-studio/v1/chat/completions` | tested |
| litellm | `/olla/litellm/v1/chat/completions` | tested |
| dmr | `/olla/dmr/v1/chat/completions` | tested |
| vllm-mlx | `/olla/vllm-mlx/v1/chat/completions` | tested |
| anthropic translator | `/olla/anthropic/v1/messages` | tested + passthrough assertion |
| lemonade | `/olla/lemonade/api/v1/chat/completions` | **skipped** — AIMock does not serve `/api/v1/*` |
| ollama | `/olla/ollama/api/chat` | **skipped** — AIMock does not speak Ollama `/api/*` protocol |

The `/olla/openai-compatible/` and `/olla/openai/` paths were affected by a bug
where `providerProxyHandler` never injected sticky session context — those
routes are the primary regression targets.

## Steps

### 1. Pre-flight: verify Docker is running

```bash
docker info > /dev/null 2>&1 || { echo "Docker is not running — start Docker Desktop first"; exit 1; }
```

### 2. Start AIMock instances

```bash
make mock-up
```

The target waits until all three AIMock containers report healthy (ports 9300/9301/9302).
Each instance returns a unique `BACKEND:instance-{a,b,c}` marker so the test
can confirm which backend served each response.
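
You can probe the markers directly before running the harness. A minimal sketch, assuming each AIMock instance answers `/v1/chat/completions` on its own port and embeds the marker in the response body (which is how the harness compares backends):

```bash
# Hit each mock directly and print the backend marker it returns (assumed body format)
for port in 9300 9301 9302; do
  curl -s -X POST "http://localhost:$port/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{"model":"test","messages":[{"role":"user","content":"ping"}],"max_tokens":20}' \
    | grep -o 'BACKEND:instance-[abc]' || echo "port $port: no marker found"
done
```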

### 3. Build Olla and start with sticky config

```bash
LOG="${TMPDIR:-/tmp}/olla-sticky.log"
go run . --config test/manual/config.sticky.yaml > "$LOG" 2>&1 &
OLLA_PID=$!
```

Wait until ready:
```bash
# Bounded wait (~60s) so a failed start doesn't hang the run
for i in $(seq 1 60); do curl -sf http://localhost:40114/internal/health > /dev/null && break; sleep 1; done
curl -sf http://localhost:40114/internal/health > /dev/null || { echo "Olla failed to start; see $LOG"; exit 1; }
echo "Olla ready (PID $OLLA_PID, log $LOG)"
```

### 4. Run the assertion script

```bash
OLLA_URL=http://localhost:40114 bash test/scripts/sticky/test-sticky-provider-routes.sh
RESULT=$?
```

For each active (non-skipped) route, the script asserts:
- Turn 1: `X-Olla-Sticky-Session: miss`, `X-Olla-Sticky-Key-Source: session_header`
- Turn 2: `X-Olla-Sticky-Session: hit`, same `X-Olla-Endpoint` as Turn 1, same backend marker
- Turn 3: across 10 fresh sessions, at least one lands on a different backend
- Anthropic path additionally asserts `X-Olla-Mode: passthrough`
Comment on lines +76 to +80

⚠️ Potential issue | 🟡 Minor

Align the documented Turn 3 behaviour with the script.

The harness sends 30 fresh sessions and skips main-proxy diversity, but the doc says 10 sessions and shows main-proxy passing Turn 3. This will mislead manual troubleshooting.

📝 Proposed fix
-- Turn 3: across 10 fresh sessions, at least one lands on a different backend
+- Turn 3: across 30 fresh sessions, at least one lands on a different backend
@@
 ── main-proxy ──
   ✓ PASS — Turn 1 HTTP 200
   ✓ PASS — Turn 1 sticky=miss
   ✓ PASS — Turn 1 key-source=session_header
   Pinned to: mock-compat-b (BACKEND:instance-b)
   ✓ PASS — Turn 2 HTTP 200
   ✓ PASS — Turn 2 sticky=hit
   ✓ PASS — Turn 2 same endpoint (mock-compat-b)
   ✓ PASS — Turn 2 same backend marker (BACKEND:instance-b)
-  ✓ PASS — Turn 3 load balancing reaches multiple backends
+  SKIP Turn 3 diversity — main-proxy pool is huge and LCB tie-break is deterministic at zero connections — spread not meaningful here

Also applies to: 105-139

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/test-sticky-sessions.md around lines 76 - 80, Update the Turn
3 documentation in .claude/skills/test-sticky-sessions.md to match the actual
test harness behavior: change "10 fresh sessions" to "30 fresh sessions" and
remove or correct any statement that implies the main-proxy path is evaluated
for diversity (i.e., remove "shows main-proxy passing Turn 3" or explicitly
document that main-proxy is skipped for diversity). Ensure the description for
Turn 3 notes that diversity is checked across 30 fresh sessions and that
main-proxy is excluded, and apply the same corrections to the other affected
block referenced (the section around lines 105-139).

- Stats endpoint: `insertions > 0`, `hits > 0`, `active_sessions > 0`

Skipped routes print clearly: `SKIP <label> — <reason>` and do not count as failures.
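
The Turn 3 diversity check can be reproduced by hand. This is a sketch, not the script's actual code; the endpoint name is read from the `X-Olla-Endpoint` response header, and the session count should match whatever the harness really sends (10 per the list above, 30 according to the review comment):

```bash
OLLA_URL="${OLLA_URL:-http://localhost:40114}"
eps=""
for i in $(seq 1 10); do   # match the count to what the harness actually sends
  ep=$(curl -s -o /dev/null -D - -X POST "$OLLA_URL/olla/openai-compatible/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "X-Olla-Session-ID: fresh-$i-$$" \
    -d '{"model":"test","messages":[{"role":"user","content":"ping"}],"max_tokens":20}' \
    | tr -d '\r' | awk 'tolower($1)=="x-olla-endpoint:"{print $2}')
  eps="$eps$ep
"
done
distinct=$(printf '%s' "$eps" | sort -u | grep -c .)
[ "$distinct" -gt 1 ] && echo "diversity OK ($distinct endpoints)" || echo "all sessions pinned to one endpoint"
```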

### 5. Teardown (always run this, even after a failed run)

```bash
kill "$OLLA_PID" 2>/dev/null || true
make mock-down
exit "$RESULT"
```
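
When running the steps by hand, the snippet above only executes if you reach it. To make teardown genuinely unconditional, register it as an EXIT trap right after step 3 starts Olla; a one-line sketch:

```bash
trap 'kill "$OLLA_PID" 2>/dev/null || true; make mock-down' EXIT
```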

### Fully automated (single command)

```bash
make test-sticky-manual
```

This target handles all five steps including the EXIT trap teardown.
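
If the Make target is unavailable in your checkout, the five steps chain together roughly like this; the script below is assembled only from the commands shown above, with the EXIT trap providing the unconditional teardown:

```bash
#!/usr/bin/env bash
set -u
docker info > /dev/null 2>&1 || { echo "Docker is not running; start Docker Desktop first"; exit 1; }
make mock-up
LOG="${TMPDIR:-/tmp}/olla-sticky.log"
go run . --config test/manual/config.sticky.yaml > "$LOG" 2>&1 &
OLLA_PID=$!
trap 'kill "$OLLA_PID" 2>/dev/null || true; make mock-down' EXIT
until curl -sf http://localhost:40114/internal/health > /dev/null; do sleep 1; done
OLLA_URL=http://localhost:40114 bash test/scripts/sticky/test-sticky-provider-routes.sh
```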

---

## Expected output (passing run)

```text
╔══════════════════════════════════════════════════════════════╗
║ Olla Sticky Session — All Provider Routes Regression Test ║
╚══════════════════════════════════════════════════════════════╝

── main-proxy ──
✓ PASS — Turn 1 HTTP 200
✓ PASS — Turn 1 sticky=miss
✓ PASS — Turn 1 key-source=session_header
Pinned to: mock-compat-b (BACKEND:instance-b)
✓ PASS — Turn 2 HTTP 200
✓ PASS — Turn 2 sticky=hit
✓ PASS — Turn 2 same endpoint (mock-compat-b)
✓ PASS — Turn 2 same backend marker (BACKEND:instance-b)
✓ PASS — Turn 3 load balancing reaches multiple backends

── openai-compatible ──
... same pattern ...

... (vllm, sglang, llamacpp, lmstudio, lm-studio, litellm, dmr, vllm-mlx) ...

SKIP lemonade (/olla/lemonade/api/v1/chat/completions) — AIMock does not serve /api/v1/* — Lemonade uses a non-standard path prefix
SKIP ollama (/olla/ollama/api/chat) — AIMock does not speak the Ollama /api/* protocol

── anthropic-translator ──
✓ PASS — Turn 1 X-Olla-Mode=passthrough
...

── Sticky Session Stats ──
✓ PASS — stats.insertions > 0
✓ PASS — stats.hits > 0
✓ PASS — stats.active_sessions > 0

Results: 99 passed 0 failed 2 skipped (99 total assertions)
✓ All sticky session assertions passed.
```

---

## Manual verification (troubleshooting)

**Health check:**
```bash
curl -s http://localhost:40114/internal/health | python3 -m json.tool
```

**Turn 1 — main proxy:**
```bash
curl -s -D - -X POST http://localhost:40114/olla/proxy/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Olla-Session-ID: debug-sess-001" \
  -d '{"model":"test","messages":[{"role":"user","content":"ping"}],"max_tokens":20}'
```

Expected response headers:
```text
X-Olla-Sticky-Session: miss
X-Olla-Sticky-Key-Source: session_header
X-Olla-Endpoint: mock-compat-{a,b,c}
```
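
To see just the Olla headers from any of these probes, discard the body and filter the header dump:

```bash
curl -s -D - -o /dev/null -X POST http://localhost:40114/olla/proxy/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Olla-Session-ID: debug-sess-001" \
  -d '{"model":"test","messages":[{"role":"user","content":"ping"}],"max_tokens":20}' \
  | grep -i '^x-olla-'
```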

**Turn 2 — same session, expect hit:**
```bash
curl -s -D - -X POST http://localhost:40114/olla/proxy/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Olla-Session-ID: debug-sess-001" \
  -d '{"model":"test","messages":[{"role":"user","content":"ping"}],"max_tokens":20}'
```

Expected: `X-Olla-Sticky-Session: hit`

**Provider-scoped route (regression path):**
```bash
curl -s -D - -X POST http://localhost:40114/olla/openai-compatible/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Olla-Session-ID: debug-sess-002" \
  -d '{"model":"test","messages":[{"role":"user","content":"ping"}],"max_tokens":20}'
```

**vLLM-specific route:**
```bash
curl -s -D - -X POST http://localhost:40114/olla/vllm/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Olla-Session-ID: debug-sess-vllm" \
  -d '{"model":"test","messages":[{"role":"user","content":"ping"}],"max_tokens":20}'
```

**Anthropic passthrough:**
```bash
curl -s -D - -X POST http://localhost:40114/olla/anthropic/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: test" \
  -H "anthropic-version: 2023-06-01" \
  -H "X-Olla-Session-ID: debug-sess-003" \
  -d '{"model":"claude-3-haiku-20240307","max_tokens":20,"messages":[{"role":"user","content":"ping"}]}'
```

Expected: `X-Olla-Mode: passthrough`

**Stats:**
```bash
curl -s http://localhost:40114/internal/stats/sticky | python3 -m json.tool
```
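
The same three counters the harness asserts can be checked in one line. This assumes `insertions`, `hits`, and `active_sessions` are top-level fields of the stats payload, matching the assertion names used above:

```bash
curl -s http://localhost:40114/internal/stats/sticky | python3 -c '
import json, sys
s = json.load(sys.stdin)
# field names assumed from the harness assertions; adjust if the payload nests them
assert s["insertions"] > 0 and s["hits"] > 0 and s["active_sessions"] > 0, s
print("sticky stats OK:", s)
'
```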

---

## Notes

- `test/manual/config.sticky.yaml` registers three endpoints per provider type
(all pointing at AIMock on 9300/9301/9302) so affinity checks are meaningful.
- The `openai-compatible` profile declares `anthropic_support.enabled: true`,
enabling passthrough mode on the Anthropic translator path.
- Lemonade and Ollama routes are skipped cleanly — they require a dedicated mock
that speaks their native protocols (`/api/v1/chat/completions` and
`/api/chat`/`/api/generate` respectively).
- To test the high-performance Olla engine, change `engine: "sherpa"` to
  `engine: "olla"` in `test/manual/config.sticky.yaml` and re-run; a
  non-destructive one-liner follows these notes.
- The script is portable: `#!/usr/bin/env bash`, no absolute paths, no
platform-specific constructs. Runs on macOS, Linux, and Git-Bash on Windows.
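
For the engine swap in the notes above, a non-destructive variant that leaves the checked-in config untouched (plain `sed`; the match string assumes the key appears exactly as quoted in the note):

```bash
sed 's/engine: "sherpa"/engine: "olla"/' test/manual/config.sticky.yaml > "${TMPDIR:-/tmp}/config.sticky.olla.yaml"
go run . --config "${TMPDIR:-/tmp}/config.sticky.olla.yaml"
```
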
10 changes: 10 additions & 0 deletions config/profiles/openai-compatible.yaml
@@ -13,6 +13,16 @@ routing:
# API compatibility
api:
  openai_compatible: true

  # Anthropic Messages API support
  # Most OpenAI-compatible backends (LiteLLM, generic proxies) do not natively serve
  # /v1/messages, so passthrough is disabled by default to avoid 404s. Enable this
  # per backend profile (e.g. vllm.yaml, lmstudio.yaml) when the server supports it.
  anthropic_support:
    enabled: false
    messages_path: /v1/messages
    token_count: false

  paths:
    - /v1/models            # 0: health check & models
    - /v1/chat/completions  # 1: chat completions
5 changes: 5 additions & 0 deletions internal/adapter/balancer/sticky.go
@@ -199,11 +199,16 @@ func stickyKeyFromSessionHeader(r *http.Request, modelName string) (string, stri

// stickyKeyFromPrefixHash hashes the first prefixBytes bytes of the messages
// JSON array so requests with identical conversation prefixes are routed together.
// Falls back to the legacy completions "prompt" field so non-chat endpoints
// (e.g. /v1/completions, llamaswap-style passthroughs) also produce a key.
func stickyKeyFromPrefixHash(body []byte, modelName string, prefixBytes int) (string, string) {
	if len(body) == 0 {
		return "", ""
	}
	raw := gjson.GetBytes(body, "messages").Raw
	if raw == "" || raw == "[]" || raw == "null" {
		raw = gjson.GetBytes(body, "prompt").Raw
	}
	if raw == "" {
		return "", ""
	}