LLM layer: minor correctness cleanup (pricing match, Ollama stream errors, Gemini parts, 408, etc.)

## Summary

A cluster of small, individually low-severity correctness issues in the LLM layer, grouped into one cleanup issue. Each is independently fixable.

## 1. Pricing lookup matches the wrong (shorter) model first
`src/core/pricing/pricing_data.py:85-87` — the last-resort substring loop returns the **first insertion-order** entry where either name contains the other. `gpt-4.1` precedes `gpt-4.1-mini`, so `gpt-4.1-mini-2025-04-14` matches `gpt-4.1` and is priced ~5x too high; same class of bug for `gemini-2.5-flash-lite-*` → `gemini-2.5-flash`. **Fix:** prefer the longest matching key.
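
A minimal sketch of the longest-match fix, assuming the pricing table is a plain dict keyed by model name. `PRICING`, `find_pricing_key`, and the price figures are illustrative placeholders, not the contents of `pricing_data.py`:

```python
from typing import Mapping, Optional

# Placeholder table: keys mirror the models named above; the values are
# made-up numbers used only to show which entry gets selected.
PRICING = {
    "gpt-4.1": 2.00,
    "gpt-4.1-mini": 0.40,
    "gemini-2.5-flash": 0.30,
    "gemini-2.5-flash-lite": 0.10,
}


def find_pricing_key(model: str, pricing: Mapping[str, float] = PRICING) -> Optional[str]:
    """Return the longest key that contains, or is contained in, the model name."""
    name = model.lower()
    candidates = [key for key in pricing if key in name or name in key]
    # Longest key wins, so "gpt-4.1-mini" beats "gpt-4.1" regardless of
    # the order in which the entries were inserted.
    return max(candidates, key=len, default=None)
```

With this table, `find_pricing_key("gpt-4.1-mini-2025-04-14")` selects `gpt-4.1-mini` rather than `gpt-4.1`. The reverse direction (a short model name contained in several longer keys) stays ambiguous and may deserve separate handling.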

## 2. Ollama streaming error bodies can never be read → context-overflow 400s misclassified
`src/core/llm/providers/ollama.py:530-558` — `raise_for_status()` runs inside `client.stream(...)` before the body is read, so `e.response.json()` raises `httpx.ResponseNotRead`, swallowed by a bare `except`. The keyword check ("context", "length", …) never matches, so Ollama's `num_ctx`-overflow 400 (payload sets `"truncate": False`) is not raised as `ContextOverflowError` and the translator can't grow context / shrink chunks. **Fix:** `await e.response.aread()` before parsing.
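
The ordering problem can be sketched without a network call. `FakeStreamedResponse` below is a hypothetical stand-in that mimics only the relevant behaviour of a streamed `httpx.Response` (`json()` fails until `aread()` has run); `extract_error_message` and `looks_like_context_overflow` are illustrative names, the `{"error": ...}` body shape is an assumption, and the keyword tuple contains only the two keywords quoted above:

```python
import asyncio
import json


class ResponseNotRead(Exception):
    """Stand-in for httpx.ResponseNotRead."""


class FakeStreamedResponse:
    """Mimics a streamed response whose body has not been read yet."""

    def __init__(self, status_code: int, body: bytes) -> None:
        self.status_code = status_code
        self._body = body
        self._content = None

    async def aread(self) -> bytes:
        self._content = self._body
        return self._content

    def json(self):
        if self._content is None:
            raise ResponseNotRead("body not read")
        return json.loads(self._content)


async def extract_error_message(response) -> str:
    """Read the streamed body first, then parse it."""
    await response.aread()  # without this, json() raises ResponseNotRead
    try:
        payload = response.json()
    except ValueError:  # body was not valid JSON
        return ""
    if isinstance(payload, dict):
        return str(payload.get("error", ""))
    return ""


def looks_like_context_overflow(message: str) -> bool:
    lowered = message.lower()
    return any(keyword in lowered for keyword in ("context", "length"))
```

In the real provider the `aread()` call would have to happen while the `client.stream(...)` context is still open, since the body cannot be read after the stream is closed.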

## 3. Gemini response parsing reads only `parts[0]`
`src/core/llm/providers/gemini.py:227-229` — Gemini can return multiple `parts` (thought + text, or long split responses); taking `parts[0].get("text")` drops the rest. **Fix:** join all text parts.
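
A sketch of joining every text part, assuming the already-decoded response is a dict shaped like `candidates[0].content.parts`. `extract_text` and its `include_thoughts` flag are hypothetical; whether thought parts should be kept or skipped is a design choice this issue does not settle:

```python
from typing import Any, Dict


def extract_text(response: Dict[str, Any], include_thoughts: bool = True) -> str:
    """Concatenate the text of all parts in the first candidate."""
    candidates = response.get("candidates") or []
    if not candidates:
        return ""
    content = candidates[0].get("content") or {}
    parts = content.get("parts") or []

    chunks = []
    for part in parts:
        text = part.get("text")
        if not isinstance(text, str):
            continue  # non-text parts (e.g. inline data) carry no "text"
        if part.get("thought") and not include_thoughts:
            continue  # optionally drop parts flagged as thoughts
        chunks.append(text)
    return "".join(chunks)
```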

## 4. `OllamaProvider.get_model_context_size()` references a never-created attribute
`src/core/llm/providers/ollama.py:661` — uses `self._context_detector`, never assigned in `__init__`; raises `AttributeError`, swallowed into a "failed gracefully" warning. Currently dead (no caller) but broken. **Fix:** create the detector or remove the method.

## 5. Repetition-loop threshold branch is unreachable
`src/core/llm/thinking/detection.py:60-63` — `elif phrase_len >= 40` can never run because `phrase_len >= 20` is checked first; the strongest loop signal never gets the lenient threshold. **Fix:** order the branches longest-first.
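
A sketch of the longest-first ordering. The function name and the returned repeat counts are placeholders chosen for illustration; only the 20 and 40 boundaries come from the code referenced above:

```python
def min_repeats_for_loop(phrase_len: int) -> int:
    """Return how many repeats of a phrase count as a loop (placeholder values)."""
    # The most specific condition comes first so it is reachable.
    if phrase_len >= 40:
        return 3   # long phrases: most lenient threshold
    elif phrase_len >= 20:
        return 5
    return 10      # short phrases need many repeats before flagging
```

With the original ordering (`>= 20` tested before `>= 40`), a 45-character phrase would fall into the middle branch and never reach the long-phrase threshold.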

## 6. LiteLLM provider's KeyPool integration is dead
`src/core/llm/providers/litellm.py:80-89,125` — `_build_kwargs()` uses `peek()` once before the retry loop, never `acquire()`/`mark_throttled()`, so multi-key LiteLLM never rotates on `RateLimitError`. **Fix:** rotate keys in the retry loop, or drop the pool wiring if unsupported.
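
One possible shape for rotating inside the retry loop. Everything here is hypothetical: `TinyKeyPool` is a toy stand-in whose `acquire()`/`mark_throttled()` signatures are guessed rather than taken from the real KeyPool, `RateLimitError` is a local placeholder for the provider's exception, and `send` represents whatever performs the request with a given key:

```python
from typing import Callable, Iterable, Optional, Set, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Local placeholder for the provider's rate-limit exception."""


class TinyKeyPool:
    """Toy pool: hands out the first key that is not marked throttled."""

    def __init__(self, keys: Iterable[str]) -> None:
        self._keys = list(keys)
        self._throttled: Set[str] = set()

    def acquire(self) -> str:
        for key in self._keys:
            if key not in self._throttled:
                return key
        raise RuntimeError("all keys are throttled")

    def mark_throttled(self, key: str) -> None:
        self._throttled.add(key)


def call_with_rotation(pool: TinyKeyPool, send: Callable[[str], T], max_attempts: int = 3) -> T:
    """Acquire a key per attempt; on a rate limit, throttle it and try the next."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        key = pool.acquire()  # re-acquired on every attempt, not once up front
        try:
            return send(key)
        except RateLimitError as exc:
            pool.mark_throttled(key)
            last_error = exc
    if last_error is None:
        raise RuntimeError("max_attempts must be at least 1")
    raise last_error
```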

## 7. Thinking cache stores monotonic loop time as a persistent timestamp
`src/core/llm/thinking/cache.py:131-137` — `tested_at` uses `loop.time()` (monotonic, resets per process) then persists it as if wall-clock. Currently never read, but misleading. **Fix:** use `time.time()`.
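
A short sketch of the difference. `make_cache_entry` and every field other than `tested_at` are hypothetical; the point is only that `time.time()` yields seconds since the Unix epoch, which stays meaningful after a restart, whereas an event loop's `time()` is a monotonic clock with an arbitrary origin:

```python
import time
from typing import Any, Dict


def make_cache_entry(model: str, supports_thinking: bool) -> Dict[str, Any]:
    """Build a persistable cache entry stamped with wall-clock time."""
    return {
        "model": model,
        "supports_thinking": supports_thinking,
        # Epoch seconds: comparable across processes and restarts.
        "tested_at": time.time(),
    }
```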

## 8. 408 classified as non-retryable
`src/core/llm/rate_limit_handler.py:25-37` — `is_retryable_http_status` treats 408 (Request Timeout) as non-retryable; it's conventionally transient. **Fix:** add 408 to the retryable set.
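
A sketch of the classifier with 408 included. The function name matches the one referenced above, but the rest of the set is an assumption about what the handler already treats as transient, not a copy of `rate_limit_handler.py`:

```python
# Assumed set of transient statuses; only the addition of 408 is the
# change proposed here.
RETRYABLE_HTTP_STATUSES = frozenset({408, 429, 500, 502, 503, 504})


def is_retryable_http_status(status: int) -> bool:
    """Return True for statuses that are conventionally worth retrying."""
    return status in RETRYABLE_HTTP_STATUSES
```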

---
_Found during the June 2026 repo audit. Severity: low (each). Confidence: certain except #2 (likely)._

