Skip to content

fix(cache): enforce fail-closed Redis backend for distributed mode#199

Merged
Rahul Dass (rahuldass19) merged 6 commits into
mainfrom
fix/issue-189-redis-failclosed
Jun 6, 2026
Merged

fix(cache): enforce fail-closed Redis backend for distributed mode#199
Rahul Dass (rahuldass19) merged 6 commits into
mainfrom
fix/issue-189-redis-failclosed

Conversation

@rahuldass19

@rahuldass19 Rahul Dass (rahuldass19) commented Jun 1, 2026

Copy link
Copy Markdown
Member

Problem

RedisCache silently fell back to a node-local VerificationCache when Redis
was unavailable. In a multi-node deployment this means:

  • Node A: Redis down → writes to its own in-memory cache
  • Node B: Redis down → reads from its own empty in-memory cache
  • Same query, same tenant, two different outcomes — silently

This breaks distributed consistency at the enforcement boundary. Rate limits,
replay protection, and policy gates built on top of the shared cache can be
bypassed simply by routing requests to a cold or restarting node. Infrastructure
failure was silently converted into a security-mode downgrade with no operator
signal.

Closes #189.


What Changed

CacheBackendMode enum (new)

Three explicit operating modes — no hidden behavior:

Mode Redis unavailable Use when
STRICT_DISTRIBUTED Raises CacheBackendUnavailableError Rate limits, replay protection, policy gates
EXPLICIT_DEGRADED Falls back to node-local cache with audit marker + security event Operator explicitly accepts consistency trade-off
LOCAL_ONLY Never touches Redis Single-process, tests, cached_verify()

CacheBackendUnavailableError (new)

Explicit, informative exception raised in STRICT_DISTRIBUTED mode. Message
includes guidance toward EXPLICIT_DEGRADED or LOCAL_ONLY so callers know
their options.

RedisCache (modified)

  • Accepts mode: CacheBackendMode = STRICT_DISTRIBUTED
  • Removed unconditional _fallback_cache = VerificationCache() on Redis unavailable
  • STRICT_DISTRIBUTED + no Redis at startup → raises immediately at construction
  • STRICT_DISTRIBUTED + Redis error at runtime (get/set/invalidate/clear) → raises CacheBackendUnavailableError
  • EXPLICIT_DEGRADED fallback activation emits a structured JSON security event to qwed.cache.security logger
  • Every result served from the fallback path carries _degraded_mode=True and _cache_backend="local_degraded" — cross-node divergence is explicit, not silent
  • stats() now exposes mode and degraded fields

get_cache() (modified)

  • Accepts mode parameter, defaults to STRICT_DISTRIBUTED
  • use_redis=False or LOCAL_ONLY → returns plain VerificationCache

cached_verify() (modified)

Explicitly uses LOCAL_ONLY. Its existing behaviour was already effectively
local-only via the silent fallback — this makes the intent clear and documented.
No breaking change for existing callers.

redis_config.py (modified)

Added reset_redis_state() — resets module-level Redis singleton between tests
so Redis availability can be simulated independently per test case.


Tests

tests/test_pr189_redis_failclosed.py — 25 new tests:

  • TestStrictDistributedStartupDown — raises on startup, error message guides caller, no fallback created
  • TestStrictDistributedRuntimeError — raises on get/set/invalidate/clear, never activates fallback
  • TestExplicitDegradedStartupDown — activates fallback, emits valid JSON security event, tags results
  • TestExplicitDegradedRuntimeLoss — runtime Redis loss activates fallback lazily, emits security event, documents cross-node isolation
  • TestLocalOnlyMode — never calls get_redis_client, consistent within process
  • TestGetCacheContract — default mode is STRICT_DISTRIBUTED, each mode returns correct type
  • TestCachedVerifyBackwardCompat — never touches Redis, caches correctly, no _degraded_mode tag

Full regression: 1198 passed, 82 skipped — zero regressions.


Backward Compatibility

Caller Before After
VerificationCache directly In-memory only Unchanged
cached_verify() Silent local fallback Explicit LOCAL_ONLY — same behavior
RedisCache() (no mode arg) Silent fallback on Redis down Raises CacheBackendUnavailableErrorbreaking for distributed callers who relied on silent fallback
get_cache(use_redis=True) Silent fallback Raises — same as above

Callers that previously relied on silent fallback must now either handle
CacheBackendUnavailableError explicitly or opt into EXPLICIT_DEGRADED.
This is intentional — the old behavior was the bug.

Summary by CodeRabbit

  • Bug Fixes

    • Remote cache outages now surface explicit errors in strict mode; degraded-mode returns are clearly marked when a local fallback is used.
  • New Features

    • Per-tenant in-memory verification cache with TTL+LRU, deep-copy isolation, hit/miss stats, length/containment checks.
    • Structured security events emitted on degraded activation and recovery.
    • Verify helper now defaults to local-only operation.
  • Refactor

    • Cache backend now operates in strict, explicit-degraded, or local-only modes.
  • Tests / Chores

    • Expanded test coverage for all modes and a test helper to reset cache/redis state.

@coderabbitai

coderabbitai Bot commented Jun 1, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Refactors cache backend into an explicit mode-driven system, adds CacheBackendMode and CacheBackendUnavailableError, implements a TTL+LRU VerificationCache with introspection, rewrites RedisCache to enforce STRICT_DISTRIBUTED or EXPLICIT_DEGRADED semantics, adds reset_redis_state, and includes comprehensive tests for startup/runtime failures and recovery.

Changes

Cache backend mode enforcement

Layer / File(s) Summary
Backend failure semantics contract
src/qwed_new/core/cache.py
Introduces CacheBackendMode enum (STRICT_DISTRIBUTED, EXPLICIT_DEGRADED, LOCAL_ONLY) and CacheBackendUnavailableError exception, defining when Redis unavailability must raise versus when degraded fallback may activate.
VerificationCache implementation (TTL + LRU)
src/qwed_new/core/cache.py
Thread-safe in-memory TTL+LRU cache keyed by normalized DSL (+ optional vars JSON), deep-copy isolation for stored results, capacity eviction, and targeted invalidation.
VerificationCache stats and dunders
src/qwed_new/core/cache.py
Adds stats property, __len__, and __contains__ (membership uses get() so TTL expiry is respected).
RedisCache mode-driven enforcement
src/qwed_new/core/cache.py
Rewrites RedisCache to accept mode, reject LOCAL_ONLY at construction, centralize runtime Redis error handling, enforce STRICT_DISTRIBUTED fail-closed semantics, implement EXPLICIT_DEGRADED lazy local fallback with _degraded_mode and _cache_backend markers and structured security events, add retry cooldown, and make clear/stats mode-aware. Updates get_cache() and cached_verify() wiring.
Redis test-state reset utility
src/qwed_new/core/redis_config.py
Adds reset_redis_state() to clear module-level Redis client and availability flag for test isolation.
STRICT_DISTRIBUTED tests (startup & runtime)
tests/test_pr189_redis_failclosed.py
Adds tests asserting startup Redis unavailability and runtime Redis operation failures raise CacheBackendUnavailableError and never instantiate a fallback VerificationCache.
EXPLICIT_DEGRADED tests (startup, runtime, recovery)
tests/test_pr189_redis_failclosed.py
Adds tests validating fallback activation when configured, structured security event emission on activation/recovery, degraded markers on results, cross-node divergence semantics, and recovery behavior clearing fallback and emitting BACKEND_RECOVERED.
LOCAL_ONLY, factory, and cached_verify tests
tests/test_pr189_redis_failclosed.py
Validates LOCAL_ONLY never calls Redis, get_cache(use_redis=False) returns plain VerificationCache, cached_verify() stays local-only without degraded markers, and RedisCache(mode=LOCAL_ONLY) raises ValueError before any Redis connection attempts.
VerificationCache correctness, isolation, and stats tests
tests/test_pr189_redis_failclosed.py
Tests __contains__/__len__ with TTL expiry, tenant isolation in local-only mode, deep-copy boundaries to prevent mutation leakage, and stats() edge cases reporting degraded vs strict differences.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant Redis
  participant VerificationCache
  participant SecurityLogger

  Client->>Redis: get/set/invalidate/scan
  Redis-->>Client: success (distributed path)
  Redis--x Client: runtime failure
  Client->>VerificationCache: fallback read/write (EXPLICIT_DEGRADED)
  VerificationCache-->>Client: return cached result with _degraded_mode/_cache_backend
  Client->>SecurityLogger: emit BACKEND_DEGRADED / BACKEND_RECOVERED events
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I tunneled through keys and TTLs tight,

Made Redis stern and logs polite,
No silent fallbacks in the night,
Modes declared and markers bright,
Now nodes agree — the cache is right!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.68% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The PR description is comprehensive and well-structured, addressing problem statement, changes made, tests, and backward compatibility considerations. However, it does not explicitly follow the QWED Enforcement Checklist template provided in the repository. Consider restructuring the description to explicitly include the QWED Enforcement Checklist sections (no fallback execution, no raw eval/exec, verification enforcement, fail-closed paths, and QWED_RULES.md compliance notes) to align with repository standards.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: enforcing fail-closed Redis backend behavior for distributed mode, which is the core objective of the PR.
Linked Issues check ✅ Passed The PR comprehensively addresses all coding requirements from Issue #189: removes silent fallback, introduces explicit CacheBackendMode enum with three modes, implements fail-closed behavior for STRICT_DISTRIBUTED, adds CacheBackendUnavailableError, emits structured security events for degraded mode, tags degraded results, and includes extensive test coverage for startup/runtime failures and multi-node scenarios.
Out of Scope Changes check ✅ Passed All changes are directly scoped to addressing Issue #189: CacheBackendMode, CacheBackendUnavailableError, RedisCache modifications, get_cache updates, cached_verify clarification, and redis_config helper function. No unrelated refactoring or feature additions detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/issue-189-redis-failclosed

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread tests/test_pr189_redis_failclosed.py Fixed
Comment thread tests/test_pr189_redis_failclosed.py Fixed
Comment thread tests/test_pr189_redis_failclosed.py Fixed
Comment thread tests/test_pr189_redis_failclosed.py Fixed
Comment thread tests/test_pr189_redis_failclosed.py Fixed
@codspeed-hq

codspeed-hq Bot commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Merging this PR will not alter performance

✅ 20 untouched benchmarks


Comparing fix/issue-189-redis-failclosed (2fd6709) with main (2ec2acb)

Open in CodSpeed

Comment thread tests/test_pr189_redis_failclosed.py Fixed
Comment thread tests/test_pr189_redis_failclosed.py Fixed
Comment thread tests/test_pr189_redis_failclosed.py Fixed
Comment thread tests/test_pr189_redis_failclosed.py Fixed
Comment thread tests/test_pr189_redis_failclosed.py Fixed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ac6ba661a3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

Comment thread src/qwed_new/core/cache.py
@sentry

sentry Bot commented Jun 1, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 86.30705% with 33 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/qwed_new/core/cache.py 86.52% 31 Missing ⚠️
src/qwed_new/core/redis_config.py 81.81% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
tests/test_pr189_redis_failclosed.py (2)

31-42: ⚡ Quick win

Remove the dead _make_redis_cache helper.

This helper is never called (each test class defines its own _make_* helper), and the first with block is a no-op: it patches RedisCache.__init__.__globals__ (a read-only function attribute) and then immediately passes. Leaving it in is confusing and would error if ever invoked. Recommend deleting it.

♻️ Proposed removal
-def _make_redis_cache(mode: CacheBackendMode, client=None):
-    """
-    Build a RedisCache with a fully mocked Redis client.
-    Patches get_redis_client() so no real Redis is needed.
-    """
-    with patch("qwed_new.core.redis_config.get_redis_client", return_value=client), \
-         patch("qwed_new.core.cache.RedisCache.__init__.__globals__", {}):
-        pass
-
-    # Patch at the point of import inside RedisCache.__init__
-    with patch("qwed_new.core.redis_config.get_redis_client", return_value=client):
-        return RedisCache(mode=mode)
-
-
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_pr189_redis_failclosed.py` around lines 31 - 42, Remove the unused
helper _make_redis_cache entirely: delete the function and its two with blocks
(including the noop patch of RedisCache.__init__.__globals__ and the second
patch that returns RedisCache(mode=mode)); any tests that need a mocked Redis
client should define their own _make_* helpers as already done, so ensure no
references to _make_redis_cache, RedisCache.__init__.__globals__,
get_redis_client patching, or the return RedisCache(mode=mode) remain in the
file.

14-14: 💤 Low value

Drop the unused PropertyMock import.

CodeQL flags PropertyMock as unused; only MagicMock and patch are referenced.

♻️ Proposed cleanup
-from unittest.mock import MagicMock, patch, PropertyMock
+from unittest.mock import MagicMock, patch
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_pr189_redis_failclosed.py` at line 14, The import line includes an
unused symbol PropertyMock; remove PropertyMock from the import so only
MagicMock and patch are imported (i.e., update the statement that currently
imports MagicMock, patch, PropertyMock), then run lint/tests to confirm no
references remain to PropertyMock; references to MagicMock and patch should be
untouched.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/qwed_new/core/cache.py`:
- Around line 228-233: The __contains__ method currently checks presence in
self._cache without acquiring _lock or validating TTL, so it can return True for
expired entries; update __contains__(dsl_code) to acquire self._lock, compute
key via _generate_key(dsl_code), look up the entry and verify it is not expired
using the same TTL validation logic as get() (and remove/return False if
expired) to mirror get() behavior; also make __len__ thread-safe by acquiring
self._lock and counting only non-expired entries (or returning the length of
self._cache after pruning expired entries) so both methods are consistent and
safe with _lock, _cache, _generate_key, and get().
- Line 450: The clear method currently has a default None but uses an implicit
Optional in its annotation; update the signature of clear to explicitly annotate
the parameter as Optional[str] (e.g., def clear(self, pattern: Optional[str]) ->
int: or def clear(self, pattern: Optional[str] = None) -> int:), add the
necessary import from typing (Optional) if not already present, and ensure any
references inside the clear method that assume a str handle the None case
safely.
- Line 574: The parameter type for verify_fn in the function signature uses
lowercase callable and an implicit Optional; update the annotation to use
typing.Callable and typing.Optional explicitly (e.g., Optional[Callable[...,
Any]]), and add the corresponding imports (Callable, Optional, Any) to the
module imports so the signature for verify_fn is properly typed; locate the
verify_fn parameter in src/qwed_new/core/cache.py and adjust its annotation and
imports accordingly.

---

Nitpick comments:
In `@tests/test_pr189_redis_failclosed.py`:
- Around line 31-42: Remove the unused helper _make_redis_cache entirely: delete
the function and its two with blocks (including the noop patch of
RedisCache.__init__.__globals__ and the second patch that returns
RedisCache(mode=mode)); any tests that need a mocked Redis client should define
their own _make_* helpers as already done, so ensure no references to
_make_redis_cache, RedisCache.__init__.__globals__, get_redis_client patching,
or the return RedisCache(mode=mode) remain in the file.
- Line 14: The import line includes an unused symbol PropertyMock; remove
PropertyMock from the import so only MagicMock and patch are imported (i.e.,
update the statement that currently imports MagicMock, patch, PropertyMock),
then run lint/tests to confirm no references remain to PropertyMock; references
to MagicMock and patch should be untouched.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a675bc6c-baa2-494b-9a22-10604acfd4a8

📥 Commits

Reviewing files that changed from the base of the PR and between 2ec2acb and ac6ba66.

📒 Files selected for processing (3)
  • src/qwed_new/core/cache.py
  • src/qwed_new/core/redis_config.py
  • tests/test_pr189_redis_failclosed.py

Comment thread src/qwed_new/core/cache.py Outdated
Comment thread src/qwed_new/core/cache.py Outdated
Comment thread src/qwed_new/core/cache.py Outdated
@greptile-apps

greptile-apps Bot commented Jun 1, 2026

Copy link
Copy Markdown

Greptile Summary

This PR replaces the silent Redis-to-in-memory fallback in RedisCache with three explicit operating modes (STRICT_DISTRIBUTED, EXPLICIT_DEGRADED, LOCAL_ONLY), fixing a distributed-consistency vulnerability where infrastructure failure was silently converted into a per-node cache divergence. The default mode is STRICT_DISTRIBUTED, which raises CacheBackendUnavailableError rather than degrading without notice.

  • RedisCache and get_cache() now accept a mode parameter; EXPLICIT_DEGRADED activates a node-local fallback only after emitting a structured security event, and every degraded result is tagged _degraded_mode=True so callers can detect the downgrade explicitly.
  • _generate_key switches from plain-string concatenation to a JSON-envelope hash, fixing a real collision bug, but this change alters every existing Redis key hash causing a cold-cache window on deploy.
  • redis_config.py gains invalidate_redis_client() and reset_redis_state() to support retry and test isolation; the permanent _redis_available = False sentinel is removed so connections can be re-attempted after a failure.

Confidence Score: 5/5

Safe to merge after acknowledging the cold-cache window that follows deployment due to the key-format change.

The new findings are observability and deployment concerns rather than correctness bugs: stats() reports backend_error=True during the post-failure cooldown window even after Redis has recovered, and the corrected key-generation format invalidates all existing Redis entries on deploy. Neither breaks the fail-closed guarantee that is the core purpose of this PR.

src/qwed_new/core/cache.py — specifically the stats() error path and the deployment impact of the _generate_key format change.

Important Files Changed

Filename Overview
src/qwed_new/core/cache.py Core change: introduces CacheBackendMode enum, CacheBackendUnavailableError, and rewrites RedisCache and get_cache() to be fail-closed by default; fixes hash-collision in _generate_key but silently invalidates all existing Redis entries; stats() can report backend_error=True during cooldown after Redis has already recovered
src/qwed_new/core/redis_config.py Adds invalidate_redis_client() and reset_redis_state(); removes the permanent _redis_available=False sentinel so the client can reconnect after a failure; changes are minimal and well-scoped
tests/test_pr189_redis_failclosed.py 680-line test suite covering all three modes, startup/runtime failure paths, recovery, stats edge cases, tenant isolation, and deep-copy guarantees; setup_method omits _constructing_events.clear() which could leave stale state if a future test triggers a concurrent-construction path

Reviews (26): Last reviewed commit: "fix(cache): exclude demo block from cove..." | Re-trigger Greptile

Comment thread src/qwed_new/core/cache.py Outdated
Comment thread src/qwed_new/core/cache.py
@rahuldass19 Rahul Dass (rahuldass19) force-pushed the fix/issue-189-redis-failclosed branch from ac6ba66 to e43a351 Compare June 1, 2026 10:34
Comment thread tests/test_pr189_redis_failclosed.py Dismissed
Comment thread tests/test_pr189_redis_failclosed.py Dismissed
Comment thread src/qwed_new/core/cache.py Outdated
Comment thread src/qwed_new/core/cache.py
Comment thread src/qwed_new/core/cache.py Outdated
Comment thread src/qwed_new/core/cache.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/qwed_new/core/cache.py`:
- Around line 545-556: The exception handler in the stats path currently sets
"degraded" using the backend mode (self.mode !=
CacheBackendMode.STRICT_DISTRIBUTED) which can be misleading; change it to
reflect whether the fallback cache is actually being used (e.g., check
self._fallback_cache or an existing runtime flag like self._using_fallback) so
"degraded" is True only when a fallback cache exists/is active and False
otherwise, updating the dict construction in the except block of the stats
method (or the surrounding Cache class) to use that runtime check instead of the
mode comparison.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 438e1ad1-1d0b-4028-9b8c-7f5faefb703e

📥 Commits

Reviewing files that changed from the base of the PR and between ac6ba66 and e43a351.

📒 Files selected for processing (3)
  • src/qwed_new/core/cache.py
  • src/qwed_new/core/redis_config.py
  • tests/test_pr189_redis_failclosed.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/qwed_new/core/redis_config.py

Comment thread src/qwed_new/core/cache.py
@rahuldass19 Rahul Dass (rahuldass19) force-pushed the fix/issue-189-redis-failclosed branch from e43a351 to ab67b74 Compare June 1, 2026 21:08
Comment thread src/qwed_new/core/cache.py Fixed
Comment thread src/qwed_new/core/cache.py Fixed
Comment thread src/qwed_new/core/cache.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/qwed_new/core/cache.py`:
- Around line 406-430: The get() (and likewise set()) path can stay degraded
forever because it checks self._client without attempting to reacquire it; call
self._try_get_client() at the start of Cache.get and Cache.set (before checking
self._client) so the client is reinitialized when Redis returns, then proceed
with the existing logic (keep existing calls to _recover_from_fallback() and
_handle_runtime_redis_error()). Ensure this change avoids altering fallback
semantics (still use self._fallback_cache when _client is None) and preserves
invalidate()/clear() behavior.
- Around line 154-169: The cached verification results are stored and returned
by reference in VerificationCache (methods set() and get()), allowing caller
mutations (e.g., cached_verify() tagging or nested model edits) to poison future
cache entries; fix by performing a deep copy of the result object at the cache
boundary: when storing in VerificationCache.set() deep-copy the value before
placing it into the cache entry, and when returning from VerificationCache.get()
return a deep copy of entry.result (use copy.deepcopy) so internal stored
objects are never mutated by callers; ensure both places (existing get() block
and the corresponding set() code path) are updated.
- Around line 591-599: get_cache currently reuses a single module-global
VerificationCache instance for LOCAL_ONLY or when use_redis=False which allows
cross-tenant leakage because VerificationCache._generate_key does not include
tenant context; fix by scoping local caches per tenant (e.g., maintain a mapping
of tenant_id -> VerificationCache and return/instantiate a tenant-specific
instance when mode==CacheBackendMode.LOCAL_ONLY or use_redis is False) or
alternatively ensure VerificationCache._generate_key incorporates tenant_id into
generated keys; update get_cache and any module globals (_verification_cache) to
use tenant-specific storage and mirror the same change for the other affected
branch (lines ~620-623) so local-only paths never share cache across tenant_ids.
- Around line 322-327: The _recover_from_fallback helper currently emits
BACKEND_RECOVERED always; change it so the event is only emitted when a real
state transition occurs by checking _fallback_cache under _fallback_lock and
emitting only if it was non-None. In practice, inside _recover_from_fallback
(and using self._fallback_lock) verify if self._fallback_cache is not None, set
it to None, and then call self._emit_security_event("BACKEND_RECOVERED",
MSG_BACKEND_RECOVERED); do not call _emit_security_event when there was no
fallback active. Ensure you reference the existing symbols
_recover_from_fallback, _fallback_cache, _fallback_lock, _emit_security_event,
and MSG_BACKEND_RECOVERED.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 803a0b15-15be-420d-8901-f8854615f6e1

📥 Commits

Reviewing files that changed from the base of the PR and between e43a351 and ab67b74.

📒 Files selected for processing (3)
  • src/qwed_new/core/cache.py
  • src/qwed_new/core/redis_config.py
  • tests/test_pr189_redis_failclosed.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/qwed_new/core/redis_config.py
  • tests/test_pr189_redis_failclosed.py

Comment thread src/qwed_new/core/cache.py Outdated
Comment thread src/qwed_new/core/cache.py
Comment thread src/qwed_new/core/cache.py Outdated
Comment thread src/qwed_new/core/cache.py Outdated
@rahuldass19 Rahul Dass (rahuldass19) force-pushed the fix/issue-189-redis-failclosed branch from ab67b74 to a0a64fd Compare June 2, 2026 08:54
Comment thread tests/test_pr189_redis_failclosed.py Fixed
Comment thread tests/test_pr189_redis_failclosed.py Fixed
Comment thread tests/test_pr189_redis_failclosed.py Fixed
Comment thread tests/test_pr189_redis_failclosed.py Fixed
Comment thread src/qwed_new/core/cache.py Outdated
Comment thread src/qwed_new/core/cache.py
Comment thread src/qwed_new/core/cache.py Outdated
Comment thread src/qwed_new/core/cache.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/qwed_new/core/cache.py`:
- Around line 168-170: Replace the JSON round-trip deep-copy with copy.deepcopy
to avoid coercing non-JSON types: import copy in src/qwed_new/core/cache.py and
change the return statements that currently do
json.loads(json.dumps(entry.result)) (the ones around the uses of entry.result
at the shown spots) to return copy.deepcopy(entry.result); make the same
replacement for the similar code block around lines 192-194 so cached values
preserve types like Decimal, sympy objects, and tuples instead of being
converted by JSON.
- Around line 413-427: The Redis I/O try block currently wraps JSON
(de)serialization causing payload errors to be treated as backend outages;
narrow the exception scope so only the client.get()/client.set() call invokes
_handle_runtime_redis_error("get"/"set", exc) and let json.loads/json.dumps
raise normally (or handle them separately). Concretely, call client.get(key)
inside a small try that catches Redis/runtime errors and calls
_handle_runtime_redis_error, then after recovering from fallback run
json.loads(cached) outside that try (and similarly for the set path with
json.dumps), keeping calls to _recover_from_fallback(), _hits/_misses updates,
and return logic unchanged.
- Around line 629-642: get_cache() currently uses a single mutable global
_redis_cache that can be swapped by concurrent callers and return a RedisCache
for the wrong tenant/mode; replace this last-writer-wins singleton with a keyed
cache map (e.g. _redis_caches keyed by (tenant_id, mode)) and guard access with
a module-level lock to make lookup/creation atomic. Specifically, keep the
existing _verification_caches logic for LOCAL_ONLY, but for the Redis path
change uses of _redis_cache to look up in _redis_caches[(tenant_id, mode)] and,
under the lock, create and store a new RedisCache(tenant_id=tenant_id,
mode=mode) only if missing; ensure callers return the cached instance without
releasing the lock early so no interleaving can swap namespaces for the returned
RedisCache.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 26aacf39-807e-4d2a-8980-e5e22374bb79

📥 Commits

Reviewing files that changed from the base of the PR and between ab67b74 and a0a64fd.

📒 Files selected for processing (3)
  • src/qwed_new/core/cache.py
  • src/qwed_new/core/redis_config.py
  • tests/test_pr189_redis_failclosed.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/qwed_new/core/redis_config.py

Comment thread src/qwed_new/core/cache.py Outdated
Comment thread src/qwed_new/core/cache.py Outdated
Comment thread src/qwed_new/core/cache.py Outdated
@rahuldass19 Rahul Dass (rahuldass19) force-pushed the fix/issue-189-redis-failclosed branch from a0a64fd to 2a21c62 Compare June 4, 2026 09:07
Comment thread src/qwed_new/core/cache.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/qwed_new/core/cache.py`:
- Around line 130-135: The cache key construction is ambiguous because the code
concatenates json.dumps(variables) directly onto the normalized DSL (the local
variable normalized), causing collisions; update the key builder used by
VerificationCache/RedisCache to produce an unambiguous encoding instead of
simple concatenation—e.g., build a single structured string (or JSON object)
that explicitly labels fields (like {"dsl": normalized, "variables":
<json-dumped vars>}) or insert a clear delimiter/prefix-length before the
variables so boundaries are preserved; ensure you change the code that currently
does json.dumps(variables, sort_keys=True) and string-append to normalized to
use this structured encoding everywhere the key is computed.
- Around line 183-191: When inserting into the cache, avoid evicting an
unrelated LRU entry if the key already exists: change the eviction logic inside
the with self._lock block to only pop oldest items when len(self._cache) >=
self.max_size AND the key is not already in self._cache (i.e., only evict for a
new key that would push size over max_size). Update the block around
self._cache, self.max_size, and the CacheEntry insertion so existing keys are
updated in-place without causing an unnecessary popitem(last=False).
- Around line 375-398: The runtime Redis error handler
(_handle_runtime_redis_error) currently leaves self._client intact so callers
(get/set/invalidate/clear) keep using the broken client; mark the client dead by
setting self._client = None inside _handle_runtime_redis_error (while holding
_fallback_lock) when handling EXPLICIT_DEGRADED failures so the existing
client-retry/cooldown path is triggered; ensure the get/set/invalidate/clear
code paths that check for client is None and the STRICT_DISTRIBUTED behavior
remain unchanged (i.e., still fail-closed for STRICT_DISTRIBUTED) so a fresh
client will be created or the fallback used instead of repeatedly hitting a
broken client.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 76b111fb-660c-44d6-b572-94c2a2267cfb

📥 Commits

Reviewing files that changed from the base of the PR and between a0a64fd and 2a21c62.

📒 Files selected for processing (3)
  • src/qwed_new/core/cache.py
  • src/qwed_new/core/redis_config.py
  • tests/test_pr189_redis_failclosed.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/qwed_new/core/redis_config.py

Comment thread src/qwed_new/core/cache.py Outdated
Comment thread src/qwed_new/core/cache.py
Comment thread src/qwed_new/core/cache.py
@rahuldass19 Rahul Dass (rahuldass19) force-pushed the fix/issue-189-redis-failclosed branch from 2a21c62 to 8c5fa59 Compare June 4, 2026 10:25
@rahuldass19 Rahul Dass (rahuldass19) force-pushed the fix/issue-189-redis-failclosed branch from df1adf0 to 287593f Compare June 4, 2026 22:10
Comment thread src/qwed_new/core/cache.py Outdated
@rahuldass19 Rahul Dass (rahuldass19) force-pushed the fix/issue-189-redis-failclosed branch 2 times, most recently from dc252b8 to 7963b3f Compare June 5, 2026 02:42
Comment thread src/qwed_new/core/cache.py
Comment thread src/qwed_new/core/cache.py
Comment thread src/qwed_new/core/redis_config.py
@rahuldass19 Rahul Dass (rahuldass19) force-pushed the fix/issue-189-redis-failclosed branch from 7963b3f to 2a3934e Compare June 5, 2026 09:43
Comment thread src/qwed_new/core/cache.py Outdated
Comment thread src/qwed_new/core/cache.py Outdated
@rahuldass19 Rahul Dass (rahuldass19) force-pushed the fix/issue-189-redis-failclosed branch from 2a3934e to 25ddce4 Compare June 5, 2026 10:03
Comment thread src/qwed_new/core/cache.py Outdated
Comment thread src/qwed_new/core/cache.py Outdated
…loses #189)

Issue #189: RedisCache silently fell back to node-local VerificationCache
when Redis was unavailable, breaking distributed consistency and enabling
policy/rate-limit bypass by routing requests to cold nodes.

Root cause: RedisCache.__init__ unconditionally created a VerificationCache
fallback when _client is None, and get/set/invalidate/clear silently routed
to it. Infrastructure failure became an implicit security-mode downgrade.

Fix: Introduce explicit CacheBackendMode enum with three modes --

  STRICT_DISTRIBUTED (new default for RedisCache and get_cache):
    Redis unavailable at startup -> raises CacheBackendUnavailableError.
    Redis error at runtime (get/set/invalidate/clear) -> raises
    CacheBackendUnavailableError. No silent node-local fallback, ever.
    Use this for rate limits, replay protection, policy gates.

  EXPLICIT_DEGRADED (opt-in):
    Redis unavailable -> activates VerificationCache fallback, but only
    after emitting a structured JSON security event to qwed.cache.security
    logger. Every result from the fallback carries _degraded_mode=True and
    _cache_backend=local_degraded so callers and auditors can detect the
    downgrade. Cross-node divergence becomes explicit, not silent.

  LOCAL_ONLY:
    Never touches Redis. Pure in-memory VerificationCache. Used by
    cached_verify() (backward compatible -- its existing behavior was
    already effectively local-only via silent fallback).

Additional changes:
  - _emit_security_event(): structured JSON log (event, mode, tenant_id,
    timestamp) emitted on BACKEND_UNAVAILABLE_AT_STARTUP and
    BACKEND_UNAVAILABLE_AT_RUNTIME
  - stats() now exposes mode and degraded fields for observability
  - redis_config.py: add reset_redis_state() for test isolation
  - VerificationCache: zero changes (backward compatible)
  - cached_verify(): explicitly LOCAL_ONLY with doc comment explaining why

Tests: 25 new tests in tests/test_pr189_redis_failclosed.py covering
startup-down, runtime-loss, security event emission, JSON event validity,
_degraded_mode tagging, cross-node isolation documentation, get_cache()
default mode contract, and cached_verify() backward compatibility.

Full regression: 1198 passed, 82 skipped -- no regressions.
@rahuldass19 Rahul Dass (rahuldass19) force-pushed the fix/issue-189-redis-failclosed branch from 25ddce4 to b1da90e Compare June 5, 2026 10:17
Comment thread src/qwed_new/core/cache.py Outdated
Comment thread src/qwed_new/core/cache.py
Comment thread src/qwed_new/core/cache.py Outdated
Comment thread src/qwed_new/core/cache.py Outdated
Comment thread src/qwed_new/core/cache.py Outdated
Comment thread src/qwed_new/core/cache.py Outdated
Comment thread src/qwed_new/core/cache.py Outdated
Comment thread src/qwed_new/core/cache.py Outdated
Comment thread src/qwed_new/core/cache.py
Comment thread src/qwed_new/core/cache.py
@rahuldass19 Rahul Dass (rahuldass19) force-pushed the fix/issue-189-redis-failclosed branch from c8a2a4b to b1da90e Compare June 5, 2026 20:32
Comment thread src/qwed_new/core/cache.py
@rahuldass19 Rahul Dass (rahuldass19) force-pushed the fix/issue-189-redis-failclosed branch from 50c163b to b1da90e Compare June 5, 2026 22:55
@sonarqubecloud

sonarqubecloud Bot commented Jun 5, 2026

Copy link
Copy Markdown

@rahuldass19 Rahul Dass (rahuldass19) merged commit 06801f6 into main Jun 6, 2026
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants