Skip to content

fix(guardrails): redact union of overlapping spans, never leak sensitive bookends#780

Open
mayankbharati-ops wants to merge 3 commits intoTracer-Cloud:mainfrom
mayankbharati-ops:fix/guardrail-overlapping-redaction-union
Open

fix(guardrails): redact union of overlapping spans, never leak sensitive bookends#780
mayankbharati-ops wants to merge 3 commits intoTracer-Cloud:mainfrom
mayankbharati-ops:fix/guardrail-overlapping-redaction-union

Conversation

@mayankbharati-ops
Copy link
Copy Markdown
Contributor

@mayankbharati-ops mayankbharati-ops commented Apr 23, 2026

Summary

GuardrailEngine.apply walked sorted redact matches right-to-left with a single seen_end cursor and skipped any match whose end exceeded that cursor. When a wider match overlapped — or fully contained — an already-redacted narrower match, the wider match was silently dropped and its prefix / suffix survived in the output.

Concretely, with two rules (wide: super_secret_token_value, inner: secret_token):

Output
Before (main) \"data super_[REDACTED:short]_value end\"
After (this PR) \"data [REDACTED:long] end\"

The super_ and _value fragments of the wider keyword — arguably the sensitive bookends the wider rule exists to cover — leaked into the output. PR #520's earlier fix handled only the same-start subcase (via the secondary sort key); matches with different start offsets that fully contained a narrower match still slipped through.

This also affects the shipped starter rules. With _STARTER_CONFIG in app/guardrails/cli.py, text like config: api_key=AKIAIOSFODNN7EXAMPLE triggers both aws_access_key ([16:36]) and generic_api_token ([8:36]). Before this PR the prefix api_key= leaked; after, the whole span is redacted. Same for aws_secret_access_key=AKIA...xxxxx... where the label and the 40-char tail both leaked pre-fix.

Fix

Replace the single-cursor walk with a proper interval merge:

  1. Sort redact matches by (start ASC, -end) so same-start ties put the widest match first — keeps the "longest-keyword-wins" behavior PR fix: overlapping keyword redaction and hardcoded investigation loop limit #520 restored.
  2. Sweep left-to-right, merging any match whose start falls before the current interval's end. The representative rule name for the merged interval is the one with the largest original width.
  3. Apply replacements right-to-left over the merged intervals so string indices stay valid as each redaction resizes the output.

Adjacent matches (A.end == B.start) intentionally do not merge — they touch but do not overlap. Disjoint matches produce independent redactions with intervening text preserved verbatim. Match-level audit logging is unchanged — every ScanMatch still produces one audit entry, even when multiple matches collapse into a single output redaction, so reviewers tracing audit → output can account for every rule that fired.

E2E validation — full synthetic suites, real Anthropic LLM

HEALTHY_SHORT_CIRCUIT=true, mocked backends, real claude-sonnet-4-6 + claude-haiku-4-5 on the PR #780 branch.

Full EKS suite — 14 scenarios

Result Count Details
PASS 11 000-healthy, 001-010 (all non-alerting-state edge-case scenarios)
FAIL 3 011, 012, 013 — all state: alerting, LLM categorization drift on healthy-recovery cases (pre-existing; same failures appear on main / PR #777 verification)

Full RDS suite — 15 scenarios

Result Count Details
PASS 11 000-healthy, 001, 003-006, 008-011, 014
FAIL 4 002, 007, 012, 013 — all state: alerting, LLM keyword/categorization drift (pre-existing; identical failures to PR #777 verification run)

Confirmed every failure is in state: alerting (programmatic check on each failing alert.json). The guardrail engine is used for prompt redaction; the synthetic fixtures don't contain overlapping-rule secrets, so the failures above are in territory this PR does not touch.

Unit coverage — 46 tests in tests/test_guardrails/test_engine.py

Test area Count
Pre-existing suite 36
New regression tests 10
- test_contained_span_redacts_union_no_leak (the core regression)
- test_contained_span_uses_longest_rule_name
- test_partial_overlap_redacts_union
- test_disjoint_matches_stay_separate
- test_adjacent_matches_not_merged
- test_three_way_chain_of_overlaps_redacts_single_union
- test_contained_pattern_with_wider_keyword_preserves_wider_name
- test_real_world_api_key_and_aws_access_key_overlap (uses the exact _STARTER_CONFIG patterns)
- test_real_world_aws_secret_key_contains_aws_access_key (uses the exact _STARTER_CONFIG patterns)
- test_audit_logger_records_every_match_even_when_merged (audit-under-merge invariant)

Existing coverage preserved, including:

Verification

  • CI (Python 3.13): test (ubuntu-latest) PASS, typecheck PASS, quality PASS, CodeQL PASS, Analyze (python) PASS
  • Local pytest tests/ (Python 3.14, no coverage): 2777 pass, 1 skipped, 0 failures (was 2767 on main; +10 new)
  • ruff check / ruff format --check: clean on touched files
  • mypy app/guardrails/: clean on touched module
  • Full EKS + RDS synthetic suites run against real Anthropic LLM on this branch: no new failures introduced vs. main

Scope

One source file (app/guardrails/engine.py, ~40 lines), one test file. No public API changes. No behavior change for non-overlapping matches, audit-only rules, or blocked rules. Directly extends #520, whose same-start case is preserved verbatim.

Security note

The bug class here is sensitive-data disclosure: anywhere guardrail rules are applied to outputs shown to users, logged, or exported, two overlapping rules covering the same secret could each partially redact and leave fragments in the wild. The shipped _STARTER_CONFIG demonstrates this concretely — api_key=AKIA... and aws_secret_access_key=... values containing an access key both produced partial redactions on main. This PR closes the gap.

…ive bookends

``GuardrailEngine.apply`` walked the sorted redact matches right-to-left with a
single ``seen_end`` cursor and skipped any match whose end exceeded the cursor.
When a wider match overlapped or contained an already-redacted narrower match,
the wider match was silently dropped — leaving its prefix and suffix in the
output. Concretely, with rules matching ``super_secret_token_value`` (wide)
and ``secret_token`` (contained):

  input:  "data super_secret_token_value end"
  before: "data super_[REDACTED:short]_value end"   ← super_ and _value leak
  after:  "data [REDACTED:long] end"

The prior fix in Tracer-Cloud#520 only handled the same-start subcase by changing the
secondary sort key; matches that started at different offsets but fully
contained a narrower match still slipped through.

Replace the single-cursor walk with a proper merge:

1. Sort redact matches by ``(start ASC, -end)`` so ties at the same offset
   keep the longest-keyword-wins behavior Tracer-Cloud#520 restored.
2. Sweep left-to-right, merging any match whose ``start`` falls before the
   current interval's ``end``. The representative rule for the merged
   interval is whichever contributing match had the largest original width,
   so the output reflects the most-specific rule that matched anywhere in
   the merged span.
3. Apply replacements right-to-left over the merged intervals so string
   indices stay valid as each redaction resizes the output.

Adjacent matches (``A.end == B.start``) are intentionally not merged —
they don't overlap. Disjoint matches produce independent redactions
with intervening text preserved verbatim.

Tests added in ``tests/test_guardrails/test_engine.py``:

- contained-span union redaction (the regression from the bug report)
- longest-rule-name semantics on a contained span
- partial (non-containing) overlap produces one union redaction
- disjoint matches stay separate, middle text preserved
- adjacent matches stay separate (touching at one offset, not overlapping)
- three-way transitive overlap chain collapses to one span
- regex pattern + keyword on the same region: wider-wins applies across
  match kinds

Existing coverage preserved:
- ``test_overlapping_keyword_redaction_prefers_longest`` — same-start case
- ``test_overlapping_keyword_same_rule_redaction`` — same-rule overlap
- ``test_multiple_rules_on_same_span`` — audit + redact on same span

Verification:

- ``pytest``: 2774 pass, 1 skipped, 0 failures (was 2767 on main; +7 new)
- ``ruff check`` / ``ruff format --check``: clean
- ``mypy app/guardrails/``: clean on touched module
- Full ``tests/test_guardrails/test_engine.py`` suite: 43 pass, 0 fail
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 23, 2026

Greptile Summary

This PR fixes a span-leak bug in GuardrailEngine._redact where overlapping redaction rules (e.g. a wide keyword fully containing a narrower keyword) would silently drop the wider rule's match, leaving its prefix and suffix unredacted in the output. The fix replaces the old single-cursor right-to-left walk with a proper interval merge: sort → left-to-right sweep to merge overlapping spans (tracking the largest individual match width to select the representative rule name) → right-to-left application of replacements to preserve string indices.

Key changes:

  • apply delegates to a new _redact(text, matches) method, improving testability and separation of concerns.
  • The sort key changes from reverse=True on (start, end) to (start ASC, -end) — preserving the "widest-match-first at same start" behavior from PR fix: overlapping keyword redaction and hardcoded investigation loop limit #520 while enabling a correct forward sweep.
  • The seen_end cursor and skip logic are replaced with an O(n) interval merge that handles all three overlap classes: containment, partial overlap, and the transitive chain A∩B, B∩C, A⊥C.
  • 7 new regression tests cover all relevant sub-cases; the 36 existing guardrail tests are unaffected.

Confidence Score: 5/5

Safe to merge — the interval merge algorithm is correct, well-documented, and backed by 7 targeted regression tests covering all overlap sub-cases, with zero regressions in the existing suite.

The core fix (left-to-right interval merge replacing the buggy seen_end cursor) is algorithmically correct: overlapping spans are collapsed, the widest individual match wins the representative rule name, adjacent spans remain separate, and right-to-left application preserves string indices. All edge cases (containment, partial overlap, transitive chain, disjoint, adjacent, regex+keyword mix) have dedicated tests. Code follows project style conventions. The only finding is a non-blocking P2 style suggestion to use a NamedTuple instead of a raw 4-tuple for the merged interval list.

No files require special attention.

Important Files Changed

Filename Overview
app/guardrails/engine.py Extracts _redact into a dedicated method that performs correct interval merging; the old single-cursor right-to-left walk is fully replaced. Algorithm is sound and well-documented.
tests/test_guardrails/test_engine.py Adds 7 targeted regression tests covering contained spans, partial overlap, disjoint, adjacent, three-way chain, and pattern-vs-keyword cases. All new tests are well-structured and use the existing _rule helper correctly.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["apply(text)"] --> B["scan(text) → ScanResult"]
    B --> C{any matches?}
    C -- No --> D[return text unchanged]
    C -- Yes --> E[audit each match]
    E --> F{blocked?}
    F -- Yes --> G[raise GuardrailBlockedError]
    F -- No --> H["_redact(text, matches)"]
    H --> I["Filter: action == REDACT\nSort by (start ASC, -end)"]
    I --> J["Left-to-right sweep\nmerge overlapping intervals\ntrack widest individual match\nfor rule-name selection"]
    J --> K["Right-to-left application\nof replacements over\nmerged intervals"]
    K --> L[return redacted text]
    style G fill:#f66,color:#fff
    style L fill:#6c6,color:#fff
Loading

Reviews (1): Last reviewed commit: "fix(guardrails): redact union of overlap..." | Re-trigger Greptile

Comment thread app/guardrails/engine.py
Comment on lines +155 to +166
merged: list[tuple[int, int, str, int]] = []
for match in redact_matches:
if match.end > seen_end:
continue
replacement = self._get_replacement(match.rule_name)
redacted = redacted[: match.start] + replacement + redacted[match.end :]
seen_end = match.start
width = match.end - match.start
if merged and match.start < merged[-1][1]:
prev_start, prev_end, prev_rule, prev_width = merged[-1]
new_end = max(prev_end, match.end)
if width > prev_width:
merged[-1] = (prev_start, new_end, match.rule_name, width)
else:
merged[-1] = (prev_start, new_end, prev_rule, prev_width)
else:
merged.append((match.start, match.end, match.rule_name, width))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Consider a NamedTuple for the merged interval for readability

The inner loop reads and writes the merged list by positional tuple index (merged[-1][1], merged[-1][3], etc.). The positions are correct, but the tuple layout (start, end, rule_name, width) is entirely implicit — a reader must mentally map each index to its meaning.

A lightweight NamedTuple would make accesses self-documenting with zero runtime overhead:

from typing import NamedTuple

class _MergedInterval(NamedTuple):
    start: int
    end: int
    rule_name: str
    width: int

Then the sweep body becomes:

merged: list[_MergedInterval] = []
for match in redact_matches:
    width = match.end - match.start
    if merged and match.start < merged[-1].end:
        prev = merged[-1]
        new_end = max(prev.end, match.end)
        if width > prev.width:
            merged[-1] = _MergedInterval(prev.start, new_end, match.rule_name, width)
        else:
            merged[-1] = _MergedInterval(prev.start, new_end, prev.rule_name, prev.width)
    else:
        merged.append(_MergedInterval(match.start, match.end, match.rule_name, width))

This is a non-blocking suggestion; the current code is functionally correct.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Extend the overlap regression tests with two cases grounded in the exact
patterns shipped in _STARTER_CONFIG (app/guardrails/cli.py), and with an
audit-behavior assertion that demonstrates the match-level audit record
remains complete even when the output-level redactions are merged:

- test_real_world_api_key_and_aws_access_key_overlap:
  config: api_key=AKIAIOSFODNN7EXAMPLE
  aws_access_key pattern matches the [16:36] span; generic_api_token
  pattern matches the [8:36] span (contains aws_access_key). Pre-fix
  output leaked "api_key=". Merged output: "config: [REDACTED:generic_api_token]".

- test_real_world_aws_secret_key_contains_aws_access_key:
  export aws_secret_access_key=AKIA<16>abcdefghijklmnopqrst
  aws_access_key matches [29:49]; aws_secret_key matches [7:69] and
  contains it. Pre-fix output leaked the label and the 40-char tail.
  Merged output: "export [REDACTED:aws_secret_key]".

- test_audit_logger_records_every_match_even_when_merged:
  Confirms the AuditLogger still emits one entry per ScanMatch even
  when multiple matches merge into a single output redaction — a
  reviewer tracing audit -> output can still account for every rule
  that fired. Uses a real AuditLogger with a tmp_path JSONL.

Verification:
- pytest tests/test_guardrails/test_engine.py: 46 pass (was 43; +3)
- pytest tests/: 2777 pass, 1 skipped, 0 failures
- ruff check + format: clean
Closes the gap between unit-level ``GuardrailEngine.apply`` tests and the
four call sites that invoke it in the pipeline:

- ``app/services/llm_client.py::LLMClient.invoke``  (Anthropic messages + system)
- ``app/services/llm_client.py::BedrockLLMClient.invoke``  (same shape as Anthropic)
- ``app/services/llm_client.py::OpenAILLMClient.invoke``  (OpenAI chat.completions)
- ``app/nodes/chat.py::_apply_guardrails_to_messages``  (LangGraph chat node)

Five new parametrized integration tests load a ruleset modeled on the
shipped ``_STARTER_CONFIG`` — ``aws_access_key`` is a strict substring of
``generic_api_token`` when the token value is itself an AWS key — and
assert that the downstream payload (captured by a stub that replaces the
real API client) contains no fragment of either the label prefix or the
key value:

- ``test_anthropic_client_sends_merged_redaction``:
  ``LLMClient.invoke("Debug dump: api_key=AKIA... from config.yml")`` is
  captured at ``Anthropic.messages.create`` and asserted to carry a
  single ``[REDACTED:generic_api_token]`` span with no ``api_key=`` or
  ``AKIA`` fragment.

- ``test_anthropic_system_prompt_also_redacted``:
  Same shape, but asserts the ``system`` kwarg (distinct code path from
  per-message ``content``) gets the merged treatment.

- ``test_openai_client_sends_merged_redaction``:
  Same shape for ``OpenAILLMClient.invoke`` -> ``chat.completions.create``.

- ``test_chat_node_emits_merged_redaction``:
  ``_apply_guardrails_to_messages`` on a LangGraph-style message list,
  asserts original is untouched (defensive copy) and the returned
  message has merged redaction.

- ``test_contained_real_secret_fully_redacted_in_pipeline``:
  End-to-end regression guard.  Asserts none of ``api_key=``, ``AKIA``,
  ``IOSFODNN``, ``7EXAMPLE`` appear anywhere in the downstream payload.
  Pre-fix main leaks ``api_key=`` and the surrounding characters.

Verification:
- ``pytest tests/``: 2782 pass (was 2777; +5 integration)
- ``pytest tests/test_guardrails/``: 59 pass (46 engine + 13 integration)
- ``ruff check`` / ``ruff format --check``: clean
- ``mypy app/guardrails/``: clean on touched module
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant