fix(guardrails): redact union of overlapping spans, never leak sensitive bookends#780
Conversation
…ive bookends ``GuardrailEngine.apply`` walked the sorted redact matches right-to-left with a single ``seen_end`` cursor and skipped any match whose end exceeded the cursor. When a wider match overlapped or contained an already-redacted narrower match, the wider match was silently dropped — leaving its prefix and suffix in the output. Concretely, with rules matching ``super_secret_token_value`` (wide) and ``secret_token`` (contained): input: "data super_secret_token_value end" before: "data super_[REDACTED:short]_value end" ← super_ and _value leak after: "data [REDACTED:long] end" The prior fix in Tracer-Cloud#520 only handled the same-start subcase by changing the secondary sort key; matches that started at different offsets but fully contained a narrower match still slipped through. Replace the single-cursor walk with a proper merge: 1. Sort redact matches by ``(start ASC, -end)`` so ties at the same offset keep the longest-keyword-wins behavior Tracer-Cloud#520 restored. 2. Sweep left-to-right, merging any match whose ``start`` falls before the current interval's ``end``. The representative rule for the merged interval is whichever contributing match had the largest original width, so the output reflects the most-specific rule that matched anywhere in the merged span. 3. Apply replacements right-to-left over the merged intervals so string indices stay valid as each redaction resizes the output. Adjacent matches (``A.end == B.start``) are intentionally not merged — they don't overlap. Disjoint matches produce independent redactions with intervening text preserved verbatim. Tests added in ``tests/test_guardrails/test_engine.py``: - contained-span union redaction (the regression from the bug report) - longest-rule-name semantics on a contained span - partial (non-containing) overlap produces one union redaction - disjoint matches stay separate, middle text preserved - adjacent matches stay separate (touching at one offset, not overlapping) - three-way transitive overlap chain collapses to one span - regex pattern + keyword on the same region: wider-wins applies across match kinds Existing coverage preserved: - ``test_overlapping_keyword_redaction_prefers_longest`` — same-start case - ``test_overlapping_keyword_same_rule_redaction`` — same-rule overlap - ``test_multiple_rules_on_same_span`` — audit + redact on same span Verification: - ``pytest``: 2774 pass, 1 skipped, 0 failures (was 2767 on main; +7 new) - ``ruff check`` / ``ruff format --check``: clean - ``mypy app/guardrails/``: clean on touched module - Full ``tests/test_guardrails/test_engine.py`` suite: 43 pass, 0 fail
Greptile SummaryThis PR fixes a span-leak bug in Key changes:
Confidence Score: 5/5Safe to merge — the interval merge algorithm is correct, well-documented, and backed by 7 targeted regression tests covering all overlap sub-cases, with zero regressions in the existing suite. The core fix (left-to-right interval merge replacing the buggy No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["apply(text)"] --> B["scan(text) → ScanResult"]
B --> C{any matches?}
C -- No --> D[return text unchanged]
C -- Yes --> E[audit each match]
E --> F{blocked?}
F -- Yes --> G[raise GuardrailBlockedError]
F -- No --> H["_redact(text, matches)"]
H --> I["Filter: action == REDACT\nSort by (start ASC, -end)"]
I --> J["Left-to-right sweep\nmerge overlapping intervals\ntrack widest individual match\nfor rule-name selection"]
J --> K["Right-to-left application\nof replacements over\nmerged intervals"]
K --> L[return redacted text]
style G fill:#f66,color:#fff
style L fill:#6c6,color:#fff
Reviews (1): Last reviewed commit: "fix(guardrails): redact union of overlap..." | Re-trigger Greptile |
| merged: list[tuple[int, int, str, int]] = [] | ||
| for match in redact_matches: | ||
| if match.end > seen_end: | ||
| continue | ||
| replacement = self._get_replacement(match.rule_name) | ||
| redacted = redacted[: match.start] + replacement + redacted[match.end :] | ||
| seen_end = match.start | ||
| width = match.end - match.start | ||
| if merged and match.start < merged[-1][1]: | ||
| prev_start, prev_end, prev_rule, prev_width = merged[-1] | ||
| new_end = max(prev_end, match.end) | ||
| if width > prev_width: | ||
| merged[-1] = (prev_start, new_end, match.rule_name, width) | ||
| else: | ||
| merged[-1] = (prev_start, new_end, prev_rule, prev_width) | ||
| else: | ||
| merged.append((match.start, match.end, match.rule_name, width)) |
There was a problem hiding this comment.
Consider a
NamedTuple for the merged interval for readability
The inner loop reads and writes the merged list by positional tuple index (merged[-1][1], merged[-1][3], etc.). The positions are correct, but the tuple layout (start, end, rule_name, width) is entirely implicit — a reader must mentally map each index to its meaning.
A lightweight NamedTuple would make accesses self-documenting with zero runtime overhead:
from typing import NamedTuple
class _MergedInterval(NamedTuple):
start: int
end: int
rule_name: str
width: intThen the sweep body becomes:
merged: list[_MergedInterval] = []
for match in redact_matches:
width = match.end - match.start
if merged and match.start < merged[-1].end:
prev = merged[-1]
new_end = max(prev.end, match.end)
if width > prev.width:
merged[-1] = _MergedInterval(prev.start, new_end, match.rule_name, width)
else:
merged[-1] = _MergedInterval(prev.start, new_end, prev.rule_name, prev.width)
else:
merged.append(_MergedInterval(match.start, match.end, match.rule_name, width))This is a non-blocking suggestion; the current code is functionally correct.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Extend the overlap regression tests with two cases grounded in the exact patterns shipped in _STARTER_CONFIG (app/guardrails/cli.py), and with an audit-behavior assertion that demonstrates the match-level audit record remains complete even when the output-level redactions are merged: - test_real_world_api_key_and_aws_access_key_overlap: config: api_key=AKIAIOSFODNN7EXAMPLE aws_access_key pattern matches the [16:36] span; generic_api_token pattern matches the [8:36] span (contains aws_access_key). Pre-fix output leaked "api_key=". Merged output: "config: [REDACTED:generic_api_token]". - test_real_world_aws_secret_key_contains_aws_access_key: export aws_secret_access_key=AKIA<16>abcdefghijklmnopqrst aws_access_key matches [29:49]; aws_secret_key matches [7:69] and contains it. Pre-fix output leaked the label and the 40-char tail. Merged output: "export [REDACTED:aws_secret_key]". - test_audit_logger_records_every_match_even_when_merged: Confirms the AuditLogger still emits one entry per ScanMatch even when multiple matches merge into a single output redaction — a reviewer tracing audit -> output can still account for every rule that fired. Uses a real AuditLogger with a tmp_path JSONL. Verification: - pytest tests/test_guardrails/test_engine.py: 46 pass (was 43; +3) - pytest tests/: 2777 pass, 1 skipped, 0 failures - ruff check + format: clean
Closes the gap between unit-level ``GuardrailEngine.apply`` tests and the
four call sites that invoke it in the pipeline:
- ``app/services/llm_client.py::LLMClient.invoke`` (Anthropic messages + system)
- ``app/services/llm_client.py::BedrockLLMClient.invoke`` (same shape as Anthropic)
- ``app/services/llm_client.py::OpenAILLMClient.invoke`` (OpenAI chat.completions)
- ``app/nodes/chat.py::_apply_guardrails_to_messages`` (LangGraph chat node)
Five new parametrized integration tests load a ruleset modeled on the
shipped ``_STARTER_CONFIG`` — ``aws_access_key`` is a strict substring of
``generic_api_token`` when the token value is itself an AWS key — and
assert that the downstream payload (captured by a stub that replaces the
real API client) contains no fragment of either the label prefix or the
key value:
- ``test_anthropic_client_sends_merged_redaction``:
``LLMClient.invoke("Debug dump: api_key=AKIA... from config.yml")`` is
captured at ``Anthropic.messages.create`` and asserted to carry a
single ``[REDACTED:generic_api_token]`` span with no ``api_key=`` or
``AKIA`` fragment.
- ``test_anthropic_system_prompt_also_redacted``:
Same shape, but asserts the ``system`` kwarg (distinct code path from
per-message ``content``) gets the merged treatment.
- ``test_openai_client_sends_merged_redaction``:
Same shape for ``OpenAILLMClient.invoke`` -> ``chat.completions.create``.
- ``test_chat_node_emits_merged_redaction``:
``_apply_guardrails_to_messages`` on a LangGraph-style message list,
asserts original is untouched (defensive copy) and the returned
message has merged redaction.
- ``test_contained_real_secret_fully_redacted_in_pipeline``:
End-to-end regression guard. Asserts none of ``api_key=``, ``AKIA``,
``IOSFODNN``, ``7EXAMPLE`` appear anywhere in the downstream payload.
Pre-fix main leaks ``api_key=`` and the surrounding characters.
Verification:
- ``pytest tests/``: 2782 pass (was 2777; +5 integration)
- ``pytest tests/test_guardrails/``: 59 pass (46 engine + 13 integration)
- ``ruff check`` / ``ruff format --check``: clean
- ``mypy app/guardrails/``: clean on touched module
Summary
GuardrailEngine.applywalked sorted redact matches right-to-left with a singleseen_endcursor and skipped any match whose end exceeded that cursor. When a wider match overlapped — or fully contained — an already-redacted narrower match, the wider match was silently dropped and its prefix / suffix survived in the output.Concretely, with two rules (wide:
super_secret_token_value, inner:secret_token):main)\"data super_[REDACTED:short]_value end\"\"data [REDACTED:long] end\"The
super_and_valuefragments of the wider keyword — arguably the sensitive bookends the wider rule exists to cover — leaked into the output. PR #520's earlier fix handled only the same-start subcase (via the secondary sort key); matches with different start offsets that fully contained a narrower match still slipped through.This also affects the shipped starter rules. With
_STARTER_CONFIGinapp/guardrails/cli.py, text likeconfig: api_key=AKIAIOSFODNN7EXAMPLEtriggers bothaws_access_key([16:36]) andgeneric_api_token([8:36]). Before this PR the prefixapi_key=leaked; after, the whole span is redacted. Same foraws_secret_access_key=AKIA...xxxxx...where the label and the 40-char tail both leaked pre-fix.Fix
Replace the single-cursor walk with a proper interval merge:
(start ASC, -end)so same-start ties put the widest match first — keeps the "longest-keyword-wins" behavior PR fix: overlapping keyword redaction and hardcoded investigation loop limit #520 restored.startfalls before the current interval'send. The representative rule name for the merged interval is the one with the largest original width.Adjacent matches (
A.end == B.start) intentionally do not merge — they touch but do not overlap. Disjoint matches produce independent redactions with intervening text preserved verbatim. Match-level audit logging is unchanged — everyScanMatchstill produces one audit entry, even when multiple matches collapse into a single output redaction, so reviewers tracing audit → output can account for every rule that fired.E2E validation — full synthetic suites, real Anthropic LLM
HEALTHY_SHORT_CIRCUIT=true, mocked backends, realclaude-sonnet-4-6+claude-haiku-4-5on the PR #780 branch.Full EKS suite — 14 scenarios
alerting-state edge-case scenarios)state: alerting, LLM categorization drift on healthy-recovery cases (pre-existing; same failures appear onmain/ PR #777 verification)Full RDS suite — 15 scenarios
state: alerting, LLM keyword/categorization drift (pre-existing; identical failures to PR #777 verification run)Confirmed every failure is in
state: alerting(programmatic check on each failingalert.json). The guardrail engine is used for prompt redaction; the synthetic fixtures don't contain overlapping-rule secrets, so the failures above are in territory this PR does not touch.Unit coverage — 46 tests in
tests/test_guardrails/test_engine.pytest_contained_span_redacts_union_no_leak(the core regression)test_contained_span_uses_longest_rule_nametest_partial_overlap_redacts_uniontest_disjoint_matches_stay_separatetest_adjacent_matches_not_mergedtest_three_way_chain_of_overlaps_redacts_single_uniontest_contained_pattern_with_wider_keyword_preserves_wider_nametest_real_world_api_key_and_aws_access_key_overlap(uses the exact_STARTER_CONFIGpatterns)test_real_world_aws_secret_key_contains_aws_access_key(uses the exact_STARTER_CONFIGpatterns)test_audit_logger_records_every_match_even_when_merged(audit-under-merge invariant)Existing coverage preserved, including:
test_overlapping_keyword_redaction_prefers_longest— same-start case (from fix: overlapping keyword redaction and hardcoded investigation loop limit #520)test_overlapping_keyword_same_rule_redaction— same-rule overlaptest_multiple_rules_on_same_span— audit + redact on same spanVerification
test (ubuntu-latest)PASS,typecheckPASS,qualityPASS,CodeQLPASS,Analyze (python)PASSpytest tests/(Python 3.14, no coverage): 2777 pass, 1 skipped, 0 failures (was 2767 onmain; +10 new)ruff check/ruff format --check: clean on touched filesmypy app/guardrails/: clean on touched modulemainScope
One source file (
app/guardrails/engine.py, ~40 lines), one test file. No public API changes. No behavior change for non-overlapping matches, audit-only rules, or blocked rules. Directly extends #520, whose same-start case is preserved verbatim.Security note
The bug class here is sensitive-data disclosure: anywhere guardrail rules are applied to outputs shown to users, logged, or exported, two overlapping rules covering the same secret could each partially redact and leave fragments in the wild. The shipped
_STARTER_CONFIGdemonstrates this concretely —api_key=AKIA...andaws_secret_access_key=...values containing an access key both produced partial redactions onmain. This PR closes the gap.