feat(library): add context bloat detection rail #1941
Greptile Summary
This PR adds a new
| Filename | Overview |
|---|---|
| nemoguardrails/library/context_bloat_detection/actions.py | Core detection logic; char_count is not refreshed after truncation, causing the min_chars guard and metrics["chars"] to use stale pre-truncation values. _validate_config contains two dead-code guards. |
| nemoguardrails/rails/llm/config.py | Adds ContextBloatDetectionConfig Pydantic model with Literal action validation and numeric bounds; wired via default_factory consistent with other RailsConfigData fields. |
| nemoguardrails/library/context_bloat_detection/flows.co | Colang v2 flows correctly gate on $bloat_result.action before aborting, distinguishing reject/truncate/warn modes properly. |
| nemoguardrails/library/context_bloat_detection/flows.v1.co | Colang v1 flows mirror the v2 logic, correctly checking $bloat_result.action before stopping. |
| tests/test_context_bloat_detection.py | 36 unit tests covering config defaults, helper functions, all detection paths, and action modes; no end-to-end flow execution tests, and no test for the max_chars < min_chars misconfiguration edge case. |
| nemoguardrails/library/context_bloat_detection/config.yml | Example config wiring retrieval and input rails with sensible defaults; documentation comments are accurate. |
| nemoguardrails/library/context_bloat_detection/__init__.py | Empty module init with correct Apache-2.0 license header. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Input text] --> B{size > max_chars?}
B -- Yes, action=reject --> R1[Return is_bloat=True, action=reject]
B -- Yes, action=truncate --> T[Truncate text to max_chars]
B -- Yes, action=warn --> W1[Add size_cap_exceeded detection]
B -- No --> C
T --> C
W1 --> C
C{char_count >= min_chars?}
C -- No --> AGG
C -- Yes --> D{entropy < min_entropy?}
D -- Yes, reject/truncate --> R2[Return is_bloat=True, action=reject]
D -- Yes, warn --> W2[Add low_entropy detection]
D -- No --> E
W2 --> E
E{run_ratio > max_run_ratio?}
E -- Yes, reject/truncate --> R3[Return is_bloat=True, action=reject]
E -- Yes, warn --> W3[Add long_run detection]
E -- No --> F
W3 --> F
F{rep_ratio > max_repetition_ratio?}
F -- Yes, reject/truncate --> R4[Return is_bloat=True, action=reject]
F -- Yes, warn --> W4[Add high_repetition detection]
F -- No --> AGG
W4 --> AGG
AGG[Aggregate result]
AGG --> G[Return ContextBloatResult]
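The flowchart's control flow can be sketched in Python as follows. This is a minimal illustration, not the PR's code: `BloatConfig`, `BloatResult`, `detect_bloat`, the `"allow"` pass-through value, and the `checks` parameter are hypothetical names, and the metric checks are passed in as callables so that only the ordering and early-exit rules are shown. The character count is re-read after truncation, which addresses the stale-value concern raised in the file overview for actions.py.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical names throughout; only the control flow mirrors the flowchart.


@dataclass
class BloatConfig:
    action: str = "reject"      # "reject" | "truncate" | "warn"
    max_chars: int = 1000       # illustrative value, not the PR's default
    min_chars: int = 50


@dataclass
class BloatResult:
    is_bloat: bool
    action: str                 # what the calling flow should do
    detections: List[str] = field(default_factory=list)
    text: Optional[str] = None  # set only when the text was truncated


# A check returns True when the text trips its threshold.
Check = Tuple[str, Callable[[str], bool]]


def detect_bloat(text: str, cfg: BloatConfig, checks: List[Check]) -> BloatResult:
    detections: List[str] = []
    truncated = False

    # 1. Size cap: the only stage where "truncate" actually truncates.
    if len(text) > cfg.max_chars:
        detections.append("size_cap_exceeded")
        if cfg.action == "reject":
            return BloatResult(True, "reject", detections)
        if cfg.action == "truncate":
            text = text[: cfg.max_chars]
            truncated = True

    # 2. Re-read the length AFTER any truncation so the min_chars guard
    #    sees the text that will actually be analysed.
    char_count = len(text)

    # 3. Statistical checks, cheapest first, skipped for short texts.
    if char_count >= cfg.min_chars:
        for name, tripped in checks:
            if tripped(text):
                detections.append(name)
                if cfg.action in ("reject", "truncate"):
                    # Non-size detections are rejected even in truncate mode.
                    return BloatResult(True, "reject", detections)

    # 4. Aggregate.
    if truncated:
        return BloatResult(True, "truncate", detections, text)
    if detections:
        return BloatResult(True, "warn", detections)
    return BloatResult(False, "allow", detections)  # hypothetical pass-through value
```

With `max_chars=40` and `min_chars=50` in truncate mode, the truncated text falls below `min_chars`, so the statistical checks are skipped. A stale pre-truncation count would have run them instead.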
Reviews (6): Last reviewed commit: "fix: address maintainer review feedback"
📝 Walkthrough
This PR introduces context bloat detection for NeMo Guardrails, a feature that identifies and mitigates oversized, padded, or highly repetitive input text. The implementation includes a configurable detection action with entropy and character-run metrics, flow wiring to monitor tool output and user input, and comprehensive unit and end-to-end tests.
Changes
Context Bloat Detection Feature
🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 5 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
nemoguardrails/library/context_bloat_detection/flows.co (1)
38-38: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win
Add a newline at the end of the file.
The pre-commit `end-of-file-fixer` hook failed. Add one blank line after line 38.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nemoguardrails/library/context_bloat_detection/flows.co` at line 38, Add a missing newline at end of file: open nemoguardrails/library/context_bloat_detection/flows.co and add a single blank line after the current last line (line 38) so the file ends with a newline to satisfy the pre-commit end-of-file-fixer hook.
nemoguardrails/library/context_bloat_detection/flows.v1.co (1)
38-38: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win
Add a newline at the end of the file.
The pre-commit `end-of-file-fixer` hook failed. Add one blank line after line 38 to comply with POSIX standards.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nemoguardrails/library/context_bloat_detection/flows.v1.co` at line 38, Add a trailing newline to the end of the file nemoguardrails/library/context_bloat_detection/flows.v1.co so the file ends with a single blank line (POSIX newline); this fixes the pre-commit end-of-file-fixer hook failure by ensuring the file ends with a newline character.
nemoguardrails/library/context_bloat_detection/config.yml (1)
43-43: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win
Add a newline at the end of the file.
The pre-commit `end-of-file-fixer` hook failed. Add one blank line after line 43.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nemoguardrails/library/context_bloat_detection/config.yml` at line 43, Ensure the YAML config file ends with a single trailing newline character so the pre-commit end-of-file-fixer passes: open the config.yml and add one blank line (a terminating newline) after the current last line, save and commit the change.
🧹 Nitpick comments (2)
tests/test_context_bloat_detection.py (1)
80-99: ⚡ Quick win
Add config-boundary validation tests for numeric fields.
You already validate `action`; please add parallel tests for invalid `max_chars`, `ngram_size`, and ratio/entropy ranges so schema constraints stay protected.
🧪 Suggested test additions
 class TestValidation:
@@
     def test_default_config_is_valid(self):
         config = MagicMock()
         config.rails.config = RailsConfigData()
         _validate_config(config)
+
+    @pytest.mark.parametrize(
+        "overrides",
+        [
+            {"max_chars": 0},
+            {"ngram_size": 0},
+            {"max_repetition_ratio": -0.1},
+            {"max_repetition_ratio": 1.1},
+            {"max_run_ratio": -0.1},
+            {"max_run_ratio": 1.1},
+            {"min_entropy": -0.1},
+            {"min_entropy": 9.0},
+        ],
+    )
+    def test_invalid_numeric_thresholds_raise_at_config_time(self, overrides):
+        from pydantic import ValidationError
+        with pytest.raises(ValidationError):
+            ContextBloatDetectionConfig(**overrides)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_context_bloat_detection.py` around lines 80 - 99, Add tests that mirror the existing action tests but assert schema validation rejects out-of-range numeric fields: create tests like test_invalid_max_chars_raises_at_config_time, test_invalid_ngram_size_raises_at_config_time, test_invalid_ratio_and_entropy_ranges_raise_at_config_time that either call ContextBloatDetectionConfig(...) directly or use the helper _make_config(...) then _validate_config(...), and wrap each in pytest.raises(pydantic.ValidationError) to ensure invalid values for max_chars (e.g. negative or zero), ngram_size (e.g. zero or negative), and invalid range values for ratio/entropy (e.g. min > max or values outside allowed bounds) trigger validation errors.
nemoguardrails/library/context_bloat_detection/actions.py (1)
132-132: 💤 Low value
Clarify the inline comment.
The comment "truncate only applies here" is slightly misleading. While text truncation only happens at the size cap, the `action="truncate"` mode also causes early returns in subsequent checks (entropy, run ratio) at lines 152 and 167, similar to reject mode.
Consider rewording to: "Text truncation only applies here; early-exit behavior applies to all checks."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nemoguardrails/library/context_bloat_detection/actions.py` at line 132, Update the inline comment at the "Size cap" section that currently reads "truncate only applies here" to a clearer phrase such as: "Text truncation only applies here; early-exit behavior applies to all checks." This makes it clear that action="truncate" performs text truncation at the size cap but also triggers early returns in the subsequent entropy and run-ratio checks (see logic around the entropy and run-ratio branches that reference action="truncate"). Ensure the revised comment sits immediately above the size-cap check where truncation is performed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@nemoguardrails/library/context_bloat_detection/actions.py`:
- Line 196: The file nemoguardrails/library/context_bloat_detection/actions.py
is missing a trailing newline at EOF; open the file and add a single blank line
after the final closing token (the solitary ")" at the end of the file) so the
file ends with a newline character to satisfy the pre-commit end-of-file-fixer
hook and POSIX conventions.
- Line 150: The entropy check currently uses a falsy guard (`if entropy and
entropy < cfg.min_entropy:`) which skips zero-entropy inputs; update it to
explicitly guard against None and allow 0.0 to be evaluated, e.g. use `if
entropy is not None and entropy < cfg.min_entropy:` (referencing the entropy
variable and cfg.min_entropy in the same block), so the entropy check works
independently of the longest_run_ratio/max_run_ratio logic and still avoids
None-valued entropy.
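To see why the falsy guard matters, here is a small self-contained illustration. `shannon_entropy` and the two `low_entropy_*` functions are hypothetical helpers written for this example (entropy is measured in bits per character over character frequencies); the PR's own helper may differ in name and in how it treats empty input.

```python
import math
from collections import Counter
from typing import Optional


def shannon_entropy(text: str) -> Optional[float]:
    """Bits per character; None for empty input (an assumption of this sketch)."""
    if not text:
        return None
    n = len(text)
    counts = Counter(text)
    # sum of p * log2(1/p) over the character distribution
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def low_entropy_falsy(text: str, min_entropy: float) -> bool:
    entropy = shannon_entropy(text)
    # Buggy guard: 0.0 is falsy, so single-character padding is never flagged.
    return bool(entropy and entropy < min_entropy)


def low_entropy_explicit(text: str, min_entropy: float) -> bool:
    entropy = shannon_entropy(text)
    # Explicit guard: only None is skipped; 0.0 is compared to the threshold.
    return entropy is not None and entropy < min_entropy
```

For an input such as `"a" * 100` the entropy is exactly 0.0, so the first variant lets it through while the second flags it.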
In `@nemoguardrails/rails/llm/config.py`:
- Around line 1221-1224: The config field context_bloat_detection is currently
typed Optional[ContextBloatDetectionConfig] but runtime logic expects it to be
present; change its annotation to ContextBloatDetectionConfig (remove Optional)
so Pydantic enforces non-null at parse-time, keeping the Field(...,
default_factory=ContextBloatDetectionConfig, description=...) to still provide a
default instance if omitted; update any imports/types if needed to reflect the
non-optional type.
- Around line 1093-1112: ContextBloatDetectionConfig accepts invalid numeric
values; add pydantic validation constraints to each Field and/or validators in
the ContextBloatDetectionConfig class to enforce valid ranges: set max_chars to
gt=0, ngram_size to ge=1 (or gt=0), min_entropy to gt=0 (or ge=0), and constrain
max_repetition_ratio and max_run_ratio with ge=0 and le=1 (or gt/lt if
exclusive); implement these via Field(..., gt=..., ge=..., le=...) or add
`@validator` methods on ContextBloatDetectionConfig to raise clear errors when
values fall outside these bounds so invalid configs fail fast.
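As a rough illustration of the fail-fast behaviour being requested, the sketch below uses a plain stdlib dataclass as a stand-in for the Pydantic model; in the PR itself the per-field bounds are expressed through `Field(gt=..., ge=..., le=...)` constraints. The class name `BloatThresholds`, the default values other than `min_chars`, the entropy ceiling of 8.0, and the `max_chars >= min_chars` cross-field rule are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BloatThresholds:
    # Defaults are illustrative, except min_chars=50, which the PR summary states.
    max_chars: int = 10_000
    min_chars: int = 50
    ngram_size: int = 3
    min_entropy: float = 1.5
    max_run_ratio: float = 0.5
    max_repetition_ratio: float = 0.5

    def __post_init__(self) -> None:
        # Per-field bounds, checked once at construction time.
        if self.max_chars <= 0:
            raise ValueError("max_chars must be > 0")
        if self.min_chars < 0:
            raise ValueError("min_chars must be >= 0")
        if self.ngram_size < 1:
            raise ValueError("ngram_size must be >= 1")
        if not 0.0 <= self.min_entropy <= 8.0:
            raise ValueError("min_entropy must be in [0, 8]")
        for name in ("max_run_ratio", "max_repetition_ratio"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                raise ValueError(f"{name} must be in [0, 1]")
        # Cross-field rule (an assumption): a cap below min_chars would push
        # every truncated text into the "too short to check" range.
        if self.max_chars < self.min_chars:
            raise ValueError("max_chars must be >= min_chars")
```

Constructing `BloatThresholds(max_chars=40, min_chars=50)` raises immediately, which is the kind of misconfiguration the file overview notes is not yet covered by a test.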
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ddcb3fb1-91d4-4571-8b44-ac78a07eea69
📒 Files selected for processing (7)
nemoguardrails/library/context_bloat_detection/__init__.py
nemoguardrails/library/context_bloat_detection/actions.py
nemoguardrails/library/context_bloat_detection/config.yml
nemoguardrails/library/context_bloat_detection/flows.co
nemoguardrails/library/context_bloat_detection/flows.v1.co
nemoguardrails/rails/llm/config.py
tests/test_context_bloat_detection.py
Codecov Report❌ Patch coverage is
Force-pushed from d0487e4 to f69bc23
Pouyanpi left a comment
Thank you @MuneezaAzmat for your contribution! Please see the review comments below:
(summary)
- Drop the tool-related rails because they are not public yet.
- In truncate mode, non-size detections must still reject.
- Add `global` before rewriting $relevant_chunks / $user_message in Colang 2.
Thanks for the feedback @Pouyanpi, I've addressed all three comments (summary)
@MuneezaAzmat thanks for making the changes. I've opened a review PR that does some minor cleanups. Would you please merge that? After that, please rebase on the latest develop and accept both changes. I'll merge afterwards.
Force-pushed from 7760243 to 39dc3c7
Pouyanpi left a comment
Thanks @MuneezaAzmat 🚀
Add a new guardrail that detects context-manipulation attacks where attacker-controlled content is padded, oversized, or repetitively structured to cause system prompt forgetting or exhaust token budget. Checks (cheapest first): size cap, Shannon entropy, longest char run, n-gram repetition. Supports reject, truncate, and warn actions.
- Fix entropy zero-check to catch zero-entropy inputs (e.g. "aaaa...")
- Add early-return for high_repetition on reject/truncate, consistent with other checks
- Add should_block field to ContextBloatResult so flows can distinguish warn from reject
- Update flows to check should_block before aborting
- Add numeric bounds to config fields (gt/ge/le constraints)
- Update tests to verify should_block behavior

…, action field
- Fix entropy zero-check to catch zero-entropy inputs
- Add early-return for high_repetition on reject/truncate
- Replace should_block with action field in ContextBloatResult
- Update flows to handle all 3 modes: reject aborts, truncate writes back, warn passes through
- Add min_chars config to skip entropy/run/repetition checks on short texts
- Add numeric bounds (gt/ge/le) to config fields
- Add short message test to verify no false positives on "Hi", "Hello", etc.

- Remove tool output execution rails (not publicly released yet)
- Non-size detections return action="reject" in truncate mode so the flow correctly aborts instead of passing bad content through
- Add global keyword for $user_message and $relevant_chunks in Colang v2 flows so truncated text propagates downstream
Force-pushed from 39dc3c7 to e0e5573
Description
Context-window manipulation is a class of attacks where adversarial content, injected via tool outputs, retrieved documents, or user input, uses padding, repetition, or excessive length to push critical instructions out of the model's attention window or exhaust its token budget. Existing NeMo Guardrails cover prompt injection and content safety, but there is no built-in defense against context bloat. This PR adds a lightweight, pure-Python guardrail that catches these attacks before they reach the LLM, with no external dependencies.
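To make "padding" and "repetition" concrete, here is a minimal pure-Python sketch of two such measurements. The function names, the use of character n-grams, and the exact ratio definitions are assumptions for illustration; the PR's helpers may define them differently.

```python
from collections import Counter
from itertools import groupby


def longest_run_ratio(text: str) -> float:
    """Length of the longest run of one repeated character, over total length."""
    if not text:
        return 0.0
    longest = max(sum(1 for _ in group) for _, group in groupby(text))
    return longest / len(text)


def repetition_ratio(text: str, ngram_size: int = 3) -> float:
    """Fraction of character n-grams that repeat an earlier n-gram."""
    total = len(text) - ngram_size + 1
    if total <= 0:
        return 0.0  # text shorter than one n-gram
    counts = Counter(text[i : i + ngram_size] for i in range(total))
    repeated = total - len(counts)  # occurrences beyond the first of each n-gram
    return repeated / total
```

Padding such as a long block of spaces drives `longest_run_ratio` toward 1.0, while a short phrase pasted many times drives `repetition_ratio` up even though no single character dominates.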
Summary
- New `context_bloat_detection` guardrail that detects context-manipulation attacks (padded, oversized, or repetitive content in tool outputs, RAG chunks, or user input)
- Supports `reject`, `truncate`, and `warn` actions via config
  - `reject`: stops the flow with a user-facing message
  - `truncate`: truncates to max_chars at size cap, rejects on remaining checks, writes truncated text back to source variable
  - `warn`: logs detection, does not block or modify
- Short texts (below `min_chars`, default 50) skip entropy/run/repetition checks to avoid false positives on messages like "Hi" or "Hello"
- `ContextBloatDetectionConfig` Pydantic model in `RailsConfigData` with field validation (Literal action type, numeric bounds)
Checklist