feat: add Agent Threat Rules (ATR) detection library rail by Oxygen56 · Pull Request #1996 · NVIDIA-NeMo/Guardrails

Oxygen56 · 2026-06-05T14:31:18Z

Summary

Implements the Agent Threat Rules (ATR) detection library rail as proposed in #1991.

Adds a new input rail that evaluates user messages against the bundled ATR rule set via the pyatr package, covering: prompt injection, jailbreak, tool poisoning, MCP attacks, skill compromise, and more.

Design

Mirrors the existing injection_detection rail pattern
Lazy-imports pyatr with a clear pip install pyatr hint
Configurable severity threshold via rails.config.atr_detection.severities (default: critical, high)
Colang 1.0 and 2.0 flow definitions
When threats are detected: blocks the input with a polite refusal and lists the triggered rule IDs
When enable_rails_exceptions is true: raises ATRDetectionRailException
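
The lazy-import item above can be illustrated with a minimal sketch. It is not the code from actions.py; the helper names `lazy_import` and `get_pyatr` are hypothetical, and only the package name `pyatr` and the `pip install pyatr` hint come from the description.

```python
import importlib
from types import ModuleType


def lazy_import(module_name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency on first use, with an actionable error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with the install hint so the user knows how to fix it.
        raise ImportError(
            f"Could not import '{module_name}'. Install it with: {install_hint}"
        ) from exc


def get_pyatr() -> ModuleType:
    # Hypothetical helper: called from inside the action, so importing the
    # rail module itself never requires pyatr to be installed.
    return lazy_import("pyatr", "pip install pyatr")
```

Deferring the import to call time keeps pyatr an optional dependency: users who never enable the rail do not need the package.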

Files

nemoguardrails/library/atr/__init__.py — License header
nemoguardrails/library/atr/actions.py — Core detection action with config validation, severity filtering, lazy pyatr import
nemoguardrails/library/atr/flows.co — Colang 1.0 input-rail flow
nemoguardrails/library/atr/flows.v1.co — Colang 2.0 input-rail flow
tests/test_atr_detection.py — 16 tests (13 unit + 3 E2E)

Fixes #1991

Summary by CodeRabbit

Release Notes

New Features
- Added Agent Threat Rules (ATR) detection capability for input scanning.
- Configurable severity filtering to customize threat detection levels.
- Compatible with Colang 1.0 and 2.0 flow systems.
Tests
- Comprehensive test suite added for ATR detection functionality.

Implements a new input rail for detecting agent-specific threats following the ATR framework. The rail evaluates user messages against the bundled ATR rule set via the pyatr package.

- Lazy-imports pyatr with clear install instructions
- Configurable severity threshold (default: critical, high)
- Colang 1.0 and 2.0 flow definitions
- 16 tests (13 unit + 3 E2E) with mocked pyatr engine

Fixes NVIDIA-NeMo#1991

Signed-off-by: Oxygen56 <1391083091@qq.com>
Signed-off-by: Oxygen <1391083091@qq.com>

coderabbitai · 2026-06-05T14:38:57Z

📝 Walkthrough

Walkthrough

This PR introduces a new Agent Threat Rules (ATR) detection library rail that scans user input for AI-agent attacks using the optional pyatr package. It includes a configuration-aware action with severity filtering, Colang 1.0 and 2.0 input-checking flows that conditionally block threats, and a full test suite with mocks to avoid hard dependencies.

Changes

ATR Detection Library Rail

Layer / File(s)	Summary
ATR Detection Action Implementation `nemoguardrails/library/atr/__init__.py`, `nemoguardrails/library/atr/actions.py`	`ATRDetectionResult` TypedDict defines threat detection output shape. Action lazily imports `pyatr` (ATREngine, AgentEvent) with optional-dependency pattern. Config validation ensures severities are list/tuple of allowed levels; defaults to `{"critical", "high"}`. Core `_evaluate_atr` constructs AgentEvent, runs engine.evaluate(), filters matches by configured severities, and returns result with `is_threat` and matched rule IDs. Public `atr_detection` action orchestrates dependency checks, config extraction, severity normalization, and evaluation.
Input Checking Flows (Colang) `nemoguardrails/library/atr/flows.co`, `nemoguardrails/library/atr/flows.v1.co`	Colang 1.0 and 2.0 `atr check input` flows invoke `atr_detection` on user messages. If `is_threat` is true, flows conditionally block: when `enable_rails_exceptions` is enabled, emit `ATRDetectionRailException`; otherwise, send user-facing message with matched rule IDs joined by comma and stop.
Test Suite `tests/test_atr_detection.py`	Mock helpers simulate ATR match/engine objects without requiring `pyatr` install. Tests cover config validation (missing/invalid/valid cases), severity extraction with lowercasing, import availability, core evaluation with empty/no-match/below-threshold/matching inputs, and end-to-end flows verifying clean inputs pass through while threat inputs trigger rule-ID reporting or exception mode.
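
The severity-filtering step described in the table can be sketched roughly as follows. This is an illustration only: `FakeMatch` and its `rule_id` / `severity` attributes are assumed stand-ins for whatever pyatr returns, and `filter_matches` is a hypothetical name. The `is_threat` and `detections` keys follow the result shape mentioned in this PR.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, TypedDict


class ATRDetectionResult(TypedDict):
    is_threat: bool
    detections: List[str]


@dataclass
class FakeMatch:
    """Assumed stand-in for one engine match; the real pyatr type may differ."""

    rule_id: str
    severity: str


def filter_matches(
    matches: Iterable[FakeMatch], severities: FrozenSet[str]
) -> ATRDetectionResult:
    # Keep only rule IDs whose severity is in the configured set.
    detections = [m.rule_id for m in matches if m.severity.lower() in severities]
    return {"is_threat": bool(detections), "detections": detections}
```

With the default severities, a match labelled "low" would be dropped, so an input that only triggers low-severity rules passes through.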

Sequence Diagram

sequenceDiagram
  participant User
  participant Flow as atr check input
  participant Action as atr_detection
  participant Engine as ATREngine
  participant Response
  User->>Flow: user input
  Flow->>Action: evaluate text
  Action->>Engine: create AgentEvent & evaluate
  Engine-->>Action: severity-labeled matches
  Action->>Action: filter by config severities
  alt is_threat == true
    alt exceptions enabled
      Action-->>Response: ATRDetectionRailException
    else
      Action->>User: "Threat detected: rule1, rule2"
      Flow->>Flow: stop
    end
  else
    Flow->>Response: continue processing
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title accurately and concisely describes the main change: adding an ATR detection library rail as a new feature.
Linked Issues check	✅ Passed	All coding requirements from issue `#1991` are met: ATR rail implementation with pyatr lazy import, configurable severity filtering, Colang 1.0/2.0 flows, test coverage, and optional-dependency pattern.
Out of Scope Changes check	✅ Passed	All changes are directly related to implementing the ATR detection rail feature. No out-of-scope modifications were introduced.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Test Results For Major Changes	✅ Passed	PR description documents testing information: 16 tests (13 unit, 3 E2E) added in test_atr_detection.py with coverage of configuration, severity filtering, and end-to-end behavior.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nemoguardrails/library/atr/actions.py`:
- Around line 149-150: The loop that validates severities currently calls
sev.lower() unguarded and will raise AttributeError for non-string items; update
the validation (in the loop that iterates "for sev in severities" in
nemoguardrails/library/atr/actions.py) to first check that sev is a str (e.g.,
if not isinstance(sev, str): raise ValueError(...)) before calling sev.lower(),
then verify sev.lower() is in VALID_SEVERITIES and raise a ValueError with a
clear message if not.

In `@tests/test_atr_detection.py`:
- Around line 281-309: The tests simulate user input but never execute the
bot/app generation, so ATR blocking/pass-through isn't actually validated;
update both tests (using TestChat) to call the code path that triggers ATR
evaluation (e.g., invoke chat.bot(...) or chat.app.generate(...) after queuing
input) instead of only using chat >> "…", ensuring the mocked
_ATREngine.evaluate is exercised and the response is produced and asserted
(refer to TestChat, chat.bot, chat.app.generate and _ATREngine.evaluate to
locate the spots to change).
- Around line 284-286: The tests patch only _ATREngine but _evaluate_atr() also
constructs _AgentEvent, which is None when pyatr isn't installed; update the E2E
test blocks (the ones currently patching "_ATREngine" in
tests/test_atr_detection.py around the blocks at lines ~284, ~296, ~327) to
patch "_AgentEvent" as well (e.g., with
patch("nemoguardrails.library.atr.actions._AgentEvent") and set its return_value
appropriately) so that both _ATREngine and _AgentEvent are mocked in pyatr-less
environments and event creation won't raise.
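
The first finding above (guarding `sev.lower()` against non-string items) could look roughly like this. It is a sketch under assumptions: `normalize_severities` is a hypothetical name, and the "medium" and "low" levels are assumed, since the PR text only confirms "critical" and "high".

```python
from typing import FrozenSet

# Assumed level names; only "critical" and "high" appear in the PR text.
VALID_SEVERITIES: FrozenSet[str] = frozenset({"critical", "high", "medium", "low"})


def normalize_severities(severities: object) -> FrozenSet[str]:
    """Validate and lowercase a list/tuple of severity names."""
    if not isinstance(severities, (list, tuple)):
        raise ValueError("severities must be a list or tuple of strings")
    normalized = set()
    for sev in severities:
        # Check the type first so a non-string raises ValueError,
        # not AttributeError from calling .lower().
        if not isinstance(sev, str):
            raise ValueError(
                f"severity must be a string, got {type(sev).__name__}: {sev!r}"
            )
        level = sev.lower()
        if level not in VALID_SEVERITIES:
            raise ValueError(
                f"unknown severity {sev!r}; expected one of {sorted(VALID_SEVERITIES)}"
            )
        normalized.add(level)
    return frozenset(normalized)
```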

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2e5da617-cc6d-473a-98d2-3ac844dd08cd

📥 Commits

Reviewing files that changed from the base of the PR and between 06233b7 and a2626e6.

📒 Files selected for processing (5)

nemoguardrails/library/atr/__init__.py
nemoguardrails/library/atr/actions.py
nemoguardrails/library/atr/flows.co
nemoguardrails/library/atr/flows.v1.co
tests/test_atr_detection.py

greptile-apps · 2026-06-05T14:40:46Z

Greptile Summary

Adds an Agent Threat Rules (ATR) input rail that evaluates user messages against the pyatr rule set locally, covering prompt injection, jailbreak, tool poisoning, MCP attacks, and skill compromise. The implementation follows the existing injection_detection rail pattern with a lazy import, severity-filtered detection, and a module-level engine cache.

actions.py introduces the atr_detection action with config validation, _ATREngine caching, and a TypedDict result type; config.py receives extra="allow" on RailsConfigData so the atr_detection YAML key is accepted without a dedicated Pydantic field.
Colang 1.0 (flows.co) and 2.0 (flows.v1.co) flows are provided; the 1.0 flow uses $-prefixed variable declarations while the Jinja template references them without $, diverging from the injection_detection reference and risking a silent empty rule list in the blocked message.
The action hard-errors with ValueError when the atr_detection config section is absent, which breaks the opt-in convention used by every peer rail in the library.

Confidence Score: 3/5

The action hard-errors when the config section is absent and the Colang 1.0 flow has a variable-naming inconsistency that may silently drop the rule-list from blocked messages; the engine-cache write is also unguarded against concurrent thread-pool execution.

Two issues directly affect observable behaviour on the changed path: the ValueError on missing config stops the rail from working with opt-in defaults (as every peer rail supports), and the $response / {{ response.detections }} inconsistency in flows.co can produce a blocked message with no rule IDs. The engine-cache initialisation is also unguarded against concurrent thread-pool execution.

nemoguardrails/library/atr/actions.py (config-validation error handling and engine-cache initialisation) and nemoguardrails/library/atr/flows.co (variable declaration style vs. Jinja template reference).

Important Files Changed

Filename	Overview
nemoguardrails/library/atr/actions.py	Core ATR detection action with lazy pyatr import, severity filtering, and module-level engine cache; has a missing-config hard-error that differs from peer rails, and an unguarded engine-init race under thread-pool executors.
nemoguardrails/library/atr/flows.co	Colang 1.0 input-rail flow; declares variables with `$` sigil but Jinja template references them without `$`, diverging from the `injection_detection` reference pattern and risking an empty rule-list in the blocked message.
nemoguardrails/library/atr/flows.v1.co	Colang 2.0 flow; correctly uses `$` variables and `bot say`; mirrors the injection_detection pattern cleanly.
nemoguardrails/rails/llm/config.py	Adds `extra="allow"` to `RailsConfigData` so arbitrary rail config keys (like `atr_detection`) are accepted without a Pydantic field; minimal and safe change.
tests/test_atr_detection.py	Comprehensive unit and E2E tests with full pyatr mocking; covers config validation, severity filtering, and blocked/allowed message paths.

Prompt To Fix All With AI

Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
nemoguardrails/library/atr/actions.py:127-145
**`ValueError` on missing config prevents opt-in use of the action**

`_validate_atr_config` raises `ValueError` when the `atr_detection` section is absent from the config, making the section mandatory. Every other rail in the library (e.g. `jailbreak_detection`, `injection_detection`) silently uses defaults when no config section is present. A user who wants to try ATR detection without any YAML edits will get a confusing `ValueError` instead of working detection with sensible defaults. Consider returning `DEFAULT_SEVERITIES` when the section is missing rather than treating its absence as a hard error.
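
A minimal sketch of the fallback suggested in Issue 1, assuming the rails config is available as a plain mapping (the real code reads a Pydantic object, so the access pattern will differ). `resolve_severities` is a hypothetical name, and per-entry validation is omitted here.

```python
from typing import Any, FrozenSet, Mapping, Optional

DEFAULT_SEVERITIES: FrozenSet[str] = frozenset({"critical", "high"})


def resolve_severities(rails_config: Optional[Mapping[str, Any]]) -> FrozenSet[str]:
    """Return configured severities, or the defaults when nothing is configured."""
    section = (rails_config or {}).get("atr_detection")
    # A missing section or missing key is treated as opt-in with defaults,
    # not as an error.
    if not section or "severities" not in section:
        return DEFAULT_SEVERITIES
    # Entries are assumed to be already-validated strings.
    return frozenset(s.lower() for s in section["severities"])
```

With this shape, a user who enables the flow without adding an `atr_detection` section still gets detection at the default levels.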

### Issue 2 of 3
nemoguardrails/library/atr/actions.py:289-290
The `_cached_engine` global is read and written without any synchronisation. In environments that run the async action inside a thread-pool executor, two threads can both observe `_cached_engine is None` simultaneously and both construct a new `_ATREngine()`, discarding one after loading rules from disk. This is wasteful and potentially inconsistent if the rule-load itself is stateful. A `threading.Lock` around the creation block eliminates the race.

```suggestion
    if _cached_engine is None:
        import threading
        _engine_lock = getattr(atr_detection, "_engine_lock", None)
        if _engine_lock is None:
            atr_detection._engine_lock = threading.Lock()
        with atr_detection._engine_lock:
            if _cached_engine is None:
                _cached_engine = _ATREngine()
```
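
Note that the suggestion above creates the lock lazily, and that creation step is itself unguarded. It also assigns `_cached_engine` from inside a function, which needs a `global` declaration. One possible alternative, sketched here with a hypothetical `get_engine` helper and a generic `factory` argument standing in for `_ATREngine`, is to create the lock at import time and use double-checked locking:

```python
import threading
from typing import Callable, Optional

# Created once at import time, so lock creation cannot race.
_engine_lock = threading.Lock()
_cached_engine: Optional[object] = None


def get_engine(factory: Callable[[], object]) -> object:
    """Build the engine at most once, even under concurrent callers."""
    global _cached_engine
    if _cached_engine is None:  # fast path, no lock once initialised
        with _engine_lock:
            if _cached_engine is None:  # re-check while holding the lock
                _cached_engine = factory()
    return _cached_engine
```

The second check matters because another thread may have finished building the engine while this one was waiting for the lock.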

### Issue 3 of 3
nemoguardrails/library/atr/flows.co:7-8
The Jinja template refers to `response.detections` but the Colang variable is `$response`. The `injection_detection/flows.co` reference implementation declares variables without the `$` sigil (`response = await ...`) and uses the same template. If the Colang 1.0 runtime only populates the Jinja context from non-`$` locals, the template will silently produce an empty string for the rule list. Aligning the declaration style with the reference avoids this ambiguity.

```suggestion
  response = await ATRDetectionAction(text=$user_message)
  join_separator = ", "
```

Reviews (5): Last reviewed commit: "fix: handle AttributeError when pyatr AP..."

- Colang 2.0: use `bot say` instead of bare `bot` for refusal message (silent failure in Colang 2.0 otherwise)
- Make DEFAULT_SEVERITIES and VALID_SEVERITIES frozenset (immutable)
- Add isinstance(str) guard before calling sev.lower() in validation
- Remove unreachable None-branch in _extract_atr_config; guard defensively for callers that bypass validation
- Cache ATREngine at module level to avoid per-request rule loading
- E2E tests: also patch _AgentEvent so tests work in pyatr-less CI
- Add _reset_cache fixture for test isolation with cached engine

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Oxygen56 · 2026-06-05T17:13:36Z

All four issues from the Greptile review have been addressed in commit d260323 ("fix: address review feedback for ATR detection rail"):

✅ bot → bot say in flows.v1.co
✅ set → frozenset for DEFAULT_SEVERITIES and VALID_SEVERITIES
✅ Module-level _cached_engine lazy init for ATREngine
✅ Comment added in _extract_atr_config explaining the guard is defensive, not dead code

The review was based on commit a2626e6 which predates these fixes.

…t assertions

- Add missing Set import to typing imports (line 35 of actions.py)
- Fix test_clean_input_passes_through to actually assert pass-through behavior using chat << instead of bare chat >>
- Fix test_threat_input_is_blocked to use chat << for proper execution instead of checking chat.history[-1] which is never populated

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- Add model_config = ConfigDict(extra="allow") to RailsConfigData so extra fields like atr_detection are preserved and accessible via getattr
- Auto-fix ruff/ruff-format issues: unused imports, line length, import ordering, with-statement formatting

coderabbitai Bot reviewed Jun 5, 2026

Comment thread nemoguardrails/library/atr/actions.py

Comment thread tests/test_atr_detection.py

Comment thread tests/test_atr_detection.py Outdated

greptile-apps Bot reviewed Jun 5, 2026

Comment thread nemoguardrails/library/atr/flows.v1.co Outdated

Comment thread nemoguardrails/library/atr/actions.py Outdated

Comment thread nemoguardrails/library/atr/actions.py Outdated

Comment thread nemoguardrails/library/atr/actions.py Outdated

greptile-apps Bot reviewed Jun 5, 2026

Comment thread nemoguardrails/library/atr/actions.py Outdated

Oxygen56 and others added 2 commits June 6, 2026 10:07

greptile-apps Bot reviewed Jun 6, 2026

Comment thread nemoguardrails/library/atr/actions.py Outdated

fix: handle AttributeError when pyatr API changes

3a40514

Oxygen56 wants to merge 5 commits into NVIDIA-NeMo:develop from Oxygen56:feat/atr-detection-rail-1991
