
feat: add Agent Threat Rules (ATR) detection library rail #1996

Open
Oxygen56 wants to merge 5 commits into NVIDIA-NeMo:develop from Oxygen56:feat/atr-detection-rail-1991

Conversation


@Oxygen56 Oxygen56 commented Jun 5, 2026


Summary

Implements the Agent Threat Rules (ATR) detection library rail as proposed in #1991.

Adds a new input rail that evaluates user messages against the bundled ATR rule set via the pyatr package, covering: prompt injection, jailbreak, tool poisoning, MCP attacks, skill compromise, and more.

Design

  • Mirrors the existing injection_detection rail pattern
  • Lazy-imports pyatr with a clear pip install pyatr hint
  • Configurable severity threshold via rails.config.atr_detection.severities (default: critical, high)
  • Colang 1.0 and 2.0 flow definitions
  • When threats are detected: blocks the input with a polite refusal that lists the triggered rule IDs
  • When enable_rails_exceptions is true: raises ATRDetectionRailException
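The lazy-import bullet above can be illustrated with a minimal sketch. This is not the PR's code: `load_optional` and its parameters are hypothetical names chosen for illustration, and only the standard-library `importlib` call is assumed to exist.

```python
import importlib
from typing import Any, Optional


def load_optional(module_name: str, pip_name: Optional[str] = None) -> Any:
    """Import an optional dependency on first use.

    Hypothetical helper: re-raises ImportError with an install hint so the
    user learns which package to install instead of seeing a bare failure.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"The '{module_name}' package is required for this rail. "
            f"Install it with: pip install {pip_name or module_name}"
        ) from exc
```

Deferring the import to call time means a rail that depends on an optional package does not break environments where that package is absent, as long as the rail is never invoked.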

Files

  • nemoguardrails/library/atr/__init__.py — License header
  • nemoguardrails/library/atr/actions.py — Core detection action with config validation, severity filtering, lazy pyatr import
  • nemoguardrails/library/atr/flows.co — Colang 1.0 input-rail flow
  • nemoguardrails/library/atr/flows.v1.co — Colang 2.0 input-rail flow
  • tests/test_atr_detection.py — 16 tests (13 unit + 3 E2E)

Fixes #1991

Summary by CodeRabbit

Release Notes

  • New Features

    • Added Agent Threat Rules (ATR) detection capability for input scanning.
    • Configurable severity filtering to customize threat detection levels.
    • Compatible with Colang 1.0 and 2.0 flow systems.
  • Tests

    • Comprehensive test suite added for ATR detection functionality.

Implements a new input rail for detecting agent-specific threats
following the ATR framework. The rail evaluates user messages against
the bundled ATR rule set via the pyatr package.

- Lazy-imports pyatr with clear install instructions
- Configurable severity threshold (default: critical, high)
- Colang 1.0 and 2.0 flow definitions
- 16 tests (13 unit + 3 E2E) with mocked pyatr engine

Fixes NVIDIA-NeMo#1991

Signed-off-by: Oxygen56 <1391083091@qq.com>
Signed-off-by: Oxygen <1391083091@qq.com>

coderabbitai Bot commented Jun 5, 2026

Contributor


📝 Walkthrough


This PR introduces a new Agent Threat Rules (ATR) detection library rail that scans user input for AI-agent attacks using the optional pyatr package. It includes a configuration-aware action with severity filtering, Colang 1.0 and 2.0 input-checking flows that conditionally block threats, and a full test suite with mocks to avoid hard dependencies.

Changes

ATR Detection Library Rail

| Layer / File(s) | Summary |
|---|---|
| **ATR Detection Action Implementation**<br>`nemoguardrails/library/atr/__init__.py`, `nemoguardrails/library/atr/actions.py` | `ATRDetectionResult` TypedDict defines threat detection output shape. Action lazily imports pyatr (`ATREngine`, `AgentEvent`) with optional-dependency pattern. Config validation ensures severities are list/tuple of allowed levels; defaults to `{"critical", "high"}`. Core `_evaluate_atr` constructs `AgentEvent`, runs `engine.evaluate()`, filters matches by configured severities, and returns result with `is_threat` and matched rule IDs. Public `atr_detection` action orchestrates dependency checks, config extraction, severity normalization, and evaluation. |
| **Input Checking Flows (Colang)**<br>`nemoguardrails/library/atr/flows.co`, `nemoguardrails/library/atr/flows.v1.co` | Colang 1.0 and 2.0 `atr check input` flows invoke `atr_detection` on user messages. If `is_threat` is true, flows conditionally block: when `enable_rails_exceptions` is enabled, emit `ATRDetectionRailException`; otherwise, send user-facing message with matched rule IDs joined by comma and stop. |
| **Test Suite**<br>`tests/test_atr_detection.py` | Mock helpers simulate ATR match/engine objects without requiring pyatr install. Tests cover config validation (missing/invalid/valid cases), severity extraction with lowercasing, import availability, core evaluation with empty/no-match/below-threshold/matching inputs, and end-to-end flows verifying clean inputs pass through while threat inputs trigger rule-ID reporting or exception mode. |
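The severity-filtering step summarized in the table might look roughly like the sketch below. Everything here is an assumption for illustration: `FakeMatch`, its `rule_id` and `severity` attributes, the `detections` key, and the rule IDs are stand-ins, not pyatr's actual API or the PR's actual result shape.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, TypedDict


class DetectionResult(TypedDict):
    """Assumed result shape: a flag plus the IDs of the rules that fired."""

    is_threat: bool
    detections: List[str]


@dataclass(frozen=True)
class FakeMatch:
    """Stand-in for an engine match; attribute names are hypothetical."""

    rule_id: str
    severity: str


def filter_matches(
    matches: Iterable[FakeMatch], severities: FrozenSet[str]
) -> DetectionResult:
    # Keep only matches whose severity is in the configured set,
    # comparing case-insensitively.
    hits = [m.rule_id for m in matches if m.severity.lower() in severities]
    return {"is_threat": bool(hits), "detections": hits}
```

With `severities=frozenset({"critical", "high"})`, a match labelled `low` is dropped and does not mark the input as a threat.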

Sequence Diagram

sequenceDiagram
  participant User
  participant Flow as atr check input
  participant Action as atr_detection
  participant Engine as ATREngine
  participant Response
  User->>Flow: user input
  Flow->>Action: evaluate text
  Action->>Engine: create AgentEvent & evaluate
  Engine-->>Action: severity-labeled matches
  Action->>Action: filter by config severities
  alt is_threat == true
    alt exceptions enabled
      Action-->>Response: ATRDetectionRailException
    else
      Action->>User: "Threat detected: rule1, rule2"
      Flow->>Flow: stop
    end
  else
    Flow->>Response: continue processing
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title accurately and concisely describes the main change: adding an ATR detection library rail as a new feature. |
| Linked Issues check | ✅ Passed | All coding requirements from issue #1991 are met: ATR rail implementation with pyatr lazy import, configurable severity filtering, Colang 1.0/2.0 flows, test coverage, and optional-dependency pattern. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to implementing the ATR detection rail feature. No out-of-scope modifications were introduced. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Test Results For Major Changes | ✅ Passed | PR description documents testing information: 16 tests (13 unit, 3 E2E) added in test_atr_detection.py with coverage of configuration, severity filtering, and end-to-end behavior. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nemoguardrails/library/atr/actions.py`:
- Around line 149-150: The loop that validates severities currently calls
sev.lower() unguarded and will raise AttributeError for non-string items; update
the validation (in the loop that iterates "for sev in severities" in
nemoguardrails/library/atr/actions.py) to first check that sev is a str (e.g.,
if not isinstance(sev, str): raise ValueError(...)) before calling sev.lower(),
then verify sev.lower() is in VALID_SEVERITIES and raise a ValueError with a
clear message if not.

In `@tests/test_atr_detection.py`:
- Around line 281-309: The tests simulate user input but never execute the
bot/app generation, so ATR blocking/pass-through isn't actually validated;
update both tests (using TestChat) to call the code path that triggers ATR
evaluation (e.g., invoke chat.bot(...) or chat.app.generate(...) after queuing
input) instead of only using chat >> "…", ensuring the mocked
_ATREngine.evaluate is exercised and the response is produced and asserted
(refer to TestChat, chat.bot, chat.app.generate and _ATREngine.evaluate to
locate the spots to change).
- Around line 284-286: The tests patch only _ATREngine but _evaluate_atr() also
constructs _AgentEvent, which is None when pyatr isn't installed; update the E2E
test blocks (the ones currently patching "_ATREngine" in
tests/test_atr_detection.py around the blocks at lines ~284, ~296, ~327) to
patch "_AgentEvent" as well (e.g., with
patch("nemoguardrails.library.atr.actions._AgentEvent") and set its return_value
appropriately) so that both _ATREngine and _AgentEvent are mocked in pyatr-less
environments and event creation won't raise.
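The first inline comment (guarding `sev.lower()` with a type check) could be addressed along these lines. This is a sketch, not the PR's code: `normalize_severities` is a hypothetical name, and the members of `VALID_SEVERITIES` beyond `critical` and `high` are assumed for illustration.

```python
from typing import Any, FrozenSet

# Assumption: only "critical" and "high" are confirmed by the PR text;
# the remaining levels are placeholders for this sketch.
VALID_SEVERITIES: FrozenSet[str] = frozenset({"critical", "high", "medium", "low"})


def normalize_severities(severities: Any) -> FrozenSet[str]:
    """Validate and lowercase a configured severity list."""
    if not isinstance(severities, (list, tuple)):
        raise ValueError("severities must be a list or tuple of strings")
    normalized = set()
    for sev in severities:
        # Check the type first so non-string items produce a clear
        # ValueError instead of an AttributeError from .lower().
        if not isinstance(sev, str):
            raise ValueError(
                f"severity must be a string, got {type(sev).__name__}"
            )
        level = sev.lower()
        if level not in VALID_SEVERITIES:
            raise ValueError(
                f"unknown severity '{sev}'; expected one of {sorted(VALID_SEVERITIES)}"
            )
        normalized.add(level)
    return frozenset(normalized)
```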

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2e5da617-cc6d-473a-98d2-3ac844dd08cd

📥 Commits

Reviewing files that changed from the base of the PR and between 06233b7 and a2626e6.

📒 Files selected for processing (5)
  • nemoguardrails/library/atr/__init__.py
  • nemoguardrails/library/atr/actions.py
  • nemoguardrails/library/atr/flows.co
  • nemoguardrails/library/atr/flows.v1.co
  • tests/test_atr_detection.py

Comment thread nemoguardrails/library/atr/actions.py
Comment thread tests/test_atr_detection.py
Comment thread tests/test_atr_detection.py Outdated

greptile-apps Bot commented Jun 5, 2026

Contributor

Greptile Summary

Adds an Agent Threat Rules (ATR) input rail that evaluates user messages against the pyatr rule set locally, covering prompt injection, jailbreak, tool poisoning, MCP attacks, and skill compromise. The implementation follows the existing injection_detection rail pattern with a lazy import, severity-filtered detection, and a module-level engine cache.

  • actions.py introduces the atr_detection action with config validation, _ATREngine caching, and a TypedDict result type; config.py receives extra="allow" on RailsConfigData so the atr_detection YAML key is accepted without a dedicated Pydantic field.
  • Colang 1.0 (flows.co) and 2.0 (flows.v1.co) flows are provided; the 1.0 flow uses $-prefixed variable declarations while the Jinja template references them without $, diverging from the injection_detection reference and risking a silent empty rule list in the blocked message.
  • The action hard-errors with ValueError when the atr_detection config section is absent, which breaks the opt-in convention used by every peer rail in the library.

Confidence Score: 3/5

The action hard-errors when the config section is absent and the Colang 1.0 flow has a variable-naming inconsistency that may silently drop the rule-list from blocked messages; the engine-cache write is also unguarded against concurrent thread-pool execution.

Two issues directly affect observable behaviour on the changed path: the ValueError on missing config stops the rail from working with opt-in defaults (as every peer rail supports), and the $response / {{ response.detections }} inconsistency in flows.co can produce a blocked message with no rule IDs. The engine-cache initialisation is also unguarded against concurrent thread-pool execution.

nemoguardrails/library/atr/actions.py (config-validation error handling and engine-cache initialisation) and nemoguardrails/library/atr/flows.co (variable declaration style vs. Jinja template reference).

Important Files Changed

| Filename | Overview |
|---|---|
| nemoguardrails/library/atr/actions.py | Core ATR detection action with lazy pyatr import, severity filtering, and module-level engine cache; has a missing-config hard-error that differs from peer rails, and an unguarded engine-init race under thread-pool executors. |
| nemoguardrails/library/atr/flows.co | Colang 1.0 input-rail flow; declares variables with $ sigil but Jinja template references them without $, diverging from the injection_detection reference pattern and risking an empty rule-list in the blocked message. |
| nemoguardrails/library/atr/flows.v1.co | Colang 2.0 flow; correctly uses $ variables and bot say; mirrors the injection_detection pattern cleanly. |
| nemoguardrails/rails/llm/config.py | Adds extra="allow" to RailsConfigData so arbitrary rail config keys (like atr_detection) are accepted without a Pydantic field; minimal and safe change. |
| tests/test_atr_detection.py | Comprehensive unit and E2E tests with full pyatr mocking; covers config validation, severity filtering, and blocked/allowed message paths. |
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
nemoguardrails/library/atr/actions.py:127-145
**`ValueError` on missing config prevents opt-in use of the action**

`_validate_atr_config` raises `ValueError` when the `atr_detection` section is absent from the config, making the section mandatory. Every other rail in the library (e.g. `jailbreak_detection`, `injection_detection`) silently uses defaults when no config section is present. A user who wants to try ATR detection without any YAML edits will get a confusing `ValueError` instead of working detection with sensible defaults. Consider returning `DEFAULT_SEVERITIES` when the section is missing rather than treating its absence as a hard error.
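One possible reading of this suggestion, sketched under assumptions: `resolve_severities` is a hypothetical helper, and the config object is modelled with `types.SimpleNamespace` rather than the real `RailsConfigData`. The only behavior taken from the review is "fall back to `DEFAULT_SEVERITIES` when the section is missing".

```python
from types import SimpleNamespace
from typing import Any, FrozenSet

DEFAULT_SEVERITIES: FrozenSet[str] = frozenset({"critical", "high"})


def resolve_severities(rails_config_data: Any) -> FrozenSet[str]:
    """Return configured severities, or the defaults if nothing is configured."""
    section = getattr(rails_config_data, "atr_detection", None)
    if section is None:
        # Section absent: behave as opt-in with defaults, no error.
        return DEFAULT_SEVERITIES
    # The section may be a plain dict (extra field) or an object.
    if isinstance(section, dict):
        configured = section.get("severities")
    else:
        configured = getattr(section, "severities", None)
    if not configured:
        return DEFAULT_SEVERITIES
    return frozenset(s.lower() for s in configured)


# Usage with stand-in config objects:
no_section = SimpleNamespace()
with_section = SimpleNamespace(atr_detection={"severities": ["HIGH"]})
```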

### Issue 2 of 3
nemoguardrails/library/atr/actions.py:289-290
The `_cached_engine` global is read and written without any synchronisation. In environments that run the async action inside a thread-pool executor, two threads can both observe `_cached_engine is None` simultaneously and both construct a new `_ATREngine()`, discarding one after loading rules from disk. This is wasteful and potentially inconsistent if the rule-load itself is stateful. A `threading.Lock` around the creation block eliminates the race.

```suggestion
    if _cached_engine is None:
        import threading
        _engine_lock = getattr(atr_detection, "_engine_lock", None)
        if _engine_lock is None:
            atr_detection._engine_lock = threading.Lock()
        with atr_detection._engine_lock:
            if _cached_engine is None:
                _cached_engine = _ATREngine()
```
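As an alternative sketch of the same idea, the lock can be created once at module import time, which avoids having to lazily create the lock itself. The names `get_engine` and `factory` are hypothetical; in the PR the factory would presumably be `_ATREngine`, but that is an assumption here.

```python
import threading
from typing import Callable, Optional

_cached_engine: Optional[object] = None
# Created at import time, so there is no race on the lock's own creation.
_engine_lock = threading.Lock()


def get_engine(factory: Callable[[], object]) -> object:
    """Return the shared engine, constructing it at most once."""
    global _cached_engine
    if _cached_engine is None:  # fast path: no lock once initialised
        with _engine_lock:
            if _cached_engine is None:  # re-check while holding the lock
                _cached_engine = factory()
    return _cached_engine
```

The second `is None` check inside the `with` block is what prevents two threads that both passed the first check from each building an engine.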

### Issue 3 of 3
nemoguardrails/library/atr/flows.co:7-8
The Jinja template refers to `response.detections` but the Colang variable is `$response`. The `injection_detection/flows.co` reference implementation declares variables without the `$` sigil (`response = await ...`) and uses the same template. If the Colang 1.0 runtime only populates the Jinja context from non-`$` locals, the template will silently produce an empty string for the rule list. Aligning the declaration style with the reference avoids this ambiguity.

```suggestion
  response = await ATRDetectionAction(text=$user_message)
  join_separator = ", "
```

Reviews (5): Last reviewed commit: "fix: handle AttributeError when pyatr AP..." | Re-trigger Greptile

Comment thread nemoguardrails/library/atr/flows.v1.co Outdated
Comment thread nemoguardrails/library/atr/actions.py Outdated
Comment thread nemoguardrails/library/atr/actions.py Outdated
Comment thread nemoguardrails/library/atr/actions.py Outdated
- Colang 2.0: use `bot say` instead of bare `bot` for refusal message
  (silent failure in Colang 2.0 otherwise)
- Make DEFAULT_SEVERITIES and VALID_SEVERITIES frozenset (immutable)
- Add isinstance(str) guard before calling sev.lower() in validation
- Remove unreachable None-branch in _extract_atr_config; guard
  defensively for callers that bypass validation
- Cache ATREngine at module level to avoid per-request rule loading
- E2E tests: also patch _AgentEvent so tests work in pyatr-less CI
- Add _reset_cache fixture for test isolation with cached engine

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Oxygen56 commented Jun 5, 2026

Author

All four issues from the Greptile review have been addressed in commit d260323 ("fix: address review feedback for ATR detection rail"):

  1. ✅ bot → bot say in flows.v1.co
  2. ✅ set → frozenset for DEFAULT_SEVERITIES and VALID_SEVERITIES
  3. ✅ Module-level _cached_engine lazy init for ATREngine
  4. ✅ Comment added in _extract_atr_config explaining the guard is defensive, not dead code

The review was based on commit a2626e6 which predates these fixes.

Comment thread nemoguardrails/library/atr/actions.py Outdated
Oxygen56 and others added 2 commits June 6, 2026 10:07
…t assertions

- Add missing Set import to typing imports (line 35 of actions.py)
- Fix test_clean_input_passes_through to actually assert pass-through
  behavior using chat << instead of bare chat >>
- Fix test_threat_input_is_blocked to use chat << for proper execution
  instead of checking chat.history[-1] which is never populated

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Add model_config = ConfigDict(extra="allow") to RailsConfigData so
  extra fields like atr_detection are preserved and accessible via getattr
- Auto-fix ruff/ruff-format issues: unused imports, line length,
  import ordering, with-statement formatting
Comment thread nemoguardrails/library/atr/actions.py Outdated