Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
📝 Walkthrough

Adds two Guardrails Hub validators.

Sequence diagram (excerpt):

```mermaid
sequenceDiagram
    Client->>API: POST /api/v1/guardrails/run (input + configs)
```

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
🧹 Nitpick comments (4)
backend/app/core/validators/README.md (1)
377-377: Minor: Consider hyphenating "Profanity-Free" for consistency. The section title uses "Profanity Free", but compound adjectives typically use hyphens.
📝 Suggested fix

```diff
-### 9) Profanity Free Validator (`profanity_free`)
+### 9) Profanity-Free Validator (`profanity_free`)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/core/validators/README.md` at line 377, the section title "Profanity Free Validator (`profanity_free`)" should use a hyphen for the compound adjective; update the heading to "Profanity-Free Validator (`profanity_free`)" so it reads consistently with other compound-adjective headings and matches the validator identifier `profanity_free`.

backend/app/core/validators/config/llamaguard_7b_safety_validator_config.py (1)
1-1: Consider using built-in `list` instead of `typing.List`. Python 3.10+ (the project's minimum version) supports generic type hints directly on built-in types. Using `list[str]` instead of `List[str]` is the modern approach and aligns with Ruff's UP035 rule.

♻️ Suggested refactor

```diff
-from typing import List, Literal, Optional
+from typing import Literal, Optional
```

And on line 10:

```diff
-    policies: Optional[List[str]] = None
+    policies: Optional[list[str]] = None
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/core/validators/config/llamaguard_7b_safety_validator_config.py` at line 1, replace usages of typing.List with the built-in generic type: remove List from the import list in the top import line and change all type annotations that use List (e.g., List[str], List[int]) to use list[...] (e.g., list[str]); ensure the import statement keeps only Literal and Optional (or any other still-used typing names), and scan the file for any remaining occurrences of the symbol List to convert them to the built-in list form.

backend/app/tests/test_toxicity_hub_validators.py (2)
37-148: Consider parameterizing repeated assertion patterns to reduce drift. There is substantial duplication across classes (default/custom build args, on_fail mapping, invalid on_fail, wrong type, extra fields). Converting repeated cases to `pytest.mark.parametrize` plus a shared helper would reduce maintenance cost and make future validator additions safer.

Refactor sketch

```diff
+@pytest.mark.parametrize(
+    "config_cls,type_value,patch_target,kwargs,expected_kwargs",
+    [
+        (NSFWTextSafetyValidatorConfig, "nsfw_text", _NSFW_PATCH, {}, {"threshold": 0.8, "validation_method": "sentence", "device": "cpu", "model_name": "michellejieli/NSFW_text_classifier"}),
+        (ToxicLanguageSafetyValidatorConfig, "toxic_language", _TOXIC_PATCH, {}, {"threshold": 0.5, "validation_method": "sentence", "device": "cpu", "model_name": "unbiased-small"}),
+    ],
+)
+def test_build_forwards_expected_kwargs(config_cls, type_value, patch_target, kwargs, expected_kwargs):
+    config = config_cls(type=type_value, **kwargs)
+    with patch(patch_target) as mock_validator:
+        config.build()
+    _, actual = mock_validator.call_args
+    for k, v in expected_kwargs.items():
+        assert actual[k] == v
```

Also applies to: 155-277, 284-362, 369-504

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/tests/test_toxicity_hub_validators.py` around lines 37-148, refactor the repeated tests for LlamaGuard7BSafetyValidatorConfig by converting duplicate assertion patterns into parametrized tests: create pytest.mark.parametrize cases for policies (None, [], ["O1"], all_policies), for on_fail mapping ("fix"/"exception"/"rephrase"/invalid), and for schema validation (wrong type literal, extra fields); replace the repeated with patch(_LLAMAGUARD_PATCH) calls with a small helper that builds the config and returns mock_validator.call_args, and update assertions to use that helper so each behavior (default/custom policies, on_fail resolution, invalid on_fail, wrong type, extra fields) is covered by parametrized cases instead of duplicated test methods.
187-203: Add explicit out-of-range threshold tests (-0.01 / 1.01). You currently validate numeric type and boundary inclusion (0.0, 1.0), but there is no assertion that out-of-range values are rejected. If thresholds are intended to be constrained to [0, 1], add negative and above-one cases to lock that contract.

Also applies to: 274-277, 401-421, 502-504

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/tests/test_toxicity_hub_validators.py` around lines 187-203, add tests that assert out-of-range thresholds are rejected: create two new tests (e.g., test_build_with_threshold_below_zero and test_build_with_threshold_above_one) that instantiate NSFWTextSafetyValidatorConfig with threshold=-0.01 and threshold=1.01 respectively, patch _NSFW_PATCH as in the existing tests, and assert that calling config.build() raises a validation exception (use ValueError or the specific validation exception your code uses). Apply the same pattern to the other validator test blocks mentioned (around lines 274-277, 401-421, and 502-504) so each validator verifies thresholds outside [0, 1] are rejected.
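The contract the comment wants locked down can be sketched without the real configs. The stub below is an assumption standing in for Pydantic models like `NSFWTextSafetyValidatorConfig`; the point is the boundary behavior the new tests should assert.

```python
class ThresholdConfigStub:
    """Stand-in for configs like NSFWTextSafetyValidatorConfig; the real ones
    are Pydantic models, but the [0, 1] contract under test is the same."""

    def __init__(self, threshold: float = 0.8):
        if isinstance(threshold, bool) or not isinstance(threshold, (int, float)):
            raise ValueError(f"threshold must be numeric, got {type(threshold).__name__}")
        if not 0.0 <= threshold <= 1.0:
            raise ValueError(f"threshold must be in [0, 1], got {threshold}")
        self.threshold = float(threshold)

def rejects(value) -> bool:
    """True if constructing the config with this threshold raises."""
    try:
        ThresholdConfigStub(threshold=value)
    except ValueError:
        return True
    return False

# Boundary values are accepted; out-of-range values are rejected.
assert not rejects(0.0) and not rejects(1.0)
assert rejects(-0.01) and rejects(1.01)
```

In the real suite, a Pydantic model with `Field(ge=0.0, le=1.0)` would raise `ValidationError` instead of `ValueError`, so the tests should catch whichever exception the codebase actually uses.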
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9e3faf93-8398-459b-a53a-fa511b15fc40
📒 Files selected for processing (9)
- backend/app/api/API_USAGE.md
- backend/app/core/validators/README.md
- backend/app/core/validators/config/llamaguard_7b_safety_validator_config.py
- backend/app/core/validators/config/nsfw_text_safety_validator_config.py
- backend/app/core/validators/config/profanity_free_safety_validator_config.py
- backend/app/core/validators/config/toxic_language_safety_validator_config.py
- backend/app/core/validators/validators.json
- backend/app/schemas/guardrail_config.py
- backend/app/tests/test_toxicity_hub_validators.py
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/core/validators/README.md`:
- Around line 399-430: Update the README wording for the Profanity Free
Validator (the section titled "Profanity Free Validator" referring to
profanity_free_safety_validator_config.py and hub://guardrails/profanity_free)
to use hyphenated compound adjectives where appropriate: change phrases like
"Profanity Free" to "Profanity-Free", "model based" to "model-based", and
"matching based" to "matching-based" (and apply the same hyphenation to similar
compound forms such as "first-pass" if inconsistent) so the grammar is tightened
throughout that validator's documentation.
In `@backend/app/tests/test_guardrails_api_integration.py`:
- Around line 345-364: The docstrings for the tests are mislabelled against the
validator policy mapping: update the docstring in
test_input_guardrails_with_llamaguard_7b_geography_policy (the first test) and
the docstring in test_input_guardrails_with_llamaguard_7b_violence_policy (the
next test) so the policy identifiers (O2 vs O3) correctly describe the active
policy in each test and match the validator guide; locate the two functions by
their names and swap or rewrite the inline descriptions so the geography test
references the geography policy and the violence/sex-crimes test references the
correct O2/O3 label.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 96811ae3-73c9-4853-b1f7-191d9ac0af0c
📒 Files selected for processing (8)
- backend/app/api/API_USAGE.md
- backend/app/api/routes/guardrails.py
- backend/app/core/enum.py
- backend/app/core/validators/README.md
- backend/app/core/validators/validators.json
- backend/app/schemas/guardrail_config.py
- backend/app/tests/test_guardrails_api_integration.py
- backend/app/tests/test_toxicity_hub_validators.py
✅ Files skipped from review due to trivial changes (1)
- backend/app/core/enum.py
🚧 Files skipped from review as they are similar to previous changes (3)
- backend/app/schemas/guardrail_config.py
- backend/app/core/validators/validators.json
- backend/app/api/API_USAGE.md
Actionable comments posted: 1
🧹 Nitpick comments (2)
backend/app/api/docs/guardrails/run_guardrails.md (1)
11-20: Add a blank line after the table to comply with Markdown formatting standards. The table at lines 13-20 should be followed by a blank line before the next content (line 21).
📝 Proposed fix

```diff
 | `no_illegal_drugs` | No illegal drugs |
 | `no_encourage_self_harm` | No encouragement of self-harm |
+
 - `rephrase_needed=true` means the system could not safely auto-fix the input/output and wants the user to retry with a rephrased query.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/docs/guardrails/run_guardrails.md` around lines 11-20, add a single blank line after the policies markdown table in the llamaguard_7b section so the table (the rows starting with `no_violence_hate` through `no_encourage_self_harm`) is followed by an empty line before the next paragraph.

backend/app/core/validators/config/llamaguard_7b_safety_validator_config.py (1)
1-1: Use the modern `list` type hint instead of `typing.List`. `typing.List` is deprecated since Python 3.9; use the built-in `list` for type hints.

♻️ Proposed fix

```diff
-from typing import List, Literal, Optional
+from typing import Literal, Optional
```

And update the type hints:

```diff
-    policies: Optional[List[str]] = None
+    policies: Optional[list[str]] = None
-    def _resolve_policies(self) -> Optional[List[str]]:
+    def _resolve_policies(self) -> Optional[list[str]]:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/core/validators/config/llamaguard_7b_safety_validator_config.py` at line 1, replace deprecated typing.List with the built-in list: remove List from the import line and update any type annotations that reference List[...] to use list[...]; keep the Literal and Optional imports as-is, and ensure all occurrences (class attributes, function signatures, return types) are converted to the modern list[...] form.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/core/validators/config/llamaguard_7b_safety_validator_config.py`:
- Around line 21-32: The _resolve_policies() currently only maps human-readable
names via POLICY_NAME_MAP and rejects raw codes like "O1", so update
_resolve_policies (in llamaguard_7b_safety_validator_config.py) to accept both
forms: for each policy, first check POLICY_NAME_MAP.get(policy.lower()) and if
that returns None, then check if the policy (uppercased) matches a raw policy
code (e.g., "O1".."O6") and if so append the uppercased code unchanged;
otherwise raise the same ValueError. This keeps existing mapping behavior while
allowing tests that pass raw codes to succeed.
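A minimal sketch of the change this comment asks for, under stated assumptions: `POLICY_NAME_MAP` and the exact code set below are illustrative stand-ins for whatever the real config module defines, and the function is written standalone rather than as the method on the config class.

```python
from typing import Optional

# Illustrative subset; the real mapping lives in the validator config module.
POLICY_NAME_MAP = {
    "no_violence_hate": "O1",
    "no_sexual_content": "O2",
}
RAW_POLICY_CODES = {f"O{i}" for i in range(1, 7)}  # "O1" .. "O6"

def resolve_policies(policies: Optional[list[str]]) -> Optional[list[str]]:
    """Accept both human-readable names and raw LlamaGuard codes."""
    if policies is None:
        return None
    resolved = []
    for policy in policies:
        mapped = POLICY_NAME_MAP.get(policy.lower())
        if mapped is not None:
            resolved.append(mapped)
        elif policy.upper() in RAW_POLICY_CODES:
            # Raw codes like "O1" pass through unchanged (uppercased).
            resolved.append(policy.upper())
        else:
            raise ValueError(f"Unknown LlamaGuard policy: {policy!r}")
    return resolved

assert resolve_policies(["no_violence_hate", "o2", "O3"]) == ["O1", "O2", "O3"]
```

This keeps the existing name-mapping path first, so behavior for human-readable policies is unchanged, and only falls back to the raw-code check when the lookup misses.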
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4662bec1-1cd2-4eb3-84cc-265fb76badcb
📒 Files selected for processing (4)
- backend/app/api/docs/guardrails/run_guardrails.md
- backend/app/core/validators/README.md
- backend/app/core/validators/config/llamaguard_7b_safety_validator_config.py
- backend/app/tests/test_guardrails_api_integration.py
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/tests/test_guardrails_api_integration.py
Co-authored-by: dennyabrain <denny.george90@gmail.com>
Summary
Target issue is #81.
As we expand safety coverage in our validation pipeline, we need stronger and more layered defenses against harmful content. The current system relies primarily on rule-based and lexical validators, which are effective but have limitations:
The validators introduced in this PR aim to mitigate the following categories of harm:
This PR introduces two complementary validators to address these gaps:
LLAMA Guard 7B
Uses the Meta AI LlamaGuard-7B model via Guardrails Hub to classify text as safe or unsafe.
Evaluates content against configurable safety policies:
Profanity Free
Additional Changes
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
Run `fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.

Notes
Please add here if any other information is required for the reviewer.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests