Conversation
📝 Walkthrough

This PR adds support for three new toxicity/safety validators (…).

Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Note: This diff contains multiple unresolved Git merge conflicts across 7+ files (enum.py, validators.json, README.md, guardrail_config.py, pyproject.toml, and test files), which significantly impacts review complexity. Conflicts must be resolved before merge. Additionally, the heterogeneity of changes (Docker, enums, configs, schemas, tests, docs) and logic density (policy mapping, validator instantiation) warrant elevated scrutiny.
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 18
🧹 Nitpick comments (3)
backend/scripts/install_guardrails_from_hub.sh (1)
9-9: Consider the implications of enabling remote inferencing by default.

Changing the `ENABLE_REMOTE_INFERENCING` default from `"false"` to `"true"` means data will be sent to Guardrails AI's remote inference endpoints by default. While the AI summary indicates this aligns with updated documentation, ensure this is an intentional decision given:
- Privacy: User input data may be transmitted to external services
- Latency: Remote calls add network overhead
- Cost: Remote inferencing may incur API usage charges
If this is intentional for the new NSFW/toxicity validators, consider documenting why remote inferencing is now preferred in the script comments.
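If the default stays enabled, the documented declaration might look like this sketch (the comment wording and the fallback-assignment form are assumptions; only the variable name and `"true"` default come from the review):

```shell
# Remote inferencing sends validator inputs to Guardrails AI's hosted
# endpoints. Enabled by default so the NSFW/toxicity validators work out of
# the box; set to "false" to keep all inference local (no data egress, but
# larger local model downloads and slower cold starts).
ENABLE_REMOTE_INFERENCING="${ENABLE_REMOTE_INFERENCING:-true}"
echo "ENABLE_REMOTE_INFERENCING=${ENABLE_REMOTE_INFERENCING}"
```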
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/scripts/install_guardrails_from_hub.sh` at line 9, The default for ENABLE_REMOTE_INFERENCING was changed to "true", which causes user data to be sent to Guardrails AI remote endpoints by default; either revert the default back to "false" or, if leaving it enabled intentionally for NSFW/toxicity validators, add a clear comment above the ENABLE_REMOTE_INFERENCING declaration and update relevant docs explaining privacy/latency/cost implications and that enabling remote inferencing sends user input to external services—refer to the ENABLE_REMOTE_INFERENCING variable in this script and the install_guardrails_from_hub.sh header for where to add the explanatory comment and link to documentation or opt-in instructions.

backend/app/api/routes/guardrails.py (1)
186-190: Only the first validator's metadata is returned.

When multiple validators have `validator_metadata` set (e.g., multiple validators failed with empty `fix_value`), only the first one's metadata is surfaced to the API response. This may obscure which validators contributed to the outcome.

Consider aggregating metadata from all validators that set it, or at minimum document this "first-wins" behavior:
♻️ Optional: Aggregate all validator metadata
```diff
- meta = next(
-     (v.validator_metadata for v in validators if v.validator_metadata),
-     None,
- )
+ all_meta = [v.validator_metadata for v in validators if v.validator_metadata]
+ meta = {"validators": all_meta} if all_meta else None
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/routes/guardrails.py` around lines 186 - 190, The current code uses next(...) to pick only the first non-empty validator_metadata from validators and passes it as meta to APIResponse.success_response, which hides other validators' metadata; replace that logic in the function that builds the response (referencing validators, validator_metadata, meta, response_model and the call to APIResponse.success_response) with aggregation: collect all non-empty v.validator_metadata entries (e.g., into a list or merged dict depending on schema), deduplicate/merge as appropriate, and pass the aggregated metadata to APIResponse.success_response instead of the single-first value (or explicitly document the first-wins behavior if aggregation is undesired).

backend/README.md (1)
215-215: Use descriptive link text instead of "here".

The link text "here" is non-descriptive. Screen readers and users benefit from meaningful link text that describes the destination.
📝 Suggested improvement
```diff
-1. Ensure that the `.env` file contains the correct value for `GUARDRAILS_HUB_API_KEY`. The key can be fetched from [here](https://hub.guardrailsai.com/keys).
+1. Ensure that the `.env` file contains the correct value for `GUARDRAILS_HUB_API_KEY`. The key can be fetched from the [Guardrails Hub API Keys page](https://hub.guardrailsai.com/keys).
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/README.md` at line 215, Replace the non-descriptive link text "here" with meaningful descriptive text that explains the destination (for example "Guardrails Hub API keys page" or "Guardrails Hub API key management") in the README sentence about GUARDRAILS_HUB_API_KEY so the link reads like "The key can be fetched from Guardrails Hub API keys page" and improves accessibility and clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/api/API_USAGE.md`:
- Around line 103-107: Remove the unresolved merge conflict markers and update
the `type=` filter list to include `nsfw_text` (i.e., ensure the line reads with
`type=uli_slur_match|pii_remover|gender_assumption_bias|ban_list|llm_critic|topic_relevance|llamaguard_7b|profanity_free|nsfw_text`),
keeping the rest of the documentation text unchanged so the PR objective (adding
nsfw_text) is reflected.
- Around line 468-471: Resolve the unresolved merge conflict in the validator
types list by removing the Git conflict markers (<<<<<<<, =======, >>>>>>>) and
ensuring the `nsfw_text` entry is present in the list of validators; update the
block that currently contains the conflict markers so it reads a clean list that
includes `nsfw_text` alongside the other validator type names.
In `@backend/app/core/enum.py`:
- Around line 38-41: Remove the git conflict markers and restore the missing
enum member by keeping the line defining NSFWText = "nsfw_text" inside the
appropriate Enum (the ValidatorType/enum class in backend/app/core/enum.py);
delete the "<<<<<<< HEAD", "=======", and ">>>>>>> ..." lines so the file is
valid Python and the NSFWText member is present for
ValidatorBase/ValidatorUpdate and API validation to work.
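Restored, the enum member would look roughly like this sketch (the class name `ValidatorType` and the neighbouring members are assumptions; only `NSFWText = "nsfw_text"` comes from the prompt above):

```python
from enum import Enum


class ValidatorType(str, Enum):
    # Neighbouring members are illustrative; the real enum lists every
    # supported validator type.
    ProfanityFree = "profanity_free"
    LlamaGuard7B = "llamaguard_7b"
    NSFWText = "nsfw_text"  # the member dropped by the merge conflict
```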
In `@backend/app/core/validators/config/base_validator_config.py`:
- Around line 22-36: Remove the unresolved merge conflict markers and ensure the
branch that sets validator_metadata also returns the empty string; specifically,
inside the _on_fix method where validator_metadata is assigned (using self.type
in the reason), keep the version that includes the return "" immediately after
setting validator_metadata so _on_fix returns "" instead of falling through to
return the falsy fix_value; remove the conflict markers (<<<<<<<, =======,
>>>>>>>) and any duplicate blocks so the method is a single consistent
implementation.
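The single consistent implementation described above might read like this sketch (the class scaffolding and metadata shape are assumptions; the essential point from the prompt is the explicit `return ""` after setting `validator_metadata`):

```python
class BaseValidatorConfig:
    """Illustrative stand-in for the real base config class."""

    def __init__(self, type_: str) -> None:
        self.type = type_
        self.validator_metadata: dict | None = None

    def _on_fix(self, fix_value: str) -> str:
        if not fix_value:
            # Record why no usable fix text was produced, then return ""
            # explicitly instead of falling through with the falsy fix_value.
            self.validator_metadata = {
                "reason": f"{self.type} validator produced an empty fix_value"
            }
            return ""
        return fix_value
```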
In `@backend/app/core/validators/config/llamaguard_7b_safety_validator_config.py`:
- Around line 5-9: There is an unresolved merge conflict and an unused import:
remove the conflict markers (<<<<<<< HEAD, =======, >>>>>>> ...) and delete the
unused import of GuardrailOnFail so only the necessary import remains (keep
BaseValidatorConfig). Ensure the import block is clean and linter-safe and run
tests/linters to confirm no unused-import errors remain.
In `@backend/app/core/validators/README.md`:
- Around line 17-20: The Supported Validators section in
backend/app/core/validators/README.md contains unresolved merge conflict
markers; remove the conflict markers (<<<<<<<, =======, >>>>>>>) and restore the
`nsfw_text` entry (source: `hub://guardrails/nsfw_text`) so the README lists
`nsfw_text` as intended in that section.
- Around line 416-458: Resolve the unresolved merge conflict in the README's
NSFW Text Validator section by removing the conflict markers (<<<<<<< HEAD,
=======, >>>>>>> 60f5067) and keeping the HEAD content under the "NSFW Text
Validator (`nsfw_text`)" heading (the full explanatory block including Config,
Source, Parameters, Notes/limitations and Recommendation). Ensure there are no
leftover conflict markers and the section reads as a single cohesive
paragraph/block as in the HEAD version.
- Around line 541-545: Remove the unresolved merge conflict markers (<<<<<<<
HEAD, =======, >>>>>>> ...) from the Related Files section in the README and
ensure the file list contains the intended entries; replace the conflict block
with a clean list that includes both `nsfw_text_safety_validator_config.py` and
`profanity_free_safety_validator_config.py` (or only the correct one if project
intends a single entry), making the Related Files section syntactically valid
Markdown.
In `@backend/app/core/validators/validators.json`:
- Around line 42-50: Resolve the Git merge conflict in validators.json by
removing the conflict markers (<<<<<<< HEAD, =======, >>>>>>>) and keeping the
HEAD entry for the new validator object (the "nsfw_text" object with "type":
"nsfw_text", "version": "0.1.0", "source": "hub://guardrails/nsfw_text"); ensure
the surrounding JSON array/object punctuation is corrected (commas/braces) so
the file is valid JSON and the validators manifest loads successfully.
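Resolved, the manifest entry would be a single clean object (values taken from the prompt above; its placement within the surrounding array is an assumption):

```json
{
  "type": "nsfw_text",
  "version": "0.1.0",
  "source": "hub://guardrails/nsfw_text"
}
```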
In `@backend/app/schemas/guardrail_config.py`:
- Around line 30-38: There is an unresolved merge conflict in
guardrail_config.py around the imports: remove the conflict markers (<<<<<<<
HEAD and >>>>>>>) and restore the intended imports so both
NSFWTextSafetyValidatorConfig and ProfanityFreeSafetyValidatorConfig are
imported; ensure the import block contains NSFWTextSafetyValidatorConfig (as
requested) alongside ProfanityFreeSafetyValidatorConfig and that there are no
leftover conflict tokens.
- Around line 47-55: The discriminated union list for the validator configs has
an unresolved merge conflict that omitted NSFWTextSafetyValidatorConfig,
breaking recognition of payloads with type "nsfw_text"; fix by removing the
conflict markers and ensuring NSFWTextSafetyValidatorConfig is included
alongside LlamaGuard7BSafetyValidatorConfig, ProfanityFreeSafetyValidatorConfig,
and TopicRelevanceSafetyValidatorConfig in the union used with
Field(discriminator="type") so Pydantic can properly dispatch on type.
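A minimal sketch of the discriminated-union wiring (Pydantic v2; the config classes here are simplified stand-ins, not the project's real field sets):

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field


class ProfanityFreeSafetyValidatorConfig(BaseModel):
    type: Literal["profanity_free"] = "profanity_free"


class NSFWTextSafetyValidatorConfig(BaseModel):
    type: Literal["nsfw_text"] = "nsfw_text"
    threshold: float = 0.8  # assumed default


# Pydantic dispatches on the "type" field to pick the right config class,
# so a payload with type "nsfw_text" only parses if the NSFW config is in
# the union.
ValidatorConfig = Annotated[
    Union[ProfanityFreeSafetyValidatorConfig, NSFWTextSafetyValidatorConfig],
    Field(discriminator="type"),
]


class GuardrailConfig(BaseModel):
    validators: list[ValidatorConfig]
```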
In `@backend/app/tests/test_guardrails_api_integration.py`:
- Around line 327-388: The file contains unresolved Git merge conflict markers
(<<<<<<<, =======, >>>>>>>) around the NSFWText test block which breaks pytest
import; remove the conflict markers and reconcile the duplicated sections so
only one coherent set of tests remains (keep the intended tests like
test_input_guardrails_with_nsfw_text_on_explicit_content,
test_input_guardrails_with_nsfw_text_with_low_threshold,
test_input_guardrails_with_nsfw_text_exception_action and any other tests
referenced later), ensuring there are no leftover markers elsewhere (notably the
similar region flagged at lines ~438-537) and that the module imports and test
function names are intact.
- Around line 349-366: The test
test_input_guardrails_with_nsfw_text_with_low_threshold currently uses the same
explicit prompt as the default-threshold test so it doesn't prove threshold
handling; replace the input with a borderline/ambiguous NSFW sample (e.g.,
suggestive but not explicit) and assert that the default-threshold validator
(the other test) returns success while this test, which posts the same
borderline input but with
{"type":"nsfw_text","threshold":0.1,"on_fail":"exception"} to VALIDATE_API_PATH
(using the same request_id, organization_id, project_id) returns failure —
ensuring different outcomes prove the threshold parameter is respected.
- Around line 229-245: The test
test_input_guardrails_with_profanity_free_on_profane_text currently only asserts
that the sanitized value differs from the original, which can hide a broken fix
path where fix_value falls back to "" in base_validator_config.py; update the
assertions to (1) assert that body["data"][SAFE_TEXT_FIELD] is a non-empty
string and (2) assert that known profane tokens from the original input (e.g.,
"damn", "fucking") are not present in body["data"][SAFE_TEXT_FIELD] so the fix
is both usable and actually removed profanities.
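The strengthened assertions could be factored as below (the `SAFE_TEXT_FIELD` constant name and response shape follow the review comment; the helper function name is hypothetical):

```python
SAFE_TEXT_FIELD = "safe_text"  # assumed constant name from the test module


def assert_profanity_fixed(body: dict, profane_tokens: list[str]) -> None:
    """Check that the fix path produced usable, actually-sanitized text."""
    safe_text = body["data"][SAFE_TEXT_FIELD]
    # (1) the fix produced a non-empty string, not a "" fallback
    assert isinstance(safe_text, str) and safe_text.strip()
    # (2) the known profane tokens were actually removed
    for token in profane_tokens:
        assert token.lower() not in safe_text.lower()
```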
In `@backend/app/tests/test_toxicity_hub_validators.py`:
- Around line 13-27: Remove the unresolved merge markers and restore the HEAD
version: keep the import of NSFWTextSafetyValidatorConfig and the _NSFW_PATCH
assignment, retain the _LLAMAGUARD_PATCH constant, and delete the conflict lines
(<<<<<<<, =======, >>>>>>>). Ensure the module defines
NSFWTextSafetyValidatorConfig, _LLAMAGUARD_PATCH (value
"app.core.validators.config.llamaguard_7b_safety_validator_config.LlamaGuard7B")
and _NSFW_PATCH (value
"app.core.validators.config.nsfw_text_safety_validator_config.NSFWText") exactly
as in HEAD so imports and patch constants resolve.
- Around line 294-451: Resolve the merge conflict by removing the conflict
markers and keeping the HEAD version of the TestNSFWTextSafetyValidatorConfig
class (delete the >>>>>>> and <<<<<<< blocks and the redundant empty region),
and update the failing expectation in test_build_with_defaults to assert
model_name == "textdetox/xlmr-large-toxicity-classifier" so the test matches the
config default; ensure all tests in TestNSFWTextSafetyValidatorConfig (including
test_build_with_defaults) refer to the correct default model_name and that there
are no leftover conflict markers anywhere in that class.
In `@backend/pyproject.toml`:
- Around line 53-61: Resolve the merge conflict in pyproject.toml so the
CPU-only PyTorch source config is activated: ensure the dependency line for
torch remains as "torch>=2.0.0" in the [tool.poetry.dependencies] (or equivalent
dependencies section) and keep the [tool.uv.sources] and [[tool.uv.index]]
blocks intact (the entries torch = [{ index = "pytorch-cpu" }] and the
pytorch-cpu index URL). Remove conflict markers and ensure only the incoming
CPU-wheel index configuration and the torch>=2.0.0 dependency coexist so the
project pulls CPU-only wheels.
- Around line 35-39: The pyproject.toml contains unresolved git conflict markers
(<<<<<<< HEAD, =======, >>>>>>>) around the dependency block; remove the
conflict markers and restore the required dependencies by keeping
"transformers>=5.0.0" and "torch>=2.0.0" in the [project] / dependencies list so
dependency resolution succeeds and the Dockerfile and NSFWText validator that
rely on transformers/torch continue to work.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0055de2d-430a-49bd-94b5-0c5735342722
📒 Files selected for processing (17)
- backend/Dockerfile
- backend/README.md
- backend/app/api/API_USAGE.md
- backend/app/api/docs/guardrails/run_guardrails.md
- backend/app/api/routes/guardrails.py
- backend/app/core/enum.py
- backend/app/core/validators/README.md
- backend/app/core/validators/config/base_validator_config.py
- backend/app/core/validators/config/llamaguard_7b_safety_validator_config.py
- backend/app/core/validators/config/nsfw_text_safety_validator_config.py
- backend/app/core/validators/config/profanity_free_safety_validator_config.py
- backend/app/core/validators/validators.json
- backend/app/schemas/guardrail_config.py
- backend/app/tests/test_guardrails_api_integration.py
- backend/app/tests/test_toxicity_hub_validators.py
- backend/pyproject.toml
- backend/scripts/install_guardrails_from_hub.sh
Co-authored-by: dennyabrain <denny.george90@gmail.com>
Resolved conflicts keeping nsfw_text validator features from this branch. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…fier Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t model Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai full review

✅ Actions performed: Full review triggered.
Summary
Target issue is #84
Explain the motivation for making this change. What existing problem does the pull request solve?
This PR introduces a new validator:
`nsfw_text`, powered by Guardrails AI, with support for custom HuggingFace models. We configure it to use:
textdetox/xlmr-large-toxicity-classifier
This significantly improves our ability to detect multilingual, code-mixed, and context-aware unsafe content, addressing gaps in the current validator stack.
Our existing validator suite (profanity, slur detection, toxic language, etc.) primarily relies on:
While effective for straightforward cases, they fail in more complex real-world scenarios, such as:
Existing Validators (Context)
To make this PR self-contained, here’s a quick overview of what we already use and where they fall short:
We need a strong model-based classifier to complement these.
What This PR Adds
1. New Validator:
`nsfw_text`

File: `backend/app/core/validators/config/nsfw_text_safety_validator_config.py`

2. Custom Model Integration
Instead of using default models, we configure: textdetox/xlmr-large-toxicity-classifier
This model is
Design Decision
Previous Approach (Discarded)
Current Approach
- `NSFWText` validator
- `model_name`

Benefits
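Sketched as a config object, the current approach might look like this (the class and `model_name` field follow the PR text and the test expectations quoted in the review; the `threshold` and `on_fail` defaults are assumptions):

```python
from dataclasses import dataclass


@dataclass
class NSFWTextSafetyValidatorConfig:
    """Hypothetical sketch: the HuggingFace model is configurable via
    model_name instead of being hard-coded in the validator."""

    type: str = "nsfw_text"
    model_name: str = "textdetox/xlmr-large-toxicity-classifier"
    threshold: float = 0.8  # assumed default
    on_fail: str = "exception"  # assumed default action
```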
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
`fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.

Notes
Please add here if any other information is required for the reviewer.
Summary by CodeRabbit
New Features
Documentation
Tests
Chores