
Toxicity Detection validators #80

Merged
rkritika1508 merged 15 commits into main from feat/toxicity-hub-validators
Apr 10, 2026

Conversation

Collaborator

@rkritika1508 rkritika1508 commented Apr 1, 2026

Summary

Target issue is #81.
As we expand safety coverage in our validation pipeline, we need stronger and more layered defenses against harmful content. The current system relies primarily on rule-based and lexical validators, which are effective but have limitations:

  • They may miss nuanced or context-dependent harmful content (e.g., indirect violence, coded language).
  • They may overfit to keyword matching, leading to false positives or missed cases.
  • They lack a model-level understanding of intent and semantics.

The validators introduced in this PR aim to mitigate the following categories of harm:

  • Violence / Hate Speech – abusive, threatening, or discriminatory content
  • Sexual Content – explicit or inappropriate material
  • Criminal Planning / Weapons – instructions or facilitation of illegal activity
  • Self-harm Encouragement – harmful mental health content
  • Profanity / Toxic Language – offensive or inappropriate language

This PR introduces two complementary validators to address these gaps:

LlamaGuard 7B
Uses the Meta AI LlamaGuard-7B model via Guardrails Hub to classify text as safe or unsafe.
Evaluates content against configurable safety policies:

  • Violence / hate
  • Sexual content
  • Criminal planning
  • Weapons
  • Illegal drugs
  • Self-harm encouragement

Profanity Free

  • Detects profanity using a linear SVM model (alt-profanity-check)
  • Fails validation if profane content is detected
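The two validators can be combined in a single run request. A hedged sketch of such a payload follows; the field names (`input`, `validators`) are assumptions based on the documented config options, and the actual schema lives in backend/app/schemas/guardrail_config.py:

```python
# Hypothetical request payload combining both validators; field names are
# illustrative, not the repository's confirmed schema.
payload = {
    "input": "User text to screen before it reaches the model.",
    "validators": [
        {
            "type": "llamaguard_7b",
            "policies": ["no_violence_hate", "no_encourage_self_harm"],
            "on_fail": "exception",  # raise on unsafe content
        },
        {
            "type": "profanity_free",
            "on_fail": "fix",  # strip/repair profane content instead of failing
        },
    ],
}
```

Layering a model-based check (LlamaGuard) with a lexical one (profanity_free) covers both nuanced intent and surface-level language.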

Additional Changes

  • Added unit tests for each validator
  • Added integration tests for validator combinations
  • Updated API documentation to include:
    • New validator types
    • Configuration options
    • Policy controls

Checklist

Before submitting a pull request, please ensure that you complete these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested the changes.
  • If you've fixed a bug or added code, ensure it is covered by test cases.

Notes

Add here any other information the reviewer may need.

Summary by CodeRabbit

  • New Features

    • Added LlamaGuard 7B validator (configurable human-readable policies) and Profanity Free validator (automatic profanity handling with configurable on-fail behavior).
  • Bug Fixes

    • Validator logging no longer attempts to persist when no validation result exists.
  • Documentation

    • API and runtime docs updated to list the new validators, policy options, default behaviors, and configuration examples.
  • Tests

    • Added unit and integration tests covering the new validators and on-fail behaviors.

@coderabbitai

coderabbitai bot commented Apr 1, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds two Guardrails Hub validators—llamaguard_7b and profanity_free—including config classes, schema and enum updates, manifest entries, tests, docs, a runtime dependency, and a small API change to skip logging when a validator returns no result.

Changes

  • Documentation — backend/app/api/API_USAGE.md, backend/app/core/validators/README.md, backend/app/api/docs/guardrails/run_guardrails.md: Documented new validators (llamaguard_7b, profanity_free), config fields (policies, on_fail), default strategies, and run-time behavior.
  • New Validator Configs — backend/app/core/validators/config/llamaguard_7b_safety_validator_config.py, backend/app/core/validators/config/profanity_free_safety_validator_config.py: Added LlamaGuard7BSafetyValidatorConfig (with policy name→code mapping and resolver) and ProfanityFreeSafetyValidatorConfig; both implement build() to instantiate Hub validators.
  • Schema & Core — backend/app/schemas/guardrail_config.py, backend/app/core/enum.py, backend/app/core/validators/validators.json: Extended the ValidatorConfigItem discriminator union; added ValidatorType members (llm_critic, llamaguard_7b, profanity_free); appended the new validators to the manifest.
  • API route — backend/app/api/routes/guardrails.py: add_validator_logs() now skips creating/persisting logs when log.validation_result is None.
  • Tests — backend/app/tests/test_toxicity_hub_validators.py, backend/app/tests/test_guardrails_api_integration.py: Added unit tests for the new config classes (Pydantic validation, build behavior, policy mapping, on_fail semantics) and integration tests covering single and combined validator flows and on_fail variants.
  • Dependencies — backend/pyproject.toml: Added huggingface-hub>=1.5.0,<2.0 to runtime dependencies.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant API as API Server
    participant ValidatorManager as Validator Config / Builder
    participant GuardrailsHub as Guardrails Hub (remote/local)
    participant DB as DB / Logs

    Client->>API: POST /api/v1/guardrails/run (input + configs)
    API->>ValidatorManager: Parse configs (discriminator by type)
    ValidatorManager->>ValidatorManager: Build validator instances (LlamaGuard7B, ProfanityFree)
    API->>GuardrailsHub: Execute validator(s) (pass policies/on_fail)
    GuardrailsHub-->>API: Validation result (PassResult / FailResult / None)
    alt validation_result is None
        API->>DB: (skip) no log created
    else validation_result present
        API->>DB: Persist ValidatorLog (result, on_fail)
    end
    API-->>Client: Response (success, output, metadata)
```
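The skip/persist branch in the diagram corresponds to the small API change in add_validator_logs(): when a validator returns no result, no log row is created. A minimal sketch follows; the class fields and the in-memory "db" stand-in are assumptions for illustration, not the repository's actual persistence code:

```python
# Illustrative sketch of the logging guard: logs without a validation
# result are skipped rather than persisted. Names are hypothetical.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidatorLog:
    validator: str
    validation_result: Optional[str]  # e.g. "pass", "fail", or None


def add_validator_logs(logs, db):
    """Persist only logs that carry a validation result."""
    persisted = []
    for log in logs:
        if log.validation_result is None:
            continue  # skip: nothing to record for this validator
        db.append(log)  # stand-in for a real DB session add/commit
        persisted.append(log)
    return persisted
```

Guarding before persistence avoids writing empty rows when a validator short-circuits without producing a PassResult or FailResult.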

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • nishika26
  • AkhileshNegi

Poem

🐰 I hopped through code with tiny paws,
Two new guards to mind the laws,
Llama checks and policies arrayed,
Profanity trimmed, no mess made,
Cheers from me — a tidy deploy hooray! 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 9.09%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title 'Toxicity Detection validators' directly matches the PR's main objective of adding toxicity/profanity detection validators (LlamaGuard7B and ProfanityFree) to the guardrails pipeline. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (4)
backend/app/core/validators/README.md (1)

377-377: Minor: Consider hyphenating "Profanity-Free" for consistency.

The section title uses "Profanity Free" but compound adjectives typically use hyphens.

📝 Suggested fix
-### 9) Profanity Free Validator (`profanity_free`)
+### 9) Profanity-Free Validator (`profanity_free`)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/core/validators/README.md` at line 377, The section title
"Profanity Free Validator (`profanity_free`)" should use a hyphen for the
compound adjective; update the heading to "Profanity-Free Validator
(`profanity_free`)" so it reads consistently with other compound-adjective
headings and matches the validator identifier `profanity_free`.
backend/app/core/validators/config/llamaguard_7b_safety_validator_config.py (1)

1-1: Consider using built-in list instead of typing.List.

Python 3.10+ (the project's minimum version) supports generic type hints directly on built-in types. Using list[str] instead of List[str] is the modern approach and aligns with Ruff's UP035 rule.

♻️ Suggested refactor
-from typing import List, Literal, Optional
+from typing import Literal, Optional

And on line 10:

-    policies: Optional[List[str]] = None
+    policies: Optional[list[str]] = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/core/validators/config/llamaguard_7b_safety_validator_config.py`
at line 1, Replace usages of typing.List with the built-in generic type: remove
List from the import list in the top import line and change all type annotations
that use List (e.g., List[str], List[int], etc.) to use list[...] (e.g.,
list[str]) in llamaguard_7b_safety_validator_config.py; ensure you update the
import statement to keep only Literal and Optional (or any other still-used
typing names) and scan the file for any remaining occurrences of the symbol List
to convert them to the built-in list form.
backend/app/tests/test_toxicity_hub_validators.py (2)

37-148: Consider parameterizing repeated assertion patterns to reduce drift.

There is substantial duplication across classes (default/custom build args, on_fail mapping, invalid-on_fail, wrong type, extra fields). Converting repeated cases to pytest.mark.parametrize + shared helper would reduce maintenance cost and make future validator additions safer.

Refactor sketch
+@pytest.mark.parametrize(
+    "config_cls,type_value,patch_target,kwargs,expected_kwargs",
+    [
+        (NSFWTextSafetyValidatorConfig, "nsfw_text", _NSFW_PATCH, {}, {"threshold": 0.8, "validation_method": "sentence", "device": "cpu", "model_name": "michellejieli/NSFW_text_classifier"}),
+        (ToxicLanguageSafetyValidatorConfig, "toxic_language", _TOXIC_PATCH, {}, {"threshold": 0.5, "validation_method": "sentence", "device": "cpu", "model_name": "unbiased-small"}),
+    ],
+)
+def test_build_forwards_expected_kwargs(config_cls, type_value, patch_target, kwargs, expected_kwargs):
+    config = config_cls(type=type_value, **kwargs)
+    with patch(patch_target) as mock_validator:
+        config.build()
+    _, actual = mock_validator.call_args
+    for k, v in expected_kwargs.items():
+        assert actual[k] == v

Also applies to: 155-277, 284-362, 369-504

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/test_toxicity_hub_validators.py` around lines 37 - 148,
Refactor the repeated tests for LlamaGuard7BSafetyValidatorConfig by converting
duplicate assertion patterns into parametrized tests: create
pytest.mark.parametrize cases for policies (None, [], ["O1"], all_policies), for
on_fail mapping ("fix"/"exception"/"rephrase"/invalid) and for schema validation
(wrong type literal, extra fields), and replace the repeated with
patch(_LLAMAGUARD_PATCH) calls with a small helper that builds the config and
returns mock_validator.call_args; update assertions to use that helper and
reference LlamaGuard7BSafetyValidatorConfig, _LLAMAGUARD_PATCH, OnFailAction,
and pytest.mark.parametrize so each behavior (default/custom policies, on_fail
resolution, invalid on_fail, wrong type, extra fields) is covered by
parametrized cases instead of duplicated test methods.

187-203: Add explicit out-of-range threshold tests (-0.01 / 1.01).

You currently validate numeric type and boundary inclusion (0.0, 1.0), but there is no assertion that out-of-range values are rejected. If thresholds are intended to be constrained to [0, 1], add negative and above-one cases to lock that contract.

Also applies to: 401-421, 274-277, 502-504

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/test_toxicity_hub_validators.py` around lines 187 - 203,
Add tests that assert out-of-range thresholds are rejected: create two new tests
(e.g., test_build_with_threshold_below_zero and
test_build_with_threshold_above_one) that instantiate
NSFWTextSafetyValidatorConfig with threshold=-0.01 and threshold=1.01
respectively, patch _NSFW_PATCH as in the existing tests, and assert that
calling config.build() raises a validation exception (use ValueError or the
specific validation exception your code uses). Apply the same pattern to the
other validator test blocks mentioned (the ranges around lines 274-277, 401-421,
and 502-504) so each validator verifies thresholds outside [0,1] are rejected.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9e3faf93-8398-459b-a53a-fa511b15fc40

📥 Commits

Reviewing files that changed from the base of the PR and between 791820f and 650369c.

📒 Files selected for processing (9)
  • backend/app/api/API_USAGE.md
  • backend/app/core/validators/README.md
  • backend/app/core/validators/config/llamaguard_7b_safety_validator_config.py
  • backend/app/core/validators/config/nsfw_text_safety_validator_config.py
  • backend/app/core/validators/config/profanity_free_safety_validator_config.py
  • backend/app/core/validators/config/toxic_language_safety_validator_config.py
  • backend/app/core/validators/validators.json
  • backend/app/schemas/guardrail_config.py
  • backend/app/tests/test_toxicity_hub_validators.py

@dennyabrain dennyabrain marked this pull request as draft April 1, 2026 04:31
@rkritika1508 rkritika1508 marked this pull request as ready for review April 1, 2026 04:32
@rkritika1508 rkritika1508 marked this pull request as draft April 1, 2026 04:48
@rkritika1508 rkritika1508 marked this pull request as ready for review April 7, 2026 09:03
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/core/validators/README.md`:
- Around line 399-430: Update the README wording for the Profanity Free
Validator (the section titled "Profanity Free Validator" referring to
profanity_free_safety_validator_config.py and hub://guardrails/profanity_free)
to use hyphenated compound adjectives where appropriate: change phrases like
"Profanity Free" to "Profanity-Free", "model based" to "model-based", and
"matching based" to "matching-based" (and apply the same hyphenation to similar
compound forms such as "first-pass" if inconsistent) so the grammar is tightened
throughout that validator's documentation.

In `@backend/app/tests/test_guardrails_api_integration.py`:
- Around line 345-364: The docstrings for the tests are mislabelled against the
validator policy mapping: update the docstring in
test_input_guardrails_with_llamaguard_7b_geography_policy (the first test) and
the docstring in test_input_guardrails_with_llamaguard_7b_violence_policy (the
next test) so the policy identifiers (O2 vs O3) correctly describe the active
policy in each test and match the validator guide; locate the two functions by
their names and swap or rewrite the inline descriptions so the geography test
references the geography policy and the violence/sex-crimes test references the
correct O2/O3 label.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 96811ae3-73c9-4853-b1f7-191d9ac0af0c

📥 Commits

Reviewing files that changed from the base of the PR and between 949647d and 141e5fc.

📒 Files selected for processing (8)
  • backend/app/api/API_USAGE.md
  • backend/app/api/routes/guardrails.py
  • backend/app/core/enum.py
  • backend/app/core/validators/README.md
  • backend/app/core/validators/validators.json
  • backend/app/schemas/guardrail_config.py
  • backend/app/tests/test_guardrails_api_integration.py
  • backend/app/tests/test_toxicity_hub_validators.py
✅ Files skipped from review due to trivial changes (1)
  • backend/app/core/enum.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • backend/app/schemas/guardrail_config.py
  • backend/app/core/validators/validators.json
  • backend/app/api/API_USAGE.md

@rkritika1508 rkritika1508 self-assigned this Apr 7, 2026
@rkritika1508 rkritika1508 linked an issue Apr 7, 2026 that may be closed by this pull request
@rkritika1508 rkritika1508 added the enhancement (New feature or request) and ready-for-review labels Apr 7, 2026
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
backend/app/api/docs/guardrails/run_guardrails.md (1)

11-20: Add blank line after the table to comply with Markdown formatting standards.

The table at lines 13-20 should be followed by a blank line before the next content (line 21).

📝 Proposed fix
   | `no_illegal_drugs`          | No illegal drugs                 |
   | `no_encourage_self_harm`    | No encouragement of self-harm    |
+
 - `rephrase_needed=true` means the system could not safely auto-fix the input/output and wants the user to retry with a rephrased query.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/api/docs/guardrails/run_guardrails.md` around lines 11 - 20, Add
a single blank line after the policies markdown table in the llamaguard_7b
section so the table (the rows starting with `no_violence_hate` through
`no_encourage_self_harm`) is followed by an empty line before the next
paragraph; update the run_guardrails.md content to insert one newline after the
table to comply with Markdown formatting standards.
backend/app/core/validators/config/llamaguard_7b_safety_validator_config.py (1)

1-1: Use modern list type hint instead of typing.List.

typing.List is deprecated since Python 3.9. Use the built-in list for type hints.

♻️ Proposed fix
-from typing import List, Literal, Optional
+from typing import Literal, Optional

And update the type hints:

-    policies: Optional[List[str]] = None
+    policies: Optional[list[str]] = None

-    def _resolve_policies(self) -> Optional[List[str]]:
+    def _resolve_policies(self) -> Optional[list[str]]:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/core/validators/config/llamaguard_7b_safety_validator_config.py`
at line 1, Replace deprecated typing.List with the built-in list: remove List
from the import line in llamaguard_7b_safety_validator_config.py and update any
type annotations that reference List[...] to use list[...]; keep Literal and
Optional imports as-is (or import them from typing if still used) and ensure all
occurrences (e.g., in class attributes, function signatures or return types) are
converted to the modern list[...] form.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/core/validators/config/llamaguard_7b_safety_validator_config.py`:
- Around line 21-32: The _resolve_policies() currently only maps human-readable
names via POLICY_NAME_MAP and rejects raw codes like "O1", so update
_resolve_policies (in llamaguard_7b_safety_validator_config.py) to accept both
forms: for each policy, first check POLICY_NAME_MAP.get(policy.lower()) and if
that returns None, then check if the policy (uppercased) matches a raw policy
code (e.g., "O1".."O6") and if so append the uppercased code unchanged;
otherwise raise the same ValueError. This keeps existing mapping behavior while
allowing tests that pass raw codes to succeed.
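The fix described above — accepting both human-readable names and raw O1–O6 codes — can be sketched as follows. The map contents and function name are hypothetical; the real resolver lives in llamaguard_7b_safety_validator_config.py and may differ:

```python
import re

# Hypothetical name-to-code map; the repository's actual mapping may differ.
POLICY_NAME_MAP = {
    "no_violence_hate": "O1",
    "no_illegal_drugs": "O5",
    "no_encourage_self_harm": "O6",
}

_RAW_CODE = re.compile(r"^O[1-6]$")


def resolve_policy(policy: str) -> str:
    """Accept either a human-readable policy name or a raw O1-O6 code."""
    code = POLICY_NAME_MAP.get(policy.lower())
    if code is not None:
        return code  # mapped from a human-readable name
    upper = policy.upper()
    if _RAW_CODE.match(upper):
        return upper  # raw codes pass through unchanged
    raise ValueError(f"Unknown LlamaGuard policy: {policy!r}")
```

Checking the name map first preserves existing behavior; the raw-code fallback only triggers for inputs the map does not recognize.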


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4662bec1-1cd2-4eb3-84cc-265fb76badcb

📥 Commits

Reviewing files that changed from the base of the PR and between 141e5fc and 74f8a82.

📒 Files selected for processing (4)
  • backend/app/api/docs/guardrails/run_guardrails.md
  • backend/app/core/validators/README.md
  • backend/app/core/validators/config/llamaguard_7b_safety_validator_config.py
  • backend/app/tests/test_guardrails_api_integration.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/tests/test_guardrails_api_integration.py

@rkritika1508 rkritika1508 merged commit 60f5067 into main Apr 10, 2026
2 checks passed
@rkritika1508 rkritika1508 deleted the feat/toxicity-hub-validators branch April 10, 2026 11:57
rkritika1508 added a commit that referenced this pull request Apr 10, 2026
Co-authored-by: dennyabrain <denny.george90@gmail.com>
@coderabbitai coderabbitai bot mentioned this pull request Apr 10, 2026
rkritika1508 added a commit that referenced this pull request Apr 10, 2026
Co-authored-by: dennyabrain <denny.george90@gmail.com>

Labels

enhancement (New feature or request), ready-for-review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add toxicity validators from Guardrails Hub

3 participants