docs: draft documentation for the hf_classifier rail #1969

Open
m-misiura wants to merge 1 commit into NVIDIA-NeMo:develop from m-misiura:hf_rail_docs

@m-misiura m-misiura commented Jun 2, 2026

Contributor

Description

This PR adds upstream documentation for the hf_classifier rail.

Related Issue(s)

Upstream documentation is needed for the feature introduced in PR #1853.

Checklist

  • I've read the CONTRIBUTING guidelines.
  • I've updated the documentation if applicable.
  • I've added tests if applicable.
  • @mentions of the person or team responsible for reviewing proposed changes.

cc @miyoungc @Pouyanpi @tgasser-nv

Summary by CodeRabbit

  • Documentation
    • Added comprehensive documentation for HuggingFace Classifier integration, covering supported inference backends (local pipeline, vLLM, KServe, IBM FMS), configuration setup, and practical examples for input, output, and retrieval flows.
    • Included behavior semantics for blocking detection, logging, and streaming configurations.

github-actions Bot commented Jun 2, 2026

Contributor

Documentation preview

https://nvidia-nemo.github.io/Guardrails/review/pr-1969

@m-misiura m-misiura marked this pull request as ready for review June 2, 2026 10:27

greptile-apps Bot commented Jun 2, 2026

Contributor

Greptile Summary

This PR adds upstream documentation for the hf_classifier rail introduced in PR #1853, covering all four inference backends (local, vLLM, KServe, FMS), configuration options, and per-flow usage examples for input, output, and retrieval rails.

  • hf-classifier.mdx — new reference page with backend comparison table, config schema, labeled example blocks for each flow type, a complete multi-classifier streaming example, and a behavior semantics section.
  • third-party.mdx — adds a HuggingFace Classifier entry with a summary example and a deep-link following the established /configure-guardrails/guardrail-catalog/third-party/{slug} URL pattern used by all other community integrations.
  • index.yml — registers the new page at the correct alphabetical position under the Third-Party APIs section.

Confidence Score: 5/5

Documentation-only PR with no code changes; safe to merge.

All three changed files are documentation. The navigation entry, URL links, and YAML examples are internally consistent and follow the same patterns as existing community integration pages. The only gap is the mTLS section omitting a flow-activation snippet, which leaves one example incomplete but does not break any functionality.

No files require special attention; the mTLS example in hf-classifier.mdx could benefit from a flow-activation snippet for completeness.

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/configure-rails/guardrail-catalog/community/hf-classifier.mdx | New comprehensive reference page for the hf_classifier rail covering all four inference backends, config options, and per-flow examples; the mTLS section is the only example that omits the flow-activation stanza, leaving it incomplete. |
| docs/configure-rails/guardrail-catalog/third-party.mdx | Adds a HuggingFace Classifier summary section with a valid NER example and a correctly-structured deep-link following the existing /configure-guardrails/guardrail-catalog/third-party/{slug} URL pattern. |
| docs/index.yml | Inserts the hf-classifier page at the correct alphabetical position (between GuardrailsAI and Llama Guard) under the Third-Party APIs community section with the correct path and slug. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User Message] --> B{Input Rail\nhf classifier check input}
    B -- blocked --> C[Return: I'm sorry, I can't respond to that.]
    B -- allowed --> D[Retrieval\nhf classifier check retrieval]
    D -- blocked --> E[Clear all retrieved chunks]
    D -- allowed --> F[LLM Generation / Streaming]
    F --> G{Output Rail\nhf classifier check output}
    G -- blocked --> H[Return: I'm sorry, I can't respond to that.]
    G -- allowed --> I[Response to User]

    subgraph Backends
        J[local\ntransformers pipeline]
        K[vllm\n/classify endpoint]
        L[kserve\n/v1/models/:predict]
        M[fms\n/api/v1/text/contents]
    end

    B -.uses.-> Backends
    D -.uses.-> Backends
    G -.uses.-> Backends
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
docs/configure-rails/guardrail-catalog/community/hf-classifier.mdx:263-280
**mTLS example missing flow activation**

Every other example in this document pairs the `config` block with the corresponding `input`, `output`, or `retrieval` flow entry. The mTLS section only shows the `rails.config.hf_classifier` stanza — a reader copying it verbatim would have a correctly-configured TLS connection but no active rail, since without a `flows` entry the classifier is never invoked. Adding even a one-liner showing which flow to attach (e.g., `input.flows: - hf classifier check input $classifier=toxicity`) would make the example self-contained.
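
The gap described above (a classifier that is configured but never attached to a flow) can be caught mechanically. The following is a minimal sketch, not part of the rail itself: `unwired_classifiers` is a hypothetical helper, and it assumes the configured classifier names and the per-rail flow lists have already been read out of the YAML. Only the `$classifier=<name>` flow argument is taken from this review.

```python
import re

# Matches the `$classifier=<name>` argument used in the flow strings quoted in this review.
CLASSIFIER_ARG = re.compile(r"\$classifier=([\w-]+)")


def unwired_classifiers(configured, flows_by_rail):
    """Return configured classifier names that no flow references.

    configured    -- iterable of classifier names (hypothetical input shape)
    flows_by_rail -- dict mapping "input"/"output"/"retrieval" to lists of flow strings
    """
    referenced = set()
    for flows in flows_by_rail.values():
        for flow in flows:
            referenced.update(CLASSIFIER_ARG.findall(flow))
    return sorted(set(configured) - referenced)


# Mirrors the mTLS example: configuration only, no flows, so the rail never runs.
missing = unwired_classifiers(["toxicity"], {"input": [], "output": []})

# Adding the one-line flow suggested above resolves it.
fixed = unwired_classifiers(
    ["toxicity"],
    {"input": ["hf classifier check input $classifier=toxicity"]},
)
```

Here `missing` lists `toxicity` as unwired, while `fixed` is empty once the input flow references the classifier.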

Reviews (3): Last reviewed commit: ":constrtuction: adapt docs format expect..."

Comment on lines +311 to +314
  input:
    flows:
      - hf classifier check output $classifier=named_entity_recognition
  output:

Contributor

P1 Wrong flow name in input: section — this is a copy-paste error. hf classifier check output is the output rail flow; using it under input: means the input path will invoke the output-checking action (hf_classifier_check_output), which reads bot_message from context instead of user_message, so the input rail silently receives an empty string and never blocks anything.

Suggested change
   input:
     flows:
-      - hf classifier check output $classifier=named_entity_recognition
+      - hf classifier check input $classifier=named_entity_recognition
   output:
Prompt To Fix With AI
This is a comment left during a code review.
Path: docs/configure-rails/guardrail-catalog/third-party.md
Line: 311-314

Comment:
Wrong flow name in `input:` section — this is a copy-paste error. `hf classifier check output` is the output rail flow; using it under `input:` means the input path will invoke the output-checking action (`hf_classifier_check_output`), which reads `bot_message` from context instead of `user_message`, so the input rail silently receives an empty string and never blocks anything.

```suggestion
  input:
    flows:
      - hf classifier check input $classifier=named_entity_recognition
  output:
```

How can I resolve this? If you propose a fix, please make it concise.
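
A copy-paste slip like this one is easy to detect before it reaches the docs. The sketch below is illustrative only: `mismatched_flows` is a hypothetical helper, and it assumes the flow lists have been extracted from the YAML into a dict keyed by rail section. The three flow names are the ones shown in the flowchart above.

```python
import re

# Flow names as they appear in this review: "hf classifier check input|output|retrieval".
HF_FLOW = re.compile(r"^hf classifier check (input|output|retrieval)\b")


def mismatched_flows(flows_by_rail):
    """Return (rail, flow) pairs where the flow's check type differs from its rail section."""
    problems = []
    for rail, flows in flows_by_rail.items():
        for flow in flows:
            match = HF_FLOW.match(flow.strip())
            if match and match.group(1) != rail:
                problems.append((rail, flow))
    return problems


# The configuration under review: the output flow was pasted into the input section.
reviewed = {
    "input": ["hf classifier check output $classifier=named_entity_recognition"],
    "output": ["hf classifier check output $classifier=named_entity_recognition"],
}
problems = mismatched_flows(reviewed)
```

Only the entry under `input` is reported; the identical string under `output` is valid there.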


### Example usage

To set up a single local HuggingFace classifier rail for toxicity detection on input and output flows, you can use the following configuration:

Contributor

P2 The description says "toxicity detection" but the YAML immediately below uses dslim/distilbert-NER — a Named Entity Recognition model that blocks PER, LOC, and ORG labels — which has nothing to do with toxicity. This mismatch will confuse readers trying to understand the use case for this example.

Suggested change
-To set up a single local HuggingFace classifier rail for toxicity detection on input and output flows, you can use the following configuration:
+To set up a single local HuggingFace classifier rail for named entity recognition (NER) on input and output flows, you can use the following configuration:
Prompt To Fix With AI
This is a comment left during a code review.
Path: docs/configure-rails/guardrail-catalog/third-party.md
Line: 293

Comment:
The description says "toxicity detection" but the YAML immediately below uses `dslim/distilbert-NER` — a Named Entity Recognition model that blocks `PER`, `LOC`, and `ORG` labels — which has nothing to do with toxicity. This mismatch will confuse readers trying to understand the use case for this example.

```suggestion
To set up a single local HuggingFace classifier rail for named entity recognition (NER) on input and output flows, you can use the following configuration:
```

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

coderabbitai Bot commented Jun 2, 2026

Contributor

📝 Walkthrough

This pull request adds comprehensive documentation for the HuggingFace Classifier integration, including supported inference backends (local pipeline, vLLM, KServe, IBM FMS), setup instructions, configuration options, per-rail blocking behavior, practical examples, return semantics, and logging details. The documentation is integrated into the third-party integrations index with navigation updates.

Changes

HuggingFace Classifier Integration Documentation

| Layer / File(s) | Summary |
| --- | --- |
| Integration overview and supported backends (`docs/configure-rails/guardrail-catalog/community/hf-classifier.md`) | Introduces the HuggingFace Classifier integration with descriptions of supported inference backends (local HuggingFace pipeline, vLLM classify, KServe predict, IBM FMS endpoints), setup steps for local versus remote deployment, and Colang import guidance for library discovery. |
| Configuration options and rail-specific behavior (`docs/configure-rails/guardrail-catalog/community/hf-classifier.md`) | Documents rails.config.hf_classifier configuration, classifier wiring through input/output/retrieval flows via $classifier, common and backend-specific configuration options (threshold, blocked label handling), and per-rail blocking semantics for each supported engine. |
| Practical examples with multiple classifiers (`docs/configure-rails/guardrail-catalog/community/hf-classifier.md`) | Provides a complete multi-rail example combining multiple classifiers (FMS for output, KServe for input, vLLM alternative) with flow wiring and streaming configuration for production use. |
| Return semantics, logging, and advanced configuration (`docs/configure-rails/guardrail-catalog/community/hf-classifier.md`) | Documents return value and logging formats for blocked detections and scores, mTLS and custom CA certificate configuration, and end-to-end rail blocking behavior differences between input/output rails (abort with fixed message or exception) and retrieval rails (clear chunks), including streaming completion checking. |
| Third-party integration reference and navigation (`docs/configure-rails/guardrail-catalog/third-party.md`) | Adds HuggingFace Classifier entry to the third-party integrations page with descriptive overview and YAML configuration example, and extends the community integrations toctree with a link to the new integration documentation page. |
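
The blocking difference between input/output rails and retrieval rails summarized in the table can be sketched in a few lines. This is an illustration under stated assumptions, not the rail's implementation: `apply_rail` is a hypothetical function, the refusal text is the one shown in the flowchart earlier in this conversation, and the exception variant mentioned in the summary is omitted.

```python
# Fixed refusal text taken from the flowchart in this review.
REFUSAL = "I'm sorry, I can't respond to that."


def apply_rail(rail, blocked, payload):
    """Return what continues downstream after a rail has run.

    rail    -- "input", "output", or "retrieval"
    blocked -- True if the classifier reported a blocked detection
    payload -- the message text, or the list of retrieved chunks
    """
    if not blocked:
        return payload  # allowed: pass through unchanged
    if rail == "retrieval":
        return []  # blocked retrieval: clear all retrieved chunks
    return REFUSAL  # blocked input/output: reply with the fixed message


blocked_input = apply_rail("input", True, "some user text")
blocked_chunks = apply_rail("retrieval", True, ["chunk a", "chunk b"])
allowed_output = apply_rail("output", False, "a harmless answer")
```

A blocked retrieval rail therefore does not end the conversation; generation simply proceeds without the retrieved chunks.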

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding documentation for the hf_classifier rail feature. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Test Results For Major Changes | ✅ Passed | PR is documentation-only (no code changes), documenting an existing feature from PR#1853. Minor changes don't require test results per check criteria. |

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🧹 Nitpick comments (1)
docs/configure-rails/guardrail-catalog/community/hf-classifier.md (1)

86-86: ⚡ Quick win

Consider restructuring for improved readability.

The sentence begins with "For" three times in succession when describing different backends. While technically correct, reformatting as a bulleted list would improve scannability.

📝 Suggested restructuring
-Values must match the label strings returned by the model or server. For **local** and **vLLM** backends with `text-classification`, labels come from the model's `id2label` mapping (e.g., `"toxic"`, `"LABEL_1"`). For `token-classification` with `aggregation_strategy`, labels are entity groups with the B-/I- prefix stripped (e.g., `"PER"`, `"LOC"`). For **FMS**, labels come from the `detection_type` field in the server response. For **KServe**, labels are stringified class indices (`"0"`, `"1"`).
+Values must match the label strings returned by the model or server:
+
+- **Local** and **vLLM** backends: For `text-classification`, labels come from the model's `id2label` mapping (e.g., `"toxic"`, `"LABEL_1"`). For `token-classification` with `aggregation_strategy`, labels are entity groups with the B-/I- prefix stripped (e.g., `"PER"`, `"LOC"`).
+- **FMS**: Labels come from the `detection_type` field in the server response.
+- **KServe**: Labels are stringified class indices (`"0"`, `"1"`).
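
To make the label rules above concrete, here is a small sketch of how the compared strings come about. The helper names (`entity_group`, `kserve_label`, `is_blocked`) and the `blocked` parameter are hypothetical and chosen for illustration; only the B-/I- stripping and the stringified class indices come from the text above.

```python
def entity_group(tag):
    """Strip a leading B-/I- prefix from a token-classification tag."""
    if tag[:2] in ("B-", "I-"):
        return tag[2:]
    return tag


def kserve_label(class_index):
    """KServe labels are the class index rendered as a string."""
    return str(class_index)


def is_blocked(label, blocked):
    """Exact string match against the configured blocked values (hypothetical name)."""
    return label in blocked


per = entity_group("B-PER")      # "PER"
loc = entity_group("I-LOC")      # "LOC"
one = kserve_label(1)            # "1"

# The configured values must be strings: an integer 1 never equals the label "1".
matches_string = is_blocked(one, ["1"])
matches_integer = is_blocked(one, [1])
```

The last two lines show why a blocked value written as a bare integer would silently never match a KServe label.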
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/configure-rails/guardrail-catalog/community/hf-classifier.md` at line
86, Split the long sentence into a short introductory line followed by a
bulleted list that documents each backend separately: one bullet for "local and
vLLM" explaining text-classification uses the model's id2label mapping
(examples: "toxic", "LABEL_1"), one bullet for "token-classification" noting
aggregation_strategy strips B-/I- prefixes to yield entity groups (examples:
"PER", "LOC"), one bullet for "FMS" stating labels come from the detection_type
field in the server response, and one bullet for "KServe" stating labels are
stringified class indices (e.g., "0", "1"); keep the same examples and keywords
(id2label, aggregation_strategy, detection_type, text-classification,
token-classification, FMS, KServe) so readers can map back to implementation.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b700040b-273e-45ea-954e-c9677f4d34dd

📥 Commits

Reviewing files that changed from the base of the PR and between 8082e74 and bf06911.

📒 Files selected for processing (2)
  • docs/configure-rails/guardrail-catalog/community/hf-classifier.md
  • docs/configure-rails/guardrail-catalog/third-party.md


### Example usage

To set up a single local HuggingFace classifier rail for toxicity detection on input and output flows, you can use the following configuration:

Contributor

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the description to match the example.

The description mentions "toxicity detection," but the example demonstrates named entity recognition (NER) using dslim/distilbert-NER with entity labels like "PER", "LOC", and "ORG". Update the description to accurately reflect the example.

📝 Suggested fix
-To set up a single local HuggingFace classifier rail for toxicity detection on input and output flows, you can use the following configuration:
+To set up a single local HuggingFace classifier rail for named entity recognition on input and output flows, you can use the following configuration:

Alternatively, if you prefer to demonstrate toxicity detection, replace the example with a toxicity classification model.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/configure-rails/guardrail-catalog/third-party.md` at line 293, The
description currently states "toxicity detection" but the example uses the NER
model "dslim/distilbert-NER" with entity labels "PER", "LOC", and "ORG"; update
the top-line description to say "named entity recognition (NER) for input and
output flows" to match the example, or alternatively replace the example model
and labels with a toxicity classification model (e.g., a HuggingFace toxicity
classifier and binary/multi-class labels) so the description and the example are
consistent; adjust any mention of labels or flow behavior accordingly to
reference the NER symbols ("dslim/distilbert-NER", "PER", "LOC", "ORG") if you
choose the first option.


  input:
    flows:
      - hf classifier check output $classifier=named_entity_recognition

Contributor

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fix the input flow action from "check output" to "check input".

The input flow incorrectly uses hf classifier check output when it should use hf classifier check input. This appears to be a copy-paste error.

🔧 Proposed fix
   input:
     flows:
-      - hf classifier check output $classifier=named_entity_recognition
+      - hf classifier check input $classifier=named_entity_recognition
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/configure-rails/guardrail-catalog/third-party.md` at line 313, Replace
the incorrect input flow action string "hf classifier check output" with "hf
classifier check input" wherever the input flow uses the classifier command
(look for the literal token hf classifier check output in the docs); update the
command text in the input flow description to hf classifier check input so the
documented action matches the intended input check.

@Pouyanpi Pouyanpi requested a review from miyoungc June 3, 2026 11:30
codecov Bot commented Jun 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
