docs: draft documentation for the `hf_classifier` rail by m-misiura · Pull Request #1969 · NVIDIA-NeMo/Guardrails

m-misiura · 2026-06-02T10:14:37Z

Description

This PR adds upstream documentation for the `hf_classifier` rail.

Related Issue(s)

Upstream documentation is needed for the feature introduced in PR #1853.

Checklist

I've read the CONTRIBUTING guidelines.
I've updated the documentation if applicable.
I've added tests if applicable.
@mentions of the person or team responsible for reviewing proposed changes.

cc @miyoungc @Pouyanpi @tgasser-nv

Summary by CodeRabbit

Documentation
- Added comprehensive documentation for HuggingFace Classifier integration, covering supported inference backends (local pipeline, vLLM, KServe, IBM FMS), configuration setup, and practical examples for input, output, and retrieval flows.
- Included behavior semantics for blocking detection, logging, and streaming configurations.

github-actions · 2026-06-02T10:26:04Z

Documentation preview

https://nvidia-nemo.github.io/Guardrails/review/pr-1969

greptile-apps · 2026-06-02T10:29:44Z

Greptile Summary

This PR adds upstream documentation for the hf_classifier rail introduced in PR #1853, covering all four inference backends (local, vLLM, KServe, FMS), configuration options, and per-flow usage examples for input, output, and retrieval rails.

hf-classifier.mdx — new reference page with backend comparison table, config schema, labeled example blocks for each flow type, a complete multi-classifier streaming example, and a behavior semantics section.
third-party.mdx — adds a HuggingFace Classifier entry with a summary example and a deep-link following the established /configure-guardrails/guardrail-catalog/third-party/{slug} URL pattern used by all other community integrations.
index.yml — registers the new page at the correct alphabetical position under the Third-Party APIs section.

Confidence Score: 5/5

Documentation-only PR with no code changes; safe to merge.

All three changed files are documentation. The navigation entry, URL links, and YAML examples are internally consistent and follow the same patterns as existing community integration pages. The only gap is the mTLS section omitting a flow-activation snippet, which leaves one example incomplete but does not break any functionality.

No files require special attention; the mTLS example in hf-classifier.mdx could benefit from a flow-activation snippet for completeness.

Important Files Changed

Filename	Overview
docs/configure-rails/guardrail-catalog/community/hf-classifier.mdx	New comprehensive reference page for the hf_classifier rail covering all four inference backends, config options, and per-flow examples; the mTLS section is the only example that omits the flow-activation stanza, leaving it incomplete.
docs/configure-rails/guardrail-catalog/third-party.mdx	Adds a HuggingFace Classifier summary section with a valid NER example and a correctly-structured deep-link following the existing /configure-guardrails/guardrail-catalog/third-party/{slug} URL pattern.
docs/index.yml	Inserts the hf-classifier page at the correct alphabetical position (between GuardrailsAI and Llama Guard) under the Third-Party APIs community section with the correct path and slug.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User Message] --> B{Input Rail\nhf classifier check input}
    B -- blocked --> C[Return: I'm sorry, I can't respond to that.]
    B -- allowed --> D[Retrieval\nhf classifier check retrieval]
    D -- blocked --> E[Clear all retrieved chunks]
    D -- allowed --> F[LLM Generation / Streaming]
    F --> G{Output Rail\nhf classifier check output}
    G -- blocked --> H[Return: I'm sorry, I can't respond to that.]
    G -- allowed --> I[Response to User]

    subgraph Backends
        J[local\ntransformers pipeline]
        K[vllm\n/classify endpoint]
        L[kserve\n/v1/models/:predict]
        M[fms\n/api/v1/text/contents]
    end

    B -.uses.-> Backends
    D -.uses.-> Backends
    G -.uses.-> Backends
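
For readers who prefer code to diagrams, the control flow in the flowchart can be sketched roughly as below. This is an illustrative sketch only, not the rail's implementation: `run_rails`, `is_blocked`, and `generate` are hypothetical names, and it assumes that each retrieved chunk is classified separately and that generation still proceeds after blocked chunks are cleared (the flowchart does not say either way).

```python
from typing import Callable, List

# Fixed refusal text taken from the flowchart nodes.
REFUSAL = "I'm sorry, I can't respond to that."


def run_rails(
    user_message: str,
    chunks: List[str],
    is_blocked: Callable[[str], bool],
    generate: Callable[[str, List[str]], str],
) -> str:
    """Walk the flowchart: input rail, retrieval rail, generation, output rail."""
    # Input rail: a blocked user message short-circuits with the refusal.
    if is_blocked(user_message):
        return REFUSAL

    # Retrieval rail: any blocked chunk clears all retrieved chunks.
    # Assumption: generation continues afterwards with no context.
    if any(is_blocked(chunk) for chunk in chunks):
        chunks = []

    bot_message = generate(user_message, chunks)

    # Output rail: a blocked bot message is replaced by the same refusal.
    if is_blocked(bot_message):
        return REFUSAL
    return bot_message
```

Here `is_blocked` stands in for whichever backend (local, vLLM, KServe, FMS) performs the classification.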

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
docs/configure-rails/guardrail-catalog/community/hf-classifier.mdx:263-280
**mTLS example missing flow activation**

Every other example in this document pairs the `config` block with the corresponding `input`, `output`, or `retrieval` flow entry. The mTLS section only shows the `rails.config.hf_classifier` stanza — a reader copying it verbatim would have a correctly-configured TLS connection but no active rail, since without a `flows` entry the classifier is never invoked. Adding even a one-liner showing which flow to attach (e.g., `input.flows: - hf classifier check input $classifier=toxicity`) would make the example self-contained.
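
The gap described here (a classifier that is configured but never attached to a flow) is the kind of thing a small docs lint could catch. The sketch below is a hypothetical helper, not part of NeMo Guardrails: `unreferenced_classifiers` and its arguments are invented for illustration, and the only thing it relies on from the PR is the `$classifier=<name>` parameter that appears in the flow strings.

```python
import re
from typing import Dict, Iterable, List, Set

# Matches the "$classifier=<name>" parameter used in the flow entries.
CLASSIFIER_PARAM = re.compile(r"\$classifier=(\w+)")


def unreferenced_classifiers(
    defined: Iterable[str],
    flows_by_section: Dict[str, List[str]],
) -> Set[str]:
    """Return classifier names that no input/output/retrieval flow refers to."""
    referenced: Set[str] = set()
    for flows in flows_by_section.values():
        for flow in flows:
            referenced.update(CLASSIFIER_PARAM.findall(flow))
    return set(defined) - referenced


# A config that defines two classifiers but only wires up one of them.
flows = {"input": ["hf classifier check input $classifier=toxicity"]}
print(unreferenced_classifiers({"toxicity", "mtls_detector"}, flows))
```

In this example `mtls_detector` is a made-up name; it is reported because no flow invokes it, which mirrors the incomplete mTLS example.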

Reviews (3): Last reviewed commit: ":constrtuction: adapt docs format expect..."

greptile-apps · 2026-06-02T10:29:48Z

+  input:
+    flows:
+      - hf classifier check output $classifier=named_entity_recognition
+  output:


Wrong flow name in input: section — this is a copy-paste error. hf classifier check output is the output rail flow; using it under input: means the input path will invoke the output-checking action (hf_classifier_check_output), which reads bot_message from context instead of user_message, so the input rail silently receives an empty string and never blocks anything.

Suggested change

   input:
     flows:
-      - hf classifier check output $classifier=named_entity_recognition
+      - hf classifier check input $classifier=named_entity_recognition
   output:
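
To make the failure mode in this comment concrete, here is a minimal sketch of why the wrong flow never blocks. It assumes, as the comment states, that the output check reads `bot_message` from the context while the input check reads `user_message`. The functions `check_input` and `check_output` are simplified stand-ins written for this example, not the library's actions, and `contains_entity` is a toy classifier.

```python
from typing import Callable, Dict


def check_input(context: Dict[str, str], is_blocked: Callable[[str], bool]) -> bool:
    # Input rail stand-in: classify the user's message.
    return is_blocked(context.get("user_message", ""))


def check_output(context: Dict[str, str], is_blocked: Callable[[str], bool]) -> bool:
    # Output rail stand-in: classify the bot's message.
    return is_blocked(context.get("bot_message", ""))


# During the input stage there is no bot message in the context yet.
context = {"user_message": "My name is Alice and I live in Paris."}


def contains_entity(text: str) -> bool:
    # Toy classifier: flags the text if it mentions a known name.
    return "Alice" in text


print(check_input(context, contains_entity))   # sees the user message
print(check_output(context, contains_entity))  # sees an empty string
```

Under these assumptions the output check receives an empty string at input time, so it reports nothing and the message passes through unchecked.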

Prompt To Fix With AI

This is a comment left during a code review.
Path: docs/configure-rails/guardrail-catalog/third-party.md
Line: 311-314
Comment: Wrong flow name in `input:` section; this is a copy-paste error. `hf classifier check output` is the output rail flow; using it under `input:` means the input path will invoke the output-checking action (`hf_classifier_check_output`), which reads `bot_message` from context instead of `user_message`, so the input rail silently receives an empty string and never blocks anything.

```suggestion
  input:
    flows:
      - hf classifier check input $classifier=named_entity_recognition
  output:
```

How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-06-02T10:29:49Z

+
+### Example usage
+
+To set up a single local HuggingFace classifier rail for toxicity detection on input and output flows, you can use the following configuration:


The description says "toxicity detection" but the YAML immediately below uses dslim/distilbert-NER — a Named Entity Recognition model that blocks PER, LOC, and ORG labels — which has nothing to do with toxicity. This mismatch will confuse readers trying to understand the use case for this example.

Suggested change

-To set up a single local HuggingFace classifier rail for toxicity detection on input and output flows, you can use the following configuration:
+To set up a single local HuggingFace classifier rail for named entity recognition (NER) on input and output flows, you can use the following configuration:

Prompt To Fix With AI

This is a comment left during a code review.
Path: docs/configure-rails/guardrail-catalog/third-party.md
Line: 293
Comment: The description says "toxicity detection" but the YAML immediately below uses `dslim/distilbert-NER`, a Named Entity Recognition model that blocks `PER`, `LOC`, and `ORG` labels, which has nothing to do with toxicity. This mismatch will confuse readers trying to understand the use case for this example.

```suggestion
To set up a single local HuggingFace classifier rail for named entity recognition (NER) on input and output flows, you can use the following configuration:
```

How can I resolve this? If you propose a fix, please make it concise.

coderabbitai · 2026-06-02T10:33:30Z

Walkthrough

This pull request adds comprehensive documentation for the HuggingFace Classifier integration, including supported inference backends (local pipeline, vLLM, KServe, IBM FMS), setup instructions, configuration options, per-rail blocking behavior, practical examples, return semantics, and logging details. The documentation is integrated into the third-party integrations index with navigation updates.

Changes

HuggingFace Classifier Integration Documentation

Layer / File(s)	Summary
Integration overview and supported backends `docs/configure-rails/guardrail-catalog/community/hf-classifier.md`	Introduces the HuggingFace Classifier integration with descriptions of supported inference backends (local HuggingFace pipeline, vLLM classify, KServe predict, IBM FMS endpoints), setup steps for local versus remote deployment, and Colang import guidance for library discovery.
Configuration options and rail-specific behavior `docs/configure-rails/guardrail-catalog/community/hf-classifier.md`	Documents `rails.config.hf_classifier` configuration, classifier wiring through input/output/retrieval flows via `$classifier`, common and backend-specific configuration options (threshold, blocked label handling), and per-rail blocking semantics for each supported engine.
Practical examples with multiple classifiers `docs/configure-rails/guardrail-catalog/community/hf-classifier.md`	Provides a complete multi-rail example combining multiple classifiers (FMS for output, KServe for input, vLLM alternative) with flow wiring and streaming configuration for production use.
Return semantics, logging, and advanced configuration `docs/configure-rails/guardrail-catalog/community/hf-classifier.md`	Documents return value and logging formats for blocked detections and scores, mTLS and custom CA certificate configuration, and end-to-end rail blocking behavior differences between input/output rails (abort with fixed message or exception) and retrieval rails (clear chunks), including streaming completion checking.
Third-party integration reference and navigation `docs/configure-rails/guardrail-catalog/third-party.md`	Adds HuggingFace Classifier entry to the third-party integrations page with descriptive overview and YAML configuration example, and extends the community integrations toctree with a link to the new integration documentation page.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: adding documentation for the hf_classifier rail feature.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Test Results For Major Changes	✅ Passed	PR is documentation-only (no code changes), documenting an existing feature from PR#1853. Minor changes don't require test results per check criteria.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

docs/configure-rails/guardrail-catalog/community/hf-classifier.md (1)

86-86: ⚡ Quick win

Consider restructuring for improved readability.

The sentence begins with "For" three times in succession when describing different backends. While technically correct, reformatting as a bulleted list would improve scannability.

📝 Suggested restructuring

-Values must match the label strings returned by the model or server. For **local** and **vLLM** backends with `text-classification`, labels come from the model's `id2label` mapping (e.g., `"toxic"`, `"LABEL_1"`). For `token-classification` with `aggregation_strategy`, labels are entity groups with the B-/I- prefix stripped (e.g., `"PER"`, `"LOC"`). For **FMS**, labels come from the `detection_type` field in the server response. For **KServe**, labels are stringified class indices (`"0"`, `"1"`).
+Values must match the label strings returned by the model or server:
+
+- **Local** and **vLLM** backends: For `text-classification`, labels come from the model's `id2label` mapping (e.g., `"toxic"`, `"LABEL_1"`). For `token-classification` with `aggregation_strategy`, labels are entity groups with the B-/I- prefix stripped (e.g., `"PER"`, `"LOC"`).
+- **FMS**: Labels come from the `detection_type` field in the server response.
+- **KServe**: Labels are stringified class indices (`"0"`, `"1"`).
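
The label conventions in this list are easy to get wrong when writing a blocked-label list, so a short sketch may help. The helpers below (`strip_bio_prefix`, `kserve_label`, `is_blocked`) are hypothetical and written only to illustrate the matching rules described above; they are not the rail's code, and in practice the B-/I- stripping is done by the token-classification aggregation rather than by the user.

```python
from typing import Iterable


def strip_bio_prefix(tag: str) -> str:
    """Drop a leading B-/I- marker, e.g. 'B-PER' becomes 'PER'."""
    return tag[2:] if tag[:2] in ("B-", "I-") else tag


def kserve_label(class_index: int) -> str:
    """KServe-style labels are class indices rendered as strings."""
    return str(class_index)


def is_blocked(label: str, blocked_labels: Iterable[str]) -> bool:
    """Exact string match against the configured blocked labels."""
    return label in set(blocked_labels)


# Entity groups are compared without their B-/I- prefix.
print(is_blocked(strip_bio_prefix("B-PER"), ["PER", "LOC"]))

# A KServe class index must be written as a string in the blocked list.
print(is_blocked(kserve_label(1), ["1"]))
```

The practical point is that matching is on exact strings: `"PER"` rather than `"B-PER"`, and `"1"` rather than the integer `1`.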

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/configure-rails/guardrail-catalog/community/hf-classifier.md` at line
86, Split the long sentence into a short introductory line followed by a
bulleted list that documents each backend separately: one bullet for "local and
vLLM" explaining text-classification uses the model's id2label mapping
(examples: "toxic", "LABEL_1"), one bullet for "token-classification" noting
aggregation_strategy strips B-/I- prefixes to yield entity groups (examples:
"PER", "LOC"), one bullet for "FMS" stating labels come from the detection_type
field in the server response, and one bullet for "KServe" stating labels are
stringified class indices (e.g., "0", "1"); keep the same examples and keywords
(id2label, aggregation_strategy, detection_type, text-classification,
token-classification, FMS, KServe) so readers can map back to implementation.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/configure-rails/guardrail-catalog/third-party.md`:
- Line 293: The description currently states "toxicity detection" but the
example uses the NER model "dslim/distilbert-NER" with entity labels "PER",
"LOC", and "ORG"; update the top-line description to say "named entity
recognition (NER) for input and output flows" to match the example, or
alternatively replace the example model and labels with a toxicity
classification model (e.g., a HuggingFace toxicity classifier and
binary/multi-class labels) so the description and the example are consistent;
adjust any mention of labels or flow behavior accordingly to reference the NER
symbols ("dslim/distilbert-NER", "PER", "LOC", "ORG") if you choose the first
option.
- Line 313: Replace the incorrect input flow action string "hf classifier check
output" with "hf classifier check input" wherever the input flow uses the
classifier command (look for the literal token hf classifier check output in the
docs); update the command text in the input flow description to hf classifier
check input so the documented action matches the intended input check.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b700040b-273e-45ea-954e-c9677f4d34dd

📥 Commits

Reviewing files that changed from the base of the PR and between 8082e74 and bf06911.

📒 Files selected for processing (2)

docs/configure-rails/guardrail-catalog/community/hf-classifier.md
docs/configure-rails/guardrail-catalog/third-party.md

coderabbitai · 2026-06-02T10:33:34Z

+
+### Example usage
+
+To set up a single local HuggingFace classifier rail for toxicity detection on input and output flows, you can use the following configuration:


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the description to match the example.

The description mentions "toxicity detection," but the example demonstrates named entity recognition (NER) using dslim/distilbert-NER with entity labels like "PER", "LOC", and "ORG". Update the description to accurately reflect the example.

📝 Suggested fix

-To set up a single local HuggingFace classifier rail for toxicity detection on input and output flows, you can use the following configuration:
+To set up a single local HuggingFace classifier rail for named entity recognition on input and output flows, you can use the following configuration:

Alternatively, if you prefer to demonstrate toxicity detection, replace the example with a toxicity classification model.

coderabbitai · 2026-06-02T10:33:34Z

+
+  input:
+    flows:
+      - hf classifier check output $classifier=named_entity_recognition


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fix the input flow action from "check output" to "check input".

The input flow incorrectly uses hf classifier check output when it should use hf classifier check input. This appears to be a copy-paste error.

🔧 Proposed fix

   input:
     flows:
-      - hf classifier check output $classifier=named_entity_recognition
+      - hf classifier check input $classifier=named_entity_recognition

codecov · 2026-06-10T08:50:31Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

m-misiura force-pushed the hf_rail_docs branch from e267abe to bf06911 Compare June 2, 2026 10:23

m-misiura marked this pull request as ready for review June 2, 2026 10:27

greptile-apps Bot reviewed Jun 2, 2026

coderabbitai Bot reviewed Jun 2, 2026

m-misiura force-pushed the hf_rail_docs branch from d703b87 to 9ce97c4 Compare June 2, 2026 10:40

Pouyanpi requested a review from miyoungc June 3, 2026 11:30

:constrtuction: adapt docs format expected by fern

8923504

m-misiura force-pushed the hf_rail_docs branch from 9ce97c4 to 8923504 Compare June 10, 2026 08:45

m-misiura wants to merge 1 commit into NVIDIA-NeMo:develop from m-misiura:hf_rail_docs
