docs: draft documentation for the hf_classifier rail #1969
Greptile Summary
This PR adds upstream documentation for the
| Filename | Overview |
|---|---|
| docs/configure-rails/guardrail-catalog/community/hf-classifier.mdx | New comprehensive reference page for the hf_classifier rail covering all four inference backends, config options, and per-flow examples; the mTLS section is the only example that omits the flow-activation stanza, leaving it incomplete. |
| docs/configure-rails/guardrail-catalog/third-party.mdx | Adds a HuggingFace Classifier summary section with a valid NER example and a correctly-structured deep-link following the existing /configure-guardrails/guardrail-catalog/third-party/{slug} URL pattern. |
| docs/index.yml | Inserts the hf-classifier page at the correct alphabetical position (between GuardrailsAI and Llama Guard) under the Third-Party APIs community section with the correct path and slug. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[User Message] --> B{Input Rail\nhf classifier check input}
B -- blocked --> C[Return: I'm sorry, I can't respond to that.]
B -- allowed --> D[Retrieval\nhf classifier check retrieval]
D -- blocked --> E[Clear all retrieved chunks]
D -- allowed --> F[LLM Generation / Streaming]
F --> G{Output Rail\nhf classifier check output}
G -- blocked --> H[Return: I'm sorry, I can't respond to that.]
G -- allowed --> I[Response to User]
subgraph Backends
J[local\ntransformers pipeline]
K[vllm\n/classify endpoint]
L[kserve\n/v1/models/:predict]
M[fms\n/api/v1/text/contents]
end
B -.uses.-> Backends
D -.uses.-> Backends
G -.uses.-> Backends
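The diagram implies one flow attachment per rail stage. A minimal sketch of how the three might be activated together in `config.yml`, assuming a classifier named `toxicity` is already defined under `rails.config.hf_classifier`. The `$classifier` parameter on the retrieval flow is an assumption carried over from the input and output flows; the page under review is the authority on its exact form.

```yaml
rails:
  input:
    flows:
      # Checks the user message before generation
      - hf classifier check input $classifier=toxicity
  retrieval:
    flows:
      # Assumed to take the same $classifier parameter; checks retrieved chunks
      - hf classifier check retrieval $classifier=toxicity
  output:
    flows:
      # Checks the bot message before it is returned to the user
      - hf classifier check output $classifier=toxicity
```

Each stage can be attached independently, so a configuration that only needs input screening would keep just the `input` entry.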
### Issue 1 of 1
docs/configure-rails/guardrail-catalog/community/hf-classifier.mdx:263-280
**mTLS example missing flow activation**
Every other example in this document pairs the `config` block with the corresponding `input`, `output`, or `retrieval` flow entry. The mTLS section only shows the `rails.config.hf_classifier` stanza — a reader copying it verbatim would have a correctly-configured TLS connection but no active rail, since without a `flows` entry the classifier is never invoked. Adding even a one-liner showing which flow to attach (e.g., `input.flows: - hf classifier check input $classifier=toxicity`) would make the example self-contained.
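One way the mTLS example could be made self-contained is sketched below. It only illustrates where the flow entry sits relative to the existing stanza; the mTLS keys themselves are deliberately not reproduced, and the classifier name `toxicity` is taken from the one-liner in the comment above rather than from the page itself.

```yaml
rails:
  config:
    hf_classifier:
      # Keep the existing mTLS stanza from the example here, unchanged.
      # Its keys are intentionally omitted from this sketch.
  input:
    flows:
      # Without this entry the classifier is configured but never invoked
      - hf classifier check input $classifier=toxicity
```

If the mTLS example is meant to guard output or retrieval instead, the same entry would move under `output` or `retrieval` with the matching flow name.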
Reviews (3): Last reviewed commit: ":constrtuction: adapt docs format expect..."
| input: | ||
| flows: | ||
| - hf classifier check output $classifier=named_entity_recognition | ||
| output: |
docs/configure-rails/guardrail-catalog/third-party.md, lines 311-314:
Wrong flow name in the `input:` section (a copy-paste error). `hf classifier check output` is the output rail flow; using it under `input:` means the input path invokes the output-checking action (`hf_classifier_check_output`), which reads `bot_message` from context instead of `user_message`, so the input rail silently receives an empty string and never blocks anything.
```suggestion
input:
flows:
- hf classifier check input $classifier=named_entity_recognition
output:
```
| ### Example usage | ||
| To set up a single local HuggingFace classifier rail for toxicity detection on input and output flows, you can use the following configuration: |
docs/configure-rails/guardrail-catalog/third-party.md, line 293:
The description says "toxicity detection", but the YAML immediately below uses `dslim/distilbert-NER`, a Named Entity Recognition model that blocks `PER`, `LOC`, and `ORG` labels, which has nothing to do with toxicity. This mismatch will confuse readers trying to understand the use case for this example.
```suggestion
To set up a single local HuggingFace classifier rail for named entity recognition (NER) on input and output flows, you can use the following configuration:
```
📝 Walkthrough
This pull request adds comprehensive documentation for the HuggingFace Classifier integration, including supported inference backends (local pipeline, vLLM, KServe, IBM FMS), setup instructions, configuration options, per-rail blocking behavior, practical examples, return semantics, and logging details. The documentation is integrated into the third-party integrations index with navigation updates.
Changes
HuggingFace Classifier Integration Documentation
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 6 passed
Actionable comments posted: 2
🧹 Nitpick comments (1)
docs/configure-rails/guardrail-catalog/community/hf-classifier.md (1)
86-86: ⚡ Quick win: Consider restructuring for improved readability.
The paragraph contains four consecutive sentences that begin with "For", one per backend or task type. While technically correct, reformatting as a bulleted list would improve scannability.
📝 Suggested restructuring
-Values must match the label strings returned by the model or server. For **local** and **vLLM** backends with `text-classification`, labels come from the model's `id2label` mapping (e.g., `"toxic"`, `"LABEL_1"`). For `token-classification` with `aggregation_strategy`, labels are entity groups with the B-/I- prefix stripped (e.g., `"PER"`, `"LOC"`). For **FMS**, labels come from the `detection_type` field in the server response. For **KServe**, labels are stringified class indices (`"0"`, `"1"`).
+Values must match the label strings returned by the model or server:
+
+- **Local** and **vLLM** backends: For `text-classification`, labels come from the model's `id2label` mapping (e.g., `"toxic"`, `"LABEL_1"`). For `token-classification` with `aggregation_strategy`, labels are entity groups with the B-/I- prefix stripped (e.g., `"PER"`, `"LOC"`).
+- **FMS**: Labels come from the `detection_type` field in the server response.
+- **KServe**: Labels are stringified class indices (`"0"`, `"1"`).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b700040b-273e-45ea-954e-c9677f4d34dd
📒 Files selected for processing (2)
docs/configure-rails/guardrail-catalog/community/hf-classifier.md
docs/configure-rails/guardrail-catalog/third-party.md
| ### Example usage | ||
| To set up a single local HuggingFace classifier rail for toxicity detection on input and output flows, you can use the following configuration: |
docs/configure-rails/guardrail-catalog/third-party.md, line 293:
Update the description to match the example.
The description mentions "toxicity detection," but the example demonstrates named entity recognition (NER) using dslim/distilbert-NER with entity labels like "PER", "LOC", and "ORG". Update the description to accurately reflect the example.
📝 Suggested fix
-To set up a single local HuggingFace classifier rail for toxicity detection on input and output flows, you can use the following configuration:
+To set up a single local HuggingFace classifier rail for named entity recognition on input and output flows, you can use the following configuration:
Alternatively, if you prefer to demonstrate toxicity detection, replace the example with a toxicity classification model.
| input: | ||
| flows: | ||
| - hf classifier check output $classifier=named_entity_recognition |
docs/configure-rails/guardrail-catalog/third-party.md, line 313:
Fix the input flow action from "check output" to "check input".
The input flow incorrectly uses hf classifier check output when it should use hf classifier check input. This appears to be a copy-paste error.
🔧 Proposed fix
 input:
   flows:
-    - hf classifier check output $classifier=named_entity_recognition
+    - hf classifier check input $classifier=named_entity_recognition
Codecov Report
✅ All modified and coverable lines are covered by tests.
Description
This PR adds upstream documentation for the `hf_classifier` rail.
Related Issue(s)
Upstream documentation is needed for the feature introduced in PR #1853.
Checklist
cc @miyoungc @Pouyanpi @tgasser-nv
Summary by CodeRabbit