Conversation
📝 Walkthrough

Consolidated evaluation docs by removing the detailed "Running evaluation tests" section.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/app/evaluation/README.md (1)
99-103: Standardize run commands to `uv run python` for consistency.

The guide mixes virtualenv activation guidance with `python3` invocations, while earlier it states scripts are run via `uv run python`. Using one convention avoids interpreter mismatch.

♻️ Suggested doc update:

````diff
-```bash
-python3 app/evaluation/<validator_folder>/run.py
-```
+```bash
+uv run python app/evaluation/<validator_folder>/run.py
+```
````

Also applies to: 127-129, 158-159, 186-187, 215-216, 251-252
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/evaluation/README.md` around lines 99 - 103, Replace occurrences of the direct python3 run command with the standardized uv run python invocation: change instances like "python3 app/evaluation/<validator_folder>/run.py" to "uv run python app/evaluation/<validator_folder>/run.py" in the README examples (the block shown at lines 99-103) and the other listed examples (lines 127-129, 158-159, 186-187, 215-216, 251-252) to ensure consistency with the earlier stated "uv run python" convention.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/evaluation/README.md`:
- Around line 301-306: The README currently shows running
app/evaluation/multiple_validators/run.py with a plaintext --auth_token which
can leak secrets; update the example to use an environment variable (e.g.,
export AUTH_TOKEN="<your-token>") and pass it into the script as --auth_token
"$AUTH_TOKEN" or describe using a secure prompt/secret manager instead, and
update the README snippet around run.py and the --auth_token usage to recommend
env var injection rather than hardcoding tokens on the command line.
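The environment-variable approach suggested above can be sketched as follows. The script path and `--auth_token` flag come from the README under review; the token value is a placeholder:

```shell
# Export the token once (or source it from a secret manager) so the literal
# token does not end up hardcoded in documented commands or shell history.
export AUTH_TOKEN="<your-token>"

# The invocation from the README, commented out here because the evaluation
# script itself is not part of this snippet:
# uv run python app/evaluation/multiple_validators/run.py --auth_token "$AUTH_TOKEN"
```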
---
Nitpick comments:
In `@backend/app/evaluation/README.md`:
- Around line 99-103: Replace occurrences of the direct python3 run command with
the standardized uv run python invocation: change instances like "python3
app/evaluation/<validator_folder>/run.py" to "uv run python
app/evaluation/<validator_folder>/run.py" in the README examples (the block
shown at lines 99-103) and the other listed examples (lines 127-129, 158-159,
186-187, 215-216, 251-252) to ensure consistency with the earlier stated "uv run
python" convention.
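A minimal sketch of that substitution, assuming the `python3` commands appear at the start of a line as in the flagged examples (the helper name and sample path here are illustrative):

```shell
# Rewrite a direct python3 invocation into the "uv run python" convention.
fix_invocation() {
  sed 's|^python3 \(app/evaluation/.*run\.py\)|uv run python \1|'
}

# Sample line matching the pattern flagged in the review:
echo 'python3 app/evaluation/lexical_slur/run.py' | fix_invocation
# -> uv run python app/evaluation/lexical_slur/run.py
```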
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 234340a6-da44-465f-80bc-0bcdcf204b9d
📒 Files selected for processing (2)

- backend/README.md
- backend/app/evaluation/README.md
> ### Setup
>
> 1. Ensure `GUARDRAILS_API_URL` is set in your `.env` file (see `.env.example`). Optionally set `GUARDRAILS_TIMEOUT_SECONDS` (default: `60`).
> 2. Ensure the API is running and accessible at the configured URL.
API? do you mean the server?
The guardrails endpoint
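For reference, the setup step quoted above boils down to two lines in `.env`. The URL below is a placeholder; only the variable names come from the excerpt:

```shell
# Placeholder values for the two settings named in the setup step.
GUARDRAILS_API_URL=http://localhost:8000   # the guardrails endpoint discussed above
GUARDRAILS_TIMEOUT_SECONDS=60              # optional; 60 is the stated default
```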
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/evaluation/README.md`:
- Around line 7-52: The README's fenced code blocks in
backend/app/evaluation/README.md are missing language identifiers (triggering
MD040); update each triple-backtick block (e.g., the directory tree block
starting with "backend/app/evaluation/" and the multiple outputs examples like
"outputs/lexical_slur/predictions.csv", "outputs/pii_remover/metrics.json",
"outputs/ban_list/<name>-metrics.json",
"outputs/topic_relevance/<domain>-metrics.json", and the
multi_validator_whatsapp outputs) to include appropriate languages (use text for
file/path listings, json for .json snippets, bash for command examples) so all
shown blocks have a language tag. Ensure every affected block mentioned in the
comment (around lines 127-130, 155-158, 185-188, 214-217, 248-251, 316-318) is
updated.
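As a quick way to verify the fix, a one-liner such as the following (an assumption, not part of the review output) lists opening fences that carry no language tag, which is what MD040 flags:

```shell
# Print line numbers of opening ``` fences that have no language identifier.
# Odd-numbered fences open a block; a bare "```" opener triggers MD040.
check_md040() {
  awk '/^```/ { n++; if (n % 2 == 1 && $0 == "```") print FNR ": fence missing language" }' "$1"
}

# Small sample: one bare fence (flagged) and one tagged json fence (ok).
printf 'intro\n```\nbackend/app/evaluation/\n```\n```json\n{}\n```\n' > /tmp/md040_sample.md
check_md040 /tmp/md040_sample.md
# -> 2: fence missing language
```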
- Line 312: The documentation currently uses a code span with a trailing space
around the Bearer prefix (`` `Bearer ` ``) which violates MD038; update the
README text that describes the `--auth_token` argument to use a code span
without the trailing space (`` `Bearer` ``) so the phrase reads "without the
`Bearer` prefix" and ensure the `--auth_token` reference remains unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e9f89636-228e-402b-9d93-f886daa6ce9b
📒 Files selected for processing (1)
backend/app/evaluation/README.md
Summary
Target issue is #77.
Explain the motivation for making this change. What existing problem does the pull request solve?
We have different ways of evaluating each validator. We also have different datasets for each validator. So, we should have a separate markdown in the evaluations folder which contains details about script evaluations, the details about the datasets, how to execute the scripts and how to infer the metrics, etc.
Checklist
Before submitting a pull request, please ensure that you mark these tasks.

Run `fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.

Notes
Please add here if any other information is required for the reviewer.