
Added evaluation readme #82

Merged
rkritika1508 merged 5 commits into main from feat/evaluation-readme
Apr 10, 2026

Conversation

@rkritika1508
Collaborator

@rkritika1508 rkritika1508 commented Apr 1, 2026

Summary

Target issue is #77.
Explain the motivation for making this change. What existing problem does the pull request solve?
We evaluate each validator differently and use a different dataset for each one. The evaluations folder should therefore have its own markdown file covering the evaluation scripts, the datasets, how to execute the scripts, how to interpret the metrics, etc.

Checklist

Before submitting a pull request, please ensure that you mark these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code that is tested and has test cases.

Notes

Please add here if any other information is required for the reviewer.

Summary by CodeRabbit

  • Documentation
    • Consolidated and expanded evaluation docs: moved detailed evaluation workflow into a dedicated evaluation guide and simplified top-level instructions to link there. Covers offline and end-to-end multi-validator evaluation, prerequisites and dataset placement, how to run validators, generated outputs (predictions and metrics), and guidance for interpreting classification, PII, topic-relevance, and performance metrics.

@rkritika1508 rkritika1508 changed the title added evaluation readme Added evaluation readme Apr 1, 2026
@coderabbitai

coderabbitai bot commented Apr 1, 2026

📝 Walkthrough

Consolidated evaluation docs by removing detailed "Running evaluation tests" from backend/README.md and adding a comprehensive offline evaluation guide at backend/app/evaluation/README.md describing folder layout, prerequisites, validator runs, datasets, outputs, and metrics interpretation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Backend README change: `backend/README.md` | Removed detailed "Running evaluation tests" instructions and redirected readers to the new evaluation README. |
| Evaluation documentation (new): `backend/app/evaluation/README.md` | Added comprehensive offline evaluation guide: folder layout, local prerequisites, dataset filenames/locations, per-validator `run.py` invocation details, `scripts/run_all_evaluations.sh` usage, ban-list and topic-relevance specifics, multi-validator live-API run, expected `predictions.csv`/`metrics.json` outputs, and metrics interpretation. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes


Suggested labels

enhancement

Suggested reviewers

  • nishika26
  • AkhileshNegi

Poem

🥕📘 I hopped through docs with tidy delight,
Moved tests to their room and set them right.
Validators hum in a neat little row,
Datasets lined up, metrics all aglow.
A rabbit’s applause for documentation bright!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ❓ Inconclusive | The title "Added evaluation readme" is vague and generic, using a minimal descriptor that doesn't specify which evaluation component or the key purpose of the documentation change. | Consider a more specific title like "Add comprehensive evaluation guide for validator suites" or "Document offline evaluation workflow and validator setup" to better convey the scope and purpose of the new evaluation documentation. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@rkritika1508 rkritika1508 self-assigned this Apr 1, 2026
@rkritika1508 rkritika1508 linked an issue Apr 1, 2026 that may be closed by this pull request
@rkritika1508 rkritika1508 added the documentation (Improvements or additions to documentation) and ready-for-review labels Apr 1, 2026
@rkritika1508 rkritika1508 moved this to To Do in Kaapi-dev Apr 1, 2026
@rkritika1508 rkritika1508 moved this from To Do to In Progress in Kaapi-dev Apr 1, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/app/evaluation/README.md (1)

99-103: Standardize run commands to uv run python for consistency.

The guide mixes virtualenv activation guidance with python3 invocations, while earlier it states scripts are run via uv run python. Using one convention avoids interpreter mismatch.

♻️ Suggested doc update

````diff
-```bash
-python3 app/evaluation/<validator_folder>/run.py
-```
+```bash
+uv run python app/evaluation/<validator_folder>/run.py
+```
````

Also applies to: 127-129, 158-159, 186-187, 215-216, 251-252

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/evaluation/README.md` around lines 99 - 103, Replace occurrences
of the direct python3 run command with the standardized uv run python
invocation: change instances like "python3
app/evaluation/<validator_folder>/run.py" to "uv run python
app/evaluation/<validator_folder>/run.py" in the README examples (the block
shown at lines 99-103) and the other listed examples (lines 127-129, 158-159,
186-187, 215-216, 251-252) to ensure consistency with the earlier stated "uv run
python" convention.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/evaluation/README.md`:
- Around line 301-306: The README currently shows running
app/evaluation/multiple_validators/run.py with a plaintext --auth_token which
can leak secrets; update the example to use an environment variable (e.g.,
export AUTH_TOKEN="<your-token>") and pass it into the script as --auth_token
"$AUTH_TOKEN" or describe using a secure prompt/secret manager instead, and
update the README snippet around run.py and the --auth_token usage to recommend
env var injection rather than hardcoding tokens on the command line.
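
The recommendation above can be sketched as a shell snippet. `AUTH_TOKEN` is an assumed variable name and the `run.py` path is quoted from the comment, so treat this as illustrative rather than the project's documented usage:

```shell
# Set the token once in the environment (or have a secret manager inject it)
# instead of typing it on the command line, where it would land in shell history.
export AUTH_TOKEN="example-secret"

# The evaluation run would then reference the variable rather than a literal token:
#   uv run python app/evaluation/multiple_validators/run.py --auth_token "$AUTH_TOKEN"
echo "token length: ${#AUTH_TOKEN}"
```

The same pattern also keeps the token out of shared terminal recordings and CI logs that echo commands.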

---

Nitpick comments:
In `@backend/app/evaluation/README.md`:
- Around line 99-103: Replace occurrences of the direct python3 run command with
the standardized uv run python invocation: change instances like "python3
app/evaluation/<validator_folder>/run.py" to "uv run python
app/evaluation/<validator_folder>/run.py" in the README examples (the block
shown at lines 99-103) and the other listed examples (lines 127-129, 158-159,
186-187, 215-216, 251-252) to ensure consistency with the earlier stated "uv run
python" convention.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 234340a6-da44-465f-80bc-0bcdcf204b9d

📥 Commits

Reviewing files that changed from the base of the PR and between 791820f and deee37b.

📒 Files selected for processing (2)
  • backend/README.md
  • backend/app/evaluation/README.md

### Setup

1. Ensure `GUARDRAILS_API_URL` is set in your `.env` file (see `.env.example`). Optionally set `GUARDRAILS_TIMEOUT_SECONDS` (default: `60`).
2. Ensure the API is running and accessible at the configured URL.
Contributor


API? do you mean the server?

Collaborator Author


The guardrails endpoint
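
As a minimal sketch of the `.env` entries the setup step refers to, assuming placeholder values (only the variable names and the `60` default come from the setup step; the URL is invented for illustration):

```shell
# Placeholder guardrails endpoint; replace with the URL your server actually listens on.
GUARDRAILS_API_URL=http://localhost:8000
# Optional; 60 is the documented default.
GUARDRAILS_TIMEOUT_SECONDS=60
```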


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/evaluation/README.md`:
- Around line 7-52: The README's fenced code blocks in
backend/app/evaluation/README.md are missing language identifiers (triggering
MD040); update each triple-backtick block (e.g., the directory tree block
starting with "backend/app/evaluation/" and the multiple outputs examples like
"outputs/lexical_slur/predictions.csv", "outputs/pii_remover/metrics.json",
"outputs/ban_list/<name>-metrics.json",
"outputs/topic_relevance/<domain>-metrics.json", and the
multi_validator_whatsapp outputs) to include appropriate languages (use text for
file/path listings, json for .json snippets, bash for command examples) so all
shown blocks have a language tag. Ensure every affected block mentioned in the
comment (around lines 127-130, 155-158, 185-188, 214-217, 248-251, 316-318) is
updated.
- Line 312: The documentation currently uses a code span with a trailing space
around the Bearer prefix (`` `Bearer ` ``) which violates MD038; update the
README text that describes the `--auth_token` argument to use a code span
without the trailing space (`` `Bearer` ``) so the phrase reads "without the
`Bearer` prefix" and ensure the `--auth_token` reference remains unchanged.
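
As an illustration of the MD040 fix described above, an opening fence gains a language tag; the paths here are quoted from the comment, and `text` is the tag the comment suggests for path listings:

````markdown
```text
outputs/lexical_slur/predictions.csv
outputs/pii_remover/metrics.json
```
````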

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e9f89636-228e-402b-9d93-f886daa6ce9b

📥 Commits

Reviewing files that changed from the base of the PR and between deee37b and 4855059.

📒 Files selected for processing (1)
  • backend/app/evaluation/README.md

@nishika26 nishika26 moved this from In Progress to In Review in Kaapi-dev Apr 10, 2026
@rkritika1508 rkritika1508 merged commit 7e7fce6 into main Apr 10, 2026
2 checks passed
@rkritika1508 rkritika1508 deleted the feat/evaluation-readme branch April 10, 2026 06:45
@github-project-automation github-project-automation bot moved this from In Review to Closed in Kaapi-dev Apr 10, 2026
rkritika1508 added a commit that referenced this pull request Apr 10, 2026

Labels

documentation (Improvements or additions to documentation), ready-for-review

Projects

Status: Closed

Development

Successfully merging this pull request may close these issues.

Add a separate markdown file for evaluation

3 participants