Commit 5cfef01
fix(eval): handle unevaluated final response v2 results
Merge #5728
## Summary
Fixes a small aggregation edge case in `FinalResponseMatchV2Evaluator`: when every per-invocation result is skipped or not evaluated, the evaluator currently divides by zero while computing the overall score.
## Root Cause
`aggregate_invocation_results()` filters out results whose `score` is `None` or whose `eval_status` is `NOT_EVALUATED`, but it unconditionally computes:
```python
overall_score = num_valid / num_evaluated
```
If every per-invocation result is filtered out (for example, because no judge sample produced a usable score), `num_evaluated` remains `0` and the division raises `ZeroDivisionError`, so evaluation crashes instead of returning a not-evaluated aggregate result. Other ADK evaluators handle this condition by returning `overall_score=None` and `overall_eval_status=NOT_EVALUATED`.
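A minimal reproduction of the failure mode, as a sketch only. It uses plain `(score, status)` tuples in place of ADK's per-invocation result objects, a `"NOT_EVALUATED"` string in place of the real enum member, and assumes `num_valid` accumulates the surviving scores. All of these are simplifying assumptions, not ADK's actual code.

```python
from typing import Optional

# Hypothetical stand-in for EvalStatus.NOT_EVALUATED (assumption, not ADK's enum).
NOT_EVALUATED = "NOT_EVALUATED"


def unguarded_overall_score(results: list[tuple[Optional[float], str]]) -> float:
  """Mirrors the pre-fix logic: skip unusable results, then divide unconditionally."""
  num_valid = 0.0  # assumed: running sum of usable scores
  num_evaluated = 0
  for score, status in results:
    if score is None or status == NOT_EVALUATED:
      continue  # filtered out, as described in the root cause
    num_evaluated += 1
    num_valid += score
  # No guard: when every result was filtered, num_evaluated is 0 here.
  return num_valid / num_evaluated


all_skipped = [(None, NOT_EVALUATED), (None, NOT_EVALUATED)]
try:
  unguarded_overall_score(all_skipped)
except ZeroDivisionError as exc:
  print(f"crash: {exc}")
```

With at least one usable result the formula behaves normally; only the all-filtered case reaches the zero denominator.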
## Change
- Return an `EvaluationResult` with `overall_score=None` and `overall_eval_status=NOT_EVALUATED` when no FinalResponseMatchV2 invocation results are evaluable.
- Add a focused regression test for all-skipped/all-not-evaluated invocation results.
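The guard described above can be sketched as an early return before the division. `EvalStatus`, `PerInvocationResult`, and `EvaluationResult` below are simplified, hypothetical stand-ins for ADK's types (not their real definitions). The overall score is assumed to be the mean of the usable scores, and the `threshold` parameter with its `0.5` default is illustrative only.

```python
import enum
from dataclasses import dataclass
from typing import Optional


class EvalStatus(enum.Enum):
  """Hypothetical stand-in for ADK's EvalStatus enum."""
  PASSED = 1
  FAILED = 2
  NOT_EVALUATED = 3


@dataclass
class PerInvocationResult:
  """Hypothetical, reduced per-invocation result: a score and a status."""
  score: Optional[float]
  eval_status: EvalStatus


@dataclass
class EvaluationResult:
  """Hypothetical, reduced aggregate result."""
  overall_score: Optional[float] = None
  overall_eval_status: EvalStatus = EvalStatus.NOT_EVALUATED


def aggregate_invocation_results(
    results: list[PerInvocationResult], threshold: float = 0.5
) -> EvaluationResult:
  """Averages usable scores; returns NOT_EVALUATED when none are usable."""
  num_valid = 0.0
  num_evaluated = 0
  for result in results:
    # Same filter as before the fix: drop missing scores and NOT_EVALUATED.
    if result.score is None or result.eval_status == EvalStatus.NOT_EVALUATED:
      continue
    num_evaluated += 1
    num_valid += result.score

  # The fix: return a not-evaluated aggregate instead of dividing by zero.
  if num_evaluated == 0:
    return EvaluationResult(
        overall_score=None, overall_eval_status=EvalStatus.NOT_EVALUATED
    )

  overall_score = num_valid / num_evaluated
  status = EvalStatus.PASSED if overall_score >= threshold else EvalStatus.FAILED
  return EvaluationResult(overall_score=overall_score, overall_eval_status=status)


# Regression check in the spirit of the new unit test: no result is usable.
all_unusable = [
    PerInvocationResult(score=None, eval_status=EvalStatus.NOT_EVALUATED),
    PerInvocationResult(score=1.0, eval_status=EvalStatus.NOT_EVALUATED),
    PerInvocationResult(score=None, eval_status=EvalStatus.FAILED),
]
result = aggregate_invocation_results(all_unusable)
assert result.overall_score is None
assert result.overall_eval_status is EvalStatus.NOT_EVALUATED
```

Returning early keeps the normal averaging path unchanged, so only the degenerate all-filtered case behaves differently from before.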
## Validation
```bash
uv sync --extra test
uv run pytest tests/unittests/evaluation/test_final_response_match_v2.py
```
Result: `18 passed, 20 warnings`.
The full unit suite was not run; this patch is limited to FinalResponseMatchV2 aggregation and its targeted unit test file.
Co-authored-by: Haran Rajkumar <haranrk@google.com>
COPYBARA_INTEGRATE_REVIEW=#5728 from pragnyanramtha:pragnyan/final-response-v2-no-eval-guard 3d5ab73
PiperOrigin-RevId: 9338182721

parent a546bcf commit 5cfef01
2 files changed
Lines changed: 39 additions & 0 deletions
- src/google/adk/evaluation
- tests/unittests/evaluation
Diff in `src/google/adk/evaluation`: lines 240-247 added (8 additions) after original line 239.
Diff in `tests/unittests/evaluation`: lines 564-594 added (31 additions & 0 deletions) after original line 563.