Skip to content

v1.8.0

Choose a tag to compare

@release-please release-please released this 04 Jun 00:33
c4f9c2a

1.8.0 (2026-06-03)

Features

  • add Codex to agent filters in viewer (a2be23c)
  • add Codex to agent filters in viewer (#387) (2ac6c1f)
  • add filter_native_tools option to trajectory_matcher to optionally ignore native harness tools during scoring (2c34d94)
  • add packaged console script entrypoint to support uvx execution (#385) (8ea07f8)
  • dea: define EvalDeaRequest input model for conversational evaluations (#407) (04b91cf)
  • opt-in function-calling for the Gemini SDK judge (#409) (d97f511)
  • Rename package to google-evalbench and decouple viewer dependencies (#390) (0d75811)
  • scorers: filter native tools in trajectory_matcher with opt-out flag (2c1ab58)
  • stabilize Cloud Run deployment and polish standalone CLI UX (#389) (4720eef)
  • support work_dir for claude code eval (#403) (179e0d3)

Bug Fixes

  • add --no-sync flag to runtime uv run commands to prevent PyPI timeouts (#392) (0c4783c)
  • allow-list files in fake home directory for Gemini CLI (#395) (734bc2a)
  • fix Mesop event routing bug in trends dropdown (023d150)
  • gemini-cli: support 'name' parameter key in skill extraction (#378) (62400da)
  • patch absl help output when running via uvx/launcher (85048e2)
  • prevent silent errors on DB query timeouts and extend deadline (#406) (fbbd31d)
  • surface eval failures instead of silently terminating or crashing (#398) (9c36108)