v1.8.0
1.8.0 (2026-06-03)
Features
add Codex to agent filters in viewer (a2be23c)
add Codex to agent filters in viewer (#387) (2ac6c1f)
add filter_native_tools option to trajectory_matcher to optionally ignore native harness tools during scoring (2c34d94)
add packaged console script entrypoint to support uvx execution (#385) (8ea07f8)
dea: define EvalDeaRequest input model for conversational evaluations (#407) (04b91cf)
opt-in function-calling for the Gemini SDK judge (#409) (d97f511)
Rename package to google-evalbench and decouple viewer dependencies (#390) (0d75811)
scorers: filter native tools in trajectory_matcher with opt-out flag (2c1ab58)
stabilize Cloud Run deployment and polish standalone CLI UX (#389) (4720eef)
support work_dir for claude code eval (#403) (179e0d3)
Bug Fixes
add --no-sync flag to runtime uv run commands to prevent PyPI timeouts (#392) (0c4783c)
allow-list files in fake home directory for Gemini CLI (#395) (734bc2a)
fix Mesop event routing bug in trends dropdown (023d150)
gemini-cli: support 'name' parameter key in skill extraction (#378) (62400da)
patch absl help output when running via uvx/launcher (85048e2)
prevent silent errors on DB query timeouts and extend deadline (#406) (fbbd31d)
surface eval failures instead of silently terminating or crashing (#398) (9c36108)