feat: add response format handling in judge rubric #332

lakshyaag · 2025-09-16T05:25:57Z

Description

From #331
This PR adds response_format to the default judge rubric parameters. On specifying a OpenAI-compatible ResponseFormat, the request will use the .parse() method in the OpenAI SDK.

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Test improvement

Testing

All existing tests pass
New tests have been added to cover the changes
Tests have been run locally with uv run pytest

Test Coverage

Current coverage: ___%
Coverage after changes: ___%

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
Any dependent changes have been merged and published

Additional Notes

wcummings · 2025-09-17T16:51:46Z

It would also be nice to be able to use response_format/structured responses in the user_client, or is this better handled with tool calling?

willccbb · 2025-09-18T18:14:03Z

@lakshyaag Can this not already be done by passing response_format via sampling_args? Would rather have that be the all-in-one route for expressing additional configurations rather than explicitly adding each one.

lakshyaag · 2025-09-19T19:39:40Z

@willccbb I'll look into it - should be able to support it.

lakshyaag · 2025-09-20T00:06:25Z

@cursoragent look into it

* Refactor: Move response_format to judge_sampling_args Co-authored-by: lakshyajannu <[email protected]> * feat: Use chat completions parse for structured outputs This change routes chat completion requests to the `parse` API when a `response_format` is specified in the sampling arguments. This ensures that structured outputs are correctly handled. The mock client has also been updated to support this new functionality. Co-authored-by: lakshyajannu <[email protected]> * refactor: Update mock responses and enhance parameter validation This commit modifies the mock responses in the test suite to align with the new structure of parsed chat completions. It also enhances the parameter validation in the `CalculatorResponse` model to include specific properties, ensuring better type safety and clarity in the tests. Additionally, the environment class has been refactored to streamline the handling of response formats and improve the readability of the chat completion request logic. --------- Co-authored-by: Cursor Agent <[email protected]>

lakshyaag added 2 commits September 16, 2025 01:23

add response format handling in judge rubric

74a293e

add tests for judge rubric, add parse to mock oai client

dfcea20

lakshyaag marked this pull request as ready for review September 16, 2025 05:26

lakshyaag and others added 2 commits September 19, 2025 20:40

Merge branch 'main' into lakshya/judge-rubric-structured-outputs

e21c0c2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add response format handling in judge rubric #332

feat: add response format handling in judge rubric #332

Uh oh!

lakshyaag commented Sep 16, 2025 •

edited

Loading

Uh oh!

wcummings commented Sep 17, 2025 •

edited

Loading

Uh oh!

willccbb commented Sep 18, 2025

Uh oh!

lakshyaag commented Sep 19, 2025

Uh oh!

lakshyaag commented Sep 20, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

feat: add response format handling in judge rubric #332

Are you sure you want to change the base?

feat: add response format handling in judge rubric #332

Uh oh!

Conversation

lakshyaag commented Sep 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of Change

Testing

Test Coverage

Checklist

Additional Notes

Uh oh!

wcummings commented Sep 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

willccbb commented Sep 18, 2025

Uh oh!

lakshyaag commented Sep 19, 2025

Uh oh!

lakshyaag commented Sep 20, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

lakshyaag commented Sep 16, 2025 •

edited

Loading

wcummings commented Sep 17, 2025 •

edited

Loading