Conversation

lakshyaag
Contributor

@lakshyaag lakshyaag commented Sep 16, 2025

Description

From #331
This PR adds response_format to the default judge rubric parameters. When an OpenAI-compatible ResponseFormat is specified, the request uses the .parse() method in the OpenAI SDK.
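
For illustration, here is a minimal sketch of what an OpenAI-compatible `response_format` for a judge could look like, written as a plain JSON-schema dict. The schema name `judge_verdict`, its fields, and the helper `decode_verdict` are hypothetical and not taken from this PR. The sample reply is hand-written, not real model output.

```python
import json

# Hypothetical JSON-schema response format (name and fields are assumptions).
JUDGE_RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {
        "name": "judge_verdict",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "correct": {"type": "boolean"},
                "reasoning": {"type": "string"},
            },
            "required": ["correct", "reasoning"],
            "additionalProperties": False,
        },
    },
}


def decode_verdict(raw: str) -> dict:
    """Decode a judge reply and check it has the fields the schema requires."""
    data = json.loads(raw)
    required = JUDGE_RESPONSE_FORMAT["json_schema"]["schema"]["required"]
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"judge reply is missing fields: {missing}")
    return data


# Hand-written example reply, standing in for a structured judge response.
verdict = decode_verdict(
    '{"correct": true, "reasoning": "Matches the reference answer."}'
)
```

With a structured reply like this, a reward function could read `verdict["correct"]` directly instead of pattern-matching free-form judge text.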

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass
  • New tests have been added to cover the changes
  • Tests have been run locally with uv run pytest

Test Coverage

  • Current coverage: ___%
  • Coverage after changes: ___%

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes

@lakshyaag lakshyaag marked this pull request as ready for review September 16, 2025 05:26
@wcummings

wcummings commented Sep 17, 2025

It would also be nice to be able to use response_format/structured responses in the user_client, or is this better handled with tool calling?

@willccbb
Member

@lakshyaag Can this not already be done by passing response_format via sampling_args? Would rather have that be the all-in-one route for expressing additional configurations rather than explicitly adding each one.

@lakshyaag
Contributor Author

@willccbb I'll look into it - should be able to support it.

@lakshyaag
Contributor Author

@cursoragent look into it

lakshyaag and others added 2 commits September 19, 2025 20:40
* Refactor: Move response_format to judge_sampling_args

Co-authored-by: lakshyajannu <[email protected]>

* feat: Use chat completions parse for structured outputs

This change routes chat completion requests to the `parse` API when a `response_format` is specified in the sampling arguments. This ensures that structured outputs are correctly handled. The mock client has also been updated to support this new functionality.

Co-authored-by: lakshyajannu <[email protected]>

* refactor: Update mock responses and enhance parameter validation

This commit modifies the mock responses in the test suite to align with the new structure of parsed chat completions. It also enhances the parameter validation in the `CalculatorResponse` model to include specific properties, ensuring better type safety and clarity in the tests. Additionally, the environment class has been refactored to streamline the handling of response formats and improve the readability of the chat completion request logic.

---------

Co-authored-by: Cursor Agent <[email protected]>
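
The routing described in the commit messages above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR: the helper `pick_completion_method`, the stub client, and the model name `judge-model` are hypothetical. Whether `client.chat.completions.parse` exists depends on the installed OpenAI SDK version (older releases expose it under `client.beta`), so the stub only mimics that attribute layout and makes no network calls.

```python
from types import SimpleNamespace


def pick_completion_method(client, sampling_args: dict):
    """Return the `parse` method when a response_format is set, else `create`."""
    completions = client.chat.completions
    if sampling_args.get("response_format") is not None:
        return completions.parse
    return completions.create


def _stub(name):
    # Records which method was called and with which keyword arguments.
    return lambda **kwargs: (name, kwargs)


# Stub client so the sketch runs offline (hypothetical, mimics the SDK layout).
stub_client = SimpleNamespace(
    chat=SimpleNamespace(
        completions=SimpleNamespace(create=_stub("create"), parse=_stub("parse"))
    )
)

plain_args = {"temperature": 0.0}
structured_args = {"temperature": 0.0, "response_format": {"type": "json_object"}}

plain_call = pick_completion_method(stub_client, plain_args)(
    model="judge-model", messages=[], **plain_args
)
structured_call = pick_completion_method(stub_client, structured_args)(
    model="judge-model", messages=[], **structured_args
)
```

Keeping `response_format` inside the sampling arguments means callers need no dedicated parameter, which matches the "all-in-one route" requested in the review.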

3 participants