
Support reasoning summary models in AzureOpenAIEvalClient #216

Merged
taniokay merged 10 commits into main from azure-responses
Nov 3, 2025
Conversation

@taniokay
Contributor

@taniokay taniokay commented Nov 3, 2025

LiteLLMEvalClient already supports the Responses API, but litellm.responses is still in beta, so this PR adds Responses API support to AzureOpenAIEvalClient as well.


Note

Adds Responses API-based reasoning summary to OpenAI and Azure eval clients, adjusts outputs/logprobs behavior, improves error logging, and bumps version.

  • Eval Clients (OpenAI/Azure):
    • Add optional reasoning summary support (use_reasoning_summary, reasoning_effort, reasoning_summary).
    • Route requests via new _dispatch to switch between Chat Completions and Responses API.
    • In get_text_responses, extract content and append reasoning summaries when enabled.
    • Disallow log-likelihood retrieval when reasoning summary is enabled; refine logprobs config handling.
  • Error Handling:
    • Print full stack traces for caught exceptions in LiteLLM/OpenAI paths.
  • Version:
    • Bump __version__ and project version to 0.10.0.dev12.

Written by Cursor Bugbot for commit d98842e. This will update automatically on new commits.
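The routing described in the summary above might look roughly like the following sketch. Only the `_dispatch` name and the `use_reasoning_summary` / `reasoning_effort` / `reasoning_summary` parameter names come from the PR summary; the class name and helper method bodies here are hypothetical stand-ins, not the actual implementation.

```python
from typing import Any


class EvalClientSketch:
    """Hypothetical sketch of routing between Chat Completions and Responses."""

    def __init__(
        self,
        use_reasoning_summary: bool = False,
        reasoning_effort: str = "medium",
        reasoning_summary: str = "auto",
    ) -> None:
        self._use_reasoning_summary = use_reasoning_summary
        self._reasoning_effort = reasoning_effort
        self._reasoning_summary = reasoning_summary

    def _dispatch(self, messages: list[dict[str, str]], **kwargs: Any) -> str:
        # Route to the Responses API only when reasoning summaries are
        # requested; otherwise stay on Chat Completions.
        if self._use_reasoning_summary:
            return self._responses_call(messages, **kwargs)
        return self._chat_completions_call(messages, **kwargs)

    def _responses_call(self, messages: list, **kwargs: Any) -> str:
        return "responses"  # stand-in for client.responses.create(...)

    def _chat_completions_call(self, messages: list, **kwargs: Any) -> str:
        return "chat_completions"  # stand-in for client.chat.completions.create(...)
```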

@kennysong
Contributor

bugbot run

@kennysong kennysong requested a review from Copilot November 3, 2025 08:17
Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds support for OpenAI's reasoning summary feature to the OpenAI evaluation clients, along with enhanced error logging using traceback for better debugging. The changes introduce new parameters to enable reasoning summaries and refactor the API dispatch logic to support both the Chat Completions API and the Responses API.

Key changes:

  • Added reasoning summary parameters (use_reasoning_summary, reasoning_effort, reasoning_summary) to OpenAIEvalClient and AzureOpenAIEvalClient
  • Introduced a new _dispatch method to handle routing between Chat Completions API and Responses API
  • Enhanced exception handling with traceback.print_exception() calls in multiple places
  • Updated response processing logic to extract and format reasoning summaries
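As a rough illustration of the response-processing change, the sketch below appends a reasoning summary after the extracted content. The `OutputItem` layout is a simplified, hypothetical stand-in, not the actual OpenAI SDK types.

```python
from dataclasses import dataclass


@dataclass
class OutputItem:
    """Simplified stand-in for a Responses API output item."""

    type: str  # "message" or "reasoning"
    text: str = ""
    summary: str = ""


def extract_text(items: list[OutputItem], use_reasoning_summary: bool) -> str:
    # Collect the assistant's message content first.
    text = "\n".join(i.text for i in items if i.type == "message")
    if use_reasoning_summary:
        summaries = [i.summary for i in items if i.type == "reasoning" and i.summary]
        if summaries:
            # Append the reasoning summary after the main content.
            text += "\n\n" + "\n".join(summaries)
    return text
```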

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File | Description
src/langcheck/metrics/eval_clients/_openai.py | Added reasoning summary support with new parameters, refactored API dispatch logic, enhanced error logging with traceback
src/langcheck/metrics/eval_clients/_litellm.py | Added traceback printing for better error debugging
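The traceback-based error logging mentioned for _litellm.py presumably follows the standard library pattern; a minimal sketch (the wrapper function is hypothetical, only the use of the `traceback` module comes from the review):

```python
import traceback
from typing import Any, Callable


def call_with_logging(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Run fn, printing a full stack trace (not just the message) on failure."""
    try:
        return fn(*args, **kwargs)
    except Exception as exc:
        # A full stack trace is far more useful for debugging API failures
        # than printing str(exc) alone.
        traceback.print_exception(type(exc), exc, exc.__traceback__)
        return None
```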
Comments suppressed due to low confidence (3)

src/langcheck/metrics/eval_clients/_openai.py:554

  • Corrected spelling of 'Intialize' to 'Initialize'.
        Intialize the Azure OpenAI evaluation client.

src/langcheck/metrics/eval_clients/_openai.py:56

  • The docstring is missing documentation for the newly added parameters: use_reasoning_summary, reasoning_effort, and reasoning_summary. These should be documented to explain their purpose and usage.
            openai_client (Optional): The OpenAI client to use.
            openai_args (Optional): dict of additional args to pass in to the
            `client.chat.completions.create` function.
            use_async: If True, the async client will be used. Defaults to
                False.
            system_prompt (Optional): The system prompt to use. If not provided,
                no system prompt will be used.
            extractor (Optional): The extractor to use. If not provided, the
                default extractor will be used.

src/langcheck/metrics/eval_clients/_openai.py:572

  • The docstring is missing documentation for the newly added parameters: use_reasoning_summary, reasoning_effort, and reasoning_summary. These should be documented to explain their purpose and usage.
            text_model_name (Optional): The text model name you want to use with
                the Azure OpenAI API. The name is used as
                `{ "model": text_model_name }` parameter when calling the Azure
                OpenAI API for text models.
            embedding_model_name (Optional): The text model name you want to
                use with the Azure OpenAI API. The name is used as
                `{ "model": embedding_model_name }` parameter when calling the
                Azure OpenAI API for embedding models.
            azure_openai_client (Optional): The Azure OpenAI client to use.
            openai_args (Optional): dict of additional args to pass in to the
                `client.chat.completions.create` function.
            use_async (Optional): If True, the async client will be used.
            system_prompt (Optional): The system prompt to use. If not provided,
                no system prompt will be used.
            extractor (Optional): The extractor to use. If not provided, the
                default extractor will be used.


Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

@kennysong kennysong left a comment


Done reviewing!

}

# seed and logprobs are not supported in responses API.
return self._client.responses.create(
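The comment in the diff implies that Chat Completions-only arguments must be dropped before calling the Responses API. A hypothetical sketch of that filtering; the set below reflects only the `seed` and `logprobs` named in the comment:

```python
# Hypothetical filter; only seed and logprobs are named in the diff comment.
UNSUPPORTED_BY_RESPONSES_API = {"seed", "logprobs"}


def responses_api_args(openai_args: dict) -> dict:
    """Drop arguments the Responses API does not accept."""
    return {
        key: value
        for key, value in openai_args.items()
        if key not in UNSUPPORTED_BY_RESPONSES_API
    }
```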

Ah nice, I was wondering if we set this properly to avoid logging prompts

Co-authored-by: Kenny Song <kenny.ysong@gmail.com>


@kennysong
Contributor

kennysong commented Nov 3, 2025

LGTM after bumping the version!

@taniokay
Contributor Author

taniokay commented Nov 3, 2025

Thanks for your quick review!

@taniokay taniokay merged commit d6e492f into main Nov 3, 2025
27 checks passed
@taniokay taniokay deleted the azure-responses branch November 3, 2025 11:11

3 participants