Conversation


@rtyer rtyer commented Sep 30, 2025

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

This addresses #3395. The underlying issue was simply that the responses_wrapper code didn't support streaming. I tried to follow the ChatStream approach to adding streaming support as closely as I could, given the API differences with the Responses API.

Below is a screenshot of this working with Langfuse. It's hard to demonstrate that it is specifically the streaming response being traced; if y'all have suggestions on how best to demonstrate this, let me know.

[Screenshot: Langfuse trace of the instrumented streaming response, 2025-10-02 9:52 AM]

Important

Adds streaming support to the OpenAI Responses API, with telemetry and tests for sync and async operations.

  • Behavior:
    • Adds streaming support to OpenAI responses API with ResponseStream class in responses_wrappers.py.
    • Updates responses_get_or_create_wrapper and async_responses_get_or_create_wrapper to handle telemetry for streaming responses.
    • Captures telemetry data including span lifecycle, token counts, and response metrics.
  • Tests:
    • Adds tests in test_responses_streaming.py for streaming responses, async responses, and context manager usage.
    • Verifies span creation and telemetry data capture for streaming responses.
  • Misc:
    • Introduces _with_responses_telemetry_wrapper in utils.py for telemetry handling.

This description was created by Ellipsis for af3a2a4.


Summary by CodeRabbit

  • New Features
    • Streaming responses now return a managed streaming object that tracks span lifecycle, records richer traced data (inputs, outputs, model, reasoning, usage, and chunked content), supports parse flows, and ensures proper cleanup on exit or error.
    • Response flows now accept and propagate additional telemetry metrics (token & choice counts, request duration, exception tracking, and streaming time-to-first-token/time-to-generate), with distributed tracing around API calls.
  • Tests
    • Added comprehensive tests for streaming (sync & async), context-manager usage, context propagation, content/chunk extraction, and error scenarios.

- Add ResponseStream class to wrap streaming responses
- Create OpenTelemetry spans for streaming responses
- Handle both sync and async streaming
- Add comprehensive test coverage including test for issue traceloop#3395

The responses API was returning raw Stream objects without any
instrumentation, causing no spans to be generated. This fix wraps
streams in a ResponseStream class that manages span lifecycle.

Fixes traceloop#3395
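
A minimal sketch of that idea, assuming only the opentelemetry-api package (class and names simplified for illustration; the PR's actual ResponseStream is a wrapt ObjectProxy that also records metrics):

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

class SpanManagedStream:
    """Wrap a streaming iterator so its span ends when the stream does."""

    def __init__(self, stream):
        self._stream = stream
        self._span = tracer.start_span("openai.response")

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._stream)
        except StopIteration:
            self._span.end()  # normal completion: close the span
            raise
        except Exception as exc:
            self._span.record_exception(exc)  # capture the failure first
            self._span.end()
            raise
```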
- Match ChatStream architecture: create spans before streaming starts
- Add comprehensive metrics support (tokens, duration, time-to-first-token)
- Implement proper garbage collection with __del__ method
- Add @dont_throw decorators for safe cleanup
- Pass metric instruments from v1/__init__.py to wrapper functions
- Clean up duplicate exception handling code
- Consolidate test_issue_3395.py into test_responses_streaming.py

CLAassistant commented Sep 30, 2025

CLA assistant check
All committers have signed the CLA.


coderabbitai bot commented Sep 30, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds a responses-specific telemetry wrapper factory and threads six telemetry metric objects through v1 responses wrappers; implements ResponseStream proxy classes to manage streaming spans, metrics, and traced data for sync/async flows; updates wrapper signatures/decorators and adds tests for streaming traces and attributes.

Changes

  • Wrapper factory (utils) — packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py: Adds _with_responses_telemetry_wrapper(func), a higher-order wrapper factory that returns wrappers accepting a tracer plus six telemetry metric objects and forwards them into the response wrappers (see the sketch after this list).
  • v1 init wiring — packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py: Wires Responses.create/retrieve and AsyncResponses.create/retrieve (and parse variants) to pass the tracer plus six metric objects (tokens_histogram, chat_choice_counter, duration_histogram, chat_exception_counter, streaming_time_to_first_token, streaming_time_to_generate) into the response wrappers; adds unwraps for parse.
  • Responses wrappers & streaming — packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py: Adds ResponseStreamBase and ResponseStream (sync + async) using wrapt.ObjectProxy; introduces a streaming telemetry path that starts CLIENT spans for openai.response, records metrics and attributes, updates richer TracedData during streaming, and ensures spans/metrics are ended on close or error; updates wrapper signatures and decorates them with the new _with_responses_telemetry_wrapper.
  • Streaming trace tests — packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_streaming.py: Adds tests for sync and async streaming instrumentation covering span creation, attributes (gen_ai.system, model, prompt/completion content/roles), chunk iteration/collection, context-manager usage, error-path span finalization, and streamed content verification.
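
A hedged sketch of the wrapper-factory pattern summarized in the first row; the parameter order below is inferred from this summary rather than copied from the diff:

```python
def _with_responses_telemetry_wrapper(func):
    """Return a wrapt-style wrapper that closes over a tracer and six metric objects."""

    def _with_responses_telemetry(
        tracer,
        token_counter,
        choice_counter,
        duration_histogram,
        exception_counter,
        streaming_time_to_first_token,
        streaming_time_to_generate,
    ):
        def wrapper(wrapped, instance, args, kwargs):
            # Forward the telemetry objects plus the intercepted call into func
            return func(
                tracer,
                token_counter,
                choice_counter,
                duration_histogram,
                exception_counter,
                streaming_time_to_first_token,
                streaming_time_to_generate,
                wrapped,
                instance,
                args,
                kwargs,
            )

        return wrapper

    return _with_responses_telemetry
```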

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  actor App
  participant OpenAI as OpenAI Client
  participant Wrapper as responses_*_wrapper
  participant OTel as OTel Tracer/Metrics
  participant Stream as ResponseStream

  App->>OpenAI: Responses.create/retrieve(...)
  OpenAI->>Wrapper: instrumentation wrapper invoked (tracer + metrics)
  Wrapper->>OTel: start CLIENT span (openai.response)
  Wrapper->>OpenAI: call underlying API
  alt non-streaming
    OpenAI-->>Wrapper: response object
    Wrapper->>OTel: record metrics & attrs, end span
    Wrapper-->>App: response
  else streaming
    OpenAI-->>Wrapper: streaming iterator/async iterator
    Wrapper->>Stream: wrap iterator with ResponseStream (attach span/telemetry)
    Wrapper-->>App: ResponseStream proxy
    loop each chunk
      App->>Stream: iterate/read chunk
      Stream->>OTel: update TTFT/T2G, counters, traced data
    end
    Stream->>OTel: end span on close/exit/error
  end
```
```mermaid
sequenceDiagram
  autonumber
  actor App
  participant AsyncOpenAI as Async OpenAI Client
  participant AWrapper as async_responses_*_wrapper
  participant OTel as OTel Tracer/Metrics
  participant AStream as ResponseStream (async)

  App->>AsyncOpenAI: await Responses.create/retrieve(stream=True)
  AsyncOpenAI->>AWrapper: instrumentation wrapper invoked (tracer + metrics)
  AWrapper->>OTel: start CLIENT span (openai.response)
  AWrapper->>AsyncOpenAI: await API call
  AsyncOpenAI-->>AWrapper: async streaming object
  AWrapper->>AStream: wrap and return ResponseStream proxy
  App->>AStream: async iterate chunks
  loop chunks
    AStream->>OTel: update timing/metrics and traced data
  end
  AStream->>OTel: end span on aclose/exit/error
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Suggested reviewers

  • nirga
  • dinmukhamedm

Poem

I hop through streams of tokens bright,
Spans and counters in my sight,
TTFT nibbles, T2G takes flight,
Chunks collected through the night,
ResponseStream keeps traces tight. 🥕✨

Pre-merge checks

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 62.96%, below the required threshold of 80.00%. Running @coderabbitai generate docstrings can improve coverage.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title follows the conventional commit format and clearly conveys that the primary change adds streaming support for the OpenAI Responses API, matching the PR's core objective.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d1997a5 and 5dbcd70.

📒 Files selected for processing (2)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py (3 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (18 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
🧬 Code graph analysis (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (2)
  • responses_get_or_create_wrapper (806-955)
  • async_responses_get_or_create_wrapper (960-1108)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (4)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-261)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (2)
  • _get_openai_base_url (275-281)
  • metric_shared_attributes (364-377)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (4)
  • _with_responses_telemetry_wrapper (116-149)
  • _with_tracer_wrapper (152-159)
  • dont_throw (168-196)
  • should_send_prompts (213-216)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (3)
  • _ensure_cleanup (783-818)
  • _process_complete_response (737-780)
  • _shared_attributes (726-734)
🪛 Ruff (0.13.3)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py

291-291: Do not catch blind exception: Exception

(BLE001)


307-307: Do not catch blind exception: Exception

(BLE001)


431-431: Do not catch blind exception: Exception

(BLE001)


526-526: Do not catch blind exception: Exception

(BLE001)


535-535: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (23)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py (2)

305-389: LGTM! Consistent telemetry integration.

The wrapper calls for Responses.create, Responses.retrieve, Responses.parse and their async variants correctly pass all six telemetry metric objects (token_counter, choice_counter, duration_histogram, exception_counter, streaming_time_to_first_token, streaming_time_to_generate) in a consistent order. The new parse method wrappers follow the same pattern as existing create/retrieve wrappers, ensuring uniform telemetry capture across all response operations.
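
For reference, this kind of wiring is typically done with wrapt's wrap_function_wrapper; the module path and call shape below are illustrative assumptions, not lifted from the diff:

```python
from wrapt import wrap_function_wrapper

from opentelemetry.instrumentation.openai.v1.responses_wrappers import (
    responses_get_or_create_wrapper,
)

def instrument_responses_create(tracer, metrics):
    # metrics: the six instruments, in the order listed above
    wrap_function_wrapper(
        "openai.resources.responses",
        "Responses.create",
        responses_get_or_create_wrapper(tracer, *metrics),
    )
```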


415-416: LGTM! Uninstrumentation matches new parse wrappers.

The unwrap calls for Responses.parse and AsyncResponses.parse correctly mirror the new instrumentation added at lines 364-389, ensuring proper cleanup when the instrumentor is disabled.

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (21)

2-2: LGTM! Essential imports for streaming support.

The added imports (logging, threading, pydantic, and logger initialization) are necessary for the new streaming telemetry implementation. The threading module enables thread-safe cleanup, logging provides observability, and pydantic supports TracedData modeling.

Also applies to: 4-4, 7-7, 78-78


213-276: LGTM! Well-structured streaming proxy with proper initialization.

The ResponseStream class correctly:

  • Inherits from ResponseStreamBase and ObjectProxy to wrap streaming responses
  • Initializes thread-safe cleanup with threading.Lock and threading.Event for error tracking
  • Stores all telemetry counters and histograms for metric recording
  • Registers traced_data in the global responses dict when response_id is present

The design aligns with the existing ChatStream pattern for consistency.
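
A sketch of that construction, assuming wrapt; attribute names are illustrative (wrapt requires the _self_ prefix to keep state on the proxy rather than on the wrapped stream):

```python
import threading

from wrapt import ObjectProxy

class ResponseStreamBase:
    """Plain base class so iteration-protocol methods are visible at the type level."""

class ResponseStream(ResponseStreamBase, ObjectProxy):
    def __init__(self, stream, span, traced_data, **instruments):
        super().__init__(stream)              # ObjectProxy wraps the raw stream
        self._self_span = span
        self._self_traced_data = traced_data
        self._self_instruments = instruments  # the six counters/histograms
        self._self_cleanup_lock = threading.Lock()
        self._self_error_recorded = threading.Event()
        self._self_cleanup_completed = False
```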


277-296: LGTM! Context manager cleanup handling.

The synchronous context manager methods (__enter__, __exit__, __del__) properly:

  • Delegate to the wrapped stream's __exit__
  • Call _ensure_cleanup() to close spans and release resources
  • Log cleanup exceptions without propagating them (appropriate for cleanup paths)
  • Ensure cleanup runs even during garbage collection

298-312: LGTM! Async context manager cleanup handling.

The async context manager methods (__aenter__, __aexit__) mirror the sync implementation, properly delegating to the wrapped stream and ensuring cleanup runs after the stream exits.


317-332: LGTM! Correct error handling in sync iteration.

The __next__ method properly handles exceptions:

  • Distinguishes StopIteration (normal completion) from real errors
  • Records exceptions and sets ERROR status on the span before calling cleanup (addressing past review feedback)
  • Uses threading.Event for thread-safe error tracking
  • Calls _process_complete_response() on normal completion
  • Re-raises exceptions after recording telemetry
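
Sketched below is that ordering as a simplified method body (names illustrative; Status and StatusCode come from opentelemetry.trace):

```python
from opentelemetry.trace import Status, StatusCode

def __next__(self):
    try:
        chunk = self.__wrapped__.__next__()
    except StopIteration:
        self._process_complete_response()  # normal end: finalize span and metrics
        raise
    except Exception as exc:
        # Record the failure before cleanup so ERROR status is never lost
        self._self_span.record_exception(exc)
        self._self_span.set_status(Status(StatusCode.ERROR, str(exc)))
        self._self_error_recorded.set()
        self._ensure_cleanup()
        raise
    else:
        self._process_chunk(chunk)
        return chunk
```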

334-349: LGTM! Correct error handling in async iteration.

The __anext__ method mirrors the sync implementation, properly handling StopAsyncIteration and recording errors before cleanup. The error handling order ensures streaming failures are captured with correct span status and exception details.


351-433: LGTM! Comprehensive streaming chunk processing.

The _process_chunk method effectively:

  • Records streaming chunk events on the span for observability
  • Captures time-to-first-token metrics correctly
  • Handles all major OpenAI streaming event types (ResponseCreatedEvent, ResponseTextDeltaEvent, ResponseOutputItemAddedEvent, ResponseInProgressEvent, ResponseCompletedEvent, etc.)
  • Accumulates response data (text deltas, output blocks, usage, model info) in TracedData
  • Updates the global responses dict to maintain state across streaming chunks
  • Safely catches and logs processing errors without breaking the stream

434-443: LGTM! Clean metric attribute helper.

The _shared_attributes method correctly constructs metric attributes using metric_shared_attributes, including response model, operation name, server address, and streaming flag, following the established pattern from chat wrappers.


445-498: LGTM! Complete telemetry recording with proper error handling.

The _process_complete_response method correctly:

  • Joins accumulated text deltas into final output_text
  • Sets all span attributes via set_data_attributes
  • Records token metrics (input/output) with proper token type attributes
  • Records choice counter based on output blocks
  • Records duration and streaming time-to-generate histograms
  • Checks _error_recorded.is_set() before setting OK status (line 490), preventing error status from being overwritten
  • Ends the span and cleans up the responses dict entry
  • Sets the _cleanup_completed flag

This resolves the earlier concern about error status handling during cleanup.
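
As a rough illustration of the metric recording (attribute keys follow the OTel GenAI conventions; the exact keys and usage fields in the PR may differ):

```python
import time

def record_completion_metrics(token_counter, choice_counter, duration_histogram,
                              usage, num_blocks, start_time, shared_attrs):
    """Record token, choice, and duration metrics for a completed response (sketch)."""
    if usage is not None:
        token_counter.record(
            usage.input_tokens,
            attributes={**shared_attrs, "gen_ai.token.type": "input"},
        )
        token_counter.record(
            usage.output_tokens,
            attributes={**shared_attrs, "gen_ai.token.type": "output"},
        )
    choice_counter.add(num_blocks, attributes=shared_attrs)
    duration_histogram.record(time.time() - start_time, attributes=shared_attrs)
```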


500-537: LGTM! Thread-safe cleanup with proper error status preservation.

The _ensure_cleanup method provides robust cleanup:

  • Uses _cleanup_lock for thread-safe execution
  • Checks _cleanup_completed to prevent double cleanup
  • Checks _error_recorded.is_set() before setting OK status (lines 515-516), ensuring error status is never overwritten
  • Ends the span and removes the response_id from the global responses dict
  • Has nested exception handling to ensure _cleanup_completed is always set

The bare Exception catches (flagged by static analysis) are appropriate for cleanup paths and prevent resource leaks. The use of threading.Event for _error_recorded ensures thread-safe error flag handling, addressing the race condition concern from previous reviews.
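
A simplified sketch of that guard (method excerpt; logger, Status, and StatusCode assumed imported):

```python
def _ensure_cleanup(self):
    with self._self_cleanup_lock:
        if self._self_cleanup_completed:
            return  # another path already finished the span
        try:
            if not self._self_error_recorded.is_set():
                self._self_span.set_status(Status(StatusCode.OK))
            self._self_span.end()
        except Exception:
            # Cleanup must never raise; log and fall through
            logger.debug("Error during stream cleanup", exc_info=True)
        finally:
            self._self_cleanup_completed = True
```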


539-642: LGTM! Comprehensive span attribute setting.

The set_data_attributes function correctly sets rich OpenTelemetry attributes:

  • Basic attributes (system, models, response_id)
  • Usage metrics (input/output/total tokens, cached tokens, reasoning tokens)
  • Reasoning attributes (summary, effort)
  • Tool/function definitions with proper serialization
  • System instructions

The function respects privacy settings via should_send_prompts() and handles both dict-style and object-style usage attributes.


642-717: LGTM! Comprehensive prompt content processing.

The prompt processing logic correctly:

  • Handles string input and structured message input
  • Processes content blocks with proper type handling
  • Uses is_validator_iterator check to handle Pydantic iterator fields
  • Safely serializes content to JSON with fallback to string representation
  • Handles special block types (computer_call_output, computer_call, function_call_output)
  • Sets proper role attributes for each prompt type

The new function_call_output handling (lines 705-716) correctly extracts call_id and output for tool result tracing.


719-801: LGTM! Comprehensive completion and tool call processing.

The completion processing correctly:

  • Sets assistant role and output_text content
  • Logs when output_text is missing (useful for debugging)
  • Iterates through output_blocks to extract tool calls
  • Handles multiple tool call types (function_call, file_search_call, web_search_call, computer_call)
  • Processes reasoning blocks with proper JSON serialization
  • Sets tool_call attributes (id, name, arguments) with proper indexing

The extensive block type handling ensures comprehensive telemetry capture for all OpenAI response formats.


805-850: LGTM! Proper wrapper signature and exception handling.

The responses_get_or_create_wrapper correctly:

  • Uses @_with_responses_telemetry_wrapper decorator to inject telemetry parameters
  • Checks suppression key to allow opt-out
  • Creates CLIENT span for the operation
  • Wraps the call with proper timing (start_time, end_time)
  • Records duration and exception metrics on errors
  • Sets ERROR_TYPE attribute, records exception, and sets error status on the span
  • Ends the span before re-raising

The error handling ensures comprehensive telemetry capture even when operations fail.


852-902: LGTM! Correct streaming response handling.

For streaming responses (detected with isinstance(response, Stream)), the wrapper correctly:

  • Retrieves any existing TracedData for the response_id
  • Processes input with process_input helper
  • Creates comprehensive TracedData with all request parameters (input, instructions, tools, models, reasoning)
  • Converts start_time to nanoseconds for span timestamps
  • Returns a ResponseStream proxy with all telemetry objects, ensuring span lifecycle and metrics are managed during streaming

This enables full telemetry capture for streaming scenarios.


903-955: LGTM! Correct non-streaming response handling.

For non-streaming responses, the wrapper correctly:

  • Parses the response with parse_response
  • Retrieves and merges existing TracedData if available
  • Extracts output_text with fallback logic through the response hierarchy
  • Creates complete TracedData with merged tools and all attributes
  • Updates the global responses dict
  • Sets span attributes and ends the span when status is "completed"
  • Returns the original response object

The try/except around TracedData creation ensures the wrapper doesn't break on unexpected response formats.


959-1004: LGTM! Correct async wrapper structure.

The async_responses_get_or_create_wrapper mirrors the sync implementation with proper async/await:

  • Same decorator and parameter structure
  • Awaits the wrapped call
  • Same exception handling with metrics recording
  • Same span lifecycle management

1006-1054: LGTM! Correct async streaming detection.

The async streaming path correctly:

  • Detects both Stream and AsyncStream types (line 1006)
  • Creates the same comprehensive TracedData as the sync version
  • Returns ResponseStream which handles both sync and async iteration

The ResponseStream class implements both __next__ and __anext__, making it compatible with both sync Stream and async AsyncStream objects.


1055-1108: LGTM! Consistent async non-streaming handling.

The async non-streaming path mirrors the sync implementation exactly, ensuring consistent telemetry capture and response processing for both sync and async operations.


1111-1137: LGTM! Proper cancellation handling.

The responses_cancel_wrapper correctly:

  • Skips instrumentation if response is still streaming
  • Retrieves and removes existing TracedData from the responses dict
  • Creates a span with the original start_time (already in nanoseconds, as noted in the comment)
  • Records "Response cancelled" as an exception for observability
  • Sets all data attributes and ends the span

This ensures cancelled responses are properly traced and cleaned up.


1139-1166: LGTM! Async cancellation handling mirrors sync implementation.

The async_responses_cancel_wrapper correctly mirrors the sync version, handling both Stream and AsyncStream types and providing the same comprehensive telemetry capture for cancelled async responses.




@ellipsis-dev ellipsis-dev bot left a comment


Caution

Changes requested ❌

Reviewed everything up to 104d842 in 2 minutes and 33 seconds.
  • Reviewed 921 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 draft comments. View those below.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:115
  • Draft comment:
    Consider adding a brief docstring to _with_responses_telemetry_wrapper describing its parameters and intended usage, similar to the other telemetry wrappers.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 20% vs. threshold = 50%). The comment suggests adding documentation, which could be helpful. However, none of the other similar wrapper functions have docstrings, so this would be inconsistent with the codebase's style, and the function's purpose is fairly clear from its name and parameters, making this a nice-to-have rather than a necessary change. While code can be harder to maintain without documentation and future developers might appreciate a docstring, consistency with the codebase's existing patterns matters more; if docstrings are needed, that should be a separate documentation effort across all wrapper functions. The comment should be deleted, as it suggests deviating from the established pattern and doesn't highlight a critical issue that needs to be fixed.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:842
  • Draft comment:
    For non-stream responses, a new span is started when parsed_response.status == 'completed', but the initial span created earlier is never ended. This may lead to orphaned spans. Consider reusing and ending the original span instead of starting a new one.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:888
  • Draft comment:
    Similarly, in async_responses_get_or_create_wrapper, the initial span is not ended on non-stream responses, which could lead to orphaned spans. Verify if this is intended or if the original span should be ended.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
4. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:745
  • Draft comment:
    There is an inconsistency in time measurements: start_time is obtained with time.time() (seconds) but traced_data.start_time is set using time.time_ns() (nanoseconds). Please ensure consistency in the time units used for span timing.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.

Workflow ID: wflow_TsHL5ITqJWjyuIGW



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)

712-790: End the request span on the non-streaming success path

We start a CLIENT span before invoking wrapped, but on the non-streaming path we never close it—we return after building TracedData, and even reassign span when status == "completed", leaving the original span open forever. That leaks spans, drops duration metrics, and makes exporters flush orphaned spans. The async wrapper has the same issue. Please make sure every non-streaming exit (happy path and instrumentation fallback) sets an OK status and ends the original span.

```diff
@@
-        if isinstance(response, Stream):
+        if isinstance(response, Stream):
             ...
             return ResponseStream(...)
@@
-    except Exception:
-        return response
+    except Exception:
+        if span.is_recording():
+            span.set_status(Status(StatusCode.OK))
+            span.end()
+        return response
@@
-    if parsed_response.status == "completed":
-        span = tracer.start_span(
+    if parsed_response.status == "completed":
+        completed_span = tracer.start_span(
             SPAN_NAME,
             kind=SpanKind.CLIENT,
             start_time=int(traced_data.start_time),
         )
-        set_data_attributes(traced_data, span)
-        span.end()
-
-    return response
+        set_data_attributes(traced_data, completed_span)
+        completed_span.end()
+
+    if span.is_recording():
+        span.set_status(Status(StatusCode.OK))
+        span.end()
+
+    return response
@@
-    if isinstance(response, (Stream, AsyncStream)):
+    if isinstance(response, (Stream, AsyncStream)):
         ...
         return ResponseStream(...)
@@
-    except Exception:
-        return response
+    except Exception:
+        if span.is_recording():
+            span.set_status(Status(StatusCode.OK))
+            span.end()
+        return response
@@
-    if parsed_response.status == "completed":
-        span = tracer.start_span(
+    if parsed_response.status == "completed":
+        completed_span = tracer.start_span(
             SPAN_NAME,
             kind=SpanKind.CLIENT,
             start_time=int(traced_data.start_time),
         )
-        set_data_attributes(traced_data, span)
-        span.end()
-
-    return response
+        set_data_attributes(traced_data, completed_span)
+        completed_span.end()
+
+    if span.is_recording():
+        span.set_status(Status(StatusCode.OK))
+        span.end()
+
+    return response
```

Also applies to: 842-850, 999-1007

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 09cd046 and 104d842.

📒 Files selected for processing (4)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (1 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py (2 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (7 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_streaming.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_streaming.py
🧬 Code graph analysis (4)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (3)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (1)
  • wrapper (418-431)
packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/utils.py (2)
  • wrapper (18-19)
  • wrapper (48-58)
packages/opentelemetry-instrumentation-transformers/opentelemetry/instrumentation/transformers/utils.py (2)
  • wrapper (21-22)
  • wrapper (39-49)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (3)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (3)
  • _with_responses_telemetry_wrapper (116-143)
  • dont_throw (162-190)
  • should_send_prompts (207-210)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (2)
  • _ensure_cleanup (783-818)
  • _process_complete_response (737-780)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-261)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (2)
  • responses_get_or_create_wrapper (695-850)
  • async_responses_get_or_create_wrapper (855-1007)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_streaming.py (1)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (4)
  • export (45-51)
  • InMemorySpanExporter (22-61)
  • clear (35-38)
  • get_finished_spans (40-43)
🪛 Ruff (0.13.1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py

260-260: Do not catch blind exception: Exception

(BLE001)


276-276: Do not catch blind exception: Exception

(BLE001)


370-371: try-except-pass detected, consider logging the exception

(S110)


370-370: Do not catch blind exception: Exception

(BLE001)


377-377: Do not catch blind exception: Exception

(BLE001)


433-433: Do not catch blind exception: Exception

(BLE001)


442-442: Do not catch blind exception: Exception

(BLE001)

packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_streaming.py

10-10: Unused function argument: instrument_legacy

(ARG001)


71-71: Unused function argument: instrument_legacy

(ARG001)


114-114: Unused function argument: instrument_legacy

(ARG001)


114-114: Unused function argument: openai_client

(ARG001)


161-161: Unused function argument: instrument_legacy

(ARG001)


194-194: Unused function argument: instrument_legacy

(ARG001)


231-231: Unused function argument: instrument_legacy

(ARG001)


246-246: Loop control variable chunk not used within loop body

Rename unused chunk to _chunk

(B007)


249-250: try-except-pass detected, consider logging the exception

(S110)


249-249: Do not catch blind exception: Exception

(BLE001)

…t_responses_streaming.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 104d842 and 0ddba72.

📒 Files selected for processing (1)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_streaming.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_streaming.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_streaming.py (1)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (4)
  • export (45-51)
  • InMemorySpanExporter (22-61)
  • clear (35-38)
  • get_finished_spans (40-43)
🪛 Flake8 (7.2.0)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_streaming.py

[error] 254-254: IndentationError: unindent does not match any outer indentation level

(E999)

🪛 Ruff (0.13.1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_streaming.py

254-254: unindent does not match any outer indentation level

(invalid-syntax)


255-255: Unexpected indentation

(invalid-syntax)

- Record exceptions before cleanup to prevent loss of error details
- Add _error_recorded flag to track error state
- Update cleanup methods to preserve ERROR status when set
- Ensure both sync and async paths follow same exception handling order

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_streaming.py (1)

114-119: Consider using an async client fixture for consistency.

The test creates a new AsyncOpenAI() client directly, making the openai_client parameter unused. For consistency with other tests and to leverage fixture configuration (API keys, base URLs, etc.), consider using an async_openai_client fixture if available.
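
One hedged way such a fixture could look (fixture name and teardown are assumptions; the repo's conftest may already provide an equivalent):

```python
import pytest_asyncio
from openai import AsyncOpenAI

@pytest_asyncio.fixture
async def async_openai_client():
    client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
    yield client
    await client.close()    # release the underlying HTTP client
```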

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0ddba72 and ef7a46d.

📒 Files selected for processing (2)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (7 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_streaming.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_streaming.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
🧬 Code graph analysis (2)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_streaming.py (1)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (4)
  • export (45-51)
  • InMemorySpanExporter (22-61)
  • clear (35-38)
  • get_finished_spans (40-43)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (4)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (2)
  • metric_shared_attributes (364-377)
  • _get_openai_base_url (275-281)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (3)
  • _with_responses_telemetry_wrapper (116-143)
  • dont_throw (162-190)
  • should_send_prompts (207-210)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (3)
  • _ensure_cleanup (783-818)
  • _process_complete_response (737-780)
  • _shared_attributes (726-734)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-261)
🪛 Ruff (0.13.1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_streaming.py

10-10: Unused function argument: instrument_legacy

(ARG001)


71-71: Unused function argument: instrument_legacy

(ARG001)


114-114: Unused function argument: instrument_legacy

(ARG001)


114-114: Unused function argument: openai_client

(ARG001)


161-161: Unused function argument: instrument_legacy

(ARG001)


194-194: Unused function argument: instrument_legacy

(ARG001)

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py

263-263: Do not catch blind exception: Exception

(BLE001)


279-279: Do not catch blind exception: Exception

(BLE001)


371-372: try-except-pass detected, consider logging the exception

(S110)


371-371: Do not catch blind exception: Exception

(BLE001)


378-378: Do not catch blind exception: Exception

(BLE001)


458-458: Do not catch blind exception: Exception

(BLE001)


467-467: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (15)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_streaming.py (4)

9-68: LGTM! Comprehensive test for issue #3395.

The test correctly reproduces the reported issue and verifies that streaming responses now emit spans with proper attributes. The instrument_legacy fixture is required for test setup despite the Ruff warning.


70-110: LGTM! Solid streaming test with proper attribute verification.

The test correctly verifies span creation, attributes, and content accumulation for streaming responses. The nested content extraction logic appropriately handles the response structure.


159-190: LGTM! Context manager test ensures proper span lifecycle.

The test correctly verifies that spans are created and finalized after the context manager exits, which is critical for proper resource cleanup and telemetry.


192-226: LGTM! Excellent test for span nesting and context propagation.

This test verifies that streaming instrumentation correctly integrates with parent spans created via start_as_current_span, which is essential for distributed tracing scenarios.

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (11)

2-2: LGTM! Import additions support streaming telemetry.

The new imports (logging, threading, trace context management, ObjectProxy) are all necessary for the ResponseStream implementation and proper span lifecycle management.

Also applies to: 5-5, 55-59, 64-65, 69-69, 77-77


196-248: LGTM! Well-structured streaming wrapper initialization.

The ResponseStream class appropriately wraps the underlying stream using ObjectProxy and initializes all necessary telemetry components. The _error_recorded flag and cleanup lock ensure thread-safe error handling and cleanup.


249-284: LGTM! Context manager protocol correctly implemented.

Both sync and async context manager methods properly delegate to the wrapped stream and ensure cleanup. The bare exception catches in lines 263 and 279 are appropriate here to prevent cleanup errors from masking original exceptions.


289-321: LGTM! Iterator error handling correctly ordered.

The exception handling properly records the error and sets span status before calling _ensure_cleanup(), addressing the previous review concern. The _error_recorded flag ensures the error status is preserved during cleanup.


323-380: LGTM! Chunk processing handles streaming data appropriately.

The method correctly updates TracedData from chunks and records time-to-first-token metrics. The exception handling at lines 371-372 (silent fallback for text extraction) and 378-379 (logged errors) are appropriate for best-effort streaming data collection.


381-390: LGTM! Shared attributes helper is straightforward.

The method correctly builds metric attributes consistent with the OpenTelemetry conventions and aligns with the existing ChatStream pattern.


392-437: LGTM! Complete response processing correctly finalizes telemetry.

The method properly:

  • Records all streaming metrics (tokens, choices, duration, time-to-generate)
  • Preserves error status when _error_recorded is true (line 431)
  • Sets span attributes and ends the span
  • Marks cleanup as completed

This addresses the previous review concern about error status preservation.


438-469: LGTM! Cleanup logic is thread-safe and preserves error states.

The method correctly:

  • Uses a lock to prevent race conditions
  • Checks _error_recorded before setting OK status (line 450)
  • Handles cleanup failures gracefully with nested exception handling
  • Ensures _cleanup_completed is always set to prevent infinite loops

The bare exception catches (lines 458, 467) are appropriate for cleanup safety.


717-811: LGTM! Wrapper correctly handles streaming and error paths.

The sync wrapper properly:

  • Creates a CLIENT span with manual lifecycle control (end_on_exit=False)
  • Records metrics and exceptions on error (lines 745-759)
  • Returns a ResponseStream for streaming responses with all telemetry parameters (lines 799-811)
  • Uses time.time() for metrics (not time.time_ns()) which is consistent

874-964: LGTM! Async wrapper mirrors sync implementation correctly.

The async wrapper properly:

  • Mirrors the sync wrapper's span creation and error handling
  • Checks for both Stream and AsyncStream (line 918) which correctly handles async streaming responses
  • Returns a ResponseStream that supports both sync and async iteration protocols
  • Maintains consistency in telemetry recording

812-871: LGTM! Non-streaming response handling is complete.

The non-streaming path correctly:

  • Parses the response and merges with existing TracedData
  • Creates a span with the correct start time when status is "completed"
  • Sets all attributes and ends the span
  • Handles exceptions gracefully by returning the raw response

…on support

The ResponseStream class was missing the __aiter__ method which is required
for async iteration with 'async for' loops. This caused TypeError when trying
to iterate over streaming responses asynchronously.

Added __aiter__ method that returns self, following the standard async iterator
protocol pattern.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)

342-402: Consider logging exceptions in chunk processing

The _process_chunk method accumulates streaming data and metrics correctly. However, the try-except-pass block at lines 393-394 and the generic exception handler at line 400 silently suppress errors during chunk processing.

While the current approach prevents streaming interruption, consider whether certain chunk processing errors should be logged for debugging:

```diff
                 except Exception:
-                    pass
+                    logger.debug("Error extracting output text from chunk: %s", traceback.format_exc())
```

And for the outer exception handler:

```diff
         except Exception as e:
-            logger.debug("Error processing response chunk: %s", e)
+            logger.debug("Error processing response chunk: %s", traceback.format_exc())
```

This provides more diagnostic context while still preventing streaming failures.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a28e427 and ecf8040.

📒 Files selected for processing (1)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (12 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (4)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (2)
  • metric_shared_attributes (364-377)
  • _get_openai_base_url (275-281)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (4)
  • _with_responses_telemetry_wrapper (116-143)
  • _with_tracer_wrapper (146-153)
  • dont_throw (162-190)
  • should_send_prompts (207-210)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (3)
  • _ensure_cleanup (783-818)
  • _process_complete_response (737-780)
  • _shared_attributes (726-734)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-261)
🪛 Ruff (0.13.2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py

279-279: Do not catch blind exception: Exception

(BLE001)


295-295: Do not catch blind exception: Exception

(BLE001)


393-394: try-except-pass detected, consider logging the exception

(S110)


393-393: Do not catch blind exception: Exception

(BLE001)


400-400: Do not catch blind exception: Exception

(BLE001)


486-486: Do not catch blind exception: Exception

(BLE001)


495-495: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (22)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (22)

1-77: LGTM: Imports and module setup

The new imports properly support streaming telemetry functionality. The addition of logging, threading, OpenTelemetry tracing components, ObjectProxy, and the telemetry wrapper are all appropriately used throughout the file.


99-106: LGTM: Input processing helper

The process_input function properly normalizes input parameters and ensures proper type annotations for list items.


157-164: LGTM: Response parsing helper

The parse_response function correctly handles both legacy and modern OpenAI response formats.


167-182: LGTM: Tool extraction helper

The get_tools_from_kwargs function properly extracts and converts function tool definitions from request kwargs.


185-209: LGTM: Content block normalization

The process_content_block function appropriately normalizes different content types (text, image, file) to a standard format.


212-264: LGTM: ResponseStream initialization

The class is properly structured as an ObjectProxy wrapper with appropriate initialization of telemetry metrics, span, and cleanup state. The cleanup lock and error tracking flags are correctly initialized for thread-safe operation.


265-267: LGTM: Garbage collection cleanup

The __del__ method ensures span cleanup when the object is garbage collected, preventing resource leaks.


269-300: LGTM: Context manager implementation

Both sync and async context manager methods are properly implemented. They delegate to the wrapped stream's context management and ensure cleanup occurs on exit, with appropriate exception logging.


302-306: LGTM: Iterator protocol

The iterator protocol methods correctly delegate to self for proper iteration behavior.


308-340: LGTM: Streaming iteration with proper error handling

Both __next__ and __anext__ correctly handle streaming completion and errors. Based on past review comments, the error recording now properly occurs before cleanup (lines 316-318 and 333-335), ensuring error spans are correctly captured with exception details and ERROR status.


403-412: LGTM: Shared attributes construction

The _shared_attributes method correctly builds metric attributes using the common pattern with proper model fallback and streaming flag.


414-461: LGTM: Complete response processing

The _process_complete_response method correctly:

  • Sets span attributes from traced data
  • Records token metrics with proper token type attributes
  • Records choice/block count metrics
  • Records duration and time-to-generate metrics
  • Respects the _error_recorded flag to avoid overwriting error status (line 456)
  • Properly ends the span and marks cleanup complete

463-497: LGTM: Thread-safe cleanup implementation

The _ensure_cleanup method is well-designed:

  • Thread-safe via _cleanup_lock
  • Prevents double cleanup with _cleanup_completed flag
  • Respects _error_recorded flag to avoid overwriting error status
  • Nested exception handling ensures cleanup always completes even if span operations fail
  • Appropriate debug logging throughout

The broad exception catching at lines 486 and 495 is intentional and correct for cleanup fallback logic.


499-747: LGTM: Comprehensive span attribute setting

The set_data_attributes function properly sets all relevant OpenTelemetry span attributes from the traced response data, including:

  • System, model, and response ID
  • Usage statistics (tokens, cache usage, reasoning tokens)
  • Reasoning attributes (summary and effort)
  • Tool/function definitions
  • Prompts and completions with proper role handling
  • Various content types (messages, computer calls, tool calls)

The logic correctly handles both dict-style and object-style usage data and processes content blocks appropriately.


751-795: LGTM: Sync wrapper with proper span lifecycle and error handling

The responses_get_or_create_wrapper correctly:

  • Uses the new @_with_responses_telemetry_wrapper decorator to accept telemetry metrics
  • Creates a CLIENT span and uses trace.use_span with end_on_exit=False for manual lifecycle management
  • Records duration and exception metrics on errors
  • Sets error type attribute and records exception on the span
  • Properly ends the span after recording error details

797-847: LGTM: Streaming response handling

The streaming path correctly:

  • Detects Stream responses
  • Retrieves existing data if response_id is provided
  • Constructs a comprehensive TracedData object with all request parameters and reasoning attributes
  • Returns a ResponseStream with all telemetry metrics and state needed for streaming span management

848-907: LGTM: Non-streaming response handling

The non-streaming path correctly:

  • Parses the response and merges with existing data
  • Constructs complete TracedData with output blocks, usage, and reasoning attributes
  • Only creates and closes a span if the response status is "completed"
  • Uses the original start_time from TracedData for accurate span timing

912-956: LGTM: Async wrapper with proper span lifecycle and error handling

The async_responses_get_or_create_wrapper mirrors the sync version correctly, with proper async/await usage and identical span lifecycle management and error handling.


958-1004: LGTM: Async streaming response handling

The async streaming path correctly handles both Stream and AsyncStream types and constructs the ResponseStream wrapper with all necessary telemetry components.


1005-1065: LGTM: Async non-streaming response handling

The async non-streaming path correctly mirrors the sync version with proper async operations and span management.


1068-1093: LGTM: Sync cancel wrapper

The responses_cancel_wrapper correctly:

  • Uses @_with_tracer_wrapper (appropriate since cancel doesn't need full telemetry metrics)
  • Pops the response from the global responses dict
  • Creates a span with the original start time and records a cancellation exception
  • Sets span attributes and ends the span

1098-1123: LGTM: Async cancel wrapper

The async_responses_cancel_wrapper correctly mirrors the sync version with proper async operations.

The issue was that Python's async iteration protocol checker looks for __aiter__
at the type level, and wrapt's ObjectProxy metaclass magic was interfering with
this check.

Fixed by:
- Creating ResponseStreamBase class with __aiter__ and __anext__ methods
- Using multiple inheritance: ResponseStream(ResponseStreamBase, ObjectProxy)
- This ensures the async iteration protocol methods are visible to Python's
  type system while still maintaining ObjectProxy functionality

This resolves the 'async for requires an object with __aiter__ method' error
when using ResponseStream with async iteration.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)

9-9: Remove unused inspect import.

Static analysis indicates the inspect module is imported but never used in this file.

```diff
-import inspect
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ecf8040 and 230b854.

📒 Files selected for processing (1)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (12 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (4)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (2)
  • metric_shared_attributes (364-377)
  • _get_openai_base_url (275-281)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (3)
  • _with_responses_telemetry_wrapper (116-143)
  • dont_throw (162-190)
  • should_send_prompts (207-210)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (3)
  • _ensure_cleanup (783-818)
  • _process_complete_response (737-780)
  • _shared_attributes (726-734)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-261)
🪛 Flake8 (7.3.0)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py

[error] 9-9: 'inspect' imported but unused

(F401)

🪛 Ruff (0.13.2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py

291-291: Do not catch blind exception: Exception

(BLE001)


307-307: Do not catch blind exception: Exception

(BLE001)


402-403: try-except-pass detected, consider logging the exception

(S110)


402-402: Do not catch blind exception: Exception

(BLE001)


409-409: Do not catch blind exception: Exception

(BLE001)


495-495: Do not catch blind exception: Exception

(BLE001)


504-504: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (9)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (9)

100-210: LGTM! Helper functions are well-documented.

The updated docstrings for process_input, parse_response, get_tools_from_kwargs, and process_content_block clearly explain their purposes and behavior.


213-276: LGTM! ResponseStream class is well-structured.

The class properly inherits from both ResponseStreamBase and ObjectProxy, initializes all necessary state including synchronization primitives, and correctly tracks error state with the _error_recorded flag.


277-313: LGTM! Context manager lifecycle is properly implemented.

Both sync and async context manager protocols are correctly implemented. The broad exception handling in cleanup (lines 291, 307) is appropriate here to prevent cleanup errors from masking the original exception.
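As a rough illustration of that lifecycle (a sketch, with _ensure_cleanup standing in for the real span-ending method):

class StreamLifecycleMixin:
    def _ensure_cleanup(self):
        pass  # placeholder for the real span-ending logic

    def __enter__(self):
        self.__wrapped__.__enter__()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        try:
            return self.__wrapped__.__exit__(exc_type, exc_val, exc_tb)
        finally:
            try:
                self._ensure_cleanup()
            except Exception:
                pass  # never let cleanup errors mask the in-flight exception

    async def __aenter__(self):
        await self.__wrapped__.__aenter__()
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        try:
            return await self.__wrapped__.__aexit__(exc_type, exc_val, exc_tb)
        finally:
            try:
                self._ensure_cleanup()
            except Exception:
                pass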


314-349: LGTM! Iteration error handling is correctly implemented.

The iteration methods now properly record exceptions and set ERROR status before calling cleanup (lines 324-328, 341-345), ensuring streaming errors are captured with the _error_recorded flag. This addresses the previous review feedback.
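The error path looks roughly like this (a sketch of the sync side; the method is shown standalone and the helper names are assumptions):

from opentelemetry.trace import Status, StatusCode


def __next__(self):
    try:
        chunk = self.__wrapped__.__next__()
    except StopIteration:
        self._process_complete_response()  # normal end of stream
        raise
    except Exception as e:
        if self._span and self._span.is_recording():
            self._span.record_exception(e)
            self._span.set_status(Status(StatusCode.ERROR, str(e)))
            self._error_recorded = True  # cleanup must not overwrite with OK
        self._ensure_cleanup()
        raise
    else:
        self._process_chunk(chunk)
        return chunk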


351-421: LGTM! Chunk processing logic is sound.

The _process_chunk method correctly accumulates streaming data, records metrics, and updates TracedData. The try-except-pass at lines 402-403 is acceptable here as output text extraction is best-effort and shouldn't break streaming.


423-506: LGTM! Cleanup logic correctly preserves error state.

Both _process_complete_response and _ensure_cleanup properly check _error_recorded before setting OK status (lines 465, 487), ensuring error details are preserved during cleanup. This correctly implements the previous review feedback.


758-916: LGTM! Wrapper correctly handles both streaming and non-streaming flows.

The responses_get_or_create_wrapper properly:

  • Uses the new @_with_responses_telemetry_wrapper decorator with six telemetry parameters
  • Creates CLIENT spans with proper lifecycle management
  • Records exceptions and metrics on errors (lines 790-804)
  • Returns ResponseStream for streaming responses with all necessary context
  • Handles non-streaming responses with span creation only when status is "completed" (lines 907-914)

919-1074: LGTM! Async wrapper mirrors sync implementation correctly.

The async_responses_get_or_create_wrapper correctly implements the async version with the same telemetry, error handling, and span lifecycle patterns as the sync wrapper. Line 967 properly checks for both Stream and AsyncStream types.


1077-1133: LGTM! Cancel wrappers are well-documented.

Both sync and async cancel wrappers now have clear docstrings explaining their purpose and behavior.

rtyer added 2 commits October 1, 2025 11:29
The output text was missing from spans because we were replacing it with each
chunk instead of accumulating it.

Fixed by:
- Changed _process_chunk to accumulate output_text with += instead of =
- Iterate through all content items in output blocks to capture all text
- Initialize output_text as None and only create empty string when needed
- Fixed indentation issues in set_data_attributes for output_blocks iteration
- Added safety check for output_blocks being None before iteration

This ensures the complete response text is captured in the span attributes
for streaming responses.
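In miniature, the fix is the difference between replacing and accumulating (attribute names here follow the commit text and are assumptions about the event shape):

def _accumulate_text(traced, block):
    for content in getattr(block, "content", []) or []:
        text = getattr(content, "text", None)
        if text:
            if traced.output_text is None:
                traced.output_text = ""  # created lazily, as the commit notes
            traced.output_text += text   # accumulate instead of replace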
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (2)

9-9: Remove unused import.

The inspect module is imported but never used in this file.

Apply this diff:

-import inspect

405-406: Log exceptions in try-except blocks for debugging.

This try-except-pass silently swallows exceptions when extracting text from output blocks. Consider adding a debug log statement to help troubleshoot issues during streaming.

Apply this diff:

                 except Exception:
-                    pass
+                    logger.debug("Error extracting text from output blocks: %s", traceback.format_exc())

Note: You'll need to add import traceback at the top of the file if not already present.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 230b854 and 935e51c.

📒 Files selected for processing (1)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (14 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (4)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (2)
  • metric_shared_attributes (364-377)
  • _get_openai_base_url (275-281)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (3)
  • _with_responses_telemetry_wrapper (116-143)
  • dont_throw (162-190)
  • should_send_prompts (207-210)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (3)
  • _ensure_cleanup (783-818)
  • _process_complete_response (737-780)
  • _shared_attributes (726-734)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-261)
🪛 Flake8 (7.3.0)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py

[error] 9-9: 'inspect' imported but unused

(F401)

🪛 Ruff (0.13.2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py

291-291: Do not catch blind exception: Exception

(BLE001)


307-307: Do not catch blind exception: Exception

(BLE001)


405-406: try-except-pass detected, consider logging the exception

(S110)


405-405: Do not catch blind exception: Exception

(BLE001)


412-412: Do not catch blind exception: Exception

(BLE001)


498-498: Do not catch blind exception: Exception

(BLE001)


507-507: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (9)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (9)

100-107: LGTM!

The helper functions have clear docstrings and handle input normalization correctly. The content block processing properly normalizes different content types to a standard format.

Also applies to: 158-165, 168-183, 186-210


224-276: LGTM!

The ResponseStream class is well-structured with proper initialization of all telemetry metrics and state tracking. The use of ObjectProxy from wrapt ensures transparent proxying of the underlying stream object, and initialization properly stores the TracedData in the global responses dict.


281-312: LGTM!

The context manager implementations correctly delegate to the wrapped stream's context manager methods and ensure cleanup runs on exit. The bare exception handlers in cleanup are appropriate here since they log errors without masking the original exception being propagated.


317-349: LGTM!

The iteration methods correctly handle both normal completion (StopIteration/StopAsyncIteration) and error cases. The error handling flow properly records exceptions and sets ERROR status before calling cleanup, ensuring error information is preserved. This addresses the concerns from previous reviews.


426-473: LGTM!

The _process_complete_response method correctly records all telemetry metrics (tokens, choices, duration, time-to-generate) and sets span attributes. It properly respects the _error_recorded flag to avoid overwriting error status with OK.


512-760: LGTM!

The set_data_attributes function comprehensively sets all relevant OpenTelemetry span attributes from the traced response data. It properly handles different content types, tool calls, reasoning blocks, and usage metrics. The logic for processing output_blocks correctly handles all supported block types (function_call, file_search_call, web_search_call, computer_call, reasoning).
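The overall shape, sketched with the repo's semconv package (the traced fields shown are simplified assumptions):

from opentelemetry.semconv_ai import SpanAttributes


def set_basic_attributes(span, traced):
    span.set_attribute(SpanAttributes.LLM_REQUEST_MODEL, traced.request_model)
    span.set_attribute(SpanAttributes.LLM_RESPONSE_MODEL, traced.response_model)
    if traced.usage:
        span.set_attribute(SpanAttributes.LLM_USAGE_PROMPT_TOKENS,
                           traced.usage.input_tokens)
        span.set_attribute(SpanAttributes.LLM_USAGE_COMPLETION_TOKENS,
                           traced.usage.output_tokens)
    if traced.output_text:
        span.set_attribute(f"{SpanAttributes.LLM_COMPLETIONS}.0.role", "assistant")
        span.set_attribute(f"{SpanAttributes.LLM_COMPLETIONS}.0.content",
                           traced.output_text)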


763-858: LGTM!

The responses_get_or_create_wrapper correctly handles both streaming and non-streaming responses:

  • Exception handling properly records metrics, sets error status, and ends the span before re-raising
  • Streaming responses are wrapped in ResponseStream with all telemetry metrics
  • Non-streaming responses are processed with proper data merging and span creation for completed responses

The use of trace.use_span(span, end_on_exit=False) ensures proper context propagation while maintaining manual span lifecycle control.
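For readers unfamiliar with that API, the idiom looks like this (call_api is a hypothetical stand-in for the wrapped OpenAI call):

from opentelemetry import trace


def call_api():
    return "ok"  # hypothetical stand-in for the wrapped OpenAI call


tracer = trace.get_tracer(__name__)
span = tracer.start_span("openai.response", kind=trace.SpanKind.CLIENT)
with trace.use_span(span, end_on_exit=False):
    response = call_api()  # runs with `span` as the current span
# span is still open here; the wrapper ends it once the response completes
span.end()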


922-1015: LGTM!

The async wrapper async_responses_get_or_create_wrapper correctly mirrors the sync wrapper's behavior with proper async/await handling. It properly checks for both Stream and AsyncStream types when detecting streaming responses at line 969.


1080-1134: LGTM!

The cancel wrappers correctly handle response cancellation by:

  • Popping the existing tracked data from the responses dict
  • Creating a span with the original start time
  • Recording a cancellation exception
  • Setting all relevant attributes before ending the span

The async variant properly mirrors the sync implementation.

Added extensive debug logging throughout the response streaming pipeline to
diagnose issues with missing input, output, and token usage data.

Debug logging added to:
- process_input: Log input type and value
- _process_chunk: Log chunk details, text accumulation, and usage capture
- _process_complete_response: Log final state before setting span attributes
- set_data_attributes: Log what's being set as prompts and completions
- Stream initialization: Log kwargs and processed input data

This will help identify where data is being lost in the streaming pipeline.
@rtyer rtyer marked this pull request as draft October 1, 2025 20:19
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (2)

878-937: Fix time unit mismatch in TracedData initialization.

The TracedData.start_time field stores nanoseconds (from time.time_ns() at lines 838, 1001), but the non-streaming path falls back to start_time in seconds when no existing data is found:

traced_data = TracedData(
    start_time=existing_data.get("start_time", start_time),  # Mixes nanoseconds with seconds!
    ...
)

If no existing entry is found, this falls back to start_time (line 806: time.time() in seconds) instead of nanoseconds, breaking the unit consistency.

Apply this diff:

         traced_data = TracedData(
-            start_time=existing_data.get("start_time", start_time),
+            start_time=existing_data.get("start_time", time.time_ns()),
             response_id=parsed_response.id,

1039-1099: Same time unit mismatch as sync wrapper.

This has the same issue as the sync wrapper at line 900. When no existing entry is found, start_time (in seconds, from line 969) is used instead of nanoseconds for TracedData.start_time.

Apply this diff:

         traced_data = TracedData(
-            start_time=existing_data.get("start_time", start_time),
+            start_time=existing_data.get("start_time", time.time_ns()),
             response_id=parsed_response.id,
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 935e51c and 274da69.

📒 Files selected for processing (1)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (13 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (4)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (2)
  • metric_shared_attributes (364-377)
  • _get_openai_base_url (275-281)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (4)
  • _with_responses_telemetry_wrapper (116-143)
  • _with_tracer_wrapper (146-153)
  • dont_throw (162-190)
  • should_send_prompts (207-210)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (3)
  • _ensure_cleanup (783-818)
  • _process_complete_response (737-780)
  • _shared_attributes (726-734)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-261)
🪛 Flake8 (7.3.0)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py

[error] 9-9: 'inspect' imported but unused

(F401)

🪛 Ruff (0.13.2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py

292-292: Do not catch blind exception: Exception

(BLE001)


308-308: Do not catch blind exception: Exception

(BLE001)


410-410: Do not catch blind exception: Exception

(BLE001)


417-417: Do not catch blind exception: Exception

(BLE001)


504-504: Do not catch blind exception: Exception

(BLE001)


513-513: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (11)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (11)

100-108: LGTM!

The helper functions are well-documented with clear docstrings and appropriate logging for debugging. The logic is clean and straightforward.

Also applies to: 159-166, 169-184, 187-211


225-280: LGTM!

The ResponseStream initialization properly sets up all telemetry objects, threading primitives, and integrates with the global responses dictionary. The use of ObjectProxy for wrapping is appropriate.


282-313: LGTM!

Context manager implementations properly handle cleanup for both sync and async flows. The exception handling in cleanup (lines 292, 308) uses broad catches intentionally to prevent cleanup failures from masking the original exception, which is acceptable practice.


318-350: Verify thread-safety of _error_recorded flag.

The error handling correctly records exceptions and sets status before cleanup (addressing the previous review). However, there's still a potential race condition:

  1. Error handler sets _error_recorded = True at lines 328/345 (outside lock)
  2. Concurrently, _ensure_cleanup() checks if not self._error_recorded: at lines 496/509 (inside lock)
  3. If cleanup reads the flag before the error handler writes it, the span status could be set to OK instead of ERROR

This race window is small in practice since cleanup is called sequentially after setting the flag. However, for complete correctness, consider:

Solution: Acquire lock when setting the flag

 if self._span and self._span.is_recording():
     self._span.record_exception(e)
     self._span.set_status(Status(StatusCode.ERROR, str(e)))
-    self._error_recorded = True
+    with self._cleanup_lock:
+        self._error_recorded = True
 self._ensure_cleanup()

Apply similarly at line 345 for async.

Based on learnings (past review comment flagged this race condition at lines 324-328, 341-345, 468-469, 490-491, 503-504)


352-419: LGTM!

The _process_chunk method properly:

  • Records streaming events on the span
  • Tracks time to first token
  • Accumulates output data from multiple chunk formats
  • Updates the global responses dictionary
  • Handles errors gracefully with logging

The broad exception catches at lines 410 and 417 are acceptable here since they're logging errors during non-critical data extraction without disrupting the stream.


431-479: LGTM!

The _process_complete_response method correctly:

  • Sets span attributes from accumulated data
  • Records all telemetry metrics (tokens, choices, duration, time to generate)
  • Checks _error_recorded before setting OK status (line 474)
  • Ends the span and marks cleanup complete

The metrics recording logic is well-structured and handles optional counters/histograms appropriately.


481-515: LGTM!

The _ensure_cleanup method is thread-safe with proper locking and idempotency checks. It correctly:

  • Guards against duplicate cleanup with _cleanup_completed flag
  • Checks _error_recorded before setting OK status (lines 496, 509)
  • Handles cleanup failures gracefully

The broad exception catches at lines 504 and 513 are appropriate for cleanup paths to ensure the span is always closed even if intermediate operations fail.


518-775: LGTM!

The set_data_attributes function comprehensively handles:

  • Model and usage attributes with reasoning token support
  • Request reasoning attributes (summary, effort)
  • Tool/function definitions from traced data
  • Multiple prompt formats (string, list of messages, computer calls)
  • Various completion block types (messages, function calls, file/web search, computer calls, reasoning)

The logging statements will be helpful for debugging. The logic correctly normalizes different content formats and handles optional fields safely.


778-877: LGTM!

The synchronous wrapper properly:

  • Creates a CLIENT span for the API call
  • Uses trace.use_span with end_on_exit=False for manual span management
  • Records exception metrics (duration, counter) and span attributes on errors
  • Returns a ResponseStream for streaming responses with all telemetry parameters
  • Handles the streaming case correctly by creating TracedData with proper initialization

The exception handling is comprehensive and ensures spans are properly closed on failures.


941-1038: LGTM!

The async wrapper correctly mirrors the sync implementation with proper async/await handling. It properly:

  • Handles both Stream and AsyncStream types (line 988)
  • Creates spans and records telemetry consistently with the sync version
  • Returns ResponseStream for streaming responses (works for both sync and async iteration)

The use of the same ResponseStream class for both sync and async is correct since it inherits from ResponseStreamBase with both __next__ and __anext__ methods.


1103-1157: LGTM!

Both cancel wrappers (sync and async) properly:

  • Pop the existing data from the global responses dict
  • Create a span with the original start time if data exists
  • Record a cancellation exception on the span
  • Set all traced attributes before ending the span

The logic correctly handles cleanup for cancelled responses.

rtyer added 5 commits October 1, 2025 14:43
The issue causing 488706-hour durations was a time scale mismatch:
- Wrappers capture start_time using time.time() (seconds)
- TracedData now stores it as nanoseconds (int(start_time * 1e9))
- Spans receive the nanosecond value directly (no double conversion)

This aligns with OpenTelemetry's expectation of nanoseconds since Unix epoch
for span start times, while keeping the same pattern as existing code.
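The fix in a nutshell:

import time

start_time = time.time()               # seconds, captured by the wrapper
start_time_ns = int(start_time * 1e9)  # nanoseconds, stored on TracedData
# later: tracer.start_span(..., start_time=start_time_ns)  # no double conversion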
Updated chunk processing to handle OpenAI streaming events directly
instead of attempting to parse them as Response objects. The streaming
API returns event objects (ResponseCreatedEvent, ResponseTextDeltaEvent,
ResponseInProgressEvent, ResponseCompletedEvent, etc.) rather than
Response objects.

Changes:
- Handle ResponseCreatedEvent to extract response_id
- Handle ResponseTextDeltaEvent to accumulate output text
- Handle ResponseOutputItemAddedEvent to track output items
- Handle ResponseInProgressEvent and ResponseCompletedEvent for usage data
- Changed log level from debug to info for easier debugging

This fixes the missing input, output, and token data in spans for
streaming responses (issue traceloop#3395).
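A sketch of that dispatch; matching on the event type string is an assumption standing in for isinstance checks against the openai event classes named above:

def _process_chunk(self, chunk):
    kind = getattr(chunk, "type", "")
    if kind == "response.created":
        self._traced.response_id = chunk.response.id
    elif kind == "response.output_text.delta":
        self._deltas.append(chunk.delta)  # accumulate text deltas
    elif kind == "response.output_item.added":
        self._traced.output_blocks[chunk.item.id] = chunk.item
    elif kind in ("response.in_progress", "response.completed"):
        usage = getattr(chunk.response, "usage", None)
        if usage is not None:
            self._traced.usage = usage  # token counts, when present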
Fixed ResponseTextDeltaEvent to use 'delta' attribute instead of 'text_delta'.
Added handlers for ResponseTextDoneEvent, ResponseContentPartDoneEvent, and
ResponseOutputItemDoneEvent to properly extract the complete output text from
streaming responses.

This fixes the issue where output_text was None in completed streaming spans.
Changed logging level from debug to info for set_data_attributes function
to make it easier to diagnose input/output attribute setting issues.
Added info-level logs to show input_data when initializing streaming
responses (both sync and async) to help diagnose input attribute issues.
rtyer added 4 commits October 1, 2025 16:26
Added handling for function_call_output type inputs in set_data_attributes
to properly capture tool/function outputs in span attributes. These are
set with role 'tool' and include both call_id and output content.
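Roughly, per tool output (attribute suffixes beyond the semconv prefix are assumptions):

from opentelemetry.semconv_ai import SpanAttributes


def set_tool_output_attributes(span, index, item):
    prefix = f"{SpanAttributes.LLM_PROMPTS}.{index}"
    span.set_attribute(f"{prefix}.role", "tool")
    span.set_attribute(f"{prefix}.content", item["output"])
    # call_id ties the output back to the originating function_call
    span.set_attribute(f"{prefix}.tool_call_id", item["call_id"])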
Updated ResponseOutputItemDoneEvent handler to store the complete item
in output_blocks, not just extract text. This ensures function_call,
file_search_call, web_search_call, computer_call, reasoning, and other
output item types are properly captured for span attributes.

This makes streaming behavior consistent with non-streaming, where
output_blocks contains all output items which are then processed into
tool_calls attributes by set_data_attributes.
Removed all info-level logging statements that were added for debugging
during development. The implementation is now working correctly and the
verbose logging is no longer needed.
- Fix memory leak by deleting responses dict entries after span completion
- Optimize string concatenation using list append + join pattern (see the sketch after this list)
- Fix type annotation to use empty string instead of None for response_model
- Remove try/except/pass blocks in favor of defensive checks
- Simplify nested conditionals and use ternary operators where appropriate
- Use getattr for simple single-attribute access patterns
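The concatenation change in miniature:

deltas = []
for delta in ("Hel", "lo, ", "world"):  # stand-ins for streamed text deltas
    deltas.append(delta)
output_text = "".join(deltas)           # one O(n) join instead of O(n²) +=
assert output_text == "Hello, world"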
@rtyer rtyer marked this pull request as ready for review October 2, 2025 16:41
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed everything up to af3a2a4 in 2 minutes and 24 seconds.
  • Reviewed 1376 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 draft comments. View those below.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:155
  • Draft comment:
    Consider using a weak-reference dictionary (e.g. WeakValueDictionary) for the global 'responses' container to mitigate potential memory leaks if cleanup fails.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:367
  • Draft comment:
    Consider adding a final 'else' or logging for unexpected event types in _process_chunk to aid debugging of unhandled cases.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%. While the suggestion could help with debugging, the code already has error logging in the try-except block. The function is focused on telemetry/tracing, so missing an event type isn't necessarily an error - it may just be an event type we don't need to track. Adding logging for every unhandled event type could create noise in the logs without providing much value. The comment has a point - without logging unhandled event types, we might miss important events that should be handled. This could lead to missing telemetry data. However, the existing error logging will catch any actual errors, and the code is focused on collecting specific telemetry data. Not every event needs to be handled or logged. The comment should be deleted as it suggests adding potentially noisy logging that isn't necessary for the core functionality of collecting targeted telemetry data.
3. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:928
  • Draft comment:
    Using the dictionary merge operator ('|') requires Python 3.9+; ensure this version requirement is documented in your project specifications.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:852
  • Draft comment:
    The sync and async wrappers (responses_get_or_create_wrapper and async_responses_get_or_create_wrapper) contain similar logic; consider abstracting common portions to reduce duplication and improve maintainability.
  • Reason this comment was not posted:
    Comment was on unchanged code.
5. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_streaming.py:190
  • Draft comment:
    The test cases comprehensively cover streaming scenarios (sync, async, context manager, and nested tracer contexts). Consider also adding tests covering cancellation paths if not already covered.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%. While testing cancellation paths could be valuable, this is more of a "nice to have" suggestion rather than pointing out a clear issue. The comment is speculative ("if not already covered") and doesn't identify a specific problem. The existing tests already cover the core functionality thoroughly. The suggestion about cancellation paths could be valid from a completeness perspective. Cancellation handling is an important edge case in streaming scenarios. However, per the rules, we should not make speculative suggestions or comments that don't point to clear issues. The comment doesn't identify any actual problems with the current code. The comment should be deleted as it is a speculative suggestion rather than pointing out a clear issue requiring changes.

Workflow ID: wflow_z7pUD5d9ZapvYuc7

rtyer added 2 commits October 2, 2025 10:55
Fixed span leak where non-streaming completed responses were starting
a new span instead of reusing the existing one, leaving the original
span unclosed. Also added docstring to _with_responses_telemetry_wrapper.
2 participants