Skip to content

Conversation

zhongdaor-nv
Copy link
Contributor

@zhongdaor-nv zhongdaor-nv commented Oct 4, 2025

Overview:

Add e2e tests for gpt oss reasoning effort

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • Tests
    • Added end-to-end coverage for reasoning effort settings, verifying higher effort produces more detailed reasoning.
    • Strengthened frontend integration tests with strict response validation, health checks, and robust error handling.
    • Expanded the model test matrix to include an additional GPT-OSS option for broader coverage.
    • Enhanced test harness to orchestrate worker and frontend processes, improving reliability of startup/shutdown flows and real-world behavior validation.

@zhongdaor-nv zhongdaor-nv marked this pull request as ready for review October 4, 2025 00:02
@zhongdaor-nv zhongdaor-nv requested review from a team as code owners October 4, 2025 00:02
@github-actions github-actions bot added the chore label Oct 4, 2025
Copy link
Contributor

coderabbitai bot commented Oct 4, 2025

Walkthrough

Adds an end-to-end reasoning_effort frontend test. Introduces GPTOSSWorkerProcess to run a GPT-OSS worker with health checks. Starts DynamoFrontendProcess and the worker, issues two chat completions with low/high reasoning_effort, compares reasoning metrics, and enhances test constants by adding the GPT_OSS model to TEST_MODELS.

Changes

Cohort / File(s) Summary
Reasoning effort E2E test
tests/frontend/reasoning_effort/test_reasoning_effort.py
New test module adding GPTOSSWorkerProcess (ManagedProcess) to launch a GPT-OSS worker with env and health checks; end-to-end test sends two chat completions (low/high reasoning_effort), extracts reasoning metrics, validates structure, and asserts higher effort yields >= reasoning tokens/length; includes error handling helpers.
Test constants update
tests/utils/constants.py
Adds public constant GPT_OSS = "openai/gpt-oss-20b" and includes it in TEST_MODELS alongside existing entries.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Test as PyTest
  participant Frontend as DynamoFrontendProcess
  participant Worker as GPTOSSWorkerProcess (GPT-OSS)
  participant API as HTTP Endpoint

  Test->>Frontend: start()
  Test->>Worker: start() + health checks
  Note over Worker,Frontend: Both processes running

  Test->>API: POST /chat/completions (reasoning_effort="low")
  API->>Worker: Forward request
  Worker-->>API: Response (low-effort reasoning)
  API-->>Test: JSON response

  Test->>API: POST /chat/completions (reasoning_effort="high")
  API->>Worker: Forward request
  Worker-->>API: Response (high-effort reasoning)
  API-->>Test: JSON response

  Test->>Test: Extract metrics and compare<br/>(high >= low)
  Test->>Frontend: stop()
  Test->>Worker: stop()

  rect rgba(230,245,255,0.6)
  Note right of Test: Validates non-200 errors, missing fields,<br/>and invalid JSON in health checks.
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A rabbit taps the test bench, quick and bright,
Spinning up workers in the moonlit night.
“Low” to “High,” I nibble token trails,
Count the thoughts where reasoning prevails.
With GPT-OSS now in the warren’s lore,
I thump—asserts green—then hop for more! 🐇✨

Pre-merge checks

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning While the description includes the required template headings, the Details and Where should the reviewer start sections remain empty placeholders and the Related Issues reference uses a placeholder issue number, leaving the content incomplete and uninformative for reviewers. Please populate the Details section with a summary of the actual changes, specify which files or sections reviewers should focus on under Where should the reviewer start, and replace the placeholder issue reference with the real issue number.
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The title succinctly describes the main change by indicating the addition of an end-to-end test for the reasoning_effort feature specific to the gpt-oss model, is concise, and avoids unnecessary detail or noise.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c48f49a and bccf2bb.

📒 Files selected for processing (2)
  • tests/frontend/reasoning_effort/test_reasoning_effort.py (1 hunks)
  • tests/utils/constants.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/frontend/reasoning_effort/test_reasoning_effort.py (3)
tests/utils/managed_process.py (1)
  • ManagedProcess (71-568)
tests/utils/payloads.py (1)
  • check_models_api (191-202)
tests/conftest.py (2)
  • runtime_services (218-221)
  • predownload_models (109-121)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3421/merge) by zhongdaor-nv.
tests/frontend/reasoning_effort/test_reasoning_effort.py

[error] 1-1: isort formatting changed imports in test_reasoning_effort.py.


[error] 1-1: black formatting changed/reformatted test_reasoning_effort.py.

🪛 Ruff (0.13.3)
tests/frontend/reasoning_effort/test_reasoning_effort.py

129-129: Avoid specifying long messages outside the exception class

(TRY003)


139-139: Avoid specifying long messages outside the exception class

(TRY003)


151-151: Avoid specifying long messages outside the exception class

(TRY003)


160-160: Unused function argument: runtime_services

(ARG001)


160-160: Unused function argument: predownload_models

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: sglang
  • GitHub Check: vllm (arm64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: Build and Test - dynamo

Signed-off-by: zhongdaor <[email protected]>
Signed-off-by: zhongdaor <[email protected]>
@zhongdaor-nv zhongdaor-nv merged commit 89e7dab into main Oct 7, 2025
20 of 21 checks passed
@zhongdaor-nv zhongdaor-nv deleted the zhongdaor/test-gpt-oss-reasoning-effort branch October 7, 2025 18:36
ptarasiewiczNV pushed a commit that referenced this pull request Oct 8, 2025
Signed-off-by: zhongdaor <[email protected]>
Signed-off-by: zhongdaor-nv <[email protected]>
Signed-off-by: Piotr Tarasiewicz <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants