chore: add e2e test for reasoning_effort for gpt-oss model #3421

zhongdaor-nv · 2025-10-04T00:01:55Z

Overview:

Add e2e tests for gpt oss reasoning effort

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx

Summary by CodeRabbit

Tests
- Added end-to-end coverage for reasoning effort settings, verifying higher effort produces more detailed reasoning.
- Strengthened frontend integration tests with strict response validation, health checks, and robust error handling.
- Expanded the model test matrix to include an additional GPT-OSS option for broader coverage.
- Enhanced test harness to orchestrate worker and frontend processes, improving reliability of startup/shutdown flows and real-world behavior validation.

Signed-off-by: zhongdaor <[email protected]>

coderabbitai · 2025-10-04T00:05:55Z

Walkthrough

Adds an end-to-end reasoning_effort frontend test. Introduces GPTOSSWorkerProcess to run a GPT-OSS worker with health checks. Starts DynamoFrontendProcess and the worker, issues two chat completions with low/high reasoning_effort, compares reasoning metrics, and enhances test constants by adding the GPT_OSS model to TEST_MODELS.

Changes

Cohort / File(s)	Summary
Reasoning effort E2E test `tests/frontend/reasoning_effort/test_reasoning_effort.py`	New test module adding GPTOSSWorkerProcess (ManagedProcess) to launch a GPT-OSS worker with env and health checks; end-to-end test sends two chat completions (low/high reasoning_effort), extracts reasoning metrics, validates structure, and asserts higher effort yields >= reasoning tokens/length; includes error handling helpers.
Test constants update `tests/utils/constants.py`	Adds public constant `GPT_OSS = "openai/gpt-oss-20b"` and includes it in `TEST_MODELS` alongside existing entries.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Test as PyTest
  participant Frontend as DynamoFrontendProcess
  participant Worker as GPTOSSWorkerProcess (GPT-OSS)
  participant API as HTTP Endpoint

  Test->>Frontend: start()
  Test->>Worker: start() + health checks
  Note over Worker,Frontend: Both processes running

  Test->>API: POST /chat/completions (reasoning_effort="low")
  API->>Worker: Forward request
  Worker-->>API: Response (low-effort reasoning)
  API-->>Test: JSON response

  Test->>API: POST /chat/completions (reasoning_effort="high")
  API->>Worker: Forward request
  Worker-->>API: Response (high-effort reasoning)
  API-->>Test: JSON response

  Test->>Test: Extract metrics and compare<br/>(high >= low)
  Test->>Frontend: stop()
  Test->>Worker: stop()

  rect rgba(230,245,255,0.6)
  Note right of Test: Validates non-200 errors, missing fields,<br/>and invalid JSON in health checks.
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A rabbit taps the test bench, quick and bright,
Spinning up workers in the moonlit night.
“Low” to “High,” I nibble token trails,
Count the thoughts where reasoning prevails.
With GPT-OSS now in the warren’s lore,
I thump—asserts green—then hop for more! 🐇✨

Pre-merge checks

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	While the description includes the required template headings, the Details and Where should the reviewer start sections remain empty placeholders and the Related Issues reference uses a placeholder issue number, leaving the content incomplete and uninformative for reviewers.	Please populate the Details section with a summary of the actual changes, specify which files or sections reviewers should focus on under Where should the reviewer start, and replace the placeholder issue reference with the real issue number.
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title succinctly describes the main change by indicating the addition of an end-to-end test for the reasoning_effort feature specific to the gpt-oss model, is concise, and avoids unnecessary detail or noise.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c48f49a and bccf2bb.

📒 Files selected for processing (2)

tests/frontend/reasoning_effort/test_reasoning_effort.py (1 hunks)
tests/utils/constants.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

tests/frontend/reasoning_effort/test_reasoning_effort.py (3)

tests/utils/managed_process.py (1)

ManagedProcess (71-568)

tests/utils/payloads.py (1)

check_models_api (191-202)

tests/conftest.py (2)

runtime_services (218-221)

predownload_models (109-121)

🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3421/merge) by zhongdaor-nv.

tests/frontend/reasoning_effort/test_reasoning_effort.py

[error] 1-1: isort formatting changed imports in test_reasoning_effort.py.

[error] 1-1: black formatting changed/reformatted test_reasoning_effort.py.

🪛 Ruff (0.13.3)

tests/frontend/reasoning_effort/test_reasoning_effort.py

129-129: Avoid specifying long messages outside the exception class

(TRY003)

139-139: Avoid specifying long messages outside the exception class

(TRY003)

151-151: Avoid specifying long messages outside the exception class

(TRY003)

160-160: Unused function argument: runtime_services

(ARG001)

160-160: Unused function argument: predownload_models

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: trtllm (arm64)
GitHub Check: trtllm (amd64)
GitHub Check: sglang
GitHub Check: vllm (arm64)
GitHub Check: vllm (amd64)
GitHub Check: Build and Test - dynamo

tests/utils/constants.py

Signed-off-by: zhongdaor <[email protected]>

tests/frontend/reasoning_effort/test_reasoning_effort.py

Signed-off-by: zhongdaor <[email protected]>

Signed-off-by: zhongdaor-nv <[email protected]>

Signed-off-by: zhongdaor <[email protected]> Signed-off-by: zhongdaor-nv <[email protected]> Signed-off-by: Piotr Tarasiewicz <[email protected]>

chore: add e2e test for reasoning_effort for gpt-oss model

bccf2bb

Signed-off-by: zhongdaor <[email protected]>

pull-request-size bot added the size/L label Oct 4, 2025

zhongdaor-nv marked this pull request as ready for review October 4, 2025 00:02

zhongdaor-nv requested review from a team as code owners October 4, 2025 00:02

github-actions bot added the chore label Oct 4, 2025

coderabbitai bot reviewed Oct 4, 2025

View reviewed changes

tests/utils/constants.py Show resolved Hide resolved

black

a0eaef1

Signed-off-by: zhongdaor <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB October 4, 2025 00:13 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 4, 2025 00:14 Inactive

kthui approved these changes Oct 4, 2025

View reviewed changes

tests/frontend/reasoning_effort/test_reasoning_effort.py Outdated Show resolved Hide resolved

copy DynamoFrontendProcess rather than import it

fb03309

Signed-off-by: zhongdaor <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB October 6, 2025 19:27 Inactive

Merge branch 'main' into zhongdaor/test-gpt-oss-reasoning-effort

db92873

copy-pr-bot bot temporarily deployed to GITLAB October 6, 2025 19:27 Inactive

black

56115ff

Signed-off-by: zhongdaor <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB October 6, 2025 19:31 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 6, 2025 19:49 Inactive

GuanLuo approved these changes Oct 6, 2025

View reviewed changes

Merge branch 'main' into zhongdaor/test-gpt-oss-reasoning-effort

d2628e3

Signed-off-by: zhongdaor-nv <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB October 7, 2025 16:57 Inactive

Merge branch 'main' into zhongdaor/test-gpt-oss-reasoning-effort

fb4c278

copy-pr-bot bot temporarily deployed to GITLAB October 7, 2025 17:03 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 7, 2025 17:16 Inactive

zhongdaor-nv merged commit 89e7dab into main Oct 7, 2025
20 of 21 checks passed

zhongdaor-nv deleted the zhongdaor/test-gpt-oss-reasoning-effort branch October 7, 2025 18:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

chore: add e2e test for reasoning_effort for gpt-oss model #3421

chore: add e2e test for reasoning_effort for gpt-oss model #3421

Uh oh!

zhongdaor-nv commented Oct 4, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Oct 4, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

chore: add e2e test for reasoning_effort for gpt-oss model #3421

chore: add e2e test for reasoning_effort for gpt-oss model #3421

Uh oh!

Conversation

zhongdaor-nv commented Oct 4, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Oct 4, 2025

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Pre-merge checks

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

zhongdaor-nv commented Oct 4, 2025 •

edited by coderabbitai bot

Loading