Skip to content

test: make xdist the default Makefile test path#1970

Merged
Pouyanpi merged 14 commits into
NVIDIA-NeMo:developfrom
Pouyanpi:pouyanpi/test/parallel-unit-test-target
Jun 5, 2026
Merged

test: make xdist the default Makefile test path#1970
Pouyanpi merged 14 commits into
NVIDIA-NeMo:developfrom
Pouyanpi:pouyanpi/test/parallel-unit-test-target

Conversation

@Pouyanpi

@Pouyanpi Pouyanpi commented Jun 2, 2026

Copy link
Copy Markdown
Collaborator

Summary

Make the default Makefile test path run the suite in parallel with pytest-xdist, while keeping an explicit serial pytest escape hatch.

What Changed

  • make test now runs non-live pytest with xdist.
  • make test-parallel remains available as an alias for make test.
  • make test-serial runs plain pytest without xdist or live-env filtering.
  • Live provider keys and live-mode flags are cleared for the non-live parallel path.
  • Default FastEmbed usage in tests is replaced with deterministic embeddings unless a test explicitly opts into real embeddings.
  • Shared test state is isolated so the suite behaves correctly under xdist.
  • Real language-detection checks are gated so they do not run in normal unit-test mode.

Why

The recommended local and agent test path should be fast, deterministic, and safe to run without live provider credentials. This makes make test the project-level test command, while preserving make test-serial for debugging raw pytest behavior.

Impact

Local full-suite timing:

  • Serial pytest path: 138.24s
  • Parallel Makefile path: 29s

This makes the recommended local and agent test command about 4.8x faster on the measured machine.

The same change also reduces CI wall time by switching the reusable workflow from raw poetry run pytest to Makefile targets:

  • non-coverage jobs run make test
  • the coverage job runs make test-coverage

The current CI matrix runs PR tests across 4 Ubuntu/Python jobs, full tests across 8 Windows/macOS Python jobs, and latest-deps tests across 12 OS/Python jobs.

CI timing comparison:

Job Baseline job Current job Baseline test step Current test step
Ubuntu / Python 3.10 non-coverage 4m55s 2m18s 4m01s 1m49s
Ubuntu / Python 3.11 coverage 7m43s 3m20s 6m55s 2m44s
Ubuntu / Python 3.12 non-coverage 3m33s 1m57s 2m39s 1m31s
Ubuntu / Python 3.13 non-coverage 4m09s 2m01s 3m09s 1m31s

Note: the measured coverage run completed pytest successfully and failed later during Codecov upload.

Validation

make help
env LIVE_TEST=1 OPENAI_API_KEY=sk-test make test TEST=tests/test_embeddings_openai.py WORKERS=1 ARGS='-q --tb=short'
make test TEST=tests/test_utils.py WORKERS=1 ARGS='-q'

Summary by CodeRabbit

Release Notes

  • Tests

    • Added support for parallel test execution with configurable workers
    • Introduced pytest markers (serial, slow, live, real_embeddings) for organizing and filtering tests
    • Enhanced test determinism with deterministic embedding provider
    • Improved test isolation with automated state cleanup between tests
  • Chores

    • Refactored test targets and GitHub Actions workflows for consistency

@Pouyanpi Pouyanpi self-assigned this Jun 2, 2026
@Pouyanpi Pouyanpi added this to the v0.23.0 milestone Jun 2, 2026
@Pouyanpi Pouyanpi requested a review from tgasser-nv June 2, 2026 10:37
@codecov

codecov Bot commented Jun 2, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@Pouyanpi Pouyanpi marked this pull request as ready for review June 2, 2026 10:41
@greptile-apps

greptile-apps Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR makes make test the canonical local and CI test command by switching it to parallel execution via pytest-xdist, while keeping make test-serial as a debugging escape hatch. Live provider credentials are cleared via env -u in UNIT_TEST_ENV, and a new DeterministicEmbeddingSearchProvider replaces FastEmbed in unit tests for reproducibility and speed.

  • Makefile introduces test, test-parallel, test-serial, test-coverage, and test-benchmark targets with configurable WORKERS, DIST, and ARGS variables; make test reports a 4.8× speedup over the serial path.
  • tests/conftest.py adds autouse fixtures that reset context variables between tests and intercept LLMRails._get_embeddings_search_provider_instance via monkeypatch so deterministic hash-embeddings are used by default, with an opt-out via @pytest.mark.real_embeddings.
  • CI workflow updates replace bare poetry run pytest -v calls with make test / make test-coverage, and lint.yml is aligned to the renamed pre-commit target.

Confidence Score: 5/5

Safe to merge. All test infrastructure changes are additive and isolated to the test harness; no production code paths are modified.

The changes are well-scoped to the test toolchain: Makefile targets, conftest fixtures, and CI YAML. The deterministic embedding provider is gated behind an autouse monkeypatch that is automatically undone per-test, and live-credential clearing is handled by env -u in UNIT_TEST_ENV. No new behavioral regressions were identified in the changed files.

No files require special attention. tests/v2_x/test_state_serialization.py still carries a skipped performance test, but that decision is tracked and acceptable given the discussion in the PR thread.

Important Files Changed

Filename Overview
Makefile Restructured test targets: test now runs xdist in parallel with API keys stripped; adds test-serial, test-benchmark, and other named targets. UNIT_TEST_ENV uses env -u (requires POSIX shell, acknowledged in comments). TEST= defaulting to empty correctly delegates to pytest.ini testpaths.
tests/conftest.py Adds three autouse fixtures to reset context variables between tests, plus use_deterministic_embeddings_for_default_fastembed that patches LLMRails._get_embeddings_search_provider_instance via monkeypatch. Correctly opts out for real_embeddings-marked tests and live env vars. No yield needed since monkeypatch finalizer handles cleanup.
tests/testing/embeddings.py New DeterministicEmbeddingSearchProvider using hash-based 64-dim embeddings for reproducible test results. Class docstring explicitly documents the float('inf') default threshold behavior. strict=True on zip is compatible with Python >= 3.10 as required by the project.
pytest.ini Adds four pytest markers. serial is explicitly labeled documentation-only (not enforced by xdist). real_embeddings gates tests that opt out of the deterministic provider. Testpaths unchanged from baseline.
.github/workflows/_test.yml Replaces bare poetry run pytest invocations with make test / make test-coverage. Shell is bash globally so env -u in the Makefile works on all runner OSes.
tests/v2_x/test_state_serialization.py test_serialization_performance remains skipped with the original tight avg_time < 0.2 budget. The module-level config object is not mutated by the new fixture approach, which is an improvement.
pyproject.toml Adds pytest-xdist = "^3.8.0" to dev dependencies. Existing pytest-cov >= 4.1.0 is sufficient for xdist-aware coverage collection.

Reviews (6): Last reviewed commit: "ci: preserve pytest testpaths in make te..." | Re-trigger Greptile

Comment thread tests/conftest.py Outdated
Comment thread tests/testing/embeddings.py
Comment thread tests/v2_x/test_state_serialization.py
@coderabbitai

coderabbitai Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

This PR modernizes the test infrastructure by introducing deterministic embeddings for reproducible tests, refactoring CI/CD to use make targets, improving test isolation through fixture cleanup, and updating tests to be resilient to pre-existing LLM activity. The changes span build configuration, test utilities, fixtures, and individual test updates across the test suite.

Changes

Test Infrastructure Modernization

Layer / File(s) Summary
Build system and CI/CD targets
Makefile, .github/workflows/_test.yml, .github/workflows/lint.yml, pyproject.toml, pytest.ini
Makefile refactored with new test targets (test, test-coverage, test-watch, etc.), configurable variables for parallelism and coverage, and a warm-fastembed-cache target. GitHub Actions workflows updated to invoke make targets instead of direct poetry commands. pyproject.toml adds pytest-xdist dependency for parallel execution. pytest.ini registers custom markers (serial, slow, live, real_embeddings).
Deterministic test embeddings provider
tests/testing/embeddings.py
New module implements DeterministicEmbeddingSearchProvider with token-hash-based 64-dimensional embeddings, in-memory item storage, and similarity-based search for reproducible embedding behavior in tests.
Conftest fixtures and context cleanup
tests/conftest.py
Imports deterministic embedding utilities and adds three autouse fixtures: reset_tool_calls_var() and reset_explain_info_var() reset context variables per test, and use_deterministic_embeddings_for_default_fastembed() conditionally monkeypatches LLMRails to replace default FastEmbed providers with the deterministic test provider unless tests are marked @pytest.mark.real_embeddings or real embedding environment variables are set.
IORails and request-id fixtures
tests/guardrails/test_iorails.py, tests/guardrails/test_request_id.py
IORails fixture split into synchronous iorails_sync and asynchronous iorails variants with proper teardown. New autouse fixture clean_request_id_context() resets request ID ContextVar to "no-req-id" per test.
Test updates for pre-existing LLM call resilience
tests/test_bug_rail_flows_in_prompt.py, tests/test_embeddings_only_user_messages.py, tests/test_general_instructions.py, tests/test_retrieve_relevant_chunks.py
Tests refactored to capture baseline llm_calls count before execution, then slice llm_calls to only inspect newly added calls, making tests resilient to fixture-level LLM activity.
Content safety tests separation
tests/test_content_safety_actions.py
Refactored to separate unit tests (validating _detect_language via sys.modules patching) from live tests. New requires_live_test marker gates live test classes; only runs when LIVE_TEST environment variable is set.
Test marker registration and config updates
tests/test_actions_llm_embedding_lazy_init.py, tests/test_configs/simple_server/config_1/config.py, tests/test_configs/simple_server_2_x/config_2/config.py, tests/test_configs/with_custom_embedding_search_provider/config.py, tests/v2_x/test_llm_embedding_lazy_init.py
Test classes marked with @pytest.mark.real_embeddings to categorize embedding-related tests. New test config modules added for custom LLM providers. Existing embedding search provider config updated to accept optional threshold parameter.
State serialization test refactoring
tests/v2_x/test_state_serialization.py
Initial event processing extracted into _process_initial_events() helper for test reuse. Performance test marked with @pytest.mark.skip due to flakiness.
TestChat constructor isolation test
tests/test_testing_chat_harness.py
New test test_testchat_constructor_does_not_replace_active_explain_context() verifies that constructing multiple TestChat instances from the same config does not replace explain context for already-active instances.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

enhancement

Suggested reviewers

  • tgasser-nv
🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'test: make xdist the default Makefile test path' directly and clearly describes the main change: making xdist (parallel testing) the default in the Makefile's test target.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Test Results For Major Changes ✅ Passed Major test infrastructure changes are well-documented with performance results (4.8x speedup: 138.24s→29s) and CI validation details in PR description.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
tests/test_content_safety_actions.py (1)

322-325: ⚡ Quick win

Replace setattr with direct attribute assignment.

Ruff flags this as B010 (setattr with a constant attribute name), which may fail the lint/pre-commit job in CI.

♻️ Proposed fix
 def _fast_langdetect_module(detect):
     module = ModuleType("fast_langdetect")
-    setattr(module, "detect", detect)
+    module.detect = detect
     return module
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_content_safety_actions.py` around lines 322 - 325, In
_fast_langdetect_module, avoid using setattr with a constant name to satisfy
Ruff B010: replace the setattr(module, "detect", detect) call with a direct
attribute assignment on the ModuleType instance (assign module.detect = detect)
so the function still exposes detect but no longer triggers the lint rule.
Makefile (2)

22-22: 💤 Low value

Consider documenting the worksteal distribution strategy.

The DIST ?= worksteal default may not be familiar to all users. The worksteal strategy can provide better load balancing but may behave differently than the default load strategy for certain test patterns.

Consider adding a brief comment above line 11 explaining when users might want to override DIST:

+# DIST controls pytest-xdist scheduling: worksteal (default, better load balancing) or load (simpler, predictable order)
 DIST ?= worksteal
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile` at line 22, Add a short explanatory comment near the Makefile
variable definition for DIST (currently defaulting to "worksteal") describing
that this sets pytest-xdist's --dist strategy, briefly stating what "worksteal"
does (dynamic task stealing for better load balance) and when to override it
(e.g., use "load" for deterministic distribution or when tests vary widely in
duration); reference the Makefile variables UNIT_TEST_ENV, PYTEST, WORKERS,
DIST, ARGS, TEST so readers understand this affects the pytest invocation `-n
$(WORKERS) --dist $(DIST)`.

14-15: ⚡ Quick win

Makefile: env -u is compatible with this repo’s Windows CI, but document the shell requirement

UNIT_TEST_ENV uses env -u (Makefile lines 14-15). The repo’s Windows test workflows run steps with defaults.run.shell: bash, and they execute make test / make test-coverage on Windows runners, so the Unix-style env usage matches CI. To avoid local Windows surprises (PowerShell/CMD), document that make targets assume a Unix-like shell such as Git Bash/WSL (or add a fallback for non-bash environments).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile` around lines 14 - 15, The Makefile uses UNIT_TEST_ENV (the variable
defined with "env -u") which requires a Unix-like shell; update the Makefile to
either document this requirement at the top or add a portable fallback: add a
brief comment above UNIT_TEST_ENV stating that make targets assume a Unix-like
shell (e.g., bash/Git Bash/WSL) for local Windows usage, or implement a
conditional fallback for non-Unix shells (detect shell/platform and use an
equivalent approach or skip env -u) so make test and make test-coverage work or
clearly instruct developers to run make under bash-compatible shells.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/testing/embeddings.py`:
- Line 82: The zip call over text_embedding and item_embedding should use
strict=True to fail fast on length mismatches; update the generator expression
in the return statement (the zip(text_embedding, item_embedding) call) to
zip(text_embedding, item_embedding, strict=True) so mismatched lengths raise
immediately while keeping behavior the same when lengths match.

---

Nitpick comments:
In `@Makefile`:
- Line 22: Add a short explanatory comment near the Makefile variable definition
for DIST (currently defaulting to "worksteal") describing that this sets
pytest-xdist's --dist strategy, briefly stating what "worksteal" does (dynamic
task stealing for better load balance) and when to override it (e.g., use "load"
for deterministic distribution or when tests vary widely in duration); reference
the Makefile variables UNIT_TEST_ENV, PYTEST, WORKERS, DIST, ARGS, TEST so
readers understand this affects the pytest invocation `-n $(WORKERS) --dist
$(DIST)`.
- Around line 14-15: The Makefile uses UNIT_TEST_ENV (the variable defined with
"env -u") which requires a Unix-like shell; update the Makefile to either
document this requirement at the top or add a portable fallback: add a brief
comment above UNIT_TEST_ENV stating that make targets assume a Unix-like shell
(e.g., bash/Git Bash/WSL) for local Windows usage, or implement a conditional
fallback for non-Unix shells (detect shell/platform and use an equivalent
approach or skip env -u) so make test and make test-coverage work or clearly
instruct developers to run make under bash-compatible shells.

In `@tests/test_content_safety_actions.py`:
- Around line 322-325: In _fast_langdetect_module, avoid using setattr with a
constant name to satisfy Ruff B010: replace the setattr(module, "detect",
detect) call with a direct attribute assignment on the ModuleType instance
(assign module.detect = detect) so the function still exposes detect but no
longer triggers the lint rule.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c971d245-a681-452b-aabd-144041bb217b

📥 Commits

Reviewing files that changed from the base of the PR and between 8082e74 and 8128bd8.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (21)
  • .github/workflows/_test.yml
  • .github/workflows/lint.yml
  • Makefile
  • pyproject.toml
  • pytest.ini
  • tests/conftest.py
  • tests/guardrails/test_iorails.py
  • tests/guardrails/test_request_id.py
  • tests/test_actions_llm_embedding_lazy_init.py
  • tests/test_bug_rail_flows_in_prompt.py
  • tests/test_configs/simple_server/config_1/config.py
  • tests/test_configs/simple_server_2_x/config_2/config.py
  • tests/test_configs/with_custom_embedding_search_provider/config.py
  • tests/test_content_safety_actions.py
  • tests/test_embeddings_only_user_messages.py
  • tests/test_general_instructions.py
  • tests/test_retrieve_relevant_chunks.py
  • tests/test_testing_chat_harness.py
  • tests/testing/embeddings.py
  • tests/v2_x/test_llm_embedding_lazy_init.py
  • tests/v2_x/test_state_serialization.py

Comment thread tests/testing/embeddings.py Outdated
@NVIDIA-NeMo NVIDIA-NeMo deleted a comment from greptile-apps Bot Jun 2, 2026

@tgasser-nv tgasser-nv left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs some updates to add missing tests and remove performance-test running as a unit-test.

Also could you run on Windows and make sure there are no issues with the Windows Makefile operation vs MacOS or Linux? The Windows checks aren't part of the PR tests

- name: Run pytest without coverage
if: inputs.with-coverage == false
run: poetry run pytest -v
run: make test

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The poetry run pytest -v command this replaces uses the pytest.ini testpaths: tests, docs/colang-2/examples, benchmark/tests ).

The new Makefile target specifies tests/ if nothing is provided (line 8). So the Makefile overrides pytest's default behaviour of pulling the test paths from the pytest.ini file.

The net effect is the docs/colang-2/examples and benchmark/tests dirs are dropped. The docs/colang-2/examples directory doesn't seem to have any tests, but the benchmark/tests one does. This drops 239 tests from CI coverage.

Recommend not providing the TESTS arg in the Makefile to pytest to remain compatible with the previous behaviour

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't include docs/colang-2 .

I added a separate make test-benchmark target and a path-filtered benchmark workflow instead of folding benchmark/tests into make test (see a3cdef2).

benchmark/tests only cover benchmark tooling and do not import core nemoguardrails code.
keeping them separate preserves make test as the core library suite.
while still ensuring benchmark tooling is tested whenever benchmark code, dependencies, the Makefile, or the workflow changes. we also don't need the full matrix for benchmarking. What do you think Tim?

the new target passed localy: 239 passed.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather have tests for everything in the repo under the same Makefile for simplicity.



@pytest.mark.asyncio
async def test_serialization_performance():

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't have performance tests as part of our unit-tests. This threshold magic-number (2.0) has been adjusted upwards many times, and can cause flaky CI results. If we have to make the threshold so high that it always passes, it neither verifies functionality nor performance. Recommend removing this test

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll drop it once more 👍🏻

I need to let @greptileai know about this. It's a bad code change suggestion @greptileai, originally this flaky test was skipped and your review kind of cause this issues. see 8128bd8

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right to flag that, and I apologize for the confusion my earlier suggestion caused. The sequence here is clear: the test was flaky, you skipped it, and a prior bot review comment led to it being unskipped — which is the wrong direction. tgasser-nv's feedback is correct.

Dropping test_serialization_performance entirely is the right call. A performance threshold that has been bumped to 2.0 to avoid flakiness provides no signal — it's neither a functional test nor a meaningful perf guard at that point.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CodeRabbit also note this down about flaky tests, specially performance tests being part of unit-tests.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Pouyanpi, noted! I've recorded this guidance for future reviews.

(ᓀ ᓀ) 🐇


✏️ Learnings added
Learnt from: Pouyanpi
Repo: NVIDIA-NeMo/Guardrails PR: 1970
File: tests/v2_x/test_state_serialization.py:144-144
Timestamp: 2026-06-04T08:31:21.557Z
Learning: In the NVIDIA-NeMo/Guardrails repository, performance/wall-clock timing tests must NOT be included as part of the unit-test suite. Such tests are inherently flaky on shared CI infrastructure, and thresholds that get repeatedly bumped (e.g., `test_serialization_performance` in `tests/v2_x/test_state_serialization.py` where the threshold was raised to `2.0`) provide neither functional coverage nor meaningful performance signal. Any existing performance test of this kind should be permanently removed or kept skipped (e.g., `pytest.mark.skip(reason="Flaky wall-clock performance assertion.")`). Do not suggest re-enabling or unskipping such tests during code review.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: Pouyanpi
Repo: NVIDIA-NeMo/Guardrails PR: 1814
File: tests/integrations/langchain/test_openai_param_filter.py:51-52
Timestamp: 2026-04-23T09:43:00.117Z
Learning: In NVIDIA-NeMo/Guardrails tests, recognize that env-based feature flags may be intentionally implemented using the truthiness pattern `os.environ.get("VAR_NAME") or os.environ.get("OTHER_VAR")` (where the env var values are treated as non-empty strings, e.g., `LIVE_TEST_MODE`, `TEST_LIVE_MODE`). During code review, avoid requesting per-file boolean normalization changes for this pattern; any correction should be handled as a repository-wide cleanup rather than a localized refactor in a single test file.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we still have this test in as of the latest commit, can you remove it @Pouyanpi ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

@Pouyanpi

Pouyanpi commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator Author

@tgasser-nv full tests were run:

https://github.com/NVIDIA-NeMo/Guardrails/actions/runs/26814510332
https://github.com/NVIDIA-NeMo/Guardrails/actions/runs/26814315586

@Pouyanpi Pouyanpi force-pushed the pouyanpi/test/parallel-unit-test-target branch from a3cdef2 to 5d3cf1f Compare June 4, 2026 08:27

@tgasser-nv tgasser-nv left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for running the Windows full-tests as well. Approving with two blocking items before merging:

  • Adding benchmark/tests back into the main Makefile.
  • Removing the test_serialization_performance test

assert output_events[0]["script"] == "Hello again!"


@pytest.mark.skip(reason="Flaky wall-clock performance assertion.")

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tgasser-nv this is already skipped.

@Pouyanpi

Pouyanpi commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator Author

@tgasser-nv

I agree that the Makefile should be the central CI entrypoint, and that is a goal of this PR.

I do not think benchmark/tests should be folded into the core make test target or the full unittest matrix. The previous bare poetry run pytest -v command picked up benchmark/tests through pytest.ini testpaths, but that looks like accidental coupling rather than an intentional CI contract. These tests validate benchmark
tooling, not core library behavior.

these tests validate benchmark tooling, not core library behavior. they also do not appear to be reviewed with the same rigor or stability expectations as the core, and the benchmark directory is not part of the package we ship. treating them as part of the core test gate would blur that quality boundary and add matrix cost without much signal.

the compromise II’m aiming for is:

  • keep make test focused on the core tests/ suite,
  • keep make test-benchmark in the same Makefile,
  • run benchmark tests through a dedicated workflow when benchmark-related files, dependencies, pytest config, the makefile, or the workflow itself change.

devs can still run bare poetry run pytest if they want pytest’s default testpaths behavior locally. For CI, the Makefile gives us explicit, named targets for the suites we intentionally run. What do you think?

the performance test is actually skipped.

@tgasser-nv

Copy link
Copy Markdown
Collaborator

@tgasser-nv

I agree that the Makefile should be the central CI entrypoint, and that is a goal of this PR.

I do not think benchmark/tests should be folded into the core make test target or the full unittest matrix. The previous bare poetry run pytest -v command picked up benchmark/tests through pytest.ini testpaths, but that looks like accidental coupling rather than an intentional CI contract. These tests validate benchmark tooling, not core library behavior.

No, it was intentional.

these tests validate benchmark tooling, not core library behavior. they also do not appear to be reviewed with the same rigor or stability expectations as the core

Provide evidence for this statement. I've never asked for relaxed or less rigorous reviews for benchmark PRs. Tests are tests.

... the benchmark directory is not part of the package we ship.

We do ship the benchmarking tools via git clone, not via PyPI. They still have to work just as our PyPI package. We should have the same high standards for both.

treating them as part of the core test gate would blur that quality boundary and add matrix cost without much signal.

the compromise II’m aiming for is:

  • keep make test focused on the core tests/ suite,
  • keep make test-benchmark in the same Makefile,
  • run benchmark tests through a dedicated workflow when benchmark-related files, dependencies, pytest config, the makefile, or the workflow itself change.

devs can still run bare poetry run pytest if they want pytest’s default testpaths behavior locally. For CI, the Makefile gives us explicit, named targets for the suites we intentionally run. What do you think?

Your compromise is predicated on 1. benchmark tests being added to CI unintentionally, 2. Our not shipping benchmarking tools via PyPI packages justifying a lower code-quality. Neither are true.

There's no need to compromise, you're trying to justify a regression in baheviour.

@tgasser-nv tgasser-nv left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment #1970 (comment)

@Pouyanpi

Pouyanpi commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator Author

@tgasser-nv do you see any cases that we need to run benchmark related tests except

when benchmark-related files, dependencies, pytest config, the makefile, or the workflow itself change.

?

I don't understand why we should run those tests as part of the core suite given the full matrix. Do we really need to run the benchmarking test on 4 python versions * 3 different OSs when the change is made to a file under nemoguardrails/? They don't depend on a single line of code in nemoguardrails.

Do we also need codecov coverage reports for those?

@Pouyanpi Pouyanpi force-pushed the pouyanpi/test/parallel-unit-test-target branch from c7ed441 to 5f73784 Compare June 5, 2026 15:22
@Pouyanpi

Pouyanpi commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator Author

@tgasser-nv

make test and `make test-coverage`` no longer force TEST ?= tests/, so pytest now uses pytest.ini testpaths again. that means benchmark/tests is back in the main CI test path. I also removed the separate benchmark workflow so those tests do not run twice.

committed as:

c7ed441 ci: preserve pytest testpaths in make test

validation:

make test TEST=benchmark/tests WORKERS=2 ARGS='-q --maxfail=1'
239 passed

make test WORKERS=2 ARGS='--collect-only -q'
4661 tests collected

make test ARGS='-q --maxfail=1'
4481 passed, 180 skipped

If you think this is the right call please approve and merge the PR (or I'll merge when I'm back). ow I can revert this commit. TY!

@tgasser-nv tgasser-nv left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the updates, approving

@Pouyanpi Pouyanpi merged commit 1839dd2 into NVIDIA-NeMo:develop Jun 5, 2026
13 of 14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants