
Conversation

@LouisTsai-Csie (Collaborator) commented Nov 28, 2025

🗒️ Description

Continuing the discussion from this comment: based on @danceratopz’s feedback, if we specify the --ignore tests/benchmark flag in the ini file for fill, pytest cannot revert this behavior later (e.g., from the command line).
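For context, this is the kind of ini entry in question (a hypothetical excerpt; the actual fill configuration may live elsewhere, e.g. in a pyproject.toml section):

```ini
# Hypothetical pytest.ini excerpt: once --ignore is baked into addopts,
# no later command-line invocation can un-ignore the path.
[pytest]
addopts = --ignore=tests/benchmark
```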

I looked into this and found a special hook, pytest_ignore_collect, which allows us to dynamically control which paths are ignored during test collection. Using this hook, we can ignore all benchmark tests by default unless a path under tests/benchmark is explicitly specified.
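For reference, the hook has roughly this shape (a minimal sketch; the signature follows recent pytest versions, and the body here is only illustrative; the real logic is in the diff below):

```python
from pathlib import Path

import pytest


@pytest.hookimpl
def pytest_ignore_collect(collection_path: Path, config: pytest.Config) -> bool | None:
    """Return True to ignore a path, False to force its collection,
    or None to defer to other hooks and options."""
    if "benchmark" in collection_path.parts:
        return True  # illustrative only; the real check consults config.args
    return None
```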

With this approach, we can remove the benchmark and stateful markers, as well as the filtering logic in conftest.py, which has already become overly complicated.

I tested this draft PR in several combinations:

  • fill vs. execute
  • normal benchmark vs. fixed-opcode-count benchmark
  • repricing marker or not

All combinations behave as expected with the new filtering logic.

My main concern is that this change may affect the current documentation structure, since the docs rely on the benchmark marker as a reference. But it seems the docs are still under maintenance.

🔗 Related Issues or PRs

N/A.

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx tox -e static
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start with type(scope):.
  • All: Considered adding an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry to the list.
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

Cute Animal Picture


codecov bot commented Nov 28, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 53.45%. Comparing base (f23e4ab) to head (0d63752).
⚠️ Report is 4 commits behind head on forks/osaka.

Additional details and impacted files
@@               Coverage Diff                @@
##           forks/osaka    #1821       +/-   ##
================================================
- Coverage        86.08%   53.45%   -32.63%     
================================================
  Files              743      743               
  Lines            44076    44076               
  Branches          3891     3891               
================================================
- Hits             37941    23559    -14382     
- Misses            5657    20306    +14649     
+ Partials           478      211      -267     
Flag Coverage Δ
unittests 53.45% <ø> (-32.64%) ⬇️

Flags with carried forward coverage won't be shown.


Comment on lines +44 to +66
```python
try:
    collection_path.relative_to(config.rootpath / "tests" / "benchmark")
except ValueError:
    return None

# Check if benchmark tests explicitly specified in command line arguments
benchmark_path = config.rootpath / "tests" / "benchmark"
for arg in config.args:
    arg_path = Path(arg)
    # Check absolute paths
    if arg_path.is_absolute():
        try:
            arg_path.relative_to(benchmark_path)
            # Explicitly specified, set op_mode and don't ignore
            config.op_mode = OpMode.BENCHMARKING  # type: ignore[attr-defined]
            return False
        except ValueError:
            continue
    # Check relative paths containing 'benchmark'
    elif "benchmark" in arg:
        # Explicitly specified, set op_mode and don't ignore
        config.op_mode = OpMode.BENCHMARKING  # type: ignore[attr-defined]
        return False
```
@LouisTsai-Csie (Collaborator, Author) commented:

This step checks whether the benchmark path is explicitly specified, but I think this is not the best approach.
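One possible alternative (a rough sketch, not what this PR currently does): normalize every command-line argument to an absolute path first, so that absolute and relative arguments go through the same relative_to check. The helper name and the ::-suffix handling below are my assumptions:

```python
from pathlib import Path


def arg_targets_benchmark(arg: str, benchmark_path: Path, rootpath: Path) -> bool:
    """Return True if a command-line argument points inside tests/benchmark."""
    # Drop any "::test_id" suffix that pytest allows on path arguments.
    candidate = Path(arg.split("::", 1)[0])
    if not candidate.is_absolute():
        candidate = rootpath / candidate
    try:
        candidate.resolve().relative_to(benchmark_path.resolve())
        return True
    except ValueError:
        return False
```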

@danceratopz (Member) left a comment:

If we need to keep consensus test and benchmark test filling mutually exclusive, then I think this approach is about as good as it gets 👍 It's still a bit tricky, but much cleaner than the existing marker-based method!

I'll circle back to this PR later, but a few initial comments below.

The contents of ./docs aren't currently hosted anywhere, but we should update them to reflect the changes in this PR. Can you update them and remove any references to -m benchmark? See docs/writing_tests/benchmarks.md.

Comment on lines -39 to -49
```python
gen_docs = config.getoption("--gen-docs", default=False)

if gen_docs:
    for item in items:
        if (
            benchmark_dir in Path(item.fspath).parents
            and not item.get_closest_marker("benchmark")
            and not item.get_closest_marker("stateful")
        ):
            item.add_marker(benchmark_marker)
    return
```
@danceratopz (Member) commented:

This code is pretty ugly, but this PR will have to add some "gen_docs" logic so that ./tests/benchmark is collected when we're generating documentation; otherwise the generated docs will be incomplete.

```python
    return None

# Check if benchmark tests explicitly specified in command line arguments
benchmark_path = config.rootpath / "tests" / "benchmark"
```
@danceratopz (Member) commented:

We could define this above the try block and use it there, too.
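I.e., roughly (a sketch of the suggested hoist):

```python
# Hoist the path so that both the collection check and the args
# check below share the same definition.
benchmark_path = config.rootpath / "tests" / "benchmark"
try:
    collection_path.relative_to(benchmark_path)
except ValueError:
    return None
```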

```python
    collection_path.relative_to(config.rootpath / "tests" / "benchmark")
except ValueError:
    return None
```
@danceratopz (Member) commented:

To address the failing documentation generation workflow, if fill is running in documentation mode (i.e., config.getoption("--gen-docs", default=False) is True), then we can simply return False here to not ignore benchmark tests and have them included in the documentation.
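For example (a sketch; the elided part stands for the benchmark-ignore logic from the diff above):

```python
from pathlib import Path

import pytest


def pytest_ignore_collect(collection_path: Path, config: pytest.Config) -> bool | None:
    # Docs mode: never ignore, so tests/benchmark lands in the generated docs.
    if config.getoption("--gen-docs", default=False):
        return False
    ...  # fall through to the benchmark-ignore logic from the diff above
```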

```python
        except ValueError:
            continue
    # Check relative paths containing 'benchmark'
    elif "benchmark" in arg:
```
@danceratopz (Member) commented:

This is a rather loose check; perhaps we could check for "benchmark/compute" or "benchmark/stateful". Of course, it'll break if we add another valid benchmark sub-folder here.
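E.g. (a sketch, using the two sub-folders known today):

```python
# Allow-list of known benchmark sub-folders instead of a bare
# "benchmark" substring check; must be kept in sync with tests/benchmark/.
VALID_BENCHMARK_SUBDIRS = ("benchmark/compute", "benchmark/stateful")

if any(subdir in arg for subdir in VALID_BENCHMARK_SUBDIRS):
    config.op_mode = OpMode.BENCHMARKING  # type: ignore[attr-defined]
    return False
```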

@danceratopz (Member) left a comment:

Also, this functionality needs some unit tests to check that collection works as expected for the benchmark case. I think we can update these tests:
packages/testing/src/execution_testing/cli/pytest_commands/plugins/filler/tests/test_filler.py
to test the logic in this PR. We could add an additional valid test module in tests/benchmark/ and test that collection/fixture generation works as expected:
https://github.com/danceratopz/execution-specs/blob/519eb26e663f3bbb4d10a230c8cf0ac0f4c94f9b/packages/testing/src/execution_testing/cli/pytest_commands/plugins/filler/tests/test_filler.py#L472-L474
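Something like the following could serve as a starting point (a self-contained pytester sketch, not the repo's actual test_filler.py helpers; the inline conftest only mimics this PR's hook):

```python
import pytest

# Requires pytest's "pytester" plugin (e.g. pytest_plugins = ["pytester"]).


def test_default_vs_explicit_collection(pytester: pytest.Pytester) -> None:
    pytester.makeconftest(
        """
        def pytest_ignore_collect(collection_path, config):
            if "benchmark" in collection_path.parts:
                # Ignore unless a benchmark path was given on the command line.
                return not any("benchmark" in a for a in config.args)
            return None
        """
    )
    pkg = pytester.mkpydir("benchmark")
    (pkg / "test_bench.py").write_text("def test_noop(): pass\n")

    # Default run: the benchmark directory is ignored entirely.
    result = pytester.runpytest()
    assert result.ret == pytest.ExitCode.NO_TESTS_COLLECTED

    # Explicit run: naming the directory opts back in.
    result = pytester.runpytest("benchmark")
    result.assert_outcomes(passed=1)
```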
