
Conversation

@LouisTsai-Csie (Collaborator) commented Nov 28, 2025

🗒️ Description

Continuing the discussion from this comment: based on @danceratopz’s feedback, if we specify the --ignore tests/benchmark flag in the ini file for fill, pytest cannot revert this behavior later (e.g., from the command line).
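For context, this is the kind of ini entry in question (a hypothetical excerpt; the actual fill configuration may live elsewhere, e.g. in a pyproject.toml section):

```ini
# Hypothetical pytest.ini excerpt: once --ignore is baked into addopts,
# no later command-line invocation can un-ignore the path.
[pytest]
addopts = --ignore=tests/benchmark
```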

I looked into this and found a special hook, pytest_ignore_collect, which allows us to dynamically control which paths are ignored during test collection. Using this hook, we can ignore all benchmark tests by default unless a path under tests/benchmark is explicitly specified.
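For reference, the hook has roughly this shape (a minimal sketch; the signature follows recent pytest versions, and the body here is only illustrative; the real logic is in the diff below):

```python
from pathlib import Path

import pytest


@pytest.hookimpl
def pytest_ignore_collect(collection_path: Path, config: pytest.Config) -> bool | None:
    """Return True to ignore a path, False to force its collection,
    or None to defer to other hooks and options."""
    if "benchmark" in collection_path.parts:
        return True  # illustrative only; the real check consults config.args
    return None
```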

With this approach, we can remove the benchmark and stateful markers, as well as the filtering logic in conftest.py, which has already become overly complicated.

I tested this draft PR in several combinations:

  • fill vs. execute
  • normal benchmark vs. fixed-opcode-count benchmark
  • repricing marker or not

All combinations behave as expected with the new filtering logic.

My main concern is that this change may affect the current documentation structure, since the docs rely on the benchmark marker as a reference. But it seems the docs are still under maintenance.

🔗 Related Issues or PRs

N/A.

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx tox -e static
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start with type(scope):.
  • All: Considered adding an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry to the list.
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

Cute Animal Picture


codecov bot commented Nov 28, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 53.45%. Comparing base (f23e4ab) to head (0d63752).
⚠️ Report is 4 commits behind head on forks/osaka.

Additional details and impacted files
@@               Coverage Diff                @@
##           forks/osaka    #1821       +/-   ##
================================================
- Coverage        86.08%   53.45%   -32.63%     
================================================
  Files              743      743               
  Lines            44076    44076               
  Branches          3891     3891               
================================================
- Hits             37941    23559    -14382     
- Misses            5657    20306    +14649     
+ Partials           478      211      -267     
Flag Coverage Δ
unittests 53.45% <ø> (-32.64%) ⬇️

Flags with carried forward coverage won't be shown.


Comment on lines +44 to +66
```python
try:
    collection_path.relative_to(config.rootpath / "tests" / "benchmark")
except ValueError:
    return None

# Check if benchmark tests explicitly specified in command line arguments
benchmark_path = config.rootpath / "tests" / "benchmark"
for arg in config.args:
    arg_path = Path(arg)
    # Check absolute paths
    if arg_path.is_absolute():
        try:
            arg_path.relative_to(benchmark_path)
            # Explicitly specified, set op_mode and don't ignore
            config.op_mode = OpMode.BENCHMARKING  # type: ignore[attr-defined]
            return False
        except ValueError:
            continue
    # Check relative paths containing 'benchmark'
    elif "benchmark" in arg:
        # Explicitly specified, set op_mode and don't ignore
        config.op_mode = OpMode.BENCHMARKING  # type: ignore[attr-defined]
        return False
```
@LouisTsai-Csie (Collaborator, Author) commented:

This step checks whether the benchmark path is explicitly specified, but I think this is not the best approach.
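One possible alternative (a rough sketch, not what this PR currently does): normalize every command-line argument to an absolute path first, so that absolute and relative arguments go through the same relative_to check. The helper name and the ::-suffix handling below are my assumptions:

```python
from pathlib import Path


def arg_targets_benchmark(arg: str, benchmark_path: Path, rootpath: Path) -> bool:
    """Return True if a command-line argument points inside tests/benchmark."""
    # Drop any "::test_id" suffix that pytest allows on path arguments.
    candidate = Path(arg.split("::", 1)[0])
    if not candidate.is_absolute():
        candidate = rootpath / candidate
    try:
        candidate.resolve().relative_to(benchmark_path.resolve())
        return True
    except ValueError:
        return False
```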

@danceratopz (Member) left a comment:

If we need to keep consensus test and benchmark test filling mutually exclusive, then I think this approach is about as good as it gets 👍 It's still a bit tricky, but much cleaner than the existing marker-based method!

I'll circle back to this PR later, but a few initial comments below.

The contents of ./docs aren't currently hosted anywhere, but we should update them to reflect the changes in this PR. Can you update them and remove any references to -m benchmark? See docs/writing_tests/benchmarks.md.

Comment on lines -39 to -49
```python
gen_docs = config.getoption("--gen-docs", default=False)

if gen_docs:
    for item in items:
        if (
            benchmark_dir in Path(item.fspath).parents
            and not item.get_closest_marker("benchmark")
            and not item.get_closest_marker("stateful")
        ):
            item.add_marker(benchmark_marker)
    return
```
@danceratopz (Member) commented:

This code is pretty ugly, but this PR will have to add some "gen_docs" logic so that ./tests/benchmark is collected when we're generating documentation; otherwise the generated docs will be incomplete.

```python
    return None

# Check if benchmark tests explicitly specified in command line arguments
benchmark_path = config.rootpath / "tests" / "benchmark"
```
@danceratopz (Member) commented:

We could define this above the try block and use it there, too.
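I.e., roughly (a sketch of the suggested hoist):

```python
# Hoist the path so that both the collection check and the args
# check below share the same definition.
benchmark_path = config.rootpath / "tests" / "benchmark"
try:
    collection_path.relative_to(benchmark_path)
except ValueError:
    return None
```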

```python
    collection_path.relative_to(config.rootpath / "tests" / "benchmark")
except ValueError:
    return None
```
@danceratopz (Member) commented:

To address the failing documentation generation workflow, if fill is running in documentation mode (i.e., config.getoption("--gen-docs", default=False) is True), then we can simply return False here to not ignore benchmark tests and have them included in the documentation.
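For example (a sketch; the elided part stands for the benchmark-ignore logic from the diff above):

```python
from pathlib import Path

import pytest


def pytest_ignore_collect(collection_path: Path, config: pytest.Config) -> bool | None:
    # Docs mode: never ignore, so tests/benchmark lands in the generated docs.
    if config.getoption("--gen-docs", default=False):
        return False
    ...  # fall through to the benchmark-ignore logic from the diff above
```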

```python
        except ValueError:
            continue
    # Check relative paths containing 'benchmark'
    elif "benchmark" in arg:
```
@danceratopz (Member) commented:

This is a rather loose check; perhaps we could check for "benchmark/compute" or "benchmark/stateful". Of course, it'll break if we add another valid benchmark sub-folder here.
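E.g. (a sketch, using the two sub-folders known today):

```python
# Allow-list of known benchmark sub-folders instead of a bare
# "benchmark" substring check; must be kept in sync with tests/benchmark/.
VALID_BENCHMARK_SUBDIRS = ("benchmark/compute", "benchmark/stateful")

if any(subdir in arg for subdir in VALID_BENCHMARK_SUBDIRS):
    config.op_mode = OpMode.BENCHMARKING  # type: ignore[attr-defined]
    return False
```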

@danceratopz (Member) left a comment:

Also, this functionality needs some unit tests to check that collection works as expected for the benchmark case. I think we can update these tests:
packages/testing/src/execution_testing/cli/pytest_commands/plugins/filler/tests/test_filler.py
to test the logic in this PR. We could add an additional valid test module in tests/benchmark/ and test that collection/fixture generation works as expected:
https://github.com/danceratopz/execution-specs/blob/519eb26e663f3bbb4d10a230c8cf0ac0f4c94f9b/packages/testing/src/execution_testing/cli/pytest_commands/plugins/filler/tests/test_filler.py#L472-L474
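Something like the following could serve as a starting point (a self-contained pytester sketch, not the repo's actual test_filler.py helpers; the inline conftest only mimics this PR's hook):

```python
import pytest

# Requires pytest's "pytester" plugin (e.g. pytest_plugins = ["pytester"]).


def test_default_vs_explicit_collection(pytester: pytest.Pytester) -> None:
    pytester.makeconftest(
        """
        def pytest_ignore_collect(collection_path, config):
            if "benchmark" in collection_path.parts:
                # Ignore unless a benchmark path was given on the command line.
                return not any("benchmark" in a for a in config.args)
            return None
        """
    )
    pkg = pytester.mkpydir("benchmark")
    (pkg / "test_bench.py").write_text("def test_noop(): pass\n")

    # Default run: the benchmark directory is ignored entirely.
    result = pytester.runpytest()
    assert result.ret == pytest.ExitCode.NO_TESTS_COLLECTED

    # Explicit run: naming the directory opts back in.
    result = pytester.runpytest("benchmark")
    result.assert_outcomes(passed=1)
```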
