Skip to content

Conversation

kantai
Copy link
Contributor

@kantai kantai commented Aug 20, 2025

This should fix the miner_constructs_replay_block test in CI. It does two things:

  1. Updates the nakamoto relayer so that it doesn't try to spawn a nakamoto mining thread before nakamoto actually starts.
  2. Adds a "no block" stalling directive to the tests, so that the stall directive doesn't depend on having a mining thread actually be running.

@kantai kantai requested review from a team as code owners August 20, 2025 15:45
@kantai kantai changed the title Test: Fix miner replay Test: Fix miner_constructs_replay_block Aug 20, 2025
@obycode
Copy link
Contributor

obycode commented Aug 20, 2025

Adds a "no block" stalling directive to the tests, so that the stall directive doesn't depend on having a mining thread actually be running.

Why is there no mining thread running?

@kantai
Copy link
Contributor Author

kantai commented Aug 20, 2025

Why is there no mining thread running?

In this case, it's because the nakamoto relay thread starts up at block 230 so that it can submit a nakamoto block commit for block 231 (the block at which nakamoto activates). But, even though the miner won block 230, it was pre-nakamoto, so the miner thread either shouldn't start at all (a change this PR makes) or starts up but errors before ever flipping the fault injection flag from PENDING to STALLED.

@obycode
Copy link
Contributor

obycode commented Aug 20, 2025

What if instead of this change, we add the logic in the Neon miner to also stall based on this flag and switch the flag from Pending to NotStalled? I worry about this new function being a footgun causing more flaky tests in the future.

Or is the Neon miner not running either in this case?

@kantai
Copy link
Contributor Author

kantai commented Aug 27, 2025

Or is the Neon miner not running either in this case?

So the sequence for the (unpatched) version of this test where things pass is something like:

  1. Miner submits block commit for last block in 2.5
  2. Nakamoto coordinator, node, miner boots up
  3. Nakamoto relayer thread submits block commit
  4. Test thread issues the stall directive, waiting for a miner thread to acknowledge the directive
  5. Nakamoto relayer thread spawns a nakamoto miner thread for the block commit from (1).
  6. Thread reaches the stall directive and acknowledges it, the test can proceed.

The sequence where things fail is something like:

  1. Miner submits block commit for last block in 2.5
  2. Nakamoto coordinator, node, miner boots up
  3. Nakamoto relayer thread submits block commit
  4. Nakamoto relayer thread spawns a nakamoto miner thread for the block commit from (1).
  5. This thread aborts because the block commit from (1) is not valid for mining a Nakamoto tenure
  6. Test thread issues the stall directive, waiting for a miner thread to acknowledge the directive
  7. Stall until timeout. Test failure.

So, there is no nakamoto miner thread and there is no epoch 2.x thread.

Note, the miner thread would have aborted in the "success" case if it didn't hit the stall directive. The "race" is between the test thread discovering that a nakamoto block commit has been issued+issuing the stall and the relayer thread spawning a miner + attempting to mine. So, it's kind of lucky that the success case works at all because it has a test thread racing to get a directive set before the miner thread discovers that it should abort. We could weight the race more in the test thread's favor by moving the stall directive before the wait for block commits command, but I think its better to just not race here.

@aldur aldur requested review from hstove and jferrant September 2, 2025 14:36
@hstove hstove added this pull request to the merge queue Sep 2, 2025
Merged via the queue into stacks-network:develop with commit e16b7ef Sep 2, 2025
283 of 284 checks passed
Copy link

codecov bot commented Sep 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.40%. Comparing base (0752841) to head (0e5a630).
⚠️ Report is 201 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #6399      +/-   ##
===========================================
+ Coverage    75.33%   79.40%   +4.07%     
===========================================
  Files          555      555              
  Lines       350915   350927      +12     
===========================================
+ Hits        264358   278666   +14308     
+ Misses       86557    72261   -14296     
Files with missing lines Coverage Δ
stacks-node/src/nakamoto_node/miner.rs 77.19% <100.00%> (-2.09%) ⬇️
stacks-node/src/nakamoto_node/relayer.rs 79.61% <100.00%> (+1.96%) ⬆️
stacks-node/src/tests/nakamoto_integrations.rs 79.83% <100.00%> (+5.98%) ⬆️

... and 256 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0752841...0e5a630. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants