Skip to content

Improve block proposal cancellation to (best effort) avoid concurrency issues#10219

Open
fab-10 wants to merge 2 commits intobesu-eth:mainfrom
fab-10:fix-cancelled-block-creation-exception
Open

Improve block proposal cancellation to (best effort) avoid concurrency issues#10219
fab-10 wants to merge 2 commits intobesu-eth:mainfrom
fab-10:fix-cancelled-block-creation-exception

Conversation

@fab-10
Copy link
Copy Markdown
Contributor

@fab-10 fab-10 commented Apr 10, 2026

Summary

Today I spent some time reviewing the exception reported below, and it ended up to be an interesting investigation, that led to an improvement of the cancellation path to avoid (as much as possible) this concurrency issue.
Usually, like in the reported case, the exception is not an issue, since there were previous proposals available to return, but in the case the cancellation happens on the very first block creation iteration, then it could result in an empty proposal returned, instead of something better.

"@timestamp":"2026-03-12T06:10:01,044","level":"INFO","thread":"vert.x-worker-thread-0","class":"MergeCoordinator","message":"Start building proposals for block 2401377 identified by 0x1c0bb588326184d4","throwable":""}
{"@timestamp":"2026-03-12T06:10:01,049","level":"INFO","thread":"vert.x-worker-thread-0","class":"AbstractEngineForkchoiceUpdated","message":"FCU(VALID) | head: 650a0.....c9b9d | safe: 82091.....26b9b | finalized: 3b6d9.....8994>
{"@timestamp":"2026-03-12T06:10:01,576","level":"INFO","thread":"EthScheduler-BlockCreation-0-2401377","class":"PostMergeContext","message":"New proposal for payloadId 0x1c0bb588326184d4 block 2401377 (0x86faf7360d4e8783d0b9f575>
{"@timestamp":"2026-03-12T06:10:02,055","level":"INFO","thread":"EthScheduler-BlockCreation-0-2401377","class":"PostMergeContext","message":"New proposal for payloadId 0x1c0bb588326184d4 block 2401377 (0x19c96da5d3bb50ce2c7c9fe0>
{"@timestamp":"2026-03-12T06:10:03,061","level":"INFO","thread":"EthScheduler-BlockCreation-0-2401377","class":"PostMergeContext","message":"New proposal for payloadId 0x1c0bb588326184d4 block 2401377 (0x83a7f9a28e62a77c4e6a6ee3>
{"@timestamp":"2026-03-12T06:10:03,740","level":"INFO","thread":"EthScheduler-BlockCreation-0-2401377","class":"PostMergeContext","message":"New proposal for payloadId 0x1c0bb588326184d4 block 2401377 (0xde4f0918573c99276894e3e8>
{"@timestamp":"2026-03-12T06:10:04,520","level":"INFO","thread":"EthScheduler-BlockCreation-0-2401377","class":"PostMergeContext","message":"New proposal for payloadId 0x1c0bb588326184d4 block 2401377 (0xeca9b4dbd0ef06a5a7952d39>
{"@timestamp":"2026-03-12T06:10:05,410","level":"INFO","thread":"EthScheduler-BlockCreation-0-2401377","class":"PostMergeContext","message":"New proposal for payloadId 0x1c0bb588326184d4 block 2401377 (0x5fe621c357a9c541fb3da0f6>
{"@timestamp":"2026-03-12T06:10:06,267","level":"INFO","thread":"EthScheduler-BlockCreation-0-2401377","class":"PostMergeContext","message":"New proposal for payloadId 0x1c0bb588326184d4 block 2401377 (0xde4cae0d767ec02060b6ae2c>
{"@timestamp":"2026-03-12T06:10:07,089","level":"INFO","thread":"EthScheduler-BlockCreation-0-2401377","class":"PostMergeContext","message":"New proposal for payloadId 0x1c0bb588326184d4 block 2401377 (0x6ed9e875c8d875c73f2d48aa>
{"@timestamp":"2026-03-12T06:10:08,031","level":"INFO","thread":"EthScheduler-BlockCreation-0-2401377","class":"PostMergeContext","message":"New proposal for payloadId 0x1c0bb588326184d4 block 2401377 (0xee609340f1e4e6228ac660ed>
{"@timestamp":"2026-03-12T06:10:09,203","level":"INFO","thread":"EthScheduler-BlockCreation-0-2401377","class":"PostMergeContext","message":"New proposal for payloadId 0x1c0bb588326184d4 block 2401377 (0x0f6bae3b7d0e19a3f0d98b60>
{"@timestamp":"2026-03-12T06:10:10,343","level":"INFO","thread":"EthScheduler-BlockCreation-0-2401377","class":"PostMergeContext","message":"New proposal for payloadId 0x1c0bb588326184d4 block 2401377 (0x83adaef273d73dece22c1f65>
{"@timestamp":"2026-03-12T06:10:11,429","level":"INFO","thread":"EthScheduler-BlockCreation-0-2401377","class":"PostMergeContext","message":"New proposal for payloadId 0x1c0bb588326184d4 block 2401377 (0x4befc6807ed116dc658fe0f1>
{"@timestamp":"2026-03-12T06:10:12,029","level":"INFO","thread":"vert.x-worker-thread-0","class":"AbstractEngineGetPayload","message":"Produced #2,401,377  (4befc.....66005)|   37 tx | 16 ws | 50,704,142 (84.5%) gas in 1.585s | >
{"@timestamp":"2026-03-12T06:10:12,029","level":"WARN","thread":"EthScheduler-BlockCreation-0-2401377","class":"MergeCoordinator","message":"Something went wrong creating block for payload id 0x1c0bb588326184d4, error java.lang.>
        at org.hyperledger.besu.ethereum.blockcreation.AbstractBlockCreator.createBlock(AbstractBlockCreator.java:362)
        at org.hyperledger.besu.consensus.merge.blockcreation.MergeBlockCreator.createBlock(MergeBlockCreator.java:87)
        at org.hyperledger.besu.consensus.merge.blockcreation.MergeCoordinator.lambda$tryToBuildBetterBlock$5(MergeCoordinator.java:434)
        at org.hyperledger.besu.consensus.merge.blockcreation.MergeCoordinator.recoverableBlockCreation(MergeCoordinator.java:529)
        at org.hyperledger.besu.consensus.merge.blockcreation.MergeCoordinator.retryBlockCreationUntilUseful(MergeCoordinator.java:496)
        at org.hyperledger.besu.consensus.merge.blockcreation.MergeCoordinator.lambda$tryToBuildBetterBlock$6(MergeCoordinator.java:461)
        at org.hyperledger.besu.ethereum.eth.manager.EthScheduler.lambda$scheduleBlockCreationTask$11(EthScheduler.java:234)
        at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.util.ConcurrentModificationException
        at java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1605)
        at java.base/java.util.HashMap$KeyIterator.next(HashMap.java:1628)
        at org.hyperledger.besu.ethereum.trie.pathbased.common.worldview.accumulator.PathBasedWorldStateUpdateAccumulator.commit(PathBasedWorldStateUpdateAccumulator.java:354)
        at org.hyperledger.besu.ethereum.mainnet.WithdrawalsProcessor.processWithdrawals(WithdrawalsProcessor.java:45)
        at org.hyperledger.besu.ethereum.blockcreation.AbstractBlockCreator.createBlock(AbstractBlockCreator.java:281)
        ... 10 more
","throwable":""}

When block creation is cancelled or times out, the selection thread may still be running briefly after FutureTask.get() returns. Without proper synchronisation, post-selection steps (withdrawals, EL requests, rewards processing) can race against the still-running thread on the shared world state.
Exactly what happened in the above exception, the race condition between the withdrawals processing and the cancelled but still running tx selection thread.

  • Add CountDownLatch to internal tx selection — mirrors the existing mechanism in plugin selection; the finally block counts down after the selection loop completes, giving a reliable signal that the thread has stopped.
  • Extract waitForCancellationToBeProcessed — shared helper called after both plugin and internal selection phases when the latch is non-zero. Correctly handles a negative remaining-time value (logs and returns immediately instead of silently passing a negative timeout to CountDownLatch.await) and logs the outcome of the wait.
  • Split exception handling by typerollback() is now only called for ExecutionException, where the callable has already thrown and the finally block is guaranteed to have run (latch at 0, selection thread stopped). CancellationException and InterruptedException no longer trigger a rollback, removing a potential race where the selection thread could still be mutating selectionPendingActions and the world state updaters.
  • Demote post-cancellation exceptions in MergeCoordinator — exceptions thrown after isBlockCreationCancelled is true are now logged at INFO with a message guiding the user to report if unexpected, rather than at WARN, reducing noise from the expected concurrency edge cases during block proposal cancellation.

Test plan

  • Existing unit and integration tests for BlockTransactionSelector and MergeCoordinator pass
  • Manually verify no WARN-level noise during normal block proposal cancellation (e.g. engine_forkchoiceUpdated followed by a new payload)
  • Verify INFO log appears and includes stack trace when an exception is thrown after cancellation

🤖 Generated with Claude Code

@fab-10 fab-10 changed the title Fix concurrency issues when block proposal is cancelled during tx selection Improve block proposal cancellation to (best effort) avoid concurrency issues Apr 10, 2026
@fab-10 fab-10 force-pushed the fix-cancelled-block-creation-exception branch from f391044 to cf708bc Compare April 10, 2026 16:31
@fab-10 fab-10 marked this pull request as ready for review April 10, 2026 17:00
Copilot AI review requested due to automatic review settings April 10, 2026 17:00
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Improves block proposal cancellation behavior to reduce race conditions between transaction selection threads and post-selection world state mutations, and reduces log noise for expected cancellation-related exceptions.

Changes:

  • Add a CountDownLatch to internal tx selection and a shared waitForCancellationToBeProcessed(...) helper to best-effort wait for selection threads to stop.
  • Refine exception handling to avoid rollback on cancellation/interruption paths and to handle plugin-selection cancellation gracefully.
  • Demote post-cancellation block creation exceptions in MergeCoordinator from WARN to INFO and add tests covering these scenarios.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
ethereum/blockcreation/src/main/java/org/hyperledger/besu/ethereum/blockcreation/txselection/BlockTransactionSelector.java Adds latches + wait helper and refines cancellation/interrupt/exception handling during selection.
ethereum/blockcreation/src/test/java/org/hyperledger/besu/ethereum/blockcreation/AbstractBlockTransactionSelectorTest.java Adds tests for external cancellation during plugin selection and waiting for internal selection thread completion.
consensus/merge/src/main/java/org/hyperledger/besu/consensus/merge/blockcreation/MergeCoordinator.java Logs post-cancellation exceptions at INFO with guidance; keeps WARN for non-cancelled failures.
consensus/merge/src/test/java/org/hyperledger/besu/consensus/merge/blockcreation/MergeCoordinatorTest.java Adds a regression test ensuring post-cancellation exceptions don’t escape the background task.

Copy link
Copy Markdown
Contributor

@daniellehrner daniellehrner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

…ection

When block creation is cancelled or times out, the selection thread may still
be running briefly. This change adds a CountDownLatch to internal tx selection
(mirroring the existing plugin selection mechanism) and extracts a shared
waitForCancellationToBeProcessed method that correctly handles negative
remaining-time values and logs the outcome of the wait.

Exception handling in both selection phases is split by type so that
rollback() is only called for ExecutionException, where the selection thread
is guaranteed to have finished. CancellationException and InterruptedException
no longer trigger a rollback, removing a potential race on shared world state.

In MergeCoordinator, exceptions thrown after a cancellation are now logged at
INFO with guidance to report if unexpected, rather than at WARN, reducing noise
from the expected concurrency edge cases during block proposal cancellation.

Signed-off-by: Fabio Di Fabio <[email protected]>
@fab-10 fab-10 force-pushed the fix-cancelled-block-creation-exception branch from cf708bc to 93f4a62 Compare April 13, 2026 12:48
- Add test verifying CancellationException during plugin selection is
  handled gracefully (no exception propagated to caller)
- Add test verifying internal selection CountDownLatch causes
  buildTransactionListForBlock() to wait for the selection thread
- Add test verifying Throwable thrown after block creation cancellation
  is handled gracefully (logged at INFO, not propagated)
- Remove early return from timeLimitedSelection when isCancelled is
  true: the guard was causing validPendingTransactionIsNotIncludedIf
  SelectionCancelled to fail because evaluatePendingTransaction (which
  marks each tx as SELECTION_CANCELLED) was never reached; the check
  is unnecessary since evaluatePendingTransaction already handles
  isCancelled on every iteration without touching world state

Signed-off-by: Fabio Di Fabio <[email protected]>
@fab-10 fab-10 force-pushed the fix-cancelled-block-creation-exception branch from 93f4a62 to 4c40b4f Compare April 13, 2026 13:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants