fix(blockassembly): close queue race for children of conflicting parents#808
Merged
oskarszoon merged 5 commits intobsv-blockchain:mainfrom May 4, 2026
Conversation
Production observed parent.Conflicting=true with child.Conflicting=false on teranode-mainnet-eu-1 (v0.15.0-beta-3), producing mining candidates rejected as bad-txns-inputs-missingorspent. The cascade in ProcessConflicting / MarkConflictingRecursively only walks recorded spenders, so a child whose Spend has not yet been committed when the parent flips conflicting slips past. The dequeue paths (Phase 2 default-case filter and dequeueDuringBlockMovement) had no Conflicting check on self-hash or on parent inpoints, so the in-flight child landed in the next mining candidate and the block was rejected by ValidateBlock with "parent transaction X of tx Y has no block IDs". Fix: - ProcessConflicting now returns the BFS marked-order slice from MarkConflictingRecursively (previously discarded). - SubtreeProcessor gains a conflictingMap (separate from removeMap), populated by processConflictingTransactions and by BlockAssembler.markAsConflicting via a new MarkConflicting/ GetConflictingMap pair on the Interface. - Both dequeue filters consult conflictingMap on self-hash and on every TxInpoints.ParentTxHashes entry. When a child is rejected because of a conflicting parent, its own hash is added to the map so any later-arriving descendant is also caught without a store round-trip. - Reset clears conflictingMap, mirroring removeMap, so it does not leak across resets. Adds a regression test that reproduces the queue-race shape end-to-end: fails before the fix, passes after.
Contributor
|
🤖 Claude Code Review Status: Complete SummaryThis PR addresses a critical production bug where children of conflicting parents could enter block assembly and produce invalid mining candidates. The fix is well-architected and follows the project's defensive engineering approach. Analysis: No issues found. The implementation correctly closes the identified race window using a transient conflicting-hash set threaded through the block-movement drain path. The approach is minimal, scoped appropriately, and includes regression tests. Key strengths:
The code is production-ready. |
Contributor
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-05-04 15:22 UTC |
Address review feedback: conflictingMap as a persistent SubtreeProcessor field
imposed a hot-path lookup on every default-case dequeue, which is wrong. The
conflicting-state knowledge is only valid for the duration of one block
movement (and the immediate post-cascade drain), not for the lifetime of the
processor.
- processConflictingTransactions returns a transient
map[chainhash.Hash]struct{} of every hash flagged Conflicting=true by the
BFS cascade (immediate losers + every descendant returned by
MarkConflictingRecursively).
- The set is threaded through RemainderTransactionParams.ConflictingHashes
into dequeueDuringBlockMovement, which rejects any node whose own hash is
in the set OR whose TxInpoints.ParentTxHashes contains a hash in the set.
On parent match the node's hash is added to the set so any later-in-batch
descendants are caught.
- Default-case Phase 2 filter is unchanged: removeMap + currentTxMap dedup
only, no conflicting lookup.
- No new SubtreeProcessor fields, no new Interface methods, no new mock
methods. The cascade information lives only in local variables for the
duration of one moveForwardBlock event.
- BlockAssembler.markAsConflicting reverted to its pre-fix shape; the reload
path's cascade-to-descendants concern is handled separately upstream by
PR bsv-blockchain#806 (validateParentChain).
Test rewritten to drive dequeueDuringBlockMovement directly, no event loop.
…nsactions loadUnminedTransactions runs while gRPC AddTx is already enqueueing on the input queue. If validateUnminedTxInputs cascades a parent + descendants as conflicting in the UTXO store, any in-flight child of those parents that arrived before the cascade ran is sitting in the queue and will be admitted to the next mining candidate by default-case dequeue. This was the post-reset / startup half of the production race seen on teranode-mainnet-eu-1; PR bsv-blockchain#806 closed the in-memory subtree side, but the queue side stayed open. Fix: - Add Interface.DrainQueue(dropHashes) — generic queue drain that drops any tx whose hash, or whose TxInpoints.ParentTxHashes entry, is in dropHashes. On parent-match the dropped tx's own hash is added to the set so any later-in-batch descendant is also caught without an extra store round-trip. Implemented on SubtreeProcessor as a thin wrapper over dequeueDuringBlockMovement with skipNotification=true. - BlockAssembler accumulates cascade hashes in unminedDropHashes during loadUnminedTransactions (markAsConflicting writes the MarkConflictingRecursively return there). Field is serialised by the existing unminedTransactionsLoading flag. - BA.Start drains the queue with that set after loadUnminedTransactions returns, before stp.Start fires the goroutine. - BA.Reset's postProcessFn does the same after its loadUnminedTransactions call, before stp.reset's existing post-postProcess drain runs.
icellan
approved these changes
May 4, 2026
|
oskarszoon
added a commit
that referenced
this pull request
May 4, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.


Problem
Block assembly produced mining candidates that ValidateBlock rejected with:
SVNode reject:
bad-txns-inputs-missingorspent. The dirty parent/child state survives BA pod restarts because it lives in the persistent UTXO store: parent tx hasConflicting=truewhile a child that spends its outputs still hasConflicting=false.Root cause
Three layered gaps in how conflicting state propagates to the block-assembly queue:
MarkConflictingRecursivelywalksparent.outputs → recorded spenderin the store. A child whoseSpendhas not been committed yet (still mid-validation, in flight in the BA queue) is invisible to the cascade — the parent's output has no spender row pointing to it.ProcessConflictingcalledMarkConflictingRecursivelyand discarded the second return value (markedOrder), so even the descendants the cascade did discover never reached any downstream filter.dequeueDuringBlockMovementfiltered bytransactionMap.Exists(hash)andlosingTxHashesMap.Exists(hash). Neither walkedTxInpoints.ParentTxHashes. Same applies to the post-reset / startup queue drain — it dropped everything wholesale without identifying which in-flight items were now invalid.The race on a node moving forward a block:
A second window exists at startup / Reset:
loadUnminedTransactionsruns while gRPCAddTxis already enqueueing. IfvalidateUnminedTxInputscascades a parent + descendants, any in-flight child of those parents already in the queue is admitted by the default-case dequeue once the goroutine starts.Fix
Two transient drop-set drains, both scoped to the cascade event that produced them. Default-case dequeue stays untouched (no hot-path overhead added).
ProcessConflictingnow returns the BFS marked-order slice fromMarkConflictingRecursively(previously discarded).processConflictingTransactionsbuilds a transientmap[chainhash.Hash]struct{}from that slice and threads it throughRemainderTransactionParams.ConflictingHashesintodequeueDuringBlockMovement. The drain rejects any node whose own hash is in the set OR whoseTxInpoints.ParentTxHashescontains a hash in the set. On parent match the node's hash is also added to the set so any later-in-batch descendant is caught.Interface.DrainQueue(dropHashes)is the generic drain entry point used byBlockAssemblerafterloadUnminedTransactionsto flush in-flight queue items whose parents the cascade just flagged. Implemented onSubtreeProcessoras a thin wrapper overdequeueDuringBlockMovementwithskipNotification=trueso it is safe to invoke before subtree-announcement listeners are wired up.BlockAssembler.markAsConflictingaccumulates cascade hashes inunminedDropHashes.BA.Startdrains the queue with that set afterloadUnminedTransactionsreturns and beforestp.Startfires the goroutine.BA.Reset'spostProcessFndoes the same after its ownloadUnminedTransactionscall, before the existing post-postProcess drain.SubtreeProcessor, no overhead on the always-on default-case dequeue.Relationship to #806
PR #806 closed the in-memory subtree side of the same incident:
validateParentChainnow cascade-filters descendants of rejected unmined txs at restart and propagates the conflicting flag to the UTXO store viaMarkConflictingRecursively. Operators with the bad state already on disk should pull #806 — that fix unblocks restart by preventing conflicting descendants from being loaded as unmined on the next start.This PR closes the queue side of the race that #806 addresses for the in-memory subtree state — the in-flight items that arrived via
AddTxduring the cascade window. The two are complementary; both are needed to fully close the window.Tests
TestDequeueDuringBlockMovement_RejectsChildOfConflictingParentinservices/blockassembly/subtreeprocessor/conflicting_queue_race_test.godrivesdequeueDuringBlockMovementdirectly with a synthetic queue and a conflicting set; asserts the child is dropped, the unrelated tx survives, and the rejected child is added to the set so its descendants are caught later in the same drain.Full package suites green:
Lint clean (
golangci-lint run --timeout=5m --disable gosec --disable prealloc).Operational note
For a node already holding the bad parent/child state on disk: pull #806. With #806 in place, the next restart's
validateParentChainwill detect and cascade the orphans, propagate the flag to the UTXO store, and skip them on load. No special RPC or destructive action needed. This PR adds the queue-side defence so the same race does not re-occur once the node is healthy.