Flake: services/propagation Test_handleMultipleTx fails under -race + -cpu=8 stress #890

## Summary

`Test_handleMultipleTx/Test_handleMultipleTx_with_valid_sibling_transactions` (`services/propagation/Server_test.go:686`) is flaky under `-race` on multi-core runners. A single iteration usually passes, but the test fails reproducibly under stress.

## Reproduction

```bash
go test -race -tags testtxmetacache -count=1 -timeout 600s \
  -run 'Test_handleMultipleTx$' -cpu=8 ./services/propagation/
```

A single iteration (`-count=1`) rarely fails; raising the count to somewhere between `-count=200` and `-count=2000` exposes the failure within ~30–60 seconds on an 8-core box. Locally, the first failure appeared around iteration 492 of a 2000-iteration loop; in CI it surfaced on a single run of the test (see e.g. job 76589759556 on PR #828).

## Symptom

```
=== FAIL: services/propagation Test_handleMultipleTx/Test_handleMultipleTx_with_valid_sibling_transactions
    Server_test.go:747:
        Error: expected 200, actual 500
    Server_test.go:752:
        expected: "OK"
        actual:   "Failed to process transactions:
                   PROCESSING (4): [ProcessTransaction][<txid>] failed to validate transaction
                   ..."
```

One or more of the 20 sibling txs fail validation. The number of failing siblings varies (often 1, sometimes up to ~6), and the failing TxIDs are non-deterministic between runs.

## Setup the test exercises

- `sqlitememory:///test` UTXO store, chain height set to 101.
- A coinbase tx with 20 P2PKH outputs created at block height 1 (matures at 101).
- 20 sibling txs, each spending a different vout of the coinbase, are submitted as a single `/txs` batch.
- The handler (`handleMultipleTx`, `Server.go:679`) dispatches one goroutine per tx (gated by the server-wide `batchWorkerPool` semaphore introduced in #879) and writes per-tx errors into pre-allocated slots.
- Test expects HTTP 200 + body `"OK"`.

## Diagnostic notes

- Re-validating each failed sibling **synchronously** against the same `Validator` and `Store` after the batch completes **always succeeds**. The bad state is transient.
- The underlying validator error is wrapped at the propagation layer and then stripped by `errors.UserMessage` (`Server.go:833`) before the response is written, so the message from the error that actually fired isn't visible in CI logs. Logging the inner error at debug level would speed up further triage.
- The failure does NOT reproduce without `-race`, nor with a verbose test logger (whose logging volume changes goroutine scheduling enough to hide it).
- Happens-before for what the `testify/require` assertions observe is fine: each `errSlots` slot is written by a distinct goroutine, and `processingWg.Wait()` precedes the reads. The race appears to be inside the validator / sqlitememory store path itself, when 20 goroutines concurrently read the same parent coinbase record and write 20 distinct spending tx records.

## Pre-PR-828 evidence

The flake is **not** introduced by PR #828. I checked out `services/propagation/` at the pre-cherry-pick commit `aae62aa95` (the merge commit on PR #828 before any of PR #886 was applied) and reproduced the same failure, with the same TxIDs, under the same stress harness.

Likely first appeared with #879 (`perf(propagation): process /txs batch concurrently with ordered errors`) — that's the commit that turned `handleMultipleTx` from sequential to concurrent.

## Suggested next steps (in order)

1. Surface the wrapped validator error in the test output so the actual failure mode is visible — at least log it via `t.Logf` when the response code is non-200.
2. Determine whether the race is in:
   - The validator's parent-tx lookup path (single reader vs. multiple readers of the same UTXO record), or
   - The sqlitememory store's concurrent write path (multiple `Create`/`Spend` calls in flight), or
   - The propagation handler's tx-store sequencing (`storeTransaction` before `Validate`).
3. Decide whether the fix belongs in the validator, the store, or in `handleMultipleTx` (e.g., serialising txs that share a parent in the same batch).

## Related

- PR #879 — turned `handleMultipleTx` concurrent.
- PR #828 — the green-CI work where this surfaced; not a cause.
