Summary
CPU profiling of a mainnet legacy sync (operator single-node, Aerospike UTXO backend) shows the Aerospike UTXO store batcher's per-operation channel/select machinery is the dominant non-syscall, non-GC cost once tracing overhead is removed (see PR #1099 for that separate win).
Evidence
From a 60s CPU profile of the legacy service during steady-state block sync (build 129993a45, after #1099 deployed):
- runtime.selectgo ≈ 12% cum, driven almost entirely by the Aerospike store (caller share of selectgo):
  - aerospike.(*Store).Create: 38%
  - aerospike.(*Store).Spend.func2: 27%
  - aerospike.(*Store).PreviousOutputsDecorate: 9%
  - aerospike.(*Store).get: 9%
  - go-batcher/v2.(*Batcher[...]).worker (multiple typed batchers): ~15%
- Lock contention ≈ 16% (runtime.futex 7% + lock2 3.7% + procyieldAsm 3.3% + unlock2 2%); a large share is hchan locks from these same per-op channels.
- Allocation: the batchers create a result channel (errCh chan error / done chan ...) and often a time.NewTimer per operation; makechan + time.newTimer are a visible slice of newobject. The Aerospike client value packing (PutOp / NewBin / NewKey / tryConcreteValue) is the other big allocator on this path.
At sync volume this is millions of UTXO Create/Spend/Get operations per block, each routed through a batcher with a per-request channel.
Scope / ideas (not yet designed)
- Pool/reuse the per-operation result channels and timers (sync.Pool), or redesign batch result delivery to avoid a channel per request.
- Reduce Aerospike client object construction per UTXO (Bin/Key/Operation); part of this lives in the bsv-blockchain/aerospike-client-go fork, so it may need changes there too.
- Re-profile after each change; the phase mix varies per block, so compare same-phase windows.
Risk
This is the consensus-critical UTXO write path. Any change needs careful review (bitcoin-expert + security-auditor) and correctness testing before perf testing. Not a quick win: opening this to track and scope it properly rather than bundling it into the tracing PR.
Related
perf(tracing): skip gocore.Stat creation and tag alloc when tracing disabled (the first, lower-risk win from the same profiling pass).