
[Add] semaphore for blocking aerospike queries#813

Open
teracoder wants to merge 4 commits into bsv-blockchain:main from teracoder:feat/long_query_semaphore

Conversation

@teracoder
Contributor

Long-running connections that perform large scans on Aerospike, such as those used by the pruner, block for a long time. The problem is that these connections share the pool with normal connections, which leads to two problems:

  1. Connection exhaustion: a batch of simple queries cannot acquire any connection from the pool and therefore times out. This is especially problematic for connections such as the Aerospike health checks that confirm Aerospike is healthy and responding. Failure to pass these health checks will stop the pods.
  2. A high percentage of long-running queries would put heavy load on the Aerospike servers performing these large secondary-index scans, leaving Aerospike unable to respond to any other queries.

@teracoder teracoder requested review from icellan and ordishs May 5, 2026 10:07
@github-actions
Contributor

github-actions Bot commented May 5, 2026

🤖 Claude Code Review

Status: Complete


Current Review:

No new issues found. All previously reported issues have been fixed:

History:

  • ✅ Fixed: Goroutine leak on partial consumption (client.go:410-414) — Now uses IsActive() polling instead of draining Results(), correctly handles early termination via Close(). Covered by TestQuerySemaphore_ReleaseOnInactive.
  • ✅ Fixed: Shared client mutation conflicts (pruner_provider.go) — EnableQuerySemaphore now uses CompareAndSwap for idempotent, thread-safe initialization. First caller wins. Covered by TestEnableQuerySemaphore_Idempotent and TestEnableQuerySemaphore_ConcurrentEnable.
  • ✅ Fixed: Missing goroutine lifecycle tests — Comprehensive test coverage added including TestWaitForRecordsetInactive, TestQuerySemaphore_ReleaseOnInactive, TestQuerySemaphore_GoroutineCleanup, and TestQuerySemaphore_ConcurrentQueryLimit.

Summary:

This PR implements a query semaphore to prevent long-running Aerospike scans from exhausting the connection pool. The implementation is well-designed with strong concurrency safety, comprehensive test coverage (442 lines of new tests), and clear documentation. All critical issues from the previous review have been addressed.

Key strengths:

  • Atomic pointer with CAS for thread-safe initialization
  • Polling IsActive() prevents record-stealing bug
  • Opt-in design (default off, backward compatible)
  • Excellent test coverage including race conditions, goroutine cleanup, and concurrent initialization
  • Clear inline documentation explaining design decisions

Comment thread: util/uaerospike/client.go
// long-running partition scans (QueryPartitions) are rate-limited and
// cannot monopolise the connection pool, starving point operations.
// Uses the default of 25% of ConnectionQueueSize.
s.client.EnableQuerySemaphore(0)
Contributor

@github-actions github-actions Bot May 5, 2026


[Major] Shared client mutation — multiple service instances may conflict ✅ FIXED

Current implementation (lines 354-366): EnableQuerySemaphore now uses CompareAndSwap to make initialization idempotent. Only the first caller wins; subsequent calls are safely ignored. Documentation explicitly states: "EnableQuerySemaphore is idempotent: subsequent calls are ignored... It is safe to call concurrently with QueryPartitions/Query."

Test coverage:

  • TestEnableQuerySemaphore_Idempotent: Verifies second call with different size does not change the channel
  • TestEnableQuerySemaphore_ConcurrentEnable: 32 goroutines race to enable; exactly one channel installed

Issue resolved.
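The CAS initialization described above can be sketched with Go's `atomic.Pointer`: the first caller installs the channel, and every later call, whatever its size argument, is a no-op. The `Client` shape here is a simplified illustration, not the actual uaerospike client:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type Client struct {
	querySemaphore atomic.Pointer[chan struct{}]
}

// EnableQuerySemaphore installs the semaphore exactly once. The first
// caller wins; concurrent or repeated calls are safely ignored, so an
// in-flight permit can never be orphaned by a channel swap.
func (c *Client) EnableQuerySemaphore(size int) {
	ch := make(chan struct{}, size)
	c.querySemaphore.CompareAndSwap(nil, &ch)
}

func main() {
	c := &Client{}

	// 32 goroutines race to enable; exactly one channel is installed.
	var wg sync.WaitGroup
	for i := 1; i <= 32; i++ {
		wg.Add(1)
		go func(size int) {
			defer wg.Done()
			c.EnableQuerySemaphore(size)
		}(i)
	}
	wg.Wait()

	first := c.querySemaphore.Load()
	c.EnableQuerySemaphore(999) // different size: safely ignored
	fmt.Println(first == c.querySemaphore.Load()) // true
}
```

Readers such as QueryPartitions would Load() the pointer once and use that snapshot for both acquire and release, so both operations always target the same channel instance.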

assert.True(t, elapsed >= minSemaphoreTimeout && elapsed < 200*time.Millisecond,
"Expected timeout around %v, got %v", minSemaphoreTimeout, elapsed)
})
}
Contributor

@github-actions github-actions Bot May 5, 2026


[Minor] Missing test coverage for goroutine lifecycle ✅ FIXED

The current test suite now includes comprehensive goroutine lifecycle tests:

  1. TestWaitForRecordsetInactive (lines 724-770): Tests the polling helper for both active→inactive transition and already-inactive cases

  2. TestQuerySemaphore_ReleaseOnInactive (lines 773-820): Critical integration test that:

    • Acquires a permit
    • Starts the release goroutine (same code path as production)
    • Verifies semaphore held while active (second acquire fails)
    • Marks inactive
    • Verifies permit released within 500ms

This directly tests the early-termination scenario that was missing before.

Issue resolved.

@github-actions
Contributor

github-actions Bot commented May 5, 2026

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-813 (1e7ce91)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 142
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.846µ 1.769µ ~ 0.700
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 61.90n 61.78n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 62.03n 61.96n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 62.27n 62.10n ~ 0.500
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 31.04n 30.53n ~ 1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 54.18n 53.18n ~ 0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 109.9n 113.6n ~ 0.100
MiningCandidate_Stringify_Short-4 267.4n 276.3n ~ 0.700
MiningCandidate_Stringify_Long-4 2.020µ 1.989µ ~ 0.100
MiningSolution_Stringify-4 1.010µ 1.022µ ~ 0.500
BlockInfo_MarshalJSON-4 1.828µ 1.842µ ~ 0.100
NewFromBytes-4 125.6n 127.2n ~ 0.100
Mine_EasyDifficulty-4 61.93µ 61.83µ ~ 0.700
Mine_WithAddress-4 7.062µ 7.080µ ~ 0.600
BlockAssembler_AddTx-4 0.03101n 0.02797n ~ 0.400
AddNode-4 10.56 10.46 ~ 0.700
AddNodeWithMap-4 11.04 11.06 ~ 1.000
DiskTxMap_SetIfNotExists-4 4.251µ 4.089µ ~ 0.400
DiskTxMap_SetIfNotExists_Parallel-4 4.062µ 3.925µ ~ 0.400
DiskTxMap_ExistenceOnly-4 452.3n 396.7n ~ 0.700
Queue-4 214.1n 215.9n ~ 0.400
AtomicPointer-4 4.379n 4.474n ~ 0.400
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 948.5µ 1007.3µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/10K-4 930.9µ 943.1µ ~ 0.400
ReorgOptimizations/AllMarkFalse/Old/10K-4 120.7µ 126.4µ ~ 0.100
ReorgOptimizations/AllMarkFalse/New/10K-4 63.04µ 62.81µ ~ 0.700
ReorgOptimizations/HashSlicePool/Old/10K-4 84.51µ 83.23µ ~ 0.400
ReorgOptimizations/HashSlicePool/New/10K-4 11.53µ 11.59µ ~ 0.700
ReorgOptimizations/NodeFlags/Old/10K-4 6.126µ 6.254µ ~ 0.700
ReorgOptimizations/NodeFlags/New/10K-4 2.104µ 2.437µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 12.99m 11.32m ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/100K-4 12.99m 12.13m ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.263m 1.258m ~ 0.700
ReorgOptimizations/AllMarkFalse/New/100K-4 688.5µ 685.2µ ~ 0.700
ReorgOptimizations/HashSlicePool/Old/100K-4 701.0µ 669.2µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/100K-4 318.6µ 308.1µ ~ 0.200
ReorgOptimizations/NodeFlags/Old/100K-4 66.47µ 58.07µ ~ 0.100
ReorgOptimizations/NodeFlags/New/100K-4 21.83µ 20.20µ ~ 0.100
TxMapSetIfNotExists-4 52.55n 52.33n ~ 0.700
TxMapSetIfNotExistsDuplicate-4 38.39n 38.15n ~ 0.100
ChannelSendReceive-4 655.8n 618.1n ~ 0.100
DirectSubtreeAdd/4_per_subtree-4 72.82n 73.34n ~ 0.400
DirectSubtreeAdd/64_per_subtree-4 40.94n 40.92n ~ 1.000
DirectSubtreeAdd/256_per_subtree-4 39.69n 39.61n ~ 0.700
DirectSubtreeAdd/1024_per_subtree-4 38.49n 38.36n ~ 0.700
DirectSubtreeAdd/2048_per_subtree-4 38.01n 38.02n ~ 1.000
SubtreeProcessorAdd/4_per_subtree-4 323.6n 325.8n ~ 1.000
SubtreeProcessorAdd/64_per_subtree-4 314.4n 319.8n ~ 0.200
SubtreeProcessorAdd/256_per_subtree-4 315.4n 316.0n ~ 1.000
SubtreeProcessorAdd/1024_per_subtree-4 303.4n 308.0n ~ 0.100
SubtreeProcessorAdd/2048_per_subtree-4 302.0n 307.3n ~ 0.100
SubtreeProcessorRotate/4_per_subtree-4 305.6n 308.2n ~ 0.700
SubtreeProcessorRotate/64_per_subtree-4 305.4n 304.1n ~ 0.700
SubtreeProcessorRotate/256_per_subtree-4 304.8n 302.6n ~ 0.400
SubtreeProcessorRotate/1024_per_subtree-4 302.8n 301.9n ~ 0.100
SubtreeNodeAddOnly/4_per_subtree-4 87.59n 87.41n ~ 0.500
SubtreeNodeAddOnly/64_per_subtree-4 65.17n 64.78n ~ 0.700
SubtreeNodeAddOnly/256_per_subtree-4 63.60n 63.66n ~ 0.300
SubtreeNodeAddOnly/1024_per_subtree-4 63.16n 63.45n ~ 1.000
SubtreeCreationOnly/4_per_subtree-4 143.2n 145.4n ~ 0.200
SubtreeCreationOnly/64_per_subtree-4 534.6n 538.0n ~ 1.000
SubtreeCreationOnly/256_per_subtree-4 1.909µ 1.860µ ~ 0.200
SubtreeCreationOnly/1024_per_subtree-4 6.256µ 6.179µ ~ 0.400
SubtreeCreationOnly/2048_per_subtree-4 11.27µ 11.32µ ~ 0.400
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 303.9n 308.5n ~ 0.100
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 305.0n 304.6n ~ 1.000
ParallelGetAndSetIfNotExists/1k_nodes-4 639.2µ 632.0µ ~ 0.200
ParallelGetAndSetIfNotExists/10k_nodes-4 1.673m 1.701m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 8.706m 8.761m ~ 1.000
ParallelGetAndSetIfNotExists/100k_nodes-4 17.74m 17.66m ~ 0.700
SequentialGetAndSetIfNotExists/1k_nodes-4 687.4µ 669.0µ ~ 0.100
SequentialGetAndSetIfNotExists/10k_nodes-4 3.226m 3.210m ~ 1.000
SequentialGetAndSetIfNotExists/50k_nodes-4 12.08m 12.11m ~ 0.700
SequentialGetAndSetIfNotExists/100k_nodes-4 23.31m 22.96m ~ 0.400
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 702.5µ 701.3µ ~ 1.000
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 4.612m 4.660m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 20.54m 21.12m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 737.5µ 735.3µ ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 6.561m 6.730m ~ 0.200
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 45.63m 45.79m ~ 0.400
CalcBlockWork-4 506.6n 503.6n ~ 1.000
CalculateWork-4 694.7n 698.9n ~ 1.000
BuildBlockLocatorString_Helpers/Size_10-4 1.312µ 1.301µ ~ 0.500
BuildBlockLocatorString_Helpers/Size_100-4 12.59µ 13.12µ ~ 1.000
BuildBlockLocatorString_Helpers/Size_1000-4 125.3µ 123.4µ ~ 0.400
CatchupWithHeaderCache-4 104.4m 104.2m ~ 0.700
_BufferPoolAllocation/16KB-4 3.565µ 3.369µ ~ 0.100
_BufferPoolAllocation/32KB-4 9.035µ 8.136µ ~ 0.700
_BufferPoolAllocation/64KB-4 14.01µ 18.13µ ~ 0.100
_BufferPoolAllocation/128KB-4 28.98µ 32.21µ ~ 0.100
_BufferPoolAllocation/512KB-4 110.2µ 115.7µ ~ 0.700
_BufferPoolConcurrent/32KB-4 18.56µ 19.51µ ~ 0.100
_BufferPoolConcurrent/64KB-4 29.05µ 30.70µ ~ 0.100
_BufferPoolConcurrent/512KB-4 149.3µ 151.4µ ~ 1.000
_SubtreeDeserializationWithBufferSizes/16KB-4 618.0µ 616.8µ ~ 1.000
_SubtreeDeserializationWithBufferSizes/32KB-4 629.8µ 615.5µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 661.6µ 603.7µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 657.5µ 601.9µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 668.9µ 628.8µ ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/16KB-4 36.15m 35.84m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/32KB-4 36.21m 35.90m ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/64KB-4 36.27m 36.22m ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/128KB-4 36.42m 35.62m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/512KB-4 35.95m 35.78m ~ 0.400
_PooledVsNonPooled/Pooled-4 738.8n 743.2n ~ 0.400
_PooledVsNonPooled/NonPooled-4 7.084µ 7.392µ ~ 0.100
_MemoryFootprint/Current_512KB_32concurrent-4 6.924µ 7.123µ ~ 0.100
_MemoryFootprint/Proposed_32KB_32concurrent-4 9.560µ 10.467µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 9.082µ 10.212µ ~ 0.100
_prepareTxsPerLevel-4 428.2m 445.3m ~ 0.700
_prepareTxsPerLevelOrdered-4 3.877m 4.147m ~ 0.700
_prepareTxsPerLevel_Comparison/Original-4 425.2m 435.7m ~ 0.400
_prepareTxsPerLevel_Comparison/Optimized-4 3.770m 4.020m ~ 0.200
SubtreeSizes/10k_tx_4_per_subtree-4 1.390m 1.399m ~ 1.000
SubtreeSizes/10k_tx_16_per_subtree-4 329.4µ 330.6µ ~ 1.000
SubtreeSizes/10k_tx_64_per_subtree-4 77.46µ 77.75µ ~ 0.700
SubtreeSizes/10k_tx_256_per_subtree-4 19.46µ 19.23µ ~ 0.300
SubtreeSizes/10k_tx_512_per_subtree-4 9.644µ 9.582µ ~ 0.100
SubtreeSizes/10k_tx_1024_per_subtree-4 4.759µ 4.736µ ~ 0.400
SubtreeSizes/10k_tx_2k_per_subtree-4 2.373µ 2.358µ ~ 0.100
BlockSizeScaling/10k_tx_64_per_subtree-4 75.56µ 75.36µ ~ 0.700
BlockSizeScaling/10k_tx_256_per_subtree-4 19.00µ 19.21µ ~ 0.700
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.766µ 4.714µ ~ 0.100
BlockSizeScaling/50k_tx_64_per_subtree-4 396.0µ 399.2µ ~ 0.700
BlockSizeScaling/50k_tx_256_per_subtree-4 94.86µ 94.90µ ~ 1.000
BlockSizeScaling/50k_tx_1024_per_subtree-4 23.42µ 23.44µ ~ 1.000
SubtreeAllocations/small_subtrees_exists_check-4 159.7µ 159.9µ ~ 1.000
SubtreeAllocations/small_subtrees_data_fetch-4 165.1µ 164.9µ ~ 1.000
SubtreeAllocations/small_subtrees_full_validation-4 325.6µ 328.1µ ~ 0.400
SubtreeAllocations/medium_subtrees_exists_check-4 9.414µ 9.308µ ~ 0.700
SubtreeAllocations/medium_subtrees_data_fetch-4 9.744µ 9.847µ ~ 0.400
SubtreeAllocations/medium_subtrees_full_validation-4 19.16µ 19.22µ ~ 0.700
SubtreeAllocations/large_subtrees_exists_check-4 2.268µ 2.264µ ~ 0.700
SubtreeAllocations/large_subtrees_data_fetch-4 2.366µ 2.379µ ~ 0.400
SubtreeAllocations/large_subtrees_full_validation-4 4.810µ 4.856µ ~ 0.200
StoreBlock_Sequential/BelowCSVHeight-4 326.9µ 325.5µ ~ 0.700
StoreBlock_Sequential/AboveCSVHeight-4 333.2µ 328.1µ ~ 0.200
GetUtxoHashes-4 270.5n 269.5n ~ 0.400
GetUtxoHashes_ManyOutputs-4 45.66µ 46.75µ ~ 0.100
_NewMetaDataFromBytes-4 231.4n 230.1n ~ 0.400
_Bytes-4 613.2n 603.9n ~ 0.400
_MetaBytes-4 563.8n 550.0n ~ 0.100

Threshold: >10% with p < 0.05 | Generated: 2026-05-05 14:32 UTC

ordishs added 3 commits May 5, 2026 15:48
The previous query-semaphore wrapper detected recordset completion by
draining rs.Results() in a goroutine. Recordset.Results() returns the
same channel the caller iterates, so the drain goroutine and the
caller competed for records -- roughly half were silently consumed
and discarded whenever EnableQuerySemaphore was active. This affected
the pruner, unmined iterator, consistency scan, and the Query callers
in QueryOldUnminedTransactions and ProcessExpiredPreservations.

Replace channel-draining with IsActive() polling so the caller
observes every record. While here, also:

- Store querySemaphore as an atomic.Pointer to fix the publication
  race between EnableQuerySemaphore and concurrent QueryPartitions
  readers, and snapshot the channel inside QueryPartitions so acquire
  and release always operate on the same instance.
- Make EnableQuerySemaphore idempotent via CompareAndSwap so re-entry
  cannot orphan in-flight permits.
- Add Query and QuerySemaphoreWait stats for observability into both
  semaphore wait time and end-to-end query duration.
- Move the EnableQuerySemaphore call in GetPrunerService to after
  NewService validates the client, preserving the existing
  nil-client error path.

Adds three unit tests motivated by review feedback that the existing
suite verified semaphore acquisition mechanics but not the wrapper's
core promised behaviour or its goroutine lifecycle:

- TestQuerySemaphore_ConcurrentQueryLimit: holds N permits, asserts an
  (N+1)th acquire blocks, then verifies it unblocks once a permit is
  released. Directly exercises the contract the feature exists for.
- TestQuerySemaphore_GoroutineCleanup: drives 20 acquire/poll/release
  cycles using runtime.NumGoroutine() deltas to catch any leak in the
  release goroutine.
- TestQuerySemaphore_RepeatedAcquireRelease: 1000 acquire/release
  cycles to surface permit drift or double-release bugs.

The recordset-coupled scenarios (early termination, full consumption)
require an *aerospike.Recordset, which has no public constructor; they
belong with the TestContainers-backed integration tests in
stores/utxo/aerospike/ rather than as unit tests here.
@ordishs ordishs requested a review from freemans13 May 5, 2026 15:02
@sonarqubecloud

sonarqubecloud Bot commented May 5, 2026


3 participants