Optimize history queries and add reorg cleanup #170

EddieHouston · 2025-10-15T08:11:15Z

Summary

This PR contains two related improvements to the indexer:

History query optimization: Eliminates redundant database lookups by reading block height directly from history rows instead of querying transaction confirmation status for each transaction
Reorg cleanup: Adds cleanup of orphaned data after blockchain reorganizations

Motivation

History Query Performance

The current implementation of _history(), _history_txids(), and utxo_delta() calls tx_confirming_block() for each transaction, which performs a database lookup. For addresses with many transactions, this results in O(N) database
scans.

Since the TxHistoryRow already contains confirmed_height, we can use this directly and only acquire the headers lock once upfront, reducing query time significantly.

Reorg Cleanup

After blockchain reorganizations, orphaned data remains in the database. Without this change we may have gotten incorrect block height from history rows. This PR adds targeted cleanup of:

History entries for orphaned blocks (including asset history on Elements)
Transaction confirmations in orphaned blocks
Cache entries (aggregated stats and UTXOs) for orphaned heights

Performance Impact

Load testing shows significant improvements for addresses with many transactions:

Test scenario: calling blockchain.scripthash.get_history with 20 concurrent connections for 2000 scripthashes

Metric	Before	After	Improvement
Throughput	796 RPS	2,258 RPS	2.84x
Latency	24.6ms	8.4ms	66% reduction

The optimization works by:

Acquiring the headers lock once at query start (O(1))
Using the height from TxHistoryRow.key.confirmed_height directly
Looking up the header by height in memory (O(1))
Only falling back to tx_confirming_block() if header lookup fails (rare)

This eliminates O(N) database lookups per query, significantly reducing RocksDB read pressure and improving response times under concurrent load.

Changes

src/new_index/schema.rs:

Modified _history() to acquire headers lock once upfront and use confirmed_height from rows
Modified _history_txids() with same optimization pattern
Modified utxo_delta() to use height-based header lookups instead of per-tx DB queries
Removed itertools::Itertools import (no longer needed without .unique())
Added cleanup_orphaned_data() method called during update() after reorg detection
Added helper methods: cleanup_history(), cleanup_confirmations(), cleanup_cache()
Enhanced update() to capture orphaned headers from apply() and trigger cleanup
All three cleanup methods use write_batch() for efficient bulk deletion

src/new_index/db.rs:

Added write_batch() method to support efficient batch writes with sync enabled
Used by reorg cleanup to delete orphaned entries in bulk

src/util/block.rs:

Modified HeaderList::apply() to return Vec<HeaderEntry> containing orphaned headers
Changed from discarding orphaned headers to returning them for cleanup
Returns empty vec if no reorg occurred
Enables O(1) orphaned data detection using HashSets in cleanup code

Implementation Details

Query Optimization

All three methods (_history, _history_txids, utxo_delta) now follow the same pattern:

Acquire indexed_headers read lock once at start
Use row.key.confirmed_height to look up header by height
Create BlockId::from(header) in memory (no DB access)
Fall back to tx_confirming_block() only if header not found (handles edge cases during reorg and indexing)

Reorg Cleanup

Builds HashSets of orphaned block hashes and heights for O(1) lookups
Scans relevant database prefixes ('H'/'I' for history, 'C' for confirmations, 'A'/'U' for cache)
Uses batch writes to minimize write amplification
Logs number of deleted entries for monitoring
Cleanup happens after headers are updated, so orphaned data is unreachable

Backwards Compatibility

This change is fully backwards compatible:

No database schema changes
No API changes
Fallback path maintains correctness if optimization fails
Cleanup is safe on existing databases (only deletes unreachable data)

Checklist

Code follows project style guidelines
Tests pass locally
Performance improvement verified
No breaking changes to API or database schema

(Note: Changes made with help of Claude Code)

Eliminate redundant height lookups by using TxHistoryRow data directly. Clean up orphaned database entries after reorgs.

shesek · 2025-10-22T18:33:48Z

src/new_index/schema.rs

    }

+    /// Clean up orphaned data using the specific list of removed headers
+    /// This is much more efficient than scanning the entire database


But it does scan the entire database?

The approach in this PR - iterating the entire H, I, C and A indexes to look for entries with matching heights - seems inefficient to the point of being unfeasible.

The approach I had in mind was to reuse the existing code to 'index' the orphaned blocks, but turn the put operations into deletes. That way we can delete the relevant entries directly by their key, without a full db scan.

shesek · 2025-10-22T18:38:20Z

src/new_index/schema.rs

+            // AggStats keys contain height
+            // The key format is: b'A' + scripthash + height (big-endian u32)
+            if key.len() >= 37 {
+                let height_bytes = &key[33..37];


This makes the code dependent on the exact byte encoding structure for keys, which would break if we ever changed the keys. I would instead deserialize into the db *Key structs and get the height from there.

shesek · 2025-10-22T18:40:38Z

src/new_index/schema.rs

+                    return None;
+                }
+
+                // Skip until we reach the last_seen_txid


Why change the existing skip_while() implementation?

shesek · 2025-10-22T19:08:14Z

src/new_index/schema.rs

+
+        self.cleanup_history(&orphaned_heights)?;
+        self.cleanup_confirmations(&orphaned_hashes)?;
+        self.cleanup_cache(&orphaned_heights)?;


The cache already handles reorgs internally by invalidating the cache and recomputing the stats/utxos, there's no need to cleanup anything here.

We could, however, make this more efficient by explicitly undoing the effects of reorged blocks over the stats/utxo cache*, rather than recomputing it from scratch. This could be done separately in a followup PR.

^{* It will probably no longer be technically accurate to call it a 'cache' once we implement this.}

shesek · 2025-10-22T19:53:16Z

src/new_index/schema.rs

+                    return Some((txid, BlockId::from(header)));
+                }
+
+                // Slow path fallback: Header not yet indexed or reorged


I don't quite get what the "Slow Path" is supposed to do here?

Header not yet indexed or reorged

If that is the case, tx_confirming_block() wouldn't be able to get it either, since it also uses the same in-memory indexed_headers: HeaderList that the "Fast Path" uses (and that only includes headers that are part of the best chain, so reorged blocks are never available regardless).

But more importantly - if we don't have a corresponding header because new blocks are still being processed or due to a reorg (possible with the ordering proposed here), those db entries should be skipped.

With reorg handling implemented, the correct approach would be to use the "Fast Path" only (skipping over entries without a corresponding header), remove tx_confirming_block() entirely, and drop the C index (txid->blockhash confirmations map) which becomes unnecessary.

shesek · 2025-10-22T19:54:48Z

src/new_index/schema.rs

+            orphaned
+        };
+
+        // Cleanup orphaned data AFTER applying headers - no race condition


There is a race condition here. Between updating the in-memory HeaderList and removing orphaned data from the db, there could be entries read from the db that point to a block height that doesn't actually confirm the entry's txid. The cleanup should happen BEFORE the new headers are applied to avoid that.

Also, cleaning up orphaned data should happen BEFORE the entries from the new blocks are written. We have to first undo the reorged blocks and only then apply the new ones, otherwise the cleanup could remove entries that were just added by the new blocks (i.e., if the same tx re-confirmed under a different block at the same height).

I believe the correct order would be:

Remove reorged headers from the in-memory HeaderList

Cleanup reorged history entries from the database

Index new history entries to the database

Apply new headers to the in-memory HeaderList

This ordering also makes the API more consistent - it will never return blocks (e.g. in /blocks/tip or /block/:hash) that aren't fully processed and populated in the history db (both for new blocks and reorged blocks).

But it also means that the tip will momentarily drop back to the common ancestor before advancing up to the new tip. Is that acceptable, or is the tip height expected to increase monotonically in the public APIs? (/cc @philippem @RCasatta)

what does it do today?

shesek · 2025-10-23T10:33:31Z

src/new_index/schema.rs

+            .collect();
+
+        self.cleanup_history(&orphaned_heights)?;
+        self.cleanup_confirmations(&orphaned_hashes)?;


The confirmations index could be removed entirely, see this comment

Optimize history queries and add reorg cleanup

42c64a6

Eliminate redundant height lookups by using TxHistoryRow data directly. Clean up orphaned database entries after reorgs.

shesek requested changes Oct 23, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Optimize history queries and add reorg cleanup #170

Optimize history queries and add reorg cleanup #170

Uh oh!

EddieHouston commented Oct 15, 2025

Uh oh!

shesek Oct 22, 2025

Uh oh!

shesek Oct 22, 2025

Uh oh!

shesek Oct 22, 2025

Uh oh!

shesek Oct 22, 2025

Uh oh!

shesek Oct 22, 2025

Uh oh!

shesek Oct 22, 2025

Uh oh!

philippem Oct 23, 2025

Uh oh!

shesek Oct 23, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Optimize history queries and add reorg cleanup #170

Are you sure you want to change the base?

Optimize history queries and add reorg cleanup #170

Uh oh!

Conversation

EddieHouston commented Oct 15, 2025

Summary

Motivation

History Query Performance

Reorg Cleanup

Performance Impact

Changes

Implementation Details

Query Optimization

Reorg Cleanup

Backwards Compatibility

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants