- 
                Notifications
    You must be signed in to change notification settings 
- Fork 154
Optimize history queries and add reorg cleanup #170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: new-index
Are you sure you want to change the base?
Optimize history queries and add reorg cleanup #170
Conversation
Eliminate redundant height lookups by using TxHistoryRow data directly. Clean up orphaned database entries after reorgs.
| } | ||
|  | ||
| /// Clean up orphaned data using the specific list of removed headers | ||
| /// This is much more efficient than scanning the entire database | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it does scan the entire database?
The approach in this PR - iterating the entire H, I, C and A indexes to look for entries with matching heights  - seems inefficient to the point of being unfeasible.
The approach I had in mind was to reuse the existing code to 'index' the orphaned blocks, but turn the put operations into deletes. That way we can delete the relevant entries directly by their key, without a full db scan.
| // AggStats keys contain height | ||
| // The key format is: b'A' + scripthash + height (big-endian u32) | ||
| if key.len() >= 37 { | ||
| let height_bytes = &key[33..37]; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes the code dependent on the exact byte encoding structure for keys, which would break if we ever changed the keys. I would instead deserialize into the db *Key structs and get the height from there.
| return None; | ||
| } | ||
|  | ||
| // Skip until we reach the last_seen_txid | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why change the existing skip_while() implementation?
|  | ||
| self.cleanup_history(&orphaned_heights)?; | ||
| self.cleanup_confirmations(&orphaned_hashes)?; | ||
| self.cleanup_cache(&orphaned_heights)?; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cache already handles reorgs internally by invalidating the cache and recomputing the stats/utxos, there's no need to cleanup anything here.
We could, however, make this more efficient by explicitly undoing the effects of reorged blocks over the stats/utxo cache*, rather than recomputing it from scratch. This could be done separately in a followup PR.
* It will probably no longer be technically accurate to call it a 'cache' once we implement this.
| return Some((txid, BlockId::from(header))); | ||
| } | ||
|  | ||
| // Slow path fallback: Header not yet indexed or reorged | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite get what the "Slow Path" is supposed to do here?
Header not yet indexed or reorged
If that is the case, tx_confirming_block() wouldn't be able to get it either, since it also uses the same in-memory indexed_headers: HeaderList that the "Fast Path" uses (and that only includes headers that are part of the best chain, so reorged blocks are never available regardless).
But more importantly - if we don't have a corresponding header because new blocks are still being processed or due to a reorg (possible with the ordering proposed here), those db entries should be skipped.
With reorg handling implemented, the correct approach would be to use the "Fast Path" only (skipping over entries without a corresponding header), remove tx_confirming_block() entirely, and drop the C index (txid->blockhash confirmations map) which becomes unnecessary.
| orphaned | ||
| }; | ||
|  | ||
| // Cleanup orphaned data AFTER applying headers - no race condition | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a race condition here. Between updating the in-memory HeaderList and removing orphaned data from the db, there could be entries read from the db that point to a block height that doesn't actually confirm the entry's txid. The cleanup should happen BEFORE the new headers are applied to avoid that.
Also, cleaning up orphaned data should happen BEFORE the entries from the new blocks are written. We have to first undo the reorged blocks and only then apply the new ones, otherwise the cleanup could remove entries that were just added by the new blocks (i.e., if the same tx re-confirmed under a different block at the same height).
I believe the correct order would be:
- Remove reorged headers from the in-memory HeaderList
- Cleanup reorged history entries from the database
- Index new history entries to the database
- Apply new headers to the in-memory HeaderList
This ordering also makes the API more consistent - it will never return blocks (e.g. in /blocks/tip or /block/:hash) that aren't fully processed and populated in the history db (both for new blocks and reorged blocks).
But it also means that the tip will momentarily drop back to the common ancestor before advancing up to the new tip. Is that acceptable, or is the tip height expected to increase monotonically in the public APIs? (/cc @philippem @RCasatta)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does it do today?
| .collect(); | ||
|  | ||
| self.cleanup_history(&orphaned_heights)?; | ||
| self.cleanup_confirmations(&orphaned_hashes)?; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The confirmations index could be removed entirely, see this comment
Summary
This PR contains two related improvements to the indexer:
Motivation
History Query Performance
The current implementation of
_history(),_history_txids(), andutxo_delta()callstx_confirming_block()for each transaction, which performs a database lookup. For addresses with many transactions, this results in O(N) databasescans.
Since the
TxHistoryRowalready containsconfirmed_height, we can use this directly and only acquire the headers lock once upfront, reducing query time significantly.Reorg Cleanup
After blockchain reorganizations, orphaned data remains in the database. Without this change we may have gotten incorrect block height from history rows. This PR adds targeted cleanup of:
Performance Impact
Load testing shows significant improvements for addresses with many transactions:
Test scenario: calling blockchain.scripthash.get_history with 20 concurrent connections for 2000 scripthashes
The optimization works by:
TxHistoryRow.key.confirmed_heightdirectlytx_confirming_block()if header lookup fails (rare)This eliminates O(N) database lookups per query, significantly reducing RocksDB read pressure and improving response times under concurrent load.
Changes
src/new_index/schema.rs:
_history()to acquire headers lock once upfront and useconfirmed_heightfrom rows_history_txids()with same optimization patternutxo_delta()to use height-based header lookups instead of per-tx DB queriesitertools::Itertoolsimport (no longer needed without.unique())cleanup_orphaned_data()method called duringupdate()after reorg detectioncleanup_history(),cleanup_confirmations(),cleanup_cache()update()to capture orphaned headers fromapply()and trigger cleanupwrite_batch()for efficient bulk deletionsrc/new_index/db.rs:
write_batch()method to support efficient batch writes with sync enabledsrc/util/block.rs:
HeaderList::apply()to returnVec<HeaderEntry>containing orphaned headersImplementation Details
Query Optimization
All three methods (
_history,_history_txids,utxo_delta) now follow the same pattern:indexed_headersread lock once at startrow.key.confirmed_heightto look up header by heightBlockId::from(header)in memory (no DB access)tx_confirming_block()only if header not found (handles edge cases during reorg and indexing)Reorg Cleanup
Backwards Compatibility
This change is fully backwards compatible:
Checklist
(Note: Changes made with help of Claude Code)