
optimize historical range #3658

Open: wants to merge 7 commits into master from optimize-historical-range

Conversation

@rrazvan1 (Contributor) commented Jan 17, 2025

Why this should be merged

Direct optimizations:

  • how combined changes are computed between 2 roots
  • getting the changes needed to get to a specific root
  • the changes iterator when using startKey and/or prefix

Indirect optimizations:

  • change proofs
  • range proofs
  • view changes iterator

Fixes:

  • getChangesToGetToRoot(..): no-op changes are now removed from the output

How this works

  • The changeSummary struct has a new field that keeps the changed keys in a sorted slice (sortedKeyChanges).
  • getChangesToGetToRoot(..) -> with the sorted keys available, we can binary-search for the startKey and stop iterating as soon as we pass the endKey.
  • getValueChanges(..) -> we can efficiently collect the value changes between startRoot and endRoot, for keys within [startKey, endKey], as follows (see the sketch after this list):
    1. Initialize a minheap that stores per-root traversal state: the root's changes, its insertNumber, and the current index. The heap is ordered by (key, insertNumber), so popping from it visits all keys in ascending order of [key, insertNumber].
    2. For each root's changes, binary-search for the index of the first key within [startKey, endKey] and push that initial state into the heap (or skip the root if it has no keys inside that interval).
    3. Pop elements out of the minheap; while the key stays the same, merge the changes and store the final combined change.

IMPORTANT improvement for getValueChanges(..): we can stop as soon as maxLength key changes have been found.
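
A minimal, self-contained sketch of this k-way merge, in Go (simplified on purpose: keys are plain strings, a change only carries the key's new value, and [startKey, endKey] are inclusive non-empty bounds; keyChange, rootChanges, cursor and mergeChanges here are illustrative stand-ins, not the PR's exact types):

package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// keyChange is a simplified stand-in for the PR's keyChange type.
type keyChange struct {
	key   string
	after []byte // nil means the key was deleted
}

// rootChanges holds one root's key changes (sorted ascending by key)
// together with that root's insert number in the history.
type rootChanges struct {
	insertNumber uint64
	sorted       []keyChange
}

// cursor is the per-root traversal state kept in the min-heap.
type cursor struct {
	rc    *rootChanges
	index int
}

// minHeap orders cursors by (current key, insertNumber) ascending.
type minHeap []cursor

func (h minHeap) Len() int { return len(h) }
func (h minHeap) Less(i, j int) bool {
	ki, kj := h[i].rc.sorted[h[i].index].key, h[j].rc.sorted[h[j].index].key
	if ki != kj {
		return ki < kj
	}
	return h[i].rc.insertNumber < h[j].rc.insertNumber
}
func (h minHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)   { *h = append(*h, x.(cursor)) }
func (h *minHeap) Pop() any {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// mergeChanges combines the per-root changes for keys in [start, end],
// stopping early once maxLength combined key changes have been produced.
func mergeChanges(roots []*rootChanges, start, end string, maxLength int) []keyChange {
	h := &minHeap{}
	for _, rc := range roots {
		// Binary search for the first key >= start.
		i := sort.Search(len(rc.sorted), func(i int) bool { return rc.sorted[i].key >= start })
		if i < len(rc.sorted) && rc.sorted[i].key <= end {
			*h = append(*h, cursor{rc: rc, index: i})
		}
	}
	heap.Init(h)

	var out []keyChange
	for h.Len() > 0 {
		c := heap.Pop(h).(cursor)
		kc := c.rc.sorted[c.index]
		if len(out) > 0 && out[len(out)-1].key == kc.key {
			// Same key, higher insertNumber: the later change wins.
			out[len(out)-1] = kc
		} else {
			if len(out) == maxLength {
				break // early stop: maxLength key changes found
			}
			out = append(out, kc)
		}
		// Advance this root's cursor and re-push it if still within [start, end].
		if c.index+1 < len(c.rc.sorted) && c.rc.sorted[c.index+1].key <= end {
			c.index++
			heap.Push(h, c)
		}
	}
	return out
}

func main() {
	r1 := &rootChanges{insertNumber: 1, sorted: []keyChange{{"a", []byte("1")}, {"c", []byte("2")}}}
	r2 := &rootChanges{insertNumber: 2, sorted: []keyChange{{"a", nil}, {"b", []byte("3")}}}
	// "a" is merged across both roots (insertNumber 2 wins), then "b" and "c" follow in order.
	fmt.Println(mergeChanges([]*rootChanges{r1, r2}, "a", "z", 10))
}

Because the heap pops in ascending (key, insertNumber) order, all changes for a given key are visited consecutively and the latest one wins; that is also why a key only counts toward maxLength once it has been fully merged.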

How this was tested

Using the existing unit tests.
Adding new unit tests, or modifying existing ones to properly cover the new code.

@rrazvan1 rrazvan1 force-pushed the optimize-historical-range branch from d3bc7f3 to 3b35c08 Compare January 17, 2025 12:03
@rrazvan1 rrazvan1 marked this pull request as draft January 17, 2025 12:32
@rrazvan1 rrazvan1 force-pushed the optimize-historical-range branch from 39d1344 to c8050d7 Compare January 20, 2025 13:06
@rrazvan1 rrazvan1 marked this pull request as ready for review January 20, 2025 14:47
@joshua-kim joshua-kim assigned joshua-kim and rrazvan1 and unassigned joshua-kim Feb 4, 2025
@joshua-kim joshua-kim self-requested a review February 4, 2025 21:24
@joshua-kim (Contributor) left a comment:

We should have some benchmark results (either manual or preferably through a benchmark test) as part of this PR to verify the results

@rrazvan1 (Contributor, Author) commented Feb 6, 2025

We should have some benchmark results (either manual or preferably through a benchmark test) as part of this PR to verify the results

Range proofs benchmarking:

The improvements are mostly seen when providing a small maxLength value compared to the total keys.

I attached the result of a benchmark with the following input:

  • maximum key length => 20
  • history changes => 100
  • changes per history => 20000
  • maximum maxLength provided to getRangeProof(..) => 20% of the total keys inserted/updated

The benchmark was run using the same seed; each iteration generated a rangeProof for a randomly chosen interval [start, end], between 2 random merkle roots from the history, with a random maxLength in [0, 0.2*totalKeys].

Results with 2 different seeds:

Benchmark_ChangeProofs-12                    10    466853996 ns/op
Benchmark_ChangeProofsOptimized-12           10    317976392 ns/op

Benchmark_ChangeProofs-12                    10    751904392 ns/op
Benchmark_ChangeProofsOptimized-12           10    527304283 ns/op

Iterator benchmarking:

I attached the result of a benchmark with the following input:

  • maximum key length => 20
  • keys => 1,000,000 (1M)

The benchmark was run using the same seed; each iteration randomly generated a start and a prefix and created an iterator over the correspondingly filtered changes.

BenchmarkView_NewIteratorWithStartAndPrefix-12             100    39738423 ns/op
BenchmarkView_NewIteratorWithStartAndPrefixOptimized-12    100    21231005 ns/op

BenchmarkView_NewIteratorWithStartAndPrefix-12             100    37678990 ns/op
BenchmarkView_NewIteratorWithStartAndPrefixOptimized-12    100    21599712 ns/op
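
For reference, the shape of such a benchmark (the same pattern applies to the change-proof numbers above): a fixed seed so the baseline and optimized variants see identical inputs, randomized start/prefix each iteration, and, in this self-contained sketch, a binary search plus prefix scan over a plain sorted key slice standing in for the PR's actual view iterator:

package sketch

import (
	"bytes"
	"math/rand"
	"sort"
	"testing"
)

// Benchmark shape only: fixed seed, random start/prefix per iteration,
// binary search for the first key >= max(start, prefix), then scan while
// the prefix still matches (keys sharing a prefix are contiguous once sorted).
func BenchmarkSortedKeysStartAndPrefix(b *testing.B) {
	r := rand.New(rand.NewSource(42)) // same seed for the baseline and optimized runs

	const numKeys, maxKeyLen = 1_000_000, 20
	keys := make([][]byte, numKeys)
	for i := range keys {
		k := make([]byte, 1+r.Intn(maxKeyLen))
		r.Read(k)
		keys[i] = k
	}
	sort.Slice(keys, func(i, j int) bool { return bytes.Compare(keys[i], keys[j]) < 0 })

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		start := keys[r.Intn(numKeys)]
		prefix := keys[r.Intn(numKeys)][:1]

		from := start
		if bytes.Compare(prefix, from) > 0 {
			from = prefix
		}
		idx := sort.Search(numKeys, func(j int) bool { return bytes.Compare(keys[j], from) >= 0 })
		for ; idx < numKeys && bytes.HasPrefix(keys[idx], prefix); idx++ {
			_ = keys[idx] // a real benchmark would hand this to the iterator's consumer
		}
	}
}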

@rrazvan1 rrazvan1 force-pushed the optimize-historical-range branch 2 times, most recently from 9a19d2e to 89cac9a Compare February 6, 2025 13:16
github-actions bot commented Mar 9, 2025

This PR has become stale because it has been open for 30 days with no activity. Adding the lifecycle/frozen label will cause this PR to ignore lifecycle events.

@rrazvan1 (Contributor, Author)

This PR has become stale because it has been open for 30 days with no activity. Adding the lifecycle/frozen label will cause this PR to ignore lifecycle events.

nope! This needs to be reviewed :D

@rrazvan1 rrazvan1 requested a review from joshua-kim April 25, 2025 11:48
@joshua-kim joshua-kim moved this from Backlog 🧊 to In Progress 🏗️ in avalanchego May 1, 2025
@rrazvan1 rrazvan1 force-pushed the optimize-historical-range branch from 89cac9a to b4e5849 Compare May 2, 2025 07:58
Comment on lines 202 to 243
historyChanges, ok := th.history.Index(i)
if !ok {
return nil, fmt.Errorf("missing history changes at index %d", i)
}
Contributor:

Is this case even possible to hit? If it's not possible for the caller to handle this error we should just panic
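
For illustration, the suggested alternative would look roughly like this (same th.history.Index call as in the diff, with a panic replacing the returned error):

historyChanges, ok := th.history.Index(i)
if !ok {
	// i is always a valid history index by construction, so this is unreachable.
	panic(fmt.Sprintf("missing history changes at index %d", i))
}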

}

// historyChangesIndexHeap is used to traverse the changes sorted by ASC [key] and ASC [insertNumber].
historyChangesIndexHeap := heap.NewQueue[*historyChangesIndex](func(a, b *historyChangesIndex) bool {
Contributor:

Why do we use a pointer here for the historyChangesIndex type? Copying a small value isn't a bad trade-off to avoid annoying properties of the heap (bad locality, gc, etc).

@@ -52,22 +54,29 @@ type changeSummaryAndInsertNumber struct {
insertNumber uint64
Contributor:

This isn't related to the PR... but why do we track this? We already track nextChangeNumber... if we know either the next revision's number or the first revision's number, it seems like we could calculate any revision's insertion number by using the offset in the history deque. Similarly, I wonder if we could just have lastChanges be a map of ids.ID to the index in the history. I'm not familiar enough with this code to know exactly how the data needs to be indexed, but it feels like history and lastChanges have redundant information.

@rrazvan1 (Contributor, Author) commented May 14, 2025:

For tracing purposes (even though I mentioned this to you privately): I will be using a map of rootIDs -> insert number, because indices in the history double-ended queue change over time, and by having the insert number of a root ID we can compute its index from that insertNumber.
That way we won't have redundant data stored in two different structures, and I will also get rid of nextInsertNumber.
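
Roughly, the lookup this enables, as a sketch (oldestInsertNumber stands for the insert number of the oldest entry still held in the history deque; it is illustrative, not an existing field):

// Index of a root's changes inside the history deque, derived from insert numbers.
func indexInHistory(insertNumber, oldestInsertNumber uint64) uint64 {
	return insertNumber - oldestInsertNumber
}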

require.NoError(err)

keys := make([]string, len(view.changes.sortedKeyChanges))
for i, kc := range view.changes.sortedKeyChanges {
Contributor:

Similar comment as above, but this introspects into the implementation of view (changes is not exported).

}

// Returns the changes to go from the current trie state back to the requested [rootID]
// for the keys in [start, end].
// If [start] is Nothing, all keys are considered > [start].
// If [end] is Nothing, all keys are considered < [end].
func (th *trieHistory) getChangesToGetToRoot(rootID ids.ID, start maybe.Maybe[[]byte], end maybe.Maybe[[]byte]) (*changeSummary, error) {
// [lastRootChange] is the last change in the history resulting in [rootID].
// [lastRootChange] is the last change in the historyChanges resulting in [rootID].
Contributor:

Did we mean to update this comment?

}

maxHistoryLen := len(keyChangesSets)
history := newTrieHistory(maxHistoryLen)
Contributor:

Same comment w.r.t. unexported code.

})
}

for _, kChange := range v.changes.sortedKeyChanges[startKeyIndex:] {
Contributor:

Won't this panic if startKeyIndex is out of bounds?

Contributor Author:

startKeyIndex is between 0 and len(v.changes.sortedKeyChanges).
In case startKeyIndex equals len(v.changes.sortedKeyChanges), v.changes.sortedKeyChanges[startKeyIndex:] is an empty slice, not a panic.

Contributor Author:

values := []int{0, 10, 20, 30}
idx, _ := slices.BinarySearch(values, 50)
fmt.Println(idx, values[idx:])

output:

4 []

viewIntf, err := db.NewView(ctx, ViewChanges{BatchOps: ops})
require.NoError(b, err)

view := viewIntf.(*view)
Contributor:

This cast isn't necessary

x/merkledb/db.go Outdated
Comment on lines 1363 to 1365
values: map[Key]*keyChange{},
nodes: map[Key]*change[*node]{},
sortedKeyChanges: make([]*keyChange, 0),
Contributor:

Similar comment, but I wonder if it makes sense for us to have two copies of keyChange since we have to worry about them always being in sync. Maybe values could be updated to be a map of keys to indices and do a lookup in sortedKeyChanges?
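
A rough sketch of what that suggestion could look like (types mirror the existing changeSummary fields shown above; whether the extra indirection is worth it is exactly the open question):

type changeSummary struct {
	// values maps each key to its position in sortedKeyChanges, so there is
	// only one copy of each *keyChange to keep in sync.
	values           map[Key]int
	nodes            map[Key]*change[*node]
	sortedKeyChanges []*keyChange
}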

@rrazvan1 rrazvan1 force-pushed the optimize-historical-range branch from 24cad71 to 31a0536 Compare May 20, 2025 13:43
@rrazvan1 rrazvan1 requested a review from joshua-kim May 21, 2025 12:54
@rrazvan1 rrazvan1 force-pushed the optimize-historical-range branch from 89815fa to 6465727 Compare June 2, 2025 07:49