
optimize historical range #3658

Open: wants to merge 7 commits into master from optimize-historical-range

Conversation

@rrazvan1 (Contributor) commented Jan 17, 2025

Why this should be merged

Direct optimizations:

  • how combined changes are computed between 2 roots
  • getting the changes needed to get to a specific root
  • the changes iterator when using startKey and/or prefix

Indirect optimizations:

  • change proofs
  • range proofs
  • view changes iterator

Fixes:

  • getChangesToGetToRoot(..): no-op changes are now removed from the output

How this works

  • The changeSummary struct has a new field that keeps the changed keys in a sorted slice (sortedKeyChanges).
  • getChangesToGetToRoot(..) -> with the sorted keys available, we can binary-search for the startKey and stop iterating as soon as we pass the endKey.
  • getValueChanges(..) -> we can efficiently collect the value changes between startRoot and endRoot, for keys within [startKey, endKey], as follows (see the sketch after this list):
    1. Initialize a minheap that stores per-root traversal state: the root's changes, its insertNumber, and the current index. The heap is ordered by (key, insertNumber), so popping from it visits all keys in ascending order of [key, insertNumber].
    2. For each root's changes, binary-search for the index of the first key within [startKey, endKey] and push that initial state into the heap (or skip the root if it has no keys inside that interval).
    3. Pop elements out of the minheap; while the key stays the same, merge the changes and store the final combined change.

IMPORTANT improvement for getValueChanges(..): we can stop as soon as maxLength key changes have been found.
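
A minimal, self-contained sketch of this k-way merge, in Go (simplified on purpose: keys are plain strings, a change only carries the key's new value, and [startKey, endKey] are inclusive non-empty bounds; keyChange, rootChanges, cursor and mergeChanges here are illustrative stand-ins, not the PR's exact types):

package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// keyChange is a simplified stand-in for the PR's keyChange type.
type keyChange struct {
	key   string
	after []byte // nil means the key was deleted
}

// rootChanges holds one root's key changes (sorted ascending by key)
// together with that root's insert number in the history.
type rootChanges struct {
	insertNumber uint64
	sorted       []keyChange
}

// cursor is the per-root traversal state kept in the min-heap.
type cursor struct {
	rc    *rootChanges
	index int
}

// minHeap orders cursors by (current key, insertNumber) ascending.
type minHeap []cursor

func (h minHeap) Len() int { return len(h) }
func (h minHeap) Less(i, j int) bool {
	ki, kj := h[i].rc.sorted[h[i].index].key, h[j].rc.sorted[h[j].index].key
	if ki != kj {
		return ki < kj
	}
	return h[i].rc.insertNumber < h[j].rc.insertNumber
}
func (h minHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)   { *h = append(*h, x.(cursor)) }
func (h *minHeap) Pop() any {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// mergeChanges combines the per-root changes for keys in [start, end],
// stopping early once maxLength combined key changes have been produced.
func mergeChanges(roots []*rootChanges, start, end string, maxLength int) []keyChange {
	h := &minHeap{}
	for _, rc := range roots {
		// Binary search for the first key >= start.
		i := sort.Search(len(rc.sorted), func(i int) bool { return rc.sorted[i].key >= start })
		if i < len(rc.sorted) && rc.sorted[i].key <= end {
			*h = append(*h, cursor{rc: rc, index: i})
		}
	}
	heap.Init(h)

	var out []keyChange
	for h.Len() > 0 {
		c := heap.Pop(h).(cursor)
		kc := c.rc.sorted[c.index]
		if len(out) > 0 && out[len(out)-1].key == kc.key {
			// Same key, higher insertNumber: the later change wins.
			out[len(out)-1] = kc
		} else {
			if len(out) == maxLength {
				break // early stop: maxLength key changes found
			}
			out = append(out, kc)
		}
		// Advance this root's cursor and re-push it if still within [start, end].
		if c.index+1 < len(c.rc.sorted) && c.rc.sorted[c.index+1].key <= end {
			c.index++
			heap.Push(h, c)
		}
	}
	return out
}

func main() {
	r1 := &rootChanges{insertNumber: 1, sorted: []keyChange{{"a", []byte("1")}, {"c", []byte("2")}}}
	r2 := &rootChanges{insertNumber: 2, sorted: []keyChange{{"a", nil}, {"b", []byte("3")}}}
	// "a" is merged across both roots (insertNumber 2 wins), then "b" and "c" follow in order.
	fmt.Println(mergeChanges([]*rootChanges{r1, r2}, "a", "z", 10))
}

Because the heap pops in ascending (key, insertNumber) order, all changes for a given key are visited consecutively and the latest one wins; that is also why a key only counts toward maxLength once it has been fully merged.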

How this was tested

Using the existing unit tests.
Adding new unit tests, or modifying existing ones to properly cover the new code.

@rrazvan1 rrazvan1 force-pushed the optimize-historical-range branch from d3bc7f3 to 3b35c08 Compare January 17, 2025 12:03
@rrazvan1 rrazvan1 marked this pull request as draft January 17, 2025 12:32
@rrazvan1 rrazvan1 force-pushed the optimize-historical-range branch from 39d1344 to c8050d7 Compare January 20, 2025 13:06
@rrazvan1 rrazvan1 marked this pull request as ready for review January 20, 2025 14:47
@joshua-kim joshua-kim assigned joshua-kim and rrazvan1 and unassigned joshua-kim Feb 4, 2025
@joshua-kim joshua-kim self-requested a review February 4, 2025 21:24
@joshua-kim (Contributor) left a comment:

We should have some benchmark results (either manual or preferably through a benchmark test) as part of this PR to verify the results

@rrazvan1 (Contributor, Author) commented Feb 6, 2025

We should have some benchmark results (either manual or preferably through a benchmark test) as part of this PR to verify the results

Range proofs benchmarking:

The improvements are mostly seen when providing a small maxLength value compared to the total keys.

I attached the result of a benchmark with the following input:

  • maximum key length => 20
  • history changes => 100
  • changes per history => 20000
  • maximum maxLength provided to getRangeProof(..) => 20% of the total keys inserted/updated

The benchmark was run using the same seed; each iteration generated a rangeProof for a randomly chosen interval [start, end], between 2 random merkle roots from the history, with a random maxLength in [0, 0.2*totalKeys].

Results with 2 different seeds:

Benchmark_ChangeProofs-12                    10    466853996 ns/op
Benchmark_ChangeProofsOptimized-12           10    317976392 ns/op

Benchmark_ChangeProofs-12                    10    751904392 ns/op
Benchmark_ChangeProofsOptimized-12           10    527304283 ns/op

Iterator benchmarking:

I attached the result of a benchmark with the following input:

  • maximum key length => 20
  • keys => 1,000,000 (1M)

The benchmark was run using the same seed; each iteration randomly generated a start and a prefix and created an iterator over the correspondingly filtered changes.

BenchmarkView_NewIteratorWithStartAndPrefix-12             100    39738423 ns/op
BenchmarkView_NewIteratorWithStartAndPrefixOptimized-12    100    21231005 ns/op

BenchmarkView_NewIteratorWithStartAndPrefix-12             100    37678990 ns/op
BenchmarkView_NewIteratorWithStartAndPrefixOptimized-12    100    21599712 ns/op
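
For reference, the shape of such a benchmark (the same pattern applies to the change-proof numbers above): a fixed seed so the baseline and optimized variants see identical inputs, randomized start/prefix each iteration, and, in this self-contained sketch, a binary search plus prefix scan over a plain sorted key slice standing in for the PR's actual view iterator:

package sketch

import (
	"bytes"
	"math/rand"
	"sort"
	"testing"
)

// Benchmark shape only: fixed seed, random start/prefix per iteration,
// binary search for the first key >= max(start, prefix), then scan while
// the prefix still matches (keys sharing a prefix are contiguous once sorted).
func BenchmarkSortedKeysStartAndPrefix(b *testing.B) {
	r := rand.New(rand.NewSource(42)) // same seed for the baseline and optimized runs

	const numKeys, maxKeyLen = 1_000_000, 20
	keys := make([][]byte, numKeys)
	for i := range keys {
		k := make([]byte, 1+r.Intn(maxKeyLen))
		r.Read(k)
		keys[i] = k
	}
	sort.Slice(keys, func(i, j int) bool { return bytes.Compare(keys[i], keys[j]) < 0 })

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		start := keys[r.Intn(numKeys)]
		prefix := keys[r.Intn(numKeys)][:1]

		from := start
		if bytes.Compare(prefix, from) > 0 {
			from = prefix
		}
		idx := sort.Search(numKeys, func(j int) bool { return bytes.Compare(keys[j], from) >= 0 })
		for ; idx < numKeys && bytes.HasPrefix(keys[idx], prefix); idx++ {
			_ = keys[idx] // a real benchmark would hand this to the iterator's consumer
		}
	}
}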

@rrazvan1 rrazvan1 force-pushed the optimize-historical-range branch 2 times, most recently from 9a19d2e to 89cac9a Compare February 6, 2025 13:16
github-actions bot commented Mar 9, 2025

This PR has become stale because it has been open for 30 days with no activity. Adding the lifecycle/frozen label will cause this PR to ignore lifecycle events.

@rrazvan1 (Contributor, Author)

This PR has become stale because it has been open for 30 days with no activity. Adding the lifecycle/frozen label will cause this PR to ignore lifecycle events.

nope! This needs to be reviewed :D

@rrazvan1 rrazvan1 requested a review from joshua-kim April 25, 2025 11:48
@joshua-kim joshua-kim moved this from Backlog 🧊 to In Progress 🏗️ in avalanchego May 1, 2025
@rrazvan1 rrazvan1 force-pushed the optimize-historical-range branch from 89cac9a to b4e5849 Compare May 2, 2025 07:58
Comment on lines 202 to 243
historyChanges, ok := th.history.Index(i)
if !ok {
return nil, fmt.Errorf("missing history changes at index %d", i)
}
Contributor:

Is this case even possible to hit? If it's not possible for the caller to handle this error we should just panic
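
For illustration, the suggested alternative would look roughly like this (same th.history.Index call as in the diff, with a panic replacing the returned error):

historyChanges, ok := th.history.Index(i)
if !ok {
	// i is always a valid history index by construction, so this is unreachable.
	panic(fmt.Sprintf("missing history changes at index %d", i))
}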

}

// historyChangesIndexHeap is used to traverse the changes sorted by ASC [key] and ASC [insertNumber].
historyChangesIndexHeap := heap.NewQueue[*historyChangesIndex](func(a, b *historyChangesIndex) bool {
Contributor:

Why do we use a pointer here for the historyChangesIndex type? Copying a small value isn't a bad trade-off to avoid annoying properties of the heap (bad locality, gc, etc).

@@ -52,22 +54,29 @@ type changeSummaryAndInsertNumber struct {
insertNumber uint64
Contributor:

This isn't related to the PR... but why do we track this? We already track nextChangeNumber... if we know either the next revision's number or the first revision's number, it seems like we could calculate any revision's insertion number by using the offset in the history deque. Similarly, I wonder if we could just have lastChanges be a map of ids.ID to the index in the history. I'm not familiar enough with this code to know exactly how the data needs to be indexed, but it feels like history and lastChanges have redundant information.

@rrazvan1 (Contributor, Author) commented May 14, 2025:

For tracing purposes (even though I mentioned this to you privately): I will be using a map of rootIDs -> insert number, because indices in the history double-ended queue change over time, and by having the insert number of a root ID we can compute its index from that insertNumber.
That way we won't have redundant data stored in two different structures, and I will also get rid of nextInsertNumber.
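
Roughly, the lookup this enables, as a sketch (oldestInsertNumber stands for the insert number of the oldest entry still held in the history deque; it is illustrative, not an existing field):

// Index of a root's changes inside the history deque, derived from insert numbers.
func indexInHistory(insertNumber, oldestInsertNumber uint64) uint64 {
	return insertNumber - oldestInsertNumber
}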

require.NoError(err)

keys := make([]string, len(view.changes.sortedKeyChanges))
for i, kc := range view.changes.sortedKeyChanges {
Contributor:

Similar comment as above, but this introspects into the implementation of view (changes is not exported).

}

// Returns the changes to go from the current trie state back to the requested [rootID]
// for the keys in [start, end].
// If [start] is Nothing, all keys are considered > [start].
// If [end] is Nothing, all keys are considered < [end].
func (th *trieHistory) getChangesToGetToRoot(rootID ids.ID, start maybe.Maybe[[]byte], end maybe.Maybe[[]byte]) (*changeSummary, error) {
// [lastRootChange] is the last change in the history resulting in [rootID].
// [lastRootChange] is the last change in the historyChanges resulting in [rootID].
Contributor:

Did we mean to update this comment?

}

maxHistoryLen := len(keyChangesSets)
history := newTrieHistory(maxHistoryLen)
Contributor:

Same comment w.r.t. unexported code.

})
}

for _, kChange := range v.changes.sortedKeyChanges[startKeyIndex:] {
Contributor:

Won't this panic if startKeyIndex is out of bounds?

Contributor Author:

startKeyIndex is between 0 and len(v.changes.sortedKeyChanges).
In case startKeyIndex equals len(v.changes.sortedKeyChanges), v.changes.sortedKeyChanges[startKeyIndex:] is an empty slice, not a panic.

Contributor Author:

values := []int{0, 10, 20, 30}
idx, _ := slices.BinarySearch(values, 50)
fmt.Println(idx, values[idx:])

output:

4 []

viewIntf, err := db.NewView(ctx, ViewChanges{BatchOps: ops})
require.NoError(b, err)

view := viewIntf.(*view)
Contributor:

This cast isn't necessary

x/merkledb/db.go Outdated
Comment on lines 1363 to 1365
values: map[Key]*keyChange{},
nodes: map[Key]*change[*node]{},
sortedKeyChanges: make([]*keyChange, 0),
Contributor:

Similar comment, but I wonder if it makes sense for us to have two copies of keyChange since we have to worry about them always being in sync. Maybe values could be updated to be a map of keys to indices and do a lookup in sortedKeyChanges?
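
A rough sketch of what that suggestion could look like (types mirror the existing changeSummary fields shown above; whether the extra indirection is worth it is exactly the open question):

type changeSummary struct {
	// values maps each key to its position in sortedKeyChanges, so there is
	// only one copy of each *keyChange to keep in sync.
	values           map[Key]int
	nodes            map[Key]*change[*node]
	sortedKeyChanges []*keyChange
}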

@rrazvan1 rrazvan1 force-pushed the optimize-historical-range branch from 24cad71 to 31a0536 Compare May 20, 2025 13:43
@rrazvan1 rrazvan1 requested a review from joshua-kim May 21, 2025 12:54
@rrazvan1 rrazvan1 force-pushed the optimize-historical-range branch from 89815fa to 6465727 Compare June 2, 2025 07:49