-
Notifications
You must be signed in to change notification settings - Fork 758
optimize historical range #3658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
d3bc7f3
to
3b35c08
Compare
39d1344
to
c8050d7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have some benchmark results (either manual or preferably through a benchmark test) as part of this PR to verify the results
Range proofs benchmarking: The improvements are mostly seen when providing a small maxLength value compared to the total keys. I attached the result of a benchmark with the following input:
The benchmark was run using the same seed, and it was generating a rangeProof from a randomly chosen interval [start, end], from 2 different random merkleRoot's from the history, with a random maxLength [0, 0.2*totalKeys]. Results with 2 differents seeds:
Iterator benchmarking: I attached the result of a benchmark with the following input:
The benchmark was run using the same seed and it was randomly generating a start and a prefix for creating an iterator with the proper filtered changes.
|
9a19d2e
to
89cac9a
Compare
This PR has become stale because it has been open for 30 days with no activity. Adding the |
nope! This needs to be reviewed :D |
This PR has become stale because it has been open for 30 days with no activity. Adding the |
89cac9a
to
b4e5849
Compare
x/merkledb/history.go
Outdated
historyChanges, ok := th.history.Index(i) | ||
if !ok { | ||
return nil, fmt.Errorf("missing history changes at index %d", i) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this case even possible to hit? If it's not possible for the caller to handle this error we should just panic
x/merkledb/history.go
Outdated
} | ||
|
||
// historyChangesIndexHeap is used to traverse the changes sorted by ASC [key] and ASC [insertNumber]. | ||
historyChangesIndexHeap := heap.NewQueue[*historyChangesIndex](func(a, b *historyChangesIndex) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we use a pointer here for the historyChangesIndex
type? Copying a small value isn't a bad trade-off to avoid annoying properties of the heap (bad locality, gc, etc).
x/merkledb/history.go
Outdated
@@ -52,22 +54,29 @@ type changeSummaryAndInsertNumber struct { | |||
insertNumber uint64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't related to the PR... but why do we track this? We already track nextChangeNumber
... if we know either the next revision's number or the first revision's number, it seems like we could calculate any revision's insertion number by using an the offset in the history deque. Similarly I wonder if we could just have lastChanges
just be a map of ids.ID
to the index in the history
. I'm not familiar enough with this code to know exactly how the data needs to be indexed but it feels like history
and lastChanges
have redundant information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For tracing purpuses, even though I mentioned privately to you: I will be using a map of the rootIDs
-> insert number
because indexes from history double queue are changing, and by having an insert number of a root id, we can compute its index from insertNumber
.
And we wont have redundant data stored in two different structures, and also I will get rid of the nextInsertNumber
.
x/merkledb/trie_test.go
Outdated
require.NoError(err) | ||
|
||
keys := make([]string, len(view.changes.sortedKeyChanges)) | ||
for i, kc := range view.changes.sortedKeyChanges { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment as above, but this introspects into the implementation of view
(changes
is not-exported).
x/merkledb/history.go
Outdated
} | ||
|
||
// Returns the changes to go from the current trie state back to the requested [rootID] | ||
// for the keys in [start, end]. | ||
// If [start] is Nothing, all keys are considered > [start]. | ||
// If [end] is Nothing, all keys are considered < [end]. | ||
func (th *trieHistory) getChangesToGetToRoot(rootID ids.ID, start maybe.Maybe[[]byte], end maybe.Maybe[[]byte]) (*changeSummary, error) { | ||
// [lastRootChange] is the last change in the history resulting in [rootID]. | ||
// [lastRootChange] is the last change in the historyChanges resulting in [rootID]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we mean to update this comment?
x/merkledb/history_test.go
Outdated
} | ||
|
||
maxHistoryLen := len(keyChangesSets) | ||
history := newTrieHistory(maxHistoryLen) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment w.r.t unexported code
x/merkledb/view_iterator.go
Outdated
}) | ||
} | ||
|
||
for _, kChange := range v.changes.sortedKeyChanges[startKeyIndex:] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this panic if startKeyIndex is out of bounds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
startKeyIndex
is between 0
and len(v.changes.sortedKeyChanges)]
In case startKeyIndex
is len(v.changes.sortedKeyChanges)
, v.changes.sortedKeyChanges[startKeyIndex:]
is going to be an empty slice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
values := []int{0, 10, 20, 30}
idx, _ := slices.BinarySearch(values, 50)
fmt.Println(idx, values[idx:])
output:
4 []
x/merkledb/view_test.go
Outdated
viewIntf, err := db.NewView(ctx, ViewChanges{BatchOps: ops}) | ||
require.NoError(b, err) | ||
|
||
view := viewIntf.(*view) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cast isn't necessary
x/merkledb/db.go
Outdated
values: map[Key]*keyChange{}, | ||
nodes: map[Key]*change[*node]{}, | ||
sortedKeyChanges: make([]*keyChange, 0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment, but I wonder if it makes sense for us to have two copies of keyChange
since we have to worry about them always being in sync. Maybe values could be updated to be a map of keys to indicies and do a lookup in sortedKeyChanges
?
24cad71
to
31a0536
Compare
89815fa
to
6465727
Compare
Why this should be merged
Direct optimization:
startKey
and/orprefix
Indirect optimization
Fixes:
getChangesToGetToRoot(..)
no-ops being removed from outputHow this works
changeSummary
struct has a new field for having the changed keys in a sorted slicegetChangesToGetToRoot(..)
-> by having thesortedKeys
, we can search (binary search) for thestartKey
, and also easily stop iterating when we are afterendKey
.getValueChanges(..)
-> we can easily get change values betweenstartRoot
andendRoot
, with keys within[startKey, endKey]
in the following way:changes
,insertNumber
andindex
. Min => the root with the min key and min insertNumber (in this way, by popping out of the minheap, we traverse all the keys in ASC order by[key, insertNumber]
)[startKey, endKey]
, and push that initial state of each root into the heap (or not, if there are no keys inside that interval).IMPORTANT improvement for getValueChanges(..): we can stop whenever there are
maxLength
key changes found.How this was tested
Using the existing unit tests.
Adding new unit tests, or modifying existing ones to properly cover the new code.