BTrees in Motoko. #396
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,110 @@ | ||
| /// Imperative sequences as B-Trees. | ||
|
|
||
| import A "Array"; | ||
| import I "Iter"; | ||
| import List "List"; | ||
| import Option "Option"; | ||
| import Order "Order"; | ||
| import P "Prelude"; | ||
| import Prim "mo:⛔"; | ||
|
|
||
| module { | ||
|
|
||
| /// Constants we use to shape the tree. | ||
| /// See https://en.wikipedia.org/wiki/B-tree#Definition | ||
| module Constants { | ||
| let MAX_CHILDREN = 4; | ||
| }; | ||
|
|
||
| public type Compare<K> = { | ||
| compare : (K, K) -> Order.Order | ||
| }; | ||
|
|
||
| public type Data<K, V> = [(K, V)]; | ||
|
Contributor
Curious, what's the benefit of the
Contributor
Author
There's a slight space savings to the tuple (1 word per instance). Otherwise, I'd prefer the record with labeled fields. I actually had that same record definition initially, but recalled this recent review from Claudio for the HashMap improvements.
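To make the trade-off concrete, here is a minimal sketch of the two shapes side by side. The record field names `key` and `value` and the sample entries are assumptions for illustration only; the tuple form is the one used in the diff.

```motoko
// Tuple form, as in the diff: fields are accessed by position (.0, .1).
type DataTuple<K, V> = [(K, V)];

// Record form. The field names `key` and `value` are hypothetical.
type DataRecord<K, V> = [{ key : K; value : V }];

let asTuples : DataTuple<Nat, Text> = [(1, "a"), (2, "b")];
let asRecords : DataRecord<Nat, Text> = [
  { key = 1; value = "a" },
  { key = 2; value = "b" },
];

// Both forms carry the same information; only the access syntax differs.
assert (asTuples[0].0 == asRecords[0].key);
assert (asTuples[1].1 == asRecords[1].value);
```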
Contributor
Also, why is I'm probably missing something as it's early and I definitely didn't get enough sleep last night 😅 Is there a reference implementation you used for this that I can peek through?
|
||
|
|
||
| public type Index<K, V> = { | ||
| data : Data<K, V>; | ||
| trees : [Tree<K, V>]; | ||
| }; | ||
|
|
||
| public type Tree<K, V> = { | ||
| #index : Index<K, V>; | ||
| #data : Data<K, V>; | ||
|
||
| }; | ||
|
|
||
| func find_data<K, V>(data : Data<K, V>, find_k : K, c : Compare<K>) : ?V { | ||
|
||
| for ((k, v) in data.vals()) { | ||
| if (c.compare(k, find_k) == #equal) { return ?v }; | ||
| }; | ||
| return null | ||
| }; | ||
|
|
||
| func find<K, V>(t : Tree<K, V>, k : K, c : Compare<K>) : ?V { | ||
| switch t { | ||
| case (#data(d)) { return find_data<K, V>(d, k, c) }; | ||
| case (#index(i)) { | ||
| for (j in i.data.keys()) { | ||
| switch (c.compare(k, i.data[j].0)) { | ||
| case (#equal) { return ?i.data[j].1 }; | ||
| case (#less) { return find<K, V>(i.trees[j], k, c) }; | ||
| case _ { } | ||
|
||
| } | ||
| }; | ||
| find<K, V>(i.trees[i.data.size()], k, c) | ||
| }; | ||
| }; | ||
| }; | ||
|
|
||
| /// Check that a B-Tree instance observes invariants of B-Trees. | ||
|
Contributor
If this module is just for testing/debugging, should When I import the default BTree module like
Contributor
Author
This module is static. The compiler will not compile static code that you import but do not use. (However, class instances will always contain all methods, regardless of usage. Not applicable here.)
Contributor
Author
Why? It's more enclosed and encapsulated here. There is no way to "hide" a top-level module in
|
||
| /// Invariants ensure performance is what we expect. | ||
| /// For testing and debugging. | ||
| public module Check { | ||
|
||
|
|
||
| type CompareOp<K> = { | ||
| compare : (?K, ?K) -> Order.Order | ||
| }; | ||
|
|
||
| func compareOp<K>(c : Compare<K>) : CompareOp<K> = { | ||
|
||
| compare = func (k1 : ?K, k2 : ?K) : Order.Order { | ||
| switch (k1, k2) { | ||
| case (null, null) { assert false; loop {} }; | ||
| case (null, _) #less; | ||
| case (_, null) #greater; | ||
| case (?k1, ?k2) c.compare(k1, k2) | ||
| } | ||
| } | ||
| }; | ||
|
|
||
| public func check<K, V>(c : Compare<K>, t : Tree<K, V>) { | ||
| rec(null, compareOp(c), t, null) | ||
| }; | ||
|
|
||
| func rec<K, V>(lower : ?K, c : CompareOp<K>, t : Tree<K, V>, upper : ?K) { | ||
| switch t { | ||
| case (#data(d)) { data(lower, c, d, upper) }; | ||
| case (#index(i)) { index(lower, c, i, upper) }; | ||
| } | ||
| }; | ||
|
|
||
| func data<K, V>(lower : ?K, c : CompareOp<K>, d : Data<K, V>, upper : ?K) { | ||
| var prev_k : ?K = null; | ||
| for ((k, _) in d.vals()) { | ||
| assert (c.compare(prev_k, ?k) != #greater); | ||
| assert (c.compare(lower, ?k) != #greater); | ||
| assert (switch upper { case null { true }; case (?u) { c.compare(?k, ?u) != #greater } }); | ||
matthewhammer marked this conversation as resolved.
|
||
| prev_k := ?k; | ||
| } | ||
| }; | ||
|
|
||
| func index<K, V>(lower : ?K, c : CompareOp<K>, i : Index<K, V>, upper : ?K) { | ||
| assert (i.data.size() + 1 == i.trees.size()); | ||
|
||
| data(lower, c, i.data, upper); | ||
| for (j in i.trees.keys()) { | ||
| let lower_ = if (j == 0) { lower } else { ?(i.data[j - 1].0) }; | ||
| let upper_ = if (j == i.data.size()) { upper } else { ?(i.data[j]).0 }; | ||
| rec<K, V>(lower_, c, i.trees[j], upper_) | ||
| } | ||
| }; | ||
| }; | ||
|
|
||
| } | ||
Why is the max 4? The benchmarks at https://panthema.net/2007/stx-btree/stx-btree-0.8.3/doxygen-html/speedtest.html show that node sizes of 32-128 perform considerably better at large n.
I think it might therefore make sense to let the developer configure a larger maximum number of children (e.g. 4, 16, 32, 64, 128, 256). I'd be curious to run some performance tests inserting a running counter or a batch of elements (ordered or unordered) to see what difference this makes as the tree grows.
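One way a configurable limit could look is sketched below. This is not code from the PR: the `Config` type and the helpers `config`, `maxKeys`, and `minChildren` are hypothetical names, and the bounds follow the B-tree definition linked in the module's `Constants` comment (a node with k children holds k - 1 keys, and a non-root index node has at least half the maximum number of children, rounded up).

```motoko
// Hypothetical sketch, not part of the PR: node limits derived from a
// configurable maximum number of children per index node.
type Config = { maxChildren : Nat };

func config(maxChildren : Nat) : Config {
  // Assumption: reject degenerate orders below 3 (order 3 is a 2-3 tree).
  assert (maxChildren >= 3);
  { maxChildren = maxChildren }
};

// An index node with k children holds k - 1 keys.
func maxKeys(c : Config) : Nat { c.maxChildren - 1 };

// Non-root index nodes keep at least ceil(maxChildren / 2) children.
func minChildren(c : Config) : Nat { (c.maxChildren + 1) / 2 };

let small = config(4);
assert (maxKeys(small) == 3);
assert (minChildren(small) == 2);

let wide = config(64);
assert (maxKeys(wide) == 63);
assert (minChildren(wide) == 32);
```

Passing the limit as a value rather than a module-level constant would let a benchmark build trees of several widths in the same program.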
I want to write simple tests before adjusting this number into something that varies, which I agree is desirable.
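A simple test of this kind might hand-build a small tree and assert a few lookups. The sketch below is self-contained rather than importing the module: it restates the types from the diff and a lookup that walks separator keys with `keys()`. The `mo:base/...` import paths, the sample tree, and the expected values are assumptions for illustration.

```motoko
import Nat "mo:base/Nat";
import Order "mo:base/Order";

// Types restated from the diff so the sketch runs on its own.
type Compare<K> = { compare : (K, K) -> Order.Order };
type Data<K, V> = [(K, V)];
type Index<K, V> = { data : Data<K, V>; trees : [Tree<K, V>] };
type Tree<K, V> = { #index : Index<K, V>; #data : Data<K, V> };

// Lookup: scan a leaf linearly, or descend into the subtree chosen by
// the first separator key that is greater than the search key.
func find<K, V>(t : Tree<K, V>, k : K, c : Compare<K>) : ?V {
  switch t {
    case (#data(d)) {
      for ((k2, v) in d.vals()) {
        if (c.compare(k2, k) == #equal) { return ?v };
      };
      null
    };
    case (#index(i)) {
      for (j in i.data.keys()) {
        switch (c.compare(k, i.data[j].0)) {
          case (#equal) { return ?i.data[j].1 };
          case (#less) { return find<K, V>(i.trees[j], k, c) };
          case (#greater) {};
        }
      };
      // Greater than every separator: descend into the last subtree.
      find<K, V>(i.trees[i.data.size()], k, c)
    };
  }
};

let c : Compare<Nat> = { compare = Nat.compare };

// One index node with a single separator key and two leaves.
let t : Tree<Nat, Text> = #index({
  data = [(2, "b")];
  trees = [#data([(1, "a")]), #data([(3, "c")])];
});

assert (find<Nat, Text>(t, 1, c) == ?"a"); // left leaf
assert (find<Nat, Text>(t, 2, c) == ?"b"); // separator key
assert (find<Nat, Text>(t, 3, c) == ?"c"); // right leaf
assert (find<Nat, Text>(t, 4, c) == null); // absent key
```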