-
-
Notifications
You must be signed in to change notification settings - Fork 20
Description
Would it be desirable and feasible to support user-supplied key comparison? Basically an equivalent feature to comparators in RocksDB.
Byte-wise comparison is great for many use cases, particularly fixed-size keys and UTF-8 data. However, it leaves something to be desired when working with variable-length binary keys.
Example 1
Let's say my key type is (String, u8)
and we want to store two keys, ("a", 255)
and ("ab", 0)
. Under lexicographical ordering, ("a", 255)
comes before ("ab", 0)
. If we concatenate the two components into a Vec<u8>
, then we have the encodings
("a", 255) => 61 FF
("ab", 0) => 61 62 00
You can see how this flips the order of the two keys. I can work around this by using null-terminated strings, so the keys become
("a", 255) => 61 00 FF
("ab", 0) => 61 62 00 FF
which gives the correct ordering at the cost of converting my Rust strings to C strings and not supporting UTF-8 null.
Example 2
Let's say the key type is (Vec<u8>, u8)
. Here, we're in even more trouble, because we can't use the null terminator trick. Take (b"a", 255)
and (b"a\0b", 255)
as examples. The first key should come before the second, but there's no cheap way to ensure this. We have to re-encode the byte strings into another format where the null-terminator trick works, such as null-terminated hex strings.
Custom comparison
With custom comparisons, you can encode keys using any fast, zero-copy representation you want and still have precise control over ordering. In the examples above, this would look like
fn my_comparison_operator(left: &[u8], right: &[u8]) -> std::cmp::Ordering {
std::cmp::Ord::cmp(
&(&left[..left.len() - 1], left[left.len() - 1]),
&(&right[..right.len() - 1], right[right.len() - 1])
)
}