Reuse-optimized variant of union_with_key?

Maps have a `union_with_key` method, like `union` but with a callback to decide what to do when a key exists on both sides:

```rust
    pub fn union_with_key<F>(self, other: Self, mut f: F) -> Self
    where
        F: FnMut(&K, V, V) -> V
```

Both `HashMap` and `OrdMap` are internally immutable trees with sharable copy-on-write nodes. `Rc::make_mut` is used to mutate a node, cloning it if it was previously shared. When many possibly-large maps exist that are created by cloning a previous map and making a possibly-small number of changes, this sharing can be very significant and minimizing the number of `make_mut` calls can be impactful.
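For readers less familiar with `Rc::make_mut`, here is a minimal standalone sketch of the copy-on-write behaviour described above. It uses a plain `Rc<Vec<u32>>` as a stand-in for an internal tree node, and `push_and_report_clone` is a hypothetical helper name, not anything in the `im` crate:

```rust
use std::rc::Rc;

/// Pushes `value` into the node and reports whether `make_mut` had to
/// allocate a fresh copy, i.e. whether the node was shared before the call.
/// (`Rc<Vec<u32>>` is only a stand-in for a real tree node.)
fn push_and_report_clone(node: &mut Rc<Vec<u32>>, value: u32) -> bool {
    let before = Rc::as_ptr(node);
    // Mutates in place if `node` is the only owner, otherwise clones first.
    Rc::make_mut(node).push(value);
    before != Rc::as_ptr(node)
}
```

Calling this on a uniquely owned node mutates in place. Calling it after an `Rc::clone` of the same node copies the whole node and leaves the other owner untouched, which is the cost this issue is trying to avoid.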

Would you be open to adding a new API for this?

## Stage 1

The implementation of `union_with_key` consumes and iterates over one of the input maps, while it mutates and then returns the other. However, because the callback's signature takes owned `V` values, it always needs to call `remove` then `insert`. This leads to more `make_mut` calls than necessary when the value that was already in the map being mutated ends up being used.

Adding a new method where the callback takes borrowed values would help in that case:

```rust
enum MergeResult<V> {
    UseLeftValue,
    UseRightValue,
    UseNewValue(V),
}
```
```rust
    pub fn union_with_merge<F>(mut self, other: Self, mut merge: F) -> Self
    where
        F: FnMut(&K, /* left: */ &V, /* right: */ &V) -> MergeResult<V>
    {
        for (key, right_value) in other {
            match self.get(&key) {  // get() does not use make_mut() where remove() does
                None => {
                    self.insert(key, right_value);
                }
                Some(left_value) => {
                    match merge(&key, left_value, &right_value) {
                        MergeResult::UseLeftValue => {}, // No insert() here
                        MergeResult::UseRightValue => {
                            self.insert(key, right_value);
                        },
                        MergeResult::UseNewValue(new_value) => {
                            self.insert(key, new_value);
                        },
                    }
                }
            }
        }
        self
    }
```

(Omitted: swapping the two maps to mutate the larger one: https://github.com/bodil/im-rs/pull/163. That swap is where `MergeResult::UseRightValue` becomes useful.)
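To make the callback contract concrete, here is a self-contained sketch of the same logic written as a free function over `std::collections::BTreeMap`. This is only an illustration: `std`'s map has no structural sharing, so it shows the semantics of `MergeResult` and not the `make_mut` savings, and the free-function form is my assumption rather than a proposed signature.

```rust
use std::collections::BTreeMap;

/// Outcome of merging two values stored under the same key.
pub enum MergeResult<V> {
    UseLeftValue,
    UseRightValue,
    UseNewValue(V),
}

/// Stand-in for the proposed method, written against `std`'s `BTreeMap`
/// purely to show how the callback's result is interpreted.
pub fn union_with_merge<K, V, F>(
    mut left: BTreeMap<K, V>,
    right: BTreeMap<K, V>,
    mut merge: F,
) -> BTreeMap<K, V>
where
    K: Ord,
    F: FnMut(&K, &V, &V) -> MergeResult<V>,
{
    for (key, right_value) in right {
        // Decide first, using only shared borrows of `left`.
        let decision = match left.get(&key) {
            None => MergeResult::UseRightValue,
            Some(left_value) => merge(&key, left_value, &right_value),
        };
        match decision {
            // `left` is not written to at all in this case.
            MergeResult::UseLeftValue => {}
            MergeResult::UseRightValue => {
                left.insert(key, right_value);
            }
            MergeResult::UseNewValue(new_value) => {
                left.insert(key, new_value);
            }
        }
    }
    left
}
```

A "keep the larger value" merge, for example, would return `UseLeftValue` whenever the left value is at least as large, so keys where the left side wins never trigger a write.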

## Stage 2

This is a bit fuzzy: more of an intuition that something might be possible than a fully formed plan.

Writing this issue is motivated by an effort to optimize an algorithm in Mercurial (copy tracing). Currently that algorithm does the equivalent of Stage 1 above when one of the two maps to merge is much (2×) larger than the other. When they are of similar size it goes even further to maximize reuse: call [`OrdMap::diff`](https://docs.rs/im-rc/15.0.0/im_rc/struct.OrdMap.html#method.diff) and collect, for each of the two maps, the key-value pairs that we’d need to insert to transform it into the merged map. Only then do we pick the side with the smaller set of changes.
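Roughly, the "collect both change sets, then apply the smaller one" idea looks like the sketch below. This is not the actual Mercurial code: it uses `std::collections::BTreeMap` and an explicit walk in place of `OrdMap::diff`, the function name is made up, and it assumes a callback that returns an owned merged value.

```rust
use std::collections::BTreeMap;

/// Computes the inserts needed to turn each side into the merged map,
/// then applies whichever set is smaller. Hypothetical helper for illustration.
pub fn merge_by_smaller_change_set<K, V, F>(
    left: BTreeMap<K, V>,
    right: BTreeMap<K, V>,
    mut merge: F,
) -> BTreeMap<K, V>
where
    K: Ord + Clone,
    V: Clone + PartialEq,
    F: FnMut(&K, &V, &V) -> V,
{
    let mut to_left = Vec::new(); // inserts that turn `left` into the merged map
    let mut to_right = Vec::new(); // inserts that turn `right` into the merged map
    for (key, l) in &left {
        match right.get(key) {
            // Key only on the left: `right` would need it inserted.
            None => to_right.push((key.clone(), l.clone())),
            // Same value on both sides: nothing to do for either map.
            Some(r) if r == l => {}
            Some(r) => {
                let merged = merge(key, l, r);
                if merged != *l {
                    to_left.push((key.clone(), merged.clone()));
                }
                if merged != *r {
                    to_right.push((key.clone(), merged));
                }
            }
        }
    }
    for (key, r) in &right {
        // Key only on the right: `left` would need it inserted.
        if !left.contains_key(key) {
            to_left.push((key.clone(), r.clone()));
        }
    }
    // Mutate the side that needs fewer writes.
    let (mut target, changes) = if to_left.len() <= to_right.len() {
        (left, to_left)
    } else {
        (right, to_right)
    };
    target.extend(changes);
    target
}
```

With persistent maps, fewer inserts on the chosen side means fewer nodes copied by `make_mut`, so more of the result stays shared with its ancestor.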

Since https://github.com/bodil/im-rs/pull/113, `diff` short-circuits shared internal subtrees based on `ptr_eq`. So more node sharing makes `diff` faster and avoids redundant merge computations, which in turn increases sharing in a virtuous cycle.
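As a toy illustration of why `ptr_eq` short-circuiting pays off, here is a sketch on a made-up persistent binary tree. The `Tree` type and `count_diffs` are assumptions for illustration only and bear no relation to the crate's real B-tree or HAMT nodes:

```rust
use std::rc::Rc;

/// Toy persistent binary tree; a stand-in for real internal nodes.
enum Tree {
    Leaf(u32),
    Branch(Rc<Tree>, Rc<Tree>),
}

/// Counts differing leaves between two trees, and records in `visited`
/// how many node pairs were actually examined.
fn count_diffs(a: &Rc<Tree>, b: &Rc<Tree>, visited: &mut usize) -> usize {
    if Rc::ptr_eq(a, b) {
        // Shared subtree: identical by construction, so skip it entirely.
        return 0;
    }
    *visited += 1;
    match (&**a, &**b) {
        (Tree::Leaf(x), Tree::Leaf(y)) => usize::from(x != y),
        (Tree::Branch(al, ar), Tree::Branch(bl, br)) => {
            count_diffs(al, bl, visited) + count_diffs(ar, br, visited)
        }
        // Shapes differ; this toy version just counts it as one difference.
        _ => 1,
    }
}
```

When two versions share most of their nodes, only the path from the root down to each changed leaf is visited, so the work is proportional to the amount of change and not to the size of the maps.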

Still, collecting two sets of changes before applying one of them feels like more work than necessary.

A `union_with_merge` method in the `im` crate could potentially walk internal trees directly instead of using `ConsumingIter`, merge one sub-tree at a time, short-circuit shared sub-trees based on `ptr_eq`, and… *handwave*… have some better algorithm than the one based on `OrdMap::diff`.

Walking internal trees while mutating them to insert merged values and maintaining all invariants would likely be tricky (`diff` is comparatively easier since it is read-only). But **does this sound possible at all?**
