Skip to content

Add set operations to @immut/hash{map, set} and @internal/sparse_array Summary #2145

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

Asterless
Copy link

Related Issue

Fixes #1830: Efficient set operations for @immut/hash{map, set}

Changes

@immut/hashmap

  • union_with(f: (K, V, V) => V)
    Merges two hashmaps, resolving key conflicts with a custom function f.
  • intersection()
    Returns a new hashmap containing keys present in both input maps, with values from the first map.
  • intersection_with(f: (K, V, V) => V)
    Computes intersection, resolving overlapping keys' values with function f.
  • difference()
    Returns entries present in the first map but not in the second.

@immut/hashset

  • intersection()
    Returns a new set containing elements common to both input sets.
  • difference()
    Returns elements present in the first set but not in the second.

@internal/sparse_array

  • intersection()
    Computes index-wise intersection of two sparse arrays.
  • difference()
    Computes index-wise difference between two sparse arrays.

Motivation

These changes provide a more complete and consistent set of set operations for immutable collections, making it easier to perform common set algebra tasks and improving API parity across collection types.

Tests

Added and updated unit tests for all new and modified methods to ensure correctness and expected behavior.

Checklist

All new and existing tests pass
Code is formatted and documented where appropriate

Copy link

peter-jerry-ye-code-review bot commented May 21, 2025

Potential performance issue in hashset intersection implementation

Category
Correctness
Code Snippet
Branch(sa1), Branch(sa2)) => {
let res = sa1.intersection(sa2, fn(m1, m2) { m1.intersection(m2) })
if res.size() == 0 {
Empty
} else {
Branch(res)
}
Recommendation
Avoid creating intermediate Branch if result is empty:

Branch(sa1), Branch(sa2)) => {
  let res = sa1.intersection(sa2, fn(m1, m2) { m1.intersection(m2) })
  match res.size() {
    0 => Empty
    _ => Branch(res)
  }
}
**Reasoning**
The current implementation creates a Branch node and then checks if it's empty, potentially wasting memory. By checking size first, we can return Empty directly without allocating unnecessary Branch node.

</details>
<details>

<summary> Missing documentation for important parameter in union_with </summary>

**Category**
Maintainability
**Code Snippet**
pub fn[K : Eq + Hash, V] union_with(
  self : T[K, V],
  other : T[K, V],
  f : (K, V, V) -> V
) -> T[K, V]
**Recommendation**
Add detailed documentation for the union function parameter:
```moonbit
///|
/// Union two hashmaps with a function
/// @param f Function to resolve conflicts, receives (key, value1, value2) and returns merged value
/// @return New hashmap containing all keys with values combined using f for duplicates
pub fn[K : Eq + Hash, V] union_with(...)
**Reasoning**
The function parameter f is critical for understanding how value conflicts are resolved, but lacks documentation about its expected behavior and contract. Clear documentation helps users implement correct combining functions.

</details>
<details>

<summary> Inefficient handling of empty cases in difference operation </summary>

**Category**
Performance
**Code Snippet**
pub fn[K : Eq + Hash] difference(self : T[K], other : T[K]) -> T[K] {
  match (self, other) {
    (Empty, _) => Empty
    (_, Empty) => self
**Recommendation**
Add more specific pattern matching for efficiency:
```moonbit
match (self, other) {
  (Empty, _) => Empty
  (s, Empty) => s.clone()
  (Leaf(k), Leaf(k2)) when k == k2 => Empty
  (Leaf(k), Leaf(_)) => self
**Reasoning**
Adding more specific pattern matching for common cases like identical single elements can avoid unnecessary iteration and improve performance for simple cases. The clone() is explicit about creating a new copy.

</details>

@coveralls
Copy link
Collaborator

coveralls commented May 21, 2025

Pull Request Test Coverage Report for Build 6913

Details

  • 52 of 103 (50.49%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.4%) to 92.01%

Changes Missing Coverage Covered Lines Changed/Added Lines %
immut/internal/sparse_array/sparse_array.mbt 15 16 93.75%
immut/hashset/HAMT.mbt 7 26 26.92%
immut/hashmap/HAMT.mbt 30 61 49.18%
Totals Coverage Status
Change from base Build 6912: -0.4%
Covered Lines: 8740
Relevant Lines: 9499

💛 - Coveralls

@Asterless Asterless marked this pull request as ready for review May 21, 2025 21:39
@bobzhang bobzhang force-pushed the feature/20250522_HAMT branch from 44c0a79 to eb6ef82 Compare May 22, 2025 01:12
@peter-jerry-ye peter-jerry-ye requested a review from Lampese May 22, 2025 02:46
Asterless and others added 2 commits May 22, 2025 11:10
Add set operations to @immut/hash{map, set} and @internal/sparse_array Summary and re-fmt
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Efficient set operations for @immut/hash{map, set}
3 participants