Skip to content

add get_by_view method to Map #2082

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

Guest0x0
Copy link
Contributor

This PR adds a new method get_by_view to Map. The motivation is the need for querying a Map[String, V] with @string.View. Converting the view to String first requires a copy, storing @string.View directly in the map may result in memory leak. So a new API is necessary for this case.

The signature of the new API is as follows:

/// Get the value associated with the view of a key.
/// When searching the map, instead of comparing keys directly,
/// a view of keys, obtained by the parameter `view`, is used.
/// This is useful for e.g. querying a `Map` of `String` with `@string.View`.
///
/// The `view` function and the `Hash` implementation of `View` should satisfy:
/// ```
/// view(key).hash() == key.hash() 
/// ```
fn get_by_view[K, View : Hash + Eq, Value](
  self : Self[K, Value],
  key : View,
  view~ : (K) -> View
) -> Value?

The map will be searched with the hash code of key. When comparing equality, existing keys in the map will be first converted to View using view.

Note that for the @string.View case, a specialized method may be adequate. But there are other cases such as @bytes.View. So a generic implementation is used here. The view function will only be called when comparing keys with identical hash code, so it will not be invoked a lot. Therefore even if get_by_view is not inlined, the indirect call cost of view should be acceptable (since using get_by_view implies that the key is probably a large type with non-trivial comparison, such as String)

@Guest0x0 Guest0x0 requested a review from bobzhang May 12, 2025 03:12
Copy link

peter-jerry-ye-code-review bot commented May 12, 2025

Potential hash collision handling issue in get_by_view

Category
Correctness
Code Snippet
if entry.hash == hash && view(entry.key) == key {
break Some(entry.value)
}
Recommendation
Consider adding additional collision resolution logic similar to the insert method, in case multiple keys hash to the same value but have different views
Reasoning
The current implementation only checks if the hash matches and the view matches, but doesn't handle the case where different keys might produce the same hash and view. This could lead to incorrect results in rare cases.

Test coverage could be improved

Category
Maintainability
Code Snippet
test "map::get_by_view::string_view" {
let map = { "a": 1, "b": 2, "c": 3 }
Recommendation
Add test cases for:

  1. Empty string views
  2. Hash collision cases
  3. Other view types (e.g. bytes.View)
  4. Large strings to verify performance claims
    Reasoning
    The current test only covers basic string view cases. Given the generic nature of the implementation and the performance claims in the PR description, more comprehensive testing would increase confidence in the implementation.
view function called on each comparison during probing

Category
Performance
Code Snippet
if entry.hash == hash && view(entry.key) == key {
Recommendation
Consider caching the view of the key if the probe sequence is expected to be long, or add a comment explaining why this is not necessary
Reasoning
While the PR description mentions that view conversion won't be called often due to hash filtering, in a heavily loaded map with many collisions, the view function could be called multiple times during probing. A comment or optimization might be warranted.

@coveralls
Copy link
Collaborator

coveralls commented May 12, 2025

Pull Request Test Coverage Report for Build 6660

Details

  • 4 of 5 (80.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.009%) to 92.442%

Changes Missing Coverage Covered Lines Changed/Added Lines %
builtin/linked_hash_map.mbt 4 5 80.0%
Totals Coverage Status
Change from base Build 6659: -0.009%
Covered Lines: 6140
Relevant Lines: 6642

💛 - Coveralls

@Guest0x0 Guest0x0 force-pushed the Guest0x0/add-map-get-by-view branch 2 times, most recently from 8b22d17 to 0a0400a Compare May 12, 2025 03:31
@hackwaly
Copy link
Contributor

This is one of the case which show generic params of traits is useful.

@bobzhang bobzhang force-pushed the Guest0x0/add-map-get-by-view branch from 0a0400a to c0a033e Compare May 13, 2025 01:22
@bobzhang
Copy link
Contributor

this may be somthing easier to understand :

get_by_hashcode[K, View: Eq, V](self : Self[K,V], hashcode : Int , v : View, f~ : (K) -> View) -> Value? 

Then we can build something safe on top of it, is it a premature optimization?

@Guest0x0
Copy link
Contributor Author

this may be somthing easier to understand :

get_by_hashcode[K, View: Eq, V](self : Self[K,V], hashcode : Int , v : View, f~ : (K) -> View) -> Value? 

Then we can build something safe on top of it, is it a premature optimization?

Is this a user-facing API, or just an internal helper? In terms of implementation, get_by_hashcode seems to be just get_by_view here with the hash code computation lifted out, is there any use of get_by_hashcode except for get_by_view?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants