@xu3kev commented Jan 11, 2023

Each hashtable is only needed until its union-find step has run, after which it can be discarded. We can therefore rearrange the for loop to construct the B hashtables one at a time instead of all at once. (Note that each hashtable consumes a large amount of memory because it holds a (document idx, hash) entry for the entire dataset.) This can reduce the peak memory used by the hashtables by up to a factor of B.
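To illustrate the idea, here is a minimal sketch of the rearranged loop. It is not the code from this PR: `UnionFind`, `cluster_by_bands`, and the `band_hashes` input layout are hypothetical names chosen for the example, and it assumes the per-band hashes have already been computed for every document.

```python
from collections import defaultdict


class UnionFind:
    """Minimal union-find over document indices 0..n-1 (hypothetical helper)."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Keep the smaller index as the root so results are deterministic.
            self.parent[max(ra, rb)] = min(ra, rb)


def cluster_by_bands(band_hashes, num_bands):
    """Assumed layout: band_hashes[doc_idx][b] is the hash of band b for that document."""
    uf = UnionFind(len(band_hashes))
    for b in range(num_bands):
        # Build the hashtable for band b only.
        table = defaultdict(list)
        for doc_idx, hashes in enumerate(band_hashes):
            table[hashes[b]].append(doc_idx)
        # Union-find step: documents sharing a bucket join the same cluster.
        for docs in table.values():
            for other in docs[1:]:
                uf.union(docs[0], other)
        # Discard this table before building the next one, so at most
        # one of the B hashtables is alive at any time.
        del table
    return [uf.find(i) for i in range(len(band_hashes))]


# Docs 0 and 1 collide in band 0, docs 1 and 2 collide in band 1, doc 3 is alone.
example = [("a", "x"), ("a", "y"), ("c", "y"), ("d", "z")]
labels = cluster_by_bands(example, num_bands=2)
```

Only the union-find structure, which stores one integer per document, persists across iterations. In this sketch the input `band_hashes` is still held in memory, so the saving applies to the hashtables themselves.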

@xu3kev changed the title from "construct the hashtables iteratively to save memory by B times" to "construct the hashtables iteratively to save memory up to B times" on Jan 11, 2023