hsel improvements #3
1. Made the 'options' map global, since clearing it costs less than initializing it.
2. Represented the first k spots of the virtual [0,n-1]-array by the 'result' array and only the rest by the 'options' hashmap. This lets us replace one of the two hashmap lookups with an array lookup, which is much faster.
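For readers following along, change 2 can be sketched roughly as below. This is a minimal illustration of the idea (a partial Fisher-Yates shuffle over a virtual array), not the PR's actual code; the function name `sample_k`, the 64-bit types, and the use of `std::mt19937_64` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <random>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper, not the PR's code: draws k distinct values from [0, n).
std::vector<std::uint64_t> sample_k(std::uint64_t n, std::uint64_t k,
                                    std::mt19937_64& rng) {
    assert(k <= n);
    // Slots 0..k-1 of the virtual array live directly in `result`.
    std::vector<std::uint64_t> result(k);
    std::iota(result.begin(), result.end(), std::uint64_t{0});
    // Slots k..n-1 are only materialised once they have been touched.
    std::unordered_map<std::uint64_t, std::uint64_t> options;
    for (std::uint64_t i = 0; i < k; ++i) {
        std::uniform_int_distribution<std::uint64_t> dist(i, n - 1);
        const std::uint64_t j = dist(rng);
        if (j < k) {
            std::swap(result[i], result[j]);  // plain array access
        } else {
            // Single hashmap lookup: an untouched slot j still holds j.
            auto it = options.emplace(j, j).first;
            std::swap(result[i], it->second);
        }
    }
    return result;
}
```

Because j is always at least i, slot i is final once iteration i is done, so the first k slots never need to go through the hashmap.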
I don't think I can accept change 1 because that makes the code not thread safe. I'm trying to understand your change 2 now, but I notice that my current code can't possibly be correct, since the parameter to
OK I understand your change and I think it's good. It would improve readability for me if in each "if/else", you make the "if" branch shorter than the "else" branch where possible, by inverting the boolean if necessary. A couple of other coding style things for this repo:
There's a Thank you!
Thanks for considering my PR :) The standard suggestion to make it thread-safe would be to wrap it in a functor class (with What do you think about that?
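To make the functor suggestion concrete, here is one possible shape for it. This is only a sketch under assumptions: the class name `HselSampler`, its constructor, and the choice to own the random engine are all hypothetical and not taken from the PR. The point is that the map becomes a member that is cleared and reused, so each thread can own its own instance instead of sharing a global.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <random>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical functor: name and interface are illustrative only.
class HselSampler {
public:
    explicit HselSampler(std::uint64_t seed) : rng_(seed) {}

    // Draws k distinct values from [0, n).
    std::vector<std::uint64_t> operator()(std::uint64_t n, std::uint64_t k) {
        assert(k <= n);
        options_.clear();  // reuse the map instead of constructing a new one
        std::vector<std::uint64_t> result(k);
        std::iota(result.begin(), result.end(), std::uint64_t{0});
        for (std::uint64_t i = 0; i < k; ++i) {
            std::uniform_int_distribution<std::uint64_t> dist(i, n - 1);
            const std::uint64_t j = dist(rng_);
            if (j < k) {
                std::swap(result[i], result[j]);
            } else {
                auto it = options_.emplace(j, j).first;
                std::swap(result[i], it->second);
            }
        }
        return result;
    }

private:
    std::mt19937_64 rng_;                                       // per-instance state
    std::unordered_map<std::uint64_t, std::uint64_t> options_;  // reused across calls
};
```

A single instance is still not safe to share between threads; the idea is one instance per thread (for example as a `thread_local` or a local variable in each worker).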
Weird - how come the allocation is so expensive compared to everything else?
Hmm, I can only speculate about the internals of Now that I think about it, EDIT: wait, I just saw that the hashmap was already constructed with 2k buckets, but it's faster with 1.45k buckets for some reason oO I'll have to investigate a little more...
But it's not only the initial bucket count that influences performance. I've modified the non-global hsel to store the last bucket count to initialize the next iteration with, in order to match the buckets in the global variant. This "fixes" the performance issue for k<130, but there is another dent in the curve at k=130 that I cannot explain.
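The "store the last bucket count" experiment might look something like the following. This is a guess at the general approach, not the code that was benchmarked: the function name `sample_k_reuse_buckets` and the `last_buckets` out-parameter are assumptions. It relies only on the standard `std::unordered_map` constructor that takes a minimum bucket count and on `bucket_count()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical variant: the caller keeps `last_buckets` between calls.
std::vector<std::uint64_t> sample_k_reuse_buckets(std::uint64_t n,
                                                  std::uint64_t k,
                                                  std::mt19937_64& rng,
                                                  std::size_t& last_buckets) {
    assert(k <= n);
    std::vector<std::uint64_t> result(k);
    std::iota(result.begin(), result.end(), std::uint64_t{0});
    // Start with at least as many buckets as the previous call ended with.
    std::unordered_map<std::uint64_t, std::uint64_t> options(last_buckets);
    for (std::uint64_t i = 0; i < k; ++i) {
        std::uniform_int_distribution<std::uint64_t> dist(i, n - 1);
        const std::uint64_t j = dist(rng);
        if (j < k) {
            std::swap(result[i], result[j]);
        } else {
            auto it = options.emplace(j, j).first;
            std::swap(result[i], it->second);
        }
    }
    last_buckets = options.bucket_count();  // remember for the next call
    return result;
}
```

This keeps the function free of global state while still matching the bucket count a long-lived map would have settled on.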
(here "#buck" is the average bucket count after an iteration).
Oh and I wrote a vector-based hash-set that can be abused as a hash-map, and hsel with that one blows everything out of the water (around 50% faster than hsel with global-state unordered_map):
("VH" for "vector hash")
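The vector hash itself is not shown in this thread, so the following is only a guess at what such a structure could look like: a fixed-capacity open-addressing table with linear probing, stored in two `std::vector`s. The names `VectorHash`, `get_or_insert`, `kEmptyKey`, and `sample_k_vh`, the sentinel value, and the multiplicative hash are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Sentinel marking an unused slot; this key must never be inserted.
constexpr std::uint64_t kEmptyKey = ~std::uint64_t{0};

// Hypothetical fixed-capacity table: holds at most `max_entries` keys.
class VectorHash {
public:
    explicit VectorHash(std::size_t max_entries) {
        std::size_t cap = 4;
        while (cap < 2 * max_entries) cap *= 2;  // power of two, load <= 0.5
        keys_.assign(cap, kEmptyKey);
        vals_.resize(cap);
        mask_ = cap - 1;
    }

    // Returns the value stored for `key`, inserting `init` first if absent.
    std::uint64_t& get_or_insert(std::uint64_t key, std::uint64_t init) {
        assert(key != kEmptyKey);
        std::size_t slot = hash(key) & mask_;
        while (keys_[slot] != kEmptyKey && keys_[slot] != key)
            slot = (slot + 1) & mask_;  // linear probing
        if (keys_[slot] == kEmptyKey) {
            keys_[slot] = key;
            vals_[slot] = init;
        }
        return vals_[slot];
    }

    void clear() { std::fill(keys_.begin(), keys_.end(), kEmptyKey); }

private:
    static std::size_t hash(std::uint64_t x) {
        // Multiplicative hashing; keep the better-mixed high bits.
        return static_cast<std::size_t>((x * 0x9E3779B97F4A7C15ULL) >> 32);
    }
    std::vector<std::uint64_t> keys_;
    std::vector<std::uint64_t> vals_;
    std::size_t mask_ = 0;
};

// Same selection loop as before, with VectorHash in place of unordered_map.
std::vector<std::uint64_t> sample_k_vh(std::uint64_t n, std::uint64_t k,
                                       std::mt19937_64& rng) {
    assert(k <= n);
    std::vector<std::uint64_t> result(k);
    std::iota(result.begin(), result.end(), std::uint64_t{0});
    VectorHash options(static_cast<std::size_t>(k));  // at most k inserts
    for (std::uint64_t i = 0; i < k; ++i) {
        std::uniform_int_distribution<std::uint64_t> dist(i, n - 1);
        const std::uint64_t j = dist(rng);
        if (j < k) std::swap(result[i], result[j]);
        else std::swap(result[i], options.get_or_insert(j, j));
    }
    return result;
}
```

Since hsel inserts at most k keys per call, the table can be sized once and never needs to rehash, which is one plausible reason a structure like this could beat a general-purpose `std::unordered_map`.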
This PR improves random_hsel by using the result array in the lower (size-k) part of the virtual [0,n-1]-array. This makes random_hsel run faster than cardchoose for k > 60 on my machine.