hsel improvements #3


Open · wants to merge 4 commits into master
Conversation

igel-kun

This PR improves random_hsel by

  1. making the hash-map global, to avoid paying for its construction and destruction on every call
  2. representing the lower (size-k) part of the virtual [0,n-1] array with the result array itself, and using the hash-map only for the rest

This makes random_hsel run faster than cardchoose for k > 60 on my machine.

1. 'options'-map made global since clearing-cost < initialization-cost
2. represented the first k spots in the virtual [0,n-1] array by the 'result' array, and only the rest with the 'options' hash-map. This lets us replace one of the two hash-map lookups with a (much faster) array lookup
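For reference, the combined effect of the two commits can be sketched as a standalone approximation like the following. Names and types are not taken from the repo: `std::mt19937_64` stands in for the repo's RNG, and the uniform distribution stands in for `randbelow`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>
#include <unordered_map>
#include <vector>

// Partial Fisher–Yates shuffle over a virtual array v[0..n-1] with v[p] == p
// initially. Positions 0..k-1 are backed directly by the result vector;
// only displaced positions >= k live in the hash-map.
std::vector<uint64_t> random_hsel(uint64_t n, uint64_t k, std::mt19937_64& rng) {
    std::vector<uint64_t> result(k);
    std::unordered_map<uint64_t, uint64_t> options;  // displaced values at positions >= k
    for (uint64_t i = 0; i < k; ++i) result[i] = i;  // identity for the low part
    for (uint64_t i = 0; i < k; ++i) {
        uint64_t j = std::uniform_int_distribution<uint64_t>(i, n - 1)(rng);
        if (j < k) {
            std::swap(result[i], result[j]);          // array lookup, no hashing
        } else {
            auto it = options.find(j);
            uint64_t vj = (it == options.end()) ? j : it->second;
            options[j] = result[i];                   // v[j] takes the old v[i]
            result[i] = vj;                           // v[i] takes the old v[j]
        }
    }
    return result;
}
```

The point of the low/high split: a random j lands below k often enough that the cheap array branch replaces one of the two hash-map accesses the all-hash-map version would need.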
@ciphergoth
Owner

I don't think I can accept change 1 because that makes the code not thread safe. I'm trying to understand your change 2 now, but I notice that my current code can't possibly be correct, since the parameter to randbelow is n every time.

@ciphergoth
Owner

OK I understand your change and I think it's good. It would improve readability for me if in each "if/else", you make the "if" branch shorter than the "else" branch where possible, by inverting the boolean if necessary.

A couple of other coding style things for this repo:

  • space after if
  • always use braces, even for one-liners

There's a .clang-format file in the cpp directory that should be applied.

Thank you!

@igel-kun
Author

Thanks for considering my PR :)
I've now implemented the style changes. However, the avoidance of the construction and destruction of the hash-map is the main part of the change. It's not going to beat cardchoose for reasonable numbers of k and n if we insist on measuring memory allocations instead of productive work.

The standard suggestion for making it thread-safe would be to wrap it in a functor class (with operator()) that keeps around its own hash-map and reuses it. This would, however, entail a few changes to the test environment: a functor object would have to be created at the beginning of a test run and passed instead of the function pointer.
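A rough sketch of that functor idea (all names here are hypothetical, and `std::mt19937_64` again stands in for the repo's RNG):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>
#include <unordered_map>
#include <vector>

// The hash-map is a member, so each thread can own its own HselSampler and
// the map's buckets are reused across calls; only clear() is paid per call.
class HselSampler {
    std::unordered_map<uint64_t, uint64_t> options;  // displaced positions >= k
    std::mt19937_64 rng;
public:
    explicit HselSampler(uint64_t seed) : rng(seed) {}

    std::vector<uint64_t> operator()(uint64_t n, uint64_t k) {
        options.clear();  // cheaper than construct + destruct on every call
        std::vector<uint64_t> result(k);
        for (uint64_t i = 0; i < k; ++i) result[i] = i;
        for (uint64_t i = 0; i < k; ++i) {
            uint64_t j = std::uniform_int_distribution<uint64_t>(i, n - 1)(rng);
            if (j < k) {
                std::swap(result[i], result[j]);
            } else {
                auto it = options.find(j);
                uint64_t vj = (it == options.end()) ? j : it->second;
                options[j] = result[i];
                result[i] = vj;
            }
        }
        return result;
    }
};
```

The test harness would then construct one sampler per thread and invoke it like a function.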

What do you think about that?

@ciphergoth
Owner

Weird - how come the allocation is so expensive compared to everything else?

@igel-kun
Author

igel-kun commented Jun 11, 2024

Hmm, I can only speculate about the internals of std::unordered_map, but I suppose it's re-hashing a lot at the lower sizes, since it can't know how many items are still to come and it wants to keep its load factor within a constant throughout. Each time a re-hash takes place, all buckets are re-distributed, causing a ton of re-allocations.

Now that I think about it, unordered_map does have a reserve() method that works similarly to vector::reserve() and a quick test yields promising results, even if we don't reuse the unordered_map. I'll update the PR :)
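The effect of reserve() can be checked with a small standalone experiment against plain std::unordered_map (the function name here is made up for illustration):

```cpp
#include <cstddef>
#include <unordered_map>

// Counts bucket-table changes while inserting n keys. reserve(n) sizes the
// bucket table for at least n elements up front, so with it the inserts
// below never trigger a rehash; without it, the table grows repeatedly.
std::size_t rehashes_during_insert(std::size_t n, bool reserve_first) {
    std::unordered_map<int, int> m;
    if (reserve_first) m.reserve(n);
    std::size_t rehashes = 0;
    std::size_t buckets = m.bucket_count();
    for (std::size_t i = 0; i < n; ++i) {
        m[static_cast<int>(i)] = 0;
        if (m.bucket_count() != buckets) {  // bucket count changed => rehash happened
            ++rehashes;
            buckets = m.bucket_count();
        }
    }
    return rehashes;
}
```

reserve(n) is specified as rehash(ceil(n / max_load_factor)), so after it the load factor cannot exceed its maximum within the first n inserts.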

EDIT: wait, I just saw that the hashmap was already constructed with 2k buckets, but it's faster with 1.45k buckets for some reason oO I'll have to investigate a little more...
EDIT: these are the current test results for cardchoose ("card") vs. hsel with per-call construction using 1.45k buckets ("fiya") vs. hsel with the "global" hash-map that is clear()ed between calls ("fyGL"). The latter two are pretty much on par with each other until k=90 (implying 90·1.45 ≈ 130 buckets). I have no idea yet why...

k = 30	card:	10.9ms	fiya:	13.4ms	fyGL:	12.2ms
k = 35	card:	13.2ms	fiya:	15.1ms	fyGL:	14.4ms
k = 40	card:	15.5ms	fiya:	17.1ms	fyGL:	16.8ms
k = 45	card:	18.1ms	fiya:	19.3ms	fyGL:	19.3ms
k = 50	card:	20.5ms	fiya:	21.7ms	fyGL:	21.8ms
k = 55	card:	23.0ms	fiya:	23.2ms	fyGL:	24.4ms
k = 60	card:	25.5ms	fiya:	25.7ms	fyGL:	23.9ms
k = 65	card:	28.0ms	fiya:	27.3ms	fyGL:	25.9ms
k = 70	card:	30.6ms	fiya:	30.3ms	fyGL:	28.8ms
k = 75	card:	34.0ms	fiya:	31.7ms	fyGL:	30.5ms
k = 80	card:	35.9ms	fiya:	33.1ms	fyGL:	32.7ms
k = 85	card:	38.4ms	fiya:	35.9ms	fyGL:	35.2ms
k = 90	card:	40.9ms	fiya:	58.5ms	fyGL:	37.6ms
k = 95	card:	43.7ms	fiya:	61.9ms	fyGL:	40.0ms

But it's not only the initial bucket count that influences performance. I've modified the non-global hsel to store the last bucket count to initialize the next iteration with, in order to match the buckets in the global variant. This "fixes" the performance issue for k<130, but there is another dent in the curve at k=130 that I cannot explain.

k = 100	fiya:	#buck: 127	43.2ms	fyGL:	#buck: 127	42.3ms
k = 110	fiya:	#buck: 127	48.0ms	fyGL:	#buck: 127	47.4ms
k = 120	fiya:	#buck: 127	53.1ms	fyGL:	#buck: 127	52.7ms
k = 130	fiya:	#buck: 256	77.5ms	fyGL:	#buck: 257	51.3ms
k = 140	fiya:	#buck: 257	86.0ms	fyGL:	#buck: 257	55.8ms
k = 150	fiya:	#buck: 257	92.1ms	fyGL:	#buck: 257	60.2ms
[...]
k = 250	fiya:	#buck: 257	161.1ms	fyGL:	#buck: 257	109.6ms
k = 260	fiya:	#buck: 418	158.2ms	fyGL:	#buck: 383	108.6ms
k = 270	fiya:	#buck: 540	156.3ms	fyGL:	#buck: 540	105.0ms
k = 280	fiya:	#buck: 541	162.2ms	fyGL:	#buck: 541	108.9ms
k = 290	fiya:	#buck: 541	168.4ms	fyGL:	#buck: 541	114.1ms
k = 300	fiya:	#buck: 541	174.4ms	fyGL:	#buck: 541	118.4ms

(Here "#buck" is the average bucket count after an iteration.)
Also, I cannot explain why the average bucket counts for k=260 differ oO

@igel-kun
Author

igel-kun commented Jun 11, 2024

Oh, and I wrote a vector-based hash-set that can be abused as a hash-map, and hsel with that one blows everything out of the water (around 50% faster than hsel with the global-state unordered_map):

k = 100 card:	47.4ms fiya:	43.1ms fyGL:	42.8ms fyVH:	27.9ms
k = 110 card:	52.1ms fiya:	48.0ms fyGL:	47.9ms fyVH:	31.9ms
k = 120 card:	57.6ms fiya:	53.2ms fyGL:	52.8ms fyVH:	36.2ms
k = 130 card:	63.1ms fiya:	78.5ms fyGL:	51.8ms fyVH:	31.8ms
k = 140 card:	68.7ms fiya:	84.8ms fyGL:	56.9ms fyVH:	34.8ms
k = 150 card:	74.5ms fiya:	92.9ms fyGL:	61.7ms fyVH:	37.4ms
k = 160 card:	80.0ms fiya:	98.7ms fyGL:	65.9ms fyVH:	40.6ms
k = 170 card:	85.6ms fiya:	107.7ms fyGL:	72.1ms fyVH:	44.4ms
k = 180 card:	92.1ms fiya:	113.6ms fyGL:	74.3ms fyVH:	47.5ms
k = 190 card:	97.8ms fiya:	122.0ms fyGL:	79.8ms fyVH:	51.1ms
k = 200 card:	103.0ms fiya:	127.6ms fyGL:	84.3ms fyVH:	54.8ms
k = 210 card:	108.8ms fiya:	134.6ms fyGL:	89.2ms fyVH:	58.8ms
k = 220 card:	114.5ms fiya:	141.6ms fyGL:	94.5ms fyVH:	62.8ms
k = 230 card:	120.7ms fiya:	148.5ms fyGL:	99.6ms fyVH:	66.8ms
k = 240 card:	126.3ms fiya:	155.1ms fyGL:	104.9ms fyVH:	71.2ms
k = 250 card:	136.2ms fiya:	162.1ms fyGL:	109.7ms fyVH:	76.2ms
k = 260 card:	138.4ms fiya:	169.6ms fyGL:	115.5ms fyVH:	62.4ms
k = 270 card:	143.9ms fiya:	158.5ms fyGL:	105.9ms fyVH:	65.2ms
k = 280 card:	150.3ms fiya:	165.4ms fyGL:	110.2ms fyVH:	68.3ms
k = 290 card:	156.3ms fiya:	171.0ms fyGL:	114.2ms fyVH:	74.5ms
k = 300 card:	163.6ms fiya:	178.2ms fyGL:	118.7ms fyVH:	74.4ms
k = 310 card:	168.6ms fiya:	182.7ms fyGL:	123.0ms fyVH:	77.4ms
k = 320 card:	174.6ms fiya:	190.3ms fyGL:	127.6ms fyVH:	80.6ms
k = 330 card:	180.7ms fiya:	196.9ms fyGL:	132.0ms fyVH:	83.8ms
k = 340 card:	187.0ms fiya:	203.4ms fyGL:	136.8ms fyVH:	87.1ms
k = 350 card:	192.7ms fiya:	208.8ms fyGL:	140.6ms fyVH:	90.4ms
k = 360 card:	199.1ms fiya:	216.1ms fyGL:	145.6ms fyVH:	94.3ms
k = 370 card:	205.4ms fiya:	221.8ms fyGL:	150.2ms fyVH:	97.6ms
k = 380 card:	211.4ms fiya:	229.2ms fyGL:	154.6ms fyVH:	101.4ms
k = 390 card:	217.8ms fiya:	234.6ms fyGL:	159.7ms fyVH:	105.3ms
k = 400 card:	223.6ms fiya:	241.4ms fyGL:	165.0ms fyVH:	108.6ms

("VH" for "vector hash".)
The rough idea is to hash x to (x % size()) and, on collision, to put x into the next cell after (x % size()). This incurs some cost: if that cell already holds something with a different hash, we need to shift entries, and on a lookup we have to scan over items with a smaller hash, because the item we're looking for might have been shifted there earlier. There are a lot of details, but if we reserve enough space, the probability of incurring this cost on random inputs is low, and that's what we're seeing here :)
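A stripped-down illustration of that idea (plain linear probing only; the shifting and hash-ordering described above are omitted, and all names here are hypothetical rather than taken from the PR's code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Open-addressing map over a flat vector: keys hash to key % capacity and
// walk forward on collision. Capacity must comfortably exceed the number of
// inserts, and the EMPTY sentinel must never be used as a key.
struct VectorHashMap {
    static constexpr uint64_t EMPTY = ~uint64_t(0);
    std::vector<uint64_t> keys, vals;

    explicit VectorHashMap(std::size_t capacity)
        : keys(capacity, EMPTY), vals(capacity, 0) {}

    // Insert or overwrite; assumes the table never fills up.
    void put(uint64_t k, uint64_t v) {
        std::size_t i = k % keys.size();
        while (keys[i] != EMPTY && keys[i] != k) i = (i + 1) % keys.size();
        keys[i] = k;
        vals[i] = v;
    }

    // Returns fallback if k is absent (matching the "identity if untouched"
    // use in the selection loop).
    uint64_t get_or(uint64_t k, uint64_t fallback) const {
        std::size_t i = k % keys.size();
        while (keys[i] != EMPTY) {
            if (keys[i] == k) return vals[i];
            i = (i + 1) % keys.size();
        }
        return fallback;
    }
};
```

With a generous capacity, both operations usually touch one or two adjacent cells, which is far cheaper than chasing std::unordered_map's node pointers.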
