hsel improvements #3


Open · wants to merge 4 commits into master
Conversation

igel-kun

This PR improves random_hsel by

  1. making the hash-map global, to avoid paying for its construction and destruction on every call
  2. representing the lower (size-k) part of the virtual [0,n-1] array with the result array itself, and using the hash-map only for the rest

This makes random_hsel run faster than cardchoose for k > 60 on my machine.

1. 'options'-map made global since clearing-cost < initialization-cost
2. represented the first k spots in the virtual [0,n-1] array by the 'result' array, and only the rest with the 'options' hash-map. This lets us replace one of the two hash-map lookups with a (much faster) array lookup
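For reference, the combined effect of the two commits can be sketched as a standalone approximation like the following. Names and types are not taken from the repo: `std::mt19937_64` stands in for the repo's RNG, and the uniform distribution stands in for `randbelow`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>
#include <unordered_map>
#include <vector>

// Partial Fisher–Yates shuffle over a virtual array v[0..n-1] with v[p] == p
// initially. Positions 0..k-1 are backed directly by the result vector;
// only displaced positions >= k live in the hash-map.
std::vector<uint64_t> random_hsel(uint64_t n, uint64_t k, std::mt19937_64& rng) {
    std::vector<uint64_t> result(k);
    std::unordered_map<uint64_t, uint64_t> options;  // displaced values at positions >= k
    for (uint64_t i = 0; i < k; ++i) result[i] = i;  // identity for the low part
    for (uint64_t i = 0; i < k; ++i) {
        uint64_t j = std::uniform_int_distribution<uint64_t>(i, n - 1)(rng);
        if (j < k) {
            std::swap(result[i], result[j]);          // array lookup, no hashing
        } else {
            auto it = options.find(j);
            uint64_t vj = (it == options.end()) ? j : it->second;
            options[j] = result[i];                   // v[j] takes the old v[i]
            result[i] = vj;                           // v[i] takes the old v[j]
        }
    }
    return result;
}
```

The point of the low/high split: a random j lands below k often enough that the cheap array branch replaces one of the two hash-map accesses the all-hash-map version would need.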
@ciphergoth
Owner

I don't think I can accept change 1 because that makes the code not thread safe. I'm trying to understand your change 2 now, but I notice that my current code can't possibly be correct, since the parameter to randbelow is n every time.

@ciphergoth
Owner

OK I understand your change and I think it's good. It would improve readability for me if in each "if/else", you make the "if" branch shorter than the "else" branch where possible, by inverting the boolean if necessary.

A couple of other coding style things for this repo:

  • space after if
  • always use braces, even for one-liners

There's a .clang-format file in the cpp directory that should be applied.

Thank you!

@igel-kun
Author

Thanks for considering my PR :)
I've now implemented the style changes. However, the avoidance of the construction and destruction of the hash-map is the main part of the change. It's not going to beat cardchoose for reasonable numbers of k and n if we insist on measuring memory allocations instead of productive work.

The standard suggestion for making it thread-safe would be to wrap it in a functor class (with operator()) that keeps around its own hash-map and reuses it. This would, however, entail a few changes to the test environment: a functor object would have to be created at the beginning of a test run and passed instead of the function pointer.
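A rough sketch of that functor idea (all names here are hypothetical, and `std::mt19937_64` again stands in for the repo's RNG):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>
#include <unordered_map>
#include <vector>

// The hash-map is a member, so each thread can own its own HselSampler and
// the map's buckets are reused across calls; only clear() is paid per call.
class HselSampler {
    std::unordered_map<uint64_t, uint64_t> options;  // displaced positions >= k
    std::mt19937_64 rng;
public:
    explicit HselSampler(uint64_t seed) : rng(seed) {}

    std::vector<uint64_t> operator()(uint64_t n, uint64_t k) {
        options.clear();  // cheaper than construct + destruct on every call
        std::vector<uint64_t> result(k);
        for (uint64_t i = 0; i < k; ++i) result[i] = i;
        for (uint64_t i = 0; i < k; ++i) {
            uint64_t j = std::uniform_int_distribution<uint64_t>(i, n - 1)(rng);
            if (j < k) {
                std::swap(result[i], result[j]);
            } else {
                auto it = options.find(j);
                uint64_t vj = (it == options.end()) ? j : it->second;
                options[j] = result[i];
                result[i] = vj;
            }
        }
        return result;
    }
};
```

The test harness would then construct one sampler per thread and invoke it like a function.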

What do you think about that?

@ciphergoth
Owner

Weird - how come the allocation is so expensive compared to everything else?

@igel-kun
Author

igel-kun commented Jun 11, 2024

Hmm, I can only speculate about the internals of std::unordered_map, but I suppose it's re-hashing a lot at the lower sizes, since it can't know how many items are still to come and it wants to keep its load factor within a constant throughout. Each time a re-hash takes place, all buckets are re-distributed, causing a ton of re-allocations.

Now that I think about it, unordered_map does have a reserve() method that works similarly to vector::reserve() and a quick test yields promising results, even if we don't reuse the unordered_map. I'll update the PR :)
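The effect of reserve() can be checked with a small standalone experiment against plain std::unordered_map (the function name here is made up for illustration):

```cpp
#include <cstddef>
#include <unordered_map>

// Counts bucket-table changes while inserting n keys. reserve(n) sizes the
// bucket table for at least n elements up front, so with it the inserts
// below never trigger a rehash; without it, the table grows repeatedly.
std::size_t rehashes_during_insert(std::size_t n, bool reserve_first) {
    std::unordered_map<int, int> m;
    if (reserve_first) m.reserve(n);
    std::size_t rehashes = 0;
    std::size_t buckets = m.bucket_count();
    for (std::size_t i = 0; i < n; ++i) {
        m[static_cast<int>(i)] = 0;
        if (m.bucket_count() != buckets) {  // bucket count changed => rehash happened
            ++rehashes;
            buckets = m.bucket_count();
        }
    }
    return rehashes;
}
```

reserve(n) is specified as rehash(ceil(n / max_load_factor)), so after it the load factor cannot exceed its maximum within the first n inserts.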

EDIT: wait, I just saw that the hashmap was already constructed with 2k buckets, but it's faster with 1.45k buckets for some reason oO I'll have to investigate a little more...
EDIT: these are the current test results for cardchoose ("card") vs. hsel with per-call construction using 1.45k buckets ("fiya") vs. hsel with the "global" hash-map that is clear()ed between calls ("fyGL"). The latter two are pretty much on par with each other until k=90 (implying 90·1.45 ≈ 130 buckets). I have no idea yet why...

k = 30	card:	10.9ms	fiya:	13.4ms	fyGL:	12.2ms
k = 35	card:	13.2ms	fiya:	15.1ms	fyGL:	14.4ms
k = 40	card:	15.5ms	fiya:	17.1ms	fyGL:	16.8ms
k = 45	card:	18.1ms	fiya:	19.3ms	fyGL:	19.3ms
k = 50	card:	20.5ms	fiya:	21.7ms	fyGL:	21.8ms
k = 55	card:	23.0ms	fiya:	23.2ms	fyGL:	24.4ms
k = 60	card:	25.5ms	fiya:	25.7ms	fyGL:	23.9ms
k = 65	card:	28.0ms	fiya:	27.3ms	fyGL:	25.9ms
k = 70	card:	30.6ms	fiya:	30.3ms	fyGL:	28.8ms
k = 75	card:	34.0ms	fiya:	31.7ms	fyGL:	30.5ms
k = 80	card:	35.9ms	fiya:	33.1ms	fyGL:	32.7ms
k = 85	card:	38.4ms	fiya:	35.9ms	fyGL:	35.2ms
k = 90	card:	40.9ms	fiya:	58.5ms	fyGL:	37.6ms
k = 95	card:	43.7ms	fiya:	61.9ms	fyGL:	40.0ms

But it's not only the initial bucket count that influences performance. I've modified the non-global hsel to store the last bucket count to initialize the next iteration with, in order to match the buckets in the global variant. This "fixes" the performance issue for k<130, but there is another dent in the curve at k=130 that I cannot explain.

k = 100	fiya:	#buck: 127	43.2ms	fyGL:	#buck: 127	42.3ms
k = 110	fiya:	#buck: 127	48.0ms	fyGL:	#buck: 127	47.4ms
k = 120	fiya:	#buck: 127	53.1ms	fyGL:	#buck: 127	52.7ms
k = 130	fiya:	#buck: 256	77.5ms	fyGL:	#buck: 257	51.3ms
k = 140	fiya:	#buck: 257	86.0ms	fyGL:	#buck: 257	55.8ms
k = 150	fiya:	#buck: 257	92.1ms	fyGL:	#buck: 257	60.2ms
[...]
k = 250	fiya:	#buck: 257	161.1ms	fyGL:	#buck: 257	109.6ms
k = 260	fiya:	#buck: 418	158.2ms	fyGL:	#buck: 383	108.6ms
k = 270	fiya:	#buck: 540	156.3ms	fyGL:	#buck: 540	105.0ms
k = 280	fiya:	#buck: 541	162.2ms	fyGL:	#buck: 541	108.9ms
k = 290	fiya:	#buck: 541	168.4ms	fyGL:	#buck: 541	114.1ms
k = 300	fiya:	#buck: 541	174.4ms	fyGL:	#buck: 541	118.4ms

(Here "#buck" is the average bucket count after an iteration.)
Also, I cannot explain why the average bucket counts for k=260 differ oO

@igel-kun
Author

igel-kun commented Jun 11, 2024

Oh, and I wrote a vector-based hash-set that can be abused as a hash-map, and hsel with that one blows everything out of the water (around 50% faster than hsel with the global-state unordered_map):

k = 100 card:	47.4ms fiya:	43.1ms fyGL:	42.8ms fyVH:	27.9ms
k = 110 card:	52.1ms fiya:	48.0ms fyGL:	47.9ms fyVH:	31.9ms
k = 120 card:	57.6ms fiya:	53.2ms fyGL:	52.8ms fyVH:	36.2ms
k = 130 card:	63.1ms fiya:	78.5ms fyGL:	51.8ms fyVH:	31.8ms
k = 140 card:	68.7ms fiya:	84.8ms fyGL:	56.9ms fyVH:	34.8ms
k = 150 card:	74.5ms fiya:	92.9ms fyGL:	61.7ms fyVH:	37.4ms
k = 160 card:	80.0ms fiya:	98.7ms fyGL:	65.9ms fyVH:	40.6ms
k = 170 card:	85.6ms fiya:	107.7ms fyGL:	72.1ms fyVH:	44.4ms
k = 180 card:	92.1ms fiya:	113.6ms fyGL:	74.3ms fyVH:	47.5ms
k = 190 card:	97.8ms fiya:	122.0ms fyGL:	79.8ms fyVH:	51.1ms
k = 200 card:	103.0ms fiya:	127.6ms fyGL:	84.3ms fyVH:	54.8ms
k = 210 card:	108.8ms fiya:	134.6ms fyGL:	89.2ms fyVH:	58.8ms
k = 220 card:	114.5ms fiya:	141.6ms fyGL:	94.5ms fyVH:	62.8ms
k = 230 card:	120.7ms fiya:	148.5ms fyGL:	99.6ms fyVH:	66.8ms
k = 240 card:	126.3ms fiya:	155.1ms fyGL:	104.9ms fyVH:	71.2ms
k = 250 card:	136.2ms fiya:	162.1ms fyGL:	109.7ms fyVH:	76.2ms
k = 260 card:	138.4ms fiya:	169.6ms fyGL:	115.5ms fyVH:	62.4ms
k = 270 card:	143.9ms fiya:	158.5ms fyGL:	105.9ms fyVH:	65.2ms
k = 280 card:	150.3ms fiya:	165.4ms fyGL:	110.2ms fyVH:	68.3ms
k = 290 card:	156.3ms fiya:	171.0ms fyGL:	114.2ms fyVH:	74.5ms
k = 300 card:	163.6ms fiya:	178.2ms fyGL:	118.7ms fyVH:	74.4ms
k = 310 card:	168.6ms fiya:	182.7ms fyGL:	123.0ms fyVH:	77.4ms
k = 320 card:	174.6ms fiya:	190.3ms fyGL:	127.6ms fyVH:	80.6ms
k = 330 card:	180.7ms fiya:	196.9ms fyGL:	132.0ms fyVH:	83.8ms
k = 340 card:	187.0ms fiya:	203.4ms fyGL:	136.8ms fyVH:	87.1ms
k = 350 card:	192.7ms fiya:	208.8ms fyGL:	140.6ms fyVH:	90.4ms
k = 360 card:	199.1ms fiya:	216.1ms fyGL:	145.6ms fyVH:	94.3ms
k = 370 card:	205.4ms fiya:	221.8ms fyGL:	150.2ms fyVH:	97.6ms
k = 380 card:	211.4ms fiya:	229.2ms fyGL:	154.6ms fyVH:	101.4ms
k = 390 card:	217.8ms fiya:	234.6ms fyGL:	159.7ms fyVH:	105.3ms
k = 400 card:	223.6ms fiya:	241.4ms fyGL:	165.0ms fyVH:	108.6ms

("VH" for "vector hash".)
The rough idea is to hash x to (x % size()) and, on collision, to put x into the next cell after (x % size()). This incurs some cost: if that cell already holds something with a different hash, we need to shift entries, and on a lookup we have to scan over items with a smaller hash, because the item we're looking for might have been shifted there earlier. There are a lot of details, but if we reserve enough space, the probability of incurring this cost on random inputs is low, and that's what we're seeing here :)
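A stripped-down illustration of that idea (plain linear probing only; the shifting and hash-ordering described above are omitted, and all names here are hypothetical rather than taken from the PR's code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Open-addressing map over a flat vector: keys hash to key % capacity and
// walk forward on collision. Capacity must comfortably exceed the number of
// inserts, and the EMPTY sentinel must never be used as a key.
struct VectorHashMap {
    static constexpr uint64_t EMPTY = ~uint64_t(0);
    std::vector<uint64_t> keys, vals;

    explicit VectorHashMap(std::size_t capacity)
        : keys(capacity, EMPTY), vals(capacity, 0) {}

    // Insert or overwrite; assumes the table never fills up.
    void put(uint64_t k, uint64_t v) {
        std::size_t i = k % keys.size();
        while (keys[i] != EMPTY && keys[i] != k) i = (i + 1) % keys.size();
        keys[i] = k;
        vals[i] = v;
    }

    // Returns fallback if k is absent (matching the "identity if untouched"
    // use in the selection loop).
    uint64_t get_or(uint64_t k, uint64_t fallback) const {
        std::size_t i = k % keys.size();
        while (keys[i] != EMPTY) {
            if (keys[i] == k) return vals[i];
            i = (i + 1) % keys.size();
        }
        return fallback;
    }
};
```

With a generous capacity, both operations usually touch one or two adjacent cells, which is far cheaper than chasing std::unordered_map's node pointers.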
