-
Notifications
You must be signed in to change notification settings - Fork 483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP Add a steal_half method to sync::chase_lev #72
Conversation
Questions: * We could change the signature to make callers supply a vec, rather than allocate our own in `steal_half`. This could potentially be more efficient if the caller is repeatedly retrying. * Is this safe? I think so, but I'm not 100% sure. The paper mentions that it would be interesting to apply a steal-half operation to this deque, but they didn't do it. I couldn't find any other papers or libraries that implemented this operation either, although my google-fu may be weak in this regard. The most important part, as far as I can tell, is populating the results array before performing the CAS to ensure that we get the correct set of resulting items (after the CAS, the circular buffer could circle back over the slots we're stealing and overwrite them with new values). Everything seems to be pretty much the same as stealing one item, which leads to my next question. * Should we factor out the common parts between `steal` and `steal_half`? There's a ton of duplication here, as I've authored it. If you think this is valuable, then I can write some tests and fix this up.
Maybe folks have tried this and it doesn't really give any better load balancing than stealing one item at a time does? |
I can't find the paper, but steal_half performs far better in cases where jobs don't generate many children. |
return StealHalf::Empty; | ||
} | ||
|
||
let half = size + 1 / 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe (size + 1) / 2
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe ;P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is indeed a bug. The stealer effectively tries to steal all the remaining elements, breaking the invariant for the popper that the last element will not be stolen. I think it should be let half = (size + 1) / 2;
I "guess" this would kill the performance for some degree. Maybe we can relax it by using two locations (just say The implementation seems correct to my eyes, for the same reason that that of |
Have you seen Correct and Efficient Work-Stealing for Weak Memory Models? For someone who is familiar with this area of research (which I am definitely not, but I think you are ;)) I assume it probably shouldn't be too hard to extend the proofs for this case. I really enjoyed the Promising paper, btw 👍 |
Yes I've read it, but the proof is just beyond my level and I couldn't grasp it.. Yet I understand the key idea, which is what the authors called "cumulativity", that implies it is impossible that concurrent invocations of These days I am working on reproving (weak) Chase-Lev deque in a (hypothetical) program logic, and I will try to come up with a proof of
I really thank you for reading the paper ;-) As soon as the slide for POPL is done, I will share it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beyond the comment I left, I think some test code will be very helpful ;)
return StealHalf::Empty; | ||
} | ||
|
||
let half = size + 1 / 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is indeed a bug. The stealer effectively tries to steal all the remaining elements, breaking the invariant for the popper that the last element will not be stolen. I think it should be let half = (size + 1) / 2;
Do we have any updates on this? I think Rayon might benefit from a I did some googling, but couldn't find any papers about steal-half operation in Chase-Lev deques. Papers that do talk about steal-half usually have the following structure: they mention Chase-Lev as previous work, state that steal-half is a difficult operation to implement, and then present a totally new deque design that supports steal-half. While papers never say that steal-half cannot be supported in Chase-Lev deques, the fact that apparently noone did it makes me think it's definitely non-trivial. That said, we could still add a steal-half method to our Chase-Lev implementation that simply repeatedly calls |
Please steal this PR from my work queue :) |
The Chase-Lev paper simply says at the end: "It may be interesting to see how our techniques are applied to other schemes that improve on ABP-work stealing such as the locality-guided work-stealing of Acar, Blelloch and Blumofe [1] or the steal-half algorithm of Hendler and Shavit [6]." There is also a paper from 2013 that mentions Chase-Lev: "Several years later the nonblocking algorithm was extended by Chase and Lev to support dynamic resizing [10]. Their first and, to our knowledge, only proof of correctness of a nonblocking work-stealing algorithm is not trivial: it spans over thirty pages [11]. Moreover, there is, in the literature, no nonblocking algorithm which combines resizeability with other extensions, such as steal half, possibly owing to the complexity involved in extending the proof of correctness." |
@stjepang The paper @fitzgen mentioned is also published in PPoPP 2013, and this contains a proof of an implementation of Chase-Lev in C/C++11. I am quite confident that the proof scales to AFAICT, Chase-Lev deque is difficult to formally reason about due to two reasons: (i) use of relaxed atomics, and (ii) use of SC fences. I am interested in both difficulties, and these issues make Chase-Lev very interesting to me. I will dig further into it.. For the time being, I prefer merging it (after fixing the bug I mentioned in the comment), but I agree not to do so if anyone objects to do that. |
That's great then! :) No objections to merging from my side either. Two things we should probably do that'd increase our confidence of correctness:
|
Closing it for the same reason with #148 |
Questions:
allocate our own in
steal_half
. This could potentially be more efficient ifthe caller is repeatedly retrying.
would be interesting to apply a steal-half operation to this deque, but they
didn't do it. I couldn't find any other papers or libraries that implemented
this operation either, although my google-fu may be weak in this regard. The
most important part, as far as I can tell, is populating the results array
before performing the CAS to ensure that we get the correct set of resulting
items (after the CAS, the circular buffer could circle back over the slots
we're stealing and overwrite them with new values). Everything seems to be
pretty much the same as stealing one item, which leads to my next question.
steal
andsteal_half
?There's a ton of duplication here, as I've authored it.
If you think this is a valuable operation to have, then I can write some tests
and fix this up.