Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP Add a steal_half method to sync::chase_lev #72

Closed
wants to merge 1 commit into from

Conversation

fitzgen
Copy link
Contributor

@fitzgen fitzgen commented May 4, 2016

Questions:

  • We could change the signature to make callers supply a vec, rather than
    allocate our own in steal_half. This could potentially be more efficient if
    the caller is repeatedly retrying.
  • Is this safe? I think so, but I'm not 100% sure. The paper mentions that it
    would be interesting to apply a steal-half operation to this deque, but they
    didn't do it. I couldn't find any other papers or libraries that implemented
    this operation either, although my google-fu may be weak in this regard. The
    most important part, as far as I can tell, is populating the results array
    before performing the CAS to ensure that we get the correct set of resulting
    items (after the CAS, the circular buffer could circle back over the slots
    we're stealing and overwrite them with new values). Everything seems to be
    pretty much the same as stealing one item, which leads to my next question.
  • Should we factor out the common parts between steal and steal_half?
    There's a ton of duplication here, as I've authored it.

If you think this is a valuable operation to have, then I can write some tests
and fix this up.

Questions:

* We could change the signature to make callers supply a vec, rather than
  allocate our own in `steal_half`. This could potentially be more efficient if
  the caller is repeatedly retrying.

* Is this safe? I think so, but I'm not 100% sure. The paper mentions that it
  would be interesting to apply a steal-half operation to this deque, but they
  didn't do it. I couldn't find any other papers or libraries that implemented
  this operation either, although my google-fu may be weak in this regard. The
  most important part, as far as I can tell, is populating the results array
  before performing the CAS to ensure that we get the correct set of resulting
  items (after the CAS, the circular buffer could circle back over the slots
  we're stealing and overwrite them with new values). Everything seems to be
  pretty much the same as stealing one item, which leads to my next question.

* Should we factor out the common parts between `steal` and `steal_half`?
  There's a ton of duplication here, as I've authored it.

If you think this is valuable, then I can write some tests and fix this up.
@fitzgen
Copy link
Contributor Author

fitzgen commented May 4, 2016

Maybe folks have tried this and it doesn't really give any better load balancing than stealing one item at a time does?

@schets
Copy link
Member

schets commented May 5, 2016

Maybe folks have tried this and it doesn't really give any better load balancing than stealing one item at a time does?

I can't find the paper, but steal_half performs far better in cases where jobs don't generate many children.

return StealHalf::Empty;
}

let half = size + 1 / 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe (size + 1) / 2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe ;P

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is indeed a bug. The stealer effectively tries to steal all the remaining elements, breaking the invariant for the popper that the last element will not be stolen. I think it should be let half = (size + 1) / 2;

@jeehoonkang
Copy link
Contributor

jeehoonkang commented Dec 30, 2016

The most important part, as far as I can tell, is populating the results array
before performing the CAS to ensure that we get the correct set of resulting
items (after the CAS, the circular buffer could circle back over the slots
we're stealing and overwrite them with new values).

I "guess" this would kill the performance for some degree. Maybe we can relax it by using two locations (just say bottom1 and bottom2) for the purposes of the present bottom variable: (1) signalling that the circular buffer is reclaimed and reusable, and (2) signalling that elements before bottom are already stolen by some threads. It's just a wild speculation: I would like to think on it more.

The implementation seems correct to my eyes, for the same reason that that of steal is correct. Though I am not 100% confident as I don't have a formal proof yet...

@fitzgen
Copy link
Contributor Author

fitzgen commented Dec 30, 2016

The implementation seems correct to my eyes, for the same reason that that of steal is correct. Though I am not 100% confident as I don't have a formal proof yet...

Have you seen Correct and Efficient Work-Stealing for Weak Memory Models? For someone who is familiar with this area of research (which I am definitely not, but I think you are ;)) I assume it probably shouldn't be too hard to extend the proofs for this case.

I really enjoyed the Promising paper, btw 👍

@jeehoonkang
Copy link
Contributor

Have you seen Correct and Efficient Work-Stealing for Weak Memory Models?

Yes I've read it, but the proof is just beyond my level and I couldn't grasp it.. Yet I understand the key idea, which is what the authors called "cumulativity", that implies it is impossible that concurrent invocations of steal and pop see old values of bottom and top at the same time. The exact same logic applies to steal_half AFAICT.

These days I am working on reproving (weak) Chase-Lev deque in a (hypothetical) program logic, and I will try to come up with a proof of steal_half.

I really enjoyed the Promising paper, btw 👍

I really thank you for reading the paper ;-) As soon as the slide for POPL is done, I will share it.

Copy link
Contributor

@jeehoonkang jeehoonkang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beyond the comment I left, I think some test code will be very helpful ;)

return StealHalf::Empty;
}

let half = size + 1 / 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is indeed a bug. The stealer effectively tries to steal all the remaining elements, breaking the invariant for the popper that the last element will not be stolen. I think it should be let half = (size + 1) / 2;

@ghost
Copy link

ghost commented Jul 5, 2017

Do we have any updates on this? I think Rayon might benefit from a steal_half method.

I did some googling, but couldn't find any papers about steal-half operation in Chase-Lev deques. Papers that do talk about steal-half usually have the following structure: they mention Chase-Lev as previous work, state that steal-half is a difficult operation to implement, and then present a totally new deque design that supports steal-half.

While papers never say that steal-half cannot be supported in Chase-Lev deques, the fact that apparently noone did it makes me think it's definitely non-trivial.

That said, we could still add a steal-half method to our Chase-Lev implementation that simply repeatedly calls steal(). Note that every call to steal() pins the current thread, so we could actually do steal-half by enclosing all those steals with a single call to pin(). That could be much faster than naively calling steal() multiple times in a row.

@fitzgen
Copy link
Contributor Author

fitzgen commented Jul 5, 2017

Please steal this PR from my work queue :)

@jeehoonkang
Copy link
Contributor

@stjepang Would you please give me the references to the papers that says it's hard to support steal_half() in Chase-Lev? Because still, @fitzgen's implementation looks correct to my eyes (albeit the comment I left).

@ghost
Copy link

ghost commented Jul 6, 2017

@jeehoonkang

The Chase-Lev paper simply says at the end:

"It may be interesting to see how our techniques are applied to other schemes that improve on ABP-work stealing such as the locality-guided work-stealing of Acar, Blelloch and Blumofe [1] or the steal-half algorithm of Hendler and Shavit [6]."

There is also a paper from 2013 that mentions Chase-Lev:

"Several years later the nonblocking algorithm was extended by Chase and Lev to support dynamic resizing [10]. Their first and, to our knowledge, only proof of correctness of a nonblocking work-stealing algorithm is not trivial: it spans over thirty pages [11]. Moreover, there is, in the literature, no nonblocking algorithm which combines resizeability with other extensions, such as steal half, possibly owing to the complexity involved in extending the proof of correctness."

@jeehoonkang
Copy link
Contributor

@stjepang The paper @fitzgen mentioned is also published in PPoPP 2013, and this contains a proof of an implementation of Chase-Lev in C/C++11. I am quite confident that the proof scales to steal_half. Though I agree that this issue can only be answered decisively with a rigorous mathematical proof. (The decades of research prove that "confident" means nothing in concurrency.)

AFAICT, Chase-Lev deque is difficult to formally reason about due to two reasons: (i) use of relaxed atomics, and (ii) use of SC fences. I am interested in both difficulties, and these issues make Chase-Lev very interesting to me. I will dig further into it..

For the time being, I prefer merging it (after fixing the bug I mentioned in the comment), but I agree not to do so if anyone objects to do that.

@ghost
Copy link

ghost commented Jul 18, 2017

That's great then! :) No objections to merging from my side either.

Two things we should probably do that'd increase our confidence of correctness:

  1. Reaching out to the authors of "Correct and Efficient Work-Stealing for Weak Memory Models" and asking about the steal-half operation.
  2. Stress testing on a weakly ordered architecture.

@jeehoonkang
Copy link
Contributor

Closing it for the same reason with #148

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants