-
Notifications
You must be signed in to change notification settings - Fork 6
Conversation
src/lib.rs
Outdated
// argument." | ||
if self.inner | ||
.top | ||
.compare_exchange(t, t.wrapping_add(1), Ordering::AcqRel, Ordering::Acquire) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should change the orderings to compare_exchange(t, t.wrapping_add(1), Ordering::Release, Ordering::Relaxed)
and add a fence(Ordering::Acquire)
in the error path.
An acquire fence is generally slower than an acquire load (I know this is the case for ARM64 and PPC at least), however this is in the error path which is only hit in the rare case of a race.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I changed it. Thanks!
The benchmark result of
|
Oh, it's obvious that the performance doesn't improve much: different orderings doesn't matter in x86 compilation results. In particular, CAS(SeqCst, Relaxed) = CAS(Release, Relaxed) when compiled down to x86. The only possible difference is replacing a release-swap with a release-store (https://github.com/crossbeam-rs/crossbeam-deque/pull/2/files#diff-b4aea3e418ccdb71239b96952d9cddb6R222), but it happens rarely. In order to assess the performance improvement, maybe we need to go to POWER or ARM.. |
44d755b
to
df1ac52
Compare
What should we do with this PR? Resolve conflicts and resubmit in the main repo? Close? |
The correctness of this PR is explained in this RFC PR.