
Commit 187e3e0

Discuss an ARM implementation

1 parent 5f9d500

1 file changed: +27, -6 lines

text/2018-01-07-deque-proof.md

@@ -969,12 +969,31 @@ one above is possible, where the CAS at `'L213` reads `top = 0` and then spuriou
 ## Comparison to Target-dependent Implementations
 
 Alternatively, we can write a deque for each target architecture in order to achieve better
-performance. For example, [this paper][deque-bounded-tso] presents a variant of various deques in
-the "bounded TSO" x86 model, where you don't need to issue the expensive `mfence` barrier (think:
-seqcst-fence) in `pop()`. Also, [this paper][chase-lev-weak] presents a version of Chase-Lev deque
-for ARMv7 that doesn't issue `isync`-like fences, while the proposed implementation issues
-some. Probably `Consume` is relevant for the latter case. These further optimizations are left as
-future work.
+performance.
+
+We believe the proposed implementation is the most efficient one in the x86-TSO model. That said, [this
+paper][deque-bounded-tso] presents variants of several deques in the "bounded x86-TSO" model, where
+you don't need to issue the expensive `mfence` barrier (think: SeqCst fence) in `pop()`.
+
+For ARM/POWER, you can further optimize the compiled code of the proposed implementation as
+follows:
+
+- `'L102` can be just a plain load: `'L109` is its only synchronization target, and they have an RW
+ctrl dependency.
+
+- `'L408` can be just a plain load: `'L409` is its only synchronization target, and they have an RR
+addr dependency. Ideally, this synchronizing dependency would be expressible in C11 using the
+`Consume` ordering.
+
+- `'L404` can be just a plain load, but `isync`/`isb` should be inserted right before `'L408`:
+`'L408`'s read, `'L409`'s read, `'L410`'s read/write, and the end view of `steal()` in the successful
+case are the synchronization targets, and they have an RR/RW ctrl+`isync`/`isb` dependency.
+
+We believe [this paper][chase-lev-weak] has a bug in its ARMv7 implementation of the Chase-Lev
+deque. Roughly speaking, the authors use a plain load for `'L404` and put ctrl+`isync`/`isb` right
+after `'L409`. But in that case, the reads at `'L408` and `'L409` can be reordered before `'L404`. See
+§4.2 of [this tutorial][arm-power] on [the MP+dmb+ctrl litmus test][mp+dmb+ctrl] for more
+details.
 
 
 
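To show where the fences discussed in this section sit, here is a minimal C++17 sketch of a fixed-capacity Chase-Lev deque, modeled loosely on the commonly cited C11 formulation of the algorithm. It is an illustration under stated assumptions, not the proposed implementation from the post: the class name `ChaseLevSketch`, the `int` element type, and the fixed power-of-two capacity `Cap` with no buffer growth are all hypothetical simplifications, and the post's `'L…` labels are not mapped onto it. The `std::atomic_thread_fence(std::memory_order_seq_cst)` in `pop()` is the fence that typically compiles to `mfence` on x86, and the conservative acquire loads in `steal()` are the kind of loads the ARM/POWER discussion proposes to weaken.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical fixed-capacity Chase-Lev deque for illustration only: not the
// post's proposed implementation, and the buffer never grows.
// Cap must be a power of two so that indices can be masked.
template <std::size_t Cap>
class ChaseLevSketch {
  static_assert(Cap > 0 && (Cap & (Cap - 1)) == 0, "Cap must be a power of two");

  std::atomic<std::int64_t> top_{0};
  std::atomic<std::int64_t> bottom_{0};
  std::atomic<int> buf_[Cap];

  std::size_t slot(std::int64_t i) const {
    return static_cast<std::size_t>(i) & (Cap - 1);
  }

 public:
  // Owner thread only. Returns false when full instead of resizing.
  bool push(int x) {
    std::int64_t b = bottom_.load(std::memory_order_relaxed);
    std::int64_t t = top_.load(std::memory_order_acquire);
    if (b - t >= static_cast<std::int64_t>(Cap)) return false;
    buf_[slot(b)].store(x, std::memory_order_relaxed);
    // Release fence: the element write becomes visible before the new bottom.
    std::atomic_thread_fence(std::memory_order_release);
    bottom_.store(b + 1, std::memory_order_relaxed);
    return true;
  }

  // Owner thread only.
  std::optional<int> pop() {
    std::int64_t b = bottom_.load(std::memory_order_relaxed) - 1;
    bottom_.store(b, std::memory_order_relaxed);
    // SeqCst fence between the bottom store and the top load; on x86 this is
    // typically compiled to mfence, the barrier the bounded-TSO work avoids.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    std::int64_t t = top_.load(std::memory_order_relaxed);
    if (t > b) {  // empty: restore bottom
      bottom_.store(b + 1, std::memory_order_relaxed);
      return std::nullopt;
    }
    int x = buf_[slot(b)].load(std::memory_order_relaxed);
    if (t == b) {  // last element: race with stealers through a CAS on top
      bool won = top_.compare_exchange_strong(
          t, t + 1, std::memory_order_seq_cst, std::memory_order_relaxed);
      bottom_.store(b + 1, std::memory_order_relaxed);
      if (!won) return std::nullopt;
    }
    return x;
  }

  // Any thread. Returns nullopt if empty or if the CAS race is lost.
  std::optional<int> steal() {
    // Conservative orderings; some of the weaker ARM/POWER alternatives
    // (ctrl+isync/isb) have no direct C++ counterpart.
    std::int64_t t = top_.load(std::memory_order_acquire);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    std::int64_t b = bottom_.load(std::memory_order_acquire);
    if (t >= b) return std::nullopt;
    int x = buf_[slot(t)].load(std::memory_order_relaxed);
    if (!top_.compare_exchange_strong(t, t + 1, std::memory_order_seq_cst,
                                      std::memory_order_relaxed)) {
      return std::nullopt;
    }
    return x;
  }
};
```

This sketch returns `std::nullopt` both for an empty deque and for a lost CAS race in `steal()`; a real implementation would usually distinguish the two so that callers know whether retrying is worthwhile.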

@@ -992,3 +1011,5 @@ future work.
 [cppatomic]: http://en.cppreference.com/w/cpp/atomic/atomic
 [n3710]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3710.html
 [c11]: www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
+[mp+dmb+ctrl]: https://www.cl.cam.ac.uk/~pes20/arm-supplemental/arm033.html
+[arm-power]: https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
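The message-passing (MP) shape behind the MP+dmb+ctrl litmus test, and the `Consume` remark in the second bullet, can be illustrated with a small C++ sketch of pointer publication. All names here (`Buffer`, `g_buf`, `publish()`, `read_first_slot()`) are hypothetical and do not come from the post. The writer's release store plays the role of the `dmb`-protected flag write in MP; on the reader side, the `memory_order_consume` load lets the read through `p` be ordered by an RR address dependency alone, the kind of dependency the second bullet describes. As far as we know, mainstream compilers currently strengthen `memory_order_consume` to `memory_order_acquire`, which is part of why this ideal is hard to exploit in practice.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical payload standing in for a deque's buffer.
struct Buffer {
  int slots[4];
};

std::atomic<Buffer*> g_buf{nullptr};

// Writer: initialize the payload with plain writes, then publish the pointer.
// The release store orders the plain write before the publication
// (on ARMv7, typically a dmb followed by a plain store).
void publish(Buffer* b) {
  b->slots[0] = 42;
  g_buf.store(b, std::memory_order_release);
}

// Reader: spin until the pointer is published, then read through it.
// The read of p->slots[0] carries an address dependency on the consume load,
// which makes it dependency-ordered after the release store in publish().
int read_first_slot() {
  Buffer* p = nullptr;
  while ((p = g_buf.load(std::memory_order_consume)) == nullptr) {
    std::this_thread::yield();
  }
  return p->slots[0];
}
```

By contrast, replacing the consume load with a relaxed load followed by an `if` would give only a control dependency: C++ does not treat that as ordering (and the plain read would race), and per the MP+dmb+ctrl test, ARM does not use a bare control dependency to order a later read either, unless an `isb` follows the branch.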
