Skip to content

Commit f9cbcc9

Browse files
committed
Wording
1 parent 46d99af commit f9cbcc9

File tree

1 file changed

+60
-59
lines changed

1 file changed

+60
-59
lines changed

text/2018-01-07-deque-proof.md

+60-59
Original file line numberDiff line numberDiff line change
@@ -7,10 +7,10 @@ Crossbeam, and its correctness proof. this RFC has a [companion PR to `crossbeam
77

88
# Motivation
99

10-
We believed that the current implementation of deque is not fully optimized yet, leaving
11-
non-negligible room for performance improvement. However, it was unclear how to further optimize the
12-
current implementation while guaranteeing its correctness. This RFC resolves this issue by proving
13-
an optimized implementation.
10+
The current implementation of deque is not fully optimized yet, leaving non-negligible room for
11+
performance improvement. However, it is unclear how to further optimize the current implementation
12+
while guaranteeing its correctness. This RFC resolves this issue by proving an optimized
13+
implementation.
1414

1515

1616

@@ -23,8 +23,8 @@ consistency model, and (2) it is functionally correct, i.e. it acts like a deque
2323
## Semantics of C/C++ relaxed-memory concurrency
2424

2525
For the semantics of C/C++ relaxed-memory concurrency on which the deque implementation is based, we
26-
use the state-of-art [promising semantics][promising]. For a gentle introduction to the semantics,
27-
we refer the reader to [this blog post][synch-patterns].
26+
use the state-of-the-art [promising semantics][promising]. For a gentle introduction to the
27+
semantics, we refer the reader to [this blog post][synch-patterns].
2828

2929
One caveat of the original promising semantics is its lack of support for memory allocation. In this
3030
RFC, we use the following semantics for memory allocation:
@@ -33,15 +33,15 @@ RFC, we use the following semantics for memory allocation:
3333
Q+`. Now we define `Timestamp = Option<Q+>`, where `None` means the location is unallocated.
3434
Suppose `None < Some(t)` for all `t`.
3535

36-
- At the beginning, the thread's view is `None` for all the locations. When a thread allocates a
36+
- At the beginning, a thread's view is `None` for all the locations. When a thread allocates a
3737
location, its view on the location becomes `0`.
3838

3939
- If a thread is reading a location and it's view on the location is `None`, i.e. it is unallocated
40-
in the thread's point of view, the behavior is undefined.
40+
in the thread's point of view, then the behavior is undefined.
4141

42-
We believe this definition is compatible with all the results presented in the [paper][promising] on
43-
the promising semantics. We hope [the Coq formalization][promising-coq] of promising semantics be
44-
updated accordingly soon.
42+
We believe this definition is compatible with all the results presented in the [the promising
43+
semantics paper][promising]. We hope [its Coq formalization][promising-coq] be updated accordingly
44+
soon.
4545

4646

4747
## Proposed implementation
@@ -68,9 +68,9 @@ Chase and Lev proposed [an efficient implementation of deque][chase-lev] on sequ
6868
memory model, and Lê et. al. proposed [its variant for ARM and C/C++ relaxed-memory
6969
concurrency][chase-lev-weak]. This RFC presents an even more optimized variant for C/C++.
7070

71-
A Chase-Lev deque consists of (1) the `bottom` index for the worker, (2) the `top` index for both
72-
the worker and the stealers, and (3) the underlying `buffer` that contains items. When the buffer is
73-
too small or too large, the worker may replace it with one with more appropriate size:
71+
A Chase-Lev deque consists of (1) the `bottom` index for the owner, (2) the `top` index for both the
72+
owner and the stealers, and (3) the underlying `buffer` that contains items. When the buffer is too
73+
small or too large, the owner may replace it with one with more appropriate size:
7474

7575
```rust
7676
struct<T: Send> Inner<T> {
@@ -231,14 +231,15 @@ thereby modifying `self.bottom` and `self.buffer`.
231231

232232
After `self.buffer` is initialized, it is modified only by the owner in `fn resize(): 'L307`. Since
233233
the contents inside a buffer is always accessed modulo the buffer's capacity (`'L109`, `'L211`,
234-
`'L409`) and the buffer's size is always nonzero, there is no buffer overrun.
234+
`'L409`) and the buffer's size is always nonzero, there are no buffer overruns.
235235

236236
Thus it remains to prove that the buffer is not used after freed. Thanks to Crossbeam, we don't need
237237
to take care of all the details of memory reclamation; [this RFC][rfc-relaxed-memory] says that as a
238238
user of Crossbeam, we only need to prove that:
239239

240-
- When a buffer is deferred to be dropped (`'L308`), there is no longer a reference to the buffer in
241-
the shared memory, and no concurrent mutator introduces the reference to the memory again.
240+
- When a buffer is deferred to be dropped (`'L308`), there are no longer references to the buffer in
241+
the shared memory, and no concurrent mutators introduce a reference to the buffer in the memory
242+
again.
242243

243244
You can easily check these properties because the owner basically "owns" the buffer: only the
244245
owner introduces/eliminates references to a buffer to/from the shared memory. Indeed, before a
@@ -881,33 +882,33 @@ thread calls `steal()` twice. Consider the following execution:
881882

882883
```rust
883884
// owner calls pop()
884-
'L01: read(bottom, 1, Relaxed)
885-
'L02: write(bottom, 0, Relaxed)
886-
'L03: fence(SeqCst)
887-
'L04: read(top, 1, Relaxed) // from `'L17`
888-
'L05: write(bottom, 1, Relaxed)
885+
'L201: read(bottom, 1, Relaxed)
886+
'L202: write(bottom, 0, Relaxed)
887+
'L203: fence(SeqCst)
888+
'L204: read(top, 1, Relaxed) // value from `'L410`
889+
'L207: write(bottom, 1, Relaxed)
889890
// return `Empty`
890891

891892
||
892893

893894
// stealer calls steal()
894-
'L11: read(top, 0, Relaxed)
895-
'L12: fence(SeqCst)
896-
'L13: read(bottom, 0, Acquire) // from `'L02`
895+
'L401: read(top, 0, Relaxed)
896+
'L402: fence(SeqCst)
897+
'L404: read(bottom, 0, Acquire) // value from `'L202`
897898
// return `Empty`
898899

899900
// stealer calls steal()
900-
'L14: read(top, 0, Relaxed)
901-
'L15: fence(SeqCst)
902-
'L16: read(bottom, 1, Acquire) // from `'L05`
903-
'L17: CAS(top, 0, 1, Release) // success
901+
'L401: read(top, 0, Relaxed)
902+
'L402: fence(SeqCst)
903+
'L404: read(bottom, 1, Acquire) // value from `'L207`
904+
'L410: CAS(top, 0, 1, Release) // success
904905
// return `42`
905906
```
906907

907908
While this execution is allowed in C11, it is called out-of-thin-air because there is a genuine
908-
dependency cycle among `'L04`, `'L05`, `'L16`, and `'L17`. There is a [proposal to ban
909-
out-of-thin-air behaviors][n3710] in the C11 standards, but the proposal only vaguely described what
910-
does "out-of-thin-air" mean.
909+
dependency cycle among `'L204`, `'L207`, `'L404` in the second call to `steal()`, and `'L410`. There
910+
is a [proposal to ban out-of-thin-air behaviors][n3710] in the C11 standards, but the proposal only
911+
vaguely described what does "out-of-thin-air" mean.
911912

912913
You can roughly regard the promising semantics as a precise definition of out-of-thin-air
913914
behaviors. Indeed, the promising semantics forbids this execution. See the proof of
@@ -925,55 +926,55 @@ following execution is possible, but is not linearizable:
925926

926927
```rust
927928
// owner calls pop()
928-
'L01: read(bottom, 1, Relaxed)
929-
'L02: write(bottom, 0, Relaxed)
930-
'L03: fence(SeqCst)
931-
'L04: read(top, 0, Relaxed)
932-
'L05: CAS(top, 0, 1, Release) // fail; read from from `'L17`
933-
'L06: write(bottom, 1, Relaxed)
929+
'L201: read(bottom, 1, Relaxed)
930+
'L202: write(bottom, 0, Relaxed)
931+
'L203: fence(SeqCst)
932+
'L204: read(top, 0, Relaxed)
933+
'L213: CAS(top, 0, 1, Release) // fail; value from `'L410`
934+
'L216: write(bottom, 1, Relaxed)
934935
// return `Empty`
935936

936937
||
937938

938939
// stealer calls steal()
939-
'L11: read(top, 0, Relaxed)
940-
'L12: fence(SeqCst)
941-
'L13: read(bottom, 0, Relaxed) // from `'L02`
940+
'L401: read(top, 0, Relaxed)
941+
'L402: fence(SeqCst)
942+
'L404: read(bottom, 0, Relaxed) // value from `'L202`
942943
// return `Empty`
943944

944945
// stealer calls steal()
945-
'L14: read(top, 0, Relaxed)
946-
'L15: fence(SeqCst)
947-
'L16: read(bottom, 1, Relaxed) // from `'L06`
948-
'L17: CAS(top, 0, 1, Release) // success
946+
'L401: read(top, 0, Relaxed)
947+
'L402: fence(SeqCst)
948+
'L404: read(bottom, 1, Relaxed) // value from `'L216`
949+
'L410: CAS(top, 0, 1, Release) // success
949950
// return `42`
950951
```
951952

952-
Note that this example is similar to the one above, but there is no dependency from `'L05` to
953-
`'L06`. So in order to forbid this execution, we have to ensure the ordering of `'L05` and `'L06`,
954-
e.g. by either (1) release-write at `'L216`, (2) requiring seqcst ordering for the success case for
955-
the CAS at `'L213` (as done in [this paper][chase-lev-weak]), or (3) requiring acquire ordering for
956-
the failure case for the CAS at `'L213`. We chose (3), because we believe it incurs the least
957-
performance overhead.
953+
Note that this example is similar to the one above, but there is no dependency from `'L213` to
954+
`'L216`. So in order to forbid this execution, we have to ensure the ordering of `'L213` and
955+
`'L216`, e.g. by either (1) release-write at `'L216`, (2) requiring seqcst ordering for the success
956+
case for the CAS at `'L213` (as done in [this paper][chase-lev-weak]), or (3) requiring acquire
957+
ordering for the failure case for the CAS at `'L213`, or (4) issuing an acquire fence after the CAS
958+
at `'L213` fails. We chose (3) in this RFC, because we believe it incurs the least performance
959+
overhead.
958960

959961
Even though release/acquire orderings are enough for the success/failure cases of the CAS at
960962
`'L213`, C11 (and effectively also LLVM and Rust) requires that "The failure argument shall be no
961-
stronger than the success argument." ([C11][c11] 7.17.7.4 paragraph 2). So instead, we used
962-
release/relaxed for success/failure cases, and issued an acquire fence for the failure case in the
963-
implementation. This C11 requirement may be fail-safe for most use cases, but can actually be
964-
slightly inefficient in this case.
963+
stronger than the success argument." ([C11][c11] 7.17.7.4 paragraph 2). So instead, we chose (4) in
964+
the companion implementation. This C11 requirement may be fail-safe for most use cases, but can
965+
actually be slightly inefficient in this case.
965966

966967
It is worth nothing that the CAS at `'L213` should be strong. Otherwise, a similar execution to the
967-
one above is possible, where the CAS at `'L05` reads `top = 0` and then spuriously fails.
968+
one above is possible, where the CAS at `'L213` reads `top = 0` and then spuriously fails.
968969

969970

970971
## Comparison to Target-dependent Implementations
971972

972973
Alternatively, we can write a deque for each target architecture in order to achieve better
973-
performance. For example, [this paper][deque-bounded-tso] presents a variant of Chase-Lev deque in
974-
the "bounded TSO" x86 model, where you don't need to issue the expensive `MFENCE` barrier (think:
974+
performance. For example, [this paper][deque-bounded-tso] presents a variant of various deques in
975+
the "bounded TSO" x86 model, where you don't need to issue the expensive `mfence` barrier (think:
975976
seqcst-fence) in `pop()`. Also, [this paper][chase-lev-weak] presents a version of Chase-Lev deque
976-
for ARMv7 that doesn't issue an `isync`-like fence, while the proposed implementation issues
977+
for ARMv7 that doesn't issue `isync`-like fences, while the proposed implementation issues
977978
some. Probably `Consume` is relevant for the latter case. These further optimizations are left as
978979
future work.
979980

0 commit comments

Comments
 (0)