Skip to content

Commit 6398709

Browse files
committed
WIP
1 parent ecf3a2f commit 6398709

File tree

1 file changed

+80
-70
lines changed

1 file changed

+80
-70
lines changed

text/2018-01-07-deque-proof.md

+80-70
Original file line numberDiff line numberDiff line change
@@ -363,7 +363,7 @@ the linearization order as follows:
363363

364364
### Auxiliary Lemma
365365

366-
- > Let `WF_i` and `WL_i` be `O_i`'s first and last write to `bottom`.
366+
Let `WF_i` and `WL_i` be `O_i`'s first and last write to `bottom`.
367367

368368
- > (VIEW-OWNER): For arbitrary `i`, since `O_(i-1)` wrote `WF_(i-1), WL_(i-1)` and `O_i` wrote
369369
`WF_i, WL_i`, we have `Timestamp(WL_(i-1)) <= view_beginning(O_i)[bottom] <
@@ -397,11 +397,12 @@ the linearization order as follows:
397397
In order for `S` to read `bottom = x` at `'L404` and `O_i` to read `top >= x` at `'L204` at the
398398
same time, either `S`'s write to `top` at `'L410` should be promised before reading `bottom` at
399399
`'L404` or `O_i`'s write to `bottom` at `'L207` should be promised before reading `top` at
400-
`'L204`. But this is impossible: `S`'s write to `top` at `'L410` is a release-store; and in
401-
order for `O_i` to promise to write `bottom = x` at `'L207`, `O_i` should read `top = x-1` at
402-
`'L204` and then execute the CAS at `'L213` in certification, but (1) it is impossible to
403-
succeed the CAS in arbitrary future memories, and (2) after failing the CAS and acquiring
404-
`top >= x`, it is impossible to fulfill the promise in arbitrary future memories.
400+
`'L204`. But this is impossible. For the former, `S`'s write to `top` at `'L410` is a
401+
release-store. For the latter, in order for `O_i` to promise to write `bottom = x` at `'L207`,
402+
`O_i` should read `top = x-1` at `'L204` and then execute the CAS at `'L213` in
403+
certification. But a certain future memory mandates the CAS to fail, acquiring an arbitrarily
404+
high view, after which it is impossible to fulfill the promise. In short, there is a semantic
405+
dependency from `O_i`'s read from `top` at `'L204` to `O_i`'s write to `bottom` at `'L207`.
405406

406407
+ Case `O_i` executes the CAS at `'L213`.
407408

@@ -424,12 +425,12 @@ For `(VIEW)`, it is sufficient to prove that:
424425

425426
(Note that what would be `(VIEW-OWNER-OWNER)` is obvious.)
426427

427-
#### Proof of `(VIEW-OWNER-STEAL)`.
428+
#### Proof of `(VIEW-OWNER-STEAL)`
428429

429430
By `(VIEW-OWNER)` and `(VIEW-STEAL)`, we have `view_beginning(O_i)[bottom] < Timestamp(WF_i) <=
430431
Timestamp(WF_j) <= view_end(S)[bottom]`. Thus it is not the case that `S -view-> O_i`.
431432

432-
#### Proof of `(VIEW-STEAL-OWNER)`.
433+
#### Proof of `(VIEW-STEAL-OWNER)`
433434

434435
By `(VIEW-OWNER)` and `(VIEW-STEAL)`, we have `view_beginning(S)[bottom] <= Timestamp(WL_(i+1)) <=
435436
Timestamp(WL_j) <= view_end(O_j)[bottom]`. If `view_beginning(S)[bottom] < view_end(O_j)[bottom]`,
@@ -442,7 +443,7 @@ is `pop()` taking the irregular path. Let `x` be the value `O_(i+1)` read from `
442443
`O_(i+1)` read a value `>= x` from `top`. Thus we have `view_beginning(S)[top]` < [the timestamp of
443444
`top = x`] <= `view_end(O_(i+1))[top]`, and it is not the case that `O_(i+1) -view-> S`.
444445

445-
#### Proof of `(VIEW-STEAL-INTER-GROUP)`.
446+
#### Proof of `(VIEW-STEAL-INTER-GROUP)`
446447

447448
By `(VIEW-OWNER)` and `(VIEW-STEAL)`, we have `view_beginning(S_i)[bottom] <= Timestamp(WL_(i+1))`
448449
and `Timestamp(WF_j) <= view_end(S_j)[bottom]`. Thus if either `i+1 < j`,
@@ -456,7 +457,7 @@ Now suppose otherwise. Then `j = i+1`, `O_(i+1)` is `pop()` taking the irregular
456457
< Timestamp(top, y) <= Timestamp(top, x-1) <= view_end(S_j)[top]`, and it is not the case that `S_j
457458
-view-> S_i`.
458459

459-
#### Proof of `(VIEW-STEAL-INTRA-GROUP)`.
460+
#### Proof of `(VIEW-STEAL-INTRA-GROUP)`
460461

461462
- Case 1: `S, S' ∈ STEAL`.
462463

@@ -485,14 +486,19 @@ Let `I_0`, ..., `I_(n-1)` be the invocations sorted according to the constructed
485486
order. In addition to `(SEQ)` and `(SYNC)`, we will simultaneously prove the following conditions
486487
with the same existentially quantified values of `t_i, b_i, A_i`:
487488

488-
> `(CONTENT)`: for all `i` and `x ∈ [t_i, b_i)`, there exists a `push()` invocation into `x` in
489-
> `I_0, ..., I_(i-1)`; and `A_i[x]` is the value inserted by the last such invocation.
489+
> `(BOTTOM)`: If `I_i` is an owner invocation, `b_i` equals to the value `I_i` read from `bottom` at
490+
> `'L101` or `'L201`.
491+
>
492+
> `(TOP)`: `t_i` equals to the value of the last write to `top` in `I_0`, ..., `I_(i-1)`. Also, for
493+
> each `x ∈ [0,t_i)`, there is exactly one invocation in `I_0`, ..., `I_(i-1)` that performs a CAS
494+
> that updates `top = x` to `x+1`.
490495
>
491-
> `(TOP)`: `t_i` equals to the value of the last write to `top` in `I_0`, ..., `I_(i-1)`.
496+
> `(CONTENTS)`: for all `i` and `x ∈ [t_i, b_i)`, there exists a `push()` invocation into `x` in
497+
> `I_0, ..., I_(i-1)`; and `A_i[x]` is the value inserted by the last such invocation.
492498
493-
We prove that `{I_i}` satisfies `(SEQ)`, `(SYNC)`, `(CONTENT)`, and `(TOP)` by induction on `n`:
494-
suppose `I_0`, ..., `I_(i-1)` satisfies those conditions, and let's prove that `I_i` also satisfies
495-
those conditions. We prove for each case of `I_i`.
499+
We prove that `{I_i}` satisfies `(SEQ)`, `(SYNC)`, `(BOTTOM)`, `(TOP)`, and `(CONTENTS)` by
500+
induction on `n`: suppose `I_0`, ..., `I_(i-1)` satisfies those conditions, and let's prove that
501+
`I_i` also satisfies those conditions. We prove for each case of `I_i`.
496502

497503
- Case 1: `I_i` is `push()`.
498504

@@ -501,57 +507,57 @@ those conditions. We prove for each case of `I_i`.
501507
- Case 2: `I_i` is `pop()` taking the regular path.
502508

503509
Let `x` and `y` be the values `I_i` read from `bottom` at `'L201` and `top` at `'L204`,
504-
respectively. Then we have `x = b_i`, as `bottom` is modified only by the writer. Since `I_i` is
505-
taking the regular path, we have `y+2 <= x`. We also prove `t_i <= y+1`. Consider the invocation
506-
`I` that writes `y+2` to `top`, if exists. (Otherwise, `t_i <= y+1` should obviously hold.) If `I`
507-
is `pop()`, then `I` should be linearized after `I_i` thanks to the coherence of `top`; if `I` is
508-
`steal()`, by the synchronization of the seqcst fences from `I_i` to `I` via `top`, `I` should load a
509-
value that is coherence-after-or `WL_j` from `bottom`, where `j` is such an index that `I_i =
510-
O_j`. Thus `I` is linearized after `I_i`, and `t_i <= y+1` holds.
510+
respectively. By `(BOTTOM)`, we have `x = b_i`. Since `I_i` is taking the regular path, we have
511+
`y+2 <= x`. We also prove `t_i <= y+1`. Consider the invocation `I` that writes `y+2` to `top`, if
512+
exists. (Otherwise, `t_i <= y+1` should obviously hold.) If `I` is `pop()`, then `I` should be
513+
linearized after `I_i` thanks to the coherence of `top`; if `I` is `steal()`, then by the
514+
synchronization of the seqcst-fences from `I_i` to `I` via `top`, the value `I` read from `bottom`
515+
should be coherence-after-or `WL_j`, where `j` is such an index that `I_i = O_j`. Thus `I` is
516+
linearized after `I_i`, and `t_i <= y+1` holds.
511517

512518
Then we have `t_i <= y+1 < y+2 <= x = b_i`, and it is legit to pop a value from the bottom end of
513519
the deque and decrease `bottom`.
514520

515521
It remains to prove that `I_i` returns the right value. Let `O_k` be the last `push()` operation
516-
in `I_0`, ..., `I_(i-1)` that pushes to the index `x` and writes `bottom = x+1`, and `v` be the
517-
value `O_k` pushes. Let's prove that `I_i` returns `v`.
522+
in `I_0`, ..., `I_(i-1)` that pushes to the index `x-1` and writes `bottom = x`, and `v` be the
523+
value `O_k` pushes. Thanks to `(CONTENTS)`, it is sufficient to prove that `I_i` returns `v`.
518524

519525
For all `l`, let `WB_l` be the value of `buffer` at the end of the invocation `O_l`. Also, for all
520-
`z`, let `WC[l, z]` be the `z`-th contents of the buffer `WB_l` at the end of the invocation
526+
`z`, let `WC_(l, z)` be the `z`-th contents of the buffer `WB_l` at the end of the invocation
521527
`O_l`. They are well-defined since the pointer `buffer` and the contents of the buffer are
522528
modified only by the owner.
523529

524-
Let's prove by induction that for all `l ∈ [k, j)`, `WC[l, x % size(WB_l)] = v`. Since `O_k` just
525-
pushed a value to `x`, it trivially holds for the base case `l = k`. Now suppose that it holds for
526-
`l = m` for some `m ∈ [k, j-1)` and prove that it holds for `l = m+1`. Since `O_k` is the last
527-
operation that writes to the index `x`, `O_(m+1)` is not a regular `pop()` that writes `bottom =
528-
x`. Let `z` and `w` be the values `O_(m+1)` read from `bottom` at `'L301` and `top` at `'L302`,
529-
respectively, if `O_(m+1)` is resizing. Then we have `z > x` by the choice of `O_k`, and `w <= x`
530-
by the coherence on `top`. Thus `WC[m, x % size(WB_m)] = v` is copied to `WC[m+1, x %
531-
size(WB_(m+1))]`.
530+
Let's prove by induction that for all `l ∈ [k, j)`, `WC_(l, (x-1) % size(WB_l)) = v`. Since `O_k`
531+
just pushed a value to the index `x-1`, it trivially holds for the base case `l = k`. Now suppose
532+
that it holds for `l = m` for some `m ∈ [k, j-1)` and prove that it holds for `l = m+1`. Since
533+
`O_k` is the last operation that writes `bottom = x` and `I_i` writes `bottom = x-1`, `O_(m+1)` is
534+
not a regular `pop()` that writes `bottom = x-1`. Let `z` and `w` be the values `O_(m+1)` read
535+
from `bottom` at `'L301` and `top` at `'L302`, respectively, if `O_(m+1)` is resizing. Then we
536+
have `z >= x` by the choice of `O_k`, and `w <= y < x` by the coherence on `top`. Thus `WC_(m,
537+
(x-1) % size(WB_m)) = v` is copied to `WC_(m+1, (x-1) % size(WB_(m+1)))`.
532538

533-
Thus `I_i = O_j` returns `WC[j-1, x % size(WB_(j-1))] = v`.
539+
Thus `I_i = O_j` returns `WC_(j-1, (x-1) % size(WB_(j-1))) = v`.
534540

535541
- Case 3: `I_i` is `pop()` taking the irregular path.
536542

537543
Let `x` and `y` be the values `I_i` read from `bottom` at `'L201` and `top` at `'L204`,
538-
respectively. Similarly to the above case, we have `x = b_i`. Since `I_i` takes the irregular
539-
path, we have `y >= x-1`.
544+
respectively. By `(BOTTOM)`, we have `x = b_i`. Since `I_i` takes the irregular path, we have `y
545+
>= x-1`.
540546
541547
+ Case `y >= x`.
542548

543549
We prove `x <= t_i` as follows. Consider the invocation `I` that writes `x` to `top`. If `I` is
544550
`pop()`, then `I` should be linearized before `I_i` thanks to the coherence of `top`. Suppose
545-
`I` is `steal()`. Let `W` be the message `I` read from `bottom` at `'L404`. Then since `I`
546-
returns a value, we have `x <= Value(W)`. It is sufficient to prove that `W <= WL_j` in the
547-
coherence order, where `j` is such an index that `I_i = O_j`. Let's suppose otherwise. If
548-
`O_(j+1)` is `pop()`, then `W ≠ WF_(j+1)` since `Value(WF_(j+1)) = x-1`. Otherwise, `W` is
549-
either a release-store, or there is an seqcst fence between `I_i`'s read from `top` at `'L204` and
550-
`W`. But this is impossible: in order for `I` to read from `W` at `'L404` and `I_i` to read `top
551-
>= x` at `'L204` at the same time, either `I`'s write to `top` at `'L410` should be promised
552-
before reading `bottom` at `'L404` or `W` should be promised before `I_i` reading `top` at
553-
`'L204`, but the former is a release-store and the latter is either a release-store or after an
554-
seqcst fence.
551+
`I` is `steal()`. Let `W` be the message `I` read from `bottom` at `'L404`. Since `I` returns a
552+
value, we have `x <= Value(W)`. It is sufficient to prove that `W <= WL_j` in the coherence
553+
order, where `j` is such an index that `I_i = O_j`. Let's suppose otherwise. If `O_(j+1)` is
554+
`pop()`, then `W ≠ WF_(j+1)` since `Value(WF_(j+1)) = x-1`. Otherwise, `W` is either a
555+
release-store, or there is a seqcst-fence between `I_i`'s read from `top` at `'L204` and `W`. In
556+
order for `I` to read from `W` at `'L404` and `I_i` to read `top >= x` at `'L204` at the same
557+
time, either `I`'s write to `top` at `'L410` should be promised before reading `bottom` at
558+
`'L404` or `W` should be promised before `I_i` reading `top` at `'L204`. But this is impossible:
559+
the former is a release-store and the latter is either a release-store or after a release-store
560+
or a seqcst-fence.
555561

556562
Thus we have `b_i = x <= t_i`, and it is legit for `I_i` to go to `'L207`, restore the original
557563
value of `bottom`, and return `EMPTY`.
@@ -571,16 +577,16 @@ those conditions. We prove for each case of `I_i`.
571577

572578
+ Case `y = x-1`.
573579

574-
Then `I_i` performs a (strong) compare-and-swap (CAS) at `'L213`.
580+
Then `I_i` performs a strong CAS at `'L213`.
575581

576582
If the CAS fails, then `I_i` reads `top >= x` at `'L213` and writes `bottom = x` at
577583
`'L216`. Let's prove that `x <= t_i`. Consider the invocation `I` that writes `x` to `top`. If
578-
`I` is `pop()`, then `I` should be linearized before `I_i` thanks to the coherence of
579-
`top`. Suppose `I` is `steal()`. Then there is a release-acquire synchronization from `I`'s
580-
write to `top` at `'L410` to `I_i`'s read from `top` at `'L213`, and `I` should load a value
581-
that is coherence-before `WL_j` from `bottom`, where `j` is such an index that `I_i = O_j`. Thus
582-
`I` is linearized before `I_i`, and `x <= t_i` holds. Thus we have `b_i = x <= t_i`, and it is
583-
legit for `I_i` to return `EMPTY`.
584+
`I` is `pop()`, then `I` should be linearized before `I_i` thanks to the coherence of `top`. If
585+
`I` is `steal()`, then there is a release-acquire synchronization from `I`'s write to `top` at
586+
`'L410` to `I_i`'s read from `top` at `'L213`, and the value `I` read from `bottom` at `'L404`
587+
should be coherence-before `WL_j`, where `j` is such an index that `I_i = O_j`. Thus `I` is
588+
linearized before `I_i`. In either case, `x <= t_i` holds. Thus we have `b_i = x <= t_i`, and it
589+
is legit for `I_i` to return `EMPTY`.
584590

585591
If the CAS succeeds, then `I_i` updates `top` from `x-1` to `x` at `'L213` and writes `bottom =
586592
x` at `'L216`. Let's prove that `x-1 <= t_i`. Consider the invocation `I` that writes `x-1` to
@@ -591,13 +597,15 @@ those conditions. We prove for each case of `I_i`.
591597
order for `I` to read from `W` at `'L404` and `I_i` to read `top = x-1` at `'L204` at the same
592598
time, either `I`'s write to `top` at `'L410` should be promised before reading `bottom` at
593599
`'L404` or `W` should be promised before `I_i` reading `top` at `'L204`. But this is impossible:
594-
the former is a release-store, and the latter (1) has a control dependency to the read from
595-
`top` at `'L204`, (2) is a release-store, or (3) is after an seqcst fence.
600+
the former is a release-store and the latter is either (1) `WF_(j+1)` where `O_(j+1)` is `pop()`
601+
so that `W` has a genuine control dependency on `I_i`'s read from `top` at `'L204`, (2) a
602+
release-store, or (3) after a release-store or a seqcst-fence.
596603

597-
Since `I_i` writes `x` to `top`, we have `t_i = x-1`. Thus `t_i = y = x-1 = (b_i)-1`, and it is
598-
legit to pop the value from the `bottom` end of the deque and increase `top`.
604+
By `(TOP)`, `x-1 <= t_i`, and the fact that `I_i` writes `x` to `top`, we have `t_i = x-1`. Thus
605+
`t_i = y = x-1 = (b_i)-1`, and it is legit to pop the value from the `bottom` end of the deque
606+
and increase `top`.
599607

600-
`I_i` returns the right value for the same reason as above.
608+
`I_i` returns the right value for roughly the same reason as above.
601609

602610
<!-- r t x-2 -->
603611
<!-- ------- -->
@@ -611,6 +619,8 @@ those conditions. We prove for each case of `I_i`.
611619
<!-- u t x-1 x -->
612620
<!-- w b x -->
613621

622+
<!-- r t x-2 -->
623+
614624
<!-- e.g. -->
615625

616626
<!-- [steal] -->
@@ -663,9 +673,9 @@ those conditions. We prove for each case of `I_i`.
663673

664674
Let's first prove that for all regular `pop()` invocation `J` that writes `bottom = y` at `'L202`,
665675
`J` should be linearized before `I_i`. In order for `J` to enter the regular path, `J` should have
666-
read from `top` a value `<= y-1`. Then by the synchronization of the seqcst fences from `J` to `I_i`
667-
via `top`, the value `I_i` read from `bottom` at `'L404` is coherence-after-or the value written
668-
by `J` at `'L202`. This `J` is linearized before `I_i`.
676+
read from `top` a value `<= y-1`. Then by the synchronization of the seqcst-fences from `J` to
677+
`I_i` via `top`, the value `I_i` read from `bottom` at `'L404` is coherence-after-or the value
678+
written by `J` at `'L202`. This `J` is linearized before `I_i`.
669679

670680
Now let's prove that `O_k` is linearized before `I_i`. If `O_k` is the only such an invocation
671681
that writes `bottom = y+1` at `'L110`, then the value `I_i` read from `bottom` should be
@@ -717,17 +727,17 @@ those conditions. We prove for each case of `I_i`.
717727

718728
+ Case `k < l`.
719729

720-
Let's prove by induction that for all `m ∈ [k, l]`, `WC[m, y % size(WB_m)] = v`. By assumption,
730+
Let's prove by induction that for all `m ∈ [k, l]`, `WC_(m, y % size(WB_m)) = v`. By assumption,
721731
it holds for `m = k`. Now suppose that it holds for `m = n` for some `n ∈ [k, l)` and let's
722732
prove that it holds for `m = n+1`. Let `f` and `g` be the values `O_(n+1)` read from `bottom`
723733
and `top`. Then we have `g <= w <= y < b_(n+1) = f`. Since `k < n+1`, `O_(n+1)` is not a regular
724-
`pop()` that writes `bottom = y`. If `O_(n+1)` is resizing, then since `g <= y < f`, `WC[n, y %
725-
size(WB_n)]` is copied to `WC[n+1, y % size(WB_(n+1))]`. Thus we have `WC[n+1, y %
726-
size(WB_(n+1))] = v`.
734+
`pop()` that writes `bottom = y`. If `O_(n+1)` is resizing, then since `g <= y < f`, `WC_(n, y %
735+
size(WB_n))` is copied to `WC_(n+1, y % size(WB_(n+1)))`. Thus we have `WC_(n+1, y %
736+
size(WB_(n+1))) = v`.
727737

728738
By the release-acquire synchronization from `O_l`'s write to `buffer` at `'L307` to `I_i`'s read
729739
from `buffer` at `'L408`, the value `I_i` read from the buffer's content at `'L409` should be
730-
coherence-after-or the value `WC[l, y % size(WB_l)] = v` that `O_l` wrote at `'L305`. Similarly
740+
coherence-after-or the value `WC_(l, y % size(WB_l)) = v` that `O_l` wrote at `'L305`. Similarly
731741
to the above case, `I_i`'s read happens before any overwrites to the same location. Thus `I_i`
732742
should have read `v` at `'L409`.
733743

@@ -853,13 +863,13 @@ calls `pop()`, and the stealer thread calls `steal()` twice. Consider the follow
853863
// stealer calls steal()
854864
'L11: read(top, 0, Relaxed)
855865
'L12: fence(SeqCst)
856-
'L13: read(bottom, 0, Relaxed) // from `'L02`
866+
'L13: read(bottom, 0, Acquire) // from `'L02`
857867
// return `Empty`
858868

859869
// stealer calls steal()
860870
'L14: read(top, 0, Relaxed)
861871
'L15: fence(SeqCst)
862-
'L16: read(bottom, 1, Relaxed) // from `'L05`
872+
'L16: read(bottom, 1, Acquire) // from `'L05`
863873
'L17: CAS(top, 0, 1, Release) // success
864874
// return `42`
865875
```
@@ -931,7 +941,7 @@ actually be slightly inefficient in this case.
931941
Alternatively, we can write a deque for each target architecture in order to achieve better
932942
performance. For example, [this paper][deque-bounded-tso] presents a variant of Chase-Lev deque in
933943
the "bounded TSO" x86 model, where you don't need to issue the expensive `MFENCE` barrier (think:
934-
seqcst fence) in `pop()`. Also, [this paper][chase-lev-weak] presents a version of Chase-Lev deque
944+
seqcst-fence) in `pop()`. Also, [this paper][chase-lev-weak] presents a version of Chase-Lev deque
935945
for ARMv7 that doesn't issue an `isync`-like fence, while the proposed implementation issues
936946
some. Probably `Consume` is relevant for the latter. These further optimizations are left as future
937947
work.

0 commit comments

Comments
 (0)