WIP

jeehoonkang · jeehoonkang · commit 63987097743f · 2018-01-10T00:59:25.000+09:00
diff --git a/text/2018-01-07-deque-proof.md b/text/2018-01-07-deque-proof.md
@@ -363,7 +363,7 @@ the linearization order as follows:
 
 ### Auxiliary Lemma
 
-- > Let `WF_i` and `WL_i` be `O_i`'s first and last write to `bottom`.
+Let `WF_i` and `WL_i` be `O_i`'s first and last write to `bottom`.
 
 - > (VIEW-OWNER): For arbitrary `i`, since `O_(i-1)` wrote `WF_(i-1), WL_(i-1)` and `O_i` wrote
   `WF_i, WL_i`, we have `Timestamp(WL_(i-1)) <= view_beginning(O_i)[bottom] <
@@ -397,11 +397,12 @@ the linearization order as follows:
     In order for `S` to read `bottom = x` at `'L404` and `O_i` to read `top >= x` at `'L204` at the
     same time, either `S`'s write to `top` at `'L410` should be promised before reading `bottom` at
     `'L404` or `O_i`'s write to `bottom` at `'L207` should be promised before reading `top` at
-    `'L204`. But this is impossible: `S`'s write to `top` at `'L410` is a release-store; and in
-    order for `O_i` to promise to write `bottom = x` at `'L207`, `O_i` should read `top = x-1` at
-    `'L204` and then execute the CAS at `'L213` in certification, but (1) it is impossible to
-    succeed the CAS in arbitrary future memories, and (2) after failing the CAS and acquiring
-    `top >= x`, it is impossible to fulfill the promise in arbitrary future memories.
+    `'L204`. But neither is possible. The former is impossible because `S`'s write to `top` at
+    `'L410` is a release-store. The latter is impossible because, in order for `O_i` to promise to
+    write `bottom = x` at `'L207`, `O_i` must read `top = x-1` at `'L204` and then execute the CAS at
+    `'L213` during certification. But there is a future memory that forces the CAS to fail and
+    acquire an arbitrarily high view, after which the promise can no longer be fulfilled. In short,
+    there is a semantic dependency from `O_i`'s read from `top` at `'L204` to `O_i`'s write to
+    `bottom` at `'L207`.
 
   + Case `O_i` executes the CAS at `'L213`.
 
@@ -424,12 +425,12 @@ For `(VIEW)`, it is sufficient to prove that:
 
 (Note that what would be `(VIEW-OWNER-OWNER)` is obvious.)
 
-#### Proof of `(VIEW-OWNER-STEAL)`.
+#### Proof of `(VIEW-OWNER-STEAL)`
 
 By `(VIEW-OWNER)` and `(VIEW-STEAL)`, we have `view_beginning(O_i)[bottom] < Timestamp(WF_i) <=
 Timestamp(WF_j) <= view_end(S)[bottom]`. Thus it is not the case that `S -view-> O_i`.
 
-#### Proof of `(VIEW-STEAL-OWNER)`.
+#### Proof of `(VIEW-STEAL-OWNER)`
 
 By `(VIEW-OWNER)` and `(VIEW-STEAL)`, we have `view_beginning(S)[bottom] <= Timestamp(WL_(i+1)) <=
 Timestamp(WL_j) <= view_end(O_j)[bottom]`. If `view_beginning(S)[bottom] < view_end(O_j)[bottom]`,
@@ -442,7 +443,7 @@ is `pop()` taking the irregular path. Let `x` be the value `O_(i+1)` read from `
 `O_(i+1)` read a value `>= x` from `top`. Thus we have
 `view_beginning(S)[top] < Timestamp(top, x) <= view_end(O_(i+1))[top]`, and it is not the case
 that `O_(i+1) -view-> S`.
 
-#### Proof of `(VIEW-STEAL-INTER-GROUP)`.
+#### Proof of `(VIEW-STEAL-INTER-GROUP)`
 
 By `(VIEW-OWNER)` and `(VIEW-STEAL)`, we have `view_beginning(S_i)[bottom] <= Timestamp(WL_(i+1))`
 and `Timestamp(WF_j) <= view_end(S_j)[bottom]`. Thus if either `i+1 < j`,
@@ -456,7 +457,7 @@ Now suppose otherwise. Then `j = i+1`, `O_(i+1)` is `pop()` taking the irregular
 < Timestamp(top, y) <= Timestamp(top, x-1) <= view_end(S_j)[top]`, and it is not the case that `S_j
 -view-> S_i`.
 
-#### Proof of `(VIEW-STEAL-INTRA-GROUP)`.
+#### Proof of `(VIEW-STEAL-INTRA-GROUP)`
 
 - Case 1: `S, S' ∈ STEAL`.
 
@@ -485,14 +486,19 @@ Let `I_0`, ..., `I_(n-1)` be the invocations sorted according to the constructed
 order. In addition to `(SEQ)` and `(SYNC)`, we will simultaneously prove the following conditions
 with the same existentially quantified values of `t_i, b_i, A_i`:
 
-> `(CONTENT)`: for all `i` and `x ∈ [t_i, b_i)`, there exists a `push()` invocation into `x` in
-> `I_0, ..., I_(i-1)`; and `A_i[x]` is the value inserted by the last such invocation.
+> `(BOTTOM)`: If `I_i` is an owner invocation, `b_i` equals the value `I_i` read from `bottom` at
+> `'L101` or `'L201`.
+>
+> `(TOP)`: `t_i` equals the value of the last write to `top` in `I_0`, ..., `I_(i-1)`. Also, for
+> each `x ∈ [0,t_i)`, there is exactly one invocation in `I_0`, ..., `I_(i-1)` that performs a CAS
+> updating `top` from `x` to `x+1`.
 >
-> `(TOP)`: `t_i` equals to the value of the last write to `top` in `I_0`, ..., `I_(i-1)`.
+> `(CONTENTS)`: for all `i` and `x ∈ [t_i, b_i)`, there exists a `push()` invocation into `x` in
+> `I_0, ..., I_(i-1)`; and `A_i[x]` is the value inserted by the last such invocation.
 
-We prove that `{I_i}` satisfies `(SEQ)`, `(SYNC)`, `(CONTENT)`, and `(TOP)` by induction on `n`:
-suppose `I_0`, ..., `I_(i-1)` satisfies those conditions, and let's prove that `I_i` also satisfies
-those conditions. We prove for each case of `I_i`.
+We prove that `{I_i}` satisfies `(SEQ)`, `(SYNC)`, `(BOTTOM)`, `(TOP)`, and `(CONTENTS)` by
+induction on `i`: suppose `I_0`, ..., `I_(i-1)` satisfy these conditions, and let's prove that
+`I_i` also satisfies them. We proceed by case analysis on `I_i`.
 
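The abstract state `(t_i, b_i, A_i)` constrained by `(BOTTOM)`, `(TOP)`, and `(CONTENTS)` can be pictured as a sequential deque. The following is a minimal Rust sketch of that sequential specification. It assumes that live elements occupy the indices `[t, b)` and that popping the last element advances `t` (as the CAS path of `pop()` does) instead of decreasing `b`. The type `SeqDeque` and its fields are hypothetical names for illustration, not code from this RFC.

```rust
use std::collections::HashMap;

/// Hypothetical sequential model of the abstract deque state `(t, b, A)`.
/// Live elements occupy the indices `[t, b)`, and `a[x]` is the value
/// inserted by the last `push()` into index `x` (cf. `(CONTENTS)`).
#[derive(Debug, Default)]
pub struct SeqDeque {
    pub t: i64,
    pub b: i64,
    pub a: HashMap<i64, u32>,
}

impl SeqDeque {
    /// `push()`: store the value at index `b`, then increment `b`.
    pub fn push(&mut self, v: u32) {
        self.a.insert(self.b, v);
        self.b += 1;
    }

    /// `pop()`: take from the bottom end, or return `None` (EMPTY) if `b <= t`.
    pub fn pop(&mut self) -> Option<u32> {
        if self.b <= self.t {
            return None;
        }
        if self.b - self.t == 1 {
            // Last element: like the CAS path, advance `t` and leave `b` as is.
            let v = self.a[&self.t];
            self.t += 1;
            Some(v)
        } else {
            // Regular path: decrement `b` and return the bottom element.
            self.b -= 1;
            Some(self.a[&self.b])
        }
    }

    /// `steal()`: take from the top end, or return `None` (EMPTY) if `b <= t`.
    pub fn steal(&mut self) -> Option<u32> {
        if self.b <= self.t {
            return None;
        }
        let v = self.a[&self.t];
        self.t += 1;
        Some(v)
    }
}
```

In this model `t` only grows, one step at a time, through `steal()` or the last-element `pop()`. This mirrors the uniqueness clause of `(TOP)`.
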
 - Case 1: `I_i` is `push()`.
 
@@ -501,57 +507,57 @@ those conditions. We prove for each case of `I_i`.
 - Case 2: `I_i` is `pop()` taking the regular path.
 
   Let `x` and `y` be the values `I_i` read from `bottom` at `'L201` and `top` at `'L204`,
-  respectively. Then we have `x = b_i`, as `bottom` is modified only by the writer. Since `I_i` is
-  taking the regular path, we have `y+2 <= x`. We also prove `t_i <= y+1`. Consider the invocation
-  `I` that writes `y+2` to `top`, if exists. (Otherwise, `t_i <= y+1` should obviously hold.) If `I`
-  is `pop()`, then `I` should be linearized after `I_i` thanks to the coherence of `top`; if `I` is
-  `steal()`, by the synchronization of the seqcst fences from `I_i` to `I` via `top`, `I` should load a
-  value that is coherence-after-or `WL_j` from `bottom`, where `j` is such an index that `I_i =
-  O_j`. Thus `I` is linearized after `I_i`, and `t_i <= y+1` holds.
+  respectively. By `(BOTTOM)`, we have `x = b_i`. Since `I_i` is taking the regular path, we have
+  `y+2 <= x`. We next show `t_i <= y+1`. Consider the invocation `I` that writes `y+2` to `top`, if
+  it exists. (Otherwise, `t_i <= y+1` holds trivially.) If `I` is `pop()`, then `I` must be
+  linearized after `I_i` thanks to the coherence of `top`; if `I` is `steal()`, then by the
+  synchronization of the seqcst-fences from `I_i` to `I` via `top`, the value `I` read from `bottom`
+  must be coherence-after-or `WL_j`, where `j` is the index such that `I_i = O_j`. Thus `I` is
+  linearized after `I_i`, and `t_i <= y+1` holds.
 
   Then we have `t_i <= y+1 < y+2 <= x = b_i`, and it is legit to pop a value from the bottom end of
   the deque and decrease `bottom`.
 
   It remains to prove that `I_i` returns the right value. Let `O_k` be the last `push()` operation
-  in `I_0`, ..., `I_(i-1)` that pushes to the index `x` and writes `bottom = x+1`, and `v` be the
-  value `O_k` pushes. Let's prove that `I_i` returns `v`.
+  in `I_0`, ..., `I_(i-1)` that pushes to the index `x-1` and writes `bottom = x`, and let `v` be
+  the value `O_k` pushes. Thanks to `(CONTENTS)`, it suffices to prove that `I_i` returns `v`.
 
   For all `l`, let `WB_l` be the value of `buffer` at the end of the invocation `O_l`. Also, for all
-  `z`, let `WC[l, z]` be the `z`-th contents of the buffer `WB_l` at the end of the invocation
+  `z`, let `WC_(l, z)` be the content of the `z`-th slot of the buffer `WB_l` at the end of the invocation
   `O_l`. They are well-defined since the pointer `buffer` and the contents of the buffer are
   modified only by the owner.
 
-  Let's prove by induction that for all `l ∈ [k, j)`, `WC[l, x % size(WB_l)] = v`. Since `O_k` just
-  pushed a value to `x`, it trivially holds for the base case `l = k`. Now suppose that it holds for
-  `l = m` for some `m ∈ [k, j-1)` and prove that it holds for `l = m+1`. Since `O_k` is the last
-  operation that writes to the index `x`, `O_(m+1)` is not a regular `pop()` that writes `bottom =
-  x`. Let `z` and `w` be the values `O_(m+1)` read from `bottom` at `'L301` and `top` at `'L302`,
-  respectively, if `O_(m+1)` is resizing. Then we have `z > x` by the choice of `O_k`, and `w <= x`
-  by the coherence on `top`. Thus `WC[m, x % size(WB_m)] = v` is copied to `WC[m+1, x %
-  size(WB_(m+1))]`.
+  Let's prove by induction that for all `l ∈ [k, j)`, `WC_(l, (x-1) % size(WB_l)) = v`. Since `O_k`
+  just pushed a value to the index `x-1`, it trivially holds for the base case `l = k`. Now suppose
+  that it holds for `l = m` for some `m ∈ [k, j-1)` and prove that it holds for `l = m+1`. Since
+  `O_k` is the last operation that writes `bottom = x` and `I_i` writes `bottom = x-1`, `O_(m+1)` is
+  not a regular `pop()` that writes `bottom = x-1`. If `O_(m+1)` is resizing, let `z` and `w` be the
+  values `O_(m+1)` read from `bottom` at `'L301` and `top` at `'L302`, respectively. Then we have
+  `z >= x` by the choice of `O_k`, and `w <= y < x` by the coherence of `top`. Thus `x-1` lies in
+  the range `[w, z)` copied by the resize, so `WC_(m, (x-1) % size(WB_m)) = v` is copied to
+  `WC_(m+1, (x-1) % size(WB_(m+1)))`.
 
-  Thus `I_i = O_j` returns `WC[j-1, x % size(WB_(j-1))] = v`.
+  Thus `I_i = O_j` returns `WC_(j-1, (x-1) % size(WB_(j-1))) = v`.
 
 - Case 3: `I_i` is `pop()` taking the irregular path.
 
   Let `x` and `y` be the values `I_i` read from `bottom` at `'L201` and `top` at `'L204`,
-  respectively. Similarly to the above case, we have `x = b_i`. Since `I_i` takes the irregular
-  path, we have `y >= x-1`.
+  respectively. By `(BOTTOM)`, we have `x = b_i`. Since `I_i` takes the irregular path, we have
+  `y >= x-1`.
 
   + Case `y >= x`.
 
     We prove `x <= t_i` as follows. Consider the invocation `I` that writes `x` to `top`. If `I` is
     `pop()`, then `I` should be linearized before `I_i` thanks to the coherence of `top`. Suppose
-    `I` is `steal()`. Let `W` be the message `I` read from `bottom` at `'L404`. Then since `I`
-    returns a value, we have `x <= Value(W)`. It is sufficient to prove that `W <= WL_j` in the
-    coherence order, where `j` is such an index that `I_i = O_j`. Let's suppose otherwise. If
-    `O_(j+1)` is `pop()`, then `W ≠ WF_(j+1)` since `Value(WF_(j+1)) = x-1`. Otherwise, `W` is
-    either a release-store, or there is an seqcst fence between `I_i`'s read from `top` at `'L204` and
-    `W`. But this is impossible: in order for `I` to read from `W` at `'L404` and `I_i` to read `top
-    >= x` at `'L204` at the same time, either `I`'s write to `top` at `'L410` should be promised
-    before reading `bottom` at `'L404` or `W` should be promised before `I_i` reading `top` at
-    `'L204`, but the former is a release-store and the latter is either a release-store or after an
-    seqcst fence.
+    `I` is `steal()`. Let `W` be the message `I` read from `bottom` at `'L404`. Since `I` returns a
+    value, we have `x <= Value(W)`. It suffices to prove that `W <= WL_j` in the coherence order,
+    where `j` is the index such that `I_i = O_j`. Suppose otherwise. If `O_(j+1)` is `pop()`, then
+    `W ≠ WF_(j+1)` since `Value(WF_(j+1)) = x-1`. Otherwise, `W` is either a release-store, or there
+    is a seqcst-fence between `I_i`'s read from `top` at `'L204` and `W`. In order for `I` to read
+    from `W` at `'L404` and `I_i` to read `top >= x` at `'L204` at the same time, either `I`'s write
+    to `top` at `'L410` must be promised before `I` reads `bottom` at `'L404`, or `W` must be
+    promised before `I_i` reads `top` at `'L204`. But this is impossible: the former is a
+    release-store, and the latter is either a release-store or after a release-store or a
+    seqcst-fence.
 
     Thus we have `b_i = x <= t_i`, and it is legit for `I_i` to go to `'L207`, restore the original
     value of `bottom`, and return `EMPTY`.
@@ -571,16 +577,16 @@ those conditions. We prove for each case of `I_i`.
 
   + Case `y = x-1`.
 
-    Then `I_i` performs a (strong) compare-and-swap (CAS) at `'L213`.
+    Then `I_i` performs a strong CAS at `'L213`.
 
     If the CAS fails, then `I_i` reads `top >= x` at `'L213` and writes `bottom = x` at
     `'L216`. Let's prove that `x <= t_i`. Consider the invocation `I` that writes `x` to `top`. If
-    `I` is `pop()`, then `I` should be linearized before `I_i` thanks to the coherence of
-    `top`. Suppose `I` is `steal()`. Then there is a release-acquire synchronization from `I`'s
-    write to `top` at `'L410` to `I_i`'s read from `top` at `'L213`, and `I` should load a value
-    that is coherence-before `WL_j` from `bottom`, where `j` is such an index that `I_i = O_j`. Thus
-    `I` is linearized before `I_i`, and `x <= t_i` holds. Thus we have `b_i = x <= t_i`, and it is
-    legit for `I_i` to return `EMPTY`.
+    `I` is `pop()`, then `I` must be linearized before `I_i` thanks to the coherence of `top`. If
+    `I` is `steal()`, then there is a release-acquire synchronization from `I`'s write to `top` at
+    `'L410` to `I_i`'s read from `top` at `'L213`, so the value `I` read from `bottom` at `'L404`
+    must be coherence-before `WL_j`, where `j` is the index such that `I_i = O_j`. Thus `I` is
+    linearized before `I_i`. In either case, `x <= t_i` holds. Hence `b_i = x <= t_i`, and it
+    is legit for `I_i` to return `EMPTY`.
 
     If the CAS succeeds, then `I_i` updates `top` from `x-1` to `x` at `'L213` and writes `bottom =
     x` at `'L216`. Let's prove that `x-1 <= t_i`. Consider the invocation `I` that writes `x-1` to
@@ -591,13 +597,15 @@ those conditions. We prove for each case of `I_i`.
     order for `I` to read from `W` at `'L404` and `I_i` to read `top = x-1` at `'L204` at the same
     time, either `I`'s write to `top` at `'L410` should be promised before reading `bottom` at
     `'L404` or `W` should be promised before `I_i` reading `top` at `'L204`. But this is impossible:
-    the former is a release-store, and the latter (1) has a control dependency to the read from
-    `top` at `'L204`, (2) is a release-store, or (3) is after an seqcst fence.
+    the former is a release-store, and the latter is either (1) `WF_(j+1)` where `O_(j+1)` is
+    `pop()`, in which case `W` has a genuine control dependency on `I_i`'s read from `top` at
+    `'L204`, (2) a release-store, or (3) after a release-store or a seqcst-fence.
 
-    Since `I_i` writes `x` to `top`, we have `t_i = x-1`. Thus `t_i = y = x-1 = (b_i)-1`, and it is
-    legit to pop the value from the `bottom` end of the deque and increase `top`.
+    Combining `x-1 <= t_i` with `(TOP)` and the fact that `I_i`'s CAS updates `top` from `x-1` to
+    `x`, we have `t_i = x-1`. Thus `t_i = y = x-1 = (b_i)-1`, and it is legit to pop the value from
+    the `bottom` end of the deque and increment `top`.
 
-    `I_i` returns the right value for the same reason as above.
+    `I_i` returns the right value by essentially the same argument as in Case 2.
 
     <!-- r t x-2 -->
     <!-- ------- -->
@@ -611,6 +619,8 @@ those conditions. We prove for each case of `I_i`.
     <!-- u t x-1 x -->
     <!-- w b x -->
 
+    <!-- r t x-2 -->
+
     <!-- e.g. -->
 
     <!-- [steal] -->
@@ -663,9 +673,9 @@ those conditions. We prove for each case of `I_i`.
 
   Let's first prove that for every regular `pop()` invocation `J` that writes `bottom = y` at `'L202`,
   `J` should be linearized before `I_i`. In order for `J` to enter the regular path, `J` should have
-  read from `top` a value `<= y-1`. Then by the synchronization of the seqcst fences from `J` to `I_i`
-  via `top`, the value `I_i` read from `bottom` at `'L404` is coherence-after-or the value written
-  by `J` at `'L202`. This `J` is linearized before `I_i`.
+  read from `top` a value `<= y-1`. Then by the synchronization of the seqcst-fences from `J` to
+  `I_i` via `top`, the value `I_i` read from `bottom` at `'L404` is coherence-after-or the value
+  written by `J` at `'L202`. Thus `J` is linearized before `I_i`.
 
 Now let's prove that `O_k` is linearized before `I_i`. If `O_k` is the only such invocation
   that writes `bottom = y+1` at `'L110`, then the value `I_i` read from `bottom` should be
@@ -717,17 +727,17 @@ those conditions. We prove for each case of `I_i`.
 
   + Case `k < l`.
 
-    Let's prove by induction that for all `m ∈ [k, l]`, `WC[m, y % size(WB_m)] = v`. By assumption,
+    Let's prove by induction that for all `m ∈ [k, l]`, `WC_(m, y % size(WB_m)) = v`. By assumption,
     it holds for `m = k`. Now suppose that it holds for `m = n` for some `n ∈ [k, l)` and let's
     prove that it holds for `m = n+1`. Let `f` and `g` be the values `O_(n+1)` read from `bottom`
     and `top`. Then we have `g <= w <= y < b_(n+1) = f`. Since `k < n+1`, `O_(n+1)` is not a regular
-    `pop()` that writes `bottom = y`. If `O_(n+1)` is resizing, then since `g <= y < f`, `WC[n, y %
-    size(WB_n)]` is copied to `WC[n+1, y % size(WB_(n+1))]`. Thus we have `WC[n+1, y %
-    size(WB_(n+1))] = v`.
+    `pop()` that writes `bottom = y`. If `O_(n+1)` is resizing, then since `g <= y < f`, `WC_(n, y %
+    size(WB_n))` is copied to `WC_(n+1, y % size(WB_(n+1)))`. Thus we have `WC_(n+1, y %
+    size(WB_(n+1))) = v`.
 
     By the release-acquire synchronization from `O_l`'s write to `buffer` at `'L307` to `I_i`'s read
     from `buffer` at `'L408`, the value `I_i` read from the buffer's content at `'L409` should be
-    coherence-after-or the value `WC[l, y % size(WB_l)] = v` that `O_l` wrote at `'L305`. Similarly
+    coherence-after-or the value `WC_(l, y % size(WB_l)) = v` that `O_l` wrote at `'L305`. As in
     the above case, `I_i`'s read happens before any overwrite of the same location. Thus `I_i`
     must have read `v` at `'L409`.
 
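The inductive steps above, for both `pop()` and `steal()`, rely on a resize copying every live logical index in `[top, bottom)` from the old buffer to the new one, where logical index `z` is stored at physical slot `z % size(buffer)`. Below is a minimal Rust sketch of that copy. It assumes a plain `Vec<u64>` buffer and uses a hypothetical `resize` helper. The resizing code the proof refers to at `'L301`, `'L302`, `'L305`, and `'L307` is not part of this diff.

```rust
/// Hypothetical helper: copy the live logical indices `[t, b)` of a circular
/// buffer into a fresh buffer of length `new_len`. In each buffer, logical
/// index `i` lives at physical slot `i % len`.
pub fn resize(old: &[u64], t: usize, b: usize, new_len: usize) -> Vec<u64> {
    assert!(t <= b && b - t <= new_len, "new buffer must hold every live element");
    let mut new = vec![0; new_len];
    for i in t..b {
        // The same logical index, re-mapped to the new buffer's length.
        new[i % new_len] = old[i % old.len()];
    }
    new
}
```

The side condition `w <= y < z` (or `g <= y < f`) in the proof is exactly the requirement that `y` fall in the loop's range `t..b`, so the value at `y` survives the copy.
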
@@ -853,13 +863,13 @@ calls `pop()`, and the stealer thread calls `steal()` twice. Consider the follow
 // stealer calls steal()
 'L11: read(top, 0, Relaxed)
 'L12: fence(SeqCst)
-'L13: read(bottom, 0, Relaxed) // from `'L02`
+'L13: read(bottom, 0, Acquire) // from `'L02`
 // return `Empty`
 
 // stealer calls steal()
 'L14: read(top, 0, Relaxed)
 'L15: fence(SeqCst)
-'L16: read(bottom, 1, Relaxed) // from `'L05`
+'L16: read(bottom, 1, Acquire) // from `'L05`
 'L17: CAS(top, 0, 1, Release) // success
 // return `42`
 ```
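
To connect the trace above to code, here is a minimal Rust sketch of a fixed-capacity Chase-Lev-style deque built on `std::sync::atomic`. It illustrates the orderings under stated assumptions and is not the proposed implementation. The type `Deque` is hypothetical. Resizing is omitted, so `push()` panics when the deque is full. The slots are `AtomicU64` so that racy slot reads are not undefined behavior. The orderings follow the trace. `steal()` reads `top` with `Relaxed`, issues a `SeqCst` fence, reads `bottom` with `Acquire`, and performs its CAS on `top` with `Release`. `pop()` writes `bottom`, issues a `SeqCst` fence, reads `top`, and races with stealers through a strong CAS only when a single element remains.

```rust
use std::sync::atomic::{fence, AtomicIsize, AtomicU64, Ordering};

/// Hypothetical fixed-capacity Chase-Lev-style deque (no resizing).
/// `push()` and `pop()` must only be called by the owner thread;
/// `steal()` may be called by any thread.
pub struct Deque {
    top: AtomicIsize,
    bottom: AtomicIsize,
    slots: Vec<AtomicU64>,
}

impl Deque {
    pub fn new(capacity: usize) -> Self {
        Deque {
            top: AtomicIsize::new(0),
            bottom: AtomicIsize::new(0),
            slots: (0..capacity).map(|_| AtomicU64::new(0)).collect(),
        }
    }

    /// Logical index `i` lives in physical slot `i % capacity`.
    fn slot(&self, i: isize) -> &AtomicU64 {
        &self.slots[i as usize % self.slots.len()]
    }

    /// Owner only. Panics when full, since this sketch does not resize.
    pub fn push(&self, v: u64) {
        let b = self.bottom.load(Ordering::Relaxed);
        let t = self.top.load(Ordering::Acquire);
        assert!(((b - t) as usize) < self.slots.len(), "deque full");
        self.slot(b).store(v, Ordering::Relaxed);
        // Release: a stealer that acquires the new `bottom` also sees the slot.
        self.bottom.store(b + 1, Ordering::Release);
    }

    /// Owner only.
    pub fn pop(&self) -> Option<u64> {
        let b = self.bottom.load(Ordering::Relaxed) - 1;
        self.bottom.store(b, Ordering::Relaxed);
        fence(Ordering::SeqCst);
        let t = self.top.load(Ordering::Relaxed);
        if t > b {
            // Empty: restore `bottom` and return EMPTY.
            self.bottom.store(b + 1, Ordering::Relaxed);
            return None;
        }
        let v = self.slot(b).load(Ordering::Relaxed);
        if t < b {
            // Regular path: at least two elements, so no stealer can take index `b`.
            return Some(v);
        }
        // Irregular path (t == b): race with stealers for the last element.
        // Acquire on failure mirrors the release-acquire synchronization with a
        // stealer's CAS; SeqCst on success is a conservative choice here.
        let won = self
            .top
            .compare_exchange(t, t + 1, Ordering::SeqCst, Ordering::Acquire)
            .is_ok();
        self.bottom.store(b + 1, Ordering::Relaxed);
        if won { Some(v) } else { None }
    }

    /// Any thread.
    pub fn steal(&self) -> Option<u64> {
        let t = self.top.load(Ordering::Relaxed);
        fence(Ordering::SeqCst);
        let b = self.bottom.load(Ordering::Acquire);
        if t >= b {
            return None; // EMPTY
        }
        let v = self.slot(t).load(Ordering::Relaxed);
        // Release CAS on `top`: claims index `t`, or fails if another thread did.
        self.top
            .compare_exchange(t, t + 1, Ordering::Release, Ordering::Relaxed)
            .ok()
            .map(|_| v)
    }
}
```

In this sketch a failed `steal()` also returns `None`, so callers that need to distinguish "empty" from "lost the race" would need a richer return type.
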
@@ -931,7 +941,7 @@ actually be slightly inefficient in this case.
 Alternatively, we can write a deque for each target architecture in order to achieve better
 performance. For example, [this paper][deque-bounded-tso] presents a variant of Chase-Lev deque in
 the "bounded TSO" x86 model, where you don't need to issue the expensive `MFENCE` barrier (think:
-seqcst fence) in `pop()`. Also, [this paper][chase-lev-weak] presents a version of Chase-Lev deque
+seqcst-fence) in `pop()`. Also, [this paper][chase-lev-weak] presents a version of Chase-Lev deque
 for ARMv7 that doesn't issue an `isync`-like fence, whereas the proposed implementation issues
 some. The `Consume` ordering is probably relevant to the latter. These further optimizations are
 left as future work.