Skip to content

Commit 839884b

Browse files
coreyostroveCorey Ostrove
andauthored
lsvec Shared Memory Race Condition (#674)
This patches another shared memory bug in `lsvec`, this time a race condition. The bug is easiest explained by pointing at the relevant lines. Below is the implementation of `lsvec` for `TimeIndependentMDCObjectiveFunction`: ``` lsvec = self.terms(paramvec, oob_check, "LS OBJECTIVE") if _np.any(lsvec < 0): bad_locs = _np.where(lsvec < 0)[0] msg = f""" lsvec is only defined when terms is elementwise nonnegative. We encountered negative values for terms[i] for indices i in {bad_locs}. """ raise RuntimeError(msg) unit_ralloc = self.layout.resource_alloc('atom-processing') shared_mem_leader = unit_ralloc.is_host_leader if shared_mem_leader: lsvec **= 0.5 if raw_objfn_lsvec_signs: if shared_mem_leader: raw_lsvec = self.raw_objfn.lsvec(self.probs, self.counts, self.total_counts, self.freqs) lsvec[:self.nelements][raw_lsvec < 0] *= -1 unit_ralloc.host_comm_barrier() ``` The bug arises due to the interation of the positivity check ` if _np.any(lsvec < 0)` and later on the update of `lsvec` using the `raw_objfn` signs `lsvec[:self.nelements][raw_lsvec < 0] *= -1`. Since the positivity check is happening on all ranks, but is with respect to shared memory, if a rank that is the shared memory leader hits and runs that sign update line before another rank looking at the same memory hits the positivity check this would raise an error. After staring at this code for a bit I couldn't see any reason why this positivity check (or anything else in this function) should be happening on anything other the shared memory leader, so the patch is to just add a corresponding if statement which enforces that. Thanks to @juangmendoza19 for reporting this error. God I hate shared memory sometimes... Co-authored-by: Corey Ostrove <[email protected]>
1 parent bb89527 commit 839884b

File tree

1 file changed

+12
-11
lines changed

1 file changed

+12
-11
lines changed

pygsti/objectivefns/objectivefns.py

Lines changed: 12 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -4406,21 +4406,22 @@ def terms(self, paramvec=None, oob_check=False, profiler_str="TERMS OBJECTIVE"):
44064406
return terms
44074407

44084408
def lsvec(self, paramvec=None, oob_check=False, raw_objfn_lsvec_signs=True):
4409-
lsvec = self.terms(paramvec, oob_check, "LS OBJECTIVE")
4410-
if _np.any(lsvec < 0):
4411-
bad_locs = _np.where(lsvec < 0)[0]
4412-
msg = f"""
4413-
lsvec is only defined when terms is elementwise nonnegative.
4414-
We encountered negative values for terms[i] for indices i
4415-
in {bad_locs}.
4416-
"""
4417-
raise RuntimeError(msg)
44184409
unit_ralloc = self.layout.resource_alloc('atom-processing')
44194410
shared_mem_leader = unit_ralloc.is_host_leader
4411+
lsvec = self.terms(paramvec, oob_check, "LS OBJECTIVE")
4412+
44204413
if shared_mem_leader:
4414+
if _np.any(lsvec < 0):
4415+
bad_locs = _np.where(lsvec < 0)[0]
4416+
msg = f"""
4417+
lsvec is only defined when terms is elementwise nonnegative.
4418+
We encountered negative values for terms[i] for indices i
4419+
in {bad_locs}.
4420+
"""
4421+
raise RuntimeError(msg)
4422+
44214423
lsvec **= 0.5
4422-
if raw_objfn_lsvec_signs:
4423-
if shared_mem_leader:
4424+
if raw_objfn_lsvec_signs:
44244425
raw_lsvec = self.raw_objfn.lsvec(self.probs, self.counts, self.total_counts, self.freqs)
44254426
lsvec[:self.nelements][raw_lsvec < 0] *= -1
44264427
unit_ralloc.host_comm_barrier()

0 commit comments

Comments
 (0)