Commit b4ebbee

Amortize transfer cost
As discussed in dask#5325. The idea is that if a key we need has many dependents, we should amortize the cost of transferring it to a new worker, since those other dependents could then run on the new worker more cheaply. "We'll probably have to move this at some point anyway, might as well do it now."

This isn't actually intended to encourage transfers, though. It's meant to discourage transferring keys that could have just stayed in one place. If A and B are on different workers, and we're the only task that will ever need A, but plenty of other tasks will need B, we should schedule alongside A, even if B is a bit larger to move.

This is still a theory and needs tests.
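The A/B scenario can be made concrete with a minimal sketch. The `Dep` class, the `amortized_comm_bytes` helper, the worker names, and all sizes and waiter counts are hypothetical and chosen only for illustration. They are not part of the scheduler, which combines this quantity with other terms such as worker occupancy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dep:
    """Hypothetical stand-in for a dependency's scheduler state."""
    name: str
    nbytes: int      # size of the stored result
    n_waiters: int   # tasks still waiting on this key, including us
    holder: str      # worker currently holding the data


def amortized_comm_bytes(deps, worker):
    """Bytes charged for running on `worker`, split across each key's waiters."""
    return sum(d.nbytes / d.n_waiters for d in deps if d.holder != worker)


a = Dep("A", nbytes=1_000, n_waiters=1, holder="w1")   # only we need A
b = Dep("B", nbytes=1_500, n_waiters=10, holder="w2")  # many tasks need B

cost_on_w1 = amortized_comm_bytes([a, b], "w1")  # must fetch B: 1500 / 10
cost_on_w2 = amortized_comm_bytes([a, b], "w2")  # must fetch A: 1000 / 1
best = min(["w1", "w2"], key=lambda w: amortized_comm_bytes([a, b], w))
```

With the unamortized sum, `w2` would look cheaper (1,000 bytes to fetch A versus 1,500 to fetch B). With amortization, running alongside A on `w1` costs 150 against 1,000, so B is the key that moves.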
1 parent 4c67b0b commit b4ebbee

1 file changed: +3 -2 lines changed


distributed/scheduler.py

Lines changed: 3 additions & 2 deletions
@@ -3415,12 +3415,13 @@ def worker_objective(self, ts: TaskState, ws: WorkerState) -> tuple:
         """
         dts: TaskState
         nbytes: Py_ssize_t
-        comm_bytes: Py_ssize_t = 0
+        comm_bytes: double = 0
         xfers: Py_ssize_t = 0
         for dts in ts._dependencies:
             if ws not in dts._who_has:
                 nbytes = dts.get_nbytes()
-                comm_bytes += nbytes
+                # amortize transfer cost over all waiters
+                comm_bytes += nbytes / len(dts._waiters)
                 xfers += 1
 
         stack_time: double = ws._occupancy / ws._nthreads
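
The change of `comm_bytes` from `Py_ssize_t` to `double` follows from the division: Python's `/` is true division and returns a float even for integer operands, so the accumulator can no longer be declared as an integer type (the annotations here appear to be Cython-style C types). A small illustration with made-up values:

```python
nbytes = 1_000   # hypothetical size of one dependency, in bytes
n_waiters = 3    # hypothetical number of tasks waiting on it

comm_bytes = 0
comm_bytes += nbytes / n_waiters   # true division: the result is a float

share_type = type(comm_bytes).__name__

# Floor division would keep an integer but silently drop the remainder.
floor_share = nbytes // n_waiters
```

Keeping a fractional share avoids rounding small per-waiter costs down to zero when a key has many waiters.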
