
Fixes TypeError and infinite looping in MPITaskScheduler #3783

Draft
wants to merge 2 commits into base: master
Conversation

@yadudoc (Member) commented Feb 24, 2025

Description

This PR attempts to fix the following bugs in the MPITaskScheduler:

  1. Currently the MPITaskScheduler's schedule_backlog_tasks method takes tasks from the backlog and attempts to schedule them until the queue is empty. However, since a failed put_task pushes the task back onto the backlog queue, this ends up in an infinite loop if there is at least one task that cannot be scheduled.
  2. Putting multiple tasks with the same priority into the internal PriorityQueue results in an attempt to compare the task dicts, which fails with a TypeError since dicts do not support ordering comparisons (see the sketch after this list).
  3. The PriorityQueue sorts queue items in increasing order. This currently results in smaller tasks getting scheduled first, while scheduling larger tasks first is generally preferred.
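
A minimal sketch of bug 2, assuming the scheduler originally stored plain (priority, task_dict) tuples in the queue:

```python
import queue

q = queue.PriorityQueue()
q.put((1, {"task_id": "a"}))
# A second item with an equal priority forces a comparison of the
# task dicts themselves, which raises TypeError because dicts do
# not support ordering comparisons.
q.put((1, {"task_id": "b"}))
```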

Changed Behaviour

  • Larger MPI tasks will be prioritized for execution on the manager.

Fixes

  1. schedule_backlog_tasks is now updated to fetch all tasks in the backlog_queue and only then attempt to schedule them, avoiding the infinite loop.
  2. A new PrioritizedTask dataclass is added that disables comparison on the task: dict element (sketched below).
  3. The priority is set to num_nodes * -1 to ensure that larger jobs get prioritized.
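
A minimal sketch of fix 2, assuming the dataclass mirrors the description above (exact field names in the PR may differ):

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class PrioritizedTask:
    # Negated num_nodes, so larger jobs compare as smaller and are
    # popped from the PriorityQueue first.
    priority: int
    # compare=False keeps the unorderable task dict out of comparisons,
    # so equal-priority entries no longer raise TypeError.
    task: dict = field(compare=False)
```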

Type of change

Choose which options apply, and delete the ones which do not apply.

  • Bug fix
  • New feature
  • Code maintenance/cleanup

* test_larger_jobs_prioritized checks to confirm the ordering of jobs in the backlog queue
* test_hashable_backlog_queue confirms that the PrioritizedTask dataclass prevents the priority queue from failing on tasks with the same priority
* an extended test for the new MPITaskScheduler logic
…g logic

* `schedule_backlog_tasks` is now updated to fetch all tasks in the backlog_queue and only then attempt to schedule them, avoiding the infinite loop.
* A new `PrioritizedTask` dataclass is added that disables comparison on the task: dict element.
* The priority is set to num_nodes * -1 to ensure that larger jobs get prioritized.
@yadudoc marked this pull request as ready for review February 27, 2025 18:11
@yadudoc changed the title from "[Draft] Fixes TypeError and infinite looping in MPITaskScheduler" to "Fixes TypeError and infinite looping in MPITaskScheduler" Feb 28, 2025
Comment on lines +196 to 205
# Separate fetching tasks from the _backlog_queue and scheduling them
# since tasks that failed to schedule will be pushed to the _backlog_queue
backlogged_tasks = []
while not self._backlog_queue.empty():
    prioritized_task = self._backlog_queue.get(block=False)
    backlogged_tasks.append(prioritized_task.task)

for backlogged_task in backlogged_tasks:
    self.put_task(backlogged_task)

@khk-globus (Collaborator) commented
From a static analysis, this looks better to me. No more infinite loop potential, but I do observe that this could mean a lot of unpacking and then repacking. "It works," so I'm not going to fuss about it, but a different data structure might help with that.

More actionably, however, this looks like it would lose tasks? What happens when .get(block=False) raises queue.Empty?

@WardLT (Contributor) commented

Good catch, Kevin! These few lines do make me worry about race conditions.

Additionally, will the very aggressive scheduling here (always attempting to schedule everything) still result in large tasks being continually delayed? If there are small tasks, they'll still get scheduled before the big one.

That might be ok with some users, but what about a simple "run in the order of execution" strategy as our baseline?
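
A minimal sketch of such a FIFO baseline (an illustration of the suggestion, not code from this PR): tag each task with a monotonically increasing counter so the queue pops tasks in submission order.

```python
import itertools
import queue

submission_order = itertools.count()
backlog = queue.PriorityQueue()

def enqueue_fifo(task: dict) -> None:
    # The counter is unique and strictly increasing, so entries never
    # tie and the task dict itself is never compared.
    backlog.put((next(submission_order), task))
```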

@benclifford (Collaborator) commented

@WardLT it's issue #3323 but one layer deeper into the dispatch logic!

@benclifford (Collaborator) commented

(we shouldn't overtrivialise this, assume there's a universal solution, or try to assemble a comprehensive set of options that will satisfy everyone)

@yadudoc (Member, Author) commented

@khk-globus Thanks for the review, this is a good catch!

  • Unpacking-repacking: Yep, we shouldn't have to do this if we store the resource_spec.
  • queue.Empty: I was working with the idea that since only this function can pop an item from the queue, checking empty() is sufficient to guarantee that get will not raise queue.Empty. I can rework this to avoid relying on that (one possible shape is sketched below).
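
A minimal sketch of one such rework (an assumption about the eventual fix, not code from this PR), mirroring the snippet under review but draining with try/except so an unexpected queue.Empty cannot drop tasks:

```python
import queue

backlogged_tasks = []
while True:
    try:
        prioritized_task = self._backlog_queue.get(block=False)
    except queue.Empty:
        # Queue drained (or emptied by another consumer); nothing is lost.
        break
    backlogged_tasks.append(prioritized_task.task)

for backlogged_task in backlogged_tasks:
    self.put_task(backlogged_task)
```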

@WardLT I share your concern. There's no notion of fairness here, and as @benclifford pointed out, coming up with scheduling logic that'll work for everyone is hard. Right now, I expect larger tasks to end up getting delayed. Like @benclifford mentioned, we could move this logic to the interchange (#3323), but we would still need to implement the alternative scheduling algorithms, and I'm hesitant to do that without user feedback.

@benclifford (Collaborator) commented

there's enough interesting stuff here that the "if your PR description is an itemised list, there should be one PR per item" rule probably applies.

@yadudoc (Member, Author) commented Mar 6, 2025

@benclifford Your comment on splitting the PR is fair, I can get that sorted.

@yadudoc marked this pull request as draft March 10, 2025 18:49
github-merge-queue bot pushed a commit that referenced this pull request Mar 10, 2025
…l with `TypeError` (#3794)

# Description

The `MPITaskScheduler` uses Python's PriorityQueue to prioritize tasks
based on the number of nodes requested.
When items with identical priorities are submitted to the
PriorityQueue, it attempts to compare the task dicts, which fails
with a TypeError since dicts do not support ordering comparisons.
This PR adds a new `PrioritizedTask` dataclass that sets the task
element to `field(compare=False)`.

I'm splitting the changes in #3783 to keep each PR concise. This is
split 1 of 3.


# Changed Behaviour

Fixes the bug described above.

## Type of change

Choose which options apply, and delete the ones which do not apply.

- Bug fix