
Fixes TypeError and infinite looping in MPITaskScheduler #3783

Draft
wants to merge 2 commits into base: master
Conversation

@yadudoc (Member) commented Feb 24, 2025

Description

This PR attempts to fix the following bugs in the MPITaskScheduler:

  1. Currently the MPITaskScheduler's schedule_backlog_tasks method takes tasks from the backlog and attempts to schedule them until the queue is empty. However, since a failed put_task pushes the task back onto the backlog queue, this ends up in an infinite loop if there is at least one task that cannot be scheduled.
  2. Putting multiple tasks with the same priority into the internal PriorityQueue results in an attempt to compare the task dicts, which fails with a TypeError since dicts do not support ordering comparisons (see the sketch after this list).
  3. The PriorityQueue sorts queue items in increasing order. This currently results in smaller tasks getting scheduled first, while scheduling larger tasks first is generally preferred.
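
A minimal sketch of bug 2, assuming the scheduler originally stored plain (priority, task_dict) tuples in the queue:

```python
import queue

q = queue.PriorityQueue()
q.put((1, {"task_id": "a"}))
# A second item with an equal priority forces a comparison of the
# task dicts themselves, which raises TypeError because dicts do
# not support ordering comparisons.
q.put((1, {"task_id": "b"}))
```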

Changed Behaviour

  • Larger MPI tasks will be prioritized for execution on the manager.

Fixes

  1. schedule_backlog_tasks is now updated to fetch all tasks in the backlog_queue and only then attempt to schedule them, avoiding the infinite loop.
  2. A new PrioritizedTask dataclass is added that disables comparison on the task: dict element (sketched below).
  3. The priority is set to num_nodes * -1 to ensure that larger jobs get prioritized.
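
A minimal sketch of fix 2, assuming the dataclass mirrors the description above (exact field names in the PR may differ):

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class PrioritizedTask:
    # Negated num_nodes, so larger jobs compare as smaller and are
    # popped from the PriorityQueue first.
    priority: int
    # compare=False keeps the unorderable task dict out of comparisons,
    # so equal-priority entries no longer raise TypeError.
    task: dict = field(compare=False)
```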

Type of change

Choose which options apply, and delete the ones which do not apply.

  • Bug fix
  • New feature
  • Code maintenance/cleanup

* test_larger_jobs_prioritized checks to confirm the ordering of jobs in the backlog queue
* test_hashable_backlog_queue confirms that the PrioritizedTask dataclass prevents the priority queue from failing on tasks with the same priority
* an extended test for the new MPITaskScheduler logic
…g logic

* `schedule_backlog_tasks` is now updated to fetch all tasks in the backlog_queue and only then attempt to schedule them, avoiding the infinite loop.
* A new `PrioritizedTask` dataclass is added that disables comparison on the task: dict element.
* The priority is set to num_nodes * -1 to ensure that larger jobs get prioritized.
@yadudoc marked this pull request as ready for review February 27, 2025 18:11
@yadudoc changed the title from "[Draft] Fixes TypeError and infinite looping in MPITaskScheduler" to "Fixes TypeError and infinite looping in MPITaskScheduler" Feb 28, 2025
Comment on lines +196 to 205
# Separate fetching tasks from the _backlog_queue and scheduling them
# since tasks that failed to schedule will be pushed to the _backlog_queue
backlogged_tasks = []
while not self._backlog_queue.empty():
    prioritized_task = self._backlog_queue.get(block=False)
    backlogged_tasks.append(prioritized_task.task)

for backlogged_task in backlogged_tasks:
    self.put_task(backlogged_task)

@khk-globus (Collaborator) commented
From a static analysis, this looks better to me. No more infinite loop potential, but I do observe that this could mean a lot of unpacking and then repacking. "It works," so I'm not going to fuss about it, but a different data structure might help with that.

More actionably, however, this looks like it would lose tasks? What happens when .get(block=False) raises queue.Empty?

@WardLT (Contributor) commented

Good catch, Kevin! These few lines do make me worry about race conditions.

Additionally, will the very aggressive scheduling here (always attempting to schedule everything) still result in large tasks being continually delayed? If there are small tasks, they'll still get scheduled before the big one.

That might be ok with some users, but what about a simple "run in the order of execution" strategy as our baseline?
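
A minimal sketch of such a FIFO baseline (an illustration of the suggestion, not code from this PR): tag each task with a monotonically increasing counter so the queue pops tasks in submission order.

```python
import itertools
import queue

submission_order = itertools.count()
backlog = queue.PriorityQueue()

def enqueue_fifo(task: dict) -> None:
    # The counter is unique and strictly increasing, so entries never
    # tie and the task dict itself is never compared.
    backlog.put((next(submission_order), task))
```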

@benclifford (Collaborator) commented

@WardLT it's issue #3323 but one layer deeper into the dispatch logic!

@benclifford (Collaborator) commented

(we shouldn't overtrivialise this, assume there's a universal solution, or try to assemble a comprehensive set of options that will satisfy everyone)

@yadudoc (Member, Author) commented

@khk-globus Thanks for the review, this is a good catch!

  • Unpacking-repacking: Yep, we shouldn't have to do this if we store the resource_spec.
  • queue.Empty: I was working with the idea that since only this function can pop an item from the queue, checking empty() is sufficient to guarantee that get will not raise queue.Empty. I can rework this to avoid relying on that (one possible shape is sketched below).
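
A minimal sketch of one such rework (an assumption about the eventual fix, not code from this PR), mirroring the snippet under review but draining with try/except so an unexpected queue.Empty cannot drop tasks:

```python
import queue

backlogged_tasks = []
while True:
    try:
        prioritized_task = self._backlog_queue.get(block=False)
    except queue.Empty:
        # Queue drained (or emptied by another consumer); nothing is lost.
        break
    backlogged_tasks.append(prioritized_task.task)

for backlogged_task in backlogged_tasks:
    self.put_task(backlogged_task)
```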

@WardLT I share your concern. There's no notion of fairness here, and as @benclifford pointed out, coming up with scheduling logic that'll work for everyone is hard. Right now, I expect larger tasks to end up getting delayed. Like @benclifford mentioned, we could move this logic to the interchange (#3323), but we would still need to implement the alternative scheduling algorithms, and I'm hesitant to do that without user feedback.

@benclifford (Collaborator) commented

there's enough interesting stuff here that the "if your PR description is an itemised list, there should be one PR per item" rule probably applies.

@yadudoc (Member, Author) commented Mar 6, 2025

@benclifford Your comment on splitting the PR is fair, I can get that sorted.

@yadudoc marked this pull request as draft March 10, 2025 18:49
github-merge-queue bot pushed a commit that referenced this pull request Mar 10, 2025
…l with `TypeError` (#3794)

# Description

The `MPITaskScheduler` uses Python's PriorityQueue to prioritize tasks
based on the number of nodes requested.
When items with identical priorities are submitted to the
PriorityQueue, it attempts to compare the task dicts, which fails
with a TypeError since dicts do not support ordering comparisons.
This PR adds a new `PrioritizedTask` dataclass that sets the task
element to `field(compare=False)`.

I'm splitting the changes in #3783 to keep each PR concise. This is
split 1 of 3.


# Changed Behaviour

Fixes the bug described above.

## Type of change

Choose which options apply, and delete the ones which do not apply.

- Bug fix