Make MPITaskScheduler prioritize large tasks #3805

Open
Wants to merge 1 commit into base: fix_mpi_infinite_loop

@yadudoc (Member) commented Mar 11, 2025

Description

Python's PriorityQueue returns items with the smallest priority value first. Because MPITaskScheduler uses the number of nodes requested as the priority, it currently favors MPI tasks that request fewer nodes. This PR reverses the ordering so that larger jobs are preferred.
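To illustrate the ordering issue, here is a minimal sketch using only the standard library. The node counts and task names are made up, and negating the node count is just one common way to flip the order; it is an assumption here, not necessarily how this PR computes the priority.

```python
import queue

# Hypothetical node counts requested by three MPI tasks.
node_requests = [4, 1, 8]

# Node count used directly as the priority: the smallest request comes out first.
small_first = queue.PriorityQueue()
for nodes in node_requests:
    small_first.put((nodes, f"task-{nodes}"))
order_small_first = [small_first.get()[0] for _ in node_requests]

# Negated node count as the priority (assumed approach): the largest request comes out first.
large_first = queue.PriorityQueue()
for nodes in node_requests:
    large_first.put((-nodes, f"task-{nodes}"))
order_large_first = [-large_first.get()[0] for _ in node_requests]

print(order_small_first)  # [1, 4, 8]
print(order_large_first)  # [8, 4, 1]
```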

Changed Behaviour

MPI tasks with larger node requests are prioritized by the manager. Note that, as @WardLT pointed out in the comments on #3783, the greedy scheduling that we use can still delay larger tasks in favor of smaller tasks that can be scheduled immediately.
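The delay described above can be shown with a toy model. This is not the MPITaskScheduler code: `greedy_pick` is a hypothetical helper that assumes the scheduler scans the backlog largest-first and starts every task that fits in the currently free nodes.

```python
def greedy_pick(backlog, free_nodes):
    """Scan the backlog largest-first and start every task that fits.

    Returns (scheduled, remaining) as lists of node counts.
    """
    scheduled, remaining = [], []
    for nodes in sorted(backlog, reverse=True):
        if nodes <= free_nodes:
            scheduled.append(nodes)
            free_nodes -= nodes
        else:
            remaining.append(nodes)
    return scheduled, remaining


# Hypothetical state: only 2 nodes are free, and the backlog holds
# one 4-node task plus three 1-node tasks.
scheduled, remaining = greedy_pick([4, 1, 1, 1], free_nodes=2)

print(scheduled)  # [1, 1]
print(remaining)  # [4, 1]
```

Even though the 4-node task is considered first, it does not fit, so the small tasks take the free nodes and the large task keeps waiting until enough nodes are free at the same time.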

These changes are split from #3783 to keep the PR concise.
This is split 3 of 3.

Type of change

Choose which options apply, and delete the ones which do not apply.

  • Update to human readable text: Documentation/error messages/comments

@benclifford benclifford changed the title MPITaskScheduler to prioritize large tasks Make MPITaskScheduler prioritize large tasks Mar 12, 2025
task_package = {"task_id": task_id, "buffer": mock_task_buffer}
scheduler.put_task(task_package)

# Confirm that the tasks are sorted in decreasing order
Collaborator

This test checks that tasks coming out of the queue are sorted in priority order (that is, whether the priority part of PrioritizedTask is implemented correctly).

It doesn't check that they are sorted in num_nodes order, I think? (For example, introduce a bug into the calculation at line 200 and I think this test will still pass, even though the code is buggy.)
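The distinction can be sketched with a self-contained check that asserts on the node counts rather than on the priority field. Everything here is a stand-in: `StandInTask`, `enqueue`, and `drain_nodes` are hypothetical names, and the negated priority is an assumed calculation, not the code at line 200.

```python
import queue
import random
from dataclasses import dataclass, field


@dataclass(order=True)
class StandInTask:
    # Only `priority` takes part in comparisons; `nodes` is payload.
    priority: int
    nodes: int = field(compare=False)


def enqueue(backlog, nodes):
    # Assumed priority calculation under test: negate so larger requests sort first.
    backlog.put(StandInTask(priority=-nodes, nodes=nodes))


def drain_nodes(backlog):
    # Pull every task out and record the node count it carries.
    drained = []
    while not backlog.empty():
        drained.append(backlog.get().nodes)
    return drained


backlog = queue.PriorityQueue()
requests = random.sample(range(1, 65), 10)  # 10 distinct hypothetical node counts
for n in requests:
    enqueue(backlog, n)

drained = drain_nodes(backlog)

# Assert on the node counts, not on the priority values.
assert drained == sorted(requests, reverse=True)
```

If `enqueue` mistakenly used `priority=nodes`, the priorities would still come out in ascending order, so a check on priority alone would pass, while the assertion on `drained` would fail.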

Member Author

Good point, I'll update this test.

Member Author

I just pushed an updated test that checks that the backlog queue returns tasks in decreasing order of nodes requested.

@benclifford
Collaborator

What is happening with the base of this PR? This should be going to master, right?

@benclifford
Collaborator

Update to human readable text: Documentation/error messages/comments

I think this is not the right change description.
