Calling sacct on many jobs hangs Slurm, causing Globus Compute tasks to time out #3814
Describe the bug
A user saw that Slurm calls for _status hang when there are more than 200 jobs on Perlmutter. This may also happen on other systems, but that has not been tested.
To Reproduce
Start many long-running jobs on Perlmutter or another Slurm cluster through Parsl.
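Once the jobs are running, a small script along these lines might help check whether the query time depends on how many job IDs are passed to sacct in one call. It is only a sketch and not Parsl's implementation: the helper names (chunked, sacct_command, time_sacct), the batch size of 50, and the 30 second timeout are all assumptions chosen for illustration. The sacct options used (-X, --noheader, --parsable2, --format, --jobs) are standard ones.

```python
import subprocess
import time
from typing import Iterator, List, Optional, Sequence, Tuple


def chunked(job_ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of job_ids with at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(job_ids), size):
        yield list(job_ids[start:start + size])


def sacct_command(job_ids: Sequence[str]) -> List[str]:
    """Build a sacct call that reports one JobID|State line per allocation."""
    return [
        "sacct",
        "-X",                      # allocations only, no job steps
        "--noheader",
        "--parsable2",             # '|'-delimited output
        "--format=JobID,State",
        "--jobs", ",".join(job_ids),
    ]


def time_sacct(
    job_ids: Sequence[str],
    batch_size: int = 50,          # hypothetical value, tune per site
    timeout_s: float = 30.0,       # hypothetical value, tune per site
) -> List[Tuple[int, Optional[float]]]:
    """Query sacct batch by batch.

    Returns one (batch length, elapsed seconds) pair per batch; the
    elapsed value is None when the call exceeded timeout_s.
    """
    results: List[Tuple[int, Optional[float]]] = []
    for batch in chunked(job_ids, batch_size):
        started = time.monotonic()
        try:
            subprocess.run(
                sacct_command(batch),
                capture_output=True,
                text=True,
                timeout=timeout_s,
                check=False,
            )
            results.append((len(batch), time.monotonic() - started))
        except subprocess.TimeoutExpired:
            # A hang shows up here instead of blocking indefinitely.
            results.append((len(batch), None))
    return results
```

Running time_sacct once with batch_size set to the full list length and once with a small batch size gives a rough comparison between one large query and several small ones.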
Expected behavior
Calls to _status should return quickly and not hang when interacting with Slurm. This should work the same for a small job list as for a large one.
Environment
Distributed Environment