AWS ParallelCluster 2.3.1
We're excited to announce the release of AWS ParallelCluster Node 2.3.1.
This is associated with AWS ParallelCluster v2.3.1.
Changes
- sqswatcher: Slurm - dynamically adjust max cluster size based on ASG settings
- sqswatcher: Slurm - use FUTURE state for dummy nodes to prevent the Slurm daemon from contacting nonexistent nodes
- sqswatcher: Slurm - dynamically change the number of configured FUTURE nodes based on the actual nodes that join the cluster. The max size of the cluster seen by the scheduler always matches the max capacity of the ASG.
- sqswatcher: Slurm - process nodes added to or removed from the cluster in batches. This speeds up cluster scaling, which can now react to variations in the ASG capacity with a delay of less than 1 minute.
- sqswatcher: Slurm - add support for job dependencies and pending reasons. The cluster won't scale up if the job cannot start due to an unsatisfied dependency.
- Slurm - set ReturnToService=1 in scheduler config in order to recover instances that were initially marked as down due to a transient issue.
- sqswatcher: remove DynamoDB table creation
- improve and standardize shell command execution
- add retries on failures and exceptions
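
To illustrate the FUTURE node bookkeeping and batch processing described above, here is a minimal sketch. It is not taken from the sqswatcher source: the function name `apply_batch` and its parameters are hypothetical, and it only models the arithmetic, assuming the scheduler's view is kept at the ASG max capacity by padding real nodes with FUTURE placeholders.

```python
def apply_batch(current_nodes, added, removed, asg_max_size):
    """Apply one batch of node events and size the FUTURE placeholder pool.

    Hypothetical helper for illustration only; it is not part of sqswatcher.
    Returns the updated set of real nodes and the number of FUTURE nodes
    needed so that real + FUTURE equals the ASG max capacity.
    """
    # Process all additions and removals of the batch in one pass.
    # A node that appears in both lists ends up removed.
    nodes = (set(current_nodes) | set(added)) - set(removed)
    # Never report a negative placeholder count if the ASG max was lowered.
    future_count = max(asg_max_size - len(nodes), 0)
    return nodes, future_count


nodes, future_count = apply_batch(
    current_nodes={"ip-10-0-0-1"},
    added=["ip-10-0-0-2", "ip-10-0-0-3"],
    removed=["ip-10-0-0-1"],
    asg_max_size=10,
)
```

With two real nodes left after the batch and an ASG max capacity of 10, the sketch reports 8 FUTURE placeholders, so the scheduler still sees 10 nodes in total.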
Bug Fixes
- sqswatcher: Slurm - set compute nodes to DRAIN state before removing them from cluster. This prevents the scheduler from submitting a job to a node that is being terminated.
- sqswatcher: Slurm - Fix host removal
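
For readers who want to reproduce the drain step by hand, the sketch below builds the corresponding `scontrol update` invocation. The helper name `drain_command` and the default reason text are assumptions made for illustration; the actual sqswatcher implementation may issue the command differently.

```python
def drain_command(node_name, reason="node is being terminated"):
    """Build the scontrol argument list that puts a node in DRAIN state.

    Hypothetical helper, not sqswatcher code. Slurm expects a Reason when
    a node is set to DRAIN, so one is always included.
    """
    return [
        "scontrol",
        "update",
        "NodeName={0}".format(node_name),
        "State=DRAIN",
        "Reason={0}".format(reason),
    ]


command = drain_command("ip-10-0-0-1")
```

The list can be passed to `subprocess.check_call(command)` on the master node. Because the arguments are passed as a list rather than a shell string, the reason text needs no extra quoting.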
Support
Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster Issues tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192