Commit 1cf4cc4

Merge branch 'release-3.14' into release-3.14.0
2 parents 2e9d1fe + e81104a commit 1cf4cc4

2 files changed: 39 additions, 12 deletions

CHANGELOG.md

Lines changed: 23 additions & 12 deletions
@@ -7,18 +7,23 @@ This file is used to list changes made in each version of the AWS ParallelCluste
 ------
 
 **ENHANCEMENTS**
-- Add support for P6e-GB200 instances. ParallelCluster sets up Slurm topology plugin to handle P6e-GB200 UltraServers. See limitations section for important additional setup requirements.
-- Add support for P6-B200 instances for all OSs except AL2.
+- Include drivers for P6e-GB200 and P6-B200 instances. ParallelCluster sets up Slurm topology plugin to handle P6e-GB200 UltraServers. See limitations section for important additional setup requirements.
+- Support `prioritized` and `capacity-optimized-prioritized` Allocation Strategy. This allows users to prioritize subnets for instance placement to optimize costs and performance.
 - Add `build-image` support for Amazon Linux 2023 AMIs based on kernel 6.12 (in addition to 6.1).
+- Support DCV on Amazon Linux 2023.
+- Echo chef-client logs in the instance console when a node fails to bootstrap. This helps with investigating bootstrap failures in cases CloudWatch logs are not available.
 
 **LIMITATIONS**
 - P6e-GB200 instances are only tested on Amazon Linux 2023, Ubuntu 22.04 and Ubuntu 24.04.
-- Using IMEX on P6e-GB200 requires additional setup. Please refer to <PLACE_HOLDER for the tutorial link>.
+- Using IMEX on P6e-GB200 requires additional setup. Please refer to the dedicated tutorial in our public documentation.
+- P6-B200 instances are only tested on Amazon Linux 2023, RHEL9, Ubuntu 22.04 and Ubuntu 24.04.
 
 **CHANGES**
-- Install nvidia-imex for all OSs except AL2.
-- Remove `berkshelf`. All cookbooks are local and do not need `berkshelf` dependency management.
+- Install nvidia-imex for all OSs except Amazon Linux 2.
 - Remove `UnkillableStepTimeout` from slurm.conf and let slurm set this value.
+- Upgrade Python runtime used by Lambda functions to Python 3.12 (from 3.9). See Lambda Documentation for important information about Python 3.9 EOL: https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html
+- Support encryption of EFS file system used for the head node internal shared storage via a new configuration parameter `HeadNode/SharedStorageEfsSettings/Encrypted`
+- Add validator that warns against using non GPU instances with DCV.
 - Upgrade Slurm to version 24.11.6 (from 24.05.8).
 - Upgrade EFA installer to 1.43.2 (from 1.41.0).
   - Efa-driver: efa-2.17.2-1
@@ -28,20 +33,26 @@ This file is used to list changes made in each version of the AWS ParallelCluste
   - Rdma-core: rdma-core-58.0-1
   - Open MPI: openmpi40-aws-4.1.7-2 and openmpi50-aws-5.0.6-11
 - Upgrade Cinc Client to version 18.4.12 (from 18.2.7).
-- Upgrade NVIDIA driver to version 570.172.08 (from 570.86.15) for all OSs except AL2.
-- Upgrade CUDA Toolkit to version 12.8.1 (from 12.8.0) for all OSs except AL2.
-- Upgrade DCGM to version 4.4.1 (from 3.3.6) for all OSs except AL2.
-- Upgrade Python to 3.12.11 (from 3.12.8) for all OSs except AL2.
-- Upgrade Python to 3.9.23 (from 3.9.20) for AL2.
+- Upgrade NVIDIA driver to version 570.172.08 (from 570.86.15) for all OSs except Amazon Linux 2.
+- Upgrade CUDA Toolkit to version 12.8.1 (from 12.8.0) for all OSs except Amazon Linux 2.
+- Upgrade DCGM to version 4.4.1 (from 3.3.6) for all OSs except Amazon Linux 2.
+- Upgrade Python to 3.12.11 (from 3.12.8) for all OSs except Amazon Linux 2.
+- Upgrade Python to 3.9.23 (from 3.9.20) for Amazon Linux 2.
 - Upgrade Intel MPI Library to 2021.16.0 (from 2021.13.1).
 - Upgrade DCV to version 2024.0-19030.
 - Upgrade the official ParallelCluster Amazon Linux 2023 AMIs to kernel 6.12 (from 6.1).
 
 **BUG FIXES**
-- Fix a race condition in CloudWatch Agent startup that could cause nodes bootstrap failures.
-- Fix cluster id mismatch issue by deleting the file `/var/spool/slurm.state/clustername` before configuring Slurm accounting.
+- Prevent `build-image` stack deletion failures by deploying a global role that automatically deletes the `build-image` stack after images either succeed or fail the build.
+  The role is meant to exist even after the stack has been deleted. See https://github.com/aws/aws-parallelcluster/issues/5914.
+- Fix an issue where Security Group validation failed when a rule contained both IPv4 ranges (IpRanges) and security group references (UserIdGroupPairs).
+- Fix `build-image` failure on Rocky 9, occurring when the parent image does not ship the latest kernel version on the latest Rocky minor version.
+- Fix cluster id mismatch issue which causes cluster update failures when slurm accounting is used.
+- Fix a race condition in CloudWatch Agent startup that could cause node bootstrap failures.
 
 **DEPRECATIONS**
+- The configuration parameter `LoginNodes/Pools/Ssh/KeyName` has been deprecated, and it will be removed in future releases. The CLI now returns a warning message when it is used in the cluster configuration.
+  See https://github.com/aws/aws-parallelcluster/issues/6811.
 - Ubuntu 20.04 is no longer supported.
 
 3.13.2

cookbooks/aws-parallelcluster-platform/resources/dcv/partial/_ubuntu_common.rb

Lines changed: 16 additions & 0 deletions
@@ -81,4 +81,20 @@ def optionally_disable_rnd
       command "sed --in-place '/RANDFILE/d' /etc/ssl/openssl.cnf"
     end
   end
+
+  def post_install
+    # ubuntu-desktop comes with NetworkManager. On a cloud instance NetworkManager is unnecessary and causes delay.
+    # Instruct Netplan to use networkd for better performance
+    bash 'Instruct Netplan to use networkd' do
+      code <<-NETPLAN
+        set -e
+        cat > /etc/netplan/95-parallelcluster-force-networkd.yaml << 'EOF'
+network:
+  version: 2
+  renderer: networkd
+EOF
+        netplan apply
+      NETPLAN
+    end unless on_docker?
+  end
 end
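
Review note: the `post_install` method added in this diff nests a shell heredoc (`<< 'EOF'`) inside a Ruby heredoc (`<<-NETPLAN`). Ruby's `<<-` form only allows the terminator to be indented and keeps all leading whitespace in the body, while a plain shell `<<` heredoc needs its terminator at the very start of a line. The following is a minimal sketch, not the cookbook's code, of how a squiggly heredoc (`<<~`) could keep the Ruby source indented and still produce a valid script. The helper names `netplan_script` and `netplan_yaml` are hypothetical and exist only for this illustration.

```ruby
require 'yaml'

# Hypothetical helper (not part of the cookbook): builds the shell script
# that a `bash` resource could receive through its `code` property.
# <<~ strips the common leading indentation, so the YAML nesting survives
# and the inner EOF terminator lands at column 0, as the shell requires.
def netplan_script(path = '/etc/netplan/95-parallelcluster-force-networkd.yaml')
  <<~SCRIPT
    set -e
    cat > #{path} << 'EOF'
    network:
      version: 2
      renderer: networkd
    EOF
    netplan apply
  SCRIPT
end

# Hypothetical helper: returns the body of the inner shell heredoc, i.e. the
# lines between the `cat ... << 'EOF'` line and the EOF terminator.
def netplan_yaml(script)
  lines = script.lines
  start = lines.index { |line| line.include?("<< 'EOF'") } + 1
  stop = lines.index("EOF\n")
  lines[start...stop].join
end

script = netplan_script
config = YAML.safe_load(netplan_yaml(script))
puts config['network']['renderer']
```

Parsing the extracted body with `YAML.safe_load` is a cheap way to confirm that the file the script would write is well-formed Netplan-style YAML before the recipe ever runs `netplan apply`.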
