Releases: oracle-quickstart/oci-hpc-oke

OKE RDMA Quickstart Resource Manager template v25.5.0

16 May 05:32
0ce2acc
  • Added AMD Device Metrics Exporter
  • Added AMD dashboards

OKE RDMA Quickstart Resource Manager template v25.4.0

22 Apr 04:20
3fa53ef
  • Added Kubernetes v1.32
  • Changed the default maximum number of pods per node to 110

OKE RDMA Quickstart Resource Manager template v25.3.1

31 Mar 04:54
6bac725
  • The OKE AMD GPU device plugin is now enabled for the BM.GPU.MI300X.8 shape
  • The OKE DCGM Exporter is now disabled; the upstream DCGM Exporter is deployed instead
  • Fixed a Helm issue where the Grafana load balancer was not deleted properly on Terraform destroy
  • Updated the health checks for Node Problem Detector
  • Updated Grafana dashboards
  • Added the required policies for Oracle Cloud Agent GPU/RDMA monitoring

OKE RDMA Quickstart Resource Manager template v25.3.0

18 Mar 20:49
6bac725
  • VCN-native pod networking is now the default option for pod networking instead of Flannel.
  • Node Problem Detector is now deployed as part of the stack and integrated with the Prometheus/Grafana stack for alerting.
  • Switched to using the upstream OKE Terraform module.

OKE RDMA Quickstart Resource Manager template v25.3.0-beta

03 Mar 18:14
4abe3df
  • VCN-native pod networking is now the default option for pod networking instead of Flannel.
  • Node Problem Detector is now deployed as part of the stack.
  • Fixed a Node Exporter issue preventing metrics from being streamed from bare metal GPU nodes.

OKE RDMA Quickstart Resource Manager template v25.2.0

05 Feb 23:43
cd6b384
  • The OKE GPU device plugin is now enabled by default.
  • Added Kubernetes versions 1.30 and 1.31.

OKE RDMA Quickstart Resource Manager template v24.10.0

20 Oct 21:27
b09f579

Important

Because this release moves to Terraform v1.5, it is a breaking change. Do not deploy this stack on your existing OKE clusters; use it only to deploy new clusters.

  • Updated to Terraform v1.5; the same templates can now be used with both OCI Resource Manager and regular Terraform.
  • The bastion and operator nodes now use Ubuntu.
  • Added an option to deploy the Prometheus/Grafana stack with DCGM Exporter.
  • Added an option to create a RAID 0 array using the local NVMe drives on the nodes and configure Kubernetes to use it for container storage.
  • Added options to create storage classes for FSS (File Storage Service) and high performance block volumes.

OKE RDMA Quickstart Resource Manager template v24.9.2

23 Sep 23:11
d9fb65a
  • Updated OS images
  • Moved the cloud-init script to the repo
  • Rearranged the grouping in the Resource Manager schema
  • Enabled selecting images from other compartments

OKE RDMA Quickstart Resource Manager template v24.9.1

11 Sep 02:06
b64a6af
  • Updated package location.
  • Added an option to disable the GPU device plugin deployed by OKE.

OKE RDMA Quickstart Resource Manager template v24.7.1

18 Jul 21:06
ad30f60
  • Added an option to choose the boot volume size for the Operational Worker Pool.