Skip to content

Conversation

@tthvo
Copy link
Member

@tthvo tthvo commented Nov 26, 2025

Descriptions

GP3 volumes have the ability to configure throughput from 125 MiB/s to 2000 MiB/s. This allows the ability to set this at install time in the install-config.

This is a take-2 of #9945 after it was reverted in #10131 due to aws serial test failure.

Notes

The following new changes are introduced:

  • The throughput field is now a int32 pointer to match openshift/api. It does break the convention with iops, but at least it gives us an option to represent specified vs left-empty.
  • A nil check is performed before passing throughput to CAPA and MAPI.
  • A set of unit tests to introduce to double-check the validations for throughput.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 26, 2025
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Nov 26, 2025

@tthvo: This pull request references CORS-4212 which is a valid jira issue.

In response to this:

Descriptions

GP3 volumes have the ability to configure throughput from 125 MiB/s to 2000 MiB/s. This allows the ability to set this at install time in the install-config.

This is a take-2 of #9945 after it was reverted in #10131 due to aws serial test failure.

Notes

The following new changes are introduced:

  • The throughput field is now a int32 pointer to match openshift/api. It does break the convention with iops, but at least it gives us an option to represent specified vs left-empty.
  • A nil check is performed before passing throughput to CAPA and MAPI.
  • A set of unit tests to introduce to double-check the validations for throughput.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from jhixson74 and mtulio November 26, 2025 18:38
@tthvo
Copy link
Member Author

tthvo commented Nov 26, 2025

/cc @patrickdillon @jianlinliu @yunjiang29 @liweinan

@tthvo
Copy link
Member Author

tthvo commented Nov 26, 2025

/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-aws-ovn-serial-1of2

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 26, 2025

@tthvo: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.21-e2e-aws-ovn-serial-1of2

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c7bb9e60-caf7-11f0-854d-0994ba12b72e-0

@tthvo
Copy link
Member Author

tthvo commented Nov 26, 2025

/label platform/aws

@tthvo
Copy link
Member Author

tthvo commented Nov 26, 2025

/test e2e-aws-ovn-public-subnets e2e-aws-ovn-public-ipv4-pool e2e-aws-ovn-custom-iam-profile e2e-aws-overlay-mtu-ovn-1200

@tthvo
Copy link
Member Author

tthvo commented Nov 26, 2025

/cc @JoelSpeed

@openshift-ci openshift-ci bot requested a review from JoelSpeed November 26, 2025 19:52
@tthvo
Copy link
Member Author

tthvo commented Nov 26, 2025

/retest

Test failures seem fine (unrelated) 👀

@tthvo
Copy link
Member Author

tthvo commented Nov 26, 2025

* periodic-ci-openshift-release-master-nightly-4.21-e2e-aws-ovn-serial-1of2

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c7bb9e60-caf7-11f0-854d-0994ba12b72e-0

Payload passed this time 👍

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 27, 2025

@tthvo: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-overlay-mtu-ovn-1200 d636ff2 link false /test e2e-aws-overlay-mtu-ovn-1200
ci/prow/e2e-aws-ovn-heterogeneous d636ff2 link false /test e2e-aws-ovn-heterogeneous

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@liweinan
Copy link

Used the build: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/release-openshift-origin-installer-launch-aws-modern/1993960674010599424

Verified by the script locally: https://github.com/openshift/release/pull/71426/files#diff-5e156492e6ec7b6cc1782f27eb74071576df36e33d3caf3f92f73084af22a03f

cd /Users/weli/works/oc-swarm/release && SHARED_DIR=/tmp/work4-shared CLUSTER_PROFILE_DIR=/tmp/work4-shared bash ci-operator/step-registry/cucushift/installer/check/aws/rootvolume/cucushift-installer-check-aws-rootvolume-commands.sh
Cluster ID: weli-test-8fm49
Region: us-east-1
Expected compute throughput: 1000 MiB/s
Expected control plane throughput: 1200 MiB/s
Expected worker rootVolume size: 120 GiB
Expected control-plane rootVolume size: 150 GiB
Checking worker nodes
PASS: ip-10-0-39-111.ec2.internal volume vol-034246304384809e8 type=gp3 size=120GiB iops=5000 throughput=1000MiB/s
PASS: ip-10-0-4-150.ec2.internal volume vol-05f55a2f17af0cf1c type=gp3 size=120GiB iops=5000 throughput=1000MiB/s
PASS: ip-10-0-64-225.ec2.internal volume vol-08f7acd7aa708a280 type=gp3 size=120GiB iops=5000 throughput=1000MiB/s
Checking control plane nodes
PASS: ip-10-0-30-251.ec2.internal volume vol-097a2aef4a22e5f3f type=gp3 size=150GiB iops=5000 throughput=1200MiB/s
PASS: ip-10-0-51-181.ec2.internal volume vol-084b2e4af55d5e006 type=gp3 size=150GiB iops=5000 throughput=1200MiB/s
PASS: ip-10-0-64-82.ec2.internal volume vol-0aab6f02caafe824c type=gp3 size=150GiB iops=5000 throughput=1200MiB/s
==========================================
Test Summary
All root volume throughput checks passed.

/verified by liweinan

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Nov 27, 2025
@openshift-ci-robot
Copy link
Contributor

@liweinan: This PR has been marked as verified by liweinan.

In response to this:

Used the build: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/release-openshift-origin-installer-launch-aws-modern/1993960674010599424

Verified by the script locally: https://github.com/openshift/release/pull/71426/files#diff-5e156492e6ec7b6cc1782f27eb74071576df36e33d3caf3f92f73084af22a03f

cd /Users/weli/works/oc-swarm/release && SHARED_DIR=/tmp/work4-shared CLUSTER_PROFILE_DIR=/tmp/work4-shared bash ci-operator/step-registry/cucushift/installer/check/aws/rootvolume/cucushift-installer-check-aws-rootvolume-commands.sh
Cluster ID: weli-test-8fm49
Region: us-east-1
Expected compute throughput: 1000 MiB/s
Expected control plane throughput: 1200 MiB/s
Expected worker rootVolume size: 120 GiB
Expected control-plane rootVolume size: 150 GiB
Checking worker nodes
PASS: ip-10-0-39-111.ec2.internal volume vol-034246304384809e8 type=gp3 size=120GiB iops=5000 throughput=1000MiB/s
PASS: ip-10-0-4-150.ec2.internal volume vol-05f55a2f17af0cf1c type=gp3 size=120GiB iops=5000 throughput=1000MiB/s
PASS: ip-10-0-64-225.ec2.internal volume vol-08f7acd7aa708a280 type=gp3 size=120GiB iops=5000 throughput=1000MiB/s
Checking control plane nodes
PASS: ip-10-0-30-251.ec2.internal volume vol-097a2aef4a22e5f3f type=gp3 size=150GiB iops=5000 throughput=1200MiB/s
PASS: ip-10-0-51-181.ec2.internal volume vol-084b2e4af55d5e006 type=gp3 size=150GiB iops=5000 throughput=1200MiB/s
PASS: ip-10-0-64-82.ec2.internal volume vol-0aab6f02caafe824c type=gp3 size=150GiB iops=5000 throughput=1200MiB/s
==========================================
Test Summary
All root volume throughput checks passed.

/verified by liweinan

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Encrypted: pointer.Bool(true),
KMSKey: machineapi.AWSResourceReference{ARN: pointer.String(in.root.KMSKeyARN)},
}
if in.root.Throughput != nil && *in.root.Throughput > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an end users explicitly sets this to zero, wouldn't the validateThroughput catch it before we get here?

I think the nil check is probably sufficient

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we even need the nil check, we can just assign the value, nil or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it is a bit of a tricky situation: The validateThroughput only runs when the machine pool defines the EBS volume type (i.e. similar to how validateIOPS and validateVolumeSize). See here and block below:

if p.EC2RootVolume.Type != "" {
  allErrs = append(allErrs, validateVolumeSize(p, fldPath)...)
  allErrs = append(allErrs, validateIOPS(p, fldPath)...)
  allErrs = append(allErrs, validateThroughput(p, fldPath)...)
}

This makes sense because the volume type is optional as we set the default type later when generating the machine manifests. But it means the installer can skip validating throughput.

I think everyone's suggestion is the right way 👍 and the problem is that the validation should catch this. So, I updated the PR to set the volume type default (if empty) early so that the throughput validation path runs, avoiding any 0 throughput value.

PTAL 🙏

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we set the default type later when generating the machine manifests

By this, I meant the code block below 👇 The commit here just uses the same logic to set the volume type somewhere earlier.

func defaultAWSMachinePoolPlatform(poolName string) awstypes.MachinePool {
defaultEBSType := awstypes.VolumeTypeGp3
// gp3 is not offered in all local-zones locations used by Edge Pools.
// Once it is available, it can be used as default for all machine pools.
// https://aws.amazon.com/about-aws/global-infrastructure/localzones/features
if poolName == types.MachinePoolEdgeRoleName {
defaultEBSType = awstypes.VolumeTypeGp2
}
return awstypes.MachinePool{
EC2RootVolume: awstypes.EC2RootVolume{
Type: defaultEBSType,
Size: decimalRootVolumeSize,
},
}
}

@patrickdillon
Copy link
Contributor

/approve

this lgtm, but some nits for discussion from joel & me

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 1, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: patrickdillon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 1, 2025
GP3 volumes have the ability to configure throughput from 125 MiB/s to
2000 MiB/s. This allows the ability to set this at install time in the
install-config.

https://issues.redhat.com/browse/CORS-4212
Set the default type EBS volume for machine pools:
- Controlplane, arbiter and worker pool default to gp3 volume.
- Edge pool default to gp2 volume.

The default decision is taken from existing code [0]. This commit just
makes the defaulting earlier.

Reference

[0] https://github.com/openshift/installer/blob/fd5a518e4951510b82705eee8184b3dd4f2723b2/pkg/asset/machines/worker.go#L102-L117
@openshift-ci-robot openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label Dec 2, 2025
@tthvo
Copy link
Member Author

tthvo commented Dec 2, 2025

/approve

this lgtm, but some nits for discussion from joel & me

Thank you! I updated the PR. PTAL again 🙏

@tthvo
Copy link
Member Author

tthvo commented Dec 2, 2025

/test e2e-aws-ovn-shared-vpc-edge-zones

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. platform/aws

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants