MCO-1805: MCO-1806: Add ManagedBootImagesCPMS feature gate & CPMS type to ManagedBootImages API #2396
base: master
@djoshy: This pull request references MCO-1805 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: djoshy
type MachineManagerMachineSetsResourceType string

const (
	// MachineSets represent the MachineSet resource type, which manage a group of machines and belong to the Openshift machine API group.
	MachineSets MachineManagerMachineSetsResourceType = "machinesets"
	// ControlPlaneMachineSets represent the ControlPlaneMachineSets resource type, which manage a group of control-plane machines and belong to the Openshift machine API group.
	ControlPlaneMachineSets MachineManagerMachineSetsResourceType = "controlplanemachinesets"
Question: Is there a way to only enable this value of enum on the feature gate?
Yes, there is. You would change the usage of +kubebuilder:validation:Enum to the following:
+openshift:validation:FeatureGateAwareEnum:featureGate="",enum="machinesets"
+openshift:validation:FeatureGateAwareEnum:featureGate="ManagedBootImagesCPMS",enum="machinesets";"controlplanemachinesets"
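As a rough illustration of the suggestion above, the two markers could sit on the enum type from the diff, in place of a single +kubebuilder:validation:Enum marker. Placing them on the type declaration (rather than on the field that uses it) is an assumption here, and the `main` function exists only so the sketch compiles and runs on its own; the markers are ordinary comments to the Go compiler and only matter to the CRD generator.

```go
package main

import "fmt"

// MachineManagerMachineSetsResourceType is the enum type from the diff under review.
// The two markers below are copied from the review comment; attaching them to the
// type declaration is an assumption made for this sketch.
// +openshift:validation:FeatureGateAwareEnum:featureGate="",enum="machinesets"
// +openshift:validation:FeatureGateAwareEnum:featureGate="ManagedBootImagesCPMS",enum="machinesets";"controlplanemachinesets"
type MachineManagerMachineSetsResourceType string

const (
	// MachineSets is accepted regardless of the feature gate.
	MachineSets MachineManagerMachineSetsResourceType = "machinesets"
	// ControlPlaneMachineSets is only meant to be accepted when ManagedBootImagesCPMS is enabled.
	ControlPlaneMachineSets MachineManagerMachineSetsResourceType = "controlplanemachinesets"
)

func main() {
	// Print the two enum values so the sketch has observable output.
	fmt.Println(MachineSets, ControlPlaneMachineSets)
}
```

The first marker (empty feature gate) describes the schema when the gate is off; the second describes it when ManagedBootImagesCPMS is on.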
Thanks, I've updated the PR. PTAL when you get a chance (:
18ab992 to 41adfe1
		reportProblemsToJiraComponent("MachineConfigOperator").
		contactPerson("djoshy").
		productScope(ocpSpecific).
		enhancementPR("https://github.com/openshift/enhancements/pull/1761").
The linked EP explicitly calls out not targeting CPMS. Has there been design discussion of the impacts of enabling boot image updates on CPMS?
Ah, yes, I can update that; I will open a PR for it so the reference here can be corrected. We were asked by Service Delivery folks to bump the priority for this, and we had initially slated this for TechPreview in 4.21. Some recent developments pushed Azure to 4.21, so I decided to pull this into 4.20. Since CPMS do not use marketplace AMIs/images, this should hopefully just re-use a lot of the existing implementation for GCP/AWS management.
I'm sure from your side it's easy to get the CPMS updated, but there's a big difference between CPMS and MachineSets that needs to be discussed: when you update the CPMS, it could trigger a complete control plane replacement, which is potentially not desirable depending on when it happens, or even at all in some cases. I think this is worth bringing to an architecture call, and perhaps even bringing some opinionated SD folks along.
Thanks for the context, I agree with your concerns. I will be happy to bring it to the next arch call.
I've gone into more detail on the issue you linked, hoping to trigger some discussion with SD; let's see if they respond.
ack, thanks!
kind: MachineConfiguration
spec:
  managedBootImages:
    machineManagers: []
Why do we allow this?
I think our original idea was that it would improve discovery: #1672 (comment)
Currently, it is used to explicitly disable updates in 4.18, so an auto opt-in does not take place on upgrade to 4.19.
So you are making a distinction now between omitted and the empty list? The API wasn't designed with this in mind and I'm not sure how you'd actually be doing that?
> The API wasn't designed with this in mind and I'm not sure how you'd actually be doing that?
Yeah, it's not pretty 😓 and it is only meant as a stop-gap < 4.18, since we have an explicit None option in 4.19+. I check whether the spec list exists; if it is omitted, the list object would be nil and the MCO considers that to be no opinion.
What you are trying to achieve doesn't work. Go decoding/encoding won't tell the difference between a persisted [] and the field being omitted completely. Take a look at the output of https://go.dev/play/p/xEYwvCwxqB3.
If you wanted to be able to tell the difference between those two states, you'd need the list to be a pointer (*[]T).
As soon as a structured client writes to the object after the user has persisted [], it will be stripped away again.
Oh wait, we don't have omitempty... that changes it slightly, but damn that is sketchy and fragile 👀 This is not a behaviour I would be comfortable relying on. Kubernetes doesn't have a concept of pointers, it has present, or not present. Lists have size generally, and we should not assume an empty list round trips.
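To make the omitted-versus-empty discussion concrete, here is a small sketch using only the standard library's encoding/json. The struct names are hypothetical stand-ins, not the real MachineConfiguration types, and Kubernetes clients use their own serializers, so treat this as an illustration of the omitempty effect rather than a statement about API server behaviour.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// plainSpec is a hypothetical stand-in for a spec whose list has no omitempty.
type plainSpec struct {
	MachineManagers []string `json:"machineManagers"`
}

// omitEmptySpec has the same shape, but the list carries omitempty.
type omitEmptySpec struct {
	MachineManagers []string `json:"machineManagers,omitempty"`
}

// decodeState reports whether the decoded list is nil (field absent) or
// non-nil (field present, possibly empty).
func decodeState(doc string) (string, error) {
	var s plainSpec
	if err := json.Unmarshal([]byte(doc), &s); err != nil {
		return "", err
	}
	if s.MachineManagers == nil {
		return "nil", nil
	}
	return fmt.Sprintf("non-nil len=%d", len(s.MachineManagers)), nil
}

// roundTripPlain decodes and re-encodes the document through plainSpec.
func roundTripPlain(doc string) (string, error) {
	var s plainSpec
	if err := json.Unmarshal([]byte(doc), &s); err != nil {
		return "", err
	}
	out, err := json.Marshal(s)
	return string(out), err
}

// roundTripOmitEmpty decodes and re-encodes the document through omitEmptySpec.
func roundTripOmitEmpty(doc string) (string, error) {
	var s omitEmptySpec
	if err := json.Unmarshal([]byte(doc), &s); err != nil {
		return "", err
	}
	out, err := json.Marshal(s)
	return string(out), err
}

func main() {
	for _, doc := range []string{`{}`, `{"machineManagers":[]}`} {
		state, _ := decodeState(doc)
		plain, _ := roundTripPlain(doc)
		omit, _ := roundTripOmitEmpty(doc)
		fmt.Printf("input=%s decoded=%s plain=%s omitempty=%s\n", doc, state, plain, omit)
	}
}
```

With omitempty, the empty list is dropped on re-encoding, which is the stripping described above. Without it, the empty list survives in this toy case, but an omitted field comes back as null, so the distinction still depends on every writer preserving it.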
Understood, given that the empty list method is now in use by 4.18 ROSA/Managed clusters, what would you suggest is the path forward here? Our 4.18 docs recommend the empty list for disabling prior to an upgrade, and 4.19 docs recommend the None option. Should we do some sort of migration?
  operatorLogLevel: Normal
  managedBootImages:
    machineManagers:
    - resource: controlplanemachinesets
CPMS is a singleton within the cluster, perhaps we want to validate a specific selection (All?) to be required when this value is controlplanemachinesets?
Oh, interesting, I did not know that! Yes, I can update the validation here. It will also simplify the reconciliation loop in the MCO controller.
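The rule being discussed could look roughly like the following Go sketch. It takes the strictest reading of the suggestion (selection must be All for controlplanemachinesets). All type and constant names are simplified, hypothetical stand-ins for the real API types, the "Partial" mode is assumed for illustration, and in the actual API this would more likely be expressed as a schema validation rule than as a Go function.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified, hypothetical stand-ins for the API types under discussion.
type resourceType string
type selectionMode string

const (
	machineSets             resourceType  = "machinesets"
	controlPlaneMachineSets resourceType  = "controlplanemachinesets"
	modeAll                 selectionMode = "All"
	modePartial             selectionMode = "Partial" // assumed mode name, for illustration only
)

// machineManager pairs a resource type with how its instances are selected.
type machineManager struct {
	Resource resourceType
	Mode     selectionMode
}

// errCPMSSelection is returned when a CPMS entry uses any mode other than All.
var errCPMSSelection = errors.New(`selection mode must be "All" when resource is "controlplanemachinesets"`)

// validateMachineManager enforces the singleton rule: because there is only one
// CPMS per cluster, this sketch accepts only the All selection for it.
func validateMachineManager(m machineManager) error {
	if m.Resource == controlPlaneMachineSets && m.Mode != modeAll {
		return errCPMSSelection
	}
	return nil
}

func main() {
	entries := []machineManager{
		{Resource: controlPlaneMachineSets, Mode: modeAll},
		{Resource: controlPlaneMachineSets, Mode: modePartial},
		{Resource: machineSets, Mode: modePartial},
	}
	for _, e := range entries {
		fmt.Printf("%s/%s -> %v\n", e.Resource, e.Mode, validateMachineManager(e))
	}
}
```

Whether an explicit opt-out such as None should also be accepted for CPMS is left open here; this sketch would reject it.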
This PR adds:
ControlPlaneMachineSets MachineManagerMachineSetsResourceType enum for CPMS so they can be opted in for updates via the MachineConfiguration API object.