MCO-1805: MCO-1806: Add ManagedBootImagesCPMS feature gate & CPMS type to ManagedBootImages API #2396
base: master
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,119 @@ | ||
apiVersion: apiextensions.k8s.io/v1 # Hack because controller-gen complains if we don't have this | ||
name: "MachineConfiguration" | ||
crdName: machineconfigurations.operator.openshift.io | ||
featureGates: | ||
- ManagedBootImages | ||
tests: | ||
onCreate: | ||
- name: Should be able to create a minimal MachineConfiguration | ||
initial: | | ||
apiVersion: operator.openshift.io/v1 | ||
kind: MachineConfiguration | ||
spec: {} # No spec is required for a MachineConfiguration | ||
expected: | | ||
apiVersion: operator.openshift.io/v1 | ||
kind: MachineConfiguration | ||
spec: | ||
logLevel: Normal | ||
operatorLogLevel: Normal | ||
- name: Should be able to create an empty ManagedBootImages configuration knob | ||
initial: | | ||
apiVersion: operator.openshift.io/v1 | ||
kind: MachineConfiguration | ||
spec: | ||
managedBootImages: | ||
machineManagers: [] | ||
Why do we allow this?
I think our original idea was that it would improve discovery: #1672 (comment). Currently, it is used to explicitly disable updates in 4.18, so an auto opt-in does not take place on upgrade to 4.19.
So you are making a distinction now between omitted and the empty list? The API wasn't designed with this in mind and I'm not sure how you'd actually be doing that?
Yeah, it's not pretty 😓 and it is only meant as stop-gap < 4.18 since we have an explicit
What you are trying to achieve doesn't work. Go decoding/encoding won't tell the difference between a persisted If you wanted to be able to tell the difference between those two states, you'd need the list to be a pointer ( As soon as a structured client writes to the object after the user has persisted
Oh wait, we don't have
Understood. Given that the empty list method is now in use by 4.18 ROSA/Managed clusters, what would you suggest is the path forward here? Our 4.18 docs recommend the empty list for disabling prior to an upgrade, and 4.19 docs recommend the None option. Should we do some sort of migration?
What would backporting the none option look like? And then updating the guidance to recommend against that? Have you had any issues of anyone reporting problems yet? I guess the problems would occur on the 4.18 to 4.19 upgrade boundary, right? |
||
expected: | | ||
apiVersion: operator.openshift.io/v1 | ||
kind: MachineConfiguration | ||
spec: | ||
logLevel: Normal | ||
operatorLogLevel: Normal | ||
managedBootImages: | ||
machineManagers: [] | ||
- name: Should be able to create a ManagedBootImages configuration knob that opts in all ControlPlaneMachineSets | ||
initial: | | ||
apiVersion: operator.openshift.io/v1 | ||
kind: MachineConfiguration | ||
spec: | ||
managedBootImages: | ||
machineManagers: | ||
- resource: controlplanemachinesets | ||
apiGroup: machine.openshift.io | ||
selection: | ||
mode: All | ||
expected: | | ||
apiVersion: operator.openshift.io/v1 | ||
kind: MachineConfiguration | ||
spec: | ||
logLevel: Normal | ||
operatorLogLevel: Normal | ||
managedBootImages: | ||
machineManagers: | ||
- resource: controlplanemachinesets | ||
CPMS is a singleton within the cluster; perhaps we want to validate a specific selection (All?) to be required when this value is
Oh, interesting, I did not know that! Yes, I can update the validation here. It will also simplify the reconciliation loop in the MCO controller. |
||
apiGroup: machine.openshift.io | ||
selection: | ||
mode: All | ||
- name: Should be able to create a ManagedBootImages configuration knob that opts in ControlPlaneMachineSets in partial mode | ||
initial: | | ||
apiVersion: operator.openshift.io/v1 | ||
kind: MachineConfiguration | ||
spec: | ||
managedBootImages: | ||
machineManagers: | ||
- resource: controlplanemachinesets | ||
apiGroup: machine.openshift.io | ||
selection: | ||
mode: Partial | ||
partial: | ||
machineResourceSelector: | ||
matchLabels: {} | ||
expected: | | ||
apiVersion: operator.openshift.io/v1 | ||
kind: MachineConfiguration | ||
spec: | ||
logLevel: Normal | ||
operatorLogLevel: Normal | ||
managedBootImages: | ||
machineManagers: | ||
- resource: controlplanemachinesets | ||
apiGroup: machine.openshift.io | ||
selection: | ||
mode: Partial | ||
partial: | ||
machineResourceSelector: | ||
matchLabels: {} | ||
- name: Should not be able to add partial field if machineManager.selection.mode is not set to Partial | ||
initial: | | ||
apiVersion: operator.openshift.io/v1 | ||
kind: MachineConfiguration | ||
spec: | ||
managedBootImages: | ||
machineManagers: | ||
- resource: controlplanemachinesets | ||
apiGroup: machine.openshift.io | ||
selection: | ||
mode: All | ||
partial: | ||
machineResourceSelector: | ||
matchLabels: {} | ||
expectedError: "Partial is required when type is partial, and forbidden otherwise" | ||
- name: Only one unique pair of resource/apigroup is allowed in machineManagers | ||
initial: | | ||
apiVersion: operator.openshift.io/v1 | ||
kind: MachineConfiguration | ||
spec: | ||
managedBootImages: | ||
machineManagers: | ||
- resource: controlplanemachinesets | ||
apiGroup: machine.openshift.io | ||
selection: | ||
mode: Partial | ||
partial: | ||
machineResourceSelector: | ||
matchLabels: {} | ||
- resource: controlplanemachinesets | ||
apiGroup: machine.openshift.io | ||
selection: | ||
mode: All | ||
expectedError: "spec.managedBootImages.machineManagers[1]: Duplicate value: map[string]interface {}{\"apiGroup\":\"machine.openshift.io\", \"resource\":\"controlplanemachinesets\"}" |
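The review thread on the empty `machineManagers: []` case turns on how Go's `encoding/json` treats empty slices. The following is a minimal sketch of that behavior, not the real API types: `sliceField` and `pointerField` are hypothetical stand-ins, and the `omitempty` tag is an assumption made for illustration (the thread itself questions whether the real field carries it).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sliceField is a hypothetical, simplified stand-in for a list field
// tagged with omitempty; it is not the real ManagedBootImages type.
type sliceField struct {
	MachineManagers []string `json:"machineManagers,omitempty"`
}

// pointerField wraps the list in a pointer so that "omitted" (nil pointer)
// and "explicitly empty" (pointer to an empty slice) stay distinct.
type pointerField struct {
	MachineManagers *[]string `json:"machineManagers,omitempty"`
}

// roundTrip decodes raw into a value of type T and encodes it again,
// which is roughly what a structured client does on read-modify-write.
func roundTrip[T any](raw string) string {
	var v T
	if err := json.Unmarshal([]byte(raw), &v); err != nil {
		panic(err)
	}
	out, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	for _, raw := range []string{`{}`, `{"machineManagers":[]}`} {
		fmt.Printf("input %-24s slice: %-24s pointer: %s\n",
			raw, roundTrip[sliceField](raw), roundTrip[pointerField](raw))
	}
}
```

Under these assumptions, the plain slice collapses both inputs to `{}` after a round trip, while the pointer variant preserves the explicit empty list. That is why an "empty list means disabled" convention is fragile once a structured client rewrites the object.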
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -135,8 +135,9 @@ type ManagedBootImages struct { | |
// such as the resource type and the API Group of the resource. It also provides granular control via the selection field. | ||
type MachineManager struct { | ||
// resource is the machine management resource's type. | ||
- // The only current valid value is machinesets. | ||
+ // Valid values are machinesets and controlplanemachinesets. | ||
// machinesets means that the machine manager will only register resources of the kind MachineSet. | ||
+ // controlplanemachinesets means that the machine manager will only register resources of the kind ControlPlaneMachineSet. | ||
// +required | ||
Resource MachineManagerMachineSetsResourceType `json:"resource"` | ||
|
||
|
@@ -194,12 +195,15 @@ const ( | |
|
||
// MachineManagerManagedResourceType is a string enum used in the MachineManager type to describe the resource | ||
// type to be registered. | ||
- // +kubebuilder:validation:Enum:="machinesets" | ||
+ // +openshift:validation:FeatureGateAwareEnum:featureGate="",enum=machinesets | ||
+ // +openshift:validation:FeatureGateAwareEnum:featureGate=ManagedBootImagesCPMS,enum=machinesets;controlplanemachinesets | ||
type MachineManagerMachineSetsResourceType string | ||
|
||
const ( | ||
// MachineSets represent the MachineSet resource type, which manage a group of machines and belong to the Openshift machine API group. | ||
MachineSets MachineManagerMachineSetsResourceType = "machinesets" | ||
+ // ControlPlaneMachineSets represent the ControlPlaneMachineSets resource type, which manage a group of control-plane machines and belong to the Openshift machine API group. | ||
+ ControlPlaneMachineSets MachineManagerMachineSetsResourceType = "controlplanemachinesets" | ||
Question: Is there a way to only enable this value of enum on the feature gate?
Yes there is. You would change the usage of
Thanks, I've updated the PR. PTAL when you get a chance (: |
||
) | ||
|
||
// MachineManagerManagedAPIGroupType is a string enum used in the MachineManager type to describe the APIGroup | ||
|
@@ -209,7 +213,7 @@ type MachineManagerMachineSetsAPIGroupType string | |
|
||
const ( | ||
// MachineAPI represent the traditional MAPI Group that a machineset may belong to. | ||
- // This feature only supports MAPI machinesets at this time. | ||
+ // This feature only supports MAPI machinesets and controlplanemachinesets at this time. | ||
MachineAPI MachineManagerMachineSetsAPIGroupType = "machine.openshift.io" | ||
) | ||
|
||
|
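The two `FeatureGateAwareEnum` markers in the diff above declare one set of allowed `resource` values per feature gate. As a rough illustration of the intended effect only (not of how the OpenShift CRD generator implements it), the sketch below picks the allowed set based on which gates are enabled. The `gatedEnum` type and the `allowedResources` and `isValidResource` helpers are hypothetical names introduced for this example.

```go
package main

import "fmt"

// gatedEnum pairs a feature gate name with the enum values permitted when
// that gate is enabled. An empty gate name models the ungated default.
// This type is hypothetical and exists only for illustration.
type gatedEnum struct {
	featureGate string
	values      []string
}

// resourceEnums mirrors the two markers on the resource type in the diff.
var resourceEnums = []gatedEnum{
	{featureGate: "", values: []string{"machinesets"}},
	{featureGate: "ManagedBootImagesCPMS", values: []string{"machinesets", "controlplanemachinesets"}},
}

// allowedResources returns the ungated values unless a gated entry's
// feature gate is enabled, in which case that entry's values win.
func allowedResources(enabled map[string]bool) []string {
	var allowed []string
	for _, e := range resourceEnums {
		if e.featureGate == "" && allowed == nil {
			allowed = e.values
		}
		if e.featureGate != "" && enabled[e.featureGate] {
			allowed = e.values
		}
	}
	return allowed
}

// isValidResource reports whether resource is permitted for the given gates.
func isValidResource(resource string, enabled map[string]bool) bool {
	for _, v := range allowedResources(enabled) {
		if v == resource {
			return true
		}
	}
	return false
}

func main() {
	off := map[string]bool{}
	on := map[string]bool{"ManagedBootImagesCPMS": true}
	fmt.Println("gate off:", isValidResource("controlplanemachinesets", off))
	fmt.Println("gate on: ", isValidResource("controlplanemachinesets", on))
}
```

With the gate disabled, only `machinesets` passes. Enabling `ManagedBootImagesCPMS` additionally admits `controlplanemachinesets`, which matches the reviewer's request that the new value be available only behind the feature gate.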
The linked EP explicitly calls out not targeting CPMS. Has there been design discussion of the impacts of enabling boot image updates on CPMS?
Ah, yes, I can update that; I will open a PR for it so the reference here can be corrected. Service Delivery folks asked us to bump the priority for this. We had initially slated it for TechPreview in 4.21, but some recent developments pushed Azure to 4.21, so I decided to pull this into 4.20. Since CPMS do not use marketplace AMIs/images, this should hopefully just reuse a lot of the existing implementation for GCP/AWS management.
I'm sure from your side it's easy to get the CPMS updated, but there's a big difference between CPMS and MachineSets that needs to be discussed: updating the CPMS could trigger a complete control plane replacement, which is potentially undesirable depending on when it happens, or, in some cases, at all. I think this is worth bringing to an architecture call, and perhaps even bringing some opinionated SD folks along.
Thanks for the context, I agree with your concerns. I will be happy to bring it to the next arch call.
I've gone into more detail on the issue you linked, hoping to trigger some discussion with SD; let's see if they respond.
ack, thanks!