MCO-1805: MCO-1806: Add ManagedBootImagesCPMS feature gate & CPMS type to ManagedBootImages API #2396

Open, wants to merge 3 commits into base: master
1 change: 1 addition & 0 deletions features.md
@@ -44,6 +44,7 @@
| KMSEncryptionProvider| | | <span style="background-color: #519450">Enabled</span> | <span style="background-color: #519450">Enabled</span> | <span style="background-color: #519450">Enabled</span> | <span style="background-color: #519450">Enabled</span> |
| MachineAPIMigration| | | <span style="background-color: #519450">Enabled</span> | <span style="background-color: #519450">Enabled</span> | <span style="background-color: #519450">Enabled</span> | <span style="background-color: #519450">Enabled</span> |
| ManagedBootImagesAzure| | | <span style="background-color: #519450">Enabled</span> | <span style="background-color: #519450">Enabled</span> | <span style="background-color: #519450">Enabled</span> | <span style="background-color: #519450">Enabled</span> |
| ManagedBootImagesCPMS| | | <span style="background-color: #519450">Enabled</span> | <span style="background-color: #519450">Enabled</span> | <span style="background-color: #519450">Enabled</span> | <span style="background-color: #519450">Enabled</span> |
| ManagedBootImagesvSphere| | | <span style="background-color: #519450">Enabled</span> | <span style="background-color: #519450">Enabled</span> | <span style="background-color: #519450">Enabled</span> | <span style="background-color: #519450">Enabled</span> |
| MaxUnavailableStatefulSet| | | <span style="background-color: #519450">Enabled</span> | <span style="background-color: #519450">Enabled</span> | <span style="background-color: #519450">Enabled</span> | <span style="background-color: #519450">Enabled</span> |
| MinimumKubeletVersion| | | <span style="background-color: #519450">Enabled</span> | <span style="background-color: #519450">Enabled</span> | <span style="background-color: #519450">Enabled</span> | <span style="background-color: #519450">Enabled</span> |
8 changes: 8 additions & 0 deletions features/features.go
@@ -393,6 +393,14 @@ var (
enableIn(configv1.DevPreviewNoUpgrade, configv1.TechPreviewNoUpgrade).
mustRegister()

FeatureGateManagedBootImagesCPMS = newFeatureGate("ManagedBootImagesCPMS").
reportProblemsToJiraComponent("MachineConfigOperator").
contactPerson("djoshy").
productScope(ocpSpecific).
enhancementPR("https://github.com/openshift/enhancements/pull/1761").
Contributor

The linked EP explicitly calls out not targeting CPMS. Has there been design discussion of the impacts of enabling boot image updates on CPMS?

Contributor Author

Ah, yes, I can update that. I will open a PR so the reference here can be corrected. Service Delivery folks asked us to bump the priority for this; we had initially slated it for TechPreview in 4.21. Some recent developments pushed Azure to 4.21, so I decided to pull this into 4.20. Since CPMS do not use marketplace AMIs/images, this should hopefully just reuse much of the existing implementation for GCP/AWS management.

Contributor

I'm sure that from your side it's easy to get the CPMS updated, but there's a big difference between CPMS and MachineSets that needs to be discussed: updating the CPMS could trigger a complete control plane replacement. That is potentially undesirable depending on when it happens, or in some cases undesirable at all. I think this is worth bringing to an architecture call, and perhaps bringing some opinionated SD folks along.

Contributor Author

Thanks for the context, I agree with your concerns. I will be happy to bring it to the next arch call.

Contributor

I've gone into more detail on the issue you linked, hoping to trigger some discussion with SD. Let's see if they respond.

Contributor Author

ack, thanks!

enableIn(configv1.DevPreviewNoUpgrade, configv1.TechPreviewNoUpgrade).
mustRegister()

FeatureGateBootImageSkewEnforcement = newFeatureGate("BootImageSkewEnforcement").
reportProblemsToJiraComponent("MachineConfigOperator").
contactPerson("djoshy").
2 changes: 1 addition & 1 deletion openapi/generated_openapi/zz_generated.openapi.go

2 changes: 1 addition & 1 deletion openapi/openapi.json
@@ -31373,7 +31373,7 @@
"default": ""
},
"resource": {
"description": "resource is the machine management resource's type. The only current valid value is machinesets. machinesets means that the machine manager will only register resources of the kind MachineSet.",
"description": "resource is the machine management resource's type. Valid values are machinesets and controlplanemachinesets. machinesets means that the machine manager will only register resources of the kind MachineSet. controlplanemachinesets means that the machine manager will only register resources of the kind ControlPlaneMachineSet.",
"type": "string",
"default": ""
},
@@ -0,0 +1,119 @@
apiVersion: apiextensions.k8s.io/v1 # Hack because controller-gen complains if we don't have this
name: "MachineConfiguration"
crdName: machineconfigurations.operator.openshift.io
featureGates:
- ManagedBootImages
tests:
onCreate:
- name: Should be able to create a minimal MachineConfiguration
initial: |
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
spec: {} # No spec is required for a MachineConfiguration
expected: |
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
spec:
logLevel: Normal
operatorLogLevel: Normal
- name: Should be able to create an empty ManagedBootImages configuration knob
initial: |
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
spec:
managedBootImages:
machineManagers: []
Contributor

Why do we allow this?

Contributor Author

I think our original idea was that it would improve discovery: #1672 (comment)

Currently, it is used to explicitly disable updates in 4.18, so an auto opt-in does not take place on upgrade to 4.19.

Contributor

So you are making a distinction now between omitted and the empty list? The API wasn't designed with this in mind and I'm not sure how you'd actually be doing that?

Contributor Author

> The API wasn't designed with this in mind and I'm not sure how you'd actually be doing that?

Yeah, it's not pretty 😓, and it is only meant as a stop-gap < 4.18, since we have an explicit None option in 4.19+. I check whether the spec list exists; if it is omitted, the list object is nil and the MCO treats that as no opinion.
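
A minimal sketch of the check described in this comment, assuming a simplified stand-in type (`managedBootImages` and `opinion` are hypothetical names, not the MCO's code). It only shows what Go's `encoding/json` does when decoding a field without `omitempty`; whether the distinction survives a round trip through the API server is a separate question.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// managedBootImages is a hypothetical, simplified stand-in for the API type.
// The list field deliberately has no omitempty tag.
type managedBootImages struct {
	MachineManagers []json.RawMessage `json:"machineManagers"`
}

// opinion classifies a decoded object: an omitted list decodes to nil,
// while an explicit [] decodes to a non-nil slice of length zero.
func opinion(raw string) (string, error) {
	var cfg managedBootImages
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return "", err
	}
	switch {
	case cfg.MachineManagers == nil:
		return "no opinion", nil
	case len(cfg.MachineManagers) == 0:
		return "explicitly disabled", nil
	default:
		return "opted in", nil
	}
}

func main() {
	inputs := []string{
		`{}`,
		`{"machineManagers":[]}`,
		`{"machineManagers":[{"resource":"machinesets"}]}`,
	}
	for _, raw := range inputs {
		got, err := opinion(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s\n", raw, got)
	}
}
```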

Contributor

What you are trying to achieve doesn't work. Go decoding/encoding won't tell the difference between a persisted [] and the field being omitted completely. Take a look at the output of https://go.dev/play/p/xEYwvCwxqB3.

If you wanted to be able to tell the difference between those two states, you'd need the list to be a pointer (*[]T).

As soon as a structured client writes to the object after the user has persisted [], it will be stripped away again.

Contributor @JoelSpeed, Jul 10, 2025

Oh wait, we don't have omitempty... that changes it slightly, but damn that is sketchy and fragile 👀 This is not a behaviour I would be comfortable relying on. Kubernetes doesn't have a concept of pointers, it has present, or not present. Lists have size generally, and we should not assume an empty list round trips.

Sketchy playground
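
To make the `omitempty` point concrete, here is a small sketch that decodes and re-encodes the same two payloads with and without the tag. The struct names are hypothetical and the element type is simplified to `string`; this is not the linked playground, only an illustration of standard `encoding/json` behaviour.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// plainList serializes the list without omitempty.
type plainList struct {
	MachineManagers []string `json:"machineManagers"`
}

// omitEmptyList serializes the list with omitempty.
type omitEmptyList struct {
	MachineManagers []string `json:"machineManagers,omitempty"`
}

// roundTripPlain decodes raw into plainList and encodes it again.
func roundTripPlain(raw string) string {
	var v plainList
	if err := json.Unmarshal([]byte(raw), &v); err != nil {
		panic(err)
	}
	out, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(out)
}

// roundTripOmitEmpty decodes raw into omitEmptyList and encodes it again.
func roundTripOmitEmpty(raw string) string {
	var v omitEmptyList
	if err := json.Unmarshal([]byte(raw), &v); err != nil {
		panic(err)
	}
	out, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	for _, raw := range []string{`{}`, `{"machineManagers":[]}`} {
		fmt.Printf("input %s\n", raw)
		fmt.Printf("  without omitempty: %s\n", roundTripPlain(raw))
		fmt.Printf("  with omitempty:    %s\n", roundTripOmitEmpty(raw))
	}
}
```

Without `omitempty` the empty list is written back as `[]` and the omitted one as `null`; with `omitempty` both collapse to `{}`, so the difference is lost on the first structured write.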

Contributor Author

Understood. Given that the empty-list method is now in use by 4.18 ROSA/Managed clusters, what would you suggest as the path forward here? Our 4.18 docs recommend the empty list for disabling prior to an upgrade, and the 4.19 docs recommend the None option. Should we do some sort of migration?

Contributor

What would backporting the none option look like? And then updating the guidance to recommend against that?

Has anyone reported problems yet? I guess the problems would occur on the 4.18 to 4.19 upgrade boundary, right?

expected: |
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
spec:
logLevel: Normal
operatorLogLevel: Normal
managedBootImages:
machineManagers: []
- name: Should be able to create a ManagedBootImages configuration knob that opts in all ControlPlaneMachineSets
initial: |
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
spec:
managedBootImages:
machineManagers:
- resource: controlplanemachinesets
apiGroup: machine.openshift.io
selection:
mode: All
expected: |
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
spec:
logLevel: Normal
operatorLogLevel: Normal
managedBootImages:
machineManagers:
- resource: controlplanemachinesets
Contributor

CPMS is a singleton within the cluster; perhaps we want to validate that a specific selection (All?) is required when this value is controlplanemachinesets?

Contributor Author

Oh, interesting, I did not know that! Yes, I can update the validation here. It will also simplify the reconciliation loop in the MCO controller.
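
One way the suggested rule could look, sketched in plain Go rather than as a CRD validation marker. Everything here is an assumption for illustration: the `machineManager` struct is a trimmed-down stand-in, `validateSelection` is a hypothetical helper, and the choice to reject only `Partial` (leaving `None` legal so an explicit opt-out stays expressible) was not settled in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// machineManager is a hypothetical, trimmed-down stand-in for the API type.
type machineManager struct {
	Resource string // "machinesets" or "controlplanemachinesets"
	Mode     string // "All", "Partial" or "None"
}

var errCPMSPartial = errors.New(
	"selection mode Partial is not allowed when resource is controlplanemachinesets")

// validateSelection rejects Partial selection for the singleton CPMS resource.
// Assumption: "None" remains valid so that opting out is still possible.
func validateSelection(m machineManager) error {
	if m.Resource == "controlplanemachinesets" && m.Mode == "Partial" {
		return errCPMSPartial
	}
	return nil
}

func main() {
	cases := []machineManager{
		{Resource: "controlplanemachinesets", Mode: "All"},
		{Resource: "controlplanemachinesets", Mode: "Partial"},
		{Resource: "machinesets", Mode: "Partial"},
	}
	for _, c := range cases {
		fmt.Printf("%s/%s: %v\n", c.Resource, c.Mode, validateSelection(c))
	}
}
```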

apiGroup: machine.openshift.io
selection:
mode: All
- name: Should be able to create a ManagedBootImages configuration knob that opts in ControlPlaneMachineSets in partial mode
initial: |
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
spec:
managedBootImages:
machineManagers:
- resource: controlplanemachinesets
apiGroup: machine.openshift.io
selection:
mode: Partial
partial:
machineResourceSelector:
matchLabels: {}
expected: |
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
spec:
logLevel: Normal
operatorLogLevel: Normal
managedBootImages:
machineManagers:
- resource: controlplanemachinesets
apiGroup: machine.openshift.io
selection:
mode: Partial
partial:
machineResourceSelector:
matchLabels: {}
- name: Should not be able to add partial field if machineManager.selection.mode is not set to Partial
initial: |
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
spec:
managedBootImages:
machineManagers:
- resource: controlplanemachinesets
apiGroup: machine.openshift.io
selection:
mode: All
partial:
machineResourceSelector:
matchLabels: {}
expectedError: "Partial is required when type is partial, and forbidden otherwise"
- name: Only one unique pair of resource/apigroup is allowed in machineManagers
initial: |
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
spec:
managedBootImages:
machineManagers:
- resource: controlplanemachinesets
apiGroup: machine.openshift.io
selection:
mode: Partial
partial:
machineResourceSelector:
matchLabels: {}
- resource: controlplanemachinesets
apiGroup: machine.openshift.io
selection:
mode: All
expectedError: "spec.managedBootImages.machineManagers[1]: Duplicate value: map[string]interface {}{\"apiGroup\":\"machine.openshift.io\", \"resource\":\"controlplanemachinesets\"}"
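
The two negative tests above exercise a union rule (`partial` is set exactly when `mode` is `Partial`) and a uniqueness rule on the resource/apiGroup pair. The sketch below restates both rules in plain Go under simplified, hypothetical types; the real checks are enforced by the CRD schema, and the duplicate message here is illustrative, not the API server's exact text.

```go
package main

import "fmt"

// partialSelector and machineManager are hypothetical stand-ins for the API types.
type partialSelector struct {
	MatchLabels map[string]string
}

type machineManager struct {
	Resource string
	APIGroup string
	Mode     string
	Partial  *partialSelector
}

// validateManagers returns one message per violated rule.
func validateManagers(managers []machineManager) []string {
	var problems []string
	seen := map[[2]string]bool{}
	for i, m := range managers {
		// Union rule: partial must be present if and only if mode is Partial.
		if (m.Mode == "Partial") != (m.Partial != nil) {
			problems = append(problems, fmt.Sprintf(
				"machineManagers[%d]: Partial is required when type is partial, and forbidden otherwise", i))
		}
		// Uniqueness rule: each apiGroup/resource pair may appear only once.
		key := [2]string{m.APIGroup, m.Resource}
		if seen[key] {
			problems = append(problems, fmt.Sprintf(
				"machineManagers[%d]: duplicate %s/%s", i, m.APIGroup, m.Resource))
		}
		seen[key] = true
	}
	return problems
}

func main() {
	managers := []machineManager{
		{Resource: "controlplanemachinesets", APIGroup: "machine.openshift.io",
			Mode: "Partial", Partial: &partialSelector{}},
		{Resource: "controlplanemachinesets", APIGroup: "machine.openshift.io", Mode: "All"},
	}
	for _, p := range validateManagers(managers) {
		fmt.Println(p)
	}
}
```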
10 changes: 7 additions & 3 deletions operator/v1/types_machineconfiguration.go
@@ -135,8 +135,9 @@ type ManagedBootImages struct {
// such as the resource type and the API Group of the resource. It also provides granular control via the selection field.
type MachineManager struct {
// resource is the machine management resource's type.
// The only current valid value is machinesets.
// Valid values are machinesets and controlplanemachinesets.
// machinesets means that the machine manager will only register resources of the kind MachineSet.
// controlplanemachinesets means that the machine manager will only register resources of the kind ControlPlaneMachineSet.
// +required
Resource MachineManagerMachineSetsResourceType `json:"resource"`

@@ -194,12 +195,15 @@ const (

// MachineManagerManagedResourceType is a string enum used in the MachineManager type to describe the resource
// type to be registered.
// +kubebuilder:validation:Enum:="machinesets"
// +openshift:validation:FeatureGateAwareEnum:featureGate="",enum=machinesets
// +openshift:validation:FeatureGateAwareEnum:featureGate=ManagedBootImagesCPMS,enum=machinesets;controlplanemachinesets
type MachineManagerMachineSetsResourceType string

const (
// MachineSets represent the MachineSet resource type, which manage a group of machines and belong to the Openshift machine API group.
MachineSets MachineManagerMachineSetsResourceType = "machinesets"
// ControlPlaneMachineSets represent the ControlPlaneMachineSets resource type, which manage a group of control-plane machines and belong to the Openshift machine API group.
ControlPlaneMachineSets MachineManagerMachineSetsResourceType = "controlplanemachinesets"
Contributor Author

Question: Is there a way to only enable this value of enum on the feature gate?

Contributor

Yes there is. You would change the usage of +kubebuilder:validation:Enum to the following:

  • +openshift:validation:FeatureGateAwareEnum:featureGate="",enum="machinesets"
  • +openshift:validation:FeatureGateAwareEnum:featureGate="ManagedBootImagesCPMS",enum="machinesets";"controlplanemachinesets"
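
As a conceptual model only (not the code generator's implementation), the two markers can be read as "use the gated value set when the gate is enabled, otherwise fall back to the ungated set". The types and functions below (`enumRule`, `allowedValues`, `isAllowed`) are hypothetical names introduced for this sketch.

```go
package main

import "fmt"

// enumRule pairs a feature gate with the enum values allowed when it is enabled.
// An empty gate name marks the ungated default.
type enumRule struct {
	gate   string
	values []string
}

// allowedValues returns the values of the first gated rule whose gate is
// enabled, or the ungated rule's values if no gated rule applies.
func allowedValues(rules []enumRule, enabled map[string]bool) []string {
	var fallback []string
	for _, r := range rules {
		if r.gate == "" {
			fallback = r.values
			continue
		}
		if enabled[r.gate] {
			return r.values
		}
	}
	return fallback
}

// isAllowed reports whether value is permitted under the enabled gates.
func isAllowed(value string, rules []enumRule, enabled map[string]bool) bool {
	for _, v := range allowedValues(rules, enabled) {
		if v == value {
			return true
		}
	}
	return false
}

func main() {
	rules := []enumRule{
		{gate: "", values: []string{"machinesets"}},
		{gate: "ManagedBootImagesCPMS", values: []string{"machinesets", "controlplanemachinesets"}},
	}
	off := map[string]bool{}
	on := map[string]bool{"ManagedBootImagesCPMS": true}
	fmt.Println("gate off:", allowedValues(rules, off))
	fmt.Println("gate on: ", allowedValues(rules, on))
}
```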

Contributor Author

Thanks, I've updated the PR. PTAL when you get a chance (:

)

// MachineManagerManagedAPIGroupType is a string enum used in in the MachineManager type to describe the APIGroup
@@ -209,7 +213,7 @@ type MachineManagerMachineSetsAPIGroupType string

const (
// MachineAPI represent the traditional MAPI Group that a machineset may belong to.
// This feature only supports MAPI machinesets at this time.
// This feature only supports MAPI machinesets and controlplanemachinesets at this time.
MachineAPI MachineManagerMachineSetsAPIGroupType = "machine.openshift.io"
)

@@ -102,10 +102,9 @@ spec:
resource:
description: |-
resource is the machine management resource's type.
The only current valid value is machinesets.
Valid values are machinesets and controlplanemachinesets.
machinesets means that the machine manager will only register resources of the kind MachineSet.
enum:
- machinesets
controlplanemachinesets means that the machine manager will only register resources of the kind ControlPlaneMachineSet.
type: string
selection:
description: selection allows granular control of the machine
@@ -733,10 +732,9 @@ spec:
resource:
description: |-
resource is the machine management resource's type.
The only current valid value is machinesets.
Valid values are machinesets and controlplanemachinesets.
machinesets means that the machine manager will only register resources of the kind MachineSet.
enum:
- machinesets
controlplanemachinesets means that the machine manager will only register resources of the kind ControlPlaneMachineSet.
type: string
selection:
description: selection allows granular control of the machine
1 change: 1 addition & 0 deletions operator/v1/zz_generated.featuregated-crd-manifests.yaml
@@ -308,6 +308,7 @@ machineconfigurations.operator.openshift.io:
Category: ""
FeatureGates:
- ManagedBootImages
- ManagedBootImagesCPMS
FilenameOperatorName: machine-config
FilenameOperatorOrdering: "01"
FilenameRunLevel: "0000_80"
@@ -103,10 +103,9 @@ spec:
resource:
description: |-
resource is the machine management resource's type.
The only current valid value is machinesets.
Valid values are machinesets and controlplanemachinesets.
machinesets means that the machine manager will only register resources of the kind MachineSet.
enum:
- machinesets
controlplanemachinesets means that the machine manager will only register resources of the kind ControlPlaneMachineSet.
type: string
selection:
description: selection allows granular control of the machine
@@ -734,10 +733,9 @@ spec:
resource:
description: |-
resource is the machine management resource's type.
The only current valid value is machinesets.
Valid values are machinesets and controlplanemachinesets.
machinesets means that the machine manager will only register resources of the kind MachineSet.
enum:
- machinesets
controlplanemachinesets means that the machine manager will only register resources of the kind ControlPlaneMachineSet.
type: string
selection:
description: selection allows granular control of the machine