Skip to content

[OTA-1545] Extend ClusterVersion for accepted risks #2360

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

hongkailiu
Copy link
Member

@hongkailiu hongkailiu commented Jun 9, 2025

The API extensions is proposed in openshift/enhancements#1807

  • Add a new field 'clusterversion.spec.desiredUpdate.accept': It contains
    the names of conditional update risks that are considered acceptable.
  • Move clusterversion.status.conditionalUpdates.risks two levels up as
    clusterversion.status.conditionalUpdateRisks. It contains all the risks
    for clusterversion.status.conditionalUpdates.
  • Add new field 'clusterversion.status.conditionalUpdates.riskNames': It
    contains the names of risk for the conditional update. It deprecates
    clusterversion.status.conditionalUpdates.risks.
  • Add a new field 'clusterversion.status.conditionalUpdateRisks.conditions':
    It contains the observations of the conditional update risk's current
    status.

Copy link
Contributor

openshift-ci bot commented Jun 9, 2025

Hello @hongkailiu! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 9, 2025
@hongkailiu hongkailiu changed the title [OTA-1545] Extend ClusterVersion for accepted risks [wip][OTA-1545] Extend ClusterVersion for accepted risks Jun 9, 2025
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 9, 2025
@openshift-ci openshift-ci bot requested review from deads2k and everettraven June 9, 2025 21:08
Copy link
Contributor

openshift-ci bot commented Jun 9, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hongkailiu
Once this PR has been reviewed and has the lgtm label, please assign joelspeed for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 10, 2025
@openshift-ci openshift-ci bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 10, 2025
@openshift-ci openshift-ci bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 10, 2025
@hongkailiu hongkailiu force-pushed the OTA-1545 branch 2 times, most recently from 6dc1e06 to e780c3a Compare June 10, 2025 16:12
@openshift-ci openshift-ci bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 10, 2025
@hongkailiu hongkailiu force-pushed the OTA-1545 branch 4 times, most recently from 5218865 to 4b05550 Compare June 10, 2025 22:49
@openshift-ci openshift-ci bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 10, 2025
@hongkailiu hongkailiu force-pushed the OTA-1545 branch 3 times, most recently from 854ed1f to f03a0de Compare June 11, 2025 03:39
@hongkailiu hongkailiu force-pushed the OTA-1545 branch 5 times, most recently from dcf9d68 to c239767 Compare July 3, 2025 01:01
@hongkailiu
Copy link
Member Author

/retest-required

1 similar comment
@hongkailiu
Copy link
Member Author

/retest-required

Copy link
Member

@petr-muller petr-muller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am aware I am late to the party, so feel free to dismiss everything and treat it as an unwanted drive-by. Mostly just voicing my thoughts as I read the structs, but I admit have not yet read the enhancement.

@@ -780,11 +806,23 @@ type ConditionalUpdate struct {
// +required
Release Release `json:"release"`

// riskNames represents the set of the names of conditionalUpdateRisks in the status that are exposed to the release in this conditional update.
// The cluster-version operator will evaluate these risks and only accept the update if there is at least one risk and for every risk it is either not applied to the cluster or considered acceptable by the cluster administrator.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
// The cluster-version operator will evaluate these risks and only accept the update if there is at least one risk and for every risk it is either not applied to the cluster or considered acceptable by the cluster administrator.
// The cluster version operator will evaluate these risks and only accept the update if there is at least one risk and for every risk it is either not applied to the cluster or considered acceptable by the cluster administrator.

But maybe the API documentation should not make assumptions about how the controller that manages it is named, and we should simply talk about the expected behavior given an API shape? "An update will only be 'accepted' (whatever that means?) if..."

Also the "if there is at least one risk" is confusing to me. Why would having zero risks be unacceptable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would having zero risks be unacceptable?

This is a very valid question.
Is it possible that a ConditionalUpdate has no risk at all? I think No. It should be aligned with the next field Risks:

// risks represents the range of issues associated with
// updating to the target release. The cluster-version
// operator will evaluate all entries, and only recommend the
// update if there is at least one entry and all entries
// recommend the update.
// Deprecated: the risks has been deprecated by riskNames.
// +kubebuilder:validation:MinItems=1
// +patchMergeKey=name
// +patchStrategy=merge
// +listType=map
// +listMapKey=name
// +required
Risks []ConditionalUpdateRisk `json:"risks" patchStrategy:"merge" patchMergeKey:"name"`

Made the following change as well:

+optional
--->
// +kubebuilder:validation:MinItems=1 and // +required

Copy link
Contributor

@everettraven everettraven Jul 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot add a new required field. That is a breaking change to users/clients and invalidates all existing instances/expectations of the API. This field must be optional. You can still add the minimum items requirement but it should always be possible for this field to be unspecified.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted back to +optional.

You can still add the minimum items requirement

+kubebuilder:validation:MinItems=1 is still left there.

Is this understanding correct?
+optional and +kubebuilder:validation:MinItems=1 mean:

  • riskNames is not specified in the object. Valid up to +optional.
  • riskNames is the empty set. Invalid up to +kubebuilder:validation:MinItems=1.
  • riskNames is not empty. Valid as the expected case for the API extension.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also added the omitempty label because otherwise:

config/v1/types_cluster_version.go:820:2: optionalfields: field RiskNames is optional and does not allow the zero value. It must have the omitempty tag. (kubeapilinter)

Copy link
Member Author

@hongkailiu hongkailiu Jul 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added 2 tests for the first 2 cases above.

See #2360 (comment)

@hongkailiu
Copy link
Member Author

Thanks for the feedback. @petr-muller

I will address your comments today or tomorrow.

@hongkailiu hongkailiu force-pushed the OTA-1545 branch 2 times, most recently from e77f91e to 3307276 Compare July 4, 2025 14:44
@hongkailiu hongkailiu force-pushed the OTA-1545 branch 7 times, most recently from e1bda97 to 8991d06 Compare July 7, 2025 19:01
- Add a new field 'clusterversion.spec.desiredUpdate.accept': It contains
  the names of conditional update risks that are considered acceptable.
- Move `clusterversion.status.conditionalUpdates.risks` two levels up as
  `clusterversion.status.conditionalUpdateRisks`. It contains all the risks
  for `clusterversion.status.conditionalUpdates`.
- Add new field 'clusterversion.status.conditionalUpdates.riskNames': It
  contains the names of risk for the conditional update. It deprecates
  `clusterversion.status.conditionalUpdates.risks`.
- Add a new field 'clusterversion.status.conditionalUpdateRisks.conditions':
  It contains the observations of the conditional update risk's current
  status.
@hongkailiu
Copy link
Member Author

hongkailiu commented Jul 7, 2025

At the moment, there are 9 cases added for the extended API.

git --no-pager log --pretty=oneline -1 
6dd18c02beebc16a21d0da35be372665df491371 (HEAD -> OTA-1545, hongkailiu/OTA-1545) make update

$ diff -Naupr config/v1/tests/clusterversions.config.openshift.io/AAA_ungated.yaml config/v1/tests/clusterversions.config.openshift.io/ClusterUpdateAcceptedRisks.yaml | rg '\+    - name:|\+      expected.*Error:'
+    - name: Should be able to set accepted risks
+    - name: A risk name greater than 256 characters is not allowed
+      expectedError: "Too long: may not be more than 256 bytes"
+    - name: Risk names from the accept field must be unique
+      expectedError: 'Duplicate value: "RiskA"'
+    - name: The riskNames field might be unspecified
+    - name: Should be able to update fields related to accepted risks
+    - name: More than a single condition on an update risk is not allowed
+      expectedStatusError: "conditions: Too many: 2: must have at most 1 item"
+    - name: Risk names of a conditional update must be unique
+      expectedStatusError: 'Duplicate value: "DualStackNeedsController"'
+    - name: The type of a risk condition must be Applies
+      expectedStatusError: "type must be 'Applies'"
+    - name: The value of riskNames cannot be the empty set
+      expectedStatusError: "should have at least 1 items"

Copy link
Contributor

openshift-ci bot commented Jul 7, 2025

@hongkailiu: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Copy link
Contributor

@everettraven everettraven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed a small thing related to some of the description changes related to the feature showing up for the default CRDs that will get applied even when the feature isn't enabled.

This is probably OK, but it is a bit funky. Easiest workaround is wait until GA to add these description changes to the existing fields - especially the deprecation one.

Aside from that, I think this looks good and is ready for @JoelSpeed to take a pass. Joel, what are your thoughts on the description changes for pre-existing fields here?

// risks represents the range of issues associated with
// updating to the target release. The cluster-version
// operator will evaluate all entries, and only recommend the
// update if there is at least one entry and all entries
// recommend the update.
// Deprecated: the risks has been deprecated by riskNames.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something worth noting - this shows up on the non-featuregated version of the API.

I'm not sure there is much of a workaround for this unless you wait to add this deprecation text until you actually go to promote the feature-gate that deprecates this to GA.

I would probably recommend removing this for now.

Comment on lines +275 to +276
// update targets, or in the conditionUpdates set and all associated risks
// are specified in desiredUpdate.accept.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, this change also appears in the default CRD. You may want to include that this part of the description only applies when the particular feature gate is enabled or drop it from the description until you GA the feature.

@JoelSpeed
Copy link
Contributor

Joel, what are your thoughts on the description changes for pre-existing fields here?

Not yet looked at the specific case, but in theory, we could have a marker that applies additional description only when a certain feature gate is enabled 🤔 I'm not sure how that would work if you had multiple gates though, you'd have to do what we do with the enum and specify the combinations. I don't this is likely a big enough issue for us to worry about right now

Copy link
Contributor

@JoelSpeed JoelSpeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just trying to understand the changes here, can someone confirm if this is correct?

At the moment, each update risk is listed within the conditional update that it relates to. With the change here, we move all of the risks together to a higher level, and instead, add a list of riskNames to the conditional update.

Centralising all of the risk information but meaning that when looking at a conditional update, I now have to go and find the information somewhere else?

It doesn't look like we are actually adding any new information here apart from adding a spec list of accepted risk names?

// conditionalUpdateRisks contains the list of risks associated with conditionalUpdates.
// When performing a conditional update, all its associated risks will be compared with the set of accepted risks in the spec.desiredUpdate.accept field.
// If all risks for a conditional update are included in the spec.desiredUpdate.accept set, the conditional update will proceed, otherwise it is blocked.
// The risk's names in the list must be unique.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double plural reads oddly to me, WDYT?

Suggested change
// The risk's names in the list must be unique.
// The risk names in the list must be unique.

Comment on lines +210 to +211
// +patchMergeKey=name
// +patchStrategy=merge
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not valid for CRDs, please remove

// +listType=map
// +listMapKey=name
// +optional
ConditionalUpdateRisks []ConditionalUpdateRisk `json:"conditionalUpdateRisks,omitempty" patchStrategy:"merge" patchMergeKey:"name"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ConditionalUpdateRisks []ConditionalUpdateRisk `json:"conditionalUpdateRisks,omitempty" patchStrategy:"merge" patchMergeKey:"name"`
ConditionalUpdateRisks []ConditionalUpdateRisk `json:"conditionalUpdateRisks,omitempty"`

// +kubebuilder:validation:MaxItems=1000
// +listType=set
// +optional
Accept []string `json:"accept"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of interest, why not acceptedRisks to match with the history version of this field?

// risks represents the range of issues associated with
// updating to the target release. The cluster-version
// operator will evaluate all entries, and only recommend the
// update if there is at least one entry and all entries
// recommend the update.
// Deprecated: the risks has been deprecated by riskNames.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Deprecated: the risks has been deprecated by riskNames.
// Deprecated: the risks field has been deprecated in favour of riskNames.

Comment on lines +809 to +819
// riskNames represents the set of the names of conditionalUpdateRisks in the status that are exposed to the release in this conditional update.
// A condition update is accepted only if each of its risk is either not applied to the cluster or considered acceptable by the cluster administrator.
// Entries must be unique and must not exceed 256 characters.
// riskNames must not contain more than 500 entries.
// +openshift:enable:FeatureGate=ClusterUpdateAcceptedRisks
// +kubebuilder:validation:MinItems=1
// +kubebuilder:validation:items:MaxLength=256
// +kubebuilder:validation:MaxItems=500
// +listType=set
// +optional
RiskNames []string `json:"riskNames,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is an odd direction for a change. Why are we moving from a list of structured objects to a list of strings? Normally, I see this the other way around?

Comment on lines +809 to +819
// riskNames represents the set of the names of conditionalUpdateRisks in the status that are exposed to the release in this conditional update.
// A condition update is accepted only if each of its risk is either not applied to the cluster or considered acceptable by the cluster administrator.
// Entries must be unique and must not exceed 256 characters.
// riskNames must not contain more than 500 entries.
// +openshift:enable:FeatureGate=ClusterUpdateAcceptedRisks
// +kubebuilder:validation:MinItems=1
// +kubebuilder:validation:items:MaxLength=256
// +kubebuilder:validation:MaxItems=500
// +listType=set
// +optional
RiskNames []string `json:"riskNames,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean to set both risks and riskNames?

// conditions must not contain more than one entry.
// +openshift:enable:FeatureGate=ClusterUpdateAcceptedRisks
// +kubebuilder:validation:items:XValidation:rule="has(self.type) && self.type == 'Applies'",message="type must be 'Applies'"
// +kubebuilder:validation:MaxItems=1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MinItems too please

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants