-
Notifications
You must be signed in to change notification settings - Fork 64
🐛 fix(crd-upgrade-safety): Safely handle changes to description fields #2023
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 fix(crd-upgrade-safety): Safely handle changes to description fields #2023
Conversation
✅ Deploy Preview for olmv1 ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2023 +/- ##
===========================================
+ Coverage 43.06% 69.31% +26.25%
===========================================
Files 64 79 +15
Lines 5418 7059 +1641
===========================================
+ Hits 2333 4893 +2560
+ Misses 2741 1884 -857
+ Partials 344 282 -62
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Saw this PR show up in passing and I am working on a validation very similar to this for kubernetes-sigs/crdify and thought it would be worth taking a look.
@@ -242,3 +242,13 @@ func Type(diff FieldDiff) (bool, error) { | |||
|
|||
return isHandled(diff, reset), err | |||
} | |||
|
|||
// A change in a description is always considered safe and non-breaking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note that this isn't always true.
While it is true that a basic description change is usually non-breaking, a change to the semantics of a field is a breaking change.
Unfortunately a semantics change isn't something that is easily detectable in a case like this, so this is probably something that OLM wants to allow for the most part (but maybe should warn about somehow?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This.
I came to take a peek and was immediately wary of a blanket statement that a description-only change could never be breaking.
Our position in the past has been "where there is no clear/easy evaluation, fail the safety check".
I think we could argue that this is too strict and we should -- for a few nebulous items -- adopt the position of "notify and proceed".
I could see this growing to participate in an eventual approval flow, but for now is there a third condition (not just pass/fail) where we could add a notification/caution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There has been some discussion recently about introducing some sort of revision API (not unlike our semi-internal Helm release secrets now), where there is a fairly obvious opportunity to compute a diff and potentially introduce an approval mechanism where this kind of change would show up in the diff.
If we're getting super-fancy, I could imagine some sort of opinionated diff UI that categorizes changes and could, for example, group the CRD field description diffs into a spot that would be easier to assess them for semantic changes.
My opinion for the here and now is:
- Users will almost definitely considered our current behavior (block upgrades for description changes) as a bug.
- Therefore, we should merge this change.
In my opinion, OLM should not be trying -- at runtime -- to have opinions about semantic changes. That is where I think we should draw a clear line and say that we expect/assume that upgrade graphs and version numbers from operator authors to be correct (and if they are not, that is a clear bug against the operator, not OLM).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 agree with :
Users will almost definitely considered our current behavior (block upgrades for description changes) as a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, OLM does technically have a workaround that could act as an approval method by disabling the checks to proceed with an upgrade. That being said, it certainly isn't an ideal workflow.
I tend to agree that OLM should not be too strict here - at the end of the day these checks are best effort. Hopefully as the ecosystem of tooling for extension authors progresses there will be more robust pipelines put in place on the author side to catch these potential regressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fyi this is the description change:
argocd-operator v0.5.0:
ApplicationController is a simple, high-level summary of where the Argo CD application controller component is in its lifecycle. There are five possible ApplicationController values:
- Pending: The Argo CD application controller component has been accepted by the Kubernetes system, but one or more of the required resources have not been created.
- Running: All of the required Pods for the Argo CD application controller component are in a Ready state.
- Failed: At least one of the Argo CD application controller component Pods had a failure.
- Unknown: For some reason the state of the Argo CD application controller component could not be obtained.
ApplicationController is a simple, high-level summary of where the Argo CD application controller component is in its lifecycle. There are four possible ApplicationController values:
- Pending: The Argo CD application controller component has been accepted by the Kubernetes system, but one or more of the required resources have not been created.
- Running: All of the required Pods for the Argo CD application controller component are in a Ready state.
- Failed: At least one of the Argo CD application controller component Pods had a failure.
- Unknown: The state of the Argo CD application controller component could not be obtained.
(note the bold portions)
ie in this case itself, there was a correction since there is only 4 possible values before and after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question in my mind is less whether a description change can or not be breaking, but rather, how much responsibility should OLM take over gating author intention...I don't have an answer here. But it's an interesting question. Should we be controlling the release process of the APIs of others? If an upgrade of a package destroys the cluster, is it OLMs fault? The more we gate, the more we take that responsibility, in a way...maybe..? Should this be a catalog concern? Or maybe a policy/approval concern outside of OLM (i.e. some kind of approver type)? sorry - just thinking out loud...
handled: false, | ||
}, | ||
{ | ||
name: "different field changed with description, no error, not handled", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if both ID and Description change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handled
will be false
. Added a test case for it 👍🏽
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @anik120
Great work 🥇. I think we just need to address the comment: https://github.com/operator-framework/operator-controller/pull/2023/files#r2140938754
And also another nit.
// A change in a description is always considered safe and non-breaking.
func Description(diff FieldDiff) (bool, error) {
See that the func name is Description
So, the comment (which is how we doc golang) should have // Description .... documentation. Could we please change that and ensure that we have a better explanation showing why and when it is required.
Then, IHMO it will be all great to 🪰
thank you for your contribution
@@ -242,3 +242,13 @@ func Type(diff FieldDiff) (bool, error) { | |||
|
|||
return isHandled(diff, reset), err | |||
} | |||
|
|||
// A change in a description is always considered safe and non-breaking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// A change in a description is always considered safe and non-breaking. | |
// Description differences should be allowed. Otherwise, we will block upgrades due to changes in the description, which will be considered as a bug from the end user's perspective. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anik120 ^ I would suggest something
But we beed start the comment with Description
We are not using the lint check that catch it, but that is a better practice to doc funcs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point @camilamacedo86. I've changed the sentence around a little bit, PTAL, thank you!
f51b02f
to
c1f7279
Compare
Motivation: When attempting to upgrade argocd-operator from v0.5.0 to v0.7.0, the upgrade process fails during the preflight CRD safety validation. The validation correctly detects that the `argocds.argoproj.io` CRD has been modified between the two versions. The specific error reported is: ``` CustomResourceDefinition argocds.argoproj.io failed upgrade safety validation. "ChangeValidator" validation failed: version "v1alpha1", field "^.status.applicationController" has unknown change, refusing to determine that change is safe ``` However, changes between the CRD versions in this instance are limited to non-functional updates in the description fields of various properties (e.g., status.applicationController).`ChangeValidator` lacks a specific rule to classify a description-only update as safe, which blocks legitimate and otherwise safe operator upgrades. Solution: This PR enhances the CRD upgrade safety validation logic to correctly handle changes to description fields by introducing a new `ChangeValidation` check for `Description`, and registering the check by adding it to the default list of `ChangeValidations` used by `ChangeValidator`. Result: Non-functional updates to documentation fields are now deemed safe(which resolves the upgrade failure for argocd-operator from v0.5.0 to v0.7.0)
c1f7279
to
eebe373
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the contribution 🎉
It seems fine for me so I am /approve this one
Please, we should also get a second approver/reviewer ok as well
@perdasilva @tmshort @grokspawn WDYT?
@bentito @tylerslaton @rashmigottipati WDYT too?
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: camilamacedo86 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm cancel Should not add LGTM, only approve so that we give a chance for a second reviewer |
/lgtm |
6bf1742
into
operator-framework:main
Description
Motivation:
When attempting to upgrade argocd-operator from v0.5.0 to v0.7.0, the upgrade process fails during the preflight CRD safety validation. The validation correctly detects that the
argocds.argoproj.io
CRD has been modified between the two versions.The specific error reported is:
However, changes between the CRD versions in this instance are limited to non-functional updates in the description fields of various properties (e.g., status.applicationController).
ChangeValidator
lacks a specific rule to classify a description-only update as safe, which blocks legitimate and otherwise safe operator upgrades.Solution:
This PR enhances the CRD upgrade safety validation logic to correctly handle changes to description fields by introducing a new
ChangeValidation
check forDescription
, and registering the check by adding it to the default list ofChangeValidation
s used byChangeValidator
.Result:
Non-functional updates to documentation fields are now deemed safe(which resolves the upgrade failure for argocd-operator from v0.5.0 to v0.7.0)
Reviewer Checklist