-
Notifications
You must be signed in to change notification settings - Fork 551
MCO-1675: [API 2/6] Update API for Status Reporting needs #2383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
@naseerahkani: This pull request references MCO-1675 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Hello @naseerahkani! Some important instructions when contributing to openshift/api: |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: naseerahkani The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
// +kubebuilder:printcolumn:name="UpdatedOSImage",type="string",JSONPath=.status.conditions[?(@.type=="AppliedOSImage")].status,priority=1 | ||
// +kubebuilder:printcolumn:name="UpdatedFiles",type="string",JSONPath=.status.conditions[?(@.type=="AppliedFiles")].status,priority=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd imagine these and the ImagePulledFromRegistry
column would somehow need to be feature gated since they will not be conditions populated in a non-tech preview cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe there exists today a way for us to feature-gate print columns - although this is something we could add if we had enough of a use case for it.
What is the worst case here if the conditions don't exist here? Does it show up empty? Does it result in a failure to display the results from a command?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
although this is something we could add if we had enough of a use case for it.
Maybe adding this as a low priority backlog item would be good, but I don't think the use case for it here is important enough.
What is the worst case here if the conditions don't exist here? Does it show up empty? Does it result in a failure to display the results from a command?
It shows up empty, which is not the end of the world. I just don't think it's ideal if there's another approach, but it seems that there is not really another option here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These columns will not be printed for the time being to make sure users are not confused. They will be added back when the image mode status reporting is ready to be GAed. The status of this will be tracked in this ticket.
// Deprecated: AppliedFilesAndOS is being replaced by AppliedFiles and AppliedOSImage. | ||
// This constant is maintained for backward compatibility. | ||
MachineConfigNodeAppliedFilesAndOS StateProgress = "AppliedFilesAndOS" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure how we want to handle this, but I'm curious if this approach is valid. This is how I'm interpreting this, but I could be incorrect:
- This merging would state that in 4.20 the
MachineConfigNodeAppliedFilesAndOS
status is depreciated in general (so in both tech preview and non-tech preview clusters). - But, this isn't quite correct since a standard non-tech preview enabled cluster in 4.20 does have a supported
MachineConfigNodeAppliedFilesAndOS
condition. - Maybe a feature gate here can distinguish that the idea is to deprecate this starting with the implementation of
ImageModeStatusReporting
? Or maybe wait until the new statuses (MachineConfigNodeUpdateFiles
&MachineConfigNodeUpdateOS
) are implemented before it makes sense to start a deprecation ofMachineConfigNodeAppliedFilesAndOS
(maybe that's done in a subsequent release?)?
This is probably something that would need input from other MCO/API folks. I'm sure there's a set process for this, but I'm not aware of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Marking this type deprecated would only show up for those consuming this as a Go package as these comments won't show up in generated documentation that end-users of OpenShift would see.
That being said, I think it is generally a bit of an anti-pattern to deprecate something before there is a concrete replacement for it. In this case if it is only deprecated in TPNU clusters I would wait to mark this as deprecated until the feature that deprecates it goes GA.
Also note that deprecating something doesn't mean it stops functioning, just that users should be aware that this condition will be removed in a future release and should consider relying on other conditions instead. IIUC, this condition should continue to be populated the same as it always has.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type still needs to be deprecated and will be taken care of at another time. For future reference, this ticket will be used to keep track of changes/updates.
/retest-required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing an initial pass on the API changes themselves. I've got a couple comments that may change the tests a bit so once those are ironed out I'll take another pass and look at the tests as well.
// +kubebuilder:printcolumn:name="UpdatedOSImage",type="string",JSONPath=.status.conditions[?(@.type=="AppliedOSImage")].status,priority=1 | ||
// +kubebuilder:printcolumn:name="UpdatedFiles",type="string",JSONPath=.status.conditions[?(@.type=="AppliedFiles")].status,priority=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe there exists today a way for us to feature-gate print columns - although this is something we could add if we had enough of a use case for it.
What is the worst case here if the conditions don't exist here? Does it show up empty? Does it result in a failure to display the results from a command?
// +kubebuilder:validation:MaxLength:=253 | ||
// +required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
explicitly include the constraints in the GoDoc - users can't see the markers here in generated documentation so if we do not explicitly mention that the field is required and cannot have a length greater than 253 characters users won't know until they attempt to create a resource that fails.
Additionally, what is a fully-qualified pullspec? are there constraints we can enforce beyond the current ones that ensure malformed pullspecs are not being provided by a user? Having more concrete validation in place here helps have a faster feedback loop for end-users as a malformed pullspec would be rejected at admission time rather than resulting in an operator side status condition being set.
// Deprecated: AppliedFilesAndOS is being replaced by AppliedFiles and AppliedOSImage. | ||
// This constant is maintained for backward compatibility. | ||
MachineConfigNodeAppliedFilesAndOS StateProgress = "AppliedFilesAndOS" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Marking this type deprecated would only show up for those consuming this as a Go package as these comments won't show up in generated documentation that end-users of OpenShift would see.
That being said, I think it is generally a bit of an anti-pattern to deprecate something before there is a concrete replacement for it. In this case if it is only deprecated in TPNU clusters I would wait to mark this as deprecated until the feature that deprecates it goes GA.
Also note that deprecating something doesn't mean it stops functioning, just that users should be aware that this condition will be removed in a future release and should consider relying on other conditions instead. IIUC, this condition should continue to be populated the same as it always has.
// annotation on the target node, which is set by the Machine Config Pool controller | ||
// to signal the desired image pullspec for the node to update to. | ||
type MachineConfigNodeSpecConfigImage struct { | ||
// desiredImage is the fully-qualified pullspec of the image that the Machine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is a "fully-qualified pullspec"?
Is this an OCI-compliant image reference? If so, https://oras.land/docs/concepts/reference/#:~:text=What%20a%20Reference%20Means%20in%20OCI%20Layout%E2%80%8B,or%20the%20digest%20sha256:921f70dafac450afd63cc4210b2086cb4290ef7d51249eb79c4777e731b87746%20. might be a helpful resource for understanding the "components" of an OCI image reference.
If it is this, I've written some pretty comprehensive validations in CEL for this for OLM in the past that you may be able to use as a reference for more robust input validation here: https://github.com/operator-framework/operator-controller/blob/7d4414b1f9188d1af01034010cd921e6d62f657d/api/v1/clustercatalog_types.go#L290-L338
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, "fully qualified pullspec" does refer to an OCI-compliant image reference. The code has been updated to provide more context and better validation as seen here.
.../v1/zz_generated.crd-manifests/0000_80_machine-config_01_machineconfignodes-Default.crd.yaml
Outdated
Show resolved
Hide resolved
@naseerahkani: This pull request references MCO-1675 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
871c635
to
c82e51f
Compare
/retest-required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to someone that has not all the required context to properly review :)
.DS_Store
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file sneaked in
// configImage describes the current and desired image for this node. | ||
// +openshift:enable:FeatureGate=ImageModeStatusReporting | ||
// +optional | ||
ConfigImage MachineConfigNodeStatusConfigImage `json:"configImage"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do all of our optional fields have the omitempty
for the JSON Marshalling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since all the fields in it are optional and it is not a pointer, it is not necessary to have omitempty
(gave an error) since it is not required to have anything populated in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, good one, I missed that the field is not a pointer : )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you'll probably want a pointer and omitempty
here.
Without it, you'll end up with a status field that is serialized like:
status:
configImage:
currentImage: ""
desiredImage: ""
instead of it just not being present in the status when that field is not populated.
From my understanding, it sounded like either currentImage
or desiredImage
is expected to be set when the node is opted-in to the image-mode?
If that is the case, you could add a marker to enforce at least one property being set when specifying the field and that should get rid of the linter error I believe you are referring to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This validation rule should ensure at least one image name (desired or current) is defined when OCL is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the empty struct is not a valid value, this field also needs to be a pointer.
// desiredImage is the fully qualified image pull spec of the image that the Machine | ||
// Config Operator (MCO) intends to apply to the node. This field is optional. | ||
// The length of the field must be between 1 to 447 characters. | ||
// +kubebuilder:validation:MaxLength=447 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was the number 447 chosen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was told by the OCL experts to use these requirements for the image name.
// Config Operator (MCO) intends to apply to the node. This field is optional. | ||
// The length of the field must be between 1 to 447 characters. | ||
// +kubebuilder:validation:MaxLength=447 | ||
// +kubebuilder:validation:XValidation:rule="self.matches('^([a-zA-Z0-9-]+\\\\.)+[a-zA-Z0-9-]+(:[0-9]{2,5})?(/[a-zA-Z0-9-_]{1,61})*/[a-zA-Z0-9-_.]+:[a-zA-Z0-9._-]+$') || self.matches('^[^.]+\\\\.[^.]+\\\\.svc:\\\\d+\\\\/[^\\\\/]+\\\\/[^\\\\/]+:[^\\\\/]+$')",message="the OCI Image name should follow the host[:port][/namespace]/name format, resembling a valid URL without the scheme. Or it must be a valid .svc followed by a port, repository, image name, and tag." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did these regex come from?
Is it required that the image provided here only ever use a tag? What if I wanted to specify an image digest? You have tests that use a digest-based image (@sha256:...
) - is sha256
the only hash type allowed for the digest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The OCL experts suggested that the above mentioned requirements were met and in order to keep the image name types strings, the ImageDigestFormat type requirements, including the regex, were adapted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you must match those requirements exactly, re-using the ImageDigestFormat
type seems reasonable.
If you need more control over the accepted values beyond what that requires, i think it is reasonable to keep this as a string
type and expand upon the constraints.
Are you bound to those same rules? For example, your current validations allow you to set an image with a tag of any length (valid tags have a length limit) but not a digest. This would be invalid based on the requirements you linked to, which explicitly state it must use a sha256 digest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was told that the names fitting the criteria of the validation rule would be accepted and the comments have been updated to provide more context/information to users.
// The desired image represents the image the node will attempt to update to and gets set before the machine config operator validates | ||
// the new image against the current image. | ||
// +openshift:enable:FeatureGate=ImageModeStatusReporting | ||
// +optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is expected behavior if this value is not provided?
Lets explicitly mention in the GoDoc that this field is optional and what it means when it is omitted/empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comments have been updated to describe the expected behavior when the field is empty. Please provide any additional suggestions/changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The link shared is to comments on a totally different type. My original comment still stands
// configImage describes the current and desired image for this node. | ||
// +openshift:enable:FeatureGate=ImageModeStatusReporting | ||
// +optional | ||
ConfigImage MachineConfigNodeStatusConfigImage `json:"configImage"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you'll probably want a pointer and omitempty
here.
Without it, you'll end up with a status field that is serialized like:
status:
configImage:
currentImage: ""
desiredImage: ""
instead of it just not being present in the status when that field is not populated.
From my understanding, it sounded like either currentImage
or desiredImage
is expected to be set when the node is opted-in to the image-mode?
If that is the case, you could add a marker to enforce at least one property being set when specifying the field and that should get rid of the linter error I believe you are referring to.
// Config Operator (MCO) intends to apply to the node. This field is optional. | ||
// The length of the field must be between 1 to 447 characters. | ||
// +kubebuilder:validation:MaxLength=447 | ||
// +kubebuilder:validation:XValidation:rule="self.matches('^([a-zA-Z0-9-]+\\\\.)+[a-zA-Z0-9-]+(:[0-9]{2,5})?(/[a-zA-Z0-9-_]{1,61})*/[a-zA-Z0-9-_.]+:[a-zA-Z0-9._-]+$') || self.matches('^[^.]+\\\\.[^.]+\\\\.svc:\\\\d+\\\\/[^\\\\/]+\\\\/[^\\\\/]+:[^\\\\/]+$')",message="the OCI Image name should follow the host[:port][/namespace]/name format, resembling a valid URL without the scheme. Or it must be a valid .svc followed by a port, repository, image name, and tag." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you must match those requirements exactly, re-using the ImageDigestFormat
type seems reasonable.
If you need more control over the accepted values beyond what that requires, i think it is reasonable to keep this as a string
type and expand upon the constraints.
Are you bound to those same rules? For example, your current validations allow you to set an image with a tag of any length (valid tags have a length limit) but not a digest. This would be invalid based on the requirements you linked to, which explicitly state it must use a sha256 digest.
// desiredImage is the fully qualified image pull spec of the image that the Machine | ||
// Config Operator (MCO) intends to apply to the node. This field is optional. | ||
// The length of the field must be between 1 to 447 characters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposing an alternative description:
// desiredImage is the fully qualified image pull spec of the image that the Machine | |
// Config Operator (MCO) intends to apply to the node. This field is optional. | |
// The length of the field must be between 1 to 447 characters. | |
// desiredImage is an optional field used to configure the image that should be applied to the node. | |
// It must be an OCI-compliant image pull spec, meaning: | |
// {specify here what an OCI-compliant image pull spec is} | |
// It must not exceed 447 characters in length. |
Does this actually need to have a minimum length of 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the idea was since the field was optional, there had to be some input to qualify as a valid image name. The comment has been updated with the suggestion as seen here.
// The length of the field must be between 1 to 447 characters. | ||
// +kubebuilder:validation:MaxLength=447 | ||
// +kubebuilder:validation:XValidation:rule="self.matches('^([a-zA-Z0-9-]+\\\\.)+[a-zA-Z0-9-]+(:[0-9]{2,5})?(/[a-zA-Z0-9-_]{1,61})*/[a-zA-Z0-9-_.]+:[a-zA-Z0-9._-]+$') || self.matches('^[^.]+\\\\.[^.]+\\\\.svc:\\\\d+\\\\/[^\\\\/]+\\\\/[^\\\\/]+:[^\\\\/]+$')",message="the OCI Image name should follow the host[:port][/namespace]/name format, resembling a valid URL without the scheme. Or it must be a valid .svc followed by a port, repository, image name, and tag." | ||
// +optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this field be required? Making this field optional means the parent field that is already optional has no required members.
That leads to being able to configure the field to do the same thing by:
- omitting the parent field
- setting the parent field to
{}
- setting it like:
configImage:
desiredImage: ""
We generally try to only have a single way to "say" something for users so there isn't confusion on which approach should be used.
We generally try to avoid allowing fields to be set to {}
.
To narrow it down further as to which approach is preferred, what does it mean for the parent field configImage
to be omitted? What does it mean to set configImage.desiredImage
to ""
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This validation rule should take care of this issue. When the desiredImage is empty, the idea is that OCL is not enabled. If it is enabled, either the currentImage or the desiredImage needs to be set. I can get an OCL expert to confirm if this would be the optimal approach or if there should be a default message printed as a means to provide clarity to users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The linked validation doesn't apply to the question I'm asking here. This additional context I think helps me understand the goal here.
So, I think configImage
needs to be a pointer as the zero value would not be valid. Because it is a pointer you can do a nil
check to detect whether or not configImage
has been explicitly configured or not.
Mark desiredImage
as required. The existing CEL validation you've written does reject the empty string, but lets explicitly exclude the empty string by adding a +kubebuilder:validation:MinLength:=1
marker.
This approach ensures the following behavior is the only valid behavior:
- If a user does not want the OCL functionality enabled, they just do not set the
configImage
setting. - If a user wants to enable the OCL functionality, they set the
configImage
setting, which means they must specify a validdesiredImage
.
Now I'm moving to a separate thing - trying to understand what is and is not a valid value to supply here.
Assuming only the previously linked requirements is what is required, that means that values must:
- start with a valid domain (
registry.example.io
orname.namespace.svc
) - optionally provide a port (
:8080
) - optionally provide a "namespace" (
/openshift
) - provide a "name" (
/machine-config-operator
) - provide a SHA256 digest (
@sha256:asjdyhfrea7YSDajsdfhals
)
Can we write multiple CEL expressions that ensures we are appropriately evaluating these requirements in a way that we can return helpful messages to end users?
For example, when I provide something like registry.example.io:8080/openshift/machine-config-operator:latest
, getting the message the OCI Image name should follow the host[:port][/namespace]/name format, resembling a valid URL without the scheme. Or it must be a valid .svc followed by a port, repository, image name, and tag.
doesn't tell me what I've done wrong (and I would actually be quite confused because the value I provided actually is valid based on this message). Instead I would expect it to fail with a message like the provided OCI image reference must use a SHA256 digest
.
// +kubebuilder:validation:XValidation:rule="!has(self.currentImage) ? has(self.desiredImage) : true",message="desiredImage must be defined if currentImage is not defined" | ||
// +kubebuilder:validation:XValidation:rule="!has(self.desiredImage) ? has(self.currentImage) : true",message="currentImage must be defined if desiredImage is not defined" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this might technically achieve what you are going for, what happens if you decide you need to add an additional field here that can satisfy the "at least one of the fields is set?" approach? The size and complexity of your CEL expressions begin to get substantially larger.
Instead, if this type will always require that at least one of the fields are set, using +kubebuilder:validation:MinProperties:=1
is probably better suited for this.
@naseerahkani: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Updating the MachineConfigNode (MCN) object to support on cluster layering (OCL). Advice/suggestions for the comments for the updated code would be greatly appreciated.
Here are details on the enhancement.