-
Notifications
You must be signed in to change notification settings - Fork 621
Revise GEP-1713: Update gateway status and conformance test details for ListenerSets #4205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@davidjumani thanks, this is a great write of conformance. Do you think it makes sense to also add the expected conditions on each conformance item, and also the ancestorStatus expected, to make it clear to implementations what we will look for? |
Added the expected status as well as added the |
| - The ListenerSet has the following status : | ||
| | Type | Status | Reason | | ||
| | -------- | ------- | ------- | | ||
| | Accepted | True | ListenersNotValid | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this accepted here be false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the GEP
// This can be the reason when "Accepted" is "True" or "False", depending on whether
// the listener being invalid causes the entire Gateway to not be accepted.
I believe this should be true since the ListenerSet is configured but does not adversely affect the parent Gateway.
Additionally, there can be a scenario where only a subset of the listeners is invalid and this would more accurately reflect it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but then the reason should be empty or something else, right? It was accepted but not programmed, I guess
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. If the status is Accepted, there should not be a reason for that.
I wouldn't also mix Accepted: True and Programmed: False in this case. The Programmed condition indicates whether the controller was physically able to apply the requested resources (e.g. allocate an IP or open a port). Listener conflicts are configuration errors, not the controller ability to program them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. A subset of listeners can be accepted, so the status should reflect it accordingly.
If not all listeners are accepted, then the listenerSet Accepted condition should be false.
Additionally, the Programmed condition should also reflect the same, since a subset of listeners was applied
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's some confusion here, let me know if the below makes sense
- If I have one listener conflicted:
- This listener should be accepted (means a controller SAW it), no Reason set
- This listener should have a status Conflicted=True, Reason explaining if it is a Protocol conflict, etc (and a good message)
- Programmed will be False, because the proxy won't receive this configuration
As the status conditions is per listener, I think this would make sense, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(specifically on this case, we can think about extending the valid Reasons to something like ConflictWithParent to let the user know there's already a conflicting listener on the Gateway, without leaking any kind of information)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the language in this part is a little confusing:
// This can be the reason when "Accepted" is "True" or "False", depending on whether
// the listener being invalid causes the entire Gateway to not be accepted.
I think it should be something like:
// This can be used as a reason when "Accepted" is "True" or "False", because Accepted
// requires that at least one Listener present on the object is valid. If zero Listeners in the
// ListenerSet are valid, then Accepted is False, and this reason is used to explain why.
// If at least one Listener in the ListenerSet is valid, then Accepted is True, and this reason
// is used to indicate that the ListenerSet is only _partially_ valid.
Acceptance is about being locally correct, where Programmed is about full correctness. Conflicted is then used to indicate that at least one Listener in the ListenerSet is in conflict with a Listener on another object (no matter which one it is).
I don't think that saying that the conflicted Listener is on the Gateway gives too much information away, since the ListenerSet already includes a reference to that parent anyway. That said, adding a ConflictWithParent reason sounds good to me. It's okay to have more options for Reasons, they don't have the overhead that adding extra whole Conditions does.
| - The parent gateway has the following status : | ||
| | Type | Status | Reason | | ||
| | -------- | ------- | ------- | | ||
| | AttachedListenerSets | False | ListenerSetsNotAllowed | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need this condition on the parent gateway? I am wondering if in Gateway case we can just not have any count of attached listener, as mostly the user will have its own condition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the condition should be set since :
- The user has explicitly disallowed ListenerSets
- ListenerSets have been attached (at least tried) to the Gateway
The user should know about this in the status
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not aware of the whole conditions reasoning here tbh, will defer to @youngnick for some better view on it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still not fully convinced we should have the condition if we would have a counter as well, but I also do not have a very strong opinion on it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, I think it's useful for the reason that @davidjumani said - the Gateway owner has said "no ListenerSets allowed", and some ListenerSets have attempted to attach. Regardless of exactly how that situation has come up, the Gateway owner needs to know. attachedListeners doesn't tell us that, because it's a count of ListenerSets that have successfully attached.
| - The newer ListenerSet has the following status : | ||
| | Type | Status | Reason | | ||
| | -------- | ------- | ------- | | ||
| | Accepted | True | ListenersNotValid | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, should we have an accepted here? And a reason? Not sure if this doesn't get more confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a better indication that the listenerSet is correct (the parentref is valid, etc.) but one or more of the listeners is invalid which is indicated by the ListnerStatus above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but one or more of the listeners is invalid which is indicated by the ListnerStatus above
If all listeners in the ListenerSet have conflicts, shouldn't we report the ListenerSet as not accepted? This is consistent with other resources (e.g. if an HTTPRoute has no hostname intersection with a Gateway listener, it's reported a not accepted).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree and will update it accordingly. I'll make this clearer with the total number of conflicting listeners
|
@davidjumani overall lgtm! I would like us to keep separated a PR that changes the GEP from a PR that changes the API, this way if we need to track any (or revert) we have proper changes Thanks again for the effort here! |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: davidjumani The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Thanks for the review @rikatz I've answered the open questions and reverted the API changes. |
rostislavbobo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @davidjumani for a great conformance writeup! Left a couple of comment, let's discuss them today.
| They will validate the following scenarios : | ||
| 1. AllowedListeners is not specified on the parent Gateway | ||
| - `Gateway.spec.allowedListeners` is not specified (defaults to None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am missing here and in the scenarios below whether the Gateway itself has listeners defined. This could change the statuses below and the outcome, for example, if Gateway listeners and ListenerSet listeners share the same port.
Can we be explicit and state that the Gateways have no listeners defined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, the Gateway validation requires at least one listener to be defined. I based the tests on this restriction
gateway-api/apis/v1/gateway_types.go
Line 236 in f24f3a6
| // +kubebuilder:validation:MinItems=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think we need to change this validation, and there was some discussion about it somewhere I can't remember.
With ListenerSet promoted to standard it would not make much sense to have this MinItems. I will leave this to @robscott @shaneutt and @youngnick for some ideas as they may have faced this on past, but IMO:
- We may want to remove this restriction for ListenerSet
- We need to understand if there is any impact on the implementations. If they previously expected that the API would always have at least 1 Listener, removing this restriction may break implementations
I would probably suggest that we change this MinItems to a CEL validation like:
listeners.size() > 0 || allowedListeners != null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(IIRC we can set a validation for experimental APIs only, and other for standard. We may need to change the validation on standard for listeners.size() > 0 instead of using minItems, and then the experimental to have the more complex rule)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, without at least one Listener from somewhere, we end up violating a lot of the assumptions we've made, which will require a lot of checking assumptions in conformance tests etc.
As of now, we define that, if there are no valid Listeners on a Gateway, then the Gateway is not Accepted and cannot be programmed. We have had requests to change this (basically because that would let people pre-provision external loadbalancers in the absence of any other config), but that's the current state.
I think that this probably requires more discussion than we should do on this PR though. I also think we should move ahead with requiring one listener for now, and mark "do we allow 0 Listeners on a Gateway" and "will that let any required Addresses be Programmed?" as a question we must answer before graduation.
|
|
||
| - https://docs.google.com/document/d/1qj7Xog2t2fWRuzOeTsWkabUaVeOF7_2t_7appe8EXwA/edit | ||
|
|
||
| ## Conformance Details |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we clarify that the status is reported per parentRef? A ListenerSet can have different acceptance statuses depending on the Gateway configs it is attached to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A listenerSet can have only one parent as per
| ParentRef ParentGatewayReference `json:"parentRef"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, I can't recall if there was any substantial discussion around this previously, but it could be possible to add a pluralized version of this field in some future revision if use cases demanded it. I'd kinda rather get the status nesting per-parentRef now (even if it feels redundant/unnecessary, it would mirror similar patterns on many other GWAPI resources) rather than deal with trying to add nesting to an assumed-singular hierarchy later...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, we are speaking about ".status" here right?
If so, I agree we may want to "mirror" the behavior of other objects:
- For Listener status, we may want to have the same status as Gateway.status.Listener
- For parentRef, we may want to drop the "Conditions" and add a
.status.parents[]even if we just support a single parent now
That said, maybe we do want to replace ParentRef for ParentRefs but keeping the min and maxitems as 1?
I don't know the impact on controllers performance, tho, based on these changes, so feedbacks are welcome
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually came here to say I though better, and trying to keep the same behavior as with Routes (and parentRefs) here may be a bad idea.
If we have now an array of parentRefs on spec, this means that now we must have an array of ParentStatuses (as on HTTPRoute).
But on HTTPRoute, we can simply say that the route wasn't programmed because a specific field. While on ListenerSet, we have another array we care, the Listeners.
So in the end, imagine this situation:
- Gateway1 hostnames: [domain1.tld, domain2.tld]
- Gateway2 hostnames: [domain2.tld, domain3.tld]
- ListenerSet: [domain1.tld, domain3.tld, domain4.tld]
Now you have a situation where the Listener status should say it is conflicted on Gateway1 for domain1.tld, on Gateway2 for domain3.tld, and no conflict for domain4.tld
We are doing a huge fanout of status replications here, and this may be very very costy for API to keep reading and writing, doing all of the intersection validations, and also maybe doing a very poluted status.
Instead, I would propose:
- ListenerSet will always have a SINGLE parentRef (that MUST be Gateway for now)
- This means that the current status of ListenetSet is kept: one array of Listeners containing the Listenerstatus, and an array of Conditions for the Programmed, Conflicted, Accepted conditions
- In case a user wants to do some "migration" from Gateways, the user can change the parentRef to a different Gateway
- In case a user wants to receive traffic from both Gateways (eg.: some B/G migration), the user must set 2 parentRefs on the xRoute: one for each ListenerSet (and should define 1 listenerset per gateway)
While this may cause the experience to be a bit more "annoying" for migration between Gateways, I have the feeling they will first allow users to create a separate ListenerSet and check it is properly working, setting the traffic to flow through 2 different Listeners (even one of them being not Accepted), and reduce a lot the complexity for implementations and status parsing for users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed on that last proposal. ListenerSet is already complicated enough, we should avoid making this a lot worse with multiple parentRefs.
I think that, for migration between Gateways, we should recommend creating a duplicate ListenerSet. ListenerSets don't have any effect outside of the Kubernetes API, so this should be okay.
| - The ListenerSet has the following status : | ||
| | Type | Status | Reason | | ||
| | -------- | ------- | ------- | | ||
| | Accepted | True | ListenersNotValid | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. If the status is Accepted, there should not be a reason for that.
I wouldn't also mix Accepted: True and Programmed: False in this case. The Programmed condition indicates whether the controller was physically able to apply the requested resources (e.g. allocate an IP or open a port). Listener conflicts are configuration errors, not the controller ability to program them.
| - The newer ListenerSet has the following status : | ||
| | Type | Status | Reason | | ||
| | -------- | ------- | ------- | | ||
| | Accepted | True | ListenersNotValid | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but one or more of the listeners is invalid which is indicated by the ListnerStatus above
If all listeners in the ListenerSet have conflicts, shouldn't we report the ListenerSet as not accepted? This is consistent with other resources (e.g. if an HTTPRoute has no hostname intersection with a Gateway listener, it's reported a not accepted).
| | AttachedListenerSets | // depends on whether there are any other valid listeners in the attached ListenerSets // | ListenerSetsAttached | | ||
| | AttachedListeners | // total number of child listenerSets // | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's be clear on numbers and Gateway setup (also see comment). We spend a decent amount of time pondering various options that result in different calculations and statuses, so it'd be grate if we could disambiguate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer that we iron this out in the conformance tests. These serve more as guidelines, as we can always add more tests and update this later. I would like to get something in that we can work towards rather than making a comprehensive test plan that only stays in review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that approach is okay, as long as we all remember we may need multiple tests to cover the various permutations here.
| | -------- | ------- | ------- | | ||
| | Accepted | False | NotAllowed | | ||
| - The parent gateway has the following status : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a headsup, I got a bit confused here to understand that AttachedListenerSets is a condition and AttachedListeners is a .status field, maybe worth just clarifying it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think these might need to be separate tables or something, or maybe something like:
| Type | Status | Reason |
| -------- | ------- | ------- |
| conditions.AttachedListenerSets | Unknown | // any reason // |
| AttachedListeners | 0 | |
| | AttachedListenerSets | Unknown | // any reason // | | ||
| | AttachedListeners | 0 | | | ||
| - The request on the ListenerSet port fails. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the port? Or on the tuple? If you have a Listener defined on the same port, using a different host (eg.: some HTTP listener) this shouldn't fail.
I think you can make a call out for now that all the tests here are executed with at least one listener defined on gateway (because we enforce this...) and then have a separate set of tests for when we remove the need of Listeners on Gateway, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably be "the request to that distinct Listener fails" (that's what the word "distinct" is useful for when talking about Listeners.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yes, we will definitely need a bunch more tests if and when required Listeners is 0 on a Gateway, both when ListenerSets are attached and when they are not.
| 1. A listener on a ListenerSet has a protocol conflict with a listener on another ListenerSet | ||
| - The conflicting listener on the older ListenerSet is accepted based on the [Listener Precedence](#listener-precedence) with the following status : | ||
| | Type | Status | Reason | | ||
| | -------- | ------- | ------- | | ||
| | Conflicted | False | NoConflicts | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this means that the winning ListenerSet owner will have way to know that there is another ListenerSet that's in conflict with this one.
It feels a little better to me to have this ListenerSet have a Conflicted: True, with a reason of something like ConflictWinner (or some other better name), to indicate that there was a conflict, but it won. Then the losing ListnenerSet would get "ConflictLoser" or something (wow, that's a terrible Reason, but you get what I mean).
That way "Conflicted" means "there is a conflict with something", and there is a Reason to indicate what the result of the conflict was.
I suppose, though, that Accepted is probably also carrying the same information, so maybe those Reasons are redundant. Interested in other's thoughts here.
|
This is really great @davidjumani, I think that we should try and do this exercise more often for conformance tests for other, this will make it easier to ensure we don't miss things in testing. |
What type of PR is this?
Add one of the following kinds:
/kind documentation
/kind gep
Optionally add one or more of the following kinds if applicable:
/area conformance-test
What this PR does / why we need it:
This PR does the following :
AttachedListenersthat is the count of successful ListenerSet attachments to the gateway. Ref: slack theraadThe aim is to get consensus for the set of conformance tests required to validate this feature against implementations, and move towards promoting this to Standard
Does this PR introduce a user-facing change?: