KEP-3243: Update the design to mutate the label selector based on matchLabelKeys at api-server instead of the scheduler handling it #5033
Conversation
/cc @sanposhiho
/cc @alculquicondor @wojtek-t (or @alculquicondor) do we need PRR review in this case? (I suppose Yes?)
/retitle KEP-3243: Update the design to mutate the label selector based on matchLabelKeys at api-server instead of the scheduler handling it
cc @dom4ha |
Update the content based on the conclusion kubernetes/kubernetes#129480 (comment)
You need to update other sections such as Test Plan and other PRR questions.
Also, please add the current implementation to the Alternatives section and describe why we decided to move to a new approach.
I appreciate your comment.
/assign @wojtek-t for the PRR review part. Please assign another person if needed.
Queued - although I will wait for SIG approval to happen first.
@sanposhiho I have a question.
Are my understandings correct? For example, if the following Pod manifest is applied:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sample
  labels:
    app: sample
...
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector: {}
    matchLabelKeys:
    - app
```

kube-apiserver will usually merge key-value labels corresponding to `matchLabelKeys` into `labelSelector`:

```diff
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchExpressions:
+     - key: app
+       operator: In
+       values:
+       - sample
    matchLabelKeys:
    - app
```

It is the same if the user sets key-value labels corresponding to `matchLabelKeys` in `labelSelector` themselves:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sample
  labels:
    app: sample
...
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchExpressions:
      - key: app
        operator: In
        values:
        - sample
    matchLabelKeys:
    - app
```

However, if key-value labels corresponding to `matchLabelKeys` differ from what is already in `labelSelector`:

```yaml
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchExpressions:
      - key: foo
        operator: In
        values:
        - bar
    matchLabelKeys:
    - app
```
... and you know what, I found I forgot about the update path issue that I raised myself while writing this comment. 😅 So, answering your questions:
Usually, no. Like my struck-through comment above argued, we can just assume kube-apiserver handles all `matchLabelKeys` that users added, and the scheduler only has to worry about the cluster-wide default constraints.
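For illustration only, here is a rough sketch of the merge behavior being discussed, using the `k8s.io/api` and `k8s.io/apimachinery` types; the helper name `mergeMatchLabelKeys` is hypothetical and this is not the actual kube-apiserver code:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// mergeMatchLabelKeys (hypothetical name) shows the idea: for each key in
// matchLabelKeys, look up that key's value on the incoming pod and AND it
// into the constraint's labelSelector as an `In` requirement.
func mergeMatchLabelKeys(pod *corev1.Pod) {
	for i := range pod.Spec.TopologySpreadConstraints {
		c := &pod.Spec.TopologySpreadConstraints[i]
		for _, key := range c.MatchLabelKeys {
			value, ok := pod.Labels[key]
			if !ok {
				continue // keys missing from the pod's labels are skipped
			}
			if c.LabelSelector == nil {
				c.LabelSelector = &metav1.LabelSelector{}
			}
			c.LabelSelector.MatchExpressions = append(c.LabelSelector.MatchExpressions,
				metav1.LabelSelectorRequirement{
					Key:      key,
					Operator: metav1.LabelSelectorOpIn,
					Values:   []string{value},
				})
		}
	}
}

func main() {
	// Pod equivalent to the first manifest in the question above.
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name:   "sample",
			Labels: map[string]string{"app": "sample"},
		},
		Spec: corev1.PodSpec{
			TopologySpreadConstraints: []corev1.TopologySpreadConstraint{{
				MaxSkew:           1,
				TopologyKey:       "kubernetes.io/hostname",
				WhenUnsatisfiable: corev1.DoNotSchedule,
				MatchLabelKeys:    []string{"app"},
			}},
		},
	}
	mergeMatchLabelKeys(pod)
	fmt.Printf("%+v\n", pod.Spec.TopologySpreadConstraints[0].LabelSelector)
}
```

Running this prints a selector containing an `app In [sample]` requirement, i.e. the same merged result shown in the diff in the question above.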
@sanposhiho
@sanposhiho
/lgtm
/cc @dom4ha @macsko
/assign @alculquicondor
> 2. Even if it happens, the topology spread on the pod is just ignored (since `labelSelector` no longer matches the pod itself)
> and the pod could still be schedulable.
Is it true? Even if the `labelSelector` no longer matches the pod itself, the topology spreading is not ignored and could still make the pod unschedulable (ofc `selfMatchNum` will be 0). If that's what you mean, I'd rephrase it.
Wouldn't simply the label change become ignored? Does the pod itself need to be covered by the selector? I think the bigger problem is that, since we'd have both implementations (in api-server and scheduler), won't the scheduler add a new label to the `labelSelector` and make the pod completely unschedulable?
> Is it true? Even if the labelSelector no longer matches the pod itself, the topology spreading is not ignored and could still make the pod unschedulable (ofc selfMatchNum will be 0). If that's what you mean, I'd rephrase it.

Yeah, sorry, it's my suggestion and you're right: "ignored" was misleading and we should rephrase. `matchNum` - `minMatchNum` could still be bigger than `maxSkew`, and node(s) in some domains could be unschedulable, depending on the current number of matching pods there.

> won't the scheduler add a new label to the labelSelector and make the pod completely unschedulable?

The scheduler adds it, but the pod won't be completely unschedulable. Rather, the label addition in this case makes the pod easier to get scheduled because `selfMatchNum` would be 0.
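For concreteness, a toy illustration (not the scheduler's actual code; the per-domain counts are made up) of how `selfMatchNum` being 0 interacts with `maxSkew`:

```go
package main

import "fmt"

func main() {
	maxSkew := 1
	selfMatchNum := 0 // the incoming pod no longer matches the merged selector

	// Hypothetical counts of already-matching pods per topology domain.
	matchNumByDomain := map[string]int{"node-a": 3, "node-b": 1}

	// minMatchNum is the smallest matching count across all domains.
	minMatchNum := -1
	for _, n := range matchNumByDomain {
		if minMatchNum < 0 || n < minMatchNum {
			minMatchNum = n
		}
	}

	// A domain is feasible only if placing the pod there keeps the skew within maxSkew.
	for domain, matchNum := range matchNumByDomain {
		skew := matchNum + selfMatchNum - minMatchNum
		fmt.Printf("%s: skew=%d, feasible=%v\n", domain, skew, skew <= maxSkew)
	}
}
```

Here `node-b` stays feasible, while `node-a`, which already holds too many matching pods, exceeds `maxSkew` even though `selfMatchNum` is 0, which is exactly the situation described above.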
Thank you for your comments.
I've tried to rephrase this sentence.
> #### the update to labels specified at `matchLabelKeys` isn't supported
>
> `MatchLabelKeys` is handled and merged into `LabelSelector` at _a pod's creation_.
> It means this feature doesn't support the label's update even though users, of course,
```suggestion
It means this feature doesn't support the label's update even though a user
could update the label that is specified at `matchLabelKeys` after a pod's creation.
```
> `TopologySpreadConstraint` which represent a set of label keys only. The scheduler
> will use those keys to look up label values from the incoming pod; and those key-value
> labels are ANDed with `LabelSelector` to identify the group of existing pods over
> `TopologySpreadConstraint` which represent a set of label keys only.
```suggestion
`TopologySpreadConstraint` which represents a set of label keys only.
```
> could update the label that is specified at `matchLabelKeys` after a pod's creation.
> So, in such cases, the update of the label isn't reflected onto the merged `LabelSelector`,
> even though users might expect it to be.
> On the documentation, we'll declare it's not recommended using `matchLabelKeys` with labels that is frequently updated.
I'd rather say that updating the labels used is not recommended at all (instead of only when they are frequently updated), as it would break the original placement of such pods.
> to `labelSelector` of `topologySpreadConstraints` to filter and group pods.
> The pods belonging to the same group will be part of the spreading in
> `PodTopologySpread`.
> In addition, kube-scheduler handles `matchLabelKeys` if the cluster-level default constraints is configured with `matchLabelKeys`.
Isn't it only a proposal? kubernetes/kubernetes#129198 (comment)
> Also, we cannot implement this feature only within kube-apiserver because it'd make it
> impossible to handle `MatchLabelKeys` within the cluster-level default constraints in the scheduler configuration.
>
> So we decided to go with the current design that implements this feature within both
```suggestion
So we decided to go with the design that implements this feature within both
```
> But, it may confuse users because this behavior would be different from PodAffinity's `MatchLabelKeys`.
>
> Also, we cannot implement this feature only within kube-apiserver because it'd make it
> impossible to handle `MatchLabelKeys` within the cluster-level default constraints in the scheduler configuration.
```suggestion
impossible to handle `MatchLabelKeys` within the cluster-level default constraints in the scheduler configuration in the future (see https://github.com/kubernetes/kubernetes/issues/129198).
```
> labels are ANDed with `LabelSelector` to identify the group of existing pods over
> `TopologySpreadConstraint` which represent a set of label keys only.
> kube-apiserver will use those keys to look up label values from the incoming pod
> and those key-value labels are ANDed with `LabelSelector` to identify the group of existing pods over
```suggestion
and those key-value labels will be merged with existing `LabelSelector` to identify the group of existing pods over
```
> will use those keys to look up label values from the incoming pod; and those key-value
> labels are ANDed with `LabelSelector` to identify the group of existing pods over
> `TopologySpreadConstraint` which represent a set of label keys only.
> kube-apiserver will use those keys to look up label values from the incoming pod
```suggestion
At a pod creation, kube-apiserver will use those keys to look up label values from the incoming pod
```
> which the spreading skew will be calculated.
> kube-scheduler will also handle it if the cluster-level default constraints have the one with `MatchLabelKeys`.
```suggestion
Note that in case `MatchLabelKeys` is supported in the cluster-level default constraints (see https://github.com/kubernetes/kubernetes/issues/129198), kube-scheduler will also handle it separately.
```
lgtm from me. Let's wait for @dom4ha
/approve
Time is running out, so better approve now.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, mochizuki875, sanposhiho. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/lgtm
@wojtek-t We haven't modified the PRR file, and it's merged without your approval.
No further concerns from my side. I have limited availability this week, so thanks @macsko and @alculquicondor for the approvals.