You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
@@ -74,10 +77,10 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
74
77
-[x] (R) Design details are appropriately documented
75
78
-[x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
76
79
-[ ] e2e Tests for all Beta API Operations (endpoints)
77
-
-[ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
80
+
-[ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
78
81
-[ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
79
82
-[x] (R) Graduation criteria is in place
80
-
-[ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
83
+
-[ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
81
84
-[x] (R) Production readiness review completed
82
85
-[ ] (R) Production readiness review approved
83
86
-[x] "Implementation History" section is up-to-date for milestone
@@ -124,7 +127,7 @@ This limitation surfaces in multi-socket, as well as single-socket multi NUMA sy
124
127
125
128
126
129
### Proposed Change
127
-
130
+
128
131
We propose to
129
132
- add a new flag in Kubelet called `TopologyManagerPolicyOptions` in the kubelet config or command line argument called `topology-manager-policy-options` which allows the user to specify the Topology Manager policy option.
130
133
- add a new topology manager option called `prefer-closest-numa-nodes`; if present, this option will enable further refinements of the existing `restricted` and `best-effort` policies, this option has no effect for `none` and `single-numa-node` policies.
@@ -138,11 +141,7 @@ To summarize properties of `prefer-closest-numa-nodes` policy:
138
141
* Choose `NUMANodeAffinity` bitmask which is the narrowest and if the bitmask width is the same prefer the bitmask with smaller average distance.
139
142
140
143
When `prefer-closest-numa-nodes` policy is enabled, we need to retrieve information regarding distances between NUMA nodes.
141
-
Right now Topology manager discovers Node layout using [CAdvisor API](https://github.com/google/cadvisor/blob/master/info/v1/machine.go#L40).
142
-
143
-
We will need to extend the `MachineInfo` struct with a `Distances` field which will describe the distance between a given NUMA node and other NUMA nodes in the system.
144
-
This is already implemented in `cadvisor` by this [patch](https://github.com/google/cadvisor/pull/3179) but it is not yet present in any of the released versions.
145
-
Until a new release of `cadvisor` includes this patch (and it gets vendored into the `kubelet`) we will need to replicate this logic in the `kubelet` code itself.
144
+
Right now Topology manager discovers Node layout using [CAdvisor API](https://github.com/google/cadvisor/blob/master/info/v1/machine.go#L40), which already exposes NUMA distance between nodes since version `v0.46.0`.
146
145
147
146
### Implementation strategy
148
147
@@ -155,7 +154,7 @@ Until a new release of `cadvisor` includes this patch (and it gets vendored into
155
154
- When `TopologyManager` is being created it discovers distances between NUMA nodes and stores them inside `manager` struct. This is temporary until `distance` information lands in `cadvisor`.
156
155
- Pass `TopologyManagerPolicyOptions` to best-effort and restricted policy. When this is specified best-hint is picked based on average distance between NUMA nodes. This would require modification to `compareHints` function to change how the best hint is calculated:
157
156
158
-
```go
157
+
```go
159
158
160
159
// NUMADistance is a matrix representing distances between NUMA nodes
161
160
typeNUMADistance [][]uint64
@@ -178,16 +177,16 @@ func compareHints(bestNonPreferredAffinityCount int, current *TopologyHint, cand
178
177
if current.Preferred && candidate.Preferred {
179
178
if candidate.NUMANodeAffinity.IsNarrowerThan(current.NUMANodeAffinity) {
180
179
return candidate
181
-
}
180
+
}
182
181
if policyOpts.PreferClosestNuma && candidate.NUMANodeAffinity.IsEqual(current.NUMANodeAffinity) {
@@ -289,7 +289,7 @@ These cases will be added in the existing e2e tests:
289
289
290
290
In 1.26 we are releasing this feature to Alpha. We propose the following management of TopologyManager policy options graduation:
291
291
292
-
-`TopologyManagerPolicyOptions` for enabling/disabling the entire feature. As this is an alpha feature, this feature gate would be disabled by default.
292
+
-`TopologyManagerPolicyOptions` for enabling/disabling the entire feature. As this is an alpha feature, this feature gate would be disabled by default.
293
293
Explicitly enabling `TopologyManagerPolicyOptions` feature gate provides us the ability to supply `TopologyManagerPolicyOptions` or `topology-manager-policy-options` flag in Kubelet.
294
294
295
295
-`TopologyManagerPolicyAlphaOptions` is not enabled by default. Topology Manager alpha options (only one as of 1.26), are hidden by default
@@ -309,6 +309,10 @@ In 1.28 this feature is being promoted to Beta. We propose following changes to
309
309
-`TopologyManagerPolicyBetaOptions` feature flag for enabling/disabling beta options will be enabled by default.
310
310
-`prefer-closest-numa-nodes` will be moved to Beta options.
311
311
312
+
In 1.32 this feature is being promoted to Beta. We propose following changes to TopologyManager policy options default visibility:
313
+
314
+
-`prefer-closest-numa-nodes` will be moved to stable options.
315
+
312
316
The graduation Criteria of options is described below:
313
317
314
318
#### Graduation of Options to `Beta-quality` (non-hidden)
@@ -342,7 +346,7 @@ No changes needed
342
346
- Components depending on the feature gate: kubelet
343
347
-[x] Change the kubelet configuration to set a `TopologyManager` policy to `restricted` or `best-effort` and a `TopologyManagerPolicyOptions` to `prefer-closest-numa-nodes`
344
348
- Will enabling / disabling the feature require downtime of the control
345
-
plane?
349
+
plane?
346
350
No.
347
351
- Will enabling / disabling the feature require downtime or reprovisioning
348
352
of a node?
@@ -455,11 +459,11 @@ N/A.
455
459
456
460
There are 2 scenarios where Kubelet may fail to start due to using this feature:
457
461
458
-
- Bad policy option name or using policy option without enabling appropriate feature flag. we are emitting appropriate error message for this case,
462
+
- Bad policy option name or using policy option without enabling appropriate feature flag. we are emitting appropriate error message for this case,
459
463
Kubelet will fail to start and print error message what happened. To recover one just have to provide fix policy option name or disable/enable feature flags.
460
464
461
465
- Cadvisor is not exposing distances for NUMA domains. In this case Kubelet will fail with `error getting NUMA distances from cadvisor` message.
462
-
Reading NUMA distances is only performed when `prefer-clostest-numa-nodes` option is specified.
466
+
Reading NUMA distances is only performed when `prefer-closest-numa-nodes` option is specified.
463
467
To recover one has to either disable `TopologyManagerPolicyOptions` feature-flag or stop using `prefer-closest-numa-nodes` option.
464
468
465
469
###### What steps should be taken if SLOs are not being met to determine the problem?
@@ -470,3 +474,36 @@ N/A.
470
474
471
475
- 2021-09-26: KEP created
472
476
- 2023-06-12: KEP updated for Beta release
477
+
- 2024-09-30: KEP updated for Stable release
478
+
479
+
480
+
## Drawbacks
481
+
482
+
<!--
483
+
Why should this KEP _not_ be implemented?
484
+
-->
485
+
Adds complexity to Topology Manager.
486
+
487
+
## Alternatives
488
+
489
+
<!--
490
+
What other approaches did you consider, and why did you rule them out? These do
491
+
not need to be as detailed as the proposal, but should include enough
492
+
information to express the idea and why it was not acceptable.
493
+
-->
494
+
495
+
Adding new Topology Manager policy similar to `best-effort` which also takes NUMA distance into consideration.
496
+
This approach was ruled out as NUMA distance can be an important factor not only for `best-effort` but also `restricted` policy.
497
+
498
+
So introducing a mechanism similar to CPUManagerPolicyOptions was an already familiar concept that could influence logic of multiple policies.
499
+
500
+
## Infrastructure Needed (Optional)
501
+
502
+
<!--
503
+
Use this section if you need things from the project/SIG. Examples include a
504
+
new subproject, repos requested, or GitHub details. Listing these here allows a
505
+
SIG to get the process for these resources started right away.
506
+
-->
507
+
508
+
To be able to do e2e testing it would be required that CI machines with at least 4 NUMA nodes exist to be able to use `prefer-closest-numa-nodes` policy option properly.
509
+
The [effort to add machines for sig-node e2e tests](https://github.com/kubernetes/k8s.io/issues/7339) is still open and it shouldn't block this KEP from graduating to GA.
0 commit comments