Skip to content

Commit 94bf684

Browse files
committed
address comments
Signed-off-by: Rita Zhang <[email protected]>
1 parent 6c55ab2 commit 94bf684

File tree

3 files changed

+97
-14
lines changed

3 files changed

+97
-14
lines changed

keps/prod-readiness/sig-auth/5018.yaml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,4 +3,4 @@
33
# of http://git.k8s.io/enhancements/OWNERS_ALIASES
44
kep-number: 5018
55
alpha:
6-
approver: ""
6+
approver: "soltysh"

keps/sig-auth/5018-dra-adminaccess/README.md

Lines changed: 95 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,7 @@
99
- [Proposal](#proposal)
1010
- [DRAAdminAccess Overview](#draadminaccess-overview)
1111
- [Workflow](#workflow)
12+
- [Risks and Mitigations](#risks-and-mitigations)
1213
- [Design Details](#design-details)
1314
- [API Changes](#api-changes)
1415
- [REST Storage Changes](#rest-storage-changes)
@@ -163,8 +164,10 @@ objects as privileged. This feature includes:
163164
name: admin-resource-claim
164165
namespace: admin-namespace
165166
spec:
166-
resourceClassName: admin-resource-class
167-
adminAccess: true
167+
devices:
168+
requests:
169+
- deviceClassName: admin-resource-class
170+
adminAccess: true
168171
```
169172
170173
1. Namespace Label for DRA Admin Mode:
@@ -222,6 +225,16 @@ objects as privileged. This feature includes:
222225
1. The DRA controller processes the request and grants privileged access to the
223226
requested device.
224227

228+
### Risks and Mitigations
229+
230+
This feature makes the assumption that an admin will only perform read-only
231+
monitoring and troubleshooting operations. However, it is possible that an admin
232+
also performs write access to the device. This risk can be mitigated by the DRA
233+
driver to ensure AdminAccess ResourceClaims can only execute read-only
234+
operations on the device. We can add a warning in the driver documentation.
235+
However, what additional permissions are granted or needed for admin operations
236+
is up to the DRA driver implementation.
237+
225238
## Design Details
226239

227240
### API Changes
@@ -293,13 +306,16 @@ admin namespace label.
293306

294307
In pkg/controller/resourceclaim/controller.go, process requests in `handleClaim`
295308
function to prevent creation of `ResourceClaim` when the `ResourceClaimTemplate`
296-
has the `adminAccess` field while the feature gate is turned off.
309+
has the `adminAccess` field while the feature gate is turned off. This behavior
310+
was added in 1.32.
297311

298312
### Kube-scheduler Changes
299313

300314
In pkg/scheduler/framework/plugins/dynamicresources/allocateddevices.go, handle
301315
claims with `adminAccess` to ensure devices allocated with `adminAccess` are
302-
skipped without invoking the callback bypassing standard allocation checks.
316+
skipped without invoking the callback bypassing standard allocation checks and
317+
these devices are not considered as allocated. This behavior was added in 1.32
318+
in `foreachAllocatedDevice`.
303319

304320
### ResourceQuota
305321

@@ -332,11 +348,19 @@ This can inform certain test coverage improvements that we want to do before
332348
extending the production code to implement this enhancement.
333349
- `<package>`: `<date>` - `<test coverage>`
334350
-->
351+
<!--
352+
Generated with:
353+
354+
go test -cover ./pkg/registry/resource/resourceclaim ./pkg/registry/resource/resourceclaimtemplate ./pkg/scheduler/framework/plugins/dynamicresources/... ./pkg/controller/resourceclaim | sed -e 's/.*\(k8s.io[a-z/-]*\).*coverage: \(.*\) of statements/- `\1`: \2/' | sort
355+
356+
-->
335357

336-
Start of v1.33 development cycle (v1.33.0-alpha.1-):
358+
During v1.33 development cycle, 2025-02-08 as of git commit 69ab91a:
337359

338-
- `k8s.io/kubernetes/pkg/registry/resource/resourceclaim`: %
339-
- `k8s.io/kubernetes/pkg/registry/resource/resourceclaimtemplate`: %
360+
- `k8s.io/kubernetes/pkg/controller/resourceclaim`: 73.5%
361+
- `k8s.io/kubernetes/pkg/registry/resource/resourceclaim`: 84.7%
362+
- `k8s.io/kubernetes/pkg/registry/resource/resourceclaimtemplate`: 68.8%
363+
- `k8s.io/kubernetes/pkg/scheduler/framework/plugins/dynamicresources`: 77.3%
340364

341365
##### Integration tests
342366

@@ -348,6 +372,24 @@ For Beta and GA, add links to added tests together with links to k8s-triage for
348372
https://storage.googleapis.com/k8s-triage/index.html
349373
-->
350374

375+
The scheduler plugin and resource claim controller are covered by the workloads
376+
in
377+
https://github.com/kubernetes/kubernetes/blob/master/test/test/integration/scheduler_perf/dra/performance-config.yaml
378+
379+
Those tests run in:
380+
381+
- [pre-submit](https://testgrid.k8s.io/presubmits-kubernetes-blocking#pull-kubernetes-integration)
382+
and
383+
[periodic](https://testgrid.k8s.io/sig-release-master-blocking#integration-master)
384+
integration testing under
385+
`k8s.io/kubernetes/test/integration/scheduler_perf.scheduler_perf` and
386+
`k8s.io/kubernetes/test/integration/scheduler_perf.dra.dra` and the
387+
`DRAAdminAccess` feature gate is already enabled.
388+
- Additional test cases will be added to `test/integration/scheduler_perf` to
389+
ensure `ResourceClaim` or `ResourceClaimTemplate` with `adminAccess: true`
390+
requests are only authorized if their namespace has the
391+
`kubernetes.io/dra-admin-access` label as described in this KEP.
392+
351393
##### e2e tests
352394

353395
<!--
@@ -380,6 +422,12 @@ ResourceClaimTemplate and ResourceClaim for admin access
380422
[Feature:DRAAdminAccess] [FeatureGate:DRAAdminAccess] [Alpha]
381423
[FeatureGate:DynamicResourceAllocation] [Beta]
382424

425+
- AdminAccess related tests in
426+
https://github.com/kubernetes/kubernetes/blob/69ab91a5c59617872c9f48737c64409a9dec2957/test/e2e/dra/dra.go#L976
427+
and
428+
https://github.com/kubernetes/kubernetes/blob/69ab91a5c59617872c9f48737c64409a9dec2957/test/e2e/dra/dra.go#L1095
429+
will be updated.
430+
383431
### Graduation Criteria
384432

385433
#### Alpha
@@ -475,7 +523,14 @@ What signals should users be paying attention to when the feature is young
475523
that might indicate a serious problem?
476524
-->
477525

478-
Will be considered for beta.
526+
If this feature is not working as intended, it could result in an increase in
527+
the `scheduler_pending_pods` metric in the kube-scheduler or an increase in the
528+
`kubelet_started_containers_errors_total` metric.
529+
530+
Further analysis by reviewing logs and pod events is needed to determine whether
531+
errors are related to this feature.
532+
533+
Will provide more details for beta.
479534

480535
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
481536

@@ -535,7 +590,11 @@ Recall that end users cannot usually observe component logs or access metrics.
535590
- Details:
536591
-->
537592

538-
Will be considered for beta.
593+
- [x] API .status
594+
- Other field: `.status.allocation` will be set for a ResoureClaim using the
595+
DRA feature when needed by a pod and
596+
`.status.allocation.devices.results[].adminAccess` will be set by the
597+
scheduler when it allocates a ResoureClaim.
539598

540599
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
541600

@@ -569,7 +628,23 @@ Pick one more of these and delete the rest.
569628
- Details:
570629
-->
571630

572-
Will be considered for beta.
631+
This feature is part of the DRA feature and therefore can use the same SLIs to
632+
determine health of the service.
633+
634+
For kube-controller-manager:
635+
636+
- [x] Metrics
637+
- Metric name: `resourceclaim_controller_create_total`
638+
- Metric name: `resourceclaim_controller_create_failures_total`
639+
- Metric name: `resourceclaim_controller_resource_claims`
640+
- Metric name: `resourceclaim_controller_allocated_resource_claims`
641+
- Metric name: `workqueue` with `name="resource_claim"`
642+
643+
For kube-scheduler and kubelet, existing metrics for handling Pods such as the
644+
["unschedulable_pods"](https://github.com/kubernetes/kubernetes/blob/6f5fa2eb2f4dc731243b00f7e781e95589b5621f/pkg/scheduler/metrics/metrics.go#L200-L206)
645+
metric in scheduler will identify pods that are currently unschedulable because
646+
of the `DynamicResources` plugin or a misconfiguration of the `AdminAccess`
647+
field.
573648

574649
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
575650

@@ -619,11 +694,16 @@ No.
619694

620695
###### Will enabling / using this feature result in increasing size or count of the existing API objects?
621696

622-
No.
697+
Yes. Setting the field will make the ResourceClaim and ResourceClaimTemplate
698+
resources larger.
623699

624700
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
625701

626-
No.
702+
No. This feature is part of the DRA feature and therefore it uses the same
703+
existing SLIs as DRA which were derrived from metrics similar to the generic
704+
ephemeral volume controller
705+
[were added](https://github.com/kubernetes/kubernetes/blob/163553bbe0a6746e7719380e187085cf5441dfde/pkg/controller/resourceclaim/metrics/metrics.go#L32-L47)
706+
to determine health of the service.
627707

628708
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
629709

@@ -693,3 +773,6 @@ The following options were also considered:
693773
the control so that anyone can apply labels to namespaces, but they can't
694774
prevent the check from running.
695775
- RBAC++: This is not available yet, especially for the DRA timeframe.
776+
- Read-mode: Read-mode is too restrictive. AdminAccess is more like root access
777+
in that it may grant additional permissions and there needs to be checks to
778+
prevent normal users from bypassing.

keps/sig-auth/5018-dra-adminaccess/kep.yaml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@ kep-number: 5018
33
authors:
44
- "@ritazh"
55
owning-sig: sig-auth
6-
status: provisional
6+
status: implementable
77
creation-date: 2025-01-02
88
reviewers:
99
- "@pohly"

0 commit comments

Comments
 (0)