
Commit b73d1da

carlory authored and yliaog committed on Feb 11, 2025

KEP-2644: promote HonorPVReclaimPolicy to GA (kubernetes#5035)

* promote HonorPVReclaimPolicy to GA
* update KEP with the latest template

Signed-off-by: carlory <baofa.fan@daocloud.io>

1 parent b4e3da2 commit b73d1da

File tree: 3 files changed (+310 −138 lines)
 

‎keps/prod-readiness/sig-storage/2644.yaml

+3 −1
@@ -2,4 +2,6 @@ kep-number: 2644
alpha:
  approver: "@ehashman"
beta:
  approver: "@ehashman"
stable:
  approver: "@soltysh"

‎keps/sig-storage/2644-honor-pv-reclaim-policy/README.md

+304 −135
@@ -4,152 +4,186 @@
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [User Stories (Optional)](#user-stories-optional)
    - [Story 1](#story-1)
    - [Story 2](#story-2)
  - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [CSI Driver volumes](#csi-driver-volumes)
  - [In-Tree Plugin volumes](#in-tree-plugin-volumes)
  - [Test Plan](#test-plan)
    - [Prerequisite testing updates](#prerequisite-testing-updates)
    - [Unit tests](#unit-tests)
    - [Integration tests](#integration-tests)
    - [e2e tests](#e2e-tests)
  - [Graduation Criteria](#graduation-criteria)
    - [Alpha](#alpha)
    - [Beta](#beta)
    - [GA](#ga)
  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
  - [Monitoring Requirements](#monitoring-requirements)
  - [Dependencies](#dependencies)
  - [Scalability](#scalability)
  - [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
<!-- /toc -->

## Release Signoff Checklist

<!--
**ACTION REQUIRED:** In order to merge code into a release, there must be an
issue in [kubernetes/enhancements] referencing this KEP and targeting a release
milestone **before the [Enhancement Freeze](https://git.k8s.io/sig-release/releases)
of the targeted release**.

For enhancements that make changes to code or processes/procedures in core
Kubernetes—i.e., [kubernetes/kubernetes], we require the following Release
Signoff checklist to be completed.

Check these off as they are completed for the Release Team to track. These
checklist items _must_ be updated for the enhancement to be released.
-->

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [x] (R) KEP approvers have approved the KEP status as `implementable`
- [x] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
  - [ ] e2e Tests for all Beta API Operations (endpoints)
  - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
  - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
- [ ] (R) Graduation criteria is in place
  - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [ ] (R) Production readiness review completed
- [ ] (R) Production readiness review approved
- [x] "Implementation History" section is up-to-date for milestone
- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

<!--
**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
-->

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

The reclaim policy associated with a Persistent Volume is currently ignored under certain circumstances. For a `Bound` PV-PVC pair, the ordering of PV-PVC deletion determines whether the PV delete reclaim policy is honored. The PV honors the reclaim policy if the PVC is deleted prior to deleting the PV; however, if the PV is deleted prior to deleting the PVC, the reclaim policy is not exercised. As a result of this behavior, the associated storage asset in the external infrastructure is not removed.

## Motivation

Prevent volumes from being leaked by honoring the PV reclaim policy after the `Bound` PVC is also deleted.

### Goals

- Ensure no volumes are leaked when the PV is deleted prior to deleting the PVC if the PV reclaim policy is set to `Delete`.

### Non-Goals

<!--
What is out of scope for this KEP? Listing non-goals helps to focus discussion
and make progress.
-->

## Proposal

To better understand the existing issue, we will first walk through the existing behavior when a `Bound` PV is deleted prior to deleting the PVC.

```
kubectl delete pv <pv-name>
```

On deleting a `Bound` PV, a `deletionTimestamp` is added to the PV object. This triggers an update which is consumed by the PV-PVC-controller. The PV-PVC-controller observes that the PV is still associated with a PVC and the PVC has not been deleted; as a result, it attempts to update the PV phase to the `Bound` state. Assuming that the PV was already bound previously, this update eventually amounts to a no-op.

The PVC is deleted at a later point in time.

```
kubectl -n <namespace> delete pvc <pvc-name>
```

1. A `deletionTimestamp` is added on the PVC object.
2. The PVC-protection-controller picks up the update and verifies that the `deletionTimestamp` is present and that there are no pods currently using the PVC.
3. If there are no Pods using the PVC, the PVC finalizers are removed, eventually triggering a PVC delete event.
4. The PV-PVC-controller processes the PVC delete event, leading to removal of the PVC from its in-memory cache, followed by triggering an explicit sync on the PV.
5. The PV-PVC-controller processes the triggered explicit sync; here, it observes that the PVC is no longer available and updates the PV phase to the `Released` state.
6. If the PV-protection-controller picks up the update, it observes that there is a `deletionTimestamp` and the PV is not in a `Bound` phase; this causes the finalizers to be removed.
7. This is followed by the PV-PVC-controller initiating the reclaim volume workflows.
8. The reclaim volume workflow observes the `persistentVolumeReclaimPolicy` as `Delete` and schedules a volume deletion.
9. Under the event that (6) has occurred, when `deleteVolumeOperation` executes it attempts to retrieve the latest PV state from the API server; however, due to (6) the PV has already been removed from the API server, leading to an error state. This results in the plugin volume deletion not being exercised, hence leaking the volume.
```go
func (ctrl *PersistentVolumeController) deleteVolumeOperation(volume *v1.PersistentVolume) (string, error) {
    klog.V(4).Infof("deleteVolumeOperation [%s] started", volume.Name)

    // This method may have been waiting for a volume lock for some time.
    // Previous deleteVolumeOperation might just have saved an updated version, so
    // read current volume state now.
    newVolume, err := ctrl.kubeClient.CoreV1().PersistentVolumes().Get(context.TODO(), volume.Name, metav1.GetOptions{}) // <========== NotFound error thrown
    if err != nil {
        klog.V(3).Infof("error reading persistent volume %q: %v", volume.Name, err)
        return "", nil
    }
    // Remaining code below skipped for brevity...
    // Please refer to pkg/controller/volume/persistentvolume/pv_controller.go in kubernetes/kubernetes for the full code.
```
10. Under the event that (6) has not occurred yet, during execution of `deleteVolumeOperation` it is observed that the PV has a pre-existing `deletionTimestamp`; this makes the method assume that the delete is already being processed. This results in the plugin volume deletion not being exercised, hence leaking the volume.
```go
func (ctrl *PersistentVolumeController) deleteVolumeOperation(volume *v1.PersistentVolume) (string, error) {
    klog.V(4).Infof("deleteVolumeOperation [%s] started", volume.Name)

    // This method may have been waiting for a volume lock for some time.
    // Previous deleteVolumeOperation might just have saved an updated version, so
    // read current volume state now.
    newVolume, err := ctrl.kubeClient.CoreV1().PersistentVolumes().Get(context.TODO(), volume.Name, metav1.GetOptions{})
    if err != nil {
        klog.V(3).Infof("error reading persistent volume %q: %v", volume.Name, err)
        return "", nil
    }

    if newVolume.GetDeletionTimestamp() != nil { // <========================== DeletionTimestamp set since the PV was deleted first.
        klog.V(3).Infof("Volume %q is already being deleted", volume.Name)
        return "", nil
    }
    // Remaining code below skipped for brevity...
    // Please refer to pkg/controller/volume/persistentvolume/pv_controller.go in kubernetes/kubernetes for the full code.
```
11. Meanwhile, the external-provisioner checks whether there is a `deletionTimestamp` on the PV; if so, it assumes that the PV is in a transitory state and returns false for the `shouldDelete` check.
```go
// shouldDelete returns whether a volume should have its backing volume
// deleted, i.e. whether a Delete is "desired"
func (ctrl *ProvisionController) shouldDelete(ctx context.Context, volume *v1.PersistentVolume) bool {
    if deletionGuard, ok := ctrl.provisioner.(DeletionGuard); ok {
        if !deletionGuard.ShouldDelete(ctx, volume) {
            return false
        }
    }

    if ctrl.addFinalizer && !ctrl.checkFinalizer(volume, finalizerPV) && volume.ObjectMeta.DeletionTimestamp != nil {
        return false
    } else if volume.ObjectMeta.DeletionTimestamp != nil { // <========== DeletionTimestamp is set on the PV since it's deleted first.
        return false
    }
    // Remaining code below skipped for brevity...
    // Please refer to sig-storage-lib-external-provisioner/controller/controller.go for the full code.
```

The fix applies to both CSI driver volumes and in-tree plugin volumes, whether statically or dynamically provisioned.

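For illustration only (this walkthrough is not part of the original KEP text), the problematic ordering and the behavior the fix guarantees can be sketched as follows; the object names are placeholders:

```
# Illustrative sketch: the ordering that previously leaked the backing volume.
# <pv-name>/<pvc-name> stand for a Bound pair whose reclaim policy is Delete.
kubectl delete pv <pv-name>                    # PV gets a deletionTimestamp but stays until the PVC is gone
kubectl -n <namespace> delete pvc <pvc-name>   # PVC is removed, PV moves to Released and is then deleted
# Without this fix, the PV object disappears but the volume is leaked in the storage backend.
# With HonorPVReclaimPolicy, the new finalizer keeps the PV around until the backing
# volume has actually been deleted by the CSI driver or in-tree plugin.
```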
@@ -164,20 +198,37 @@ When the PVC is deleted, the PV is moved into `Released` state and following che

2. CSI driver

If, when processing the PV update, it is observed that the PV has the `external-provisioner.volume.kubernetes.io/finalizer` finalizer and a `DeletionTimestamp` set, then the volume deletion request is forwarded to the driver, provided other pre-defined conditions are met.

When a PV with `Delete` reclaim policy is not associated with a PVC:

Under the event that the PV is not associated with a PVC, neither the `kubernetes.io/pv-controller` nor the `external-provisioner.volume.kubernetes.io/finalizer` finalizer is added to the PV. If such a PV is deleted, the reclaim workflow is not executed; this is the current behavior and will be retained.

### User Stories (Optional)

<!--
Detail the things that people will be able to do if this KEP is implemented.
Include as much detail as possible so that people can understand the "how" of
the system. The goal here is to make this feel real for users without getting
bogged down.
-->

#### Story 1

#### Story 2

### Notes/Constraints/Caveats (Optional)

<!--
What are the caveats to the proposal?
What are some important details that didn't come across above?
Go in to as much detail as necessary here.
This might be a good place to talk about core concepts and how they relate.
-->

### Risks and Mitigations

The primary risk is unexpected volume deletion for users who rely on the current behavior, i.e., they do not expect the volume to be deleted from the underlying storage infrastructure when, for a bound PV-PVC pair, the PV is deleted followed by deleting the PVC. This can be mitigated by initially introducing the fix behind a feature gate and then explicitly deprecating the current behavior.

## Design Details
@@ -261,24 +312,71 @@ The finalizer `kubernetes.io/pv-controller` will not be added on statically prov
### Test Plan

<!--
**Note:** *Not required until targeted at a release.*
The goal is to ensure that we don't accept enhancements with inadequate testing.

All code is expected to have adequate tests (eventually with coverage
expectations). Please adhere to the [Kubernetes testing guidelines][testing-guidelines]
when drafting this test plan.

[testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md
-->

[x] I/we understand the owners of the involved components may require updates to
existing tests to make this code solid enough prior to committing the changes necessary
to implement this enhancement.

##### Prerequisite testing updates

<!--
Based on reviewers feedback describe what additional tests need to be added prior
implementing this enhancement to ensure the enhancements have also solid foundations.
-->

##### Unit tests

- `pkg/controller/volume/persistentvolume`: [`2025-02-05` - `80.4%`](https://testgrid.k8s.io/sig-testing-canaries#ci-kubernetes-coverage-unit&include-filter-by-regex=pkg/controller/volume/persistentvolume)

##### Integration tests

No integration tests are required. This feature is better tested with e2e tests.

##### e2e tests

An e2e test to exercise deletion of PV prior to deletion of PVC for a `Bound` PV-PVC pair.

- [test/e2e/storage/csimock/csi_honor_pv_reclaim_policy.go](https://github.com/kubernetes/kubernetes/blob/2334b8469e1983c525c0c6382125710093a25883/test/e2e/storage/csimock/csi_honor_pv_reclaim_policy.go): [k8s-triage](https://storage.googleapis.com/k8s-triage/index.html?sig=scheduling,storage&test=honor%20pv%20reclaim%20policy), [k8s-test-grid](https://testgrid.k8s.io/sig-storage-kubernetes#kind-storage-alpha-beta-features&include-filter-by-regex=CSI%20Mock%20honor%20pv%20reclaim%20policy)

### Graduation Criteria

#### Alpha

- Feedback
- Fix of the issue behind a feature flag.
- Unit tests and e2e tests outlined in the design proposal implemented.
- Tests should be done with both CSI Migration disabled and enabled.

#### Beta

- Allow the Alpha fix to soak for one release.
- Gather feedback from developers and surveys.
- Fully implemented in both Kubernetes and CSI driver repositories.
- Additional tests are in Testgrid and linked in the KEP.

#### GA

- Deployed in production and have gone through at least one K8s upgrade.
- Allowing time for feedback.

**Note:** Generally we also wait at least two releases between beta and
GA/stable, because there's no opportunity for user feedback, or even bug reports,
in back-to-back releases.

**For non-optional features moving to GA, the graduation criteria must include
[conformance tests].**

[conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md

### Upgrade / Downgrade Strategy
@@ -292,6 +390,7 @@ if there are PVs that have a valid associated PVC and deletion timestamp set, th

In this case, there may be PVs with the deletion finalizer that the older Kubernetes does not remove. Such PVs will remain in the API server forever unless the user manually removes them.

### Version Skew Strategy

The fix is part of `kube-controller-manager` and `external-provisioner`.

1. `kube-controller-manager` is upgraded, but `external-provisioner` is not:
@@ -316,84 +415,154 @@ In addition, PVs migrated to CSI will have the finalizer. When the CSI migration

- [x] Feature gate (also fill in values in `kep.yaml`)
  - Feature gate name: `HonorPVReclaimPolicy`
  - Components depending on the feature gate:
    - kube-controller-manager
    - external-provisioner

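As a hedged sketch (assuming the standard `--feature-gates` flag on both components; once the feature reached beta/GA the gate is on by default, so this was only needed on earlier releases):

```
# Illustrative only: explicitly enabling the gate on pre-default releases.
kube-controller-manager --feature-gates=HonorPVReclaimPolicy=true ...
# external-provisioner sidecar in the CSI driver deployment:
#   args: ["--feature-gates=HonorPVReclaimPolicy=true", ...]
```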
###### Does enabling the feature change any default behavior?

Enabling the feature will delete the volume from the underlying storage when, for a bound PV-PVC pair whose PV reclaim policy is `Delete`, the PV is deleted followed by deleting the PVC.

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes. Disabling the feature flag will restore the previous behavior. However, after the rollback, existing PVs that still carry the finalizer `external-provisioner.volume.kubernetes.io/finalizer` or `kubernetes.io/pv-controller` cannot be deleted from the API server (due to the finalizers); these PVs must be explicitly patched to remove the finalizers.

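A minimal sketch of such a patch (not taken from the KEP; note that it clears all finalizers on the PV, so it should only be used on PVs that are meant to be deleted anyway):

```
# Illustrative only: clear the finalizers on a PV stuck after rolling the gate back.
kubectl patch pv <pv-name> --type=merge -p '{"metadata":{"finalizers":null}}'
```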
###### What happens if we reenable the feature if it was previously rolled back?

The controllers will pick up the new behavior as described above.

###### Are there any tests for feature enablement/disablement?

There will be unit tests for the feature `HonorPVReclaimPolicy` enablement/disablement.

### Rollout, Upgrade and Rollback Planning

###### How can a rollout or rollback fail? Can it impact already running workloads?

No.

###### What specific metrics should inform a rollback?

Users can compare the existing metrics below with the feature gate enabled and disabled to see whether the downgrade actually helped.

- `persistentvolumeclaim_provision_failed_total`
- `persistentvolume_delete_failed_total`

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

This feature has been beta since 1.31. No additional upgrade or rollback testing is needed for this.

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

No.

### Monitoring Requirements

###### How can an operator determine if the feature is in use by workloads?

The presence of the finalizers described above on a Persistent Volume indicates that the feature is enabled.
The finalizer `external-provisioner.volume.kubernetes.io/finalizer` will be present on CSI Driver volumes.
The finalizer `kubernetes.io/pv-controller` will be present on In-Tree Plugin volumes.

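For illustration (not part of the KEP), an operator could list PVs together with their finalizers to confirm the protection is in effect:

```
# Illustrative only: show each PV's reclaim policy and finalizers.
kubectl get pv -o custom-columns=NAME:.metadata.name,RECLAIM:.spec.persistentVolumeReclaimPolicy,FINALIZERS:.metadata.finalizers
```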
###### How can someone using this feature know that it is working for their instance?

- [ ] Events
  - Event Reason:
- [ ] API .status
  - Condition name:
  - Other field:
- [x] Other (treat as last resort)
  - Details: the `metadata.finalizers` field of a PV object will have the finalizer `external-provisioner.volume.kubernetes.io/finalizer` for CSI Driver volumes and `kubernetes.io/pv-controller` for In-Tree Plugin volumes.

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

The current SLOs for `persistentvolume_delete_duration_seconds` and `volume_operation_total_seconds` would be increased by the amount of time taken to remove the newly introduced finalizer. This should be an insignificant increase.

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

<!--
Pick one or more of these and delete the rest.
-->

- [x] Metrics
  - Metric name: For CSI Driver volumes, the `persistentvolume_delete_duration_seconds` metric is used to track the time taken for PV deletion. The `volume_operation_total_seconds` metric tracks the end-to-end latency of the delete operation; it is applicable to both CSI driver volumes and in-tree plugin volumes.
  - [Optional] Aggregation method:
  - Components exposing the metric:
- [ ] Other (treat as last resort)
  - Details:

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

No.

### Dependencies

###### Does this feature depend on any specific services running in the cluster?

Yes, a CSI driver supporting this enhancement is required when a persistent volume is provisioned with the `csi` source or can be migrated to CSI.

### Scalability

###### Will enabling / using this feature result in any new API calls?

The new finalizer will be added to all existing PVs when the feature is enabled. This will be a one-time sync.

###### Will enabling / using this feature result in introducing new API types?

No.

###### Will enabling / using this feature result in any new calls to the cloud provider?

Yes. Previously the delete volume call would not reach the provider under the described circumstances; with this change, the volume delete will be invoked to remove the volume from the underlying storage.

###### Will enabling / using this feature result in increasing size or count of the existing API objects?

When the feature is enabled, we will be adding a new finalizer to existing PVs, so this would result in an increase of the PV size.

###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

We have metrics for delete volume operations, so potentially the time to delete a PV could be longer now, since we want to make sure the volume on the storage system is deleted first before deleting the PV.

###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

No.

###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

No.

### Troubleshooting

###### How does this feature react if the API server and/or etcd is unavailable?

If the API server is unavailable, an API update on the PV object will fail due to a timeout. Failed operations will be added back to a rate-limited queue for retries. A `DeleteVolume` call to the CSI driver is idempotent, so when the API server is back and the same request is sent to the CSI driver again, the CSI driver should return success even if the volume was already deleted from the storage system by the previous request.

###### What are other known failure modes?

- The volume may not be deleted from the underlying storage if the PV is deleted rapidly after the reclaim policy changes to `Delete`.
  - Detection: The only way to detect this is by comparing the PVs in the API server with the volumes in the storage system.
  - Mitigations: If the reclaim policy is `Delete`, make sure the finalizer is added to the PV before deleting the PV, as shown in the sketch after this list.
  - Diagnostics: external-provisioner should log `error syncing volume: xxx`.
  - Testing: It is hard to test this failure mode because there is a race condition between the PV deletion and the finalizer addition.

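A hedged sketch of that pre-deletion check (placeholders, not from the KEP):

```
# Illustrative only: before deleting a PV whose reclaim policy is Delete, confirm
# the protection finalizer is already present so the backing volume is still reclaimed.
kubectl get pv <pv-name> -o jsonpath='{.metadata.finalizers}{"\n"}'
# Only proceed once the expected finalizer is listed:
kubectl delete pv <pv-name>
```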

###### What steps should be taken if SLOs are not being met to determine the problem?

N/A.

## Implementation History

- 1.26: Alpha
- 1.31: Add e2e tests for the feature and promote to Beta.
- 1.33: Promote to GA.

## Drawbacks

None. The current behavior could be considered a drawback; this KEP presents the fix for it. The current behavior was reported in [Issue-546](https://github.com/kubernetes-csi/external-provisioner/issues/546) and [Issue-195](https://github.com/kubernetes-csi/external-provisioner/issues/195).

## Alternatives

Other approaches to fixing this have not been considered; they will be considered if presented during the KEP review.

## Infrastructure Needed (Optional)

Initially, all development will happen inside the main Kubernetes and CSI driver repositories. The mock driver can be developed inside test/e2e/storage/csimock. The generic part of that driver, i.e. the code that other drivers can reuse, is developed in the kubernetes-csi organization.

‎keps/sig-storage/2644-honor-pv-reclaim-policy/kep.yaml

+3 −2
@@ -16,13 +16,14 @@ approvers:
see-also:
replaces:

stage: stable

latest-milestone: "v1.33"

milestone:
  alpha: "v1.23"
  beta: "v1.31"
  stable: "v1.33"

feature-gates:
  - name: HonorPVReclaimPolicy
