diff --git a/keps/prod-readiness/sig-storage/2644.yaml b/keps/prod-readiness/sig-storage/2644.yaml index 7ce1a474ddc..0cbe3bd86db 100644 --- a/keps/prod-readiness/sig-storage/2644.yaml +++ b/keps/prod-readiness/sig-storage/2644.yaml @@ -2,4 +2,6 @@ kep-number: 2644 alpha: approver: "@ehashman" beta: - approver: "@ehashman" \ No newline at end of file + approver: "@ehashman" +stable: + approver: "@soltysh" \ No newline at end of file diff --git a/keps/sig-storage/2644-honor-pv-reclaim-policy/README.md b/keps/sig-storage/2644-honor-pv-reclaim-policy/README.md index 6712663bf30..e75cd191951 100644 --- a/keps/sig-storage/2644-honor-pv-reclaim-policy/README.md +++ b/keps/sig-storage/2644-honor-pv-reclaim-policy/README.md @@ -4,53 +4,85 @@ - [Release Signoff Checklist](#release-signoff-checklist) - [Summary](#summary) - [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) - [Proposal](#proposal) + - [User Stories (Optional)](#user-stories-optional) + - [Story 1](#story-1) + - [Story 2](#story-2) + - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) - [Risks and Mitigations](#risks-and-mitigations) - [Design Details](#design-details) - [CSI Driver volumes](#csi-driver-volumes) - [In-Tree Plugin volumes](#in-tree-plugin-volumes) - [Test Plan](#test-plan) - - [E2E tests](#e2e-tests) + - [Prerequisite testing updates](#prerequisite-testing-updates) + - [Unit tests](#unit-tests) + - [Integration tests](#integration-tests) + - [e2e tests](#e2e-tests) - [Graduation Criteria](#graduation-criteria) - [Alpha](#alpha) - - [Alpha -> Beta](#alpha---beta) - - [Beta -> GA](#beta---ga) + - [Beta](#beta) + - [GA](#ga) - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) - [Version Skew Strategy](#version-skew-strategy) - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) - [Feature Enablement and Rollback](#feature-enablement-and-rollback) + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) - [Monitoring Requirements](#monitoring-requirements) + - [Dependencies](#dependencies) - [Scalability](#scalability) + - [Troubleshooting](#troubleshooting) - [Implementation History](#implementation-history) - [Drawbacks](#drawbacks) - [Alternatives](#alternatives) +- [Infrastructure Needed (Optional)](#infrastructure-needed-optional) ## Release Signoff Checklist + + Items marked with (R) are required *prior to targeting to a milestone / release*. 
-- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) -- [ ] (R) KEP approvers have approved the KEP status as `implementable` -- [ ] (R) Design details are appropriately documented +- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) +- [x] (R) KEP approvers have approved the KEP status as `implementable` +- [x] (R) Design details are appropriately documented - [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) - [ ] e2e Tests for all Beta API Operations (endpoints) - - [ ] (R) Ensure GA e2e tests for meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) + - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free - [ ] (R) Graduation criteria is in place - - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) + - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) - [ ] (R) Production readiness review completed - [ ] (R) Production readiness review approved -- [ ] (R) "Implementation History" section is up-to-date for milestone -- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] +- [x] "Implementation History" section is up-to-date for milestone +- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] - [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes + + [kubernetes.io]: https://kubernetes.io/ [kubernetes/enhancements]: https://git.k8s.io/enhancements [kubernetes/kubernetes]: https://git.k8s.io/kubernetes [kubernetes/website]: https://git.k8s.io/website ## Summary + Reclaim policy associated with the Persistent Volume is currently ignored under certain circumstance. For a `Bound` PV-PVC pair the ordering of PV-PVC deletion determines whether the PV delete reclaim policy is honored. The PV honors the reclaim policy if the PVC is deleted prior to deleting the PV, however, if the PV is deleted prior to deleting the @@ -58,98 +90,100 @@ PVC then the Reclaim policy is not exercised. As a result of this behavior, the external infrastructure is not removed. ## Motivation + Prevent volumes from being leaked by honoring the PV Reclaim policy after the `Bound` PVC is also deleted. +### Goals + +- Ensure no volumes are leaked when the PV is deleted prior to deleting the PVC if the PV Reclaim policy is set to `Delete`. + +### Non-Goals + + + ## Proposal -To better understand the existing issue we will initially walk through existing behavior when a Bound PV is deleted prior -to deleting the PVC. + +To better understand the existing issue we will initially walk through existing behavior when a Bound PV is deleted prior to deleting the PVC. 
``` kubectl delete pv ``` -On deleting a `Bound` PV, a `deletionTimestamp` is added on to the PV object, this triggers an update which is consumed -by the PV-PVC-controller. The PV-PVC-controller observes that the PV is still associated with a PVC, and the PVC hasn't -been deleted, as a result of this, the PV-PVC-controller attempts to update the PV phase to `Bound` state. Assuming that -the PV was already bound previously, this update eventually amounts to a no-op. + +On deleting a `Bound` PV, a `deletionTimestamp` is added on to the PV object, this triggers an update which is consumed by the PV-PVC-controller. The PV-PVC-controller observes that the PV is still associated with a PVC, and the PVC hasn't been deleted, as a result of this, the PV-PVC-controller attempts to update the PV phase to `Bound` state. Assuming that the PV was already bound previously, this update eventually amounts to a no-op. The PVC is deleted at a later point of time. + ``` kubectl -n delete pvc ``` 1. A `deletionTimestamp` is added on the PVC object. -2. The PVC-protection-controller picks the update, verifies if `deletionTimestamp` is present and there are no pods that - are currently using the PVC. +2. The PVC-protection-controller picks the update, verifies if `deletionTimestamp` is present and there are no pods that are currently using the PVC. 3. If there are no Pods using the PVC, the PVC finalizers are removed, eventually triggering a PVC delete event. -4. The PV-PVC-controller processes delete PVC event, leading to removal of the PVC from it's in-memory cache followed - by triggering an explicit sync on the PV. -5. The PV-PVC-controller processes the triggered explicit sync, here, it observes that the PVC is no longer available - and updates the PV phase to `Released` state. -6. If the PV-protection-controller picks the update, it observes that there is a `deletionTimestamp` and the PV is not - in a `Bound` phase, this causes the finalizers to be removed. +4. The PV-PVC-controller processes delete PVC event, leading to removal of the PVC from it's in-memory cache followed by triggering an explicit sync on the PV. +5. The PV-PVC-controller processes the triggered explicit sync, here, it observes that the PVC is no longer available and updates the PV phase to `Released` state. +6. If the PV-protection-controller picks the update, it observes that there is a `deletionTimestamp` and the PV is not in a `Bound` phase, this causes the finalizers to be removed. 7. This is followed by PV-PVC-controller initiating the reclaim volume workflows. 8. The reclaim volume workflow observes the `persistentVolumeReclaimPolicy` as `Delete` and schedules a volume deletion. -9. Under the event that (6) has occurred and when `deleteVolumeOperation` executes it attempts to retrieve the latest PV - state from the API server, however, due to (6) the PV is removed from the API server, leading to an error state. This - results in the plugin volume deletion not being exercised hence leaking the volume. -```go -func (ctrl *PersistentVolumeController) deleteVolumeOperation(volume *v1.PersistentVolume) (string, error) { - klog.V(4).Infof("deleteVolumeOperation [%s] started", volume.Name) - - // This method may have been waiting for a volume lock for some time. - // Previous deleteVolumeOperation might just have saved an updated version, so - // read current volume state now. 
- newVolume, err := ctrl.kubeClient.CoreV1().PersistentVolumes().Get(context.TODO(), volume.Name, metav1.GetOptions{}) //<========== NotFound error thrown - if err != nil { - klog.V(3).Infof("error reading persistent volume %q: %v", volume.Name, err) - return "", nil - } - // Remaining code below skipped for brevity... - // Please refer pkg/controller/volume/persistentvolume/pv_controller.go in kubernetes/kubernetes for the full code. -``` -10. Under the event that (6) has not occurred yet, during execution of `deleteVolumeOperation` it is observed that the - PV has a pre-existing `deletionTimestamp`, this makes the method assume that delete is already being processed. This - results in the plugin volume deletion not being exercised hence leaking the volume. -```go -func (ctrl *PersistentVolumeController) deleteVolumeOperation(volume *v1.PersistentVolume) (string, error) { - klog.V(4).Infof("deleteVolumeOperation [%s] started", volume.Name) - - // This method may have been waiting for a volume lock for some time. - // Previous deleteVolumeOperation might just have saved an updated version, so - // read current volume state now. - newVolume, err := ctrl.kubeClient.CoreV1().PersistentVolumes().Get(context.TODO(), volume.Name, metav1.GetOptions{}) - if err != nil { - klog.V(3).Infof("error reading persistent volume %q: %v", volume.Name, err) - return "", nil - } - - if newVolume.GetDeletionTimestamp() != nil {//<==========================DeletionTimestamp set since the PV was deleted first. - klog.V(3).Infof("Volume %q is already being deleted", volume.Name) - return "", nil - } - // Remaining code below skipped for brevity... - // Please refer pkg/controller/volume/persistentvolume/pv_controller.go in kubernetes/kubernetes for the full code. -``` -11. Meanwhile, the external-provisioner checks if there is a `deletionTimestamp` on the PV, if so, it assumes that its - in a transitory state and returns false for the `shouldDelete` check. -```go -// shouldDelete returns whether a volume should have its backing volume -// deleted, i.e. whether a Delete is "desired" -func (ctrl *ProvisionController) shouldDelete(ctx context.Context, volume *v1.PersistentVolume) bool { - if deletionGuard, ok := ctrl.provisioner.(DeletionGuard); ok { - if !deletionGuard.ShouldDelete(ctx, volume) { - return false - } - } - - if ctrl.addFinalizer && !ctrl.checkFinalizer(volume, finalizerPV) && volume.ObjectMeta.DeletionTimestamp != nil { - return false - } else if volume.ObjectMeta.DeletionTimestamp != nil { //<========== DeletionTimestamp is set on the PV since it's deleted first. - return false - } - // Remaining code below skipped for brevity... - // Please refer sig-storage-lib-external-provisioner/controller/controller.go for the full code. -``` +9. Under the event that (6) has occurred and when `deleteVolumeOperation` executes it attempts to retrieve the latest PV state from the API server, however, due to (6) the PV is removed from the API server, leading to an error state. This results in the plugin volume deletion not being exercised hence leaking the volume. + ```go + func (ctrl *PersistentVolumeController) deleteVolumeOperation(volume *v1.PersistentVolume) (string, error) { + klog.V(4).Infof("deleteVolumeOperation [%s] started", volume.Name) + + // This method may have been waiting for a volume lock for some time. + // Previous deleteVolumeOperation might just have saved an updated version, so + // read current volume state now. 
+ newVolume, err := ctrl.kubeClient.CoreV1().PersistentVolumes().Get(context.TODO(), volume.Name, metav1.GetOptions{}) //<========== NotFound error thrown + if err != nil { + klog.V(3).Infof("error reading persistent volume %q: %v", volume.Name, err) + return "", nil + } + // Remaining code below skipped for brevity... + // Please refer pkg/controller/volume/persistentvolume/pv_controller.go in kubernetes/kubernetes for the full code. + ``` +10. Under the event that (6) has not occurred yet, during execution of `deleteVolumeOperation` it is observed that the PV has a pre-existing `deletionTimestamp`, this makes the method assume that delete is already being processed. This results in the plugin volume deletion not being exercised hence leaking the volume. + ```go + func (ctrl *PersistentVolumeController) deleteVolumeOperation(volume *v1.PersistentVolume) (string, error) { + klog.V(4).Infof("deleteVolumeOperation [%s] started", volume.Name) + + // This method may have been waiting for a volume lock for some time. + // Previous deleteVolumeOperation might just have saved an updated version, so + // read current volume state now. + newVolume, err := ctrl.kubeClient.CoreV1().PersistentVolumes().Get(context.TODO(), volume.Name, metav1.GetOptions{}) + if err != nil { + klog.V(3).Infof("error reading persistent volume %q: %v", volume.Name, err) + return "", nil + } + + if newVolume.GetDeletionTimestamp() != nil {//<==========================DeletionTimestamp set since the PV was deleted first. + klog.V(3).Infof("Volume %q is already being deleted", volume.Name) + return "", nil + } + // Remaining code below skipped for brevity... + // Please refer pkg/controller/volume/persistentvolume/pv_controller.go in kubernetes/kubernetes for the full code. + ``` +11. Meanwhile, the external-provisioner checks if there is a `deletionTimestamp` on the PV, if so, it assumes that its in a transitory state and returns false for the `shouldDelete` check. + ```go + // shouldDelete returns whether a volume should have its backing volume + // deleted, i.e. whether a Delete is "desired" + func (ctrl *ProvisionController) shouldDelete(ctx context.Context, volume *v1.PersistentVolume) bool { + if deletionGuard, ok := ctrl.provisioner.(DeletionGuard); ok { + if !deletionGuard.ShouldDelete(ctx, volume) { + return false + } + } + + if ctrl.addFinalizer && !ctrl.checkFinalizer(volume, finalizerPV) && volume.ObjectMeta.DeletionTimestamp != nil { + return false + } else if volume.ObjectMeta.DeletionTimestamp != nil { //<========== DeletionTimestamp is set on the PV since it's deleted first. + return false + } + // Remaining code below skipped for brevity... + // Please refer sig-storage-lib-external-provisioner/controller/controller.go for the full code. + ``` The fix applies to both csi driver volumes and in-tree plugin volumes each either statically or dynamically provisioned. @@ -164,20 +198,37 @@ When the PVC is deleted, the PV is moved into `Released` state and following che 2. CSI driver -If when processing the PV update it is observed that it has `external-provisioner.volume.kubernetes.io/finalizer` finalizer and -`DeletionTimestamp` set, then the volume deletion request is forwarded to the driver, provided other pre-defined conditions are met. +If when processing the PV update it is observed that it has `external-provisioner.volume.kubernetes.io/finalizer` finalizer and `DeletionTimestamp` set, then the volume deletion request is forwarded to the driver, provided other pre-defined conditions are met. 
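+
+For illustration, a hedged way to inspect a PV that is in this state (deletion requested, finalizer still present, driver-side `DeleteVolume` not yet complete). The PV name and the output shown are hypothetical, and other finalizers such as `kubernetes.io/pv-protection` may also appear:
+
+```
+kubectl get pv <pv-name> -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'
+# Illustrative output while the driver has not yet confirmed deletion:
+# 2025-02-05T10:15:00Z
+# ["external-provisioner.volume.kubernetes.io/finalizer", ...]
+```
+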
 When a PV with `Delete` reclaim policy is not associated with a PVC:
 
-Under the event that the PV is not associated with a PVC, either finalizers `kubernetes.io/pv-controller` or `external-provisioner.volume.kubernetes.io/finalizer` are not added to the PV.
-If such a PV is deleted the reclaim workflow is not executed, this is the current behavior and will be retained.
+In the event that the PV is not associated with a PVC, neither the `kubernetes.io/pv-controller` nor the `external-provisioner.volume.kubernetes.io/finalizer` finalizer is added to the PV. If such a PV is deleted, the reclaim workflow is not executed; this is the current behavior and it will be retained.
+
+### User Stories (Optional)
+
+#### Story 1
+
+#### Story 2
+
+### Notes/Constraints/Caveats (Optional)
+
 ### Risks and Mitigations
 
-The primary risk is volume deletion to a user that expect the current behavior, i.e, they do not expect the volume to be
-deleted from the underlying storage infrastructure when for a bound PV-PVC pair, the PV is deleted followed by deleting
-the PVC. This can be mitigated by initially introducing the fix behind a feature gate, followed by explicitly deprecating
-the current behavior.
+The primary risk is unexpected volume deletion for users who expect the current behavior, i.e., they do not expect the volume to be deleted from the underlying storage infrastructure when, for a bound PV-PVC pair, the PV is deleted followed by the PVC. This can be mitigated by initially introducing the fix behind a feature gate, followed by explicitly deprecating the current behavior.
 
 ## Design Details
 
@@ -261,24 +312,71 @@ The finalizer `kubernetes.io/pv-controller` will not be added on statically prov
 
 ### Test Plan
 
-#### E2E tests
+
+[x] I/we understand the owners of the involved components may require updates to
+existing tests to make this code solid enough prior to committing the changes necessary
+to implement this enhancement.
+
+##### Prerequisite testing updates
+
+##### Unit tests
+
+- `pkg/controller/volume/persistentvolume`: [`2025-02-05` - `80.4%`](https://testgrid.k8s.io/sig-testing-canaries#ci-kubernetes-coverage-unit&include-filter-by-regex=pkg/controller/volume/persistentvolume)
+
+##### Integration tests
+
+No integration tests are required. This feature is better tested with e2e tests.
+
+##### e2e tests
+
 An e2e test to exercise deletion of PV prior to deletion of PVC for a `Bound` PV-PVC pair.
 
-- `test/e2e/storage/csimock/csi_honor_pv_reclaim_policy.go`: [k8s-triage](https://storage.googleapis.com/k8s-triage/index.html?sig=scheduling,storage&test=honor%20pv%20reclaim%20policy)
+- [test/e2e/storage/csimock/csi_honor_pv_reclaim_policy.go](https://github.com/kubernetes/kubernetes/blob/2334b8469e1983c525c0c6382125710093a25883/test/e2e/storage/csimock/csi_honor_pv_reclaim_policy.go): [k8s-triage](https://storage.googleapis.com/k8s-triage/index.html?sig=scheduling,storage&test=honor%20pv%20reclaim%20policy), [k8s-test-grid](https://testgrid.k8s.io/sig-storage-kubernetes#kind-storage-alpha-beta-features&include-filter-by-regex=CSI%20Mock%20honor%20pv%20reclaim%20policy)
 
 ### Graduation Criteria
 
 #### Alpha
 
-* Feedback
-* Fix of the issue behind a feature flag.
-* Unit tests and e2e tests outlined in design proposal implemented.
-* Tests should be done with both CSI Migration disabled and enabled.
-#### Alpha -> Beta
-* Allow the Alpha fix to soak for one release.
+- Feedback
+- Fix of the issue behind a feature flag.
+- Unit tests and e2e tests outlined in design proposal implemented.
+- Tests should be done with both CSI Migration disabled and enabled.
+
+#### Beta
+
+- Allow the Alpha fix to soak for one release.
+- Gather feedback from developers and surveys
+- Fully implemented in both kubernetes and CSI driver repositories
+- Additional tests are in Testgrid and linked in KEP
+
+#### GA
+
+- Deployed in production and have gone through at least one K8s upgrade.
+- Allowing time for feedback
 
-#### Beta -> GA
-* Deployed in production and have gone through at least one K8s upgrade.
+**Note:** Generally we also wait at least two releases between beta and
+GA/stable, because there's no opportunity for user feedback, or even bug reports,
+in back-to-back releases.
+
+**For non-optional features moving to GA, the graduation criteria must include
+[conformance tests].**
+
+[conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md
 
 ### Upgrade / Downgrade Strategy
 
@@ -292,6 +390,7 @@ if there are PVs that have a valid associated PVC and deletion timestamp set, th
 In this case, there may be PVs with the deletion finalizer that the older Kubernetes does not remove. Such PVs will be in the API server forever unless if the user manually removes them.
 ### Version Skew Strategy
+
 The fix is part of `kube-controller-manager` and `external-provisioner`.
 
 1. `kube-controller-manager` is upgraded, but `external-provisioner` is not:
@@ -316,84 +415,154 @@ In addition, PVs migrated to CSI will have the finalizer. When the CSI migration
 
 - [x] Feature gate (also fill in values in `kep.yaml`)
   - Feature gate name: `HonorPVReclaimPolicy`
-  - Components depending on the feature gate: kube-controller-manager, external-provisioner
+  - Components depending on the feature gate:
+    - kube-controller-manager
+    - external-provisioner
 
 ###### Does enabling the feature change any default behavior?
 
-Enabling the feature will delete the volume from underlying storage when the PV is deleted followed deleting the PVC for
-a bound PV-PVC pair where the PV reclaim policy is delete.
+Enabling the feature will delete the volume from the underlying storage when, for a bound PV-PVC pair whose PV reclaim policy is `Delete`, the PV is deleted followed by the PVC.
 
 ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
 
-Yes. Disabling the feature flag will continue previous behavior. However, after the rollback if there are existing PVs
-that have the finalizer `external-provisioner.volume.kubernetes.io/finalizer` or `kubernetes.io/pv-controller` cannot
-be deleted from the API server(due to the finalizers), these PVs must be explicitly patched to remove the finalizers.
+
+Yes. Disabling the feature flag restores the previous behavior. However, after the rollback, existing PVs that still carry the finalizer `external-provisioner.volume.kubernetes.io/finalizer` or `kubernetes.io/pv-controller` cannot be deleted from the API server (due to the finalizers); these PVs must be explicitly patched to remove the finalizers.
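+
+A hedged sketch of that manual cleanup (the PV name is a placeholder, and the index of the stale finalizer in `metadata.finalizers` will differ per PV):
+
+```
+# Inspect which finalizers are still present on the PV.
+kubectl get pv <pv-name> -o jsonpath='{.metadata.finalizers}'
+
+# Remove the stale entry by its index, leaving any other finalizers intact.
+kubectl patch pv <pv-name> --type=json \
+  -p='[{"op": "remove", "path": "/metadata/finalizers/0"}]'
+```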
 
 ###### What happens if we reenable the feature if it was previously rolled back?
+
 Will pick the new behavior as described before.
 
 ###### Are there any tests for feature enablement/disablement?
 
 There will be unit tests for the feature `HonorPVReclaimPolicy` enablement/disablement.
 
-### Monitoring Requirements
+### Rollout, Upgrade and Rollback Planning
 
-_This section must be completed when targeting beta graduation to a release._
+###### How can a rollout or rollback fail? Can it impact already running workloads?
 
-* **How can an operator determine if the feature is in use by workloads?**
+No.
+
+###### What specific metrics should inform a rollback?
+
+Users can compare these existing metrics with the feature gate enabled and disabled to see whether the downgrade actually helped:
+
+- `persistentvolumeclaim_provision_failed_total`
+- `persistentvolume_delete_failed_total`
+
+###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
+
+This feature has been beta since 1.31. No dedicated upgrade or rollback testing was needed.
+
+###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
+
+No.
+
+### Monitoring Requirements
+
+###### How can an operator determine if the feature is in use by workloads?
 
 The presence of finalizer described above in Persistent Volume will indicate that the feature is enabled.
 The finalizer `external-provisioner.volume.kubernetes.io/finalizer` will be present on the CSI Driver volumes.
 The finalizer `kubernetes.io/pv-controller` will be present on In-Tree Plugin volumes.
 
-* **What are the SLIs (Service Level Indicators) an operator can use to determine
-  the health of the service?**
-  - [X] Metrics
-    - Metric name: For CSI Driver volumes, `persistentvolume_delete_duration_seconds` metric is used to track the time
-      taken for PV deletion. `volume_operation_total_seconds` metrics tracks the end-to-end latency for delete
-      operation, it is applicable to both CSI driver volumes and in-tree plugin volumes.
-    - [Optional] Aggregation method:
-    - [ ] Other (treat as last resort)
-      - Details:
+###### How can someone using this feature know that it is working for their instance?
+
+- [ ] Events
+  - Event Reason:
+- [ ] API .status
+  - Condition name:
+  - Other field:
+- [x] Other (treat as last resort)
+  - Details: the `metadata.finalizers` field of a PV object will have the finalizer `external-provisioner.volume.kubernetes.io/finalizer` for CSI Driver volumes and `kubernetes.io/pv-controller` for In-Tree Plugin volumes.
+
+###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
 
-* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
-  The current SLOs for `persistentvolume_delete_duration_seconds` and `volume_operation_total_seconds` would be
-  increased by the amount of time taken to remove the newly introduced finalizer. This should be an insignificant
-  increase.
+The current SLOs for `persistentvolume_delete_duration_seconds` and `volume_operation_total_seconds` would be increased by the amount of time taken to remove the newly introduced finalizer. This should be an insignificant increase.
 
-* **Are there any missing metrics that would be useful to have to improve observability
-  of this feature?**
+###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
 
-  No
+
+- [x] Metrics
+  - Metric name: For CSI Driver volumes, the `persistentvolume_delete_duration_seconds` metric is used to track the time taken for PV deletion. The `volume_operation_total_seconds` metric tracks the end-to-end latency of the delete operation; it is applicable to both CSI driver volumes and in-tree plugin volumes.
+  - [Optional] Aggregation method:
+  - Components exposing the metric:
+- [ ] Other (treat as last resort)
+  - Details:
+
+###### Are there any missing metrics that would be useful to have to improve observability of this feature?
+
+No.
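+
+As a complement to the finalizer-based check under "How can an operator determine if the feature is in use by workloads?" above, a hedged sketch of a cluster-wide audit (the output formatting is only illustrative):
+
+```
+# List every PV with its reclaim policy and finalizers; PVs covered by this feature carry
+# external-provisioner.volume.kubernetes.io/finalizer (CSI) or kubernetes.io/pv-controller (in-tree).
+kubectl get pv -o custom-columns='NAME:.metadata.name,RECLAIM:.spec.persistentVolumeReclaimPolicy,FINALIZERS:.metadata.finalizers'
+```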
+
+### Dependencies
+
+###### Does this feature depend on any specific services running in the cluster?
+
+Yes, a CSI driver supporting this enhancement is required when a persistent volume is provisioned with the CSI source or can be migrated to CSI.
 
 ### Scalability
 
 ###### Will enabling / using this feature result in any new API calls?
+
 The new finalizer will be added to all existing PVs when the feature is enabled. This will be a one-time sync.
 
 ###### Will enabling / using this feature result in introducing new API types?
+
 No.
 
 ###### Will enabling / using this feature result in any new calls to the cloud provider?
 
-Yes, previously the delete volume call would not reach the provider under the described circumstances, with this change
-the volume delete will be invoked to remove the volume from the underlying storage.
+
+Yes, previously the delete volume call would not reach the provider under the described circumstances; with this change, the volume delete will be invoked to remove the volume from the underlying storage.
 
 ###### Will enabling / using this feature result in increasing size or count of the existing API objects?
+
 When the feature is enabled, we will be adding a new finalizer to existing PVs so this would result in an increase of the PV size.
 
 ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
+
 We have metric for delete volume operations. So potentially the time to delete a PV could be longer now since we want to make sure volume on the storage system is deleted first before deleting the PV.
 
 ###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
+
 No.
 
+###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
+
+No.
+
+### Troubleshooting
+
+###### How does this feature react if the API server and/or etcd is unavailable?
+
+If the API server is unavailable, an API update on the PV object will fail due to a timeout. Failed operations will be added back to a rate-limited queue for retries. A `DeleteVolume` call to the CSI driver is idempotent, so when the API server is back and the same request is sent to the CSI driver again, the driver should return success even if the volume was already deleted from the storage system by the previous request.
+
+###### What are other known failure modes?
+
+- Volume may not be deleted from the underlying storage if the PV is deleted rapidly after the reclaim policy changes to `Delete`.
+  - Detection: The only way to detect this is by comparing the PVs in the API server against the volumes in the storage system (see the sketch after this list).
+  - Mitigations: If the reclaim policy is `Delete`, please make sure the finalizer is added to the PV before deleting the PV.
+  - Diagnostics: external-provisioner should log `error syncing volume: xxx`.
+  - Testing: It's hard to test this failure mode because there is a race condition between the PV deletion and the finalizer addition.
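+
+A hedged sketch of that comparison for CSI volumes (the custom-columns expression assumes a `spec.csi` source; the storage-side listing depends on the backend and is not shown):
+
+```
+# List the volume handles Kubernetes still tracks, then compare them against the volume IDs
+# reported by the storage backend; volumes present only on the backend may have been leaked.
+kubectl get pv -o custom-columns='NAME:.metadata.name,HANDLE:.spec.csi.volumeHandle,RECLAIM:.spec.persistentVolumeReclaimPolicy'
+```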
+
+###### What steps should be taken if SLOs are not being met to determine the problem?
+
+N/A.
+
 ## Implementation History
 
-1.26: Alpha
-1.31: Add e2e tests for the feature and promote to Beta.
+- 1.26: Alpha
+- 1.31: Add e2e tests for the feature and promote to Beta.
+- 1.33: Promote to GA.
 
 ## Drawbacks
-None. The current behavior could be considered as a drawback, the KEP presents the fix to the drawback. The current
-behavior was reported in [Issue-546](https://github.com/kubernetes-csi/external-provisioner/issues/546) and [Issue-195](https://github.com/kubernetes-csi/external-provisioner/issues/195)
+
+None. The current behavior could be considered a drawback; this KEP presents the fix for it. The current behavior was reported in [Issue-546](https://github.com/kubernetes-csi/external-provisioner/issues/546) and [Issue-195](https://github.com/kubernetes-csi/external-provisioner/issues/195).
 
 ## Alternatives
-Other approaches to fix have not been considered and will be considered if presented during the KEP review.
\ No newline at end of file
+
+No other approaches to the fix have been considered; alternatives will be considered if presented during the KEP review.
+
+## Infrastructure Needed (Optional)
+
+Initially, all development will happen inside the main Kubernetes and CSI driver repositories. The mock driver can be developed inside test/e2e/storage/csimock. The generic part of that driver, i.e. the code that other drivers can reuse, is developed in the kubernetes-csi organization.
\ No newline at end of file
diff --git a/keps/sig-storage/2644-honor-pv-reclaim-policy/kep.yaml b/keps/sig-storage/2644-honor-pv-reclaim-policy/kep.yaml
index 3cf4b938e9a..2c61edaabe8 100644
--- a/keps/sig-storage/2644-honor-pv-reclaim-policy/kep.yaml
+++ b/keps/sig-storage/2644-honor-pv-reclaim-policy/kep.yaml
@@ -16,13 +16,14 @@ approvers:
 see-also:
 replaces:
 
-stage: beta
+stage: stable
 
-latest-milestone: "v1.31"
+latest-milestone: "v1.33"
 
 milestone:
   alpha: "v1.23"
   beta: "v1.31"
+  stable: "v1.33"
 
 feature-gates:
   - name: HonorPVReclaimPolicy