Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
KEP-1847: Auto remove PVCs created by StatefulSet #1915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-1847: Auto remove PVCs created by StatefulSet #1915
Changes from all commits
13f1c05
4824729
9a42a94
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a real world example of such an application ?if PVC should be deleted when pod is deleted, how is this different than a pod using an emptyDir ? I would vote for some more concrete use cases here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The volume is retained if the pod is rescheduled, eg if the node goes down or is upgraded.
+1 to adding the explicitly so it's clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or statefulset rolling update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update the pod rescheduling note shortly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps add information on applicability of the feature across configurations. Does the feature work with Local PVCs for instance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kk-src Maybe something like the following?
This feature applies to PVs which are dynamically provisioned from the volumeClaimTemplate of a StatefulSet. Any PVC and PV provisioned from this mechanism will function with this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @kow3ns
@mattcary - Thank you, added to the text. Will come out in the next commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it have to be dynamically provisioned from the StatefulSet controller? Since this behavior is opt-in, the user should understand what they're getting into?
For example, today, you can manually create a PV object, and set Delete reclaim policy. The provisioner will delete the volume even though it didn't provision it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But what if pods have multiple PVCs? Should all of them be deleted? I don't think that would make sense, there might be a shared volume.
Theres a 1:1 PVC:PV relationship so that doesn't come up in that case.
I think it's much simpler to define reasonable behavior if we scope to the volumeClaimTemplates only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fair point. Scoping the behavior seems reasonable to me, I would just clarify that this is about PVC objects being created by the StatefulSet controller, and not about PV objects being dynamically provisioned. You can have StatefulSet create PVCs that can be bound to pre-created PVs and those should still be in scope for the Statefulset reclaim policy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 In the design section we are talking about the PVCs created from the VolumeClaimTemplate in the StatefulSet. That's exactly the set we're interested in.
eg, if a user created a PVC that matched the naming convention of the volumeClaimTemplate, a pod created for the statefulset would attach to it, so it seems unsurprising that the new reclaim policy would cause that PVC to be deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I think I misunderstood then. I thought you wanted PVCs manually created by the user with the same naming convention to NOT be part of the reclaim policy. But it sounds like we do want it part of the reclaim policy.
And PVCs referenced outside of volumeClaimTemplates, ie in
volumes
are out of scope.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sorry, there was a discussion that only partially made it into the comments. I think that the version down in "design details" is correct: if the statefulset has a volumeClaimTemplate, the static naming scheme defines a PVC for each pod which is called here the associated PVC of a pod. These associated PVCs are the ones that are deleted or not according to the policy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A discussion of durability risks with the changed behavior (even though it is opt-in) should be added here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kk-src Maybe something like the following?
If the PVCReclaimPolicy is changed from its default of "Retain", then PVs will be deleted on SS scaledown or deletion. A user will lose data on those PVs if proactive effort is not taken to replicate that data. However, PVs associated with the StatefulSet will be more durable than ephemeral volumes would be, as they are only deleted on scaledown or StatefulSet deletion, and not on other pod lifecycle events like being rescheduled to a new node, even with the new retain policies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should specify that ownerRefs on PVCs are only set for those that are created for volumes provisioned as part of the the statefulset, ie in VolumeClaimTemplates with a StorageClass specified, and no dataSource, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you tell? Someone could manually create a dynamically provisioned PVC with the same name that the StatefulSet controller expects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. Does that mean we should add an annotation when a PVC is created by the SS controller for a pod?
We don't want to add an owner reference I think as that would make it harder to not change behavior for a retain policy.
It seems to me that an annotation would work well here but maybe there is an idiomatic alternative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have added text indicating static naming convention used by the statefulset will be reused here to identity the PVCs in question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's very elegant, nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would happen if I delete a
RemoveOnStatefulSetDeletion
StatefulSet with orphan policy? Will the PVCs be gone while the Pods stay?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the orphan policy do you mean kubectl delete --cascade=false?
In that case I think nothing should be deleted other than the statefulset resource, the semantics for cascade seem clear. That would mean removing any ownership IIUC.
At any rate, even if the PVCs were deleted, the protection controller would prevent them from being finalized until the pods were deleted. But it might be unexpected to have the PVCs disappear when the pods are eventually deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This, and remaining sections should be filled out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe say
On a downgrade, the PersistentVolumeClaimReclaimPolicy field will be removed from any StatefulSets. If a scaledown or delete is in process when the downgrade happens, any existing OwnerRefs on PVCs will not be removed so that in most cases when the scaledown completes, unused PVCs will be deleted. However, there may be edge cases causing only some of the unused PVCs to be deleted. As unused PVCs remaining after a scaledown is the expected behavior of the downgraded clusters no further effort will be made to remove them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will be sufficient to say yes here, and refer to the downgrade section above on what happens on rollback.
I think that disabling the feature would also mean removing the new policy field?
I'm not sure what you mean by annotating the PVCs here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have no effect, as the policy fields were removed during the rollback, so when reenabled all StatefulSets in the cluster will be using the retain policy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way feature disabling/rollback works is that the field remains stored in etcd, however controllers will not process the field (because it's protected by a feature gate). Then if you re-enable the feature again, then you can start processing the field again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
I think that means that the RemoveOnDeletion needs to be reconciled, that is make sure associated PVCs have ownerrefs to the statefulset.
I don't see this as too much of a problem if things aren't exactly consistent. For example, one might set RemoveOnScaledown on a 10 replica statefulset, disable the feature, scale down to 6 pods, then re-enable the feature; PVCs 7-10 are going to not have been deleted and we can't be expected to delete them. So if the disabling or re-enabling happens during a scaledown and some PVCs are missed or deleted, the behavior is almost the same as before when we were okay with the orphaned PVCs 7-10.
The main thing I think is to make sure "future" scaledown/deletions work correctly, and as long as we get the statefulset ownerref for RemoveOnDeletion that will be covered.