-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(controller): prevent zfs volume cr deletion if snapshot exists #613
base: develop
Are you sure you want to change the base?
Conversation
a961d15
to
78048a5
Compare
@tiagolobocastro PR #607 is closed as there was some issue from my side while rebasing. I have created this PR with the same code base along with checking that ZV is present in the system as per your comment here. Splitting the test to run on different cluster can be taken sometime later. |
8e135cc
to
39b0fc7
Compare
39b0fc7
to
aa02172
Compare
5eb7e8f
to
54c716b
Compare
// Delete the corresponding ZV CR only if this is the last snapshot | ||
// for the volume and the corresponding pvc is deleted | ||
if len(snapList.Items) == 1 && !isCorrespondingPvcPresent { | ||
err = zfs.DeleteVolume(volumeID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm I think this might inadvertently delete a volume with the retain binding mode where a user is allowed to delete the PVC and PV but the store asset should remain.
I think on the (cs *controller) DeleteVolume
we need to mark it there as deleted, which would allow us here to proceed with the deletion? Or something along those lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The flow doesnt come to (cs *controller) DeleteVolume
in case the storage class has reclaimPolicy: Retain
. So we cant do any thing there .
On the other hand i check for pv for persistentVolumeReclaimPolicy
. If its value is Retain
then i don't delete the zv .
added the function isVolumeEligibleForDeletion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The flow doesnt come to (cs *controller) DeleteVolume in case the storage class has reclaimPolicy: Retain.
Yes exactly, that's the idea.
When the reclaimPolicy
is Retain
we don't receive the DeleteVolume
and therefore wouldn't mark the ZV
for deletion. So when the last snapshot is deleted, we don't delete the the ZV
.
Otherwise, if we do receive the DeleteVolume
then when the last snapshot is deleted, we do delete the ZV
.
Would this workflow make sense @Abhinandan-Purkait ? What's the best way to mark a deletion on the ZV
without actually adding the deletionTimestamp
? (that would trigger a storm of reconciles)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this workflow sounds good to me. We can use annotations to mark things. I will try to summarize the workflow:
- If We received a DeleteVolume call, and volume has snapshots, add
marked for deletetion
annotation on ZV and succeed the DeleteVolume call with no op. - If We received a DeleteVolume call, and volume doesn't have any snapshots, succeed the DeleteVolume call with and also remove the assets.
- If we never received a DeleteVolume call, the annotation will never be present so, we just deal with deletion of snapshots in DeleteSnapshot and ignore ZV.
It still might be a good idea to fetch the PV if present in isVolumeEligibleForDeletion
and double check the reclaim policy just in case :-;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, one thing missing: when we receive a CreateVolume
call, and the ZV
exists, we have to remove this annonation. This completes the circle.
We should have a test for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kinda lost the thread here, but if we've verified the PV creation does re-trigger another CreateVolume call then yes we should remove the openebs.io/mark-for-deletion: "true"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for my wrong statement earlier. The CreateVolume
is not called if we manually create the pv from zv.
Earlier the log which i had pasted was earlier one, i.e. the first time when the pvc was created.
So finally we dont get any CreateVolume
call when we create the PV manually from ZV . This means that the ZV will be having the annotation openebs.io/mark-for-deletion: "true"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm then what happens if we recreate the pv and the zv has an annotation and then we delete the last snapshot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm I think it's fine actually.
In case of retain there's no DeleteVolume
to begin with, so the ZV
does not have the annotation, so it won't get deleted when the last snapshot is deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, its zv is left as is
b6cbb01
to
fbe3c80
Compare
ff1251d
to
3d5c36a
Compare
3d5c36a
to
47da3ff
Compare
7e8f590
to
2005206
Compare
Looks good to me mostly. Thanks for taking this up. Will approve once the test passes and the other comments are addressed. |
pkg/driver/controller.go
Outdated
if snap != nil && snap.DeletionTimestamp != nil { | ||
return csipayload.NewDeleteSnapshotResponseBuilder().Build(), nil | ||
} | ||
|
||
if err != nil { | ||
if k8serror.IsNotFound(err) { | ||
return csipayload.NewDeleteSnapshotResponseBuilder().Build(), nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't there a race condition here, if the pod crashes after issuing DeleteSnapshot
. Then when it restarts, if it finds the timestamp is set or snapshot is already deleted, it doesn't delete the volume?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed the condition
By("Deleting storage class", deleteStorageClass) | ||
} | ||
|
||
func blockVolCreationWithReclaimRetainTest() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is just for checking the retain, then we don't need any applications deployed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added the code to create a pv and pv and check if the pvc is bound . Have not deployed the app
) | ||
} | ||
|
||
// Delete the corresponding ZV CR only if there are no snapshots present for the volume | ||
if len(snapList.Items) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rely on the list of CRs btw, isn't this behind a cache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's not so bad if we get it wrong and mark-for-deletion is set. Might just mean we need a garbage collector in case we get the wrong list count?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When snapshot is there but snapList size return 0 because of cache; and set the marker for deletion.
There is no issue as zv is present with the marker set. Upon deletion of snapshot we will get snapList size as 1 and the marker set ,so the volume will be deleted .
db2ac77
to
46f95ef
Compare
Signed-off-by: sinhaashish <[email protected]>
46f95ef
to
d0ae251
Compare
Pull Request template
Why is this PR required? What issue does it fix?:
Upon pvc deletion the zfs snapshot also gets deleted. this can be verified using ssh into the node and running the command zfs list -t all
What this PR does?:
Does this PR require any upgrade changes?:
If the changes in this PR are manually verified, list down the scenarios covered::
Any additional information for your reviewer? :
Mention if this PR is part of any design or a continuation of previous PRs
Checklist:
<type>(<scope>): <subject>