Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(controller): prevent zfs volume cr deletion if snapshot exists #613

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

sinhaashish
Copy link
Member

Pull Request template

Why is this PR required? What issue does it fix?:
Upon pvc deletion the zfs snapshot also gets deleted. this can be verified using ssh into the node and running the command zfs list -t all

What this PR does?:

Does this PR require any upgrade changes?:

If the changes in this PR are manually verified, list down the scenarios covered::

Any additional information for your reviewer? :
Mention if this PR is part of any design or a continuation of previous PRs

Checklist:

  • Fixes #
  • PR Title follows the convention of <type>(<scope>): <subject>
  • Has the change log section been updated?
  • Commit has unit tests
  • Commit has integration tests
  • (Optional) Are upgrade changes included in this PR? If not, mention the issue/PR to track:
  • (Optional) If documentation changes are required, which issue on https://github.com/openebs/website is used to track them:

@sinhaashish sinhaashish force-pushed the stop-deletion branch 2 times, most recently from a961d15 to 78048a5 Compare January 21, 2025 08:51
@sinhaashish
Copy link
Member Author

@tiagolobocastro PR #607 is closed as there was some issue from my side while rebasing. I have created this PR with the same code base along with checking that ZV is present in the system as per your comment here.

Splitting the test to run on different cluster can be taken sometime later.

@sinhaashish sinhaashish requested review from tiagolobocastro and Abhinandan-Purkait and removed request for tiagolobocastro January 21, 2025 10:04
ci/ci-test.sh Outdated Show resolved Hide resolved
tests/zv/build.go Outdated Show resolved Hide resolved
tests/zv/build.go Outdated Show resolved Hide resolved
tests/provision_test.go Show resolved Hide resolved
tests/suite_test.go Outdated Show resolved Hide resolved
@sinhaashish sinhaashish requested a review from niladrih January 21, 2025 10:35
@sinhaashish sinhaashish force-pushed the stop-deletion branch 5 times, most recently from 8e135cc to 39b0fc7 Compare January 21, 2025 16:44
tests/provision_test.go Outdated Show resolved Hide resolved
tests/provision_test.go Show resolved Hide resolved
@sinhaashish sinhaashish force-pushed the stop-deletion branch 13 times, most recently from 5eb7e8f to 54c716b Compare January 24, 2025 06:24
pkg/driver/controller.go Outdated Show resolved Hide resolved
// Delete the corresponding ZV CR only if this is the last snapshot
// for the volume and the corresponding pvc is deleted
if len(snapList.Items) == 1 && !isCorrespondingPvcPresent {
err = zfs.DeleteVolume(volumeID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I think this might inadvertently delete a volume with the retain binding mode where a user is allowed to delete the PVC and PV but the store asset should remain.
I think on the (cs *controller) DeleteVolume we need to mark it there as deleted, which would allow us here to proceed with the deletion? Or something along those lines?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flow doesnt come to (cs *controller) DeleteVolume in case the storage class has reclaimPolicy: Retain. So we cant do any thing there .

On the other hand i check for pv for persistentVolumeReclaimPolicy . If its value is Retain then i don't delete the zv .
added the function isVolumeEligibleForDeletion

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flow doesnt come to (cs *controller) DeleteVolume in case the storage class has reclaimPolicy: Retain.

Yes exactly, that's the idea.
When the reclaimPolicy is Retain we don't receive the DeleteVolume and therefore wouldn't mark the ZV for deletion. So when the last snapshot is deleted, we don't delete the the ZV.

Otherwise, if we do receive the DeleteVolume then when the last snapshot is deleted, we do delete the ZV.

Would this workflow make sense @Abhinandan-Purkait ? What's the best way to mark a deletion on the ZV without actually adding the deletionTimestamp? (that would trigger a storm of reconciles)

Copy link
Member

@Abhinandan-Purkait Abhinandan-Purkait Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this workflow sounds good to me. We can use annotations to mark things. I will try to summarize the workflow:

  1. If We received a DeleteVolume call, and volume has snapshots, add marked for deletetion annotation on ZV and succeed the DeleteVolume call with no op.
  2. If We received a DeleteVolume call, and volume doesn't have any snapshots, succeed the DeleteVolume call with and also remove the assets.
  3. If we never received a DeleteVolume call, the annotation will never be present so, we just deal with deletion of snapshots in DeleteSnapshot and ignore ZV.

It still might be a good idea to fetch the PV if present in isVolumeEligibleForDeletion and double check the reclaim policy just in case :-;

Copy link
Contributor

@tiagolobocastro tiagolobocastro Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, one thing missing: when we receive a CreateVolume call, and the ZV exists, we have to remove this annonation. This completes the circle.
We should have a test for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda lost the thread here, but if we've verified the PV creation does re-trigger another CreateVolume call then yes we should remove the openebs.io/mark-for-deletion: "true"

Copy link
Member Author

@sinhaashish sinhaashish Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my wrong statement earlier. The CreateVolume is not called if we manually create the pv from zv.
Earlier the log which i had pasted was earlier one, i.e. the first time when the pvc was created.

So finally we dont get any CreateVolume call when we create the PV manually from ZV . This means that the ZV will be having the annotation openebs.io/mark-for-deletion: "true"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm then what happens if we recreate the pv and the zv has an annotation and then we delete the last snapshot?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I think it's fine actually.
In case of retain there's no DeleteVolume to begin with, so the ZV does not have the annotation, so it won't get deleted when the last snapshot is deleted?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, its zv is left as is

@sinhaashish sinhaashish force-pushed the stop-deletion branch 3 times, most recently from b6cbb01 to fbe3c80 Compare January 28, 2025 08:23
pkg/driver/controller.go Outdated Show resolved Hide resolved
pkg/driver/controller.go Outdated Show resolved Hide resolved
@sinhaashish sinhaashish force-pushed the stop-deletion branch 14 times, most recently from ff1251d to 3d5c36a Compare January 30, 2025 05:02
pkg/driver/controller.go Outdated Show resolved Hide resolved
pkg/driver/controller.go Outdated Show resolved Hide resolved
pkg/driver/controller.go Outdated Show resolved Hide resolved
pkg/driver/controller.go Outdated Show resolved Hide resolved
pkg/driver/controller.go Outdated Show resolved Hide resolved
pkg/zfs/volume.go Show resolved Hide resolved
pkg/zfs/volume.go Outdated Show resolved Hide resolved
pkg/zfs/volume.go Outdated Show resolved Hide resolved
pkg/zfs/volume.go Outdated Show resolved Hide resolved
pkg/zfs/volume.go Outdated Show resolved Hide resolved
@sinhaashish sinhaashish force-pushed the stop-deletion branch 2 times, most recently from 7e8f590 to 2005206 Compare January 30, 2025 11:24
@Abhinandan-Purkait
Copy link
Member

Looks good to me mostly. Thanks for taking this up. Will approve once the test passes and the other comments are addressed.

Comment on lines 850 to 854
if snap != nil && snap.DeletionTimestamp != nil {
return csipayload.NewDeleteSnapshotResponseBuilder().Build(), nil
}

if err != nil {
if k8serror.IsNotFound(err) {
return csipayload.NewDeleteSnapshotResponseBuilder().Build(), nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there a race condition here, if the pod crashes after issuing DeleteSnapshot. Then when it restarts, if it finds the timestamp is set or snapshot is already deleted, it doesn't delete the volume?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the condition

By("Deleting storage class", deleteStorageClass)
}

func blockVolCreationWithReclaimRetainTest() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is just for checking the retain, then we don't need any applications deployed?

Copy link
Member Author

@sinhaashish sinhaashish Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added the code to create a pv and pv and check if the pvc is bound . Have not deployed the app

)
}

// Delete the corresponding ZV CR only if there are no snapshots present for the volume
if len(snapList.Items) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rely on the list of CRs btw, isn't this behind a cache?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's not so bad if we get it wrong and mark-for-deletion is set. Might just mean we need a garbage collector in case we get the wrong list count?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When snapshot is there but snapList size return 0 because of cache; and set the marker for deletion.
There is no issue as zv is present with the marker set. Upon deletion of snapshot we will get snapList size as 1 and the marker set ,so the volume will be deleted .

@sinhaashish sinhaashish force-pushed the stop-deletion branch 2 times, most recently from db2ac77 to 46f95ef Compare January 31, 2025 12:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants