-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add disabled PV re-provisioning by StorageClasses option on restore #8287
base: main
Are you sure you want to change the base?
Conversation
Awesome! We are really interested in this annotation in order to recover all our existing volumes. |
Does this mean a coincident and unsteady situation -- the CSI driver just doesn't have chance to reclaim the storage volumes of the PVs because of the disaster? |
I think the goal is here to support cases where the entire kubernetes cluster is lost (e.g. burnt bare metal servers) and you still want to recover Volumes that may still exist on the storage backend, regardless Reclaim Policies of backuped volumes objects. Currently Velero doesn't allow "recovering" / "reattaching" such volumes and either ignore them, or force empty volume recreation (expecting it to be recovered from a snapshot) |
Stumbled upon this PR and it seems to be solving a use case for Disaster Recovery that I have. All my PVs are created with I am able to do this with some manual intervention pre-backup:
I have tried automatic the manual step of modiyfing the PVs pre-backup but this appears to be not supported but from what I can see this PR is in line with what I am after. Looking forward to it. |
A rebase on this PR would be great :) |
Signed-off-by: Clément Rostagni <[email protected]>
Signed-off-by: Clément Rostagni <[email protected]>
That's exactly the idea! I'm glad to see that this feature is wanted by others. Any update from the maintaining team ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
disabledPVReprovisioningStorageClasses: | ||
description: |- | ||
DisabledPVReprovisioningStorageClasses is a slice of StorageClasses names. | ||
PV without snaptshot and having one of these StorageClass will not be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PV without snaptshot and having one of these StorageClass will not be | |
PV without snapshot and having one of these StorageClass will not be |
@@ -129,6 +129,13 @@ type RestoreSpec struct { | |||
// +optional | |||
// +nullable | |||
UploaderConfig *UploaderConfigForRestore `json:"uploaderConfig,omitempty"` | |||
|
|||
// DisabledPVReprovisioningStorageClasses is a slice of StorageClasses names. | |||
// PV without snaptshot and having one of these StorageClass will not be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// PV without snaptshot and having one of these StorageClass will not be | |
// PV without snapshot and having one of these StorageClass will not be |
obj *unstructured.Unstructured, | ||
logger logrus.FieldLogger, | ||
) (*unstructured.Unstructured, error) { | ||
logger.Infof("Restoring persistent volume as-is because it doesn't have a snapshot and it's storage class has re-provisionning disabled.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger.Infof("Restoring persistent volume as-is because it doesn't have a snapshot and it's storage class has re-provisionning disabled.") | |
logger.Infof("Restoring persistent volume as-is because it doesn't have a snapshot and restore has storage class re-provisionning disabled.") |
The thing is the volumes may or may not exist in the storage, or it is not a steady situation. Operations may succeed or may not. Could you clarify why this relink approach is must-have and what the problem is if you use the current create new volume-restore approach? |
Time to recovery is the key, at least for me. Last time I had to do a full cluster restore which involved a Restic data restore of a couple of TB's from Azure Blob onto persistent volumes based on Azure Files NFS, the restoration took several hours due to throttling, bandwidth and I/O slowless. If I'd had the ability to re-link restored PV's to the existing (unharmed) Azure Files NFS volumes, the restore would probably have taken seconds. |
Thank you for contributing to Velero!
Please add a summary of your change
This PR addresses a specific disaster recovery use case:
When restoring a cluster after a disaster, PV may or may not have associated snapshots.
However, the underlying volumes from the CSI driver remain intact. I want to relink PVs with existing volumes instead of creating new ones. This PR propose a way to retrieve existing volumes without recreating them.
In my case, the PV has a Reclaim Policy Delete enforced by the StorageClass.
Velero currently does not restore PVs with a Delete policy and no snapshots, which makes sense for regular backup/restore scenarios but can be limiting in disaster recovery situations.
Proposed solution:
This PR introduces a new feature that allows Velero to restore PVs as-is when they meet the following conditions:
disabledPVReprovisioningStorageClasses
.This ensures that Velero can relink the PV to its existing volume and rebind the associated PVC, similar to how it restores PVs with a Reclaim Policy Retain.
Implementation Details:
This solution uses StorageClass names rather than PV names, as some CSI drivers generate random PV names.
Also, using StorageClass makes sense in this context because PVs sharing the same StorageClass generally have similar configurations.
This solution does not bypass snapshots. If snapshots are available, restoring from them is preferred as it is generally more reliable. However, this is open for discussion.
Does your change fix a particular issue?
There is no direct issue but I found two issues that are close to what I've added.
Please indicate you've done the following:
make new-changelog
) or comment/kind changelog-not-required
on this PR.site/content/docs/main
.