ETCD-610: automated backups no config #1646
base: master
Conversation
@Elbehery: This pull request references ETCD-610 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
In the event of the node becoming inaccessible or unscheduled, the recurring backups would not be scheduled. The periodic backup config would have to be recreated or updated with a different PVC that allows for a new PV to be provisioned on a node that is healthy.

If we were to use `localVolume`, we need to find a solution for balancing the backups across the master nodes; some ideas could be:
How would we restore from a localVolume? Is the data stored somewhere in /var/?
So the localVolume can be mounted anywhere, as specified in the Pod. In fact, you can mount a whole external drive.
IIUC, hostPath supports only mount points from the node's FS.
Imagine we have to run the restore script: how would we get the local volume mounted into a recovery machine? Would we create another static pod that would mount the localVolume?
I imagine the local storage plugin must be running along with the kubelet for that to work? @jsafrane / @gnufied maybe you guys can elaborate a bit on how that works, I don't have much time this week to read through the CSI implementation 🐎
Pardon my ignorance for not being familiar with etcd deployment, but even if there is a default SC available, why are we backing up just one replica of etcd? I thought we would want to back up all replicas, no? For example, what if, in case of a network partition or something, the replica we are backing up is behind?
So IMO it sounds like, whether we are using LocalVolume or something else, we should always be creating backups across master nodes? Is that accurate?
Also, to answer @tjungblu's question: Local Volumes are an inbuilt feature of k8s, and hence no additional plugin is necessary. Heck, a local PV can even be provisioned statically, so the local-storage-operator is not necessary either: https://docs.openshift.com/container-platform/4.15/storage/persistent_storage/persistent_storage_local/persistent-storage-local.html#local-create-cr-manual_persistent-storage-local
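For reference, a statically provisioned local PV along the lines of the linked docs is a small manifest; the sketch below uses placeholder names, a placeholder disk path, and a placeholder node name:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: etcd-backup-local-pv          # illustrative name
spec:
  capacity:
    storage: 100Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/backup-disk             # placeholder mount path of a local disk on the node
  nodeAffinity:                        # required for local PVs: pins the volume to one node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - master-0             # placeholder node name
```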
> ... and can be mounted in the new etcd pod and the backup will exist.

Check the restore script; you will need to unpack the snapshot before you can run any pod.
@gnufied, in order to make sure the backups are accessible, and that we can round-robin the backups across all master nodes which are up-to-date with the etcd-cluster leader, is it possible to:

- Create a StatefulSet across the master nodes, where the PVC template uses the `localVolume` to provision PVs (a sketch follows below).
- Since we need to take the backup according to a schedule as per EtcdBackupSpec, the issue is to manage the backup pods within the STS using the CronJob.
- I am not aware if this is possible, but at least we can generate an `Event` from the CronJob, and the STS could react to these events by taking a backup.
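A minimal sketch of that STS idea, assuming pre-created local PVs with node affinity to each master; all names and the image are illustrative, and spreading exactly one replica per master would additionally need pod anti-affinity, which an STS does not guarantee by itself:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: etcd-backup                    # illustrative name
  namespace: openshift-etcd
spec:
  serviceName: etcd-backup
  replicas: 3                          # intended as one pod per master node
  selector:
    matchLabels:
      app: etcd-backup
  template:
    metadata:
      labels:
        app: etcd-backup
    spec:
      nodeSelector:
        node-role.kubernetes.io/master: ""
      tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
      containers:
        - name: backup
          image: quay.io/example/etcd-backup:latest   # placeholder image; backup logic elided
          volumeMounts:
            - name: backup
              mountPath: /var/backup
  volumeClaimTemplates:                # one PVC (and thus one local PV) per replica
    - metadata:
        name: backup
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: local-storage   # assumes statically created local PVs
        resources:
          requests:
            storage: 100Gi
```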
> check the restore script, you will need to unpack the snapshot before you can run any pod

We can use an init-container to unpack the backup, and then the etcd pod can start?
Otherwise, we can start a fresh etcd pod and then run `etcdctl restore`?
Please correct me if I am wrong.
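A rough sketch of that init-container idea, assuming the snapshot already sits in a hostPath directory; the image, paths, and the restore flow itself are assumptions, and the real restore procedure for the static etcd pods is more involved:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: etcd-restore-sketch                 # illustrative only
spec:
  initContainers:
    - name: unpack-snapshot
      image: quay.io/example/etcd:latest     # placeholder image containing etcdctl
      command:
        - /bin/sh
        - -c
        # hypothetical: restore the snapshot into a fresh data dir before etcd starts
        - etcdctl snapshot restore /var/backup/snapshot.db --data-dir=/var/lib/etcd-restored/member
      volumeMounts:
        - name: backup
          mountPath: /var/backup
        - name: data
          mountPath: /var/lib/etcd-restored
  containers:
    - name: etcd
      image: quay.io/example/etcd:latest     # placeholder image
      command: ["etcd", "--data-dir=/var/lib/etcd-restored/member"]
      volumeMounts:
        - name: data
          mountPath: /var/lib/etcd-restored
  volumes:
    - name: backup
      hostPath:
        path: /var/backup                    # placeholder location of the snapshot
        type: Directory
    - name: data
      emptyDir: {}
```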
> Create a StatefulSet across the master nodes, where the PVC template uses the `localVolume` to provision PVs.

I believe you mean a DaemonSet across the master nodes 😉
I actually thought about this as well, but there is no PersistentVolumeClaim template on a DaemonSet.
Having a separate PVC & PV for each master node would be ideal for spreading the backups across the master nodes.
That being said, according to the latest update, we are going to use a `SideCar` container && a `hostPath` volume.
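For illustration, the sidecar-plus-hostPath direction would look roughly like this; a standalone sketch, not the actual etcd static pod, and the container names, images, and paths are assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: etcd-with-backup-sidecar             # illustrative, not the real static pod
  namespace: openshift-etcd
spec:
  containers:
    - name: etcd
      image: quay.io/example/etcd:latest     # stands in for the real etcd container
    - name: backup-sidecar                   # hypothetical sidecar taking periodic snapshots
      image: quay.io/example/etcd-backup:latest   # placeholder image
      volumeMounts:
        - name: backup-dir
          mountPath: /var/backup
  volumes:
    - name: backup-dir
      hostPath:
        path: /var/lib/etcd-auto-backup      # placeholder host directory for backups
        type: DirectoryOrCreate
```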
If we were to use `localVolume`, we need to find a solution for balancing the backups across the master nodes; some ideas could be:

- create a PV for each master node using `localVolume`, and the backup controller from CEO should take care of balancing the backups across the `available` and `healthy` volumes.
would/could a DaemonSet do that for us?
Good point.

- We can use a DS and use node affinity to run the DS pods only on master nodes (a sketch follows below). However, I am not aware of how the storage will be handled in this case.
- Is there a way to keep the backups in a round-robin fashion among all master nodes?
- Also, what about STS? wdyt?
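On the node-affinity part of the question: pinning a DaemonSet to the control-plane nodes is straightforward, and the storage would then have to come from `hostPath` (or per-node static local PVs), since a DaemonSet has no `volumeClaimTemplates`. A sketch with placeholder names:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: etcd-backup-ds                       # illustrative name
  namespace: openshift-etcd
spec:
  selector:
    matchLabels:
      app: etcd-backup
  template:
    metadata:
      labels:
        app: etcd-backup
    spec:
      nodeSelector:
        node-role.kubernetes.io/master: ""   # run only on master nodes
      tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
      containers:
        - name: backup
          image: quay.io/example/etcd-backup:latest   # placeholder image; backup logic elided
          volumeMounts:
            - name: backup-dir
              mountPath: /var/backup
      volumes:
        - name: backup-dir
          hostPath:
            path: /var/lib/etcd-auto-backup  # placeholder per-node backup directory
            type: DirectoryOrCreate
```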
yep, STS could also work, especially with the storage. Would also be useful to have this compared to the static pod/hostpath approach
Alright, I think if we have an STS, then the Pods, PVCs, and PVs are managed together.
I think we can in this case create backups in a round-robin manner across the master nodes.
For BM, I think we can still use a local volume as the storage option, managed by the STS.
How does this round robin backup work?
I read the original enhancement - https://github.com/openshift/enhancements/blob/master/enhancements/etcd/automated-backups.md - and I understand this better now. But this still begs the question: how do you know which snapshot to restore from? What if you take a snapshot of an etcd replica which was behind and is just catching up with the other master nodes? Is that possible?
So we will make sure that the backup is taken from a member whose log is identical to the leader's. This way we make sure that we are not lagging behind.
But could this approach work?
Why do you want to handle the balancing? That's not what you should care for at all.
Yes, I agree; therefore the `SideCar` approach is the most appropriate.
#### Cloud based OCP

##### Pros
- Utilising `Dynamic provisioning` is the best option on cloud based OCP.
how is the customer able to access a dynamically created PV to get their snapshots for restoration?
Gr8 question. So I have defaulted the `RetentionPolicy` in my sample configs to `Retain`.
This way the PV contents will never be erased automatically.
Now to answer the restoration:

- If the cluster is still running, the PV can be attached to any node, as long as it is in the same availability zone.
- If the cluster is completely down, the PV can still be accessed by the CU; it will never be deleted unless done manually.
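For context, the `Retain` behaviour described here corresponds to the reclaim policy, e.g. set on the StorageClass used for dynamic provisioning; a sketch, with a placeholder name and an example provisioner that varies per platform:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: etcd-backup-retain          # illustrative name
provisioner: ebs.csi.aws.com        # example CSI driver, platform dependent
reclaimPolicy: Retain               # released PVs (and the backups on them) are not deleted automatically
volumeBindingMode: WaitForFirstConsumer
```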
It's not about the retention, it's about the access and mounting possibilities.

> If the cluster is still running, the PV can be attached to any node, as long as it is in the same availability zone.

That's quite a bad constraint for a disaster recovery procedure, isn't it? :)
:D :D .. I know. Well, the best option IMHO is to push the backups to a remote storage option; then, even in case the whole cluster is down, a new installation can be restored using the remote backup.
However, in my discussion yesterday, remote storage was not an option for BM. They also recommended keeping it simple and not too complicated.
If it were up to me, I would create a sidecar which pushes the backups to remote storage; I think this option would work for any OCP variant. The master nodes don't have to be in the cloud; the sidecar could authenticate and push backups regardless of OCP's underlying infrastructure. wdyt?
If dynamically provisioned volumes are available in the cluster, it is 100% likely that the cluster has remote storage, and hence backups are available even if the cluster is down.
As for the availability-zone concerns, yeah, that is why backing up into a PV is not enough. We should consider using CSI snapshots of those PVCs, so that snapshots of the backups can be available across availability zones, and in case the file system on the PV gets corrupted or something.
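What the suggested CSI snapshot of the backup PVC could look like; a sketch, with placeholder snapshot-class and PVC names:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: etcd-backup-snapshot-1                  # illustrative name
  namespace: openshift-etcd
spec:
  volumeSnapshotClassName: csi-snapshot-class   # placeholder snapshot class
  source:
    persistentVolumeClaimName: etcd-backup-pvc  # placeholder PVC holding the backups
```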
I actually want to fall back to the `localVolume` approach, since having two separate approaches is harder to maintain.
I believe the localVolume should be sufficient. However, if we could utilize an STS across all the master nodes, that would be ideal in situations where we lose one or more master nodes.
If I were to design this, I would definitely use two different backup strategies. It is far more reliable to use dynamically provisioned PVCs for backups. They can be snapshotted and remain accessible if a node goes down.
It seems to me that any hostpath/localvolume based approach is basically inferior. If the last backup was taken on the leader and then the leader node goes down, then so does our backup (other nodes may have slightly older backups). So hostpath/localvolume requires a fundamentally different recovery mechanism.
IMO - if we look at other components that use persistent storage in Openshift: if customers don't provide storage, then no persistent configuration is created. Prometheus and image-registry both require storage, and they don't do anything automatically.
So, what I am saying is - we should probably limit the scope of this KEP to environments where a StorageClass is available. If no StorageClass is configured, then no automatic backups are taken.
We should not take it upon ourselves to decide a storage strategy for the customer. cc @tjungblu
So to summarize - I would simply not bother to configure backups via localvolume/hostpath in environments where no storage is available, and solve that problem via documentation or ask the customer to configure storage. I do not think we should be automatically configuring local-storage for these clusters.
Also see this default I used to test the approach
- Several options exist for the default `PVCName`.
- Relying on `dynamic provisioning` is sufficient, however it is not an option for `SNO` or `BM` clusters.
- Utilising the `local storage operator` is a proper solution, however installing a whole operator is too much overhead.
- The most viable solution to cover all OCP variants is to use `local volume`.
You don't necessarily have to install the local-storage-operator to use/consume local storage. For simple one-off use cases like this, it may be possible that the etcd-backup operator or something creates a static PV as documented - https://docs.openshift.com/container-platform/4.15/storage/persistent_storage/persistent_storage_local/persistent-storage-local.html#local-create-cr-manual_persistent-storage-local - and uses it to perform etcd backups.
### Workflow Description
- The user will enable the AutomatedBackupNoConfig feature gate (under discussion).
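For context, tech-preview gates such as the existing AutomatedEtcdBackup one are typically turned on through the cluster FeatureGate resource; whether a separate gate is added is, as noted, still under discussion:

```yaml
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: TechPreviewNoUpgrade   # enables tech-preview gates (incl. AutomatedEtcdBackup); irreversible
```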
Given that AutomatedEtcdBackup is still a tech preview feature, I'd consider expanding that functionality with the default configuration, rather than adding another one. This will save a lot of problems for some of the fields in the API you're planning to provide default values for.
+1, I will fix this 👍🏽
It supports all OCP variants, including `SNO` and `BM`.

However, I am strongly against using it for the following reasons
Suggested change: "However, I am strongly against using it for the following reasons" -> "However, the following reasons suggest against that solution:"
This whole section has been removed; radical changes :D :D
- No scheduling guarantees for a pod using `hostPath`, unlike with `localVolume`. The pod could be scheduled on a different node from where the hostPath volume exists.
- On the other hand, using `localVolume` and the node affinity within the PV manifest forces the backup pod to be scheduled on a specific node, where the volume is attached.
- `localVolume` allows using a separate disk as a PV, unlike `hostPath`, which mounts a folder from the node's FS.
- `localVolume` is handled by the PV controller and the scheduler in a different manner; in fact, it was created to resolve issues with `hostPath`.
You forgot to add the locality of the backup. IOW, if it happens that the backup is kept on the current leader, and that node dies, we're losing that data with it.
So the current approach will take backups on all master nodes. Actually, we have another issue now: how to distinguish which backup is the most up-to-date, since we have backups from all master nodes :)

/label tide/merge-method-squash
This enhancement adds on to previous work: [automated backup of etcd](https://docs.openshift.com/container-platform/4.15/backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.html#creating-automated-etcd-backups_backup-etcd).
Therefore, it is vital to use the same feature gate for both approaches. The following issues need to be clarified before implementation:

* How to distinguish between `NoConfig` backups and backups that are triggered using the `EtcdBackup` CR.
@soltysh Would you kindly help me with answering these questions?
We will call it `name="default"`, and in the controller we'll skip the reconciliation of that CRD.
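A sketch of what that reserved CR might look like, based on the existing `config.openshift.io/v1alpha1 Backup` API; the field values are illustrative defaults, not the final ones:

```yaml
apiVersion: config.openshift.io/v1alpha1
kind: Backup
metadata:
  name: default                  # reserved name; the periodic-backup controller skips reconciling it
spec:
  etcdBackupSpec:
    schedule: "0 */2 * * *"      # illustrative default schedule
    timeZone: UTC
    retentionPolicy:
      retentionType: RetentionNumber
      retentionNumber:
        maxNumberOfBackups: 5    # illustrative default retention
    # no pvcName: backups land on the node-local default location
```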
So if the sidecar is enabled, do we skip the backup using the controller, or do we skip the default backups if the controller is enabled?
[APPROVALNOTIFIER] This PR is NOT APPROVED. The full list of commands accepted by this bot can be found here.
@sjenning I also vote for the DaemonSet approach. You can find here a test of the Cron-based approach: #1646 (comment). IMHO the DS approach is more flexible and easier to modify and adapt. For instance, I could just add a container to the Pod template in order to push the backup to an external target (e.g. an S3 bucket).
@Elbehery: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Also posted on the implementation PR, but perhaps we should continue the discussion here
I guess I didn't fully understand the flow here; I did not expect to see a CronJob that creates an EtcdBackup which then triggers the actual backups. I thought that would be a controller of itself and that we only had one Job involved here. Anyway, is there any particular reason why the pruning has to be done prior to the backup?
How would you make the argument that it is more flexible? Which parts are more flexible? Is this flexibility required, and, is it worth having to implement and maintain a completely new backup system, vs just re-using/extending the existing system and having 1 path?
The intention in the future would be to extend this API and create a specific area to specify what kind of storage is being used, and not to just assume based on a magic key name. I believe we discussed making this something like […]

@JoelSpeed yes, you are right, let's move here. Citing the PoC implemented in openshift/cluster-etcd-operator#1381
So this is the current implementation, as I explained in openshift/cluster-etcd-operator#1381 (comment)
I am not the one who can answer this; I believe @tjungblu and @hasbro17 are the best to answer. However, iirc, the […] I can also assume that, since we have a […] The templates used for creating the CronJob and the Job resources are below.

As the whole code of the […] For instance, if we were now moving the […]

Indeed, if we were to move forward with this approach, then we would make the API changes 👍🏽
Very simply because an EtcdBackup is a one-off backup. There is nothing to prune in such a scenario; in the recurring one we need to ensure that we do not overflow the disk. That's also the reason why it's done prior to creating the backup: to ensure there is enough space left for another backup.
So moving the […] because this is the only way to have the pruning functionality while using the […]
What's the point of that? I want to create a one-off backup because I'm making a risky change; I don't want the job to delete all prior backups.
I thought we had settled on the DS and a separate hostPath API change now, why are we debating this again?

I am replying to the review requests; if we are settled, then I am happy 🥳 What API change do I need to create?
With a one-off backup, you have no pre-flight check to make sure there is enough space? Would it not be helpful, in the case where a user requests a one-off backup, to run this check? For example, we are talking about enabling the use of EtcdBackup to trigger hostpath-based backups (and IIRC that was also in scope beyond this EP?), so why would you not want pruning when an admin triggers a one-off hostpath backup?
A user would do this prior to the risky change, so pruning before this backup isn't actually harmful, as the cluster should still be in a good state.
Pruning deletes all previous backups? I would have assumed it prunes some amount to make sure there is space, but leaves some sort of history?
IIUC, the last discussion assumed that the idea of just re-using all of the existing mechanisms and setting up a Job with parallelism didn't work. But Mustafa hadn't actually tested the key part of that prior to ending that experiment. I prompted him about this and he has now completed the experiment, and has shown it works. The previous discussion assumed only the DS worked. So now the decision should be re-discussed. Do we want a completely new backup system, or do we want to just extend the current one? Do we want two different implementations reacting to one CRD? Or just one? Both solutions work, both have pros, both have cons.
Now you're conflating two different things, pre-flight space checks and retention. What should be retained on a hostPath that you just want to have for a one-time backup?
So I shall randomly delete things under a hostPath, where we don't know what's in there, before doing a risky change? This is absurd, Joel.
How does this work today? I am not an expert on the backup mechanics, but I would assume at least that when you create a backup, there is a file (or folder?) with a well-known name? You would delete only the things that look like they were created by a backup, based on the naming.
I'm thinking here about how traditional VM backup software works. In at least three backup softwares I know of, I can configure a job as a manual run. But then I can trigger that multiple times, and the retention settings are still shared across those runs. In a parallel here, if I were an admin, and I didn't want to schedule my backups, but I re-used the folder over and over, would I expect over time to cap the size of that backup? Would I expect it to eventually start pruning the oldest of my manual backups? Perhaps this isn't supported today since there isn't a pruning/retention configuration in the API?
Are we going off on a tangent here? Perhaps we should refocus the conversation on the question "Would it be preferable long term to base all types of backup on a single implementation (the Job based one), assuming no major hurdles?" and, if the answer to that is Yes, "Which of the existing open questions/discussions around the Job based approach would be considered too big to overcome?".
## Summary

To enable automated backups of Etcd database and cluster resources of an Openshift cluster from day one after installation,
nit: etcd is always lowercase :)
The current method of **Backup** and **Restore** of an Openshift cluster is outlined in [Automated Backups of etcd](https://github.com/openshift/enhancements/blob/master/enhancements/etcd/automated-backups.md).
This approach relies on user supplied configuration, in the form of a `config.openshift.io/v1alpha1 Backup` CR. This method utilizes a `batchv1.CronJob` in order to take Backups of Etcd and cluster resources.

To improve the user's experience, this work proposes taking etcd backups using default configurations from day one, without user intervention.
question: from a support perspective does this mean it is a bug if the backup is invalid or corrupt? are there any legal concerns with Red Hat "owning" backups?
* Backups should be taken without configuration after cluster installation from day 1.
* Backups are saved to a default location on each master node's file system.
* Backups are taken according to a default schedule.
* Backups are being maintained according to a default retention policy.
Would it also be a goal to ensure that the backup is valid? Is there a periodic job that ensures this or is the cluster/customer trusting it will work?
I believe the size of the etcd db is no longer capped. Is this a concern for taking snapshots?
_**Pros**_

As the Backup pods are scheduled independently of the etcd pods, any error within the Backup pods does not influence Openshift cluster availability.
question: would a failure to create a backup block upgrades?
Moreover, as the Backup pods are scheduled on every master node, the backups are taken from each local etcd member on the same node. This avoids overwhelming a specific member with backup requests.
question: it has been a while but doesn't a snapshot request always go to the leader?
The _no-config_ backups build atop the existing [automated backup of etcd](https://docs.openshift.com/container-platform/4.15/backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.html#creating-automated-etcd-backups_backup-etcd).

The `PeriodicBackupController` within **cluster-etcd-operator** checks for a `Backup` CR with `name=default` on every sync. Upon applying a `Backup` CR with `name=default`, the _no-config_ backup sidecar container is enabled within each etcd pod in the Openshift cluster.
The `Backup` configurations, such as `schedule`, `timezone` and `retention policy`, are supplied to the _no-config_ container as flags.
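A sketch of how those flags might be wired into the sidecar container entry of the etcd pod; the flag names and subcommand are assumptions, not the actual CLI:

```yaml
# Hypothetical sidecar container entry in the etcd pod spec:
- name: backup-sidecar
  image: cluster-etcd-operator-image          # placeholder image
  command:
    - cluster-etcd-operator
    - backup-server                           # hypothetical subcommand
  args:
    - "--schedule=0 */2 * * *"                # taken from the default Backup CR
    - "--timezone=UTC"
    - "--max-number-of-backups=5"
  volumeMounts:
    - name: etcd-backup-dir
      mountPath: /var/lib/etcd-auto-backup    # placeholder hostPath mount
```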
So how does retention actually work for the hostpath based backup in the DS approach?
Since the `config.openshift.io/v1alpha1 Backup` API expects only one `PVC`, only one Backup Job could mount the PVC and schedule the backup.
A workaround has been tried, in which the `job.Spec` mounts a `hostPath` volume so that it is used on each master node independently.
However, since the `Backup` job runs two containers (i.e. request-backup and prune) and the containers expect a `PVC` as an argument, the approach has been declared non-viable.
I don't think this is accurate based on the latest testing and discussion, is it?
Reviewing now, and discussing sync with @Elbehery, I think at a high level we are here: […]
Based on this, I think the next steps should probably be: […]
I don't have a strong opinion on the DS vs Job-based approach; both seem to have pros and cons. I would generally lean toward having one implementation of the backup mechanics rather than two though, since I expect over time that having two implementations will lead to "this feature got implemented but only works in PVC mode" kinds of bugs coming up. Having reviewed the implementations of both in POC form, neither is super complex though, IMO.
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting `/remove-lifecycle stale`. If this proposal is safe to close now, please do so.

/lifecycle stale
/remove-lifecycle stale
This PR adds an enhancement proposal for `Etcd Automated Backups No Config`.

Resolves https://issues.redhat.com/browse/ETCD-610

cc @openshift/openshift-team-etcd