
ETCD-610: automated backups no config #1646

Open
wants to merge 24 commits into master from etcd-automated-backup-no-config
Conversation

Elbehery (Contributor)
This PR adds an enhancement proposal for etcd Automated Backups No Config.

Resolves https://issues.redhat.com/browse/ETCD-610

cc @openshift/openshift-team-etcd

@openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on Jun 17, 2024
@openshift-ci-robot commented Jun 17, 2024

@Elbehery: This pull request references ETCD-610 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set.

In response to this:

This PR adds an enhancement proposal for etcd Automated Backups No Config.

Resolves https://issues.redhat.com/browse/ETCD-610

cc @openshift/openshift-team-etcd

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@Elbehery (Contributor Author)

/assign @hasbro17
/assign @tjungblu
/assign @dusk125
/assign @soltysh

@Elbehery force-pushed the etcd-automated-backup-no-config branch 3 times, most recently from 035e9d5 to f54b7a3 on June 17, 2024 22:31
@Elbehery force-pushed the etcd-automated-backup-no-config branch 2 times, most recently from 16f8e08 to a40a86f on June 18, 2024 15:55

In the event of the node becoming inaccessible or unscheduled, the recurring backups would not be scheduled. The periodic backup config would have to be recreated or updated with a different PVC that allows for a new PV to be provisioned on a node that is healthy.

If we were to use `localVolume`, we need to find a solution for balancing the backups across the master nodes, some ideas could be


how would we restore from a localVolume? Is the data stored somewhere in /var/?

Contributor Author

So the localVolume can be mounted anywhere, as specified in the Pod. In fact, you can mount a whole external drive.

iiuc, hostPath supports only mount points from the node's FS


imagine we have to run the restore script, how would we get the local volume mounted into a recovery machine? would we create another static pod that would mount the localVolume?

I imagine the local storage plugin must be running along with the kubelet for that to work? @jsafrane / @gnufied maybe you guys can elaborate a bit how that works, I don't have much time this week to read through the CSI implementation 🐎

@gnufied (Member) Jun 20, 2024

Pardon my ignorance for not being familiar with the etcd deployment, but even if there is a default SC available, why are we backing up just one replica of etcd? I thought we would want to back up all replicas, no? Such as: what if, in case of a network partition or something, the replica we are backing up is behind?

So IMO it sounds like, whether we are using LocalVolume or something else, we should always be creating backups across master nodes? Is that accurate?

Member

Also, to answer @tjungblu's question: Local Volumes are an inbuilt feature of k8s, and hence no additional plugin is necessary. Heck, a local PV can be provisioned statically and the local-storage-operator is not necessary either, https://docs.openshift.com/container-platform/4.15/storage/persistent_storage/persistent_storage_local/persistent-storage-local.html#local-create-cr-manual_persistent-storage-local


... and can be mounted in the new etcd pod and the backup will exist.

check the restore script, you will need to unpack the snapshot before you can run any pod

Contributor Author

@gnufied, the goal is to make sure the backups are accessible, and that we can round-robin the backups across all master nodes that are up-to-date with the etcd-cluster leader.

Is it possible to

  • Create a StatefulSet across the master nodes, where the PVC template uses the localVolume to provision a PV (a rough sketch follows below).
  • Since we need to do the backup according to a schedule as per EtcdBackupSpec, the issue is how to manage the backup pods within the STS using the CronJob.
  • I am not aware if this is possible, but at least we can generate an Event from the CronJob, and the STS could react to these events by taking a backup.
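
For illustration only, a minimal sketch of what such a StatefulSet could look like, assuming the etcd-backup-local-storage StorageClass from the sample config later in this thread and a hypothetical backup image; one local PV per master node would still have to be pre-created so each replica can bind:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: etcd-backup
  namespace: openshift-etcd
spec:
  serviceName: etcd-backup
  replicas: 3                                  # assumption: one backup pod per master node
  selector:
    matchLabels:
      app: etcd-backup
  template:
    metadata:
      labels:
        app: etcd-backup
    spec:
      nodeSelector:
        node-role.kubernetes.io/master: ""     # schedule only onto control-plane nodes
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      containers:
      - name: backup
        image: example.invalid/etcd-backup:latest   # placeholder image
        volumeMounts:
        - name: backup
          mountPath: /var/backup
  volumeClaimTemplates:
  - metadata:
      name: backup
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: etcd-backup-local-storage   # local StorageClass, one PV per node
      resources:
        requests:
          storage: 10Gi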

Contributor Author

check the restore script, you will need to unpack the snapshot before you can run any pod

We can use an init-container to unpack the backup, and then the etcd pod can start?

Otherwise, we can start a fresh etcd pod and then run etcdctl restore?

Please correct me if I am wrong


  • Create StatefulSet across the master nodes, where the PVC template uses the localVolume to provision PV.

I believe you mean DaemonSet across the master nodes 😉

Contributor Author

I actually thought about this as well, but there is no PersistentVolumeClaim template on DaemonSet.

Having a separate PVC & PV for each master node would be ideal for spreading the backups across the master nodes.

That being said, according to the latest update, we are going to use a sidecar container and a hostPath volume.


If we were to use `localVolume`, we need to find a solution for balancing the backups across the master nodes, some ideas could be

- create a PV for each master node using `localVolume` and the backup controller from CEO should take care of balancing the backups across the `available` and `healthy` volumes.
@tjungblu Jun 18, 2024

would/could a DaemonSet do that for us?

Contributor Author

Good point.

  • We can use a DS and use node affinity to run the DS pods only on master nodes. However, I am not aware of how the storage would be handled in this case (see the sketch after this list).

  • Is there a way to keep the backups in a round-robin fashion among all master nodes?

  • Also, what about STS? wdyt?
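
As a rough illustration of the DS idea (my sketch, not the agreed design), a DaemonSet pinned to control-plane nodes writing to a hostPath directory; the image and path are placeholders:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: etcd-backup
  namespace: openshift-etcd
spec:
  selector:
    matchLabels:
      app: etcd-backup
  template:
    metadata:
      labels:
        app: etcd-backup
    spec:
      nodeSelector:
        node-role.kubernetes.io/master: ""      # run only on master nodes
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      containers:
      - name: backup
        image: example.invalid/etcd-backup:latest   # placeholder image
        volumeMounts:
        - name: backup-dir
          mountPath: /var/backup
      volumes:
      - name: backup-dir
        hostPath:
          path: /var/backup/etcd                # placeholder path on each node
          type: DirectoryOrCreate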


yep, STS could also work, especially with the storage. Would also be useful to have this compared to the static pod/hostpath approach

Contributor Author

Alright. I think if we have an STS, then the Pods, PVCs, and PVs are managed together.

I think we can, in this case, create backups in a round-robin manner across the master nodes.

For BM, I think we can still use local volume as the storage option, managed by the STS.

Member

How does this round robin backup work?

Member

I read the original enhancement - https://github.com/openshift/enhancements/blob/master/enhancements/etcd/automated-backups.md and I understand this better now. But this still begs the question: how do you know which snapshot to restore from? What if you take a snapshot of an etcd replica which was behind and is just catching up with the other master nodes? Is that possible?

Contributor Author

So we will make sure that the backup is taken from a member whose log is identical to the leader's. This way we make sure that we are not lagging behind.

But, could this approach work ?


Why do you want to handle the balancing? That's not what you should care for at all.

Contributor Author

Yes, I agree; therefore the sidecar approach is the most appropriate.

#### Cloud based OCP

##### Pros
- Utilising `Dynamic provisioning` is the best option on cloud-based OCP.


how is the customer able to access a dynamically created PV to get their snapshots for restoration?

Contributor Author

Gr8 question, so I have defaulted the RetentionPolicy in my sample configs to Retain.

This way the PV contents will never be erased automatically.

Now to answer the restoration

  • If the cluster is still running, the PV can be attached to any node, as long as it is in the same availability zone.
  • If the cluster is completely down, the PV can still be accessed by the CU; it will never be deleted unless done manually.


it's not about the retention, it's about the access and mounting possibilities

If the cluster is still running, the PV can be attached to any node, as long as it is in the same availability zone.

that's quite a bad constraint for a disaster recovery procedure, isn't it? :)

Contributor Author

:D :D .. I know. Well, the best option imho is to push backups to a remote storage option; then, even if the whole cluster is down, a new installation can be restored using the remote backup.

However, in my discussion yesterday, remote storage was not an option for BM. Also, they recommended keeping it simple and not too complicated.

If it were up to me, I would create a sidecar which pushes the backups to remote storage; I think this option would work for any OCP variant. The master nodes don't have to be in the cloud, the sidecar could authenticate and push backups regardless of the underlying OCP infrastructure. wdyt

@gnufied (Member) Jun 20, 2024

If dynamically provisioned volumes are available in the cluster, it is 100% likely that the cluster has remote storage, and hence backups are available even if the cluster is down.

As for the availability zone concerns, yeah, that is why backing up into a PV is not enough. We should consider using CSI snapshots of those PVCs, so that snapshots of the backups can be available across availability zones and in case the file system on the PV gets corrupted or something.
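
For reference, a minimal sketch of such a CSI snapshot, assuming a VolumeSnapshotClass named csi-snapclass exists and that the backup PVC is called etcd-backup-pvc (both names are assumptions):

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: etcd-backup-snapshot
  namespace: openshift-etcd
spec:
  volumeSnapshotClassName: csi-snapclass        # assumed CSI snapshot class
  source:
    persistentVolumeClaimName: etcd-backup-pvc  # assumed backup PVC name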

Contributor Author

I actually want to fall back to the localVolume approach, since having two separate approaches is harder to maintain.

I believe the localVolume should be sufficient. However, utilizing an STS across all the master nodes would be ideal for situations where we lose one or more master nodes.

Member

If I were to design this, I would definitely use two different backup strategies. It is far more reliable to use dynamically provisioned PVCs for backups. They can be snapshotted and remain accessible if a node goes down.

It seems to me that any hostpath/localvolume-based approach is basically inferior. If the last backup was taken on the leader and then the leader node goes down, then so does our backup (other nodes may have slightly older backups). So hostpath/localvolume requires a fundamentally different recovery mechanism.

IMO, if we look at other components that use persistent storage in OpenShift: if customers don't provide storage then no persistent configuration is created. Prometheus and image-registry both require storage, and they don't do anything automatically.

So, what I am saying is, we should probably limit the scope of this KEP to environments where a StorageClass is available. If no StorageClass is configured then no automatic backups are taken.

We should not take it upon ourselves to decide a storage strategy for the customer. cc @tjungblu

Member

So to summarize: I would simply not bother to configure backups via localvolume/hostpath in environments where no storage is available, and solve that problem via documentation or by asking the customer to configure storage. I do not think we should automatically configure local-storage for these clusters.

Contributor Author

I actually agree that using dynamic provisioning and CSI snapshot is vital for the automated backup.

I also agree to limit the automated backup to clusters with dynamic provisioning only.

wdyt @tjungblu @hasbro17 @dusk125 @soltysh

@Elbehery force-pushed the etcd-automated-backup-no-config branch 2 times, most recently from 8ad5f84 to 99044c1 on June 18, 2024 18:00
@Elbehery (Contributor Author) commented Jun 18, 2024

Also, see this default config I used to test the approach:


apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: etcd-backup-local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: Immediate

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: etcd-backup-pvc
  namespace: openshift-etcd
spec:
  accessModes:
  - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi 
  storageClassName: etcd-backup-local-storage

---

apiVersion: v1
kind: PersistentVolume
metadata:
  name: etcd-backup-pv-fs
spec:
  capacity:
    storage: 100Gi 
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: etcd-backup-local-storage
  local:
    path: /mnt
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
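          # master node hostnames go here (elided in the original snippet)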

---

apiVersion: operator.openshift.io/v1alpha1
kind: EtcdBackup
metadata:
  name: etcd-single-backup
  namespace: openshift-etcd
spec:
  pvcName: etcd-backup-pvc




@Elbehery force-pushed the etcd-automated-backup-no-config branch from 99044c1 to 51fdbad on June 18, 2024 21:05
- Several options exist for the default `PVCName`.
- Relying on `dynamic provisioning` is sufficient; however, it is not an option for `SNO` or `BM` clusters.
- Utilising the `local storage operator` is a proper solution; however, installing a whole operator is too much overhead.
- The most viable solution to cover all OCP variants is to use `local volume`.
Member

You don't necessarily have to install the local-storage-operator to use/consume local storage. For simple one-off use cases like this, it may be possible that the etcd-backup operator or something creates a static PV as documented in https://docs.openshift.com/container-platform/4.15/storage/persistent_storage/persistent_storage_local/persistent-storage-local.html#local-create-cr-manual_persistent-storage-local and uses it to perform etcd backups.



### Workflow Description
- The user will enable the AutomatedBackupNoConfig feature gate (under discussion).

Given that AutomatedEtcdBackup is still a tech preview feature, I'd consider expanding that functionality with the default configuration, rather than adding another one. This will save a lot of problems for some of the fields in the API you're planning to provide default values for.

Contributor Author

+1, I will fix this 👍🏽



It supports all OCP variants, including `SNO` and `BM`.

However, I am strongly against using it for the following reasons

Suggested change
However, I am strongly against using it for the following reasons
However, the following reasons suggest against that solution:

Contributor Author

this whole section has been removed, radical changes :D :D

- No scheduling guarantees for the pod when using `hostPath`, unlike `localVolume`. The pod could be scheduled on a different node from where the hostPath volume exists.
- On the other hand, using `localVolume` and the node affinity within the PV manifest forces the backup pod to be scheduled on a specific node, where the volume is attached.
- `localVolume` allows using a separate disk as a PV, unlike `hostPath` which mounts a folder from the node's FS.
- `localVolume` is handled by the PV controller and the scheduler in a different manner; in fact, it was created to resolve issues with `hostPath`.

You forgot to add the locality of the backup. Iow, if it happens that the backup is kept on the current leader, and that node dies, we're losing that data with it.

Contributor Author

So the current approach will take backups across all master nodes. Actually, we have another issue now: how to distinguish which backup is the most up-to-date, since we have backups from all master nodes :)

@Elbehery (Contributor Author)

/label tide/merge-method-squash

@openshift-ci bot added the tide/merge-method-squash label (denotes a PR that should be squashed by tide when it merges) on Jun 27, 2024
This enhancement builds on the previous work [automated backup of etcd](https://docs.openshift.com/container-platform/4.15/backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.html#creating-automated-etcd-backups_backup-etcd).
Therefore, it is vital to use the same feature gate for both approaches. The following issues need to be clarified before implementation:

* How to distinguish between `NoConfig` backups and configs that are triggered using `EtcdBackup` CR.
Contributor Author

@soltysh Would you kindly help me answer these questions?


we will call it name="default" and in the controller we'll skip the reconciliation of that CRD
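
For context, a sketch of roughly what such a "default" periodic Backup CR could look like, reusing the field names from the existing automated-backups API (schedule, timeZone, retentionPolicy, pvcName); the exact shape is an assumption, and in the no-config variant pvcName would likely be omitted:

apiVersion: config.openshift.io/v1alpha1
kind: Backup
metadata:
  name: default
spec:
  etcd:
    schedule: "0 */6 * * *"          # example schedule, not a proposed default
    timeZone: "UTC"
    retentionPolicy:
      retentionType: RetentionNumber
      retentionNumber:
        maxNumberOfBackups: 5
    # pvcName omitted: per the discussion, omission implies host-local (no-config) storage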

Contributor Author

So if the sidecar is enabled, do we skip the backup using the controller, or do we skip the default backups if the controller is enabled?

openshift-ci bot (Contributor) commented Jan 1, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from hasbro17. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Elbehery (Contributor Author) commented Jan 1, 2025

Overrunning disk space on the control-plane nodes, or temporarily I/O saturating them resulting in etcd latency spikes, are my primary concerns. Maybe we'd want to stagger the schedule among etcd members. Though I'm not sure if that's something a manually specified Backup CR allows.

@sjenning I also vote for the DaemonSet approach

You can find a test of the cron-based approach here: #1646 (comment). IMHO the DS approach is more flexible, and easier to modify and adapt.

For instance, I could just add a container to the Pod template in order to push the backup to an external target (e.g., an S3 bucket).

@Elbehery (Contributor Author) commented Jan 1, 2025

[poll image]

I have created this poll for easier voting, please click on the image and then vote accordingly

cc @JoelSpeed
@tjungblu
@vrutkovs
@jmhbnz
@deads2k
@csrwng
@sjenning
@wking
@rhuss
@cuppett

openshift-ci bot (Contributor) commented Jan 1, 2025

@Elbehery: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@JoelSpeed (Contributor)

Also posted on the implementation PR, but perhaps we should continue the discussion here

The limitation is that, the pruning runs on only one master node at a time.

I guess I didn't fully understand the flow here, did not expect to see a cronjob to create EtcdBackup which then triggers the actual backups, I thought that would be a controller of itself and that we only had one Job involved here.

Anyway, is there any particular reason why the pruning has to be done prior to the EtcdBackup being created? Why does the pruning not happen as part of the second Job, the one triggered by the EtcdBackup being created? (Do you have well known names for these two jobs?)

To summarize, this approach works but as you can see, the DaemonSet approach is much more flexible

How would you make the argument that it is more flexible? Which parts are more flexible? Is this flexibility required, and, is it worth having to implement and maintain a completely new backup system, vs just re-using/extending the existing system and having 1 path?

spec:
  pvcName: no-config

The intention in the future would be to extend this API and create a specific area to specify what kind of storage is being used, and not to just assume based on a magic key name.

I believe we discussed making this something like

spec:
  backupStorage:
    type: HostPath | PVC
    pvc: # Only valid when type is PVC
      name: ...

@Elbehery (Contributor Author) commented Jan 2, 2025

@JoelSpeed yes, u r right, lets move here

Citing PoC implemented in openshift/cluster-etcd-operator#1381

I guess I didn't fully understand the flow here, did not expect to see a cronjob to create EtcdBackup which then triggers the actual backups, I thought that would be a controller of itself and that we only had one Job involved here.

So this is the current implementation, as I explained in openshift/cluster-etcd-operator#1381 (comment)

Anyway, is there any particular reason why the pruning has to be done prior to the EtcdBackup being created? Why does the pruning not happen as part of the second Job, the one triggered by the EtcdBackup being created? (Do you have well known names for these two jobs?)

I am not the one who can answer this, I believe @tjungblu and @hasbro17 are the best to answer

However, iirc, the prune container had to run as an init-container, since it has to run in serial order with the cluster-backup container.

I can also assume that, since we have a one-time cluster backup, which is supported by the request-backup container and handled by the backup controller, and a recurring backup handled by the periodicbackupcontroller, maybe this is the reason we have two separate resources.

The templates used for creating the CronJob and the Job resources are below

How would you make the argument that it is more flexible? Which parts are more flexible? Is this flexibility required, and, is it worth having to implement and maintain a completely new backup system, vs just re-using/extending the existing system and having 1 path?

As the whole code of the etcd-backup-server runs as a single entity (i.e., a Pod), it is easier to adapt and edit.

For instance, if we were now moving the prune container from the CronJob level to the BackupJob level, this could be a big change. On the other hand, making the same change on the etcd-backup-server Pod is bound to the same entity, and would not have a big impact on the CU's experience with the product.

The intention in the future would be to extend this API and create a specific area to specify what kind of storage is being used, and assume not to just assume based on a magic keyname.

Indeed, if we were to move forward with this approach, then we would make the API changes 👍🏽

@tjungblu commented Jan 2, 2025

Anyway, is there any particular reason why the pruning has to be done prior to the EtcdBackup being created? Why does the pruning not happen as part of the second Job, the one triggered by the EtcdBackup being created? (Do you have well known names for these two jobs?)

I am not the one who can answer this, I believe @tjungblu and @hasbro17 are the best to answer

Very simply because an EtcdBackup is a one-off backup. There is nothing to prune in such a scenario; in the recurring one we need to ensure that we do not overflow the disk. That's also the reason why it's done prior to creating the backup: to ensure there is enough space left for another backup.

@Elbehery (Contributor Author) commented Jan 2, 2025

Anyway, is there any particular reason why the pruning has to be done prior to the EtcdBackup being created? Why does the pruning not happen as part of the second Job, the one triggered by the EtcdBackup being created? (Do you have well known names for these two jobs?)

I am not the one who can answer this, I believe @tjungblu and @hasbro17 are the best to answer

Very simply because an EtcdBackup is a one-off backup. There is nothing to prune in such a scenario, in the recurring one we need to ensure that we do not overflow the disk. That's also the reason why it's done prior to creating the backup, to ensure there is enough space left for another backup.

So moving the prune container to the cluster-backup job is not an option?

because this is the only way to have the pruning functionality while using the parallel-job approach 🤔

@tjungblu commented Jan 2, 2025

what's the point of that? I want to create a one-off-backup because I'm making a risky change, I don't want the job to delete all prior backups.

because this is the only way to have the pruning functionality while using the parallel-job approach 🤔

I thought we have settled on DS and a separate hostPath api change now, why are we debating this again?

@Elbehery (Contributor Author) commented Jan 2, 2025

what's the point of that? I want to create a one-off-backup because I'm making a risky change, I don't want the job to delete all prior backups.

because this is the only way to have the pruning functionality while using the parallel-job approach 🤔

I thought we have settled on DS and a separate hostPath api change now, why are we debating this again?

I am replying to the review requests; if we are settled then I am happy 🥳

What API change do I need to create?

@JoelSpeed (Contributor)

I can also assume that since the we have a one-time cluster backup, which is supported by the request-backup container and being handled by backup controller, and a recurring backup being handled by periodicbackupcontroller, maybe this is the reason we have two separate resources.

Very simply because an EtcdBackup is a one-off backup. There is nothing to prune in such a scenario, in the recurring one we need to ensure that we do not overflow the disk. That's also the reason why it's done prior to creating the backup, to ensure there is enough space left for another backup.

With a one-off backup, you have no pre-flight check to make sure there is enough space? Would it not be helpful, in the case where a user requests a one-off backup, to run this check? For example, we are talking about enabling the use of EtcdBackup to trigger hostpath-based backups (and IIRC that was also in scope beyond this EP?), so why would you not want pruning when an admin triggers a one-off hostpath backup?

I want to create a one-off-backup because I'm making a risky change

A user would do this prior to the risky change, so pruning before this backup isn't actually harmful as the cluster should still be in a good state

I don't want the job to delete all prior backups.

Pruning deletes all previous backups? I would have assumed it prunes some amount to make sure there is space, but leaving some sort of history?

I thought we have settled on DS and a separate hostPath api change now, why are we debating this again?

IIUC, the last discussion assumed that the idea of just re-using all of the existing mechanisms and setting up a Job with parallelism didn't work. But Mustafa hadn't actually tested the key part of that prior to ending that experiment. I prompted him about this and he has now completed the experiment, and has shown it works.

The previous discussion assumed only the DS worked. So now the decision should be re-discussed. Do we want a completely new backup system, or do we want to just extend the current one? Do we want 2 different implementations reacting to one CRD? Or just one? Both solutions work, both have pros, both have cons.

@tjungblu commented Jan 2, 2025

With a one-off backup, you have no pre-flight check to make sure there is enough space? Would it not be helpful, in the case where a user requests a one-off backup to run this check?

Now you're conflating two different things, pre-flight space checks and retention. What should be retained on a hostPath that you just want to have for a one-time backup?

A user would do this prior to the risky change, so pruning before this backup isn't actually harmful as the cluster should still be in a good state

So I shall randomly delete things under a hostPath that we don't know what's in there, before doing a risky change? This is absurd, Joel.

@JoelSpeed (Contributor)

Now you're conflating two different things, pre-flight space checks and retention. What should be retained on a hostPath that you just want to have for a one-time backup?

So I shall randomly delete things under a hostPath that we don't know what's in there, before doing a risky change? This is absurd, Joel.

How does this work today? I am not an expert on the backup mechanics, but I would assume at least that when you create a backup, there is a file (or folder?) with a well known name? You would delete only the things that look like they were created by a backup, based on the naming

I'm thinking here about how traditional VM backup software works. In at least 3 backup softwares I know of, I can configure a job, as a manual run. But then I can trigger that multiple times, and the retention settings are still shared across those runs.

In a parallel here, if I were an admin, and I didn't want to schedule my backups, but I re-used the folder over and over, would I expect over time to cap the size of that backup? Would I expect it to eventually start pruning the oldest of my manual backups? Perhaps this isn't supported today since there isn't a pruning/retention configuration in the API?

Are we going off on a tangent here? Perhaps we should refocus the conversation on the question "Would it be preferable long term to base all types of backup on a single implementation (the Job based one) assuming no major hurdles?" and, if the answer to that is Yes, "Which of the existing open questions/discussions around the Job based approach would be considered too big to overcome?".


## Summary

To enable automated backups of Etcd database and cluster resources of an Openshift cluster from day one after installation,
Contributor

nit: etcd is always lowercase :)

The current method of **Backup** and **Restore** of an Openshift cluster is outlined in [Automated Backups of etcd](https://github.com/openshift/enhancements/blob/master/enhancements/etcd/automated-backups.md).
This approach relies on user-supplied configuration, in the form of a `config.openshift.io/v1alpha1 Backup` CR. This method utilizes a `batchv1.CronJob` in order to take backups of etcd and cluster resources.

To improve the user's experience, this work proposes taking etcd backup using default configurations from day one, without user intervention.
Contributor

question: from a support perspective does this mean it is a bug if the backup is invalid or corrupt? are there any legal concerns with Red Hat "owning" backups?

* Backups should be taken without configuration after cluster installation from day 1.
* Backups are saved to a default location on each master node's file system.
* Backups are taken according to a default schedule.
* Backups are being maintained according to a default retention policy.
Contributor

Would it also be a goal to ensure that the backup is valid? Is there a periodic job that ensures this or is the cluster/customer trusting it will work?

Contributor

I believe the size of the etcd db is no longer capped. Is this a concern for taking snapshots?


_**Pros**_

As the Backup pods are scheduled independently of the etcd pods, any error within the Backup pods does not influence OpenShift cluster availability.
Contributor
question: would a failure to create a backup block upgrades?


As the Backup pods are scheduled independently of the etcd pods, any error within the Backup pods does not influence OpenShift cluster availability.

Moreover, as the Backup pods are scheduled on every master node, the backups are taken from the local etcd member on the same node. This avoids overwhelming a specific member with backup requests.
Contributor
question: it has been a while but doesn't a snapshot request always go to the leader?

The _no-config_ backup builds atop the existing [automated backup of etcd](https://docs.openshift.com/container-platform/4.15/backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.html#creating-automated-etcd-backups_backup-etcd).

The `PeriodicBackupController` within **cluster-etcd-operator** checks for a `Backup` CR with `name=default` on every sync. Upon applying a `Backup` CR with `name=default`, the _no-config_ backup sidecar container is enabled within each etcd pod in the OpenShift cluster.
The `Backup` configurations such as `schedule`, `timezone` and `retention policy` are supplied to the _no-config_ container as flags.
Contributor

So how does retention actually work for the hostpath based backup in the DS approach?

Comment on lines +341 to +343
Since the `config.openshift.io/v1alpha1 Backup` API expects only one `PVC`, only one Backup Job could mount the PVC and schedule the backup.
A workaround has been tried, in which the `job.Spec` mounts a `hostPath` volume, so that it can be used on each master node independently.
However, since the `Backup` job runs two containers (i.e. request-backup and prune) and the containers expect a `PVC` as an argument, the approach was declared non-viable.
Contributor

I don't think this is accurate based on the latest testing and discussion is it?

@JoelSpeed (Contributor)

Reviewing now, and after discussing synchronously with @Elbehery, I think at a high level we are here:

  • A "create only" default Backup CR will be created
    • Users can modify this resource if they disagree with the default configuration
    • If a user deletes this configuration, it will be recreated
  • The PVC configuration is omitted on the Backup CR, which implicitly means that a hostpath is used
    • This is bad from an API perspective, we need a required storage choice discriminated union to ensure an explicit choice from the user
    • Current implementation ignores the PVC name if the Backup CR has a special name - this is confusing UX and doesn't allow user to update the configuration
    • Current implementation applies various special logical choices based on the CR name only, this should be within the API and an explicit choice
  • HostPath no-config approach is a branch from the main periodic backup logic
    • It creates a DS with pods on each of the master nodes
    • Pods run continuously, and handle backups on cron schedule
    • Completely separate implementation to the existing cronjob implementation
    • This means implementing configuration of Backup CR twice for two backup paths - will want testing to ensure all options work across both implementations, perhaps some refactoring could deduplicate here to avoid this scenario
    • Enhancement needs to be updated to explain how the pruning works for this implementation (as it differs from the normal scenario)
  • The sidecar container approach has been dismissed and needs to be moved to the alternatives
  • A third path of using the existing CronJob approach has been suggested (alternatives section is outdated wrt this)
    • Recently proven to work (over shutdown), with parallelism and completion configuration
    • Re-uses existing cronjob logic, but modifies some parameters when "hostpath" is configured (TBD, needs an API change)
    • Potentially easier to maintain in the future?
    • Open questions about changes that might be required around backup pruning here

Based on this, I think the next steps should probably be:

  • Remove special reliances on the CR name, and move explicit choice of storage (PVC vs hostpath) to the Backup API
  • Ensure that when the ^ API configures hostpath, the appropriate controller handles this correctly
  • Ensure that any user change to the default Backup CR is observed and enacted
  • Write up the parallelism/job based approach that was tested over shutdown, exploring pros and cons as with the other two approaches
  • Resolve open questions about DS and Job based approaches to a point where we could discuss and make a decision on which to move forward with
  • Discuss/make a decision on cronjob vs DS approach based on each of their pros and cons, move one to alternatives
  • Move sidecar approach to alternatives

I don't have a strong opinion on the DS vs Job based approach, both seem to have pros and cons. I would generally lean toward having 1 implementation of the backup mechanics rather than two though, since I expect over time having two implementations will lead to "this feature got implemented but only works in PVC mode" kind of bugs coming up. Having reviewed the implementations of both in POC form, neither is super complex though IMO
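
For readers unfamiliar with the Job-with-parallelism variant mentioned above, a rough sketch of what such a Job could look like; the image, command, and host path are placeholders, and the real details live in the linked PoC:

apiVersion: batch/v1
kind: Job
metadata:
  name: etcd-backup-hostpath
  namespace: openshift-etcd
spec:
  parallelism: 3          # assumption: one pod per control-plane node in a 3-node cluster
  completions: 3
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        node-role.kubernetes.io/master: ""
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:   # spread the pods, one per node
          - labelSelector:
              matchLabels:
                job-name: etcd-backup-hostpath              # label set automatically by the Job controller
            topologyKey: kubernetes.io/hostname
      containers:
      - name: cluster-backup
        image: example.invalid/etcd-backup:latest           # placeholder image
        command: ["/bin/sh", "-c", "echo 'backup command goes here'"]  # placeholder command
        volumeMounts:
        - name: backup-dir
          mountPath: /var/backup
      volumes:
      - name: backup-dir
        hostPath:
          path: /var/backup/etcd                            # placeholder node path
          type: DirectoryOrCreate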

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci bot added the lifecycle/stale label (denotes an issue or PR has remained open with no activity and has become stale) on Feb 1, 2025
@Elbehery (Contributor Author)

Elbehery commented Feb 1, 2025

/remove-lifecycle stale

@openshift-ci bot removed the lifecycle/stale label on Feb 1, 2025