Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding backup to gcs using gcsfuse with an auto option. #215

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

gmarcospythian
Copy link
Contributor

Change Description:

Adding gcsfuse automatic functionality to backup to GCS bucket as part of the creation of the machine.

Solution Overview:

This solution implements using gcsfuse, this allows presenting and mounting the GCS bucket into the host.

The solution includes installing the rpm directly from Google Cloud repository to the current version available on the repository.

The current "auto" implementation takes in consideration to add a new bucket identified by the instance_id belonging to the GCE compute object. This is a unique identifier which allows to assign and trace back buckets to its own gce instance_id. The bucket then is mounted by creating the keys on a service account related to the instance_id. The service account keys are stored in the host and assigned to the mount command while mounting the volume.

Once the verifications are done, the configuration scripts continue, assigning the new gcsfuse mount point to the backup scripts.

Removing the bucket removes all backups and data with the GCE instance_id assigned bucket. The removal also takes away the service account and service account keys.

Tested replacing current backup configuration with --config-db

./install-oracle.sh --instance-ip-addr 10.2.80.62 --ora-version 19 --ora-swlib-bucket gs://pythian-gto-oracle-software/19c --backup-dest /u01/backups --config-db

Test commands

time ./install-oracle.sh --ora-swlib-bucket gs://pythian-gto-oracle-software/19c --instance-ssh-user marcos --instance-ssh-key ~/.ssh/id_rsa --backup-dest "+RECO" --ora-swlib-path /u02/swlib/ --ora-version 19 --ora-swlib-type gcs --ora-asm-disks ~/asm_disk_config.json --ora-data-mounts ~/data_mounts_config.json --cluster-type NONE --ora-data-diskgroup DATA --ora-reco-diskgroup RECO --ora-db-name orcl --ora-db-container false --instance-ip-addr 10.2.80.72 --instance-hostname marcos-db-rhel7-9qte --allow-install-on-vm --no-patch  --gcs-backup-config auto
time ./install-oracle.sh --ora-swlib-bucket gs://pythian-gto-oracle-software/19c --instance-ssh-user marcos --instance-ssh-key ~/.ssh/id_rsa --backup-dest "+RECO" --ora-swlib-path /u02/swlib/ --ora-version 19 --ora-swlib-type gcs --ora-asm-disks ~/asm_disk_config.json --ora-data-mounts ~/data_mounts_config.json --cluster-type NONE --ora-data-diskgroup DATA --ora-reco-diskgroup RECO --ora-db-name orcl --ora-db-container false --instance-ip-addr 10.2.80.125 --instance-hostname marcos-db-rhel8-gqn9 --allow-install-on-vm --no-patch  --gcs-backup-config auto

Gist output log from full test run of toolkit on 19c and covering 4 versions of OS, rhel 7/8 oel 7/8.

* Adding gcsfuse backup auto option, fixing formatting requests, modifying user guide and adding parameter description.
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gmarcospythian
Once this PR has been reviewed and has the lgtm label, please assign alexbasinov for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

Hi @gmarcospythian. Thanks for your PR.

I'm waiting for a google member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mfielding
Copy link
Member

/ok-to-test

Copy link

@gmarcospythian: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
oracle-toolkit-install 43f96d5 link true /test oracle-toolkit-install

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

- name: gcsfuse | gcsfuse install
include_tasks: gcsfuse.yml

- name: gcsfuse enable resource manager API | Enable resource manager API auto creation of bucket
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I appreciate the intent of making this plug-and-play, google cloud security recommendations are not to allow a VM's default service account to do powerful things like enable APIs.

@@ -206,6 +214,10 @@ fi
echo "Incorrect parameter provided for ora-data-mounts-json: $ORA_DATA_MOUNTS_JSON"
exit 1
}
[[ ! "$REMOVE_AUTOGCS_BUCKET" =~ $REMOVE_AUTOGCS_BUCKET_PARAM ]] && {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to handle the case where $REMOVE_AUTOGCS_BUCKET is blank too. (see sample output: https://oss.gprow.dev/view/gs/bmaas-testing-prow-bucket/pr-logs/pull/google_oracle-toolkit/215/oracle-toolkit-install/1895185501892120576)

@@ -1889,6 +1889,17 @@ directory does not have to exist, but initial backups will fail if the
destination is not available or writeable.</td>
</tr>
<tr>
<td>GCS backup configuration</td>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might need some more documentation around the gcsfuse option, particularly what cloud-level prerequisites are required, and maybe even an example.


- name: gcsfuse extract project number | GCE instance project number value
shell:
cmd: "gcloud projects describe $(gcloud config get-value project) --format=\"value(projectNumber)\""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to do this in native Ansible rather than shelling out?

include_tasks: gcsfuse.yml

- name: gcsfuse enable resource manager API | Enable resource manager API auto creation of bucket
shell:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be more secure to use the command module with ignore_errors


- name: gcsfuse create service account | GCE gcsfuse service account
shell:
cmd: "gcloud iam service-accounts create gcsfuse-{{ gce_instance_id.stdout }}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type of command is even more sensitive security-wise. Let's please have a way to work with more restrictive permissions (perhaps documentation, or even a script that a user can run in cloud shell?)

seconds: 15
when: not ansible_check_mode

- name: gcsfuse create service account key | GCE gcsfuse service account key
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for harping about security, but bearer tokens are a bad thing. Application default credentials would be better.

@@ -0,0 +1,92 @@
# Copyright 2025 Google LLC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put cleanup-related tasks inside roles/brute-ora-cleanup please?

- name: gcsfuse service account verification | GCE gcsfuse service account verification
shell:
cmd: "gcloud iam service-accounts list --format json|egrep -i gcsfuse-{{ gce_instance_id.stdout }}|egrep -i email |awk -F'\"' '{print $4}'"
register: gcsfuse_backup_service_account_verification
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One idea: can we put the manipulation of cloud constructs into the Terraform we're working on, coupled with docs on how do it manually?

seconds: 15
when: not ansible_check_mode

- name: gcsfuse mount bucket GCE gcsfuse mount directory
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will this survive reboots?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants