Add ORAchk to the Oracle Toolkit (#35) #220

Merged: @mfielding merged 27 commits into google:master from rel/oratk-57-orachk on Apr 4, 2025

@jkstill jkstill commented Mar 6, 2025

Change Description:

The check-oracle.sh script was added to the oracle-toolkit.

This will allow installing, running and uninstalling orachk.

Solution Overview:

The following files were created to extract and install orachk from the AHF Bundle, run orachk, and uninstall orachk.

  • check-oracle.sh
    • the main user script
  • ansible files
    • check-oracle-vars.yml
    • check-oracle.yml
  • files copied to the target server to run orachk
    • run-orachk.sh
    • orachk.env

Test Commands:

Test Prep:

Download the AHF bundle and copy to the Software Media Bucket.

For Pythian Internal Testing, the two most recent AHF zip files are already copied to the SW Media Bucket

Detailed output examples are available in the User Guide.

Install

./check-oracle.sh  --ahf-install \
      --ora-swlib-bucket gs://pythian-gto-oracle-software \
      --oracle-sid ORCL \
      --oracle-server jkstill-orachk-19c-patch \
      --ahf-dir AHF \
      --ahf-file AHF-LINUX_v25.1.0.zip

Run

./check-oracle.sh  --run-orachk \
      --oracle-sid ORCL \
      --oracle-server jkstill-orachk-19c-patch

Uninstall

./check-oracle.sh  --ahf-uninstall \
      --oracle-sid ORCL \
      --oracle-server jkstill-orachk-19c-patch

Install and Run

./check-oracle.sh  --ahf-install \
        --ora-swlib-bucket gs://pythian-gto-oracle-software \
        --oracle-sid ORCL \
        --oracle-server jkstill-orachk-19c-patch \
        --ahf-dir AHF \
        --ahf-file AHF-LINUX_v25.1.0.zip \
        --run-orachk

Expected Result

An orachk report zip file will be placed in the local /tmp/ directory.

The name of the file will be shown when check-oracle.sh --run-orachk is used.


Hi @jkstill. Thanks for your PR.

I'm waiting for a google member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Collaborator Author

jkstill commented Mar 11, 2025

/ok-to-test

Collaborator Author

jkstill commented Mar 11, 2025

/retest

@mfielding
Member

/ok-to-test

Member

@mfielding mfielding left a comment

I think this change is reverting unrelated changes by mistake; can you go through the "files changed" to make sure they're just the ones we intend for this change? Thanks!

…7-orachk

Merge in old commits that were recently merged into oracle-toolkit
@mfielding
Member

/retest

@@ -46,7 +46,7 @@ fi

# run the install script
./install-oracle.sh --ora-swlib-bucket gs://bmaas-testing-oracle-software \
--instance-ssh-user ansible1 --instance-ssh-key /etc/files_needed_for_tk/id_rsa_bms_tk_key \
Member

I think this was changed unintentionally (and is causing presubmit tests to fail)

Collaborator Author

That file is no longer in the PR. There had been a permissions change in an earlier commit by another dev, though I do not know why it appeared as changed here.

@@ -445,6 +445,22 @@ gi_patches:
- { category: "RU", base: "21.3.0.0.0", release: "21.16.0.0.0", patchnum: "36990664", patchfile: "p36990664_210000_Linux-x86-64.zip", patch_subdir: "/", prereq_check: false, method: "opatchauto apply", ocm: false, upgrade: false, md5sum: "N1p/HLEg+UFYAr04loujqg==" }
- { category: "RU", base: "21.3.0.0.0", release: "21.17.0.0.0", patchnum: "37349593", patchfile: "p37349593_210000_Linux-x86-64.zip", patch_subdir: "/", prereq_check: false, method: "opatchauto apply", ocm: false, upgrade: false, md5sum: "Mt0Bw+IPKqoKh31YkneCrg==" }

# 21c GRID RU - 21.4 and 21.7 are available only via SR
Member

Here too; do we intend to modify patch definitions for 21.3 here?

Collaborator Author

Not sure why that was changed here, likely a mistake. Reverted.

```text
$ ./check-oracle-test.sh install-and-run
```
Member

This is coming out rather long: 300+ lines of sample output. Can we skip some of the intermediate output and show just the command and some key results, so people don't have to scroll too far through the user guide?

Collaborator Author

examples shortened

changed: [ora-db-server-orachk-19c-patch]

TASK [Create AHF directory] ************************************************************************************************************************************************************************************
skipping: [ora-db-server-orachk-19c-patch]
Member

Ahh, since we're talking about keeping output trim, I think we have ansible configured not to output the "skipping" lines when running install-oracle.sh. Would it make sense to do the same here?

Collaborator Author

'skipping' no longer displayed

Member

I think this file was added by mistake.

Collaborator Author

Agreed, not sure why this is appearing here.

run-orachk.sh Outdated
. $orachk_env_file

# use of undeclared variables is fatal
set -u
Member

Maybe 'set -e' too, to make errors in subcommands fatal too?

Collaborator Author

The scripts that had 'set -u' no longer exist.
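
For readers unfamiliar with the two options discussed in this thread, the difference can be seen in a small demonstration. This is only an illustrative sketch, not code from the PR; each case runs in a child shell so that a fatal error there does not stop the demonstration itself.

```shell
#!/bin/sh
# Run each snippet in a child shell so a fatal error there does not
# stop this demonstration.

# Without -e, the failing command is ignored and the script carries on.
sh -c 'false; echo "reached without -e"'

# With -e, the failing command ends the child shell before the echo.
sh -c 'set -e; false; echo "reached with -e"' || echo "stopped by -e"

# With -u, expanding an undeclared variable is fatal.
sh -c 'set -u; echo "$undeclared_variable"' 2>/dev/null || echo "stopped by -u"
```

In a real script the two are usually combined near the top, after any environment files have been sourced.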

orachk.sh Outdated
Member

Just to avoid cluttering up the top-level directory too much, can we put the internal functionality into a subdirectory, and leave just the user-exposed check-oracle.sh at the top level?

Collaborator Author

done

run-orachk.sh Outdated
Member

Most of the logic here seems to be to validate prerequisites and set environment variables for orachk itself. Assuming we intend only to run from ansible, and that the playbook already satisfies the prerequisites, is it possible for the playbook to directly run ${ORACHK_BASE}/bin/${CHK_EXE_NAME}?

Collaborator Author

run-check.sh and orachk.sh removed, and their functionality moved to ansible files.

Member

Do you mind running these shell scripts through a linter like shellcheck? It has a bunch of lint-style warnings around quoting and the like.

Collaborator Author

The only remaining shell script is check-oracle.sh, and I have run it through shellcheck, and made any modifications that were appropriate.

# Default is to NOT uninstall AHF
uninstall_ahf: false
# default is to NOT run orachk
run_orachk: false
Member

Just had a thought here: check-oracle says it's used to run orachk, but here the default is not to run it. So I must be missing something here.

Collaborator Author

We do not uninstall/install AHF if already installed - that's all it is.
If AHF has been uninstalled, then the install will take place.

Collaborator Author

jkstill commented Mar 24, 2025

Member

@mfielding mfielding left a comment

I wasn't able to get this to complete, but I did manage to get fairly far and here are a few comments related to the hiccups I saw.

check-oracle.sh Outdated
[[ -z $ORACLE_SERVER ]] && { echo "please specify --oracle-server"; exit 1; }
[[ -z $ORACLE_SID ]] && { echo "please specify --oracle-sid"; exit 1; }

INVENTORY_FILE=inventory_files/inventory_${ORACLE_SERVER}_${ORACLE_SID}
Member

If --oracle-server and --oracle-sid are just used to generate an inventory file, I think it would be better to accept inventory_file directly, just like cleanup-oracle.sh does today. Why? In my RAC install, the inventory file has a different naming convention. When installing, we use --db-name and --instance-ip-addr rather than oracle-sid and oracle-server, so this can be confusing.

Collaborator Author

@jkstill jkstill Mar 26, 2025

@mfielding While I can add a parameter --inventory-file, the --oracle-sid is not just to get the name of the inventory file.

--oracle-server can be renamed to --instance-ip-addr

--oracle-sid can be renamed to --db-name, but will still be required, as the value must be specified on the orachk command line.

Edit: --db-name is only required to run orachk, not to install or uninstall
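
One way to reconcile the two positions in this thread is to let an explicit inventory file win and fall back to the existing naming convention otherwise. The following is a minimal sketch under that assumption, not the code that was merged; the function name `resolve_inventory_file` and its argument order are hypothetical, and only the `inventory_files/inventory_<server>_<name>` pattern comes from the diff quoted above.

```shell
#!/bin/sh
# Resolve the inventory file: an explicit path wins; otherwise fall back
# to the naming convention quoted in the diff above.
# $1: explicit inventory file (may be empty); $2: server; $3: database name.
resolve_inventory_file() {
  if [ -n "$1" ]; then
    echo "$1"
  elif [ -n "$2" ] && [ -n "$3" ]; then
    echo "inventory_files/inventory_${2}_${3}"
  else
    echo "please specify --inventory-file, or both --instance-ip-addr and --db-name" >&2
    return 1
  fi
}

# Derived from server and database name.
resolve_inventory_file "" "jkstill-orachk-19c-patch" "ORCL"
# Explicit file, as in the RAC case described above.
resolve_inventory_file "inventory_files/inventory_new_RAC" "" ""
```

With this shape, a RAC inventory with a different naming convention can be passed as-is, while the database name can still be required separately for the orachk run.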

shift
;;

--ahf-install)
Member

If no command is specified, this script exits with success and no output at all. It would be better to show an error in this case.

Collaborator Author

Something here does not seem right.

When I run without any options, I get this:

$ ./check-oracle.sh
please specify --oracle-server

When run with --ahf-install

$ ./check-oracle.sh --ahf-install
please specify --oracle-server

I think it would probably be good to include the help message here, but, there is output.

Can you verify that check-oracle.sh is correct?

What I see in google-toolkit-for-oracle in rel/oratk-57-orachk

$ sha256sum check-oracle.sh
1ffdb2c72f52ce073f8a7b0c6cc3309653adc9226cef072a4faf06174f9a0928  check-oracle.sh

Member

In my case, I saw the error message and passed --oracle-server as requested (but no command). And I got success and blank output.
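
The behaviour requested in this thread can be sketched as a guard that runs after option parsing. This is an illustrative sketch only, not the code merged in this PR; the variables `AHF_INSTALL`, `AHF_UNINSTALL`, and `RUN_ORACHK` and the function `require_action` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical flags; assume the option parser sets each to 1 when the
# matching command-line option (--ahf-install, etc.) is seen.
AHF_INSTALL=0
AHF_UNINSTALL=0
RUN_ORACHK=0

# Return non-zero, with a message on stderr, when no action was requested.
require_action() {
  if [ $((AHF_INSTALL + AHF_UNINSTALL + RUN_ORACHK)) -eq 0 ]; then
    echo "please specify at least one of --ahf-install, --ahf-uninstall, --run-orachk" >&2
    return 1
  fi
  return 0
}

# Example: nothing was requested, so the guard reports failure.
if ! require_action 2>/dev/null; then
  echo "no action given"
fi
```

A real script would likely print the help text and exit with a non-zero status at this point instead of continuing.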

while true
do

case "$1" in
Member

Can I ask for a --debug parameter like install-oracle.sh does, which adds verbosity to the ansible-playbook command? Maybe even an --extra-vars to pass extra stuff to ansible if needed?
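
A rough sketch of how such options might be forwarded to ansible-playbook follows. It is an assumption about one possible shape, not the implementation in check-oracle.sh: the function name `playbook_extra_args` is hypothetical, and mapping --debug to `-vvv` is a guess at the desired verbosity. The arguments are printed one per line so the sketch has no side effects.

```shell
#!/bin/sh
# Print the extra ansible-playbook arguments, one per line.
# $1: "1" to enable verbose output; $2: extra vars string (may be empty).
playbook_extra_args() {
  debug="$1"
  extra_vars="$2"
  set --                      # start with an empty argument list
  if [ "$debug" = "1" ]; then
    set -- "$@" -vvv
  fi
  if [ -n "$extra_vars" ]; then
    # Keep the whole string as a single argument, spaces included.
    set -- "$@" --extra-vars "$extra_vars"
  fi
  if [ "$#" -gt 0 ]; then
    printf '%s\n' "$@"
  fi
}

playbook_extra_args 1 "expedited_testing=true"
```

Keeping the extra vars as a single quoted argument matters when several `key=value` pairs are passed in one string.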

state: directory
mode: "0700"

- name: Copy AHF file from Google Storage
Member

If this file doesn't exist, I get a success here, and an undecipherable error later. If the file can't be found, can we fail with an obvious error?

TASK [Copy AHF file from Google Storage] ***********************************************************************************************************************************************************************************************************************************************************************************
ok: [mfielding-ora122-node2]

TASK [Unzip AHF file] ******************************************************************************************************************************************************************************************************************************************************************************************************
fatal: [mfielding-ora122-node2]: FAILED! => {"changed": false, "msg": "Failed to find handler for "/u01/AHF". Make sure the required command to extract the file is installed.\nCommand "/bin/gtar" could not handle archive: Unable to list files in the archive: tar (child): /u01/AHF: Cannot read: Is a directory\ntar (child): At beginning of tape, quitting now\ntar (child): Error is not recoverable: exiting now\ntar (grandchild): bzip2: Cannot exec: No such file or directory\ntar (grandchild): Error is not recoverable: exiting now\n/bin/gtar: Child returned status 2\n/bin/gtar: Error is not recoverable: exiting now\n\nCommand "/bin/gtar" could not handle archive: Unable to list files in the archive: /bin/gtar: /u01/AHF: Cannot read: Is a directory\n/bin/gtar: At beginning of tape, quitting now\n/bin/gtar: Error is not recoverable: exiting now\n\nCommand "/bin/gtar" could not handle archive: Unable to list files in the archive: tar (child): /u01/AHF: Cannot read: Is a directory\ntar (child): At beginning of tape, quitting now\ntar (child): Error is not recoverable: exiting now\n\ngzip: stdin: unexpected end of file\n/bin/gtar: Child returned status 2\n/bin/gtar: Error is not recoverable: exiting now\n\nCommand "/bin/unzip" could not handle archive: unzip: cannot find or open /u01/AHF, /u01/AHF.zip or /u01/AHF.ZIP.\n\nCommand "/bin/gtar" could not handle archive: Unable to list files in the archive: tar (child): /u01/AHF: Cannot read: Is a directory\ntar (child): At beginning of tape, quitting now\ntar (child): Error is not recoverable: exiting now\ntar (grandchild): zstd: Cannot exec: No such file or directory\ntar (grandchild): Error is not recoverable: exiting now\n/bin/gtar: Child returned status 2\n/bin/gtar: Error is not recoverable: exiting now\n\nCommand "/bin/gtar" could not handle archive: Unable to list files in the archive: tar (child): /u01/AHF: Cannot read: Is a directory\ntar (child): At beginning of tape, quitting now\ntar (child): Error is not 
recoverable: exiting now\nxz: (stdin): File format not recognized\n/bin/gtar: Child returned status 2\n/bin/gtar: Error is not recoverable: exiting now\n"}

Member

Still not working for me, even with the latest version of the code. It seems /u01/AHF is empty:

[mfielding_google_com@mfielding-ora122-node1 ~]$ sudo ls -al /u01/AHF
total 0
drwx------ 2 root root 6 Mar 25 18:17 .
drwxr-xr-x 6 root oinstall 57 Mar 25 18:17 ..

And then the unzip fails with a cryptic error (pasted last comment) that seems to be related to unarchive trying to detect the archive format.

--debug output attached in case it's useful: https://gist.github.com/mfielding/d932a159ae5a50cedce458375f8843f1
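
The "fail with an obvious error" request can be illustrated with a small pre-check that runs before extraction. This is a sketch in plain shell rather than the Ansible task used in the PR; the function name `require_archive` and the message wording are hypothetical.

```shell
#!/bin/sh
# Fail with a clear message when the expected archive is missing, is not a
# regular file (for example a directory), or is empty, instead of letting
# a later unzip step fail cryptically.
require_archive() {
  if [ ! -f "$1" ]; then
    echo "AHF archive not found or not a regular file: $1" >&2
    return 1
  fi
  if [ ! -s "$1" ]; then
    echo "AHF archive is empty: $1" >&2
    return 1
  fi
}

# Example with a temporary stand-in file.
tmp_archive=$(mktemp)
require_archive "$tmp_archive" 2>/dev/null || echo "rejected empty file"
printf 'data' > "$tmp_archive"
require_archive "$tmp_archive" && echo "accepted non-empty file"
rm -f "$tmp_archive"
```

Because the check rejects directories, it would also catch the case in the log above, where the path handed to the unzip step was /u01/AHF itself.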

check-oracle.sh Outdated

ORACLE_SERVER=''
AHF_DIR='AHF'
AHF_FILE=''
Member

As with oracle_server, if we're using ora-swlib-bucket and ahf-file only to generate a GCS path to pull the AHF zipfile, I'd prefer just to accept a simple --ahf-location=gs://bucket/file/location/AHF.zip parameter, and validate that it starts with gs:// and that it exists.
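
The prefix validation suggested here could look roughly like the sketch below. It checks only the form of the value; the function name `is_valid_ahf_location` and the requirement that the path end in `.zip` are assumptions. Checking that the object exists would need a call to Cloud Storage (for example with `gcloud storage ls`), which is deliberately left out of this sketch.

```shell
#!/bin/sh
# Check that an AHF location looks like gs://bucket/path/file.zip.
# Only the form is validated; Cloud Storage is not contacted.
is_valid_ahf_location() {
  case "$1" in
    gs://?*/?*.zip) return 0 ;;
    *) return 1 ;;
  esac
}

for candidate in "gs://bucket/file/location/AHF.zip" "/tmp/AHF.zip" ""; do
  if is_valid_ahf_location "$candidate"; then
    echo "ok:       '$candidate'"
  else
    echo "rejected: '$candidate'"
  fi
done
```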

check-oracle.sh Outdated
exit 1
fi

# set to 'echo ' to only dislay commands
Member

To echo commands, I just run bash -x check-oracle.sh ...
So, if only for consistency with our other shell scripts, can we remove this debug reference?

Collaborator Author

done

Collaborator Author

however, not yet pushed to repo.

state: present

- name: Run AHF setup
shell: yes 'Y' | ./ahf_setup -extract -notfasetup
Member

Does this run AHF in daemon mode? If so, I'd prefer standalone, if only because it's one less thing to clean up. The intent here is for a one-off anyway.

Collaborator Author

AHF is not fully installed. The method used is the one documented that allows running ORAchk in standalone mode.
check-oracle.sh --ahf-uninstall removes the AHF installation.
What is left is the zip file and log file created by orachk.

I'm not sure why this would be considered a one off. ORAchk may be run repeatedly to determine if any remediation efforts were effective.

check-oracle.yml Outdated
mode: "0700"

- name: Copy AHF file from Google Storage
shell: gsutil cp {{ ORA_SWLIB_AHF_FILENAME }} {{ ahf_extract_path }}
Member

I get an error here:

    "  File \"/usr/lib64/google-cloud-sdk/platform/gsutil/gslib/cloud_api_delegator.py\", line 123, in _GetApi",
    "    self._LoadApi(provider, api_selector)",
    "  File \"/usr/lib64/google-cloud-sdk/platform/gsutil/gslib/cloud_api_delegator.py\", line 141, in _LoadApi",
    "    self.api_map[ApiMapConstants.API_MAP][provider][api_selector](",
    "  File \"/usr/lib64/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py\", line 242, in __init__",
    "    SetUpJsonCredentialsAndCache(self, logger, credentials=credentials)",
    "  File \"/usr/lib64/google-cloud-sdk/platform/gsutil/gslib/gcs_json_credentials.py\", line 218, in SetUpJsonCredentialsAndCache",
    "    api.credentials = (credentials or _CheckAndGetCredentials(logger) or",
    "                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^",
    "  File \"/usr/lib64/google-cloud-sdk/platform/gsutil/gslib/gcs_json_credentials.py\", line 301, in _CheckAndGetCredentials",
    "    gce_creds = _GetGceCreds()",
    "                ^^^^^^^^^^^^^^",
    "  File \"/usr/lib64/google-cloud-sdk/platform/gsutil/gslib/gcs_json_credentials.py\", line 464, in _GetGceCreds",
    "    return credentials_lib.GceAssertionCredentials(",
    "           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^",
    "  File \"/usr/lib64/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/credentials_lib.py\", line 263, in __init__",
    "    scopes = cached_scopes or self._ScopesFromMetadataServer(scopes)",
    "                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^",
    "  File \"/usr/lib64/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/credentials_lib.py\", line 339, in _ScopesFromMetadataServer",
    "    raise exceptions.ResourceUnavailableError(",
    "apitools.base.py.exceptions.ResourceUnavailableError: GCE credentials requested outside a GCE instance"

I'd suggest using the same mechanism we use for install-oracle.sh: running the GCS copy from the control node: https://github.com/google/oracle-toolkit/blob/633e4e61ad3f0a35ebf80e1065066bf547f53204/roles/swlib/tasks/gcscopy.yml

Collaborator Author

Can you share the command lines you are using? Otherwise I just have to guess at which command options are being used.

Member

(with some hacked up code to use my inventory file)
./check-oracle.sh --ahf-install --inventory-file inventory_files/inventory_new_RAC --ora-swlib-bucket gs://my-oracle-software --ahf-file AHF-LINUX_v25.2.2.zip

But I think the key point is that we may not be able to rely on commands like gsutil working on the DB node.

Collaborator Author

jkstill commented Mar 27, 2025

Changes to check-oracle.sh

  • --oracle-sid is now --db-name
  • --oracle-server changed to --instance-ip-addr
    • server name may be used here as well
  • --extra-vars option added
  • --debug option added
  • --ahf-location added
    • swlib options removed
  • tests in check-oracle.yml to validate files and content
    • /etc/oratab
    • ORAchk output log, used to get ORAchk report zip file
    • the ORAchk zip file
    • error messages display should these fail

Options to install

example 1:

./check-oracle.sh \
  --ahf-install \
  --db-name <your-database-name> \
  --instance-ip-addr <your-instance-ip-address|server-name> \
  --ahf-location <path-to-ahf-zip-file>

example 2:

./check-oracle.sh \
  --ahf-install \
  --inventory-file <path-to-inventory-file> \
  --ahf-location <path-to-ahf-zip-file>

Options to run

The minimum required options to run:

example 1:

./check-oracle.sh \
  --run-orachk \
  --db-name <your-database-name> \
  --instance-ip-addr <your-instance-ip-address|server-name>

example 2:

./check-oracle.sh \
  --run-orachk \
  --db-name <your-database-name> \
  --inventory-file <path-to-inventory-file>

Collaborator Author

jkstill commented Mar 27, 2025

Gists from the most recent tests.

These tests are using expedited testing, which entails the use of a dummy script to replace orachk for testing purposes.
ie. orachk will run in seconds rather than minutes, which is useful for testing logic.

This is done by adding --extra-vars "expedited_testing=true" to the command line

Tests can then be run in regular mode once changes are accepted.

'oratk-57-orachk no options'
'oratk-57-orachk fail due to invalid ahf'
'oratk-57-orachk succeed with valid ahf'
'oratk-57-orachk fail due to invalid db-name'
'oratk-57-orachk succeed with valid db-name'
'oratk-57-orachk succeed - lookup inventory file via ip address'
'oratk-57-orachk succeed - lookup inventory file with hostname'
'oratk-57-orachk succeed - uninstall'

Member

@mfielding mfielding left a comment

Unfortunately I'm still not able to get through the install stage; I attached a gist in the comments with my full --debug output in case it's useful.

help () {

echo -e "\tUsage: $(basename "$0")"
echo "${GETOPT_MANDATORY}" | sed 's/,/\n/g' | sed 's/:/ <value>/' | sed 's/\(.\+\)/\t --\1/'
Member

I do like the autogeneration of usage, but when I run it, I don't see --install-ahf at all:

$ bash ./check-oracle.sh --inventory-file inventory_files/inventory_new_RAC
Usage: check-oracle.sh
--instance-ip-addr
[ --extra-vars ]
[ --ahf-location ]
[ --db-name ]
[ --inventory-file ]
[ --ahf-install ]
[ --ahf-uninstall ]
[ --run-orachk ]
[ --help ]
[ --debug ]

--ahf-install and --run-orachk may be combined to install and run
--extra-vars is used to pass extra ansible vars
example: --extra-vars var1=val1 var2=val2 ...
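
The usage autogeneration discussed in this comment can be sketched in a portable form. This is not the `help` function from check-oracle.sh: the function name `usage_lines` is hypothetical, and the sample option string is an assumption rather than the actual value of `GETOPT_MANDATORY`. It follows the getopt convention that a trailing colon marks an option taking a value.

```shell
#!/bin/sh
# Turn a getopt-style long option list such as "instance-ip-addr:,help"
# into usage lines. A trailing ":" marks an option that takes a value.
usage_lines() {
  printf '%s\n' "$1" | tr ',' '\n' | sed -e 's/:$/ <value>/' -e 's/^/  --/'
}

usage_lines "instance-ip-addr:,ahf-location:,help"
```

Because the usage text is derived from the same option string that drives parsing, an option missing from the output usually means it is missing from the string as well.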

check-oracle.yml Outdated
- name: Download AHF file from Google Storage to the ansible control node.
local_action:
module: command
cmd: "gsutil cp {{ AHF_LOCATION }} {{ local_ahf_path }}"
Member

In my case, ansible is running this as root, and causing issues with gsutil:

mfielding_google_com@mfielding-ora122-control:~/git/orachk$ ls -l /tmp/AHF*
-rw-r--r-- 1 root root 311141407 Mar 28 20:11 /tmp/AHF-LINUX_v25.2.2.zip

TASK [Download AHF file from Google Storage to the ansible control node.] ***********************************************************************************************************
changed: [mfielding-ora122-node2 -> localhost]
fatal: [mfielding-ora122-node1 -> localhost]: FAILED! => {"changed": false, "cmd": ["gsutil", "cp", "gs://gcp-oracle-software/AHF/AHF-LINUX_v25.2.2.zip", "/tmp/AHF-LINUX_v25.2.2.zip"], "delta": "0:00:08.449122", "end": "2025-03-28 20:11:31.968966", "msg": "non-zero return code", "rc": 1, "start": "2025-03-28 20:11:23.519844", "stderr": "Copying gs://gcp-oracle-software/AHF/AHF-LINUX_v25.2.2.zip...\n/ [0 files][    0.0 B/296.7 MiB]                                                \r-\r- [0 files][ 33.8 MiB/296.7 MiB]                                                \r\\\r|\r| [0 files][ 92.8 MiB/296.7 MiB]                                                \r/\r/ [0 files][151.3 MiB/296.7 MiB]                                                \r-\r\\\r\\ [0 files][208.8 MiB/296.7 MiB]                                                \r|\r/\r/ [0 files][266.3 MiB/296.7 MiB]                                                \r-\rOSError: No such file or directory.", "stderr_lines": ["Copying gs://gcp-oracle-software/AHF/AHF-LINUX_v25.2.2.zip...", "/ [0 files][    0.0 B/296.7 MiB]                                                ", "-", "- [0 files][ 33.8 MiB/296.7 MiB]                                                ", "\\", "|", "| [0 files][ 92.8 MiB/296.7 MiB]                                                ", "/", "/ [0 files][151.3 MiB/296.7 MiB]                                                ", "-", "\\", "\\ [0 files][208.8 MiB/296.7 MiB]                                                ", "|", "/", "/ [0 files][266.3 MiB/296.7 MiB]                                                ", "-", "OSError: No such file or directory."], "stdout": "", "stdout_lines": []}

Not sure what's going on here; one thing to try might be to use gcloud storage cp instead of gsutil cp; it's more modern and has better error checking.

Collaborator Author

Copying to the localhost first and then copying to the remote does complicate things.

The 'become: true' is now being used only where necessary, so the file will be owned by the current user.

Now using the ansible stat module to check for the file, it just has to exist.

Also switched to gcloud storage cp

check-oracle.yml Outdated
- name: Download AHF file from Google Storage to the ansible control node.
local_action:
module: command
cmd: "gsutil cp {{ AHF_LOCATION }} {{ local_ahf_path }}"
Member

Separately, can we fix the quoting here? I suggest:
cmd: gsutil cp "{{ AHF_LOCATION }}" "{{ local_ahf_path }}"
(or, even better, the equivalent gcloud storage cp command)
Among other things, it will help us detect if one of these variables is empty.

Collaborator Author

Now using gcloud storage cp "{{ AHF_LOCATION }}" "{{ local_ahf_path }}"
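
The point about quoting and empty variables is easiest to see in plain shell, where the same word-splitting effect occurs. The sketch below is illustrative only; `count_args` is a hypothetical helper, and the variable values are made up to simulate an unset source location.

```shell
#!/bin/sh
# Count the arguments a command would actually receive.
count_args() {
  echo "$#"
}

AHF_LOCATION=""                       # simulate a variable that was never set
local_ahf_path="/tmp/AHF.zip"         # hypothetical destination path

# Unquoted: the empty value disappears, so the command sees one argument
# instead of two.
count_args $AHF_LOCATION $local_ahf_path      # prints 1

# Quoted: the empty value is passed through as an empty argument, which
# the copy command can then reject explicitly.
count_args "$AHF_LOCATION" "$local_ahf_path"  # prints 2
```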

@jkstill jkstill force-pushed the rel/oratk-57-orachk branch from 04b3213 to 8ebb803 Compare March 28, 2025 22:12
- block:
- name: Create AHF directory
file:
path: "{{ ahf_extract_path }}"
Member

This is failing for me, on a brand new install:

fatal: [mfielding-ora122-node1]: FAILED! => {"changed": false, "msg": "There was an issue creating /u01/AHF as requested: [Errno 13] Permission denied: b'/u01/AHF'", "path": "/u01/AHF"}

ls -ld /u01

drwxr-xr-x 4 root oinstall 32 Apr 2 22:02 /u01


- name: Create ORAchk directory
file:
path: "{{ orachk_script_dir }}"
Member

This will probably fail for me too.

Member

@mfielding mfielding left a comment

Thanks for the fixes. Still getting permission errors, sadly.

msg: "Could not locate ORAchk report file {{ ORACHK_RPT_FILE }}"
when: not orachk_rpt_location.stat.exists

- name: Fetch the ORAchk zipfile
Member

I'm getting permission errors here too:

TASK [Fetch the ORAchk zipfile] ***************************************************************************************
fatal: [mfielding-ora122-node1]: FAILED! => {"changed": false, "msg": "file is not readable: /opt/oracle.ahf/data/mfielding-ora122-node1/orachk/orachk_mfielding-ora122-node1_040325_17063.zip"}

$ sudo ls -l /opt/oracle.ahf/data/mfielding-ora122-node1/orachk/orachk_mfielding-ora122-node1_040325_17063.zip
-r--r----- 1 root root 3902854 Apr 3 17:11 /opt/oracle.ahf/data/mfielding-ora122-node1/orachk/orachk_mfielding-ora122-node1_040325_17063.zip

Collaborator Author

I had thought it might be necessary to use root to get this file, but it was not needed in my environment for some reason.
I will change that, and set ownership appropriately for the orachk zip file.

…r the local reports. added specific name to local tmp file and code to remove when no longer needed
Member

@mfielding mfielding left a comment

Looks good; with the latest changes everything's working for me.

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jkstill, mfielding

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mfielding mfielding merged commit 10aab83 into google:master Apr 4, 2025
2 checks passed
@jkstill jkstill deleted the rel/oratk-57-orachk branch April 4, 2025 15:53