Add GitHub Actions health check workflow #23036

Merged
merged 7 commits into from
Mar 6, 2025


KevinMind
Contributor

@KevinMind KevinMind commented Jan 29, 2025

Fixes: mozilla/addons#15429

Description

Introduces a new GitHub Actions workflow to periodically check service health and version information for specified endpoints. The workflow includes:

  • Scheduled runs every 5 minutes
  • Checks version information
  • Monitors service status
  • Raises an error if any services are down

Context

This PR adds the workflow for checking the health of our services and dependencies. A future PR will add a Slack notification for when failures are found.

Testing

Running the workflow

You can run the workflow manually via bash (example here)

gh --ref kevinmind/addons/15308 workflow run

Select "Health Check" and then visit https://github.com/mozilla/addons-server/actions/workflows/health_check.yml to see your run

Running locally

To inspect the logic of the health check script, you can run it manually, passing in the environment to ping.

Note

Run this inside a running web container if you don't want to install dependencies manually on your host.

./scripts/health_check.py --env prod

Warning

Passing no environment should raise an error.
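
As a rough illustration of the behavior described above (not the actual contents of `scripts/health_check.py`), a required `--env` argument can be sketched with `argparse`. The environment names here are assumptions: only `dev` and `prod` appear in this thread, and the real script may accept others. The `--verbose` flag is taken from the command shown later in this description.

```python
import argparse

# Hypothetical list of environments; only "dev" and "prod" appear in this PR thread.
ENVIRONMENTS = ("dev", "prod")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Ping the service monitors for one environment."
    )
    # required=True makes argparse print a usage error and exit with
    # status 2 when --env is omitted.
    parser.add_argument("--env", required=True, choices=ENVIRONMENTS)
    parser.add_argument("--verbose", action="store_true")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    # argv=None falls back to sys.argv[1:], which is what a CLI entry point wants.
    return build_parser().parse_args(argv)
```

With this shape, `parse_args(["--env", "prod"])` returns a namespace whose `env` is `"prod"`, while `parse_args([])` exits with a usage error instead of silently picking a default environment.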

Checking for failure

To test the error state, you can manually run the bash script that the health check runs against a modified monitors JSON.

  • First, run the Python script with the command below:
./scripts/health_check.py --verbose
  • In the monitors output, set at least one service's "state" to false and add a status message.
  • Manually run the logic in the bash script and expect errors to be printed.
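
The failure check described in these steps can be sketched in Python (the workflow itself does this in bash, so this is a stand-in, not the PR's code). The data shape is an assumption inferred from the steps above: a mapping of service name to an object with a boolean `state` and a `status` message. The service names in the sample are hypothetical.

```python
def failing_monitors(monitors: dict) -> dict:
    """Return {service: status message} for every monitor whose state is not true."""
    return {
        name: info.get("status", "")
        for name, info in monitors.items()
        # A missing "state" key is treated as a failure rather than a pass.
        if not info.get("state", False)
    }


# Hypothetical monitors payload with one service manually set to failing.
sample = {
    "database": {"state": True, "status": ""},
    "memcache": {"state": False, "status": "connection refused"},
}

failures = failing_monitors(sample)
for name, status in failures.items():
    print(f"{name} is down: {status}")
```

A caller would typically exit non-zero when `failures` is non-empty so that the workflow run is marked as failed.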

Checklist

  • Add #ISSUENUM at the top of your PR to an existing open issue in the mozilla/addons repository.
  • Successfully verified the change locally.
  • The change is covered by automated tests, or otherwise indicated why doing so is unnecessary/impossible.
  • Add before and after screenshots (Only for changes that impact the UI).
  • Add or update relevant docs reflecting the changes made.

@KevinMind KevinMind changed the base branch from devcontainer to master January 29, 2025 08:58
@KevinMind KevinMind force-pushed the kevinmind/addons/15308 branch 11 times, most recently from 359ed4d to 14df65d Compare January 29, 2025 11:48
@KevinMind KevinMind requested a review from diox January 29, 2025 11:49
@KevinMind KevinMind force-pushed the kevinmind/addons/15308 branch from 14df65d to 311332d Compare January 29, 2025 13:15
@KevinMind KevinMind force-pushed the kevinmind/addons/15308 branch 3 times, most recently from 3f76254 to 6df4943 Compare March 4, 2025 11:47
@KevinMind KevinMind requested a review from diox March 4, 2025 15:41
@KevinMind KevinMind force-pushed the kevinmind/addons/15308 branch 3 times, most recently from 678c7c5 to 916ac21 Compare March 4, 2025 16:20
@KevinMind KevinMind marked this pull request as draft March 4, 2025 16:39
@KevinMind KevinMind force-pushed the kevinmind/addons/15308 branch 2 times, most recently from c6b4e04 to 16283cf Compare March 4, 2025 16:58
- we need a migrated database to pass healthchecks on the database, but we don't necessarily need a fully seeded DB
@KevinMind KevinMind force-pushed the kevinmind/addons/15308 branch from 16283cf to ef19c7e Compare March 4, 2025 17:11
@KevinMind KevinMind marked this pull request as ready for review March 4, 2025 17:17
@KevinMind KevinMind force-pushed the kevinmind/addons/15308 branch from ef19c7e to e67ac81 Compare March 6, 2025 09:25
@KevinMind
Contributor Author

@diox should be ready to verify.

Member

@diox diox left a comment

While the run I just did is marked as successful, if you dig through the logs it raised a 404 and logged an error:

Requesting https://addons-dev.allizom.org/services/monitors.json for dev
{'error': JSONDecodeError('Expecting value: line 1 column 1 (char 0)'), 'data': None, 'response': <Response [404]>}

@KevinMind
Contributor Author

While the run I just did is marked as successful, if you dig through the logs it raised a 404 and logged an error:

Requesting https://addons-dev.allizom.org/services/monitors.json for dev
{'error': JSONDecodeError('Expecting value: line 1 column 1 (char 0)'), 'data': None, 'response': <Response [404]>}

Typo in the URL: it's services/monitor.json

@KevinMind
Contributor Author

@diox the issue was in the retry mechanism in the healthcheck script: we were not raising on the final attempt. That is fixed now, and I've verified it exits 1 with the incorrect URL. I've also fixed the URL and the related tests.
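
For readers following along, the class of bug described here can be illustrated with a minimal retry helper. This is a sketch under assumptions, not the script's actual code: the name `fetch_with_retry`, its parameters, and the injected `fetch` callable are all hypothetical. The key point is that the last failed attempt re-raises, so the caller (and the process exit code) sees the error.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retry(fetch: Callable[[], T], attempts: int = 3, delay: float = 0.0) -> T:
    """Call fetch() up to `attempts` times and re-raise the error from the last try."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception:
            # On the final attempt, propagate the error instead of swallowing it.
            # Skipping this re-raise is what lets a failing run look successful.
            if attempt == attempts:
                raise
            time.sleep(delay)
    # The loop always returns or raises; this line only guards against edits.
    raise AssertionError("unreachable")
```

Passing the request as a callable keeps the helper independent of any HTTP library; in a real script, `fetch` would wrap the request and JSON decoding so that a 404 or a `JSONDecodeError` counts as a failed attempt.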

@KevinMind KevinMind requested a review from diox March 6, 2025 12:23
@KevinMind KevinMind merged commit ebc7146 into master Mar 6, 2025
45 checks passed
@KevinMind KevinMind deleted the kevinmind/addons/15308 branch March 6, 2025 13:12
Development

Successfully merging this pull request may close these issues.

[Task]: add healthcheck script to verify if any monitors are failing
2 participants