Setup Monit to flip switches in monitor.webplatform.org #160

Open
renoirb opened this issue Apr 29, 2015 · 0 comments

renoirb commented Apr 29, 2015

We now have Monit monitoring crucial services on each node.

How about having Monit automatically flip the switches on our service status page?

Estimated work items

  1. Find out how to catch service recovery in Monit so that we can send a "recovered" call (i.e. service tests)
  2. Adjust Cachet to have a 1:1 mapping between components and Monit checks
  3. Map Monit statuses 1:1 to Cachet statuses
  4. Find a way to make Monit pass variables into the update script
  5. Create the Cachet API update script that’ll be used by Monit
  6. Create an account that can only make API updates
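
Items 2 to 4 could be sketched as a small lookup layer inside the update script. This is only a sketch under assumptions: the function names and state keywords are hypothetical, only the `mysql` to component 2 pairing comes from the example in this issue, and the status numbers other than 3 are my reading of Cachet's component statuses and should be checked against our instance.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not an existing Monit or Cachet API.

# Map a Monit check name to a Cachet component id.
# Only "mysql" -> 2 is taken from this issue; add one line per Monit check.
cachet_component_id() {
  case "$1" in
    mysql) echo 2 ;;
    *) return 1 ;;   # unknown check: let the caller decide what to do
  esac
}

# Map a state keyword to a Cachet component status id.
# The keywords are made up for this sketch; 3 = "Partial Outage" matches
# the API response shown in this issue, the other numbers are assumptions.
cachet_status_id() {
  case "$1" in
    operational) echo 1 ;;
    performance) echo 2 ;;
    partial)     echo 3 ;;
    major)       echo 4 ;;
    *) return 1 ;;
  esac
}
```

Monit's manual describes a `MONIT_SERVICE` environment variable for programs started through `exec`, so the script could call `cachet_component_id "$MONIT_SERVICE"` and take the state keyword as its first argument.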

Proposal

Cachet’s documentation is not very complete, but we could use a Monit event handler (see how they’d do it with a 3rd-party provider).

Configure Monit to make a trigger

# An example of Salt stack managed Monit template
# refer to salt-states/mysql/files/monit.conf.jinja

check process mysql
  matching "mysql"
  group database
  start = "/usr/sbin/service mysql start"
  stop  = "/usr/sbin/service mysql stop"
  if failed host {{ ip4_interfaces[0]|default('127.0.0.1') }} port 3306
    protocol MYSQL then restart
  if not exist for 3 cycles then restart
  if 3 restarts within 5 cycles then exec /path/to/monit_update_cachet_db.sh

Setup an update script

#!/bin/sh

# /path/to/monit_update_cachet_db.sh

# Send an update to the Cachet API
# -u would contain the pre-populated Cachet update-only user
# components/2 would be the component id
# We’d have to figure out how Monit reports status and make sure the value at status=3 is the right one
# 10.10.10.2:8000 is the internal upstream service we send our update requests to

/usr/bin/curl -u user:pass -XPUT \
  -d status=3 \
  10.10.10.2:8000/api/components/2
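
To reuse the same script for every check, the component id and status could become arguments. A minimal sketch, assuming hypothetical variable names (`CACHET_URL`, `CACHET_AUTH`, `CACHET_DRY_RUN`) and a hypothetical `update_component` function; the curl flags are the same as in the script above:

```shell
#!/bin/sh
# Sketch only: variable and function names are assumptions for illustration.

CACHET_URL="${CACHET_URL:-10.10.10.2:8000}"   # internal upstream from this issue
CACHET_AUTH="${CACHET_AUTH:-user:pass}"       # placeholder update-only credentials

# update_component <component_id> <status_id>
update_component() {
  if [ "$#" -ne 2 ]; then
    echo "usage: update_component <component_id> <status_id>" >&2
    return 2
  fi
  if [ "${CACHET_DRY_RUN:-0}" = "1" ]; then
    # Print what would be sent (without credentials) instead of calling curl
    echo "PUT status=$2 $CACHET_URL/api/components/$1"
    return 0
  fi
  /usr/bin/curl -u "$CACHET_AUTH" -XPUT \
    -d "status=$2" \
    "$CACHET_URL/api/components/$1"
}
```

Running it with `CACHET_DRY_RUN=1` set lets us check the Monit wiring without touching the status page.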

Example on how to update a component status

Using curl, an update of the database component to "partial outage" would look like this:

API call

It’s using component status 3, which means "Partial Outage" (as the response confirms). See also post-parameters section.

curl -u user:pass -XPUT -d status=3 10.10.10.2:8000/api/components/2
{
    "data": {
        "created_at": 1427482793,
        "description": "MariaDB database cluster nodes",
        "id": 2,
        "incident_count": 0,
        "name": "db cluster",
        "status": "Partial Outage",
        "status_id": 3,
        "updated_at": 1430332325
    }
}
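
The update script could also confirm that the change was applied by reading `status_id` back from the response. A rough sketch, assuming the response has the shape shown above; `response_status_id` is a hypothetical helper, and the `sed` match is brittle compared with a real JSON parser:

```shell
#!/bin/sh
# Sketch only: reads a Cachet JSON response on stdin and prints the first
# number that follows "status_id". Assumes the response shape shown above.
response_status_id() {
  sed -n 's/.*"status_id"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

The script could then compare the printed value with the status it sent and exit non-zero when they differ.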

How the status is displayed

[Screenshot: monitor-dashboard-2]
