Monitoring And Alerting Essentials

Purpose

This guide explains how to improve system reliability through logging, monitoring, alerting, and a process of continuous improvement.

If we want to consider ourselves 'engineers', our systems need to work reliably. When they do not, we need to know when they are failing and why.

Content

There are several parts to this documentation.

Scope

This documentation is primarily limited to application logging. OS, web service, and other types of logging will not be covered.

References

Most of the ideas in this repo come from the sources where I learned them: