Check out the documentation here
A Kubernetes controller for Prometheus Anomaly Detection
K8S Anomaly Detection Operator is a controller that helps manage Detector deployments for a Kubernetes cluster using CRDs (Custom Resource Definitions). A Detector deployment lets you configure Prometheus endpoints, PromQL expressions, and some extra settings to monitor and alert on anomalies found within a given timeframe.
This operator lets you choose metrics and fine-tune the configuration to detect anomalies in your system over time in a simple, scalable manner, preempting events such as regular, repeated high load. This allows proactive rather than merely reactive maintenance of production environments, and intelligent ahead-of-time decisions.
The operator suits systems whose metrics follow predictable trends. For example, if over a 24-hour period the load on a resource is generally higher between 3pm and 5pm, then with enough data and the correct configuration the detector could expose an anomaly and notify the relevant team so they can look deeper into it, making the system more responsive to changes in its metrics.
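To make the idea of a time-of-day trend concrete, here is a minimal sketch in plain Python. It is not how the operator works internally (the operator relies on Prophet); it swaps in a much simpler per-hour mean and standard deviation baseline, and the function names, the threshold `k`, and the synthetic data are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def hourly_baseline(history):
    """Group (hour_of_day, value) samples and return {hour: (mean, stdev)}."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    # stdev needs at least two samples, so skip hours with less history.
    return {h: (mean(v), stdev(v)) for h, v in by_hour.items() if len(v) >= 2}


def is_anomaly(baseline, hour, value, k=3.0):
    """Flag a value more than k standard deviations from that hour's mean."""
    if hour not in baseline:
        return False  # not enough history for this hour
    mu, sigma = baseline[hour]
    return abs(value - mu) > k * max(sigma, 1e-9)


# Synthetic history (illustrative only): 14 days of hourly samples,
# with load peaking between 15:00 and 17:00.
history = [
    (hour, (80.0 if 15 <= hour < 17 else 20.0) + (day % 3))
    for day in range(14)
    for hour in range(24)
]
baseline = hourly_baseline(history)
```

With this baseline, a reading of 81 at 15:00 matches the learned afternoon peak, while the same reading at 03:00 is flagged because it is far outside what that hour normally looks like.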
- Integrates simply with Prometheus metrics.
- Leverages the Prophet framework by Facebook: https://github.com/facebook/prophet
- Allows customization of Kubernetes resource spec. Can work on managed solutions such as EKS or GKE.
- Lightweight and scalable.
- Simplified configuration and easy integration.
Detectors are designed to be as simple as possible, with some optional configuration options.
Detectors have their own custom resource:
apiVersion: monitoring.amitdebachar/v1alpha1
kind: Detector
metadata:
  name: minimal-detector
spec:
  image: "amitde7896/anomaly-operator:latest-detector"
  prom_url: "http://prometheus.monitoring.svc.cluster.local"
  interval_mins: "15"
  queries:
    - name: "sum_pods_running_anomaly"
      query: 'sum(kube_pod_status_phase{phase=~"Running", pod=~"application-pod-.*"}) > 1'
      train_window: "14d"
Every 15 minutes, this Detector queries Prometheus for the kube_pod_status_phase metric over the past 2 weeks (14d), then evaluates the past 1 hour (configurable) for anomalies based on the trained trend.
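The time window described above can be sketched as a Prometheus range query. The snippet below is an illustration, not the operator's actual code: it assumes the training data is fetched through Prometheus's `/api/v1/query_range` HTTP endpoint, and the helper names, the supported unit letters, and the `step` value of `5m` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Unit letters accepted in window strings such as "14d". This mapping is an
# assumption for the sketch, not the operator's documented grammar.
_UNITS = {"m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_window(window):
    """Convert a window string like "14d" into a timedelta."""
    unit = window[-1]
    if unit not in _UNITS:
        raise ValueError(f"unsupported unit in window: {window!r}")
    return timedelta(**{_UNITS[unit]: int(window[:-1])})


def build_range_url(prom_url, query, train_window, now, step="5m"):
    """Build a query_range URL covering [now - train_window, now]."""
    start = now - parse_window(train_window)
    params = {
        "query": query,
        "start": f"{start.timestamp():.0f}",  # Unix seconds
        "end": f"{now.timestamp():.0f}",
        "step": step,  # resolution of returned samples (hypothetical choice)
    }
    return f"{prom_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


now = datetime(2024, 1, 15, tzinfo=timezone.utc)
url = build_range_url(
    "http://prometheus.monitoring.svc.cluster.local",
    'sum(kube_pod_status_phase{phase=~"Running", pod=~"application-pod-.*"}) > 1',
    "14d",
    now,
)
```

A detector loop could rebuild this URL on every interval so that the training window always ends at the current time, then inspect only the most recent hour of the result for anomalies.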
The operator for managing Detectors can be installed using Helm:
git clone https://github.com/amitde69/anomaly-operator
cd anomaly-operator
helm install anomaly-operator helm/
Check out the getting started guide and the examples for ways to use Detectors.
See the wiki for more information, such as guides and references.
See the examples/ directory for working code samples.