Merged
5 changes: 5 additions & 0 deletions .dockerignore
@@ -1,3 +1,8 @@
# More info: https://docs.docker.com/engine/reference/builder/#dockerignore-file
# Ignore build and test binaries.
bin/
docs/
helm/
dashboards/
assets/
adr/
6 changes: 6 additions & 0 deletions .gitignore
@@ -23,3 +23,9 @@ Dockerfile.cross
*.swp
*.swo
*~

# Dashboard vendor dependencies (generated from jsonnetfile.lock.json)
dashboards/vendor/

# Repomix
repomix-output.xml
7 changes: 7 additions & 0 deletions .repomixignore
@@ -0,0 +1,7 @@
# Add patterns to ignore here, one per line
# Example:
# *.log
# tmp/
config/crd/**/*
assets/
repomix-output.xml
105 changes: 105 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,105 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

OSKO is a Kubernetes operator for managing SLIs (Service Level Indicators), SLOs (Service Level Objectives), alerting rules, and alert routing via Kubernetes CRDs according to the OpenSLO specification. It aims to provide simple management of observability concepts in Kubernetes environments, with particular focus on Prometheus/Mimir-based monitoring stacks.

## Common Development Commands

### Building and Development
- `make build` - Build the manager binary
- `make run` - Run the controller locally (requires K8s cluster context)
- `make run-pretty-debug` - Run with debug output and pretty formatting using zap-pretty
- `make install run` - Install CRDs and run controller in one step

### Code Generation and Manifests
- `make manifests` - Generate CRDs, RBAC, and webhook configurations
- `make generate` - Generate DeepCopy methods for API types
- `make fmt` - Format Go code
- `make vet` - Run go vet

### Testing
- `make test` - Run all tests (includes manifests, generate, fmt, vet, and test execution)
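- `KUBEBUILDER_ASSETS="$(shell $(ENVTEST) use $(ENVTEST_K8S_VERSION) --bin-dir $(LOCALBIN) -p path)" go test ./...` - Run tests directly. This is the Makefile form: `$(shell ...)` and the `$(ENVTEST)`, `$(ENVTEST_K8S_VERSION)`, and `$(LOCALBIN)` variables are expanded by make, so substitute concrete values when running it from a shell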

### Deployment
- `make install` - Install CRDs into current K8s context
- `make uninstall` - Remove CRDs from cluster
- `make deploy` - Deploy controller to cluster
- `make undeploy` - Remove controller from cluster
- `make docker-build` - Build Docker image
- `make docker-push` - Push Docker image

### Development Environment
- `make deploydev` - Deploy development stack (Grafana, Mimir) with port-forwards
- `make undeploydev` - Clean up development environment

## Architecture

### API Groups and Versions
- **openslo.com/v1**: Core OpenSLO specification resources (SLO, SLI, Datasource, AlertPolicy, etc.)
- **osko.dev/v1alpha1**: Operator-specific resources (MimirRule, AlertManagerConfig, etc.)

### Controller Structure
Controllers are organized by API group:
- `internal/controller/openslo/`: Controllers for OpenSLO resources
- `internal/controller/osko/`: Controllers for operator-specific resources
- `internal/controller/monitoring.coreos.com/`: Controllers for Prometheus Operator resources

### Key Controllers
- **SLO Controller** (`slo_controller.go`): Main controller implementing ownership model, creates PrometheusRules, MimirRules, inline SLIs, and AlertManagerConfigs
- **SLI Controller**: Manages Service Level Indicators
- **Datasource Controller**: Manages data source connections (Mimir, Cortex)
- **MimirRule Controller**: Manages Mimir-specific rule configurations
- **AlertManagerConfig Controller**: Manages AlertManager routing configurations

### Ownership Model
OSKO distinguishes resources an SLO owns from resources it only references:
- **Owned Resources** (cascading deletion): inline SLIs, PrometheusRules, MimirRules, AlertManagerConfigs
- **Referenced Resources** (preserved): shared Datasources, referenced SLIs, AlertPolicies
- Uses Kubernetes finalizers for proper cleanup of external system resources

### Resource Dependencies
```
SLO -> SLI (inline or referenced) -> Datasource
SLO -> PrometheusRule (owned)
SLO -> MimirRule (owned)
SLO -> AlertManagerConfig (owned, when magicAlerting enabled)
```

## Key Directories

- `api/`: API type definitions for both openslo.com and osko.dev groups
- `internal/controller/`: Controller implementations
- `internal/helpers/`: Helper utilities for Prometheus and Mimir integration
- `internal/config/`: Configuration management
- `config/`: Kubernetes manifests (CRDs, RBAC, deployment configs)
- `helm/`: Helm charts for deployment
- `examples/`: Example resource manifests
- `docs/`: Additional documentation

## Important Implementation Notes

### Testing Requirements
- Always run `make test` before submitting changes
- Tests require KUBEBUILDER_ASSETS to be set up (handled automatically by make test)
- Integration tests exist for the ownership model in `slo_controller_test.go`

### Development Dependencies
- Requires Prometheus Operator CRDs: `helm install prometheus-operator-crds prometheus-community/prometheus-operator-crds`
- Uses controller-runtime framework
- Built with Kubebuilder

### Magic Alerting
SLOs can enable automatic AlertManager configuration via the `osko.dev/magicAlerting: "true"` annotation, which creates owned AlertManagerConfig resources for alert routing.

### Inline vs Referenced SLIs
- **Inline SLIs** (defined in `spec.indicator`): Created and owned by the SLO
- **Referenced SLIs** (defined via `spec.indicatorRef`): External resources that are referenced but not owned
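
The ownership rules above can be summarized as a small decision function. This is only an illustrative Python sketch, not the controller's Go code: the `owned_resources` helper and the dict-shaped manifest are hypothetical, and treating PrometheusRule and MimirRule as always created is an assumption made for brevity.

```python
MAGIC_ALERTING_ANNOTATION = "osko.dev/magicAlerting"


def owned_resources(slo: dict) -> list[str]:
    """List the resource kinds an SLO would own, per the notes above."""
    spec = slo.get("spec", {})
    annotations = slo.get("metadata", {}).get("annotations", {})

    # Assumption: rule resources are always created for an SLO.
    owned = ["PrometheusRule", "MimirRule"]
    # An inline SLI lives in spec.indicator; spec.indicatorRef only points at
    # an external SLI, which is preserved when the SLO is deleted.
    if "indicator" in spec:
        owned.append("SLI")
    # Magic alerting is opt-in via an annotation with the string value "true".
    if annotations.get(MAGIC_ALERTING_ANNOTATION) == "true":
        owned.append("AlertManagerConfig")
    return owned


if __name__ == "__main__":
    # Hypothetical manifest: inline SLI with magic alerting enabled.
    inline_slo = {
        "metadata": {"annotations": {MAGIC_ALERTING_ANNOTATION: "true"}},
        "spec": {"indicator": {"metadata": {"name": "example-sli"}}},
    }
    print(owned_resources(inline_slo))
```

An SLO that uses `spec.indicatorRef` and has no annotation would yield only the two rule kinds, which matches the "referenced but not owned" behavior described above.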

### External Systems Integration
- **Mimir/Cortex**: Via MimirRule controller and connection details
- **Prometheus**: Via PrometheusRule resources compatible with prometheus-operator
- **AlertManager**: Via AlertManagerConfig for routing configuration
90 changes: 90 additions & 0 deletions devel/dashboards/README.md
@@ -0,0 +1,90 @@
# OSKO Grafana Dashboards

This directory contains Grafonnet templates for generating Grafana dashboards to monitor SLO performance and metrics.

## SLO Performance Dashboard

The `slo-performance-dashboard.jsonnet` template creates a dashboard matching the OSKO SLO monitoring requirements with the following panels:

### Panels Included:
- **STATUS**: Current SLI value as percentage with color-coded thresholds
- **ERROR BUDGET LEFT**: Remaining error budget as horizontal bar gauge with time remaining
- **Error budget burndown**: Time series chart showing error budget consumption over time
- **Burn rate**: Time series chart of the error budget burn rate, making spikes visible

### Template Variables:
- `$datasource`: Prometheus datasource selection
- `$slo_name`: SLO name selector
- `$service`: Service name selector

### Expected Metrics:
The dashboard expects the following Prometheus metrics to be available from OSKO:
- `osko_sli_measurement{slo_name, service, window}`: Current SLI measurement (0-1, displayed as percentage)
- `osko_error_budget_value{slo_name, service, window}`: Current error rate (1 - SLI measurement, 0-1)
- `osko_slo_target{slo_name, service}`: SLO target threshold
- `osko_error_budget_burn_rate{slo_name, service, window}`: Rate of error budget consumption
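
To show how the template variables could map onto these labels, here is a minimal Python sketch that builds a PromQL selector string. The `selector` helper is hypothetical, and the actual queries in `slo-performance-dashboard.jsonnet` are not shown here, so treat the result as an example of the shape rather than the template's real expression.

```python
def selector(metric: str, **labels: str) -> str:
    """Build a PromQL instant-vector selector with exact-match label filters.

    Label values are inserted as-is (no escaping), which is fine for Grafana
    template variables such as $slo_name but not for arbitrary user input.
    """
    matchers = ", ".join(f'{name}="{value}"' for name, value in labels.items())
    return f"{metric}{{{matchers}}}"


if __name__ == "__main__":
    # Grafana substitutes $slo_name and $service before the query is sent.
    print(selector("osko_sli_measurement", slo_name="$slo_name", service="$service"))
```

Note that `osko_slo_target` carries no `window` label, so an expression combining it with a windowed metric would likely need explicit vector matching on `slo_name` and `service`.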

### Calculated Metrics:
The dashboard calculates these derived metrics:
- **Error Budget Remaining**: `((sli_measurement - slo_target) / (1 - slo_target)) * 100` (as percentage)
- **Burn Rate**: Uses `osko_error_budget_burn_rate` metric directly
- **Time Left** (if needed): `error_budget_remaining / burn_rate` (in time units)

### Important Note:
`osko_error_budget_value` represents the current error rate (1 - sli_measurement), not error budget consumed. The dashboard correctly calculates error budget remaining relative to the SLO target.
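
The arithmetic behind the formula can be checked with a short Python sketch. The function names and sample values are illustrative assumptions. Only the formula itself comes from this README.

```python
def error_rate(sli_measurement: float) -> float:
    """Fraction of bad events: 1 - SLI, as described in the note above."""
    return 1.0 - sli_measurement


def error_budget_remaining_pct(sli_measurement: float, slo_target: float) -> float:
    """Remaining error budget as a percentage, per the README formula."""
    if not 0.0 <= slo_target < 1.0:
        # A target of exactly 1 leaves no budget and would divide by zero.
        raise ValueError("slo_target must be in [0, 1)")
    return (sli_measurement - slo_target) / (1.0 - slo_target) * 100.0


if __name__ == "__main__":
    # Hypothetical values: 99.9% measured against a 99% target.
    print(f"error rate: {error_rate(0.999):.4f}")
    print(f"budget remaining: {error_budget_remaining_pct(0.999, 0.99):.1f}%")
```

With a 99% target the total budget is 1% of events. A measured 0.1% error rate therefore leaves roughly 90% of the budget. A measurement equal to the target leaves 0%, and anything below the target produces a negative value, meaning the budget is overspent.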

## Usage

### Prerequisites
1. Install the Grafonnet library:
```bash
jb install # Installs dependencies from jsonnetfile.lock.json to vendor/
```

2. Ensure the `jsonnet` command is available

**Note**: The `vendor/` directory is generated and not committed to git. Use `jb install` to regenerate it from the lock file.

### Generate Dashboard JSON
```bash
jsonnet slo-performance-dashboard.jsonnet > slo-performance-dashboard.json
```

### Import to Grafana
1. Open Grafana UI
2. Go to "+" → "Import"
3. Upload the generated JSON file or paste its contents
4. Configure the Prometheus datasource
5. Save the dashboard

### Example jsonnetfile.json
```json
{
  "version": 1,
  "dependencies": [
    {
      "source": {
        "git": {
          "remote": "https://github.com/grafana/grafonnet-lib.git",
          "subdir": "grafonnet"
        }
      },
      "version": "master"
    }
  ],
  "legacyImports": true
}
```

## Customization

The template can be customized by modifying:
- Metric names and labels to match your OSKO deployment
- Thresholds and colors for status indicators
- Time ranges and refresh intervals
- Panel layouts and sizing
- Additional template variables for filtering

## Integration with OSKO

This dashboard is designed to work with the OSKO operator's metric exposition. Ensure your OSKO deployment is configured to expose the required metrics through your Prometheus/Mimir setup.
15 changes: 15 additions & 0 deletions devel/dashboards/jsonnetfile.json
@@ -0,0 +1,15 @@
{
  "version": 1,
  "dependencies": [
    {
      "source": {
        "git": {
          "remote": "https://github.com/grafana/grafonnet-lib.git",
          "subdir": "grafonnet"
        }
      },
      "version": "master"
    }
  ],
  "legacyImports": true
}
16 changes: 16 additions & 0 deletions devel/dashboards/jsonnetfile.lock.json
@@ -0,0 +1,16 @@
{
  "version": 1,
  "dependencies": [
    {
      "source": {
        "git": {
          "remote": "https://github.com/grafana/grafonnet-lib.git",
          "subdir": "grafonnet"
        }
      },
      "version": "a1d61cce1da59c71409b99b5c7568511fec661ea",
      "sum": "342u++/7rViR/zj2jeJOjshzglkZ1SY+hFNuyCBFMdc="
    }
  ],
  "legacyImports": false
}