
New use case: MLOps lifecycle automation — model-registry promotion, drift remediation, training-pipeline failure triage, and scheduled fairness audits via agent-driven workflows #1037

@kelos-bot


Summary

Kelos ships strong primitives for engineering-team automation (GitHub/Linear/Jira sources, generic webhooks, Prometheus alerts, K8s events, cron), but no example, template, or proposal addresses the MLOps / ML model lifecycle. That is a notable gap given that Kelos is Kubernetes-native and the K8s ecosystem is the de facto control plane for ML: MLflow, KServe, Kubeflow Pipelines, Argo Workflows, Flyte, Seldon, BentoML, Ray, and Feast all run on K8s and emit events Kelos can already consume via the webhook, cron, and prometheusAlerts (#775, merged) sources.

ML platforms have an unusually high autonomous-agent payoff: lifecycle events (promotions, drift alerts, pipeline failures, scheduled audits) are frequent and machine-readable, remediation is often formulaic (manifest bumps, config changes, retraining triggers), and compliance work demands a written evidence trail.

This issue proposes four concrete TaskSpawner patterns for the MLOps lifecycle, identifies the ecosystem they target, and notes which existing Kelos primitives already cover them vs. what minor gaps could be closed in follow-ups.

Target Audience

  • ML platform engineers running MLflow / KServe / Kubeflow / Seldon on K8s, owning the serving stack and registry plumbing.
  • ML engineers and data scientists owning model code, training pipelines, evaluation harnesses, and notebooks.
  • SRE-for-ML / on-call rotations that today catch drift alerts and fairness regressions manually.

Proposed TaskSpawner Patterns

All four patterns work with Kelos's current API; no CRD changes are required to land the examples. The first three use the existing webhook (GenericWebhook) source merged via #687; the fourth uses cron.

Pattern 1 — MLflow Model Registry promotion → update KServe InferenceService manifest

Trigger: MLflow's model-registry webhook fires on MODEL_VERSION_TRANSITIONED_STAGE (e.g., Staging → Production).
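
For reference, a hypothetical payload shape matching the `fieldMapping` in the TaskSpawner below. MLflow's actual webhook schema may differ, so treat the field names as an assumption and verify them against the MLflow webhook docs before deploying:

```json
{
  "id": "evt-7f3a",
  "event": "MODEL_VERSION_TRANSITIONED_STAGE",
  "model_name": "churn-classifier",
  "version": "12",
  "from_stage": "Staging",
  "to_stage": "Production",
  "run_id": "a1b2c3d4"
}
```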

Agent task: Open a PR updating the corresponding InferenceService manifest in the GitOps repo with the new model URI, runtime version, and resource requests pulled from the registry's tags. Generate a model card delta from MLflow run metadata. Re-run smoke tests against the staging endpoint.

apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: mlflow-promotion-responder
spec:
  when:
    webhook:
      source: mlflow
      fieldMapping:
        id: "$.id"
        modelName: "$.model_name"
        version: "$.version"
        toStage: "$.to_stage"
        runId: "$.run_id"
      filters:
        - field: "$.to_stage"
          value: "Production"
  taskTemplate:
    type: claude-code
    credentials:
      type: api-key
      secretRef: { name: claude-credentials }
    workspaceRef: { name: gitops-inference-manifests }
    branch: "model-promotion/{{.modelName}}-v{{.version}}"
    promptTemplate: |
      Model `{{.modelName}}` version `{{.version}}` was promoted to **Production** in MLflow
      (run: {{.runId}}).

      Please:
      1. Locate the corresponding KServe InferenceService manifest in this repo
         (search by name `{{.modelName}}` under `inference/`).
      2. Update `spec.predictor.model.storageUri` to the new artifact URI from MLflow.
      3. Update resource requests from the new run's tags (`gpu_type`, `replica_min`, `replica_max`).
      4. Append a model-card entry under `model-cards/{{.modelName}}.md` summarizing the
         run's eval metrics (accuracy, calibration, fairness deltas) compared to the
         currently-deployed version.
      5. Open a PR with a checklist for the on-call to verify the staging smoke test
         and approve rollout.

Pattern 2 — Drift detector webhook (Evidently / NannyML / Arize / Fiddler / WhyLabs) → open retraining PR

Trigger: Generic webhook from any drift-monitoring platform. All five vendors support outbound webhooks, and their payloads typically carry the model, feature, severity, metric, and timestamp fields this pattern needs — but the exact JSON paths vary by vendor, so adjust the fieldMapping accordingly.

Agent task: Investigate the flagged feature, propose a retraining plan in a draft PR (config bumps to the training pipeline definition, dataset window adjustment, retraining trigger), or open an issue if root cause looks like an upstream data-pipeline bug.
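
As a reference for the `fieldMapping` in the TaskSpawner below, a hypothetical Evidently-style drift-alert payload. The real payload shape differs per vendor and test suite; these field names are an illustrative assumption to adapt:

```json
{
  "test_id": "drift-2024-05-13-001",
  "model_name": "churn-classifier",
  "feature": "days_since_last_login",
  "severity": "high",
  "metric_name": "population_stability_index",
  "metric_value": 0.31
}
```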

apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: drift-remediation
spec:
  when:
    webhook:
      source: evidently
      fieldMapping:
        id: "$.test_id"
        model: "$.model_name"
        feature: "$.feature"
        severity: "$.severity"
        metric: "$.metric_name"
        metricValue: "$.metric_value"
      filters:
        - field: "$.severity"
          pattern: "^(high|critical)$"
  taskTemplate:
    type: claude-code
    credentials:
      type: api-key
      secretRef: { name: claude-credentials }
    workspaceRef: { name: ml-training-pipelines }
    branch: "drift/{{.model}}-{{.feature}}-{{.id}}"
    promptTemplate: |
      Evidently flagged drift on model **{{.model}}**, feature **{{.feature}}**:
      `{{.metric}} = {{.metricValue}}` (severity: {{.severity}}).

      Investigate by:
      1. Locating this model's training pipeline (likely under `pipelines/{{.model}}/`).
      2. Checking whether the upstream feature pipeline has had recent schema changes
         (`git log` on `features/{{.feature}}*`).
      3. Producing one of:
         - **Draft retraining PR**: bump dataset window, adjust feature transforms, bump
           pipeline parameters file. Add a checklist for evaluation thresholds.
         - **Issue (label `data-quality`)**: if the root cause looks like an upstream
           pipeline bug, not a model-decay issue.
      Quote the relevant metric thresholds from the pipeline's config so the human
      reviewer can audit your decision.
    ttlSecondsAfterFinished: 86400

Pattern 3 — Training-pipeline failure (Argo Workflows / Kubeflow Pipelines / Flyte) → root-cause analysis PR

Trigger: Kubernetes Events on WorkflowFailed (this would benefit from the proposed #872 kubernetesEvents source; until then, Argo's outbound webhook notifications or a Prometheus alert via #775 work).

Agent task: Pull the failed workflow's logs and pod statuses, classify the failure (OOM / image pull / data missing / actual code bug / GPU resource), and either open an issue tagging the pipeline owner with a remediation suggestion, or open a PR for clear-cut fixes (resource bumps, image digest pins, retry policy).

This pattern is identical in shape to #946's CI/CD failure auto-remediation, just specialized to ML workflow CRDs.
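
For completeness, a minimal sketch of this pattern using the existing webhook source, assuming Argo Workflows is configured to POST a failure payload from an exit handler. The `argo-workflows` source label and the payload field names are illustrative assumptions, not a fixed Argo schema:

```yaml
apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: training-failure-triage
spec:
  when:
    webhook:
      source: argo-workflows
      fieldMapping:
        id: "$.workflow_name"
        pipeline: "$.pipeline"
        failedNode: "$.failed_node"
        phase: "$.phase"
      filters:
        - field: "$.phase"
          value: "Failed"
  taskTemplate:
    type: claude-code
    credentials:
      type: api-key
      secretRef: { name: claude-credentials }
    workspaceRef: { name: ml-training-pipelines }
    branch: "pipeline-failure/{{.pipeline}}-{{.id}}"
    promptTemplate: |
      Training workflow `{{.id}}` for pipeline `{{.pipeline}}` failed at node
      `{{.failedNode}}`.

      1. Pull the failed node's logs and pod statuses.
      2. Classify the failure: OOM, image pull, missing data, code bug, or GPU starvation.
      3. For clear-cut infra fixes (resource bumps, image digest pins, retry policy),
         open a PR; otherwise open an issue tagging the pipeline owner with a
         remediation suggestion.
```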

Pattern 4 — Scheduled fairness / bias audit and model-card refresh

Trigger: cron on a weekly or monthly cadence.

Agent task: For each registered production model, run a templated fairness sweep against the eval dataset, regenerate the model card, and open a PR if metrics have shifted by more than a configured threshold. This produces the evidence trail demanded by EU AI Act / NIST AI RMF audits without pulling humans in every cycle.

apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: weekly-model-card-refresh
spec:
  when:
    cron:
      schedule: "0 6 * * 1"   # Mondays 06:00 UTC
  taskTemplate:
    type: claude-code
    credentials:
      type: api-key
      secretRef: { name: claude-credentials }
    workspaceRef: { name: model-cards-repo }
    branch: "model-card-refresh-{{.Time.Format \"2006-01-02\"}}"
    promptTemplate: |
      Weekly model-card and fairness-audit refresh.

      For each production-stage model registered in MLflow (read via the
      `MLFLOW_TRACKING_URI` env var):
      1. Pull the latest eval-set metrics including disaggregated fairness slices.
      2. Diff against the metrics in `model-cards/<model>.md`.
      3. Update model cards whose drift exceeds the thresholds defined in
         `model-cards/POLICY.md`.
      4. Open a single PR titled "Weekly model-card refresh — <date>" listing
         every changed card, with a digest of metric deltas in the PR body.

Why this is differentiated from existing issues

| Existing issue | Why MLOps is distinct |
|---|---|
| #920 security vuln auto-remediation | ML triggers are registry / drift / training-failure events, not GitHub security advisories |
| #946 CI/CD failure auto-remediation | ML pipelines fail with ML-specific signatures (OOM on data, eval-set regressions, GPU starvation) |
| #967 perf regression | Inference latency is one slice; drift / fairness / accuracy regression are different metrics |
| #981 supply-chain compliance | Model lineage / model-card / data provenance are governed by AI-specific frameworks (NIST AI RMF) |
| #992 data-privacy compliance | Overlaps lightly (PII in training data), but the ML-eval and registry pipelines are separate |

Existing Kelos primitives this builds on

Minor gaps worth tracking (for follow-up issues, not this one)

These are already proposed and tracked. This issue does not ask for new CRDs — only the four reference TaskSpawner patterns, examples folders, and an MLOps section in the docs.

Proposed deliverables

  1. New examples directory examples/mlops-mlflow-promotion/ with a runnable TaskSpawner + Workspace + README walking through pattern 1.
  2. New examples directory examples/mlops-drift-remediation/ for pattern 2.
  3. New examples directory examples/mlops-fairness-audit-cron/ for pattern 4.
  4. A new docs page docs/use-cases/mlops.md linking the patterns and explaining where each ecosystem partner (MLflow, KServe, Kubeflow, Argo, Evidently, Arize) fits.
  5. A short addition to the main README.md "Use Cases" section calling MLOps out as a first-class lifecycle.

Acceptance criteria

  • Each example directory follows the structure of examples/10-taskspawner-github-webhook/ (yaml + README).
  • Examples reference existing CRD fields only; no schema changes.
  • Docs page links to the upstream MLflow webhook spec, Evidently webhook spec, KServe InferenceService spec, and the relevant CNCF / Linux Foundation project pages.
  • Examples are listed in examples/README.md index.

/kind feature
