Status: Draft v0.1
Audience: ML beginners, slightly-advanced developers, data engineers
Scope: Local-first, self-hosted, open-source reference implementation for “good MLOps” with minimal friction.
ZebraOps is a local-first MLOps devkit and reference implementation that lets a user:
- clone a repo
- run `docker compose up`
- drop in `train.py` or a `train.ipynb`
- run training + evaluation + promotion + local serving
- get monitoring + drift checks + retrain triggers
- scale later to remote compute (Vast / SageMaker / Vertex) without rewriting the core project shape
ZebraOps is not aimed at hyperscale (FAANG). It targets small-to-mid teams that want correctness, reproducibility, and a paved path.
- Local-first: the default workflow runs fully on a laptop or a single workstation.
- Paved road: one “golden path” that works immediately; advanced users can extend via adapters.
- Contracts over frameworks: users keep their `train.py` and notebooks; ZebraOps wraps them.
- Minimal new abstractions: only a few concepts are introduced (Model Spec, Dataset Manifest, Runner).
- Reproducible by default: every run is tied to code version + config + data manifest + environment.
- Composable open-source stack: integrate best-of-breed instead of reimplementing them.
- Deterministic cleanup and scaffolding: no dependency on LLMs for core operations (LLM-assisted customization is optional).
- Competing with full enterprise platforms (ClearML, Vertex AI, SageMaker end-to-end governance).
- Replacing experiment trackers (use MLflow) or orchestrators (use Prefect).
- Building a full feature store (Feast integration optional; ZebraOps can provide lightweight dataset/feature materialization patterns instead).
- Beginner ML engineer: has a `train.py` and wants “real MLOps” without platform engineering.
- Data engineer: wants clean ingestion + caching + scheduled retrains + observability.
- Small team lead: wants shared runs, simple promotion gates, basic rollback, and dashboards.
Local Devstack (Docker Compose):
- Orchestration: Prefect (server + UI)
- Tracking + registry: MLflow (tracking server + UI, model registry)
- Storage:
  - Postgres (Prefect + ZebraOps metadata if needed)
  - MinIO (S3-compatible artifacts + datasets)
- Serving: FastAPI (local inference service)
- Monitoring: Prometheus + Grafana
- Drift/data quality: Evidently (reports + metrics export)
Optional integrations (profiles):
- Feature store: Feast (optional)
- Remote compute: Vast.ai (SSH/Docker runner)
- Cloud training: AWS SageMaker, Google Vertex AI
                    +----------------------+
                    |  ZebraOps UI + CLI   |
                    |   (Control Plane)    |
                    +----------+-----------+
                               |
                               v
+-------------------+    +-----------+    +------------------+
| Prefect (flows)   |--> |  Runner   |--> | MLflow Tracking  |
| ingest/train/eval |    |  (py/nb)  |    | + Model Registry |
+-------------------+    +-----------+    +------------------+
          |                    |                   |
          v                    v                   v
     +---------+         +----------+        +----------+
     | MinIO   |         | Artifacts|        | Metrics  |
     | datasets|         | models   |        | params   |
     +---------+         +----------+        +----------+
          |
          v
+-------------------+    +-------------------+
| Monitoring jobs   |--> | Prometheus/Grafana|
| drift/quality/perf|    +-------------------+
+-------------------+
          |
          v
+--------------------+
| FastAPI Serving    |
| loads "prod" alias |
+--------------------+
zebraops/
  models/
    <model_name>/
      model.yaml
      train.py | train.ipynb
      predict.py            # optional
      requirements.txt      # optional (or extras in pyproject)
  data_sources/
    connectors/             # db/s3/files/http connectors
    flows/                  # Prefect flows: ingest/feature/labels
  cached_data/
    manifests/              # dataset manifests, schema, hashes
    parquet/                # local materializations
  libraries/
    contracts/              # schema validators, manifests, events
    runner/                 # python + notebook runners
    promotion/              # gates + stage transitions
    serving/                # FastAPI app + loaders
    monitoring/             # drift/quality/perf jobs
    adapters/               # Vast/SageMaker/Vertex integrations
  platform/
    compose/
      docker-compose.yaml
    profiles/
      local.yaml
      shared.yaml
      vast.yaml
      sagemaker.yaml
      vertex.yaml
    ui/                     # simple web UI (thin)
  scripts/
    init_project.py
    prune_examples.py
    doctor.py
    reset_local.py
  docs/
    specification.md
    skills.md
Declarative configuration for a model.
Minimum fields (v0.1):
- `name`: string (unique)
- `type`: `python` | `notebook`
- `entrypoint`:
  - python: `train.py`
  - notebook: `train.ipynb`
- `datasets`:
  - `train`: dataset id
  - `valid`: dataset id
  - `test`: dataset id (optional)
- `resources`:
  - `accelerator`: `cpu` | `gpu`
  - `gpu_count`: int (optional)
- `tracking`:
  - `experiment_name`: string
- `metrics`:
  - list of metric definitions:
    - `name`
    - `direction`: `maximize` | `minimize`
    - `threshold`: number (optional; promotion gate)
- `promotion`:
  - `registry`: `mlflow`
  - `alias_prod`: default `prod`
  - `alias_staging`: default `staging`
Optional fields:
- `hyperparams`:
  - `grid`: dict of lists (for parallel runs)
- `env`:
  - `python_version`
  - `pip_requirements` (path)
- `notes`: free text
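To make the field list concrete, here is a minimal sketch of how a parsed `model.yaml` could be checked against the v0.1 minimum fields. The names `ModelSpec`, `MetricDef`, and `parse_model_spec` are hypothetical and not part of any published ZebraOps API; the sketch assumes the YAML has already been loaded into a dict and skips `resources` and the optional fields for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

VALID_TYPES = {"python", "notebook"}
VALID_DIRECTIONS = {"maximize", "minimize"}


@dataclass
class MetricDef:
    name: str
    direction: str
    threshold: Optional[float] = None  # optional promotion gate


@dataclass
class ModelSpec:
    name: str
    type: str
    entrypoint: str
    datasets: Dict[str, str]
    experiment_name: str
    metrics: List[MetricDef] = field(default_factory=list)
    alias_prod: str = "prod"
    alias_staging: str = "staging"


def parse_model_spec(raw: Dict[str, Any]) -> ModelSpec:
    """Validate a dict (e.g. parsed from model.yaml) against the minimum fields."""
    if raw.get("type") not in VALID_TYPES:
        raise ValueError(f"type must be one of {sorted(VALID_TYPES)}")
    datasets = raw.get("datasets", {})
    for split in ("train", "valid"):  # test is optional
        if split not in datasets:
            raise ValueError(f"datasets.{split} is required")
    metrics = [MetricDef(**m) for m in raw.get("metrics", [])]
    for m in metrics:
        if m.direction not in VALID_DIRECTIONS:
            raise ValueError(f"metric {m.name}: invalid direction {m.direction!r}")
    promotion = raw.get("promotion", {})
    return ModelSpec(
        name=raw["name"],
        type=raw["type"],
        entrypoint=raw["entrypoint"],
        datasets=datasets,
        experiment_name=raw["tracking"]["experiment_name"],
        metrics=metrics,
        alias_prod=promotion.get("alias_prod", "prod"),
        alias_staging=promotion.get("alias_staging", "staging"),
    )
```

A check like this could back both `mlops doctor` and the CI contract tests, so a malformed spec fails before any training starts.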
Immutable description of a dataset materialization used for training/eval.
Required fields:
- `id`: unique id (content-addressable recommended)
- `created_at`
- `source`:
  - connector type + query/path + parameters
- `schema`:
  - columns, dtypes, target label definition
- `splits`:
  - train/valid/test definitions (hash-based or explicit)
- `stats` (optional):
  - row count, missingness summary, basic distributions
- `artifacts`:
  - object store locations (MinIO/S3 URIs)
- `hashes`:
  - content hash of materialization (or partition hashes)
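Two ideas in the manifest fields, a content-addressable `id` and hash-based splits, can be sketched with the standard library alone. The function names, the `ds-` prefix, and the 80/10/10 default split are illustrative assumptions, not something the spec prescribes.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def manifest_id(files: Iterable[Path]) -> str:
    """Derive a content-addressable id from the bytes of the materialized files."""
    digest = hashlib.sha256()
    for path in sorted(files):  # sort so the id does not depend on listing order
        digest.update(path.name.encode("utf-8"))
        digest.update(path.read_bytes())
    return "ds-" + digest.hexdigest()[:16]


def assign_split(row_key: str, valid_pct: int = 10, test_pct: int = 10) -> str:
    """Hash-based split: the same key always lands in the same split."""
    bucket = int(hashlib.sha256(row_key.encode("utf-8")).hexdigest(), 16) % 100
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + valid_pct:
        return "valid"
    return "train"
```

Because both functions depend only on content and keys, re-materializing unchanged data yields the same id and the same split membership, which is what makes a manifest safely immutable.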
A single execution of training/evaluation. ZebraOps guarantees:
- run is traceable to:
  - model spec version
  - code revision
  - dataset manifest(s)
  - environment/container digest
- run logs to MLflow:
  - params, metrics, artifacts, tags
Moving a model artifact to a named alias/stage (staging/prod) in MLflow registry, gated by thresholds.
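The threshold gating described above can be sketched as a pure function over a run's metrics and the metric definitions from the model spec. `passes_gates` is a hypothetical helper name; the gate dicts are assumed to carry the `name`, `direction`, and optional `threshold` fields from `model.yaml`.

```python
from typing import Dict, List, Tuple


def passes_gates(metrics: Dict[str, float], gates: List[dict]) -> Tuple[bool, List[str]]:
    """Check run metrics against thresholds; return (ok, reasons for failure)."""
    failures: List[str] = []
    for gate in gates:
        threshold = gate.get("threshold")
        if threshold is None:  # no threshold means the metric is not gated
            continue
        value = metrics.get(gate["name"])
        if value is None:
            failures.append(f"{gate['name']}: metric missing from run")
        elif gate["direction"] == "maximize" and value < threshold:
            failures.append(f"{gate['name']}: {value} < {threshold}")
        elif gate["direction"] == "minimize" and value > threshold:
            failures.append(f"{gate['name']}: {value} > {threshold}")
    return (not failures, failures)
```

Only when the gates pass would the alias be moved, for example with MLflow's `MlflowClient.set_registered_model_alias(name, alias, version)`; whether ZebraOps uses that exact call is an assumption here.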
ZebraOps supports existing scripts if they can be executed with a config.
Contract:
- ZebraOps invokes: `python models/<name>/train.py --config <path>`
- Script must:
  - read config (YAML/JSON)
  - load dataset paths from config/env
  - log to MLflow (directly or via ZebraOps helper)
  - write model artifact(s) to an output directory provided in config/env
  - exit non-zero on failure
Standard env variables set by ZebraOps:
- `ZEBRA_MODEL_NAME`
- `ZEBRA_RUN_ID`
- `ZEBRA_OUTPUT_DIR`
- `ZEBRA_DATASET_TRAIN_URI`
- `ZEBRA_DATASET_VALID_URI`
- `MLFLOW_TRACKING_URI`
- `MLFLOW_EXPERIMENT_NAME`
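A `train.py` that honors this contract might look roughly like the skeleton below. It is a sketch under stated assumptions: the config is JSON, it has a hypothetical `params` key, and the actual model fitting and MLflow logging are left as a comment so the example stays dependency-free.

```python
import argparse
import json
import os
import sys
from pathlib import Path


def load_config(path: str) -> dict:
    """Read a JSON config; a YAML config would need a YAML parser instead."""
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def train(config: dict, env: dict) -> Path:
    """Placeholder training step: writes a stub artifact to the output directory."""
    output_dir = Path(env["ZEBRA_OUTPUT_DIR"])
    output_dir.mkdir(parents=True, exist_ok=True)
    artifact = output_dir / "model.json"
    # A real script would fit a model on env["ZEBRA_DATASET_TRAIN_URI"] and
    # log params/metrics to MLflow here; both are omitted from this sketch.
    artifact.write_text(json.dumps({
        "model_name": env["ZEBRA_MODEL_NAME"],
        "run_id": env["ZEBRA_RUN_ID"],
        "params": config.get("params", {}),  # "params" is an assumed config key
    }))
    return artifact


def main(argv=None) -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True)
    args = parser.parse_args(argv)
    try:
        train(load_config(args.config), dict(os.environ))
    except Exception as exc:  # any failure must surface as a non-zero exit
        print(f"training failed: {exc}", file=sys.stderr)
        return 1
    return 0
```

When used as a script, the file would end by calling `sys.exit(main())` so that the return value becomes the process exit code the runner inspects.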
ZebraOps executes parameterized notebooks via Papermill (or equivalent).
Contract:
- ZebraOps injects parameters:
  - dataset URIs, output dir, run id, model name, tracking URI
- Notebook must log to MLflow and write artifacts to output dir.
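As a rough illustration of the notebook contract, the sketch below builds the parameter dict and hands it to Papermill's `execute_notebook`. The lowercase parameter names are assumptions; the notebook would need a cell tagged `parameters` that declares the same names so Papermill can override them.

```python
from typing import Dict


def notebook_parameters(model_name: str, run_id: str, output_dir: str,
                        train_uri: str, valid_uri: str,
                        tracking_uri: str) -> Dict[str, str]:
    """Build the values injected after the notebook's `parameters` cell."""
    return {
        "model_name": model_name,
        "run_id": run_id,
        "output_dir": output_dir,
        "dataset_train_uri": train_uri,
        "dataset_valid_uri": valid_uri,
        "mlflow_tracking_uri": tracking_uri,
    }


def run_notebook(input_path: str, output_path: str,
                 parameters: Dict[str, str]) -> None:
    """Execute the notebook with Papermill (lazy import; needs `pip install papermill`)."""
    import papermill as pm

    # The executed copy is written to output_path and can be stored as a run artifact.
    pm.execute_notebook(input_path, output_path, parameters=parameters)
```

Keeping the executed notebook as an artifact gives each run a readable record of exactly which parameters were injected.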
CLI is the primary interface; UI calls the same backend.
Minimum commands (v0.1):
- `mlops doctor`: validate environment, services, contracts
- `mlops up` / `mlops down`: start/stop local devstack (compose wrappers)
- `mlops list models|datasets|runs`
- `mlops ingest <dataset_id>`: run ingest flow to materialize cached dataset + manifest
- `mlops train <model> [--grid|--params] [--profile local|vast|sagemaker|vertex]`
- `mlops eval <model> --run <run_id>`
- `mlops promote <model> --run <run_id> --to staging|prod`
- `mlops rollback <model> --to <run_id>`
- `mlops serve <model> [--alias prod|staging]`
- `mlops monitor <model>`: run drift/quality checks and export metrics
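One way the command surface could be wired up is with `argparse` subcommands, shown here for three of the commands only. This is a sketch, not the actual CLI: the choice of `argparse` and the `local` default for `--profile` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser covering a subset of the v0.1 commands."""
    parser = argparse.ArgumentParser(prog="mlops")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("doctor", help="validate environment, services, contracts")

    train = sub.add_parser("train", help="run training for a model")
    train.add_argument("model")
    train.add_argument(
        "--profile",
        choices=["local", "vast", "sagemaker", "vertex"],
        default="local",  # assumed default, in line with the local-first principle
    )

    promote = sub.add_parser("promote", help="move a run's model to an alias")
    promote.add_argument("model")
    promote.add_argument("--run", required=True)
    promote.add_argument("--to", choices=["staging", "prod"], required=True)
    return parser
```

For example, `build_parser().parse_args(["train", "churn"])` yields a namespace with `command="train"`, `model="churn"`, and `profile="local"`, which the UI backend could construct directly instead of parsing strings.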
A thin local web UI for beginners:
- list models and their status (latest run, current prod alias)
- buttons: ingest/train/eval/promote/serve/monitor
- links out to:
  - Prefect UI
  - MLflow UI
  - Grafana dashboards
- view “recommendations” (e.g., drift exceeded → retrain suggested)
- Default: SQLite at `.zebraops/state.db` (gitignored).
- Stores:
  - discovered models + spec hashes
  - last known endpoints (mlflow/prefect/minio)
  - local “active serving” pointers
  - last monitoring results + acknowledgements
- Must be portable to Postgres (shared deployment mode).
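A minimal sketch of the local state store follows, covering only two of the stored items. The table and column names are hypothetical. The SQL is kept to constructs that Postgres also accepts (`ON CONFLICT ... DO UPDATE`), although a Postgres driver would use a different placeholder style than `?`.

```python
import sqlite3
from pathlib import Path

SCHEMA = """
CREATE TABLE IF NOT EXISTS models (
    name TEXT PRIMARY KEY,
    spec_hash TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS endpoints (
    service TEXT PRIMARY KEY,
    url TEXT NOT NULL
);
"""


def open_state(path: str = ".zebraops/state.db") -> sqlite3.Connection:
    """Open (and initialize if needed) the local state database."""
    if path != ":memory:":
        Path(path).parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def upsert_model(conn: sqlite3.Connection, name: str, spec_hash: str) -> None:
    """Record a discovered model, replacing the hash if the spec changed."""
    conn.execute(
        "INSERT INTO models (name, spec_hash) VALUES (?, ?) "
        "ON CONFLICT(name) DO UPDATE SET spec_hash = excluded.spec_hash",
        (name, spec_hash),
    )
    conn.commit()
```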
Prefect flows (minimum):
- `ingest_flow`: materialize dataset + manifest
- `train_flow`: run runner, log MLflow, store artifacts
- `eval_flow`: compute metrics + produce eval report artifact
- `promote_flow`: apply gates + update MLflow alias/stage
- `monitor_flow`: drift/quality checks, export metrics, recommend retrain
- `retrain_flow`: conditional wrapper (monitor → train → eval → promote)
Execution requirements:
- local concurrency supports parallel hyperparam experiments
- all flows parametrized by:
  - model name
  - dataset manifest ids
  - profile (local/remote/cloud)
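The conditional shape of `retrain_flow` can be sketched in plain Python with the individual steps passed in as callables. The step signatures and the `retrain_recommended` key are assumptions made for illustration; in a Prefect-based implementation the function and its steps would presumably be decorated with Prefect's `@flow` and `@task`.

```python
from typing import Callable, Dict


def retrain_flow(
    model: str,
    monitor: Callable[[str], Dict[str, bool]],
    train: Callable[[str], str],
    evaluate: Callable[[str, str], Dict[str, float]],
    promote: Callable[[str, str, Dict[str, float]], bool],
) -> str:
    """Conditional wrapper: monitor -> train -> eval -> promote."""
    report = monitor(model)
    if not report.get("retrain_recommended", False):
        return "skipped"  # nothing drifted enough to justify a new run
    run_id = train(model)
    metrics = evaluate(model, run_id)
    # promote() is expected to apply the gates and return whether the alias moved
    return "promoted" if promote(model, run_id, metrics) else "rejected"
```

Passing the steps in keeps the control flow testable without a running Prefect server, which fits the requirement that the same logic runs locally and in CI.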
- drift against training baseline using Evidently
- export drift metrics to Prometheus
- render an HTML report artifact stored in MinIO and linked in UI
- retrain suggestion if thresholds exceeded
Define a minimal event schema for production services to emit:
Event types:
- `prediction_event` (sampled)
- `data_quality_event`
- `drift_event`
- `label_event` (when ground truth arrives)
- `performance_event` (computed metrics)
- `incident_event` (latency/error/cost anomalies)
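A minimal event envelope for these types might look like the sketch below. Only the event type names come from the list above; the `Event` class and its `model_name`, `payload`, and `emitted_at` fields are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict

EVENT_TYPES = {
    "prediction_event",
    "data_quality_event",
    "drift_event",
    "label_event",
    "performance_event",
    "incident_event",
}


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass
class Event:
    event_type: str
    model_name: str
    payload: Dict[str, Any] = field(default_factory=dict)
    emitted_at: str = field(default_factory=_utc_now)

    def __post_init__(self) -> None:
        # Reject unknown types early so collectors only see the agreed schema.
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.event_type}")

    def to_json(self) -> str:
        """Serialize to a JSON line suitable for an ingestion API or log file."""
        return json.dumps(asdict(self), sort_keys=True)
```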
Collector mode (later):
- simple ingestion API
- storage in Postgres/ClickHouse (configurable)
- jobs that produce:
  - segment regressions
  - retrain recommendations
  - alerts
- FastAPI service that loads the model referenced by MLflow registry alias (default: `prod`).
- Endpoints:
  - `POST /predict` (JSON)
  - `GET /health`
  - `GET /model` (current model version/alias)
- Logging:
  - request counts, latency to Prometheus
  - optional sampled payload summaries for drift monitoring (privacy-aware)
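The three endpoints could be sketched as a small FastAPI app factory. The prediction function and model metadata are injected, so the example does not assume how the model is loaded; the request shape (a JSON list of row objects) and the response key `predictions` are assumptions, and Prometheus instrumentation is omitted.

```python
from typing import Any, Callable, Dict, List


def create_app(
    predict_fn: Callable[[List[Dict[str, Any]]], List[Any]],
    model_info: Dict[str, str],
):
    """Build the serving app; FastAPI is imported lazily (needs `pip install fastapi`)."""
    from fastapi import Body, FastAPI

    app = FastAPI()

    @app.get("/health")
    def health() -> Dict[str, str]:
        return {"status": "ok"}

    @app.get("/model")
    def model() -> Dict[str, str]:
        # e.g. {"name": "...", "alias": "prod", "version": "..."}
        return model_info

    @app.post("/predict")
    def predict(rows: List[Dict[str, Any]] = Body(...)) -> Dict[str, Any]:
        return {"predictions": predict_fn(rows)}

    return app
```

In a real deployment `predict_fn` would likely wrap a model loaded by alias, for instance via `mlflow.pyfunc.load_model("models:/<name>@prod")`, and the app would be started with an ASGI server such as uvicorn.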
Local profile:
- Everything via Docker Compose.
- Storage in MinIO.
- Runs executed locally.
Shared profile:
- Shared MLflow/Prefect/MinIO endpoints.
- Developers run CLI/UI locally pointing to shared services.
Remote compute profile (Vast):
- Submit training job to a remote machine via SSH/Docker.
- Same container image; logs to shared MLflow; artifacts to MinIO/S3.
Cloud training profiles (SageMaker / Vertex AI):
- Adapter submits a training job spec using provider SDK.
- Requirements:
  - container image
  - dataset URIs in S3/GCS
  - logs and artifacts routed to MLflow-compatible tracking (either via network or post-run sync)
- ZebraOps keeps contract stable: `mlops train <model> --profile sagemaker`.
GitHub Actions (minimum):
- lint + unit tests
- contract tests:
  - validate `model.yaml` schema
  - validate dataset manifest schema
- integration test:
  - start compose stack
  - run one example end-to-end on small data
  - smoke test serving endpoint
Optional:
- “continuous training” workflow on schedule or when data changes
- promotion gates enforced in CI (same code as local)
- Default secrets via `.env` (local), never committed.
- Support secret backends later:
  - environment variables
  - SOPS-encrypted env files
  - cloud secret managers (profile-specific)
- No secrets stored in SQLite/Postgres; only references/keys.
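The "references, not values" rule can be illustrated with a small resolver: the state store would hold only a mapping from a logical secret name to the environment variable that carries it, and the value is looked up at runtime. `resolve_secrets` and the example names are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def resolve_secrets(
    references: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Resolve stored references (logical name -> env var name) to secret values."""
    env = os.environ if env is None else env
    missing = [var for var in references.values() if var not in env]
    if missing:
        # Report variable names only; never echo secret values in errors or logs.
        raise KeyError(f"missing secret env vars: {sorted(missing)}")
    return {name: env[var] for name, var in references.items()}
```

For example, a stored reference `{"minio_secret_key": "MINIO_SECRET_KEY"}` resolves against whatever `.env` or secret backend populated the process environment, so the database never contains the secret itself.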
- `docker compose up` results in:
  - Prefect UI reachable
  - MLflow UI reachable
  - basic Grafana dashboards reachable
  - ZebraOps UI reachable
- `mlops doctor` prints actionable fixes.
- `mlops init` can create a new model skeleton from templates.
- Example(s) run end-to-end on CPU in <10 minutes on a normal laptop (small datasets).
Minimum:
- `tabular-churn`: classic supervised ML with clear metrics
- `llm-rag`: retrieval + eval dataset + offline scoring
Optional:
- `timeseries-forecast`
- `streaming-fraud` (simulated batch micro-batches)
v0.1
- local devstack
- python + notebook runner
- dataset manifests
- MLflow tracking + basic promotion
- FastAPI serving
- drift report + Prometheus export
- CLI + thin UI
v0.2
- shared profile
- Vast adapter
- richer gates (latency/cost budget checks)
- better hyperparam sweep UX
v0.3
- basic production collector mode
- label join + performance monitoring
- retrain triggers via Prefect deployments
- Keep the default path simple; advanced features must be optional profiles.
- Avoid introducing new infrastructure unless it replaces 3+ existing moving parts.
- Maintain backward compatibility for `model.yaml` with explicit versioning.
- Every new component must include:
  - a runnable example
  - docs updates
  - integration test coverage where feasible