How to build, run, test, and deploy the containerized prediction service.
Host machine Docker container
+----------------------------------+ +------------------------------+
| | | /app |
| mlruns/ (MLflow tracking) ----|->| /app/mlruns (read-only) |
| src/config/ (YAML configs) ----|->| /app/src/config (read-only) |
| | | |
| .env (environment vars) ----|->| Environment variables |
| | | |
| localhost:8000 <----------------|--| uvicorn :8000 |
+----------------------------------+ +------------------------------+
The model is not baked into the Docker image. It is loaded at runtime
from the volume-mounted mlruns/ directory via the MLflow Model Registry.
This means the image stays the same across model versions — only a
container restart is needed after a new model is promoted.
- Docker (with Compose v2) installed and running.
- Python 3.12 for running the pipeline locally.
- At least one model promoted to Production via the pipeline
(
run-pipeline --config src/config/pipeline_tabular_classification.yamland approve when prompted).
-
Run the pipeline and approve a model so a Production version exists:
run-pipeline --config src/config/pipeline_tabular_classification.yaml
-
Copy the environment file and adjust if needed:
cp .env.example .env
-
Build and start the container:
docker compose -f docker/docker-compose.yml up --build
-
Verify the service is running:
curl http://localhost:8000/health
Expected: HTTP 200 with JSON body.
-
Stop the service:
docker compose -f docker/docker-compose.yml down
| Variable | Default | Description |
|---|---|---|
MLFLOW_TRACKING_URI |
/app/mlruns |
Path to the MLflow tracking store inside container. |
API_PORT |
8000 |
Host port mapped to the container's port 8000. |
PIPELINE_CONFIG_PATH |
/app/src/config/pipeline_tabular_classification.yaml |
Pipeline config path inside the container. |
LOG_LEVEL |
INFO |
Logging level for the prediction service. |
MODEL_STAGE |
Production |
MLflow stage to load. Only Production in practice. |
API_ADMIN_TOKEN |
(unset) | Shared secret for POST /admin/reload. When set, the matching X-Admin-Token header is required; when unset, the endpoint is open. |
| Section | Key | Default | Description |
|---|---|---|---|
server |
host |
0.0.0.0 |
Bind address for uvicorn. |
server |
port |
8000 |
Port inside the container. |
server |
log_level |
info |
Uvicorn log level. |
model |
allowed_stage |
Production |
Only this MLflow stage will be loaded. |
model |
require_production_model |
true |
Fail startup if no Production model exists. |
model |
startup_timeout_seconds |
120 |
Max seconds for model loading before timeout. |
healthcheck |
include_model_info |
true |
Include model metadata in health endpoint response. |
reload |
enabled |
false |
Opt-in: POST the reload endpoint after a successful pipeline run. |
reload |
url |
http://localhost:8000/admin/reload |
Reload endpoint of the running API. |
reload |
timeout_seconds |
5 |
Max seconds to wait for the reload HTTP call. |
The compose file at docker/docker-compose.yml exposes:
- Port:
${API_PORT:-8000}:8000(configurable via.env). - Volumes:
mlruns/andsrc/config/mounted read-only. - Restart policy:
unless-stopped(auto-restarts on failure).
The src/deployment/startup_checks.py module enforces that only the
Production-stage model is served:
- At startup, it queries the MLflow Model Registry for versions with
stage="Production". - If no Production version exists, the service refuses to start with a clear error message.
- The loaded model's lineage metadata (version, run_id, algorithm,
dataset, promotion outcome) is preserved in a
ProductionModelInfoobject for use in API responses.
If you run rollback-model to promote a different version to Production,
the running container still serves the previous model until restarted.
This is by design — it prevents accidental mid-flight model switches.
To serve the rolled-back model, either restart the container:
docker compose -f docker/docker-compose.yml restartor trigger a zero-downtime reload (see Hot Model Reload).
- Models in
None,Staging, orArchivedstages are never loaded. - Candidate models that failed promotion rules or were rejected are not accessible through the API.
The running API loads Production models once at startup. To pick up a freshly promoted model without a process restart, the service exposes an admin reload endpoint.
Re-queries the MLflow registry, rebuilds the loaded-model set, and atomically
swaps it in — in-flight /predict requests never observe a half-loaded
state. If the reload fails (registry unreachable, no models), the previously
loaded models stay live and the endpoint returns an error instead of crashing.
curl -X POST http://localhost:8000/admin/reloadResponse (200 OK):
{
"status": "reloaded",
"models_loaded": 1,
"models": [{"name": "iris-classifier", "version": "2"}],
"reloaded_at": "2026-05-15T12:00:00+00:00"
}| Status | Meaning |
|---|---|
200 |
Reload succeeded; body summarizes the newly loaded models. |
403 |
API_ADMIN_TOKEN is set and the X-Admin-Token header is missing or wrong. |
503 |
Reload failed (e.g. registry unreachable); old models stay live. |
Authentication. When the API_ADMIN_TOKEN environment variable is set,
callers must send a matching X-Admin-Token header:
curl -X POST http://localhost:8000/admin/reload -H "X-Admin-Token: $API_ADMIN_TOKEN"When API_ADMIN_TOKEN is unset the endpoint is open — set it (and bind the
server to a trusted network) for any non-local deployment.
The pipeline can ping the reload endpoint automatically after a successful
run. It is disabled by default to keep run-pipeline side-effect-free and
CI-safe. Enable it in src/config/deployment.yaml:
reload:
enabled: true
url: "http://localhost:8000/admin/reload"
timeout_seconds: 5When enabled, after a successful run whose deployment stage completed, the
pipeline POSTs to reload.url (attaching X-Admin-Token from API_ADMIN_TOKEN
if set). The call is fully CI-safe: if no API is reachable, the failure is
logged and the pipeline run still succeeds.
docker build -f docker/Dockerfile -t mlops-prediction-api .docker run --rm \
-p 8000:8000 \
-v "$(pwd)/mlruns:/app/mlruns:ro" \
-v "$(pwd)/src/config:/app/src/config:ro" \
-v "$(pwd)/artifacts:/app/artifacts:ro" \
-v "$(pwd)/data/processed:/app/data/processed:ro" \
-e MLFLOW_TRACKING_URI=/app/mlruns \
-e PIPELINE_CONFIG_PATH=/app/src/config/pipeline_tabular_classification.yaml \
mlops-prediction-apiThe GitHub Actions workflow (.github/workflows/ci-pipeline.yml) runs:
- Tests —
pytest tests/. - Pipeline run —
run-pipelineagainstpipeline_tabular_classification_ci.yaml, a CI-specific config that omits the interactivepromotionstage. - Artifact upload —
outputs/,artifacts/runs/, andmlruns/are uploaded on every run, including failures. - Docker build — builds the image to verify the Dockerfile is valid.
- Import check — runs the container briefly to verify Python imports.
To trigger the CI workflow:
- Go to the repository on GitHub.
- Navigate to Actions > CI Pipeline.
- Click Run workflow > Run workflow.
The pipeline has not been run with a successful promotion yet. Run:
run-pipeline --config src/config/pipeline_tabular_classification.yamland approve the model when prompted.
The mlruns/ directory is not mounted or is empty. Verify:
ls mlruns/should contain experiment directories. If running in Docker, check the
volume mount in docker-compose.yml.
Change the API_PORT in .env:
API_PORT=8001
Every successful run-pipeline invocation emits
outputs/deployment_manifest.json. The manifest is a structured,
deterministic description of what was just promoted and how to serve it.
It is not a deployment trigger.
Key fields:
readiness.status— one ofready,no_production_model,promotion_not_approved,mlflow_unavailable.model.registry_name/model.registry_version/model.run_id— registry coordinates of the current Production model.service.app_import/service.entrypoint_cli— how to start the FastAPI service (run-apioruvicorn src.deployment.app:app).container.compose_service/container.image_name— references to the existing Docker Compose service used for local serving.env_template— suggested.enventries; identical in spirit to.env.examplebut populated with the current run's resolved values.
Use the manifest as a deployment intent contract: read it from CI, attach it to release artifacts, or feed it into your serving rollout tooling. The manifest does not start Docker or push images.