This project demonstrates a full MLOps pipeline for a machine learning model:
data preprocessing → model training → deployment of the API and frontend via Docker.
Focus: pipeline automation and orchestration rather than model complexity, featuring:
- Simple frontend interface
- Basic logistic regression model
- Binary classification task
- Dataset: Airline Passenger Satisfaction from Kaggle
Project structure:
├── code
│   ├── datasets
│   ├── deployment          # Docker + services
│   │   ├── api             # FastAPI backend
│   │   └── app             # Streamlit frontend
│   └── models              # training scripts
├── data
│   ├── raw                 # raw input data
│   └── processed           # processed datasets
├── models                  # saved models and scalers (pkl)
├── services
│   └── airflow
│       ├── dags            # Airflow DAGs
│       └── logs            # Airflow logs
└── requirements.txt        # dependencies
Stack:
- Airflow — pipeline orchestration (0.0.0.0:8080)
- MLflow — experiment tracking (127.0.0.1:5000)
- FastAPI — REST API for serving the model (localhost:8000)
- Streamlit — frontend for interaction (localhost:8501)
- Docker Compose — deployment of the API & frontend services
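To sanity-check that all four services are reachable once everything is started, a small script like this can ping each default URL (a sketch, not part of the repo; adjust the ports if you changed them):

```python
import requests

# Default service URLs from the stack above.
SERVICES = {
    "Airflow": "http://localhost:8080",
    "MLflow": "http://127.0.0.1:5000",
    "FastAPI": "http://localhost:8000",
    "Streamlit": "http://localhost:8501",
}

for name, url in SERVICES.items():
    try:
        status = requests.get(url, timeout=3).status_code
        print(f"{name:10} {url:28} -> HTTP {status}")
    except requests.RequestException:
        print(f"{name:10} {url:28} -> not reachable")
```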
Set up a virtual environment and install dependencies:
python3 -m venv .venv
source .venv/bin/activate
export AIRFLOW_HOME=$(pwd)/services/airflow
pip install -r requirements.txt
requirements.txt includes:
apache-airflow
pandas
numpy
scikit-learn
mlflow-skinny
Note: mlflow-skinny does not include the MLflow UI; install the full package to run it:
pip install mlflow
Start Airflow:
airflow standalone
Web UI available at: http://0.0.0.0:8080
The username and password are shown in the terminal on first startup; they can also be found in MLOps-practice/services/airflow/simple_auth_manager_passwords.json.generated.
In a separate environment:
mlflow ui
UI available at: http://127.0.0.1:5000
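Training code has to point the MLflow client at this server, otherwise runs land in a local ./mlruns directory. A minimal sketch (the experiment name here is an assumption, not taken from the repo):

```python
import mlflow

# Send runs to the local tracking server started with `mlflow ui`.
mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.set_experiment("airline-passenger-satisfaction")  # assumed experiment name
```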
Install Docker and verify it's running:
docker --version
DAGs are located in services/airflow/dags. There are two ways to run the pipelines:
Step by step:
- data_pipeline_dag — data cleaning & encoding
- model_training_dag_simple — model training
- deploy_model_pipeline — Docker deployment
Full pipeline:
- full_pipeline_dag — runs the entire flow (data → training → deploy); a minimal sketch follows below
- scheduled to run every 5 minutes
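For orientation, here is a minimal sketch of what full_pipeline_dag could look like. The task ids and callables are assumptions derived from the step lists below, and the import path assumes Airflow 3 (in Airflow 2.x, PythonOperator lives in airflow.operators.python):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.standard.operators.python import PythonOperator

# Placeholder callables -- the real implementations live in services/airflow/dags.
def prepare_data(): ...
def train_model(): ...
def deploy_services(): ...

with DAG(
    dag_id="full_pipeline_dag",
    start_date=datetime(2024, 1, 1),
    schedule="*/5 * * * *",  # every 5 minutes
    catchup=False,
):
    prepare = PythonOperator(task_id="prepare_data", python_callable=prepare_data)
    train = PythonOperator(task_id="train_model", python_callable=train_model)
    deploy = PythonOperator(task_id="deploy_services", python_callable=deploy_services)

    prepare >> train >> deploy  # data -> training -> deploy
```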
data_pipeline_dag:
- drop NaN values
- label & one-hot encoding
- save results in data/processed/
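The cleaning step might look roughly like this. The file name and the binary target encoding follow the Kaggle dataset; treat the exact columns as assumptions about this repo's scripts:

```python
import pandas as pd

# Load the raw Kaggle CSV (file name is an assumption).
df = pd.read_csv("data/raw/train.csv")

# Drop rows with missing values and non-feature columns.
df = df.dropna().drop(columns=["id"], errors="ignore")

# Binary label encoding for the target: satisfied -> 1, neutral or dissatisfied -> 0.
df["satisfaction"] = (df["satisfaction"] == "satisfied").astype(int)

# One-hot encode the categorical features.
df = pd.get_dummies(df, columns=["Gender", "Customer Type", "Type of Travel", "Class"])

df.to_csv("data/processed/train_processed.csv", index=False)
```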
model_training_dag_simple:
- scale data
- train logistic regression (GridSearchCV)
- log metrics into MLflow
- save best model & scaler into models/
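Condensed, the training step does something like the following. Paths, the parameter grid, and metric names are assumptions; the overall shape (scale → GridSearchCV → log to MLflow → dump pickles) matches the bullets above:

```python
import joblib
import mlflow
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

mlflow.set_tracking_uri("http://127.0.0.1:5000")

df = pd.read_csv("data/processed/train_processed.csv")  # assumed path
X, y = df.drop(columns=["satisfaction"]), df["satisfaction"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features; the fitted scaler is saved next to the model for serving.
scaler = StandardScaler()
X_train, X_test = scaler.fit_transform(X_train), scaler.transform(X_test)

# Grid search over the regularization strength.
grid = GridSearchCV(LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)

with mlflow.start_run():
    preds = grid.best_estimator_.predict(X_test)
    mlflow.log_params(grid.best_params_)
    mlflow.log_metric("accuracy", accuracy_score(y_test, preds))
    mlflow.log_metric("f1", f1_score(y_test, preds))

joblib.dump(grid.best_estimator_, "models/model.pkl")
joblib.dump(scaler, "models/scaler.pkl")
```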
deploy_model_pipeline:
- build Docker images for FastAPI & Streamlit
- start the containers:
  - API: http://localhost:8000
  - App: http://localhost:8501
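Once the containers are up, the API can be exercised directly. The /predict route and the payload below are assumptions; check the FastAPI app in code/deployment/api for the actual contract:

```python
import requests

# Hypothetical payload -- field names must match the API's request model.
payload = {"Age": 34, "Flight Distance": 820, "Class": "Business"}

resp = requests.post("http://localhost:8000/predict", json=payload, timeout=5)
print(resp.status_code, resp.json())
```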
Open the MLflow UI at http://127.0.0.1:5000 to monitor:
- metrics (accuracy, precision, recall, f1)
- trained models
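Runs can also be pulled programmatically, e.g. to compare metrics across pipeline executions (the experiment and metric names are assumptions):

```python
import mlflow

mlflow.set_tracking_uri("http://127.0.0.1:5000")

# Fetch recent runs as a pandas DataFrame.
runs = mlflow.search_runs(experiment_names=["airline-passenger-satisfaction"])
print(runs[["run_id", "metrics.accuracy", "metrics.f1"]].head())
```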
full_pipeline_dag is scheduled with the cron expression */5 * * * *, i.e. it runs every 5 minutes.
The pipeline executes in sequence:
- prepare data
- train model
- deploy services