A complete end-to-end machine learning system for predicting customer churn in online retail, featuring database normalization (3NF), 16 experiments, MLflow tracking, and production deployment.
- Project Overview
- Features
- Architecture
- Installation
- Usage
- Database Schema
- Experiments
- Deployment
- API Documentation
- Project Structure
This project implements a binary classification system to predict customer churn in an online retail environment using the UCI Online Retail Dataset. The system includes:
- 3NF Normalized Database: Proper database design with SQLite
- 16 Machine Learning Experiments: 4 algorithms × 4 configurations
- Experiment Tracking: MLflow integration with DagsHub
- Production API: FastAPI backend for model serving
- User Interface: Streamlit frontend for predictions
- Docker Deployment: Containerized services with docker-compose
Target Variable: Customer Churn (Binary)
- 1 (Churned): Customer did not return within 90 days
- 0 (Retained): Customer made purchases within 90 days
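
The 90-day labeling rule can be sketched with pandas. This is a minimal illustration, not the project's actual code: the function name `label_churn`, the column names, and the cutoff date are assumptions.

```python
import pandas as pd

# Label a customer as churned (1) if their last purchase falls more than
# `window_days` before the dataset's cutoff date, else retained (0).
def label_churn(df: pd.DataFrame, cutoff: pd.Timestamp, window_days: int = 90) -> pd.Series:
    last_purchase = df.groupby("CustomerID")["InvoiceDate"].max()
    return ((cutoff - last_purchase).dt.days > window_days).astype(int)

transactions = pd.DataFrame({
    "CustomerID": [1, 1, 2],
    "InvoiceDate": pd.to_datetime(["2011-01-05", "2011-11-20", "2011-03-01"]),
})
labels = label_churn(transactions, cutoff=pd.Timestamp("2011-12-09"))
print(labels.to_dict())  # {1: 0, 2: 1} — customer 2's last purchase is >90 days old
```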
- ✅ 3NF normalized SQLite database
- ✅ RFM (Recency, Frequency, Monetary) analysis
- ✅ 16 experiments with different configurations
- ✅ Hyperparameter tuning with Optuna
- ✅ PCA for dimensionality reduction
- ✅ MLflow/DagsHub experiment tracking
- ✅ FastAPI REST API for inference
- ✅ Interactive Streamlit dashboard
- ✅ Docker containerization
- ✅ Complete CI/CD ready
```
┌──────────────────┐       ┌──────────────────┐
│   Streamlit UI   │──────▶│   FastAPI API    │
│   (Port 8501)    │       │   (Port 8000)    │
└──────────────────┘       └──────────────────┘
                                    │
                                    ▼
                           ┌──────────────────┐
                           │  Trained Models  │
                           │  (Pickle files)  │
                           └──────────────────┘
```
- Python 3.10+
- Docker & Docker Compose (for deployment)
- Git
```bash
git clone https://github.com/yourusername/retail-churn-classification.git
cd retail-churn-classification
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```

Download the UCI Online Retail Dataset from the UCI Machine Learning Repository and place the CSV file in: `data/raw/online_retail.csv`
```bash
cp .env.example .env
# Edit .env with your DagsHub credentials
```

Create and populate the 3NF normalized database:

```bash
python database/init_db.py
```

Output:
- `database/retail.db` - SQLite database with 4 normalized tables: Customers, Products, Invoices, InvoiceItems
Create features and labels for machine learning:
```bash
python data/feature_engineering.py
```

Output:
- `data/processed/ml_dataset.csv` - ML-ready dataset with RFM features
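
The core RFM aggregation can be illustrated as follows. This is a hedged sketch, not the contents of `feature_engineering.py`: the helper `build_rfm` and the snapshot date are assumptions, and only the three RFM columns are shown.

```python
import pandas as pd

# Recency  = days since the customer's last purchase (relative to a snapshot date)
# Frequency = number of distinct invoices
# Monetary  = total spend
def build_rfm(df: pd.DataFrame, snapshot: pd.Timestamp) -> pd.DataFrame:
    return df.groupby("CustomerID").agg(
        Recency=("InvoiceDate", lambda s: (snapshot - s.max()).days),
        Frequency=("InvoiceNo", "nunique"),
        Monetary=("TotalPrice", "sum"),
    )

df = pd.DataFrame({
    "CustomerID": [1, 1, 2],
    "InvoiceNo": ["A1", "A2", "B1"],
    "InvoiceDate": pd.to_datetime(["2011-11-01", "2011-12-01", "2011-06-15"]),
    "TotalPrice": [20.0, 35.0, 12.5],
})
rfm = build_rfm(df, snapshot=pd.Timestamp("2011-12-09"))
print(rfm)
```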
Execute all 16 experiments:
```bash
python experiments/run_experiments.py
```

This will:
- Train 16 models (4 algorithms × 4 configurations)
- Save all models to the `models/` directory
- Save metrics to `results/experiment_results.json`
- Print a comparison table
Expected Runtime: ~15-30 minutes depending on hardware
```bash
python experiments/mlflow_tracking.py
```

This will:
- Log all 16 experiments to DagsHub
- Create comparison visualizations
- Save charts to `results/`
```bash
cd api
uvicorn main:app --reload
```

API will be available at: http://localhost:8000
```bash
cd streamlit
streamlit run app.py
```

UI will be available at: http://localhost:8501
Customers (CustomerID PK, Country, FirstPurchaseDate, LastPurchaseDate, TotalPurchases, TotalSpent)
Products (StockCode PK, Description, UnitPrice)
Invoices (InvoiceNo PK, CustomerID FK, InvoiceDate, Country)
InvoiceItems (ItemID PK, InvoiceNo FK, StockCode FK, Quantity, UnitPrice, TotalPrice)

Benefits:
- ✅ No data redundancy
- ✅ Easy to update customer/product info
- ✅ Maintains referential integrity
- ✅ Optimized queries with indexes
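
The four tables can be sketched as SQLite DDL (column names follow the schema above; the exact types, index names, and constraints in `database/init_db.py` may differ):

```python
import sqlite3

# 3NF schema sketch: foreign keys link InvoiceItems -> Invoices/Products
# and Invoices -> Customers; indexes speed up the common join paths.
ddl = """
CREATE TABLE Customers (
    CustomerID INTEGER PRIMARY KEY, Country TEXT,
    FirstPurchaseDate TEXT, LastPurchaseDate TEXT,
    TotalPurchases INTEGER, TotalSpent REAL
);
CREATE TABLE Products (
    StockCode TEXT PRIMARY KEY, Description TEXT, UnitPrice REAL
);
CREATE TABLE Invoices (
    InvoiceNo TEXT PRIMARY KEY,
    CustomerID INTEGER REFERENCES Customers(CustomerID),
    InvoiceDate TEXT, Country TEXT
);
CREATE TABLE InvoiceItems (
    ItemID INTEGER PRIMARY KEY AUTOINCREMENT,
    InvoiceNo TEXT REFERENCES Invoices(InvoiceNo),
    StockCode TEXT REFERENCES Products(StockCode),
    Quantity INTEGER, UnitPrice REAL, TotalPrice REAL
);
CREATE INDEX idx_invoices_customer ON Invoices(CustomerID);
CREATE INDEX idx_items_invoice ON InvoiceItems(InvoiceNo);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(ddl)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master "
    "WHERE type='table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
print(tables)  # ['Customers', 'InvoiceItems', 'Invoices', 'Products']
```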
- Logistic Regression: Fast, interpretable baseline
- Random Forest: Ensemble method, handles non-linearity
- XGBoost: Gradient boosting, typically best performer
- SVM: Support Vector Machine with RBF kernel
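
The 4×4 experiment grid can be sketched like this. It is illustrative only: the data is synthetic, `GradientBoostingClassifier` stands in for XGBoost to keep the sketch dependency-free, and the four "configurations" (baseline / scaled / two PCA variants) are assumptions rather than the project's actual settings.

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; the real project uses the RFM feature table.
X, y = make_classification(n_samples=400, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

algorithms = {
    "logreg": lambda: LogisticRegression(max_iter=1000),
    "random_forest": lambda: RandomForestClassifier(random_state=0),
    "boosting": lambda: GradientBoostingClassifier(random_state=0),
    "svm_rbf": lambda: SVC(kernel="rbf"),
}
configs = {
    "baseline": [],
    "scaled": [StandardScaler()],
    "pca8": [StandardScaler(), PCA(n_components=8)],
    "pca4": [StandardScaler(), PCA(n_components=4)],
}

# 4 algorithms x 4 configurations = 16 experiments
results = {}
for (algo, make_model), (cfg, steps) in product(algorithms.items(), configs.items()):
    pipe = make_pipeline(*steps, make_model())
    pipe.fit(X_tr, y_tr)
    results[(algo, cfg)] = f1_score(y_te, pipe.predict(X_te))

print(len(results))  # 16
```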
Using Optuna with 50 trials per model:
- Bayesian optimization (TPE sampler)
- 3-fold cross-validation
- F1-score as optimization metric
```bash
# Build and start all services
docker-compose up --build

# Run in background
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down
```

Services:
- API: http://localhost:8000
- Streamlit: http://localhost:8501
- Create a Droplet (Ubuntu 22.04)
- Install Docker and Docker Compose
- Clone repository
- Copy models to server
- Run `docker-compose up -d`
- Configure firewall (ports 8000, 8501)
- Create new Web Service
- Connect GitHub repository
- Set Docker as runtime
- Deploy `api` and `streamlit` as separate services
- Configure environment variables
Required for deployment:
```bash
# API
MODEL_PATH=/app/models/best_model.pkl

# DagsHub (optional for tracking)
DAGSHUB_USER=your-username
DAGSHUB_REPO=retail-churn-classification
DAGSHUB_TOKEN=your-token
```

Endpoints:
- `GET /health`
- `GET /model/info`
- `POST /predict`
Content-Type: application/json
```json
{
  "Recency": 30,
  "Frequency": 5,
  "Monetary": 500.0,
  "InvoiceNo_nunique": 5,
  "Quantity_sum": 50.0,
  "Quantity_mean": 10.0,
  "TotalPrice_sum": 500.0,
  "TotalPrice_mean": 100.0,
  "TotalPrice_std": 0.0,
  "StockCode_nunique": 10,
  "CustomerLifetime": 180,
  "AvgDaysBetweenPurchases": 30.0,
  "Country_United_Kingdom": 1,
  ...
}
```

Response:
```json
{
  "churn_probability": 0.25,
  "churn_prediction": 0,
  "risk_level": "Low",
  "timestamp": "2024-01-15T10:30:00"
}
```

Visit http://localhost:8000/docs for the Swagger UI.
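
A `/predict` call can be scripted from Python with the standard library. This is a hypothetical client sketch against a locally running instance; only a subset of the feature fields shown above is included for brevity.

```python
import json
import urllib.request

# Abbreviated payload; a real request includes the full feature set
# expected by the model.
payload = {"Recency": 30, "Frequency": 5, "Monetary": 500.0}

req = urllib.request.Request(
    "http://localhost:8000/predict",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Sending the request (requires the API to be running):
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
#     print(result["churn_probability"], result["risk_level"])
```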