This project predicts the likelihood that a customer will make a purchase, based on demographic and behavioral features. It covers data preprocessing, exploratory data analysis (EDA), model training, and deployment through a Streamlit web interface.
The model is a Random Forest Classifier that outputs purchase probabilities. The project includes local deployment using Docker, with bonus deployment steps for cloud environments.
Input Features:
- Age
- Annual Income
- Discounts Availed
- Loyalty Program Status
- Number of Purchases
- Time Spent on Website
Output:
- Predicted probability of purchase (a value between 0 and 1).
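As a rough illustration of the input schema above, the sketch below collects the six features into one record and turns it into a row in a fixed order. The field names, the 0/1 encoding of the loyalty status, and the minutes unit for time on site are assumptions made for this example; the dataset's actual column names and encodings may differ.

```python
from dataclasses import astuple, dataclass, fields
from typing import List


@dataclass(frozen=True)
class CustomerFeatures:
    """One customer record. Field names are hypothetical, not the dataset's."""

    age: int
    annual_income: float
    discounts_availed: int
    loyalty_program: int  # assumed encoding: 1 = member, 0 = not a member
    number_of_purchases: int
    time_spent_on_website: float  # assumed unit: minutes

    def __post_init__(self) -> None:
        # Basic sanity checks so obviously invalid input fails early.
        if self.loyalty_program not in (0, 1):
            raise ValueError("loyalty_program must be 0 or 1")
        for name, value in zip(FEATURE_ORDER, astuple(self)):
            if value < 0:
                raise ValueError(f"{name} must be non-negative")

    def to_row(self) -> List[float]:
        # The order follows the field definitions above; it has to match
        # the column order the model was trained on.
        return [float(value) for value in astuple(self)]


# Column order derived from the dataclass definition.
FEATURE_ORDER = [f.name for f in fields(CustomerFeatures)]

if __name__ == "__main__":
    customer = CustomerFeatures(35, 60000.0, 2, 1, 10, 30.5)
    print(dict(zip(FEATURE_ORDER, customer.to_row())))
```

Keeping the order in one place makes it harder for the web form and the model to disagree about which value belongs to which feature.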
The project includes the following files:
- notebook.py:
  - Data preparation and cleaning
  - Exploratory Data Analysis (EDA)
  - Model training and hyperparameter tuning
- train.py:
  - Training the Random Forest Classifier
  - Saving the trained model using pickle
- predict.py:
  - Loading the model
  - Function to predict customer purchase probabilities
- app.py:
  - Streamlit-based web application
- Dockerfile:
  - Instructions for containerizing the application
- requirements.txt:
  - List of dependencies for the project
- Dataset:
  - A CSV file containing the customer dataset (instructions provided to download it if not included).
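The train.py and predict.py steps could look roughly like the following sketch. It is not the repository's code: the function names, the model.pkl file name, the hyperparameters, and the synthetic data (with a made-up labeling rule) are all assumptions chosen so the example runs without the Kaggle CSV.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def make_synthetic_data(n, seed=0):
    """Generate stand-in data with six columns; NOT the real dataset."""
    rng = np.random.default_rng(seed)
    age = rng.integers(18, 71, size=n)
    income = rng.uniform(20_000, 150_000, size=n)
    discounts = rng.integers(0, 6, size=n)
    loyalty = rng.integers(0, 2, size=n)
    purchases = rng.integers(0, 21, size=n)
    minutes = rng.uniform(1, 60, size=n)
    X = np.column_stack([age, income, discounts, loyalty, purchases, minutes])
    # Made-up labeling rule, only so that both classes are present.
    y = (purchases + 5 * loyalty + discounts > 12).astype(int)
    return X, y


def train_and_save(X, y, model_path):
    """Fit a Random Forest and pickle it (the train.py step)."""
    model = RandomForestClassifier(n_estimators=50, random_state=42)
    model.fit(X, y)
    with open(model_path, "wb") as f:
        pickle.dump(model, f)


def load_model(model_path):
    """Load a pickled model (the predict.py step)."""
    with open(model_path, "rb") as f:
        return pickle.load(f)


def predict_purchase_probability(model, row):
    """Return the probability of class 1 (purchase) for one feature row."""
    proba = model.predict_proba(np.asarray([row], dtype=float))
    positive_index = list(model.classes_).index(1)
    return float(proba[0, positive_index])


if __name__ == "__main__":
    X, y = make_synthetic_data(200)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.pkl")  # hypothetical file name
        train_and_save(X, y, path)
        model = load_model(path)
    print(predict_purchase_probability(model, X[0]))
```

Looking up the positive class through `classes_` avoids assuming that the purchase probability always sits in the second column of `predict_proba`.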
- Python (version >= 3.8)
- Required Python packages (specified in requirements.txt)
- Docker (for containerization)
- Dataset available from Kaggle: Predict Customer Purchase Behavior Dataset.
Clone the repository:
git clone https://github.com/Pei-Tong/ml-zoomcamp-midproject.git
cd ml-zoomcamp-midproject
Create a virtual environment and install dependencies:
python -m venv venv
source venv/bin/activate # For Linux/Mac
venv\Scripts\activate # For Windows
pip install -r requirements.txt
Run the Streamlit app locally:
streamlit run app.py
Build and run the Docker container:
docker build -t customer-purchase-prediction .
docker run -p 8501:8501 customer-purchase-prediction
The application has been deployed online for easy access. You can use the following link to test the app:
Customer Purchase Prediction App
The online deployment lets users interact with the app without setting up the environment locally. All functionality, including entering customer data and obtaining purchase probability predictions, is available there.
- A Streamlit application for customer purchase prediction.
- Docker container for easy deployment.
- Dataset analysis and preprocessing script in notebook.py.
- Trained model and prediction scripts.
The following is a screenshot of the Customer Purchase Prediction app interface:

- Dataset: Kaggle: Predict Customer Purchase Behavior Dataset.
- Libraries: pandas, numpy, scikit-learn, pickle, streamlit.