Segmenting customer behavior through K-Means clustering on weekly sales data.
This project provides a comprehensive solution for performing customer segmentation using K-Means clustering on weekly sales transaction data. The primary goal is to identify distinct groups of customers based on their purchasing patterns, enabling businesses to implement targeted marketing strategies, personalize sales approaches, and make informed product recommendations.
This repository offers a suite of tools for data analysis, machine learning, and interactive visualization:
- Interactive Exploratory Data Analysis (EDA): A dedicated module for generating interactive plots (histograms, box plots, scatter plots) to visually explore dataset features, distributions, and relationships.
- K-Means Clustering Workflow: A detailed Jupyter Notebook outlining the end-to-end process of customer segmentation, including data loading, preprocessing, feature selection (using trimmed variance), optimal cluster determination (Elbow method, Silhouette scores), K-Means application, and visualization with PCA.
- Dash Web Application: An interactive web dashboard built with Dash for visualizing and analyzing clustering results. It allows users to explore clusters in a 2D PCA-reduced space and understand the underlying patterns.
- Streamlit User Interface: A user-friendly Streamlit application that empowers users to perform K-Means clustering interactively. It features dynamic selection of features and number of clusters, and visualizes the results through distribution plots, variance analysis, heatmaps, and PCA-reduced scatter plots.
- Robust Data Wrangling: A modular script to clean and preprocess raw sales transaction data, specifically removing irrelevant columns and preparing the dataset for clustering.
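The trimmed-variance feature selection mentioned above can be sketched roughly as follows. This is a minimal illustration, not the notebook's exact code: the helper name `select_by_trimmed_variance` is hypothetical, and the synthetic `W0`…`W4` columns merely stand in for the dataset's weekly sales columns.

```python
import numpy as np
import pandas as pd
from scipy import stats

def select_by_trimmed_variance(df, n_features, proportiontocut=0.1):
    """Rank columns by trimmed variance (robust to outliers) and keep the top n."""
    trimmed_var = {
        col: stats.trimboth(df[col].to_numpy(), proportiontocut).var()
        for col in df.columns
    }
    ranked = sorted(trimmed_var, key=trimmed_var.get, reverse=True)
    return df[ranked[:n_features]]

# Synthetic stand-in for the weekly sales columns (W0..W4, increasing spread)
demo = pd.DataFrame({f"W{i}": (i + 1) * np.linspace(-1, 1, 100) for i in range(5)})
top = select_by_trimmed_variance(demo, n_features=2)  # keeps the widest-spread weeks
```

Trimming the extremes before computing variance keeps a handful of outlier transactions from dominating the feature ranking.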
My approach to building this project focused on creating a flexible and intuitive solution for customer segmentation. I prioritized modularity by separating data wrangling into a dedicated wrangle.py file, ensuring that the core data preparation logic could be easily reused across different parts of the project.
For exploratory data analysis, I opted for interactive visualizations using ipywidgets in EDA.py. This decision was driven by the need for dynamic exploration, allowing users to quickly grasp insights into feature distributions and relationships without rerunning code.
The core clustering workflow in Sales Transaction Clustering.ipynb details the analytical journey. I chose K-Means clustering for its efficiency and interpretability in segmentation tasks. Because K-Means is sensitive to feature scales, StandardScaler was applied consistently before clustering. Principal Component Analysis (PCA) was integrated for dimensionality reduction, which proved crucial for visualizing high-dimensional cluster results in an understandable 2D space. Finally, using both the Elbow method and Silhouette scores to select the optimal k provided a more comprehensive evaluation of clustering quality.
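The scaling, k-selection, and PCA steps described above can be sketched as below. This is a condensed sketch on synthetic data (four well-separated blobs standing in for the weekly-sales feature matrix), not the notebook's exact code:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the weekly-sales feature matrix (four clear segments)
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
                  cluster_std=1.0, random_state=42)

inertias, silhouettes = {}, {}
for k in range(2, 9):
    model = make_pipeline(StandardScaler(),
                          KMeans(n_clusters=k, n_init=10, random_state=42))
    labels = model.fit_predict(X)
    inertias[k] = model.named_steps["kmeans"].inertia_  # basis of the Elbow plot
    silhouettes[k] = silhouette_score(X, labels)        # cohesion vs. separation

best_k = max(silhouettes, key=silhouettes.get)

# Project the standardised features to 2D for plotting the final clusters
coords = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
```

Wrapping StandardScaler and KMeans in a single pipeline guarantees the scaler is refit for every candidate k, so no scaling step can be accidentally skipped.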
To make the clustering analysis accessible to a wider audience, I developed two interactive web applications: one using Dash (dash_app.py) and another with Streamlit (stream_lit.py). The choice to include both frameworks showcases different approaches to building interactive dashboards and allows users to choose their preferred interface. Streamlit, in particular, offered a rapid development pathway for creating a highly interactive and user-friendly experience, allowing dynamic feature selection and real-time visualization of clustering outcomes.
Throughout the project, emphasis was placed on clear visualization using seaborn, matplotlib, and plotly.express to effectively communicate the results of the clustering and provide actionable insights into customer segments.
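As a rough illustration of the kind of cluster plot these libraries produce, a matplotlib scatter of PCA-reduced points coloured by K-Means label might look like this (synthetic data and a headless backend; a sketch, not the project's actual figure):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Three synthetic "customer segments" in a 10-dimensional feature space
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(50, 10)) for c in (0.0, 5.0, 10.0)])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
coords = PCA(n_components=2).fit_transform(X)

fig, ax = plt.subplots()
ax.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="viridis", s=20)
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.set_title("Customer segments in PCA space")
fig.savefig("clusters.png")
```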
| Layer | Technology |
|---|---|
| Language | Python |
| Data Wrangling | Pandas, re (Regular Expressions) |
| Scientific Computing | SciPy |
| Data Visualization | Seaborn, Matplotlib, Plotly Express |
| Interactive UI | ipywidgets, Dash, Streamlit, Jupyter Dash |
| Machine Learning | Scikit-learn (KMeans, StandardScaler, Pipeline, PCA, silhouette_score) |
- Python 3.8+
- Git
```bash
git clone https://github.com/rashadmin/Weekly-Sales-Transaction-Clustering.git
cd Weekly-Sales-Transaction-Clustering
pip install -r requirements.txt
```

Note: if a `requirements.txt` file is not included in the repository, create one containing the libraries used by the project:

```
pandas
scipy
seaborn
matplotlib
plotly
ipywidgets
scikit-learn
dash
jupyter-dash
streamlit
```
Ensure the `Sales_Transactions_Dataset_Weekly.csv` file is present in the project's root directory.
1. Jupyter Notebook for Detailed Analysis:

   ```bash
   jupyter notebook "Sales Transaction Clustering.ipynb"
   ```

2. Dash Web Application:

   ```bash
   python dash_app.py
   ```

   Open your web browser and navigate to the address displayed in the console (usually http://127.0.0.1:8050/).

3. Streamlit Web Application:

   ```bash
   streamlit run stream_lit.py
   ```

   Open your web browser and navigate to the address displayed in the console (usually http://localhost:8501).
To use the interactive EDA features in a Jupyter environment:
```python
# In a Jupyter Notebook or IPython environment
from EDA import make_hist_box_plot, make_scatter_plot
import pandas as pd

df = pd.read_csv('Sales_Transactions_Dataset_Weekly.csv')

# Use ipywidgets to interact with these functions
# Example: make_hist_box_plot(df, 'Feature_Column')
# Example: make_scatter_plot(df, 'Feature_X', 'Feature_Y', 'Cluster_Label')
```

Run the Streamlit application to perform clustering visually:

```bash
streamlit run stream_lit.py
```

Interact with the sidebar controls to:
- Select the number of features for analysis.
- Choose specific features from a multi-select dropdown.
- Set the desired number of clusters (K).
- View distribution plots, variance analysis, correlation heatmaps, and PCA-reduced scatter plots of the clusters.
- Pandas Documentation — Data manipulation and analysis
- Scikit-learn Documentation — Machine learning algorithms (KMeans, StandardScaler, PCA, Pipeline)
- Streamlit Documentation — Building interactive web applications
- Dash Documentation — Building analytical web applications
- Plotly Express Documentation — High-level interface for Plotly
- Seaborn Tutorial — Statistical data visualization
- Matplotlib Tutorial — Basic plotting library
- ipywidgets Documentation — Interactive HTML widgets for Jupyter notebooks
- SciPy Documentation — Scientific and technical computing
- Python `re` Module Documentation — Regular expression operations
MIT © rashadmin