This project demonstrates Principal Component Analysis (PCA) applied to the MNIST dataset, reducing dimensionality while preserving the key features of the handwritten digit images.
The dataset used is the MNIST handwritten digits dataset, downloaded from Kaggle. It contains:
- 60,000 training images
- 10,000 test images
- Each image is 28x28 pixels (784 features)
- Python
- NumPy & Pandas
- Matplotlib & Seaborn (for visualization)
- Scikit-Learn (for PCA and data preprocessing)

Data Loading & Preprocessing
- Read the MNIST dataset
- Normalize pixel values
- Optional: Visualize a few digit samples
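
The loading and normalization steps above can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: it assumes the Kaggle download is a CSV with a `label` column followed by one column per pixel, and the helper name `load_and_normalize` is hypothetical. A tiny in-memory CSV stands in for the real file so the snippet runs on its own.

```python
import io

import numpy as np
import pandas as pd


def load_and_normalize(csv_source, label_column="label"):
    """Read an MNIST-style CSV and return (X, y) with X scaled to [0, 1].

    Assumption: one label column plus one column per pixel (0-255 intensities).
    """
    df = pd.read_csv(csv_source)
    y = df[label_column].to_numpy()
    # Divide by the maximum 8-bit intensity so every feature lies in [0, 1]
    X = df.drop(columns=[label_column]).to_numpy(dtype=np.float32) / 255.0
    return X, y


# Stand-in for the real file: 2 rows with 4 "pixels" each (real MNIST rows have 784)
demo_csv = io.StringIO("label,p0,p1,p2,p3\n3,0,128,255,64\n7,255,0,0,32\n")
X_demo, y_demo = load_and_normalize(demo_csv)
print(X_demo.shape, y_demo.tolist())
```

With the real dataset, the same function would be called with the path to the downloaded CSV instead of `demo_csv`.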

PCA Application
- Apply PCA to reduce from 784 dimensions to N components
- Plot explained variance ratio vs. number of components
- Reconstruct digits from reduced dimensions
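
One way to carry out these steps with Scikit-Learn is sketched below. The data is a synthetic stand-in (500 samples with 784 features built from 20 latent factors), not MNIST, so the component count it finds says nothing about the real dataset. The 95% threshold and all variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in for the normalized image matrix: 20 latent factors + small noise
latent = rng.normal(size=(500, 20))
mixing = rng.normal(size=(20, 784))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 784))

# Fit PCA with all components to inspect the explained variance curve
pca_full = PCA().fit(X)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest N whose cumulative explained variance reaches 95%
n_components = int(np.searchsorted(cumulative, 0.95) + 1)

# Reduce to N components, then map back to the original 784-dimensional space
pca = PCA(n_components=n_components)
X_reduced = pca.fit_transform(X)
X_reconstructed = pca.inverse_transform(X_reduced)

mse = float(np.mean((X - X_reconstructed) ** 2))
print(f"N = {n_components}, reduced shape = {X_reduced.shape}, MSE = {mse:.4f}")
```

Plotting `cumulative` against the component index gives the explained variance curve. Scikit-Learn can also pick N directly from a variance target, for example `PCA(n_components=0.95, svd_solver="full")`.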

Visualization
- Compare original and reconstructed images
- 2D scatter plots of digits after PCA for clustering insight
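
Both plots can be sketched with Matplotlib as shown below. To stay self-contained, the example uses Scikit-Learn's bundled 8x8 `load_digits` dataset as a stand-in for MNIST; with MNIST the reshape would use 28 instead of 8. The choice of 16 components is arbitrary and only for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Stand-in data: 1797 digit images of 8x8 pixels with intensities 0..16
digits = load_digits()
X, y = digits.data / 16.0, digits.target
side = 8  # would be 28 for MNIST

# Reconstruct every image from 16 principal components
pca = PCA(n_components=16).fit(X)
X_rec = pca.inverse_transform(pca.transform(X))

# Top row: originals; bottom row: reconstructions of the same images
fig, axes = plt.subplots(2, 5, figsize=(8, 3.5))
for i in range(5):
    axes[0, i].imshow(X[i].reshape(side, side), cmap="gray")
    axes[1, i].imshow(X_rec[i].reshape(side, side), cmap="gray")
    axes[0, i].axis("off")
    axes[1, i].axis("off")
axes[0, 0].set_title("Original", loc="left")
axes[1, 0].set_title("Reconstructed", loc="left")

# Project onto the first two principal components and color points by digit label
X_2d = PCA(n_components=2).fit_transform(X)
fig2, ax = plt.subplots(figsize=(6, 5))
points = ax.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="tab10", s=8)
fig2.colorbar(points, ax=ax, label="digit")
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
```

Call `fig.savefig(...)` and `fig2.savefig(...)` to write the figures to disk, or use `plt.show()` with an interactive backend.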
- PCA reduced the 784 pixel dimensions to ~150 components while retaining ~95% of the variance
- Reconstructed images were visually close to originals
- 2D PCA plots showed visible clustering of different digits
- PCA is effective for compression and visualization of high-dimensional data like MNIST.
- The number of components controls the trade-off between compression (fewer components) and reconstruction quality (more components).
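
The trade-off can be made concrete by measuring reconstruction error for several component counts. The sketch below again uses Scikit-Learn's small `load_digits` dataset as a stand-in for MNIST, and the component counts are arbitrary choices for illustration, so the printed numbers do not correspond to the results reported above.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data / 16.0  # stand-in data: 1797 images, 64 features

errors = {}
for k in (2, 8, 16, 32):
    pca = PCA(n_components=k).fit(X)
    X_rec = pca.inverse_transform(pca.transform(X))
    # Mean squared error between the original and reconstructed pixels
    errors[k] = float(np.mean((X - X_rec) ** 2))
    kept = pca.explained_variance_ratio_.sum()
    print(f"{k:>2} components: variance kept = {kept:.3f}, MSE = {errors[k]:.5f}")
```

The error shrinks as components are added, since each additional component captures variance that the smaller model discarded.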