Machine Vision Project — Kodak & Kodak M2

CSE480 Machine Vision · Ain Shams University · Spring 2026

A two-milestone, mostly from-scratch computer-vision project. Milestone 1 is a NumPy-only image-processing library; Milestone 2 builds an end-to-end supervised classification pipeline (dataset → preprocessing → augmentation → handcrafted features → MRMR → 4 classifiers) on top of it. A Streamlit app ties everything together with an interactive demo, including live webcam classification.

Live Demo (Streamlit)

A single-file Streamlit app (app.py) wraps both milestones into one interactive demo:

streamlit run app.py

Two top-level modes (sidebar)

1. Kodak Studio (Milestone 1)

Upload an image, pick a category (Enhancement / Filters / Features / Geometric / Frequency / Morphology), tweak parameters with sliders. The processed image and any auxiliary plot (histogram, FFT spectrum) update in real time. Every operation calls a Kodak (minicv) primitive.

2. Milestone 2: Predict an Image (Kodak M2)

Feed an image into the full classification pipeline and step through every stage. Image source is selectable:

  • Pick a TEST sample (unseen) (default) — the honest demo: model has never seen these
  • Upload — drop in any JPG/PNG
  • Pick a train / val sample — for sanity checks and augmentation demos
  • 🆕 Live camera — open your webcam and see all three models classify the live frame side-by-side

The pipeline walkthrough (pages 0–8: an overview plus 8 stage pages):

| # | Page | What you see for your image |
|---|------|-----------------------------|
| 0 | Overview | Raw image + one-line predictions from KNN, Softmax, CNN |
| 1 | Preprocessing | Raw vs 96×96 vs 64×64 + train-set R/G/B mean & std |
| 2 | Augmentation | Grid of 1–12 stochastic augmentations + re-roll button |
| 3 | Feature extraction | 1,107-dim vector colour-coded by family + per-family stats |
| 4 | MRMR-selected features | Compresses 1,107 → 800 (or 175); top picks for your image |
| 5 | KNN prediction | Predicted class, per-class probability, 3 nearest training images with cosine distances. Live k slider + cosine/l2 toggle |
| 6 | Softmax prediction | Predicted class, confidence, raw logits |
| 7 | CNN prediction | Predicted class on the 64×64 input + raw logits |
| 8 | All side-by-side | Summary table + per-class probability heatmap + bar chart |
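
The KNN page (per-class probability plus nearest training images under cosine distance) can be illustrated with a minimal NumPy sketch. The name `pairwise_cosine` mirrors the helper listed for models/knn.py, but the signatures, the hypothetical `knn_predict_proba` helper, and the toy data below are assumptions for illustration, not the repository's actual code.

```python
import numpy as np

def pairwise_cosine(A, B, eps=1e-12):
    """Cosine distance (1 - cosine similarity) between rows of A and rows of B."""
    A_n = A / np.maximum(np.linalg.norm(A, axis=1, keepdims=True), eps)
    B_n = B / np.maximum(np.linalg.norm(B, axis=1, keepdims=True), eps)
    return 1.0 - A_n @ B_n.T

def knn_predict_proba(X_train, y_train, X_query, k=3, n_classes=None):
    """Hypothetical helper: vote fractions among the k nearest training rows."""
    n_classes = int(y_train.max()) + 1 if n_classes is None else n_classes
    D = pairwise_cosine(X_query, X_train)      # (n_query, n_train)
    nn_idx = np.argsort(D, axis=1)[:, :k]      # indices of the k nearest rows
    votes = y_train[nn_idx]                    # (n_query, k) neighbour labels
    proba = np.zeros((X_query.shape[0], n_classes))
    for c in range(n_classes):
        proba[:, c] = (votes == c).mean(axis=1)
    return proba, nn_idx

# Toy data (assumed): two classes pointing along different axes.
X_train = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
y_train = np.array([0, 0, 1, 1])
X_query = np.array([[1.0, 0.05]])
proba, nn_idx = knn_predict_proba(X_train, y_train, X_query, k=3)
print(proba, nn_idx)
```

With k=3 the query's two closest neighbours belong to class 0 and the third to class 1, so the vote fractions are 2/3 and 1/3; changing `k` changes these fractions the same way the live slider would.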

🆕 The Live camera option sits under the image-source picker: it opens your laptop or phone webcam directly inside the app, and KNN, Softmax, and CNN classify the live frame in real time. The best class to act out in a demo is watching_tv (just sit in front of any screen).

Caching: every expensive resource (trained checkpoints, MRMR selection, train-set statistics, the from-scratch KNN classifier) is wrapped with @st.cache_resource, so only the first page load pays the loading cost; subsequent navigation reuses the cached objects and is effectively instant.


Repository Layout

Machine-Vision-/
├── minicv/                       # Milestone 1 library  (a.k.a. Kodak)
│   ├── io.py            ├── utils.py        ├── filters.py
│   ├── transforms.py    ├── processing.py   ├── features.py
│   ├── drawing.py       └── frequency.py
│
├── el_nos_el_tany/               # Milestone 2 pipeline  (a.k.a. Kodak M2)
│   ├── el_nos_el_tany/
│   │   ├── dataset.py            # Stanford-40 loading, stratified splits
│   │   ├── preprocess.py         # decode + resize + canonical [0,1] float
│   │   ├── augment.py            # 7 train-time transforms via Kodak
│   │   ├── features.py           # 1,107-dim feature pool (6 families) + schema
│   │   ├── mrmr.py               # MRMR top-K wrapper
│   │   ├── optim.py              # SGD, Adam, schedulers, EarlyStopping
│   │   ├── runs.py               # logs.csv + npz checkpoints + resume
│   │   ├── metrics.py            # accuracy, confmat, P/R/F1, macro-F1
│   │   └── models/
│   │       ├── knn.py            # KNN with hand-rolled L2 / cosine distance
│   │       ├── softmax.py        # softmax regression, stable softmax + ε CE
│   │       ├── cnn.py            # Conv2D / MaxPool2D / ReLU / FC + SimpleCNN
│   │       └── mobilevit.py      # MobileViT-XXS in PyTorch (Section 5.4)
│   │
│   ├── notebooks/                # all 8 deliverables
│   │   ├── 01_dataset_preprocess_augment.ipynb
│   │   ├── 02_feature_extraction.ipynb
│   │   ├── 03_knn_softmax.ipynb
│   │   ├── 04_cnn_from_scratch.ipynb
│   │   ├── 05_paper_architecture.ipynb
│   │   ├── 06_optimizer_comparison.ipynb
│   │   ├── 07_logging_resume.ipynb
│   │   └── 08_final_evaluation.ipynb
│   │
│   ├── data/
│   │   ├── raw/Stanford40/       # 9,500+ raw photos
│   │   ├── splits/               # train/val/test CSVs + norm stats
│   │   ├── features/             # *.npz feature matrices + MRMR selections
│   │   └── runs/                 # one folder per training run (logs+checkpoints)
│   │
│   └── figures/                  # generated PNGs used in the report
│
├── app.py                        # Streamlit live demo (M1 + M2 + Live camera)
├── Project_report_M1/            # Milestone 1 LaTeX report
├── Project_report_M2/            # Milestone 2 LaTeX report (PDF + sources)
├── requirements.txt
└── README.md                     # ← you are here

Installation

git clone https://github.com/Senior2Projects/Machine-Vision-.git
cd Machine-Vision-

# (recommended) create a venv
python -m venv .venv
.venv\Scripts\activate          # Windows PowerShell
# source .venv/bin/activate     # Linux / macOS

pip install -r requirements.txt

Run the live demo:

streamlit run app.py

Run the notebooks (all 8 are self-contained and reproducible):

jupyter lab el_nos_el_tany/notebooks/

Milestone 1 — Kodak Image-Processing Library

Goal: rebuild a small but real subset of OpenCV in pure NumPy with no external CV libraries.

Constraint: only numpy, pandas, matplotlib, and the Python standard library are allowed.

What's implemented

| Module | Functions |
|--------|-----------|
| io.py | read_image, write_image (PNG / JPG via Matplotlib backends) |
| utils.py | rgb2gray, gray2rgb, to_float01, to_uint8, dtype validation, 3 normalization modes, pixel clipping, 3 padding modes, RGB↔HSV |
| filters.py | True 2-D convolution, mean/box, Gaussian (with kernel generator), median, Sobel gradients, Otsu, adaptive thresholding, erosion/dilation |
| processing.py | Gamma correction, histogram equalization, histogram matching, bit-plane slicing |
| features.py | Harris corners, Canny edges, HOG, color-histogram & gradient-histogram & image-statistics & canny-grid descriptors |
| transforms.py | resize (nearest + bilinear), rotate (about-centre with bilinear interpolation), translate, flip, crop |
| drawing.py | draw_point, draw_line (Bresenham), draw_rectangle (filled / outline), draw_polygon, draw_text |
| frequency.py | FFT low/high-pass filtering with the Fourier centred-magnitude spectrum |
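
To make "true 2-D convolution" concrete: unlike cross-correlation, the kernel is flipped in both axes before the sliding dot product. The sketch below is one vectorised way to do this in NumPy; the function name `convolve2d`, its signature, and the reflect padding default are assumptions, not the actual minicv API.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def convolve2d(image, kernel, pad_mode="reflect"):
    """True 2-D convolution of a grayscale image with an odd-sized kernel.

    Returns a float64 array with the same shape as `image`.
    """
    if image.ndim != 2 or kernel.ndim != 2:
        raise ValueError("image and kernel must both be 2-D arrays")
    kh, kw = kernel.shape
    if kh % 2 == 0 or kw % 2 == 0:
        raise ValueError(f"kernel dimensions must be odd, got {kernel.shape}")
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image.astype(np.float64), ((ph, ph), (pw, pw)), mode=pad_mode)
    # windows[i, j] is the (kh, kw) neighbourhood centred on pixel (i, j)
    windows = sliding_window_view(padded, (kh, kw))
    flipped = kernel[::-1, ::-1]  # the flip is what distinguishes convolution
    return np.einsum("ijkl,kl->ij", windows, flipped)

# Convolving a unit impulse with a kernel reproduces the kernel itself.
impulse = np.zeros((5, 5))
impulse[2, 2] = 1.0
kernel = np.arange(9, dtype=np.float64).reshape(3, 3)
response = convolve2d(impulse, kernel)
print(response[1:4, 1:4])
```

The impulse check is a quick way to confirm the flip: a cross-correlation would return the kernel rotated by 180° instead.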

Engineering rules followed

  • Docstrings for every public function (description, parameters + types, return value, raised exceptions, expected dtype/range)
  • TypeError for wrong types, ValueError for invalid shapes — error messages always say what failed and what was expected
  • NumPy-vectorised everywhere; loops only over kernel windows when justified (median) and clearly documented
  • Modular: I/O, filtering, transforms, features, drawing, utils each in their own file
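
As an illustration of the docstring and error-handling rules above, here is a hedged sketch of what a conversion helper in this style might look like. The name `to_float01` appears in utils.py, but this body, its accepted dtypes, and the exact messages are assumptions rather than the library's real implementation.

```python
import numpy as np

def to_float01(image):
    """Convert an image to float32 with values in [0, 1].

    Parameters
    ----------
    image : numpy.ndarray
        Shape (H, W) or (H, W, C); dtype uint8 in [0, 255] or float.

    Returns
    -------
    numpy.ndarray
        float32 array of the same shape, values in [0, 1].

    Raises
    ------
    TypeError
        If `image` is not an ndarray or has an unsupported dtype.
    ValueError
        If `image` is not 2-D or 3-D.
    """
    if not isinstance(image, np.ndarray):
        raise TypeError(
            f"image must be a numpy.ndarray, got {type(image).__name__}")
    if image.ndim not in (2, 3):
        raise ValueError(
            f"image must have shape (H, W) or (H, W, C), got shape {image.shape}")
    if image.dtype == np.uint8:
        return image.astype(np.float32) / 255.0
    if np.issubdtype(image.dtype, np.floating):
        # Assumption: float input is already on a [0, 1] scale, so only clip.
        return np.clip(image, 0.0, 1.0).astype(np.float32)
    raise TypeError(
        f"unsupported dtype {image.dtype}; expected uint8 or a float type")

pixels = to_float01(np.array([[0, 128, 255]], dtype=np.uint8))
print(pixels.dtype, pixels.min(), pixels.max())
```

Each message names both what was received and what was expected, which is the pattern the rule describes.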

Milestone 2 — Kodak M2 Supervised Pipeline

Goal: end-to-end supervised vision pipeline using the Milestone-1 library as the only image-processing backend.

Constraint: every classical learning component is implemented from scratch in NumPy: KNN, softmax, CNN (forward + backward), Adam, schedulers, metrics. Library use is allowed only for MRMR (rubric exception) and the one paper architecture (PyTorch under the rubric's framework exception).

Dataset

  • Stanford 40 Actions, restricted to a curated 6-class subset chosen by ranking per-class F1 of a 40-way diagnostic baseline: cleaning_the_floor · climbing · cutting_trees · riding_a_horse · rowing_a_boat · watching_tv
  • Stratified 70 / 15 / 15 split → 989 train · 211 val · 214 test
  • Class-distribution figure: el_nos_el_tany/figures/class_distribution.png
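
A stratified 70 / 15 / 15 split can be sketched in plain NumPy by shuffling and slicing each class separately. The function name `stratified_split`, the per-class rounding rule, and the synthetic labels are assumptions; this is not the code in dataset.py and will not reproduce the exact 989 / 211 / 214 counts.

```python
import numpy as np

def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train_idx, val_idx, test_idx) with per-class proportions kept."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(fractions[0] * len(idx)))
        n_val = int(round(fractions[1] * len(idx)))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])   # remainder goes to test
    return np.array(train), np.array(val), np.array(test)

# Synthetic labels (assumed): 6 classes with 100 samples each.
labels = np.repeat(np.arange(6), 100)
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because each class is split on its own, every class keeps roughly the same share in train, val, and test, which is what keeps validation metrics comparable across classes.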

Pipeline at a glance

raw JPG ─► Kodak.io.read_image ─► resize 96×96 (or 64×64) ─► float32 [0,1]
                                            │                       │
                                            │                       └─► CNN / MobileViT
                                            ▼
                                  extract_pool ─► 1,107-dim vector
                                            │
                                            ├─► (training only) augment ── 7 stochastic transforms
                                            │
                                            └─► MRMR top-K (800 default, 175 also persisted)
                                                       │
                                                       └─► z-score on train mean/std
                                                              │
                                                              ├─► KNN
                                                              └─► Softmax regression
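
The last two stages of the diagram (column selection followed by z-scoring with train-set statistics) can be sketched as below. The helper names `fit_zscore` and `apply_zscore`, the random feature matrices, and the placeholder index array standing in for a real MRMR selection are all assumptions.

```python
import numpy as np

def fit_zscore(X_train, eps=1e-8):
    """Compute per-feature mean and std on the training set only."""
    mean = X_train.mean(axis=0)
    std = np.maximum(X_train.std(axis=0), eps)  # guard constant features
    return mean, std

def apply_zscore(X, mean, std):
    """Standardise any split with the statistics fitted on train."""
    return (X - mean) / std

rng = np.random.default_rng(0)
X_train = rng.normal(5.0, 2.0, size=(50, 1107))  # stand-in for the feature pool
X_val = rng.normal(5.0, 2.0, size=(10, 1107))

# Placeholder for the MRMR top-K indices (a real selection is not contiguous).
selected = np.arange(800)

mean, std = fit_zscore(X_train[:, selected])
Z_train = apply_zscore(X_train[:, selected], mean, std)
Z_val = apply_zscore(X_val[:, selected], mean, std)
print(Z_train.shape, Z_val.shape)
```

Fitting the statistics on train and reusing them for val and test avoids leaking information from the held-out splits into the classifier inputs.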

What's in each module

| Module | Responsibility |
|--------|----------------|
| dataset.py | Discover Stanford-40 JPEGs, parse class from filename, build annotation DataFrame, stratified split, class-distribution plot |
| preprocess.py | Decode → gray2rgb (if needed) → to_uint8 → bilinear resize → to_float01. Plus NormStats dataclass for train-set mean/std persistence |
| augment.py | 7 stochastic train-time transforms (hflip, rotate ±15°, translate ±10%, gamma, blur, crop+resize, vflip-disabled). Programmatic train-only guard |
| features.py | 1,107-dim feature pool from 6 Kodak descriptors: color_hist (96), hsv_hist (48), stats (15), grad_hist (32), canny_grid (16), hog (900). Stable named-index FeatureSchema |
| mrmr.py | Wraps mrmr_selection for top-K subset; persists with decoded names; sweep K=10…800; both K=800 and K=175 selections saved |
| models/knn.py | pairwise_l2, pairwise_cosine (vectorised), KNNClassifier with k-sweep helper |
| models/softmax.py | Numerically stable softmax, ε-clipped cross-entropy, mini-batch SGD with L2, gradient clipping, early stopping |
| models/cnn.py | Conv2D (im2col / col2im), MaxPool2D, ReLU, Flatten, FC, all with forward + backward; SimpleCNN (548 K params) |
| models/mobilevit.py | MobileViT-XXS architecture in PyTorch (paper architecture exception) |
| optim.py | SGD, Adam, 4 LR schedulers (Step, Exponential, Cosine, ReduceOnPlateau), EarlyStopping; all with state_dict / load_state_dict |
| runs.py | Run class: config.json + logs.csv + best/last .npz checkpoints + full resume support |
| metrics.py | Accuracy, confusion matrix, per-class P/R/F1, macro-F1, weighted-F1, all in raw NumPy |
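
The "numerically stable softmax" and "ε-clipped cross-entropy" entries for models/softmax.py refer to two standard tricks: subtracting the row maximum before exponentiating, and clipping probabilities away from zero before taking the log. A minimal sketch follows; the function names, the ε value, and the example logits are assumptions.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max prevents overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, y, eps=1e-12):
    """Mean negative log-likelihood; clipping avoids log(0)."""
    picked = np.clip(probs[np.arange(len(y)), y], eps, 1.0)
    return float(-np.log(picked).mean())

# Logits this large would overflow a naive exp(logits).
logits = np.array([[1000.0, 1001.0, 999.0],
                   [0.0, 0.0, 0.0]])
targets = np.array([1, 2])
probs = softmax(logits)
loss = cross_entropy(probs, targets)
print(probs, loss)
```

The shift leaves the result mathematically unchanged because softmax is invariant to adding a constant to every logit in a row.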

Why these design choices

  • 96×96 (handcrafted) + 64×64 (CNN) — large enough for HOG to resolve human silhouettes, small enough that the from-scratch CNN runs on CPU
  • 6 feature families — the original 4-family pool was almost entirely global and per-class accuracy collapsed on indoor classes; adding hsv_hist and canny_grid lifted LogReg validation from 0.36 → 0.49 on the diagnostic
  • MRMR K-sweep: K=800 picked by argmax of mean validation accuracy across two classifiers (LogReg + SVM); a finer sweep also identified K=175 as a sweet spot reaching ~99 % of the full-pool accuracy with 6.3× fewer features

Results

Test-set comparison (held-out, 214 images)

| Model | Features | Test acc | Macro-F1 | Weighted-F1 |
|-------|----------|----------|----------|-------------|
| MobileViT-XXS | raw 64×64 RGB + on-the-fly aug | 0.720 | 0.720 | 0.719 |
| KNN (k=3, cosine) | 800-dim MRMR | 0.673 | 0.672 | 0.671 |
| CNN-from-scratch | raw 64×64 RGB | 0.668 | 0.669 | 0.668 |
| Softmax (Adam + cosine LR) | 800-dim MRMR | 0.631 | 0.629 | 0.634 |
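
For reference, the three reported metrics can all be derived from a confusion matrix. The sketch below shows one way to compute them in NumPy; the function names and the six-sample toy labels are assumptions and are unrelated to the results in the table.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] counts samples of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def f1_scores(cm):
    """Return (per-class F1, macro-F1, weighted-F1) from a confusion matrix."""
    tp = np.diag(cm).astype(np.float64)
    pred_total = cm.sum(axis=0)   # column sums: predicted counts per class
    true_total = cm.sum(axis=1)   # row sums: support per class
    precision = np.divide(tp, pred_total, out=np.zeros_like(tp), where=pred_total > 0)
    recall = np.divide(tp, true_total, out=np.zeros_like(tp), where=true_total > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    macro = float(f1.mean())                                   # unweighted mean
    weighted = float((f1 * true_total).sum() / true_total.sum())  # support-weighted
    return f1, macro, weighted

# Toy labels (assumed), 3 classes.
y_true = np.array([0, 0, 0, 1, 1, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2])
cm = confusion_matrix(y_true, y_pred, 3)
per_class_f1, macro_f1, weighted_f1 = f1_scores(cm)
accuracy = np.trace(cm) / cm.sum()
print(per_class_f1, macro_f1, weighted_f1, accuracy)
```

Macro-F1 treats every class equally, while weighted-F1 scales each class by its support, which is why the two columns differ slightly when class sizes are not identical.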

Per-class F1 (averaged across the 3 from-scratch models: KNN, Softmax, CNN)

| Class | Avg F1 |
|-------|--------|
| riding_a_horse | 0.718 |
| watching_tv | 0.703 |
| rowing_a_boat | 0.685 |
| climbing | 0.627 |
| cleaning_the_floor | 0.603 |
| cutting_trees | 0.587 |

Generated figures (el_nos_el_tany/figures/)

  • Dataset: class_distribution.png
  • Preprocessing: preprocess_before_after.png
  • Augmentation: augmentation_per_transform.png, augmentation_panel.png
  • Features: feature_pool_one_sample.png, feature_before_after_per_sample.png, feature_before_after_dataset.png
  • MRMR: mrmr_sweep.png, mrmr_family_breakdown.png, mrmr_before_after_aug.png
  • Training: knn_sweep.png, softmax_curves.png, cnn_curves.png, mobilevit_curves.png, softmax_optimizer_compare.png, cnn_optimizer_compare.png, logging_resume_demo.png
  • Final eval: knn_confusion.png, softmax_confusion.png, cnn_confusion.png, mobilevit_confusion.png, final_confusion_matrices.png, final_per_class_f1.png

Tech Stack & Constraints

| Component | Library | Constraint |
|-----------|---------|------------|
| Image processing | Kodak (our minicv) | NumPy / pandas / matplotlib / stdlib only |
| Feature extraction | Kodak descriptors | Same: every dim computed by a Kodak primitive |
| MRMR selection | mrmr_selection (third-party) | Allowed by rubric (selection step only) |
| KNN / Softmax / CNN | From scratch (NumPy) | No sklearn, no PyTorch for these |
| Optimizers | From scratch (NumPy) | SGD + Adam + 4 schedulers + early stopping |
| Metrics | From scratch (NumPy) | Verified against sklearn to 1e-15 precision |
| MobileViT-XXS | PyTorch | Allowed by rubric Section 5.4 (paper architecture exception) |
| Live demo | Streamlit | Wraps everything for the interactive demo |
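
As a rough picture of what a from-scratch NumPy optimizer with `state_dict` / `load_state_dict` can look like, here is a minimal Adam sketch. The class layout, constructor arguments, and state keys are assumptions and may differ from optim.py; only the update rule itself is the standard Adam formulation.

```python
import numpy as np

class Adam:
    """Minimal Adam; `params` is a list of ndarrays updated in place."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        self.params = params
        self.lr, self.betas, self.eps = lr, betas, eps
        self.t = 0
        self.m = [np.zeros_like(p) for p in params]  # first-moment estimates
        self.v = [np.zeros_like(p) for p in params]  # second-moment estimates

    def step(self, grads):
        self.t += 1
        b1, b2 = self.betas
        for p, g, m, v in zip(self.params, grads, self.m, self.v):
            m[...] = b1 * m + (1 - b1) * g
            v[...] = b2 * v + (1 - b2) * g * g
            m_hat = m / (1 - b1 ** self.t)   # bias correction
            v_hat = v / (1 - b2 ** self.t)
            p -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

    def state_dict(self):
        return {"t": self.t, "lr": self.lr,
                "m": [m.copy() for m in self.m],
                "v": [v.copy() for v in self.v]}

    def load_state_dict(self, state):
        self.t, self.lr = state["t"], state["lr"]
        self.m = [m.copy() for m in state["m"]]
        self.v = [v.copy() for v in state["v"]]

# Minimise f(w) = ||w - 3||^2, whose gradient is 2 * (w - 3).
w = np.zeros(2)
opt = Adam([w], lr=0.1)
for _ in range(500):
    opt.step([2 * (w - 3.0)])
print(w)

# Restoring the state into a fresh optimizer lets training resume mid-run.
resumed = Adam([w.copy()], lr=0.1)
resumed.load_state_dict(opt.state_dict())
```

Saving the step counter and both moment buffers is what makes a resumed run continue with the same effective step sizes instead of restarting the bias correction.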

License

Educational use under the CSE480 course agreement. Stanford 40 Actions dataset © its original authors (Yao et al., ICCV 2011).