CSE480 Machine Vision · Ain Shams University · Spring 2026

A two-milestone, mostly from-scratch computer-vision project. Milestone 1 is a NumPy-only image-processing library; Milestone 2 builds an end-to-end supervised classification pipeline (dataset → preprocessing → augmentation → handcrafted features → MRMR → 4 classifiers) on top of it. A Streamlit app ties everything together with an interactive demo, including live webcam classification.
- Live Demo (Streamlit)
- Repository Layout
- Installation
- Milestone 1 — Kodak Image-Processing Library
- Milestone 2 — Kodak M2 Supervised Pipeline
- Results
- Tech Stack & Constraints
A single-file Streamlit app (app.py) wraps both milestones into one interactive demo:
streamlit run app.py

Upload an image, pick a category (Enhancement / Filters / Features / Geometric / Frequency / Morphology), and tweak parameters with sliders. The processed image and any auxiliary plot (histogram, FFT spectrum) update in real time. Every operation calls a Kodak (minicv) primitive.
Feed an image into the full classification pipeline and step through every stage. Image source is selectable:
- Pick a TEST sample (unseen) (default) — the honest demo: model has never seen these
- Upload — drop in any JPG/PNG
- Pick a train / val sample — for sanity checks and augmentation demos
- 🆕 Live camera — open your webcam and see all three models classify the live frame side-by-side
The pipeline walkthrough (an overview page plus 8 stage pages):
| # | Page | What you see for your image |
|---|---|---|
| 0 | Overview | Raw image + one-line predictions from KNN, Softmax, CNN |
| 1 | Preprocessing | Raw vs 96×96 vs 64×64 + train-set R/G/B mean & std |
| 2 | Augmentation | Grid of 1–12 stochastic augmentations + re-roll button |
| 3 | Feature extraction | 1,107-dim vector colour-coded by family + per-family stats |
| 4 | MRMR-selected features | Compresses 1,107 → 800 (or 175); top picks for your image |
| 5 | KNN prediction | Predicted class, per-class probability, 3 nearest training images with cosine distances. Live k slider + cosine/l2 toggle |
| 6 | Softmax prediction | Predicted class, confidence, raw logits |
| 7 | CNN prediction | Predicted class on the 64×64 input + raw logits |
| 8 | All side-by-side | Summary table + per-class probability heatmap + bar chart |
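The nearest-neighbour lookup behind page 5 can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in models/knn.py: the name `pairwise_cosine` comes from the module table, but its signature here is an assumption, and `knn_predict` is a hypothetical helper.

```python
import numpy as np

def pairwise_cosine(A, B, eps=1e-12):
    """Cosine distance (1 - cosine similarity) between every row of A and every row of B."""
    A_n = A / np.maximum(np.linalg.norm(A, axis=1, keepdims=True), eps)
    B_n = B / np.maximum(np.linalg.norm(B, axis=1, keepdims=True), eps)
    return 1.0 - A_n @ B_n.T

def knn_predict(X_train, y_train, X_query, k=3, n_classes=None):
    """Hypothetical helper: majority vote over the k nearest training rows."""
    n_classes = int(y_train.max()) + 1 if n_classes is None else n_classes
    dist = pairwise_cosine(X_query, X_train)                 # (n_query, n_train)
    nearest = np.argpartition(dist, k - 1, axis=1)[:, :k]    # indices of the k smallest distances
    votes = y_train[nearest]                                 # (n_query, k) integer labels
    counts = np.apply_along_axis(np.bincount, 1, votes, minlength=n_classes)
    return counts.argmax(axis=1), counts / k                 # predicted class, per-class vote share
```

The vote share returned as the second value is one simple way to obtain the per-class probability shown on the page; the app may compute it differently.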
🆕 The latest version also includes a Live camera option under the image-source picker — open your laptop/phone webcam directly inside the app, and KNN, Softmax and CNN classify the live frame in real time. Best demo class to act out: watching_tv (just sit in front of any screen).
Caching: every expensive resource (trained checkpoints, MRMR selection, train-set statistics, the from-scratch KNN classifier) is wrapped with @st.cache_resource, so the first page load is the only one that pays the load cost; subsequent navigation is instant.
Machine-Vision-/
├── minicv/ # Milestone 1 library (a.k.a. Kodak)
│ ├── io.py ├── utils.py ├── filters.py
│ ├── transforms.py ├── processing.py ├── features.py
│ ├── drawing.py └── frequency.py
│
├── el_nos_el_tany/ # Milestone 2 pipeline (a.k.a. Kodak M2)
│ ├── el_nos_el_tany/
│ │ ├── dataset.py # Stanford-40 loading, stratified splits
│ │ ├── preprocess.py # decode + resize + canonical [0,1] float
│ │ ├── augment.py # 7 train-time transforms via Kodak
│ │ ├── features.py # 1,107-dim feature pool (6 families) + schema
│ │ ├── mrmr.py # MRMR top-K wrapper
│ │ ├── optim.py # SGD, Adam, schedulers, EarlyStopping
│ │ ├── runs.py # logs.csv + npz checkpoints + resume
│ │ ├── metrics.py # accuracy, confmat, P/R/F1, macro-F1
│ │ └── models/
│ │ ├── knn.py # KNN with hand-rolled L2 / cosine distance
│ │ ├── softmax.py # softmax regression, stable softmax + ε CE
│ │ ├── cnn.py # Conv2D / MaxPool2D / ReLU / FC + SimpleCNN
│ │ └── mobilevit.py # MobileViT-XXS in PyTorch (Section 5.4)
│ │
│ ├── notebooks/ # all 8 deliverables
│ │ ├── 01_dataset_preprocess_augment.ipynb
│ │ ├── 02_feature_extraction.ipynb
│ │ ├── 03_knn_softmax.ipynb
│ │ ├── 04_cnn_from_scratch.ipynb
│ │ ├── 05_paper_architecture.ipynb
│ │ ├── 06_optimizer_comparison.ipynb
│ │ ├── 07_logging_resume.ipynb
│ │ └── 08_final_evaluation.ipynb
│ │
│ ├── data/
│ │ ├── raw/Stanford40/ # 9,500+ raw photos
│ │ ├── splits/ # train/val/test CSVs + norm stats
│ │ ├── features/ # *.npz feature matrices + MRMR selections
│ │ └── runs/ # one folder per training run (logs+checkpoints)
│ │
│ └── figures/ # generated PNGs used in the report
│
├── app.py # Streamlit live demo (M1 + M2 + Live camera)
├── Project_report_M1/ # Milestone 1 LaTeX report
├── Project_report_M2/ # Milestone 2 LaTeX report (PDF + sources)
├── requirements.txt
└── README.md # ← you are here
git clone https://github.com/Senior2Projects/Machine-Vision-.git
cd Machine-Vision-
# (recommended) create a venv
python -m venv .venv
.venv\Scripts\activate # Windows PowerShell
# source .venv/bin/activate # Linux / macOS
pip install -r requirements.txt

Run the live demo:

streamlit run app.py

Run the notebooks (all 8 are self-contained and reproducible):

jupyter lab el_nos_el_tany/notebooks/

Goal: rebuild a small but real subset of OpenCV in pure NumPy with no external CV libraries. Constraint: only numpy, pandas, matplotlib, and the Python standard library are allowed.
| Module | Functions |
|---|---|
| io.py | read_image, write_image (PNG / JPG via Matplotlib backends) |
| utils.py | rgb2gray, gray2rgb, to_float01, to_uint8, dtype validation, 3 normalization modes, pixel clipping, 3 padding modes, RGB↔HSV |
| filters.py | True 2-D convolution, mean/box, Gaussian (with kernel generator), median, Sobel gradients, Otsu, adaptive thresholding, erosion/dilation |
| processing.py | Gamma correction, histogram equalization, histogram matching, bit-plane slicing |
| features.py | Harris corners, Canny edges, HOG, color-histogram & gradient-histogram & image-statistics & canny-grid descriptors |
| transforms.py | resize (nearest + bilinear), rotate (about-centre with bilinear interpolation), translate, flip, crop |
| drawing.py | draw_point, draw_line (Bresenham), draw_rectangle (filled / outline), draw_polygon, draw_text |
| frequency.py | FFT low/high-pass filtering with the Fourier centred-magnitude spectrum |
- Docstrings for every public function (description, parameters + types, return value, raised exceptions, expected dtype/range)
- TypeError for wrong types, ValueError for invalid shapes; error messages always say what failed and what was expected
- NumPy-vectorised everywhere; loops only over kernel windows when justified (median) and clearly documented
- Modular: I/O, filtering, transforms, features, drawing, utils each in their own file
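The conventions above (explicit TypeError / ValueError, vectorised NumPy, no external CV library) can be illustrated with a small "true" 2-D convolution. This is a sketch under assumptions rather than the code in filters.py: the function name `convolve2d`, its signature, and the use of `sliding_window_view` are illustrative choices.

```python
import numpy as np

def convolve2d(image, kernel, pad_mode="reflect"):
    """Same-size true 2-D convolution of a grayscale image (kernel is flipped)."""
    if not isinstance(image, np.ndarray) or not isinstance(kernel, np.ndarray):
        raise TypeError("image and kernel must be numpy.ndarray")
    if image.ndim != 2:
        raise ValueError(f"image must be 2-D (H, W), got shape {image.shape}")
    if kernel.ndim != 2 or kernel.shape[0] % 2 == 0 or kernel.shape[1] % 2 == 0:
        raise ValueError(f"kernel must be 2-D with odd sides, got shape {kernel.shape}")

    kh, kw = kernel.shape
    padded = np.pad(image.astype(np.float64),
                    ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode=pad_mode)
    # View of every kh x kw window, shape (H, W, kh, kw); no Python loop over pixels.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (kh, kw))
    flipped = kernel[::-1, ::-1]  # the flip is what separates convolution from correlation
    return np.einsum("ijkl,kl->ij", windows, flipped)
```

With a symmetric kernel (box, Gaussian) the flip makes no difference; it matters for asymmetric kernels such as Sobel, where skipping it reverses the sign of the gradient.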
Goal: end-to-end supervised vision pipeline using the Milestone-1 library as the only image-processing backend. Constraint: every classical learning component implemented from scratch in NumPy — KNN, softmax, CNN (forward + backward), Adam, schedulers, metrics. Library use allowed only for MRMR (rubric exception) and the one paper architecture (PyTorch under the rubric's framework exception).
- Stanford 40 Actions, restricted to a curated 6-class subset chosen by ranking per-class F1 of a 40-way diagnostic baseline: cleaning_the_floor · climbing · cutting_trees · riding_a_horse · rowing_a_boat · watching_tv
- Stratified 70 / 15 / 15 split → 989 train · 211 val · 214 test
- Class-distribution figure: el_nos_el_tany/figures/class_distribution.png
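A stratified 70 / 15 / 15 split can be sketched as follows: shuffle the indices of each class separately, then cut each class at the same fractions. The function name `stratified_split`, the seed handling, and the rounding rule are assumptions for illustration; dataset.py may differ in all three.

```python
import numpy as np

def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index arrays with per-class proportions preserved."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))  # shuffle within the class
        n_train = int(round(fractions[0] * len(idx)))
        n_val = int(round(fractions[1] * len(idx)))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])                    # remainder goes to test
    return np.array(train), np.array(val), np.array(test)
```

Because the cut is made per class, rounding can shift a sample or two between splits, which is one way totals such as 989 / 211 / 214 can deviate slightly from exact 70 / 15 / 15 proportions.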
raw JPG ─► Kodak.io.read_image ─► resize 96×96 (or 64×64) ─► float32 [0,1]
│ │
│ └─► CNN / MobileViT
▼
extract_pool ─► 1,107-dim vector
│
├─► (training only) augment ── 7 stochastic transforms
│
└─► MRMR top-K (800 default, 175 also persisted)
│
└─► z-score on train mean/std
│
├─► KNN
└─► Softmax regression
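The last step before KNN and softmax regression, z-scoring on train-set mean/std, can be sketched as below. The key point is that statistics are fitted on the training split only and then reused unchanged for validation and test. The helper names `fit_zscore` and `apply_zscore` and the `eps` floor are assumptions, not names from the repository.

```python
import numpy as np

def fit_zscore(X_train, eps=1e-8):
    """Per-feature mean and std from the training matrix only."""
    mean = X_train.mean(axis=0)
    std = np.maximum(X_train.std(axis=0), eps)  # floor avoids division by zero on constant features
    return mean, std

def apply_zscore(X, mean, std):
    """Standardise any split with the statistics fitted on train."""
    return (X - mean) / std
```

Fitting on train only keeps validation and test information out of the preprocessing, which matters for the "unseen test sample" demo mode.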
| Module | Responsibility |
|---|---|
| dataset.py | Discover Stanford-40 JPEGs, parse class from filename, build annotation DataFrame, stratified split, class-distribution plot |
| preprocess.py | Decode → gray2rgb (if needed) → to_uint8 → bilinear resize → to_float01. Plus NormStats dataclass for train-set mean/std persistence |
| augment.py | 7 stochastic train-time transforms (hflip, rotate ±15°, translate ±10%, gamma, blur, crop+resize, vflip-disabled). Programmatic train-only guard |
| features.py | 1,107-dim feature pool from 6 Kodak descriptors: color_hist (96), hsv_hist (48), stats (15), grad_hist (32), canny_grid (16), hog (900). Stable named-index FeatureSchema |
| mrmr.py | Wraps mrmr_selection for top-K subset; persists with decoded names; sweep K=10…800; both K=800 and K=175 selections saved |
| models/knn.py | pairwise_l2, pairwise_cosine (vectorised), KNNClassifier with k-sweep helper |
| models/softmax.py | Numerically stable softmax, ε-clipped cross-entropy, mini-batch SGD with L2, gradient clipping, early stopping |
| models/cnn.py | Conv2D (im2col / col2im), MaxPool2D, ReLU, Flatten, FC, all with forward + backward; SimpleCNN (548 K params) |
| models/mobilevit.py | MobileViT-XXS architecture in PyTorch (paper architecture exception) |
| optim.py | SGD, Adam, 4 LR schedulers (Step, Exponential, Cosine, ReduceOnPlateau), EarlyStopping; all with state_dict / load_state_dict |
| runs.py | Run class: config.json + logs.csv + best/last .npz checkpoints + full resume support |
| metrics.py | Accuracy, confusion matrix, per-class P/R/F1, macro-F1, weighted-F1, all from raw NumPy |
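Two of the numerical details named in the table, the stable softmax and the ε-clipped cross-entropy, fit in a few lines. The sketch below is an assumption about how they could look, not a copy of models/softmax.py; the function names and the default `eps` are illustrative.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, y, eps=1e-12):
    """Mean negative log-likelihood; clipping keeps log() finite when a probability is 0."""
    picked = probs[np.arange(len(y)), y]  # probability assigned to the true class
    return float(-np.log(np.clip(picked, eps, 1.0)).mean())
```

Subtracting the row maximum does not change the result, because softmax is invariant to adding a constant to every logit in a row; it only keeps the exponentials in a safe range.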
- 96×96 (handcrafted) + 64×64 (CNN) — large enough for HOG to resolve human silhouettes, small enough that the from-scratch CNN runs on CPU
- 6 feature families: the original 4-family pool was almost entirely global and per-class accuracy collapsed on indoor classes; adding hsv_hist and canny_grid lifted LogReg validation from 0.36 → 0.49 on the diagnostic
- MRMR K-sweep: K=800 picked by argmax of mean validation accuracy across two classifiers (LogReg + SVM); a finer sweep also identified K=175 as a sweet spot reaching ~99 % of the full-pool accuracy with 6.3× fewer features
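To make the idea behind MRMR concrete, here is a simplified greedy selector in NumPy. It is not the mrmr_selection library used by the project and does not reproduce its scoring: relevance is taken as the correlation ratio between a feature and the class labels, redundancy as the mean absolute Pearson correlation with already-selected features, and the two are combined by subtraction. All names are hypothetical, and the sketch assumes no feature column is constant.

```python
import numpy as np

def correlation_ratio(X, y):
    """Share of each feature's variance explained by the class label, in [0, 1]."""
    grand = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - grand) ** 2
    total = ((X - grand) ** 2).sum(axis=0)
    return between / np.maximum(total, 1e-12)

def mrmr_select(X, y, k):
    """Greedy max-relevance, min-redundancy selection of k column indices."""
    k = min(k, X.shape[1])
    relevance = correlation_ratio(X, y)
    corr = np.abs(np.corrcoef(X, rowvar=False))         # feature-feature |Pearson r|
    selected = [int(relevance.argmax())]                # start with the most relevant feature
    while len(selected) < k:
        redundancy = corr[:, selected].mean(axis=1)     # similarity to what is already chosen
        score = relevance - redundancy
        score[selected] = -np.inf                       # never pick the same column twice
        selected.append(int(score.argmax()))
    return selected
```

The redundancy term is what lets a smaller subset such as K=175 stay close to full-pool accuracy: near-duplicate features are penalised, so each added feature tends to carry new information.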
| Model | Features | Test acc | Macro-F1 | Weighted-F1 |
|---|---|---|---|---|
| MobileViT-XXS | raw 64×64 RGB + on-the-fly aug | 0.720 | 0.720 | 0.719 |
| KNN (k=3, cosine) | 800-dim MRMR | 0.673 | 0.672 | 0.671 |
| CNN-from-scratch | raw 64×64 RGB | 0.668 | 0.669 | 0.668 |
| Softmax (Adam + cosine LR) | 800-dim MRMR | 0.631 | 0.629 | 0.634 |
| Class | Avg F1 |
|---|---|
| riding_a_horse | 0.718 |
| watching_tv | 0.703 |
| rowing_a_boat | 0.685 |
| climbing | 0.627 |
| cleaning_the_floor | 0.603 |
| cutting_trees | 0.587 |
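The macro-F1 and weighted-F1 columns differ only in how per-class F1 scores are averaged. A minimal NumPy sketch, assuming integer class labels and hypothetical function names (metrics.py may be organised differently):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] counts samples of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def f1_summary(cm):
    """Accuracy, macro-F1 and weighted-F1 from a confusion matrix."""
    tp = np.diag(cm).astype(np.float64)
    support = cm.sum(axis=1)      # true samples per class
    predicted = cm.sum(axis=0)    # predictions per class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "macro_f1": f1.mean(),                                  # every class counts equally
        "weighted_f1": (f1 * support).sum() / support.sum(),    # classes weighted by support
    }
```

On a stratified split with near-equal class sizes the two averages stay close, which is consistent with the small gaps between the macro and weighted columns in the results table.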
- Dataset: class_distribution.png
- Preprocessing: preprocess_before_after.png
- Augmentation: augmentation_per_transform.png, augmentation_panel.png
- Features: feature_pool_one_sample.png, feature_before_after_per_sample.png, feature_before_after_dataset.png
- MRMR: mrmr_sweep.png, mrmr_family_breakdown.png, mrmr_before_after_aug.png
- Training: knn_sweep.png, softmax_curves.png, cnn_curves.png, mobilevit_curves.png, softmax_optimizer_compare.png, cnn_optimizer_compare.png, logging_resume_demo.png
- Final eval: knn_confusion.png, softmax_confusion.png, cnn_confusion.png, mobilevit_confusion.png, final_confusion_matrices.png, final_per_class_f1.png
| Component | Library | Constraint |
|---|---|---|
| Image processing | Kodak (our minicv) | NumPy / pandas / matplotlib / stdlib only |
| Feature extraction | Kodak descriptors | Same: every dim computed by a Kodak primitive |
| MRMR selection | mrmr_selection (third-party) | Allowed by rubric; only the selection step |
| KNN / Softmax / CNN | From scratch (NumPy) | No sklearn, no PyTorch for these |
| Optimizers | From scratch (NumPy) | SGD + Adam + 4 schedulers + early stopping |
| Metrics | From scratch (NumPy) | Verified against sklearn to 1e-15 precision |
| MobileViT-XXS | PyTorch | Allowed by rubric Section 5.4 (paper architecture exception) |
| Live demo | Streamlit | Wraps everything for the interactive demo |
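As an illustration of the from-scratch optimizer row, the sketch below shows an Adam update with `state_dict` / `load_state_dict` for resuming, plus a cosine learning-rate schedule. The class layout, the dict-of-arrays parameter format, and the `cosine_lr` helper are assumptions for this example and are not taken from optim.py.

```python
import math
import numpy as np

class Adam:
    """Minimal Adam over a dict of NumPy parameter arrays (illustrative layout)."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.t, self.m, self.v = 0, {}, {}

    def step(self, params, grads):
        self.t += 1
        for name, g in grads.items():
            # Exponential moving averages of the gradient and its square.
            m = self.beta1 * self.m.get(name, 0.0) + (1 - self.beta1) * g
            v = self.beta2 * self.v.get(name, 0.0) + (1 - self.beta2) * g * g
            self.m[name], self.v[name] = m, v
            m_hat = m / (1 - self.beta1 ** self.t)   # bias correction for early steps
            v_hat = v / (1 - self.beta2 ** self.t)
            params[name] -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

    def state_dict(self):
        return {"t": self.t, "lr": self.lr,
                "m": {k: np.copy(a) for k, a in self.m.items()},
                "v": {k: np.copy(a) for k, a in self.v.items()}}

    def load_state_dict(self, state):
        self.t, self.lr = state["t"], state["lr"]
        self.m = {k: np.copy(a) for k, a in state["m"].items()}
        self.v = {k: np.copy(a) for k, a in state["v"].items()}

def cosine_lr(base_lr, epoch, total_epochs, min_lr=0.0):
    """Cosine decay from base_lr at epoch 0 down to min_lr at total_epochs."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * epoch / total_epochs))
```

Saving the step counter and both moment estimates is what makes a resumed run continue exactly where it stopped; restoring only the weights would restart the bias correction from step 1.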
Educational use under the CSE480 course agreement. Stanford 40 Actions dataset © its original authors (Yao et al., ICCV 2011).