CSE480 Machine Vision · Ain Shams University · Spring 2026

A two-milestone, mostly from-scratch computer-vision project. Milestone 1 is a NumPy-only image-processing library; Milestone 2 builds an end-to-end supervised classification pipeline (dataset → preprocessing → augmentation → handcrafted features → MRMR → 4 classifiers) on top of it. A Streamlit app ties everything together with an interactive demo, including live webcam classification.
- Live Demo (Streamlit)
- Repository Layout
- Installation
- Milestone 1: Kodak Image-Processing Library
- Milestone 2: Kodak M2 Supervised Pipeline
- Results
- Tech Stack & Constraints
A single-file Streamlit app (app.py) wraps both milestones into one interactive demo:
streamlit run app.py

Upload an image, pick a category (Enhancement / Filters / Features / Geometric / Frequency / Morphology), and tweak parameters with sliders. The processed image and any auxiliary plot (histogram, FFT spectrum) update in real time. Every operation calls a Kodak (minicv) primitive.
Feed an image into the full classification pipeline and step through every stage. Image source is selectable:
- Pick a TEST sample (unseen) (default): the honest demo, since the model has never seen these images
- Upload: drop in any JPG/PNG
- Pick a train / val sample: for sanity checks and augmentation demos
- Live camera: open your webcam and see all three models classify the live frame side-by-side
The 8-page pipeline walkthrough:
| # | Page | What you see for your image |
|---|---|---|
| 0 | Overview | Raw image + one-line predictions from KNN, Softmax, CNN |
| 1 | Preprocessing | Raw vs 96×96 vs 64×64 + train-set R/G/B mean & std |
| 2 | Augmentation | Grid of 1–12 stochastic augmentations + re-roll button |
| 3 | Feature extraction | 1,107-dim vector colour-coded by family + per-family stats |
| 4 | MRMR-selected features | Compresses 1,107 → 800 (or 175); top picks for your image |
| 5 | KNN prediction | Predicted class, per-class probability, 3 nearest training images with cosine distances. Live k slider + cosine/l2 toggle |
| 6 | Softmax prediction | Predicted class, confidence, raw logits |
| 7 | CNN prediction | Predicted class on the 64×64 input + raw logits |
| 8 | All side-by-side | Summary table + per-class probability heatmap + bar chart |
The latest version also includes a Live camera option under the image-source picker: open your laptop or phone webcam directly inside the app, and KNN, Softmax and CNN classify the live frame in real time. The easiest class to act out is watching_tv (just sit in front of any screen).
Caching: every expensive resource (trained checkpoints, MRMR selection, train-set statistics, the from-scratch KNN classifier) is wrapped with @st.cache_resource, so the first page load is the only one that pays the load cost; subsequent navigation is instant.
Machine-Vision-/
├── minicv/                          # Milestone 1 library (a.k.a. Kodak)
│   ├── io.py           ├── utils.py        ├── filters.py
│   ├── transforms.py   ├── processing.py   ├── features.py
│   └── drawing.py      └── frequency.py
│
├── el_nos_el_tany/                  # Milestone 2 pipeline (a.k.a. Kodak M2)
│   ├── el_nos_el_tany/
│   │   ├── dataset.py               # Stanford-40 loading, stratified splits
│   │   ├── preprocess.py            # decode + resize + canonical [0,1] float
│   │   ├── augment.py               # 7 train-time transforms via Kodak
│   │   ├── features.py              # 1,107-dim feature pool (6 families) + schema
│   │   ├── mrmr.py                  # MRMR top-K wrapper
│   │   ├── optim.py                 # SGD, Adam, schedulers, EarlyStopping
│   │   ├── runs.py                  # logs.csv + npz checkpoints + resume
│   │   ├── metrics.py               # accuracy, confmat, P/R/F1, macro-F1
│   │   └── models/
│   │       ├── knn.py               # KNN with hand-rolled L2 / cosine distance
│   │       ├── softmax.py           # softmax regression, stable softmax + ε CE
│   │       ├── cnn.py               # Conv2D / MaxPool2D / ReLU / FC + SimpleCNN
│   │       └── mobilevit.py         # MobileViT-XXS in PyTorch (Section 5.4)
│   │
│   ├── notebooks/                   # all 8 deliverables
│   │   ├── 01_dataset_preprocess_augment.ipynb
│   │   ├── 02_feature_extraction.ipynb
│   │   ├── 03_knn_softmax.ipynb
│   │   ├── 04_cnn_from_scratch.ipynb
│   │   ├── 05_paper_architecture.ipynb
│   │   ├── 06_optimizer_comparison.ipynb
│   │   ├── 07_logging_resume.ipynb
│   │   └── 08_final_evaluation.ipynb
│   │
│   ├── data/
│   │   ├── raw/Stanford40/          # 9,500+ raw photos
│   │   ├── splits/                  # train/val/test CSVs + norm stats
│   │   ├── features/                # *.npz feature matrices + MRMR selections
│   │   └── runs/                    # one folder per training run (logs+checkpoints)
│   │
│   └── figures/                     # generated PNGs used in the report
│
├── app.py                           # Streamlit live demo (M1 + M2 + Live camera)
├── Project_report_M1/               # Milestone 1 LaTeX report
├── Project_report_M2/               # Milestone 2 LaTeX report (PDF + sources)
├── requirements.txt
└── README.md                        # ← you are here
git clone https://github.com/Senior2Projects/Machine-Vision-.git
cd Machine-Vision-
# (recommended) create a venv
python -m venv .venv
.venv\Scripts\activate # Windows PowerShell
# source .venv/bin/activate # Linux / macOS
pip install -r requirements.txt

Run the live demo:

streamlit run app.py

Run the notebooks (all 8 are self-contained and reproducible):

jupyter lab el_nos_el_tany/notebooks/

Goal: rebuild a small but real subset of OpenCV in pure NumPy with no external CV libraries.

Constraint: only numpy, pandas, matplotlib, and the Python standard library are allowed.
| Module | Functions |
|---|---|
| io.py | read_image, write_image (PNG / JPG via Matplotlib backends) |
| utils.py | rgb2gray, gray2rgb, to_float01, to_uint8, dtype validation, 3 normalization modes, pixel clipping, 3 padding modes, RGB→HSV |
| filters.py | True 2-D convolution, mean/box, Gaussian (with kernel generator), median, Sobel gradients, Otsu, adaptive thresholding, erosion/dilation |
| processing.py | Gamma correction, histogram equalization, histogram matching, bit-plane slicing |
| features.py | Harris corners, Canny edges, HOG, color-histogram & gradient-histogram & image-statistics & canny-grid descriptors |
| transforms.py | resize (nearest + bilinear), rotate (about-centre with bilinear interpolation), translate, flip, crop |
| drawing.py | draw_point, draw_line (Bresenham), draw_rectangle (filled / outline), draw_polygon, draw_text |
| frequency.py | FFT low/high-pass filtering with the Fourier centred-magnitude spectrum |
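As an illustration of what "true 2-D convolution" and a Gaussian kernel generator can look like in pure NumPy, here is a minimal sketch. The function names, the reflect padding, and the use of `sliding_window_view` are assumptions made for this example and may differ from the actual filters.py.

```python
import numpy as np


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Return a normalised size x size Gaussian kernel (size must be odd)."""
    if size < 1 or size % 2 == 0:
        raise ValueError(f"size must be a positive odd integer, got {size}")
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)          # separable 2-D Gaussian
    return k / k.sum()          # weights sum to 1


def convolve2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Same-size 2-D convolution of a grayscale image (kernel is flipped)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    # Reflect padding is an assumption; the library lists 3 padding modes.
    padded = np.pad(image.astype(np.float64), ((ph, ph), (pw, pw)), mode="reflect")
    # View of every kh x kw window, shape (H, W, kh, kw), without copying.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (kh, kw))
    flipped = kernel[::-1, ::-1]  # flipping is what separates convolution from correlation
    return np.einsum("ijkl,kl->ij", windows, flipped)
```

Convolving a single-pixel impulse with an asymmetric kernel reproduces the kernel itself, which is a quick way to check that the flip is in place.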
- Docstrings for every public function (description, parameters + types, return value, raised exceptions, expected dtype/range)
- TypeError for wrong types, ValueError for invalid shapes; error messages always say what failed and what was expected
- NumPy-vectorised everywhere; loops only over kernel windows when justified (median) and clearly documented
- Modular: I/O, filtering, transforms, features, drawing, utils each in their own file
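The validation convention described in the list above could look roughly like the sketch below. It borrows the name `to_float01` from utils.py, but the signature, the accepted dtypes, and the exact messages are assumptions for illustration only.

```python
import numpy as np


def to_float01(image):
    """Convert a uint8 or float image to float32 in [0, 1].

    Hypothetical signature: raises TypeError for a wrong type or dtype and
    ValueError for an invalid shape, each stating what was expected.
    """
    if not isinstance(image, np.ndarray):
        raise TypeError(
            f"image must be a numpy.ndarray, got {type(image).__name__}"
        )
    if image.ndim not in (2, 3):
        raise ValueError(
            f"image must have shape (H, W) or (H, W, C), got shape {image.shape}"
        )
    if image.dtype == np.uint8:
        return image.astype(np.float32) / 255.0
    if np.issubdtype(image.dtype, np.floating):
        # Float input is assumed to be in [0, 1] already; clip stray values.
        return np.clip(image, 0.0, 1.0).astype(np.float32)
    raise TypeError(f"image dtype must be uint8 or float, got {image.dtype}")
```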
Goal: end-to-end supervised vision pipeline using the Milestone-1 library as the only image-processing backend.

Constraint: every classical learning component is implemented from scratch in NumPy: KNN, softmax, CNN (forward + backward), Adam, schedulers, metrics. Library use is allowed only for MRMR (rubric exception) and the one paper architecture (PyTorch under the rubric's framework exception).
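To give a flavour of the from-scratch constraint, the following is a minimal NumPy sketch of a numerically stable softmax with a clipped cross-entropy and its gradient with respect to the logits. The function names and the `eps` value are assumptions for this example, not the repository's actual API.

```python
import numpy as np


def stable_softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(probs: np.ndarray, labels: np.ndarray, eps: float = 1e-12) -> float:
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(np.clip(picked, eps, 1.0)).mean())


def softmax_ce_grad(probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Gradient of the mean cross-entropy w.r.t. the logits: (p - onehot) / N."""
    grad = probs.copy()
    grad[np.arange(len(labels)), labels] -= 1.0
    return grad / len(labels)
```

Without the max subtraction, logits around 1000 would overflow `exp` and produce NaN probabilities; with it, the same inputs give finite values.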
- Stanford 40 Actions, restricted to a curated 6-class subset chosen by ranking per-class F1 of a 40-way diagnostic baseline: cleaning_the_floor · climbing · cutting_trees · riding_a_horse · rowing_a_boat · watching_tv
- Stratified 70 / 15 / 15 split → 989 train · 211 val · 214 test
- Class-distribution figure: el_nos_el_tany/figures/class_distribution.png
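A stratified 70 / 15 / 15 split can be sketched in NumPy as below. This is an illustrative version only: the function name, the per-class rounding rule, and the seed handling are assumptions, so it will not necessarily reproduce the exact 989 / 211 / 214 counts produced by dataset.py.

```python
import numpy as np


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train_idx, val_idx, test_idx) with per-class proportions preserved."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    parts = ([], [], [])
    for cls in np.unique(labels):
        # Shuffle the indices of this class only, then cut them by fraction.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(fractions[0] * len(idx)))
        n_val = int(round(fractions[1] * len(idx)))
        parts[0].append(idx[:n_train])
        parts[1].append(idx[n_train:n_train + n_val])
        parts[2].append(idx[n_train + n_val:])  # remainder goes to test
    return tuple(np.sort(np.concatenate(p)) for p in parts)
```

Splitting per class rather than over the whole index range keeps every class represented in each split at roughly the same ratio.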
raw JPG ─► Kodak.io.read_image ─► resize 96×96 (or 64×64) ─► float32 [0,1]
    │                                                          │
    │                                                          └─► CNN / MobileViT
    ▼
extract_pool ─► 1,107-dim vector
    │
    ├─► (training only) augment ── 7 stochastic transforms
    │
    ├─► MRMR top-K (800 default, 175 also persisted)
    │
    ├─► z-score on train mean/std
    │
    ├─► KNN
    └─► Softmax regression
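The "z-score on train mean/std" step in the diagram means the statistics are fitted on the training features only and then reused unchanged for validation and test data. A minimal sketch, with hypothetical function names and an assumed `eps` floor for constant features:

```python
import numpy as np


def fit_zscore(train_features: np.ndarray, eps: float = 1e-8):
    """Compute per-feature mean and std on the training set only."""
    mean = train_features.mean(axis=0)
    std = train_features.std(axis=0)
    # Floor the std so constant features do not cause a division by zero.
    return mean, np.maximum(std, eps)


def apply_zscore(features: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Standardise any split (train, val, test) with the stored train statistics."""
    return (features - mean) / std
```

Fitting on the training split alone avoids leaking validation or test information into the features the classifiers see.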
| Module | Responsibility |
|---|---|
| dataset.py | Discover Stanford-40 JPEGs, parse class from filename, build annotation DataFrame, stratified split, class-distribution plot |
| preprocess.py | Decode → gray2rgb (if needed) → to_uint8 → bilinear resize → to_float01. Plus NormStats dataclass for train-set mean/std persistence |
| augment.py | 7 stochastic train-time transforms (hflip, rotate ±15°, translate ±10%, gamma, blur, crop+resize, vflip-disabled). Programmatic train-only guard |
| features.py | 1,107-dim feature pool from 6 Kodak descriptors: color_hist (96), hsv_hist (48), stats (15), grad_hist (32), canny_grid (16), hog (900). Stable named-index FeatureSchema |
| mrmr.py | Wraps mrmr_selection for top-K subset; persists with decoded names; sweep K=10…800; both K=800 and K=175 selections saved |
| models/knn.py | pairwise_l2, pairwise_cosine (vectorised), KNNClassifier with k-sweep helper |
| models/softmax.py | Numerically stable softmax, ε-clipped cross-entropy, mini-batch SGD with L2, gradient clipping, early stopping |
| models/cnn.py | Conv2D (im2col / col2im), MaxPool2D, ReLU, Flatten, FC, all forward + backward; SimpleCNN (548 K params) |
| models/mobilevit.py | MobileViT-XXS architecture in PyTorch (paper architecture exception) |
| optim.py | SGD, Adam, 4 LR schedulers (Step, Exponential, Cosine, ReduceOnPlateau), EarlyStopping; all with state_dict / load_state_dict |
| runs.py | Run class: config.json + logs.csv + best/last .npz checkpoints + full resume support |
| metrics.py | Accuracy, confusion matrix, per-class P/R/F1, macro-F1, weighted-F1, all from raw NumPy |
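The vectorised cosine distance and k-nearest-neighbour vote mentioned for models/knn.py can be sketched as follows. `pairwise_cosine` reuses the name from the table, but its signature here, the `knn_predict` helper, and the tie-breaking rule are assumptions for illustration.

```python
import numpy as np


def pairwise_cosine(a: np.ndarray, b: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Cosine distance matrix of shape (len(a), len(b)): 1 - cosine similarity."""
    a_n = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), eps)
    b_n = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), eps)
    return 1.0 - a_n @ b_n.T


def knn_predict(train_x, train_y, query_x, k=3, n_classes=None):
    """Majority vote among the k nearest training rows (hypothetical helper).

    Ties are resolved in favour of the lowest class id via argmax.
    """
    train_y = np.asarray(train_y)
    if n_classes is None:
        n_classes = int(train_y.max()) + 1
    dist = pairwise_cosine(query_x, train_x)
    nearest = np.argsort(dist, axis=1)[:, :k]        # indices of the k closest rows
    votes = train_y[nearest]                          # shape (n_queries, k)
    counts = np.eye(n_classes, dtype=int)[votes].sum(axis=1)  # per-class vote counts
    return counts.argmax(axis=1)
```

Because the rows are normalised first, the whole distance matrix comes from a single matrix product, with no Python loop over samples.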
- 96×96 (handcrafted) + 64×64 (CNN): large enough for HOG to resolve human silhouettes, small enough that the from-scratch CNN runs on CPU
- 6 feature families: the original 4-family pool was almost entirely global and per-class accuracy collapsed on indoor classes; adding hsv_hist and canny_grid lifted LogReg validation from 0.36 to 0.49 on the diagnostic
- MRMR K-sweep: K=800 picked by argmax of mean validation accuracy across two classifiers (LogReg + SVM); a finer sweep also identified K=175 as a sweet spot reaching ~99% of the full-pool accuracy with 6.3× fewer features
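The six family sizes listed for features.py add up to the 1,107-dim pool, and 1,107 / 175 gives the 6.3× reduction quoted for K=175. The sketch below checks that arithmetic and shows one way a named-index schema could map family names to column ranges; the concatenation order and the `family_slices` helper are assumptions, not the actual FeatureSchema.

```python
# Family widths as listed in the module table; the order is assumed.
FAMILY_DIMS = {
    "color_hist": 96,
    "hsv_hist": 48,
    "stats": 15,
    "grad_hist": 32,
    "canny_grid": 16,
    "hog": 900,
}


def family_slices(dims):
    """Map each family name to the slice of columns it occupies in the pool."""
    slices, start = {}, 0
    for name, width in dims.items():
        slices[name] = slice(start, start + width)
        start += width
    return slices, start  # start now equals the total dimensionality


slices, total = family_slices(FAMILY_DIMS)
reduction = total / 175  # how many times smaller the K=175 subset is
```

With named slices, a selected MRMR index can be traced back to the family it came from, which is what a per-family breakdown needs.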
| Model | Features | Test acc | Macro-F1 | Weighted-F1 |
|---|---|---|---|---|
| MobileViT-XXS | raw 64×64 RGB + on-the-fly aug | 0.720 | 0.720 | 0.719 |
| KNN (k=3, cosine) | 800-dim MRMR | 0.673 | 0.672 | 0.671 |
| CNN-from-scratch | raw 64×64 RGB | 0.668 | 0.669 | 0.668 |
| Softmax (Adam + cosine LR) | 800-dim MRMR | 0.631 | 0.629 | 0.634 |
| Class | Avg F1 |
|---|---|
| riding_a_horse | 0.718 |
| watching_tv | 0.703 |
| rowing_a_boat | 0.685 |
| climbing | 0.627 |
| cleaning_the_floor | 0.603 |
| cutting_trees | 0.587 |
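For reference, per-class F1, macro-F1, and weighted-F1 can all be derived from a confusion matrix in plain NumPy. This is a minimal sketch with hypothetical function names; it assumes rows are true classes and columns are predicted classes, and it defines F1 as 0 when a class has no true or predicted samples.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] counts samples of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def f1_scores(cm):
    """Return (per-class F1, macro-F1, weighted-F1) from a confusion matrix."""
    tp = np.diag(cm).astype(np.float64)
    predicted = cm.sum(axis=0).astype(np.float64)  # column sums: predicted counts
    actual = cm.sum(axis=1).astype(np.float64)     # row sums: class support
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    macro = float(f1.mean())                            # unweighted mean over classes
    weighted = float((f1 * actual).sum() / actual.sum())  # weighted by class support
    return f1, macro, weighted
```

Macro-F1 treats every class equally, while weighted-F1 scales each class by its support, so the two diverge when the classes are imbalanced.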
- Dataset: class_distribution.png
- Preprocessing: preprocess_before_after.png
- Augmentation: augmentation_per_transform.png, augmentation_panel.png
- Features: feature_pool_one_sample.png, feature_before_after_per_sample.png, feature_before_after_dataset.png
- MRMR: mrmr_sweep.png, mrmr_family_breakdown.png, mrmr_before_after_aug.png
- Training: knn_sweep.png, softmax_curves.png, cnn_curves.png, mobilevit_curves.png, softmax_optimizer_compare.png, cnn_optimizer_compare.png, logging_resume_demo.png
- Final eval: knn_confusion.png, softmax_confusion.png, cnn_confusion.png, mobilevit_confusion.png, final_confusion_matrices.png, final_per_class_f1.png
| Component | Library | Constraint |
|---|---|---|
| Image processing | Kodak (our minicv) | NumPy / pandas / matplotlib / stdlib only |
| Feature extraction | Kodak descriptors | Same: every dim computed by a Kodak primitive |
| MRMR selection | mrmr_selection (third-party) | Allowed by rubric, only the selection step |
| KNN / Softmax / CNN | From scratch (NumPy) | No sklearn, no PyTorch for these |
| Optimizers | From scratch (NumPy) | SGD + Adam + 4 schedulers + early stopping |
| Metrics | From scratch (NumPy) | Verified against sklearn to 1e-15 precision |
| MobileViT-XXS | PyTorch | Allowed by rubric Section 5.4 (paper architecture exception) |
| Live demo | Streamlit | Wraps everything for the interactive demo |
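As a rough picture of the from-scratch optimizer row, here is a minimal NumPy Adam with `state_dict` / `load_state_dict` and a cosine learning-rate schedule. The class layout, the dict-of-arrays parameter format, and the `cosine_lr` helper are assumptions for this sketch rather than the actual optim.py interface.

```python
import math
import numpy as np


class Adam:
    """Adam with bias correction over a dict of named NumPy parameter arrays."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.t = 0
        self.m, self.v = {}, {}

    def step(self, params, grads):
        """Update every array in params in place from the matching gradient."""
        self.t += 1
        for name, g in grads.items():
            m = self.m.setdefault(name, np.zeros_like(g))
            v = self.v.setdefault(name, np.zeros_like(g))
            m[...] = self.beta1 * m + (1 - self.beta1) * g        # first moment
            v[...] = self.beta2 * v + (1 - self.beta2) * g * g    # second moment
            m_hat = m / (1 - self.beta1 ** self.t)                # bias correction
            v_hat = v / (1 - self.beta2 ** self.t)
            params[name] -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

    def state_dict(self):
        """Snapshot of the step counter and moment estimates, for checkpointing."""
        return {
            "t": self.t,
            "m": {k: a.copy() for k, a in self.m.items()},
            "v": {k: a.copy() for k, a in self.v.items()},
        }

    def load_state_dict(self, state):
        """Restore a snapshot produced by state_dict so training can resume."""
        self.t = state["t"]
        self.m = {k: a.copy() for k, a in state["m"].items()}
        self.v = {k: a.copy() for k, a in state["v"].items()}


def cosine_lr(base_lr, step, total_steps, min_lr=0.0):
    """Cosine annealing from base_lr at step 0 down to min_lr at total_steps."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * step / total_steps))
```

Saving the moment estimates and the step counter alongside the weights is what lets a resumed run continue as if it had never stopped.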
Educational use under the CSE480 course agreement. Stanford 40 Actions dataset © its original authors (Yao et al., ICCV 2011).