Machine Vision Project — Kodak & Kodak M2

CSE480 Machine Vision · Ain Shams University · Spring 2026

A two-milestone, mostly from-scratch computer-vision project. Milestone 1 is a NumPy-only image-processing library; Milestone 2 builds an end-to-end supervised classification pipeline (dataset → preprocessing → augmentation → handcrafted features → MRMR → 4 classifiers) on top of it. A Streamlit app ties everything together with an interactive demo, including live webcam classification.

Live Demo (Streamlit)

A single-file Streamlit app (app.py) wraps both milestones into one interactive demo:

streamlit run app.py

Two top-level modes (sidebar)

1. Kodak Studio (Milestone 1)

Upload an image, pick a category (Enhancement / Filters / Features / Geometric / Frequency / Morphology), tweak parameters with sliders. The processed image and any auxiliary plot (histogram, FFT spectrum) update in real time. Every operation calls a Kodak (minicv) primitive.

2. Milestone 2: Predict an Image (Kodak M2)

Feed an image into the full classification pipeline and step through every stage. Image source is selectable:

Pick a TEST sample (unseen) (default) — the honest demo: model has never seen these
Upload — drop in any JPG/PNG
Pick a train / val sample — for sanity checks and augmentation demos
🆕 Live camera — open your webcam and see all three models classify the live frame side-by-side

The pipeline walkthrough (an overview page plus 8 stage pages):

#	Page	What you see for your image
0	Overview	Raw image + one-line predictions from KNN, Softmax, CNN
1	Preprocessing	Raw vs 96×96 vs 64×64 + train-set R/G/B mean & std
2	Augmentation	Grid of 1–12 stochastic augmentations + re-roll button
3	Feature extraction	1,107-dim vector colour-coded by family + per-family stats
4	MRMR-selected features	Compresses 1,107 → 800 (or 175); top picks for your image
5	KNN prediction	Predicted class, per-class probability, 3 nearest training images with cosine distances. Live `k` slider + `cosine`/`l2` toggle
6	Softmax prediction	Predicted class, confidence, raw logits
7	CNN prediction	Predicted class on the 64×64 input + raw logits
8	All side-by-side	Summary table + per-class probability heatmap + bar chart

🆕 The latest version also adds a Live camera option under the image-source picker: open your laptop or phone webcam directly inside the app, and KNN, Softmax and CNN classify the live frame in real time. Best demo class to act out: watching_tv (just sit in front of any screen).

Caching: every expensive resource (trained checkpoints, MRMR selection, train-set statistics, the from-scratch KNN classifier) is wrapped with @st.cache_resource, so only the first page load pays the loading cost; subsequent navigation is effectively instant.

Repository Layout

Machine-Vision-/
├── minicv/                       # Milestone 1 library  (a.k.a. Kodak)
│   ├── io.py            ├── utils.py        ├── filters.py
│   ├── transforms.py    ├── processing.py   ├── features.py
│   ├── drawing.py       └── frequency.py
│
├── el_nos_el_tany/               # Milestone 2 pipeline  (a.k.a. Kodak M2)
│   ├── el_nos_el_tany/
│   │   ├── dataset.py            # Stanford-40 loading, stratified splits
│   │   ├── preprocess.py         # decode + resize + canonical [0,1] float
│   │   ├── augment.py            # 7 train-time transforms via Kodak
│   │   ├── features.py           # 1,107-dim feature pool (6 families) + schema
│   │   ├── mrmr.py               # MRMR top-K wrapper
│   │   ├── optim.py              # SGD, Adam, schedulers, EarlyStopping
│   │   ├── runs.py               # logs.csv + npz checkpoints + resume
│   │   ├── metrics.py            # accuracy, confmat, P/R/F1, macro-F1
│   │   └── models/
│   │       ├── knn.py            # KNN with hand-rolled L2 / cosine distance
│   │       ├── softmax.py        # softmax regression, stable softmax + ε CE
│   │       ├── cnn.py            # Conv2D / MaxPool2D / ReLU / FC + SimpleCNN
│   │       └── mobilevit.py      # MobileViT-XXS in PyTorch (Section 5.4)
│   │
│   ├── notebooks/                # all 8 deliverables
│   │   ├── 01_dataset_preprocess_augment.ipynb
│   │   ├── 02_feature_extraction.ipynb
│   │   ├── 03_knn_softmax.ipynb
│   │   ├── 04_cnn_from_scratch.ipynb
│   │   ├── 05_paper_architecture.ipynb
│   │   ├── 06_optimizer_comparison.ipynb
│   │   ├── 07_logging_resume.ipynb
│   │   └── 08_final_evaluation.ipynb
│   │
│   ├── data/
│   │   ├── raw/Stanford40/       # 9,500+ raw photos
│   │   ├── splits/               # train/val/test CSVs + norm stats
│   │   ├── features/             # *.npz feature matrices + MRMR selections
│   │   └── runs/                 # one folder per training run (logs+checkpoints)
│   │
│   └── figures/                  # generated PNGs used in the report
│
├── app.py                        # Streamlit live demo (M1 + M2 + Live camera)
├── Project_report_M1/            # Milestone 1 LaTeX report
├── Project_report_M2/            # Milestone 2 LaTeX report (PDF + sources)
├── requirements.txt
└── README.md                     # ← you are here

Installation

git clone https://github.com/Senior2Projects/Machine-Vision-.git
cd Machine-Vision-

# (recommended) create a venv
python -m venv .venv
.venv\Scripts\activate          # Windows PowerShell
# source .venv/bin/activate     # Linux / macOS

pip install -r requirements.txt

Run the live demo:

streamlit run app.py

Run the notebooks (all 8 are self-contained and reproducible):

jupyter lab el_nos_el_tany/notebooks/

Milestone 1 — Kodak Image-Processing Library

Goal: rebuild a small but real subset of OpenCV in pure NumPy with no external CV libraries.

Constraint: only numpy, pandas, matplotlib, and the Python standard library are allowed.

What's implemented

Module	Functions
`io.py`	`read_image`, `write_image` (PNG / JPG via Matplotlib backends)
`utils.py`	`rgb2gray`, `gray2rgb`, `to_float01`, `to_uint8`, dtype validation, 3 normalization modes, pixel clipping, 3 padding modes, RGB↔HSV
`filters.py`	True 2-D convolution, mean/box, Gaussian (with kernel generator), median, Sobel gradients, Otsu, adaptive thresholding, erosion/dilation
`processing.py`	Gamma correction, histogram equalization, histogram matching, bit-plane slicing
`features.py`	Harris corners, Canny edges, HOG, plus color-histogram, gradient-histogram, image-statistics and canny-grid descriptors
`transforms.py`	`resize` (nearest + bilinear), `rotate` (about-centre with bilinear interpolation), `translate`, `flip`, `crop`
`drawing.py`	`draw_point`, `draw_line` (Bresenham), `draw_rectangle` (filled / outline), `draw_polygon`, `draw_text`
`frequency.py`	FFT low-pass / high-pass filtering, plus the centred Fourier magnitude spectrum
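To make "true 2-D convolution" concrete, here is a minimal NumPy sketch. It is not the code in `filters.py`: the function name `convolve2d`, the zero padding, and the use of `sliding_window_view` (NumPy 1.20+) are assumptions made for illustration. The point it shows is the kernel flip that separates convolution from correlation.

```python
import numpy as np

def convolve2d(image, kernel):
    """'Same'-size true 2-D convolution of a grayscale image, zero-padded.

    Hypothetical helper for illustration; not the minicv signature.
    """
    if image.ndim != 2 or kernel.ndim != 2:
        raise ValueError("expected a 2-D image and a 2-D kernel")
    kh, kw = kernel.shape
    if kh % 2 == 0 or kw % 2 == 0:
        raise ValueError(f"kernel sides must be odd, got {kernel.shape}")
    # Convolution flips the kernel on both axes; correlation does not.
    flipped = kernel[::-1, ::-1]
    padded = np.pad(image.astype(np.float64),
                    ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    # Every kh x kw window as a strided view of shape (H, W, kh, kw).
    windows = np.lib.stride_tricks.sliding_window_view(padded, (kh, kw))
    # Weighted sum of each window with the flipped kernel.
    return np.einsum("ijkl,kl->ij", windows, flipped)

# An impulse image reproduces the (unflipped) kernel around the impulse.
impulse = np.zeros((5, 5))
impulse[2, 2] = 1.0
kernel = np.arange(9, dtype=np.float64).reshape(3, 3)
response = convolve2d(impulse, kernel)
```

With an asymmetric kernel the impulse test distinguishes convolution from correlation, which is a quick way to check any implementation of this kind.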

Engineering rules followed

Docstrings for every public function (description, parameters + types, return value, raised exceptions, expected dtype/range)
TypeError for wrong types, ValueError for invalid shapes — error messages always say what failed and what was expected
NumPy-vectorised everywhere; loops only over kernel windows when justified (median) and clearly documented
Modular: I/O, filtering, transforms, features, drawing, utils each in their own file
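As an illustration of these rules, a function written in this style might look like the sketch below. The name `gamma_correct`, its signature, and the accepted shapes are assumptions, not the actual `processing.py` API.

```python
import numpy as np

def gamma_correct(image, gamma):
    """Apply power-law (gamma) correction. Hypothetical example function.

    Parameters
    ----------
    image : np.ndarray
        Float image in [0, 1], shape (H, W) or (H, W, 3).
    gamma : int or float
        Exponent, must be > 0.

    Returns
    -------
    np.ndarray
        float32 image in [0, 1] with the same shape as `image`.

    Raises
    ------
    TypeError
        If `image` is not an ndarray or `gamma` is not a real number.
    ValueError
        If the shape is unsupported or `gamma` is not positive.
    """
    if not isinstance(image, np.ndarray):
        raise TypeError(f"image: expected np.ndarray, got {type(image).__name__}")
    if isinstance(gamma, bool) or not isinstance(gamma, (int, float)):
        raise TypeError(f"gamma: expected int or float, got {type(gamma).__name__}")
    if image.ndim not in (2, 3) or (image.ndim == 3 and image.shape[2] != 3):
        raise ValueError(f"image: expected shape (H, W) or (H, W, 3), got {image.shape}")
    if gamma <= 0:
        raise ValueError(f"gamma: expected a value > 0, got {gamma}")
    # Clip first so out-of-range inputs cannot produce values outside [0, 1].
    return np.clip(image, 0.0, 1.0).astype(np.float32) ** np.float32(gamma)
```

Each error message names the offending argument, what was expected, and what was received, which is the pattern the second rule describes.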

Milestone 2 — Kodak M2 Supervised Pipeline

Goal: end-to-end supervised vision pipeline using the Milestone-1 library as the only image-processing backend.

Constraint: every classical learning component is implemented from scratch in NumPy: KNN, softmax, CNN (forward + backward), Adam, schedulers, metrics. Library use is allowed only for MRMR (rubric exception) and the one paper architecture (PyTorch, under the rubric's framework exception).
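As one example of what "from scratch in NumPy" involves, the sketch below shows a numerically stable softmax and an ε-clipped cross-entropy. The function names, the default `eps`, and the toy logits are assumptions for illustration; the real code lives in `models/softmax.py` and may differ.

```python
import numpy as np

def stable_softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-likelihood; clipping keeps log() away from zero."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(np.clip(picked, eps, 1.0)).mean())

# Logits this large would overflow a naive exp(); the shifted version is fine.
logits = np.array([[1000.0, 1001.0, 1002.0],
                   [0.0, 0.0, 0.0]])
probs = stable_softmax(logits)
loss = cross_entropy(probs, np.array([2, 0]))
```

Subtracting the per-row maximum leaves the softmax output unchanged mathematically, so the trick costs nothing in accuracy.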

Dataset

Stanford 40 Actions, restricted to a curated 6-class subset chosen by ranking per-class F1 of a 40-way diagnostic baseline: cleaning_the_floor · climbing · cutting_trees · riding_a_horse · rowing_a_boat · watching_tv
Stratified 70 / 15 / 15 split → 989 train · 211 val · 214 test
Class-distribution figure: el_nos_el_tany/figures/class_distribution.png
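A stratified split of this kind can be sketched as follows. This is a hedged illustration, not the code in `dataset.py`: the function name, the seed, and the per-class rounding rule are assumptions, so it will not necessarily reproduce the exact 989 / 211 / 214 counts.

```python
import numpy as np

def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return train/val/test index arrays with per-class proportions kept.

    Hypothetical helper; the test split receives whatever remains after
    rounding the train and val counts for each class.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class only, then cut them in three.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(fractions[0] * len(idx)))
        n_val = int(round(fractions[1] * len(idx)))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])
    return np.array(train), np.array(val), np.array(test)

# Toy labels: 6 classes with 100 samples each (synthetic, not Stanford-40).
toy_labels = np.repeat(np.arange(6), 100)
train_idx, val_idx, test_idx = stratified_split(toy_labels)
```

Splitting class by class is what keeps rare classes from vanishing in the smaller validation and test sets.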

Pipeline at a glance

raw JPG ─► Kodak.io.read_image ─► resize 96×96 (or 64×64) ─► float32 [0,1]
                                            │                       │
                                            │                       └─► CNN / MobileViT
                                            ▼
                                  extract_pool ─► 1,107-dim vector
                                            │
                                            ├─► (training only) augment ── 7 stochastic transforms
                                            │
                                            └─► MRMR top-K (800 default, 175 also persisted)
                                                       │
                                                       └─► z-score on train mean/std
                                                              │
                                                              ├─► KNN
                                                              └─► Softmax regression
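The last step before KNN and Softmax, z-scoring on train mean/std, is worth spelling out because the statistics must come from the training split only. A minimal sketch, with hypothetical function names and an assumed `eps` floor for constant features:

```python
import numpy as np

def fit_zscore(train_features, eps=1e-8):
    """Compute per-feature mean and std on the training split only."""
    mean = train_features.mean(axis=0)
    std = train_features.std(axis=0)
    # Floor the std so constant features do not cause division by zero.
    return mean, np.maximum(std, eps)

def apply_zscore(features, mean, std):
    """Standardise any split with statistics fitted on the training split."""
    return (features - mean) / std

# Synthetic features standing in for the MRMR-selected vectors.
rng = np.random.default_rng(0)
train = rng.normal(5.0, 2.0, size=(50, 4))
train[:, 3] = 1.0                      # a constant feature
val = rng.normal(5.0, 2.0, size=(10, 4))

mean, std = fit_zscore(train)
train_z = apply_zscore(train, mean, std)
val_z = apply_zscore(val, mean, std)   # reuses train statistics, no refit
```

Refitting the statistics on validation or test data would leak information from those splits into the model inputs.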

What's in each module

Module	Responsibility
`dataset.py`	Discover Stanford-40 JPEGs, parse class from filename, build annotation DataFrame, stratified split, class-distribution plot
`preprocess.py`	Decode → gray2rgb (if needed) → to_uint8 → bilinear resize → to_float01. Plus `NormStats` dataclass for train-set mean/std persistence
`augment.py`	7 stochastic train-time transforms (hflip, rotate ±15°, translate ±10%, gamma, blur, crop+resize, vflip-disabled). Programmatic train-only guard
`features.py`	1,107-dim feature pool from 6 Kodak descriptors: `color_hist (96)`, `hsv_hist (48)`, `stats (15)`, `grad_hist (32)`, `canny_grid (16)`, `hog (900)`. Stable named-index `FeatureSchema`
`mrmr.py`	Wraps `mrmr_selection` for top-K subset; persists with decoded names; sweep K=10…800; both K=800 and K=175 selections saved
`models/knn.py`	`pairwise_l2`, `pairwise_cosine` (vectorised), `KNNClassifier` with k-sweep helper
`models/softmax.py`	Numerically stable softmax, ε-clipped cross-entropy, mini-batch SGD with L2, gradient clipping, early stopping
`models/cnn.py`	`Conv2D` (im2col / col2im), `MaxPool2D`, `ReLU`, `Flatten`, `FC` — all forward + backward; `SimpleCNN` (548 K params)
`models/mobilevit.py`	MobileViT-XXS architecture in PyTorch (paper architecture exception)
`optim.py`	`SGD`, `Adam`, 4 LR schedulers (Step, Exponential, Cosine, ReduceOnPlateau), `EarlyStopping`; all with `state_dict` / `load_state_dict`
`runs.py`	`Run` class: `config.json` + `logs.csv` + best/last `.npz` checkpoints + full resume support
`metrics.py`	Accuracy, confusion matrix, per-class P/R/F1, macro-F1, weighted-F1 — all from raw NumPy
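To show what the vectorised distance functions in `models/knn.py` amount to, here is a small sketch. The names `pairwise_l2` and `pairwise_cosine` come from the table above, but the signatures shown here, the `eps` guard, and the `knn_predict` helper are assumptions for illustration.

```python
import numpy as np

def pairwise_cosine(a, b, eps=1e-12):
    """Cosine distance (1 - cosine similarity) between rows of a and b."""
    a_n = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), eps)
    b_n = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), eps)
    return 1.0 - a_n @ b_n.T

def pairwise_l2(a, b):
    """Euclidean distance via ||a||^2 - 2ab + ||b||^2, without Python loops."""
    sq = (a ** 2).sum(axis=1)[:, None] - 2.0 * a @ b.T + (b ** 2).sum(axis=1)[None, :]
    return np.sqrt(np.maximum(sq, 0.0))   # clamp tiny negatives from rounding

def knn_predict(train_x, train_y, query_x, k=3, metric=pairwise_cosine):
    """Majority vote over the k nearest training rows (hypothetical helper)."""
    dist = metric(query_x, train_x)                  # shape (Q, N)
    nearest = np.argsort(dist, axis=1)[:, :k]        # indices of k closest rows
    n_classes = int(train_y.max()) + 1
    counts = np.stack([np.bincount(train_y[row], minlength=n_classes)
                       for row in nearest])
    return counts.argmax(axis=1)

# Toy data: two classes pointing along different axes.
train_x = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
train_y = np.array([0, 0, 1, 1])
queries = np.array([[2.0, 0.1], [0.1, 3.0]])
pred = knn_predict(train_x, train_y, queries, k=3)
```

Cosine distance ignores vector length, which is why the long query vectors above still match the short training vectors that point the same way.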

Why these design choices

96×96 (handcrafted) + 64×64 (CNN) — large enough for HOG to resolve human silhouettes, small enough that the from-scratch CNN runs on CPU
6 feature families: the original 4-family pool was almost entirely global, and per-class accuracy collapsed on indoor classes; adding hsv_hist and canny_grid lifted LogReg validation accuracy from 0.36 to 0.49 on the diagnostic
MRMR K-sweep: K=800 picked by argmax of mean validation accuracy across two classifiers (LogReg + SVM); a finer sweep also identified K=175 as a sweet spot reaching ~99 % of the full-pool accuracy with 6.3× fewer features

Results

Test-set comparison (held-out, 214 images)

Model	Features	Test acc	Macro-F1	Weighted-F1
MobileViT-XXS	raw 64×64 RGB + on-the-fly aug	0.720	0.720	0.719
KNN (k=3, cosine)	800-dim MRMR	0.673	0.672	0.671
CNN-from-scratch	raw 64×64 RGB	0.668	0.669	0.668
Softmax (Adam + cosine LR)	800-dim MRMR	0.631	0.629	0.634

Per-class F1 (averaged across all 3 classical-style models)

Class	Avg F1
riding_a_horse	0.718
watching_tv	0.703
rowing_a_boat	0.685
climbing	0.627
cleaning_the_floor	0.603
cutting_trees	0.587
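The per-class, macro and weighted F1 figures in these tables can all be derived from a confusion matrix. The sketch below shows one way to do that in NumPy; the function names and the toy labels are assumptions for illustration, not the `metrics.py` API or project results.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)   # unbuffered add handles repeated pairs
    return cm

def f1_scores(cm):
    """Return (per-class F1, macro-F1, weighted-F1) from a confusion matrix."""
    tp = np.diag(cm).astype(np.float64)
    predicted = cm.sum(axis=0)           # column sums: predictions per class
    support = cm.sum(axis=1)             # row sums: true samples per class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(tp), where=denom > 0)
    macro = float(f1.mean())                               # every class counts equally
    weighted = float((f1 * support).sum() / support.sum()) # weighted by class size
    return f1, macro, weighted

# Toy labels for three classes (synthetic).
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
cm = confusion_matrix(y_true, y_pred, n_classes=3)
per_class_f1, macro_f1, weighted_f1 = f1_scores(cm)
```

Macro-F1 and weighted-F1 coincide when every class has the same support, and drift apart as the test split becomes imbalanced.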

Generated figures (`el_nos_el_tany/figures/`)

Dataset: class_distribution.png
Preprocessing: preprocess_before_after.png
Augmentation: augmentation_per_transform.png, augmentation_panel.png
Features: feature_pool_one_sample.png, feature_before_after_per_sample.png, feature_before_after_dataset.png
MRMR: mrmr_sweep.png, mrmr_family_breakdown.png, mrmr_before_after_aug.png
Training: knn_sweep.png, softmax_curves.png, cnn_curves.png, mobilevit_curves.png, softmax_optimizer_compare.png, cnn_optimizer_compare.png, logging_resume_demo.png
Final eval: knn_confusion.png, softmax_confusion.png, cnn_confusion.png, mobilevit_confusion.png, final_confusion_matrices.png, final_per_class_f1.png

Tech Stack & Constraints

Component	Library	Constraint
Image processing	`Kodak` (our `minicv`)	NumPy / pandas / matplotlib / stdlib only
Feature extraction	`Kodak` descriptors	Same — every dim computed by a Kodak primitive
MRMR selection	`mrmr_selection` (third-party)	Allowed by rubric — only the selection step
KNN / Softmax / CNN	From scratch (NumPy)	No sklearn, no PyTorch for these
Optimizers	From scratch (NumPy)	SGD + Adam + 4 schedulers + early stopping
Metrics	From scratch (NumPy)	Verified against sklearn to 1e-15 precision
MobileViT-XXS	PyTorch	Allowed by rubric Section 5.4 (paper architecture exception)
Live demo	Streamlit	Wraps everything for the interactive demo

License

Educational use under the CSE480 course agreement. Stanford 40 Actions dataset © its original authors (Yao et al., ICCV 2011).
