Benchmarks

This folder is the evidence layer for EduScale. It contains the scripts and saved outputs used to compare checkpoints on reconstruction quality, readability, and runtime.

Role In The Study

EduScale treats super-resolution as a readability task. A checkpoint is not selected on PSNR or SSIM alone; it is also checked against OCR confidence, character error rate (CER), and practical runtime.

Metric	Direction	Use
PSNR	higher is better	Pixel-level reconstruction against HR targets
SSIM	higher is better	Structural quality for text, diagrams, and slide edges
Runtime	lower is better	Practicality for offline/mobile use
OCR confidence	higher is better	Approximate readability after enhancement
CER	lower is better	Text-recognition error rate after enhancement
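To make two of the metrics above concrete, here is a minimal sketch of PSNR and CER using their standard definitions. It is standard-library only and is not the repository's implementation; the helpers in `benchmarks/analysis/` may compute these differently (for example with scikit-image or jiwer). The function names `psnr` and `cer` are illustrative.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB for two equal-length pixel sequences."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: no reconstruction error
    return 10.0 * math.log10(max_value ** 2 / mse)


def cer(reference_text, recognized_text):
    """Character error rate: Levenshtein distance divided by reference length."""
    if not reference_text:
        raise ValueError("reference text must be non-empty")
    # Row-by-row dynamic programming over the edit-distance table.
    previous = list(range(len(recognized_text) + 1))
    for i, ref_char in enumerate(reference_text, start=1):
        current = [i]
        for j, hyp_char in enumerate(recognized_text, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1] / len(reference_text)
```

Under this definition, one wrong character in a five-character reference gives a CER of 0.2, and CER can exceed 1.0 when the OCR output is much longer than the reference.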

Folder Layout

Path	Purpose
`benchmarks/scripts/`	Entry points for image quality, processing time, and video quality benchmarks
`benchmarks/analysis/`	Supporting metric and plotting helpers
`benchmarks/results/`	Saved CSV and JSON benchmark outputs used by reports and thesis text

Current Evaluation Targets

Scale	Held-out manifest	Samples
x2	`datasets/manifests/heldout/heldout_real_x2.csv`	291
x3	`datasets/manifests/heldout/heldout_real_x3.csv`	291

The held-out decks and rendered assets are under `datasets/heldout-updated/`. The CSV manifests are the stable source for model-to-model comparison.
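Before comparing models, it can help to confirm that a manifest still contains the expected number of samples. The sketch below counts data rows without assuming any column names, since the manifest schema is not documented here; `count_manifest_rows` and `manifest_matches` are hypothetical helpers, not part of the repository.

```python
import csv
from pathlib import Path


def count_manifest_rows(manifest_path):
    """Return the number of data rows (header excluded) in a CSV manifest."""
    with Path(manifest_path).open(newline="", encoding="utf-8") as handle:
        return sum(1 for _ in csv.DictReader(handle))


def manifest_matches(manifest_path, expected_samples):
    """True when the manifest row count equals the documented sample count."""
    return count_manifest_rows(manifest_path) == expected_samples
```

For example, `manifest_matches("datasets/manifests/heldout/heldout_real_x2.csv", 291)` should hold for the x2 manifest listed in the table above.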

Current Benchmark Snapshot

These are saved results from `benchmarks/results/`, not a fresh rerun.

Model run	Scale	Samples	PSNR	SSIM	Runtime / image	OCR confidence	CER
`x2-all-20260410`	x2	291	29.9002	0.9813	719.159 ms	90.9905	0.1016
`x2-all-20260410-ocr-refine`	x2	291	29.3532	0.9758	762.6484 ms	90.9805	0.0814
`x2-v3-tpgsr-refine-20260411`	x2	291	29.4579	0.9786	572.7827 ms	91.2295	0.0765
`x3-real-refine-20260412`	x3	291	26.6166	0.9642	292.9324 ms	88.8627	0.1918
`x3-real-refine-20260412` (last checkpoint)	x3	291	26.6002	0.9642	288.0681 ms	89.1655	0.2058
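The x2 rows show why no single metric decides the choice of checkpoint. The sketch below copies the x2 values from the table and picks the best run per metric; the dictionary keys and the `best_by` helper are illustrative names, not something the benchmark scripts define.

```python
# Values copied from the x2 rows of the snapshot table.
X2_RUNS = [
    {"run": "x2-all-20260410", "psnr": 29.9002, "ssim": 0.9813,
     "runtime_ms": 719.159, "ocr_conf": 90.9905, "cer": 0.1016},
    {"run": "x2-all-20260410-ocr-refine", "psnr": 29.3532, "ssim": 0.9758,
     "runtime_ms": 762.6484, "ocr_conf": 90.9805, "cer": 0.0814},
    {"run": "x2-v3-tpgsr-refine-20260411", "psnr": 29.4579, "ssim": 0.9786,
     "runtime_ms": 572.7827, "ocr_conf": 91.2295, "cer": 0.0765},
]


def best_by(runs, metric, higher_is_better):
    """Return the name of the run with the best value for one metric."""
    chooser = max if higher_is_better else min
    return chooser(runs, key=lambda run: run[metric])["run"]


print(best_by(X2_RUNS, "psnr", higher_is_better=True))
print(best_by(X2_RUNS, "cer", higher_is_better=False))
print(best_by(X2_RUNS, "runtime_ms", higher_is_better=False))
```

On these saved numbers, `x2-all-20260410` leads on PSNR and SSIM, while `x2-v3-tpgsr-refine-20260411` leads on CER, OCR confidence, and runtime.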

Result File Conventions

Pattern	Meaning
`*-eval.csv`	Per-image PSNR, SSIM, runtime, and optional OCR metrics
`*-summary.json`	Dataset-level aggregate summary
`*-ocr-eval.csv`	Per-image OCR-focused benchmark output
`*-ocr-summary.json`	OCR confidence, CER, and readability summary
`outputs/benchmarks/training_x*/`	Per-epoch benchmark snapshots emitted during training

Keep summary JSON and evaluation CSV files when they support a reported claim. Generated images, rendered videos, and large temporary outputs belong in `outputs/`, which is ignored by Git.
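One subtlety in these patterns is that `*-eval.csv` also matches every `*-ocr-eval.csv` file, and likewise for the summary patterns. A minimal sketch of a classifier that checks the more specific patterns first is shown below; `classify_result_file` and the short labels are hypothetical and not part of the repository.

```python
from fnmatch import fnmatchcase

# Ordered most-specific first, because "*-eval.csv" also matches "*-ocr-eval.csv".
RESULT_PATTERNS = [
    ("*-ocr-eval.csv", "per-image OCR benchmark output"),
    ("*-ocr-summary.json", "OCR summary"),
    ("*-eval.csv", "per-image evaluation"),
    ("*-summary.json", "dataset-level summary"),
]


def classify_result_file(file_name):
    """Return the meaning of a result file name, or None if no pattern matches."""
    for pattern, meaning in RESULT_PATTERNS:
        if fnmatchcase(file_name, pattern):
            return meaning
    return None
```

Tooling that globs `benchmarks/results/` for `*-eval.csv` will pick up the OCR files too unless it excludes them in a similar way.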

Main Reproduction Commands

x2 held-out evaluation

.\.venv\Scripts\python.exe -m models.training.evaluation `
  --pairs_csv datasets/manifests/heldout/heldout_real_x2.csv `
  --checkpoint models/span/education-finetuned/x2-v3-tpgsr-refine-20260411/last_model.pt `
  --output_csv benchmarks/results/x2-v3-tpgsr-refine-20260411-last-heldout-updated-eval.csv `
  --output_summary_json benchmarks/results/x2-v3-tpgsr-refine-20260411-last-heldout-updated-summary.json `
  --include_ocr_metrics

x3 held-out evaluation

.\.venv\Scripts\python.exe -m models.training.evaluation `
  --pairs_csv datasets/manifests/heldout/heldout_real_x3.csv `
  --checkpoint models/span/education-finetuned/x3-real-refine-20260412/best_model.pt `
  --output_csv benchmarks/results/x3-real-refine-20260412-heldout-updated-eval.csv `
  --output_summary_json benchmarks/results/x3-real-refine-20260412-heldout-updated-summary.json `
  --include_ocr_metrics
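The two held-out commands above differ only in manifest, checkpoint, and output names, so they can be assembled from one helper when scripting several runs. This is a sketch under assumptions: `build_eval_command` and its `output_stem` parameter are hypothetical, the flags are the ones shown in the commands above, and the interpreter defaults to whichever Python runs the script rather than `.\.venv\Scripts\python.exe`.

```python
import sys


def build_eval_command(pairs_csv, checkpoint, output_stem, python=sys.executable):
    """Build the argument list for one held-out evaluation run.

    output_stem is the shared prefix of the CSV and JSON outputs, e.g.
    "benchmarks/results/x3-real-refine-20260412-heldout-updated".
    """
    return [
        python, "-m", "models.training.evaluation",
        "--pairs_csv", pairs_csv,
        "--checkpoint", checkpoint,
        "--output_csv", f"{output_stem}-eval.csv",
        "--output_summary_json", f"{output_stem}-summary.json",
        "--include_ocr_metrics",
    ]
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the repository root. Passing a list rather than a single string avoids shell-specific quoting and line-continuation characters.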

Project OCR benchmark matrix

.\.venv\Scripts\python.exe scripts/benchmark.py

Video-quality benchmark

.\.venv\Scripts\python.exe benchmarks/scripts/benchmark_video_quality.py `
  --reference_video datasets/benchmark/video/hr.mp4 `
  --input_video datasets/benchmark/video/lr.mp4 `
  --output_dir outputs/benchmarks/video/sample_run

Dependencies

Most benchmarks require:

opencv-python
numpy
scikit-image
torch

OCR-aware benchmarks additionally require:

Pillow
pytesseract
jiwer
lpips
a working tesseract.exe
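A quick pre-flight check can catch missing dependencies before a long benchmark run. The sketch below maps each package listed above to its import name (`opencv-python` imports as `cv2`, `scikit-image` as `skimage`, `Pillow` as `PIL`) and looks for a Tesseract executable on `PATH`. The helper names are hypothetical, and the `PATH` check is only an approximation: pytesseract can also be pointed at an explicit binary through `pytesseract.pytesseract.tesseract_cmd`.

```python
import importlib.util
import shutil

# Package name as installed -> module name as imported.
CORE_MODULES = {
    "opencv-python": "cv2",
    "numpy": "numpy",
    "scikit-image": "skimage",
    "torch": "torch",
}
OCR_MODULES = {
    "Pillow": "PIL",
    "pytesseract": "pytesseract",
    "jiwer": "jiwer",
    "lpips": "lpips",
}


def missing_packages(modules):
    """Return the package names whose import module cannot be found."""
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


def tesseract_on_path():
    """True when a tesseract executable is discoverable on PATH."""
    return shutil.which("tesseract") is not None


if __name__ == "__main__":
    print("Missing core packages:", missing_packages(CORE_MODULES))
    print("Missing OCR packages:", missing_packages(OCR_MODULES))
    print("Tesseract on PATH:", tesseract_on_path())
```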

Open Details To Confirm

Target device model used for runtime claims
Whether runtime is CPU-only, GPU-assisted, or mixed
Whether OCR metrics should be reported with Tesseract, ML Kit, or both
Which checkpoint is the official final model for publication
