This folder is the evidence layer for EduScale. It contains the scripts and saved outputs used to compare checkpoints on reconstruction quality, readability, and runtime.
EduScale treats super-resolution as a readability task. A checkpoint is not selected on PSNR or SSIM alone; it must also hold up on OCR confidence, character error rate (CER), and practical runtime.
| Metric | Direction | Use |
|---|---|---|
| PSNR | higher is better | Pixel-level reconstruction against HR targets |
| SSIM | higher is better | Structural quality for text, diagrams, and slide edges |
| Runtime | lower is better | Practicality for offline/mobile use |
| OCR confidence | higher is better | Approximate readability after enhancement |
| CER | lower is better | Text-recognition error rate after enhancement |
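For reference, PSNR and CER reduce to short formulas: PSNR is $10 \log_{10}(\text{MAX}^2 / \text{MSE})$ in dB, and CER is the character-level edit distance divided by the reference length. The sketch below is a minimal, self-contained version of both, assuming 8-bit images (`data_range=255`) compared over all pixels and channels with no border cropping or Y-channel conversion; the project's own helpers in `benchmarks/analysis/` may differ on those points. SSIM is left out here because scikit-image already provides `skimage.metrics.structural_similarity`, and `jiwer` (listed in the OCR dependencies) exposes a `cer` function.

```python
import numpy as np


def psnr(reference: np.ndarray, restored: np.ndarray, data_range: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between an HR target and an SR output."""
    if reference.shape != restored.shape:
        raise ValueError("images must have the same shape")
    # Cast before subtracting so uint8 arithmetic cannot wrap around.
    diff = reference.astype(np.float64) - restored.astype(np.float64)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance normalised by reference length."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

For example, `cer("kitten", "sitting")` is 3 edits over 6 reference characters, or 0.5.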
| Path | Purpose |
|---|---|
| `benchmarks/scripts/` | Entry points for image quality, processing time, and video quality benchmarks |
| `benchmarks/analysis/` | Supporting metric and plotting helpers |
| `benchmarks/results/` | Saved CSV and JSON benchmark outputs used by reports and thesis text |
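Runtime is reported per image in milliseconds. The exact timing procedure in `benchmarks/scripts/` is not described here, so the following is only a minimal sketch of one common approach, assuming a hypothetical `upscale` callable that enhances a single image: a few untimed warm-up calls, then the mean wall-clock time per image using `time.perf_counter`. Whether the project reports the mean or another statistic is an assumption.

```python
import statistics
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def time_per_image_ms(upscale: Callable[[T], object],
                      images: Sequence[T],
                      warmup: int = 2) -> float:
    """Mean wall-clock milliseconds per image for a hypothetical `upscale` callable."""
    if not images:
        raise ValueError("need at least one image")
    # Warm-up calls (caches, lazy initialisation) are excluded from timing.
    for img in images[:warmup]:
        upscale(img)
    timings = []
    for img in images:
        start = time.perf_counter()
        upscale(img)
        # On a CUDA device PyTorch runs kernels asynchronously, so the callable
        # should call torch.cuda.synchronize() before returning for this to be valid.
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings)
```

Because runtime depends heavily on hardware, a number like this is only comparable across runs measured on the same device and backend.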
| Scale | Held-out manifest | Samples |
|---|---|---|
| x2 | `datasets/manifests/heldout/heldout_real_x2.csv` | 291 |
| x3 | `datasets/manifests/heldout/heldout_real_x3.csv` | 291 |
The held-out decks and rendered assets are under datasets/heldout-updated/. The CSV manifests are the stable source for model-to-model comparison.
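Since model comparisons rely on both manifests containing the same 291 samples, a quick row-count check before an evaluation run can catch a stale or truncated manifest. The sketch below assumes each manifest has exactly one header row and one row per LR/HR pair; the column layout is not documented here, so the code only counts rows and does not read any columns.

```python
import csv
from pathlib import Path

# Sample counts from the held-out manifest table.
EXPECTED_COUNTS = {
    "datasets/manifests/heldout/heldout_real_x2.csv": 291,
    "datasets/manifests/heldout/heldout_real_x3.csv": 291,
}


def count_data_rows(lines) -> int:
    """Count non-empty CSV rows, skipping the (assumed) header row."""
    reader = csv.reader(lines)
    next(reader, None)  # skip header
    return sum(1 for row in reader if row)  # blank lines parse as []


def count_manifest_rows(path) -> int:
    with open(path, newline="", encoding="utf-8") as f:
        return count_data_rows(f)


def manifest_mismatches(expected=EXPECTED_COUNTS, root="."):
    """Return {manifest: actual_count} for every manifest whose count differs."""
    mismatches = {}
    for rel_path, n in expected.items():
        actual = count_manifest_rows(Path(root) / rel_path)
        if actual != n:
            mismatches[rel_path] = actual
    return mismatches
```

Run from the repository root, an empty dict from `manifest_mismatches()` means both manifests match the documented sample counts.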
The table below reports saved results from `benchmarks/results/`, not a fresh rerun.
| Model run | Scale | Samples | PSNR (dB) | SSIM | Runtime / image (ms) | OCR confidence | CER |
|---|---|---|---|---|---|---|---|
| x2-all-20260410 | x2 | 291 | 29.9002 | 0.9813 | 719.159 | 90.9905 | 0.1016 |
| x2-all-20260410-ocr-refine | x2 | 291 | 29.3532 | 0.9758 | 762.6484 | 90.9805 | 0.0814 |
| x2-v3-tpgsr-refine-20260411 | x2 | 291 | 29.4579 | 0.9786 | 572.7827 | 91.2295 | 0.0765 |
| x3-real-refine-20260412 | x3 | 291 | 26.6166 | 0.9642 | 292.9324 | 88.8627 | 0.1918 |
| x3-real-refine-20260412 last | x3 | 291 | 26.6002 | 0.9642 | 288.0681 | 89.1655 | 0.2058 |
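To illustrate why PSNR alone is not the selection criterion, the sketch below encodes the saved results and applies a hypothetical rule: keep runs within a PSNR tolerance of the best run at that scale, then prefer the lowest CER, breaking ties by runtime. The 0.5 dB tolerance and the ordering are assumptions for illustration, not the project's documented selection procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    name: str
    scale: int
    psnr_db: float
    ssim: float
    runtime_ms: float
    ocr_confidence: float
    cer: float


# Values copied from the saved-results table.
SAVED_RUNS = [
    Run("x2-all-20260410", 2, 29.9002, 0.9813, 719.159, 90.9905, 0.1016),
    Run("x2-all-20260410-ocr-refine", 2, 29.3532, 0.9758, 762.6484, 90.9805, 0.0814),
    Run("x2-v3-tpgsr-refine-20260411", 2, 29.4579, 0.9786, 572.7827, 91.2295, 0.0765),
    Run("x3-real-refine-20260412", 3, 26.6166, 0.9642, 292.9324, 88.8627, 0.1918),
    Run("x3-real-refine-20260412 last", 3, 26.6002, 0.9642, 288.0681, 89.1655, 0.2058),
]


def select_run(runs, scale, psnr_tolerance_db=0.5):
    """Hypothetical rule: within `psnr_tolerance_db` of the best PSNR at this
    scale, pick the lowest CER, then the lowest runtime."""
    candidates = [r for r in runs if r.scale == scale]
    if not candidates:
        raise ValueError(f"no runs for x{scale}")
    best_psnr = max(r.psnr_db for r in candidates)
    eligible = [r for r in candidates if r.psnr_db >= best_psnr - psnr_tolerance_db]
    return min(eligible, key=lambda r: (r.cer, r.runtime_ms))


if __name__ == "__main__":
    for scale in (2, 3):
        print(f"x{scale}:", select_run(SAVED_RUNS, scale).name)
```

With a 0.5 dB tolerance this rule picks `x2-v3-tpgsr-refine-20260411` at x2, trading about 0.44 dB of PSNR against x2-all-20260410 for a lower CER and faster runtime; with a tolerance of 0 it falls back to the highest-PSNR run, x2-all-20260410.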
| Pattern | Meaning |
|---|---|
| `*-eval.csv` | Per-image PSNR, SSIM, runtime, and optional OCR metrics |
| `*-summary.json` | Dataset-level aggregate summary |
| `*-ocr-eval.csv` | Per-image OCR-focused benchmark output |
| `*-ocr-summary.json` | OCR confidence, CER, and readability summary |
| `outputs/benchmarks/training_x*/` | Per-epoch benchmark snapshots emitted during training |
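One practical catch with these naming patterns: `*-eval.csv` also matches `*-ocr-eval.csv`, and `*-summary.json` also matches `*-ocr-summary.json`. A minimal sketch for indexing the results folder, assuming only the file-name patterns listed in the table, checks the more specific OCR patterns first. The category labels are illustrative names, not identifiers used by the project.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# Order matters: the OCR patterns must be tested before the generic ones,
# because "*-eval.csv" would otherwise also claim "*-ocr-eval.csv" files.
RESULT_PATTERNS = [
    ("*-ocr-eval.csv", "ocr_eval"),
    ("*-ocr-summary.json", "ocr_summary"),
    ("*-eval.csv", "eval"),
    ("*-summary.json", "summary"),
]


def classify(filename: str):
    """Return an illustrative category label for a results file name, or None."""
    for pattern, kind in RESULT_PATTERNS:
        if fnmatchcase(filename, pattern):  # case-sensitive on every OS
            return kind
    return None


def index_results(results_dir="benchmarks/results"):
    """Group result file names in `results_dir` by category."""
    index = {}
    for path in sorted(Path(results_dir).glob("*")):
        kind = classify(path.name)
        if kind is not None:
            index.setdefault(kind, []).append(path.name)
    return index
```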
Keep summary JSON and evaluation CSV files when they support a reported claim. Generated images, rendered videos, and large temporary outputs belong in outputs/, which is ignored by Git.
```powershell
.\.venv\Scripts\python.exe -m models.training.evaluation `
  --pairs_csv datasets/manifests/heldout/heldout_real_x2.csv `
  --checkpoint models/span/education-finetuned/x2-v3-tpgsr-refine-20260411/last_model.pt `
  --output_csv benchmarks/results/x2-v3-tpgsr-refine-20260411-last-heldout-updated-eval.csv `
  --output_summary_json benchmarks/results/x2-v3-tpgsr-refine-20260411-last-heldout-updated-summary.json `
  --include_ocr_metrics
```

```powershell
.\.venv\Scripts\python.exe -m models.training.evaluation `
  --pairs_csv datasets/manifests/heldout/heldout_real_x3.csv `
  --checkpoint models/span/education-finetuned/x3-real-refine-20260412/best_model.pt `
  --output_csv benchmarks/results/x3-real-refine-20260412-heldout-updated-eval.csv `
  --output_summary_json benchmarks/results/x3-real-refine-20260412-heldout-updated-summary.json `
  --include_ocr_metrics
```

```powershell
.\.venv\Scripts\python.exe scripts/benchmark.py
```

```powershell
.\.venv\Scripts\python.exe benchmarks/scripts/benchmark_video_quality.py `
  --reference_video datasets/benchmark/video/hr.mp4 `
  --input_video datasets/benchmark/video/lr.mp4 `
  --output_dir outputs/benchmarks/video/sample_run
```

Most benchmarks require:
- opencv-python
- numpy
- scikit-image
- torch
OCR-aware benchmarks additionally require:
- Pillow
- pytesseract
- jiwer
- lpips
- a working `tesseract.exe`
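Before a long benchmark run, it can help to confirm the environment has everything listed above. The sketch below maps each pip package name to the module name it is imported as and uses `importlib.util.find_spec`, which locates a module without importing it. It assumes the Tesseract binary is expected on `PATH`; if the project instead points pytesseract at an explicit path via `pytesseract.pytesseract.tesseract_cmd`, the PATH check does not apply.

```python
import importlib.util
import shutil

# pip package name -> import name, for the dependencies listed above.
CORE_PACKAGES = {"opencv-python": "cv2", "numpy": "numpy",
                 "scikit-image": "skimage", "torch": "torch"}
OCR_PACKAGES = {"Pillow": "PIL", "pytesseract": "pytesseract",
                "jiwer": "jiwer", "lpips": "lpips"}


def missing_modules(import_names):
    """Return the import names that cannot be located in this environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


def check_environment(include_ocr=False):
    """List missing pip packages, plus the Tesseract binary for OCR runs."""
    packages = dict(CORE_PACKAGES)
    if include_ocr:
        packages.update(OCR_PACKAGES)
    missing = [pip_name for pip_name, module in packages.items()
               if module in missing_modules([module])]
    # shutil.which honours PATHEXT on Windows, so "tesseract" finds tesseract.exe.
    if include_ocr and shutil.which("tesseract") is None:
        missing.append("tesseract (not found on PATH)")
    return missing


if __name__ == "__main__":
    problems = check_environment(include_ocr=True)
    print("OK" if not problems else "Missing: " + ", ".join(problems))
```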
Still to be confirmed before results are published:

- Target device model used for runtime claims
- Whether runtime is CPU-only, GPU-assisted, or mixed
- Whether OCR metrics should be reported with Tesseract, ML Kit, or both
- Which checkpoint is the official final model for publication