Benchmarks

This folder is the evidence layer for EduScale. It contains the scripts and saved outputs used to compare checkpoints on reconstruction quality, readability, and runtime.

Role In The Study

EduScale treats super-resolution as a readability task. A checkpoint is not selected on PSNR or SSIM alone; it must also hold up on OCR confidence, character error rate (CER), and practical runtime.

| Metric | Direction | Use |
| --- | --- | --- |
| PSNR | higher is better | Pixel-level reconstruction against HR targets |
| SSIM | higher is better | Structural quality for text, diagrams, and slide edges |
| Runtime | lower is better | Practicality for offline/mobile use |
| OCR confidence | higher is better | Approximate readability after enhancement |
| CER | lower is better | Text-recognition error rate after enhancement |
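As a reference for how two of these metrics are defined, here is a minimal standard-library sketch of CER (character-level edit distance divided by reference length) and PSNR. It only restates the textbook definitions; the function names are illustrative and are not taken from this repository, whose benchmarks may compute these values through other libraries.

```python
import math


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalised by reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def psnr(target, output, max_value: float = 255.0) -> float:
    """PSNR in dB for two equal-length flat sequences of pixel values."""
    if len(target) != len(output) or len(target) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((t - o) ** 2 for t, o in zip(target, output)) / len(target)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because CER divides by the reference length, it can exceed 1.0 when the OCR output contains many spurious characters, so it should not be read as a percentage of characters that were wrong.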

Folder Layout

| Path | Purpose |
| --- | --- |
| benchmarks/scripts/ | Entry points for image quality, processing time, and video quality benchmarks |
| benchmarks/analysis/ | Supporting metric and plotting helpers |
| benchmarks/results/ | Saved CSV and JSON benchmark outputs used by reports and thesis text |

Current Evaluation Targets

| Scale | Held-out manifest | Samples |
| --- | --- | --- |
| x2 | datasets/manifests/heldout/heldout_real_x2.csv | 291 |
| x3 | datasets/manifests/heldout/heldout_real_x3.csv | 291 |

The held-out decks and rendered assets are under datasets/heldout-updated/. The CSV manifests are the stable source for model-to-model comparison.
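Since the manifests are the stable basis for comparison, it can be worth confirming that a manifest still has the expected number of samples before an evaluation run. The sketch below counts data rows in a CSV file. It assumes the manifest has exactly one header row and makes no assumption about column names; `count_manifest_rows` is a hypothetical helper, not part of this repository.

```python
import csv
from pathlib import Path


def count_manifest_rows(manifest_path) -> int:
    """Count non-empty data rows in a CSV manifest, skipping one header row."""
    with Path(manifest_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # assumption: the first row is a header
        return sum(1 for row in reader if any(cell.strip() for cell in row))
```

For example, `count_manifest_rows("datasets/manifests/heldout/heldout_real_x2.csv")` would be expected to return 291 if the manifest matches the table above.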

Current Benchmark Snapshot

These are saved results from benchmarks/results, not a fresh rerun.

| Model run | Scale | Samples | PSNR | SSIM | Runtime / image | OCR confidence | CER |
| --- | --- | --- | --- | --- | --- | --- | --- |
| x2-all-20260410 | x2 | 291 | 29.9002 | 0.9813 | 719.159 ms | 90.9905 | 0.1016 |
| x2-all-20260410-ocr-refine | x2 | 291 | 29.3532 | 0.9758 | 762.6484 ms | 90.9805 | 0.0814 |
| x2-v3-tpgsr-refine-20260411 | x2 | 291 | 29.4579 | 0.9786 | 572.7827 ms | 91.2295 | 0.0765 |
| x3-real-refine-20260412 | x3 | 291 | 26.6166 | 0.9642 | 292.9324 ms | 88.8627 | 0.1918 |
| x3-real-refine-20260412 last | x3 | 291 | 26.6002 | 0.9642 | 288.0681 ms | 89.1655 | 0.2058 |
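To make the trade-off in the snapshot concrete, the following sketch picks the best x2 run for each metric, using the direction stated for that metric earlier in this README. The numbers are copied from the snapshot; the dictionary keys (`psnr`, `runtime_ms`, and so on) are illustrative names and are not the field names of the saved result files.

```python
# x2 rows copied from the benchmark snapshot; key names are illustrative.
SNAPSHOT_X2 = {
    "x2-all-20260410": {
        "psnr": 29.9002, "ssim": 0.9813, "runtime_ms": 719.159,
        "ocr_conf": 90.9905, "cer": 0.1016,
    },
    "x2-all-20260410-ocr-refine": {
        "psnr": 29.3532, "ssim": 0.9758, "runtime_ms": 762.6484,
        "ocr_conf": 90.9805, "cer": 0.0814,
    },
    "x2-v3-tpgsr-refine-20260411": {
        "psnr": 29.4579, "ssim": 0.9786, "runtime_ms": 572.7827,
        "ocr_conf": 91.2295, "cer": 0.0765,
    },
}

# Direction of improvement for each metric.
HIGHER_IS_BETTER = {
    "psnr": True, "ssim": True, "runtime_ms": False,
    "ocr_conf": True, "cer": False,
}


def best_run_per_metric(snapshot):
    """Return {metric: run name} for the best-scoring run on each metric."""
    best = {}
    for metric, higher in HIGHER_IS_BETTER.items():
        pick = max if higher else min
        best[metric] = pick(snapshot, key=lambda run: snapshot[run][metric])
    return best


print(best_run_per_metric(SNAPSHOT_X2))
```

On these saved numbers, `x2-all-20260410` leads on PSNR and SSIM, while `x2-v3-tpgsr-refine-20260411` leads on runtime, OCR confidence, and CER. This is why the pixel-level scores alone do not settle checkpoint selection.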

Result File Conventions

| Pattern | Meaning |
| --- | --- |
| *-eval.csv | Per-image PSNR, SSIM, runtime, and optional OCR metrics |
| *-summary.json | Dataset-level aggregate summary |
| *-ocr-eval.csv | Per-image OCR-focused benchmark output |
| *-ocr-summary.json | OCR confidence, CER, and readability summary |
| outputs/benchmarks/training_x*/ | Per-epoch benchmark snapshots emitted during training |

Keep summary JSON and evaluation CSV files when they support a reported claim. Generated images, rendered videos, and large temporary outputs belong in outputs/, which is ignored by Git.
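One detail to watch when scripting against these patterns: a name ending in `-ocr-eval.csv` also matches `*-eval.csv`, and the same holds for the summary files. A minimal sketch of a classifier that checks the more specific suffix first is shown below. The function name and the returned labels are hypothetical and exist only for illustration.

```python
# More specific suffixes come first so OCR files are not misread as general ones.
SUFFIX_KINDS = [
    ("-ocr-summary.json", "ocr_summary"),
    ("-ocr-eval.csv", "ocr_eval"),
    ("-summary.json", "summary"),
    ("-eval.csv", "eval"),
]


def classify_result_file(name: str):
    """Return a label for a result file name, or None if no pattern matches."""
    for suffix, kind in SUFFIX_KINDS:
        if name.endswith(suffix):
            return kind
    return None
```

With this ordering, `classify_result_file("x3-real-refine-20260412-heldout-updated-eval.csv")` yields `"eval"`, while a name ending in `-ocr-eval.csv` yields `"ocr_eval"`.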

Main Reproduction Commands

x2 held-out evaluation

.\.venv\Scripts\python.exe -m models.training.evaluation `
  --pairs_csv datasets/manifests/heldout/heldout_real_x2.csv `
  --checkpoint models/span/education-finetuned/x2-v3-tpgsr-refine-20260411/last_model.pt `
  --output_csv benchmarks/results/x2-v3-tpgsr-refine-20260411-last-heldout-updated-eval.csv `
  --output_summary_json benchmarks/results/x2-v3-tpgsr-refine-20260411-last-heldout-updated-summary.json `
  --include_ocr_metrics

x3 held-out evaluation

.\.venv\Scripts\python.exe -m models.training.evaluation `
  --pairs_csv datasets/manifests/heldout/heldout_real_x3.csv `
  --checkpoint models/span/education-finetuned/x3-real-refine-20260412/best_model.pt `
  --output_csv benchmarks/results/x3-real-refine-20260412-heldout-updated-eval.csv `
  --output_summary_json benchmarks/results/x3-real-refine-20260412-heldout-updated-summary.json `
  --include_ocr_metrics

Project OCR benchmark matrix

.\.venv\Scripts\python.exe scripts/benchmark.py

Video-quality benchmark

.\.venv\Scripts\python.exe benchmarks/scripts/benchmark_video_quality.py `
  --reference_video datasets/benchmark/video/hr.mp4 `
  --input_video datasets/benchmark/video/lr.mp4 `
  --output_dir outputs/benchmarks/video/sample_run

Dependencies

Most benchmarks require:

  • opencv-python
  • numpy
  • scikit-image
  • torch

OCR-aware benchmarks additionally require:

  • Pillow
  • pytesseract
  • jiwer
  • lpips
  • a working tesseract.exe
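A quick preflight check can report which of these dependencies are missing before a long benchmark run. The sketch below maps each package to the module name it is imported under (for example, `opencv-python` is imported as `cv2` and `scikit-image` as `skimage`) and looks for a Tesseract executable on `PATH`. The helper names are hypothetical and not part of this repository.

```python
import importlib.util
import shutil

# Package name as installed -> module name as imported.
CORE_MODULES = {
    "opencv-python": "cv2",
    "numpy": "numpy",
    "scikit-image": "skimage",
    "torch": "torch",
}
OCR_MODULES = {
    "Pillow": "PIL",
    "pytesseract": "pytesseract",
    "jiwer": "jiwer",
    "lpips": "lpips",
}


def missing_packages(modules):
    """Return the package names whose import module cannot be located."""
    return [pkg for pkg, mod in modules.items()
            if importlib.util.find_spec(mod) is None]


def tesseract_on_path() -> bool:
    """True if a tesseract executable is discoverable on PATH."""
    return shutil.which("tesseract") is not None


if __name__ == "__main__":
    print("Missing core packages:", missing_packages(CORE_MODULES))
    print("Missing OCR packages:", missing_packages(OCR_MODULES))
    print("Tesseract on PATH:", tesseract_on_path())
```

A `False` result from `tesseract_on_path()` does not necessarily mean OCR is unavailable, because `pytesseract` can also be pointed at an explicit binary location through `pytesseract.pytesseract.tesseract_cmd`.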

Open Details To Confirm

  • Target device model used for runtime claims
  • Whether runtime is CPU-only, GPU-assisted, or mixed
  • Whether OCR metrics should be reported with Tesseract, ML Kit, or both
  • Which checkpoint is the official final model for publication

Related Docs