- Confirmed SD VAE round-trip works
- eps=5.0 / 20 carriers = 100% bit accuracy, PSNR 30.7 dB
- eps=2.0 / 20 carriers = 75% (motivates stability selection)
- 100% of positions have positive stability at eps=5.0 (averaged)
- Stability-score variance ranges from 0.98 to 3.48
- Figures: carrier_stability_avg.png, carrier_stability_heatmap.png
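
The two metrics reported throughout these notes, bit accuracy and PSNR, can be sketched as below. This is a minimal illustration, not the project's code: the function names are hypothetical, images are treated as flat sequences of pixel values, and an 8-bit range (`max_value=255`) is assumed.

```python
import math

def bit_accuracy(sent, received):
    """Fraction of positions where the decoded bit equals the embedded bit."""
    if not sent or len(sent) != len(received):
        raise ValueError("bit sequences must be non-empty and of equal length")
    return sum(s == r for s, r in zip(sent, received)) / len(sent)

def psnr(original, stego, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flat pixel sequences.

    max_value=255 assumes 8-bit pixels (an assumption, not stated in the log).
    """
    if not original or len(original) != len(stego):
        raise ValueError("pixel sequences must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, stego)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Three of four bits recovered correctly gives 0.75, the same ratio as
# the 75% reported for eps=2.0 with 20 carriers (15 of 20 bits).
print(bit_accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

Because PSNR is logarithmic in the mean squared error, a drop of about 10 dB corresponds to a tenfold increase in MSE.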
- Stability-selected carriers: 100% acc at K=20, eps=2.0 on patches
- Up to 500 carriers tested: 99.4% raw acc, 100% with rep=3
- PSNR degrades predictably: 32 dB at K=20 → 24 dB at K=500
- Figures: capacity_curve.png, capacity_ablation.png, rate_distortion_accuracy.png
- Note: ablation version ran but stdout was lost to buffering
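
The jump from 99.4% raw accuracy to 100% with rep3 is what a repetition code with majority voting would be expected to deliver. The sketch below assumes that rep3 means each message bit is embedded three times and decoded by majority vote; the notes do not spell this out, and `rep_encode` / `rep_decode` are hypothetical names.

```python
def rep_encode(bits, rep=3):
    """Repeat every message bit `rep` times (repetition code)."""
    return [b for b in bits for _ in range(rep)]

def rep_decode(received, rep=3):
    """Majority-vote each group of `rep` received bits back to one bit.

    Use an odd `rep` so that ties cannot occur.
    """
    if len(received) % rep != 0:
        raise ValueError("received length must be a multiple of rep")
    decoded = []
    for start in range(0, len(received), rep):
        group = received[start:start + rep]
        decoded.append(1 if 2 * sum(group) > rep else 0)
    return decoded

message = [1, 0, 1]
coded = rep_encode(message)   # 9 channel bits for 3 message bits
coded[1] ^= 1                 # one flipped bit in the first group
coded[5] ^= 1                 # one flipped bit in the second group
assert rep_decode(coded) == message
```

The cost is capacity: with rep=3, K carriers hold only K/3 message bits, and a group is decoded wrongly only when at least two of its three bits flip.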
- Color patches: 100% acc at eps=5.0 survives JPEG Q=50 and noise σ=0.01
- Gradient images: near-chance even without distortion (~45-55%)
- Key finding: image content strongly determines channel reliability
- Figure: robustness_bars.png
- Channel importance: all 4 channels similar (97.5%–99.7%)
- Reconstruction error vs stability: Pearson r = 0.048 (no correlation!)
- Border vs interior: no systematic difference
- Figure: mechanistic_analysis.png
- eps=1.0: AUC=0.44 (STEALTHY, below chance)
- eps=2.0: AUC=0.68 (MARGINAL)
- eps=5.0: AUC=0.93 (DETECTED)
- eps=10.0: AUC=0.97 (DETECTED)
- Clear stealth-capacity tradeoff
- Figure: detectability_curve.png
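
For reading the AUC values above, a rank-based definition is useful: AUC is the probability that a randomly chosen stego image gets a higher detector score than a randomly chosen cover image. The sketch below implements that definition directly; it is illustrative only, and `roc_auc` and the toy scores are hypothetical, not the detector used in these experiments.

```python
def roc_auc(cover_scores, stego_scores):
    """AUC as P(stego score > cover score), counting ties as one half."""
    if not cover_scores or not stego_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for s in stego_scores:
        for c in cover_scores:
            if s > c:
                wins += 1.0
            elif s == c:
                wins += 0.5
    return wins / (len(stego_scores) * len(cover_scores))

cover = [0.10, 0.40, 0.35, 0.80]
stego = [0.20, 0.30, 0.70, 0.90]
auc = roc_auc(cover, stego)

# Negating every score reverses the ranking, so the AUC becomes 1 - AUC.
flipped = roc_auc([-c for c in cover], [-s for s in stego])
assert abs(auc + flipped - 1.0) < 1e-12
```

One consequence of this definition: an AUC of 0.44 and an AUC of 0.56 carry the same amount of ranking information, since flipping the sign of the detector score maps one onto the other. Distance from 0.5 is what measures detectability.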
- main.tex fully populated with all results, tables, figures
- PDF compiled successfully
- All 8 figures referenced
- Natural photos (CIFAR-10, 8 images): eps=2.0 avg 98.8% acc / 38.1 dB, eps=5.0 avg 98.8% acc / 29.4 dB
- Longer messages: up to 152 bits ("THE QUICK BROWN FOX") at 100% acc with rep=1
- LSB baseline: PatchSteg eps=2 AUC=0.349 (stealthier than LSB AUC=0.651)
- Theoretical analysis: latent stats, perturbation/noise ratios, decoder sensitivity (~10.8 max pixel change at eps=1)
- Figures: natural_photos.png, message_length.png, lsb_comparison.png, theoretical_analysis.png
- main.tex updated with all extended results (4 new subsections, 4 new figures, 3 new tables)
- PDF compiled successfully (12 figures, 7 tables total)
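
The 152-bit figure for "THE QUICK BROWN FOX" matches 19 characters at 8 bits each. A minimal sketch of that conversion follows, assuming plain 8-bit ASCII with the most significant bit first; the actual encoding and bit order used in the experiments are not stated here, and the function names are hypothetical.

```python
def text_to_bits(message):
    """Encode ASCII text as a list of bits, most significant bit first."""
    data = message.encode("ascii")
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]

def bits_to_text(bits):
    """Inverse of text_to_bits: pack groups of 8 bits back into characters."""
    if len(bits) % 8 != 0:
        raise ValueError("bit count must be a multiple of 8")
    values = []
    for start in range(0, len(bits), 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit
        values.append(value)
    return bytes(values).decode("ascii")

bits = text_to_bits("THE QUICK BROWN FOX")
assert len(bits) == 19 * 8 == 152
assert bits_to_text(bits) == "THE QUICK BROWN FOX"
```

With rep=1 each of these bits occupies one carrier, so a 152-bit message needs at least 152 carriers under this assumption.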
- CDF K=5: 92.5% acc, 40.3 dB | K=10: 88.8%, 38.5 dB | K=20: 88.1%, 36.9 dB | K=50: 87.8%, 34.2 dB
- Average KS p-value: 0.179 (distribution preserved; original PatchSteg p<0.001)
- Detection AUC: 0.148 (effectively undetectable); original eps=5 AUC=0.722
- Figures: cdf_capacity_curve.png, cdf_detectability.png, cdf_distribution.png
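
The KS p-values above compare a sample of latent values before and after embedding. As a rough, dependency-free sketch of a two-sample Kolmogorov-Smirnov test (the experiments may well have used a library routine instead), the code below computes the KS statistic and an asymptotic p-value from the Kolmogorov series. The function names, the small-lambda cutoff, and the number of series terms are illustrative choices.

```python
import math

def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:   # advance past all values <= x
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_pvalue(d, n, m, terms=100):
    """Asymptotic two-sided p-value; an approximation for larger samples."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return 1.0  # the alternating series converges poorly near zero
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

d = ks_statistic([1, 2, 3], [4, 5, 6])   # fully separated samples
assert d == 1.0
```

A large p-value means the test cannot distinguish the two samples at the given sample size. It does not show that the distributions are identical.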
- Top 3 PCA components explain 96.1% of variance
- PC0 (64.6%): 100% acc, 24.6 dB PSNR, AUC=0.574 (stealthiest)
- PC1 (17.0%): 96.2% acc, 23.7 dB, AUC=0.833
- PC2 (14.5%): 98.8% acc, 28.2 dB
- Random baseline: 99.4% acc, 29.4 dB, AUC=0.722
- Figures: pca_components.png, pca_accuracy_comparison.png, pca_detectability.png
- Within-method: PatchSteg eps=2 AUC=0.625, eps=5 AUC=1.000, CDF AUC=0.875
- Cross-method (PatchSteg eps=5 → CDF): AUC=0.688 (weak transfer)
- Top features: spectral means, residual skewness
- Figures: detector_roc_curves.png, detector_cross_method.png, detector_feature_importance.png
- SD-VAE-MSE: eps=2 98.0%, eps=5 99.5%
- SD-VAE-EMA: eps=2 98.5%, eps=5 98.5%
- SDXL-VAE: eps=2 98.5%, eps=5 90.0%
- Cross-model MSE→EMA: 100%, EMA→MSE: 99%
- Figure: multimodel.png
- 200 CIFAR-10 images (20/class)
- eps=2.0: 98.2%±3.3% acc (95% CI [97.8, 98.7]), PSNR 38.6±1.5 dB
- eps=5.0: 98.5%±3.0% acc (95% CI [98.0, 98.9]), PSNR 28.9±1.6 dB
- Figure: serious_dataset.png
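
The confidence intervals above can be sanity-checked from the reported mean, standard deviation, and n=200. The sketch below uses a normal approximation for the mean (z=1.96). Whether the original intervals were computed this way, or for example by bootstrap, is an assumption, and `normal_ci` is a hypothetical helper.

```python
import math

def normal_ci(mean, std, n, z=1.96):
    """Normal-approximation confidence interval for a sample mean."""
    if n <= 0:
        raise ValueError("n must be positive")
    half_width = z * std / math.sqrt(n)
    return mean - half_width, mean + half_width

low_2, high_2 = normal_ci(98.2, 3.3, 200)   # eps=2.0 accuracy
low_5, high_5 = normal_ci(98.5, 3.0, 200)   # eps=5.0 accuracy
print(f"eps=2.0: [{low_2:.2f}, {high_2:.2f}]")
print(f"eps=5.0: [{low_5:.2f}, {high_5:.2f}]")
```

Under this approximation both intervals land within about 0.1 percentage points of the reported ones, which is consistent with the inputs having been rounded to one decimal place.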
- 7 detectors × 3 epsilon values
- Key finding: pixel-residual and spectral detectors achieve AUC=1.0 even at eps=1.0
- Only latent-statistics LR struggles at eps=1 (AUC=0.172)
- Figure: detection_strength.png
- Entropy (r=0.555) and freq_energy (r=0.534) most correlated with capacity
- Carrier positions tend to have higher Jacobian norms (2050 vs 1684; p=0.058, not significant at the 0.05 level)
- Figures: content_science.png, capacity_by_type.png
- eps=5 survives: JPEG Q=10 (96.5%), resize 25% (92%), noise σ=0.10 (98%)
- VAE re-encode: 99%, screenshot sim: 98.5%
- Only center crop degrades accuracy significantly (60% crop: 60% acc)
- Figure: deployment_robustness.png
- main.tex fully updated with all Phase 1-3 and V2 results
- PDF compiled successfully (7.19 MiB)
- All figures generated from real experimental data
- Gradio demo: demo/app.py written but not tested end-to-end
- Phase 4: baseline comparison (RoSteALS, TrustMark, Tree-Ring)