Following up on #209, I'd like a more consolidated approach to warning users about unreliable estimates.
- How small a sample is too small?
- How variable a sample is too variable?
- How do those two interact? (A strawman combining both is sketched below.)
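
As a strawman for how the two criteria could combine, here's a minimal SQL sketch of a suppression rule: withhold an estimate when the sample is below a floor *or* when the bootstrapped CI is wide relative to the point estimate. The column names (`segment_id`, `sample_size`, `estimate`, `ci_lower`, `ci_upper`) and both thresholds are placeholders, not proposals:

```sql
-- Hypothetical suppression rule; column names and thresholds are assumptions.
-- Withhold when the sample is small OR the bootstrapped CI is wide relative
-- to the point estimate (relative CI width).
SELECT
    segment_id,
    mnth,
    sample_size,
    estimate,
    (ci_upper - ci_lower) / NULLIF(estimate, 0) AS rel_ci_width,
    (
        sample_size < 30  -- placeholder minimum-sample floor
        OR (ci_upper - ci_lower) / NULLIF(estimate, 0) > 0.5  -- placeholder relative-width cutoff
    ) AS withhold
FROM gwolofs.congestion_segments_monthly_bootstrap;
```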
The underlying question ties into https://github.com/Toronto-Big-Data-Innovation-Team/data_validation/issues/45: how do we determine which estimates should be withheld from OpenData?
Gabe has provided some aggregated data with bootstrapped confidence intervals (CIs) to support the analysis:
```sql
SELECT *
FROM gwolofs.congestion_segments_monthly_bootstrap
WHERE mnth IN ('2025-06-01', '2023-09-01') AND n_resamples = 300;
```
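
To ground the threshold choice, it may help to see how the withholding rate moves as the minimum-sample floor moves. A sketch over a small grid of candidate floors, under the same assumed `sample_size` column:

```sql
-- Hypothetical sensitivity check (Postgres syntax): share of estimates that a
-- given minimum-sample floor would withhold. sample_size is an assumed column.
SELECT
    floors.n_min,
    AVG(CASE WHEN b.sample_size < floors.n_min THEN 1.0 ELSE 0.0 END) AS share_withheld
FROM gwolofs.congestion_segments_monthly_bootstrap AS b
CROSS JOIN (VALUES (10), (20), (30), (50)) AS floors (n_min)
WHERE b.mnth IN ('2025-06-01', '2023-09-01')
  AND b.n_resamples = 300
GROUP BY floors.n_min
ORDER BY floors.n_min;
```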