@@ -691,7 +691,11 @@ dataset, which is a snapshot [**as of**]{.primary} May 31, 2022 that contains da
 
 ``` {r head-edf}
 #| echo: false
-edf <- covid_case_death_rates
+edf <- covid_case_death_rates |>
+  # Filter out locations with no deaths recorded:
+  group_by(geo_value) |>
+  filter(!all(death_rate == 0)) |>
+  ungroup()
 head(edf |> as_tibble())
 ```
 
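The new filter keeps a location only if it has at least one nonzero death rate. The commit does not state why. One plausible reason, stated here as an assumption, is that a Pearson correlation is undefined when one series is constant. An all-zero location would then only contribute `NA` to the per-location correlations computed later in the deck. A minimal base-R illustration with made-up numbers:

```r
# Made-up series for one location that never recorded a death
# (illustrative values, not taken from covid_case_death_rates).
case_rate <- c(5, 8, 12, 9, 15)
death_rate <- c(0, 0, 0, 0, 0)

# Pearson correlation needs nonzero variance in both series; here cor()
# warns that the standard deviation is zero and returns NA.
r <- suppressWarnings(cor(case_rate, death_rate, method = "pearson"))
print(is.na(r))
# [1] TRUE

# The same per-location test as the diff's filter(!all(death_rate == 0)):
# FALSE means this location would be dropped.
keep <- !all(death_rate == 0)
print(keep)
# [1] FALSE
```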
@@ -745,29 +749,33 @@ attr(edf, "metadata")
 
 ## Features - Correlations at different lags
 
+Correlation coefficients:
+
+- "Strength" and "direction" of a "relationship" between two variables
+- Normalized measures of
+  - how well (aspects of) one variable might be estimated from another
+  - using particular models and metrics
+  - based on training errors^[More rigorous approaches are covered tomorrow.].
+
+## Features - Correlations at different lags
+
 ``` {r corr-lags-ex}
 #| echo: true
-## cor0 <- epi_cor(edf, case_rate, death_rate, cor_by = time_value)
-## cor14 <- epi_cor(edf, case_rate, death_rate, cor_by = time_value, dt1 = -14)
-cor0 <- epi_cor(edf, case_rate, death_rate, cor_by = time_value, method = "kendall")
-cor14 <- epi_cor(edf, case_rate, death_rate, cor_by = time_value, dt1 = -14, method = "kendall")
+epi_cor(edf, case_rate, death_rate, dt1 = -14, cor_by = geo_value, method = "pearson")
 ```
 
-``` {r plot-corr-lags-ex}
-#| fig-align: center
-#| warning: false
-rbind(
-  cor0 |> mutate(lag = 0),
-  cor14 |> mutate(lag = 14)
-) |>
-  mutate(lag = as.factor(lag)) |>
-  ggplot(aes(x = time_value, y = cor)) +
-  geom_hline(yintercept = 0) +
-  geom_line(aes(color = lag)) +
-  scale_color_brewer(palette = "Set1") +
-  scale_x_date(minor_breaks = "month", date_labels = "%b %Y") +
-  labs(x = "Date", y = "Correlation", col = "Lag")
-```
+- For each location (`cor_by = geo_value`),
+- how well might death rates be estimated by case rates from 14 days ago (`case_rate, death_rate, dt1 = -14`),
+- with a linear model and related error measure, and what was the sign of the coefficient (`method = "pearson"`),
+- on this training+evaluation set (`edf`)?
+
+## Features - Correlations at different lags
+
+TODO lag analysis: Pearson by geo, then mean
+
+## Features - Correlations at different lags
+
+TODO lag analysis: Kendall by time, then mean
 
 ## Features - Compute growth rates
 
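The two TODO slides ("Pearson by geo, then mean" and "Kendall by time, then mean") differ only in how rows are grouped before correlating and which correlation is used. The sketch below spells out the pairing that the slide describes for `dt1 = -14`: death rate at date t against case rate at date t - 14, within each location. It is written in base R so that it does not depend on how `epi_cor` works internally. The helper `lagged_cor`, its `lag_days` argument, and the toy data are hypothetical and are not part of epiprocess.

```r
# Hypothetical helper, NOT part of epiprocess: pairs death_rate at date t
# with case_rate at date t - lag_days within each location, then computes
# one correlation per group (per location, or per date).
lagged_cor <- function(df, lag_days,
                       by = c("geo_value", "time_value"),
                       method = c("pearson", "kendall", "spearman")) {
  by <- match.arg(by)
  method <- match.arg(method)
  # Move each case observation forward by lag_days so that, after the join,
  # a row dated t holds case_rate(t - lag_days) next to death_rate(t).
  cases <- data.frame(
    geo_value = df$geo_value,
    time_value = df$time_value + lag_days,
    case_rate = df$case_rate
  )
  deaths <- df[, c("geo_value", "time_value", "death_rate")]
  # Inner join: dates without a matching lagged case value are dropped.
  paired <- merge(cases, deaths, by = c("geo_value", "time_value"))
  groups <- split(paired, paired[[by]])
  # A constant series (e.g. all-zero deaths) gives NA; silence cor()'s warning.
  cors <- vapply(groups, function(g) {
    suppressWarnings(cor(g$case_rate, g$death_rate, method = method))
  }, numeric(1))
  data.frame(group = names(cors), cor = unname(cors))
}

# Toy data for illustration only (not real surveillance data): two
# locations whose death rate is exactly 1% of the case rate 14 days earlier.
case_curve <- function(i, base) base + 10 * sin(i / 5) + i
dates <- seq(as.Date("2021-01-01"), by = "day", length.out = 60)
idx <- seq_along(dates)
toy <- rbind(
  data.frame(geo_value = "aa", time_value = dates,
             case_rate = case_curve(idx, 100),
             death_rate = 0.01 * case_curve(idx - 14, 100)),
  data.frame(geo_value = "bb", time_value = dates,
             case_rate = case_curve(idx, 200),
             death_rate = 0.01 * case_curve(idx - 14, 200))
)

# "Pearson by geo, then mean": one correlation per location, then average.
# By construction, both toy locations give a correlation of (numerically) 1.
by_geo <- lagged_cor(toy, lag_days = 14, by = "geo_value", method = "pearson")
print(by_geo)
mean(by_geo$cor, na.rm = TRUE)

# "Kendall by time, then mean": one cross-location rank correlation per date.
by_time <- lagged_cor(toy, lag_days = 14, by = "time_value", method = "kendall")
mean(by_time$cor, na.rm = TRUE)
```

Grouping by location asks whether each place's deaths track its own earlier cases over time. Grouping by date asks whether, on a given day, places with more cases two weeks earlier tend to have more deaths. A rank-based method such as Kendall's is less sensitive to a few extreme locations in that cross-sectional comparison.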