epiverse-trace
diff --git a/fig/superspreading-estimate-rendered-unnamed-chunk-10-1.png b/fig/superspreading-estimate-rendered-unnamed-chunk-10-1.png
-7.74 KB
diff --git a/fig/superspreading-estimate-rendered-unnamed-chunk-13-1.png b/fig/superspreading-estimate-rendered-unnamed-chunk-13-1.png
18.8 KB
diff --git a/fig/superspreading-estimate-rendered-unnamed-chunk-4-1.png b/fig/superspreading-estimate-rendered-unnamed-chunk-4-1.png
366 KB
diff --git a/fig/superspreading-estimate-rendered-unnamed-chunk-8-1.png b/fig/superspreading-estimate-rendered-unnamed-chunk-8-1.png
9.21 KB
diff --git a/fig/superspreading-estimate-rendered-unnamed-chunk-9-1.png b/fig/superspreading-estimate-rendered-unnamed-chunk-9-1.png
7.42 KB
diff --git a/md5sum.txt b/md5sum.txt
Lines changed: 1 addition & 1 deletion
diff --git a/network.html b/network.html
Lines changed: 3 additions & 3 deletions
diff --git a/superspreading-estimate.md b/superspreading-estimate.md
Lines changed: 72 additions & 59 deletions
diff --git a/webshot.png b/webshot.png
366 KB
@@ -9,7 +9,7 @@
 "episodes/delays-functions.Rmd" "0c5c40267d32e39a63d7f75aee9143e4" "site/built/delays-functions.md" "2025-03-28"
 "episodes/create-forecast.Rmd" "e7c0eb37985d139ed3ffd69ec0ad86db" "site/built/create-forecast.md" "2025-03-28"
 "episodes/severity-static.Rmd" "b22e6e3516c9a3b67f864bec763f0343" "site/built/severity-static.md" "2025-03-28"
-"episodes/superspreading-estimate.Rmd" "cd00e83816dc1bf867a8173f326358b5" "site/built/superspreading-estimate.md" "2025-03-28"
+"episodes/superspreading-estimate.Rmd" "d8de21eaaf48aca9e090f477566b2c2a" "site/built/superspreading-estimate.md" "2025-04-03"
 "episodes/superspreading-simulate.Rmd" "8c0d9627c6ea746a6ddff139926c8664" "site/built/superspreading-simulate.md" "2025-03-28"
 "instructors/instructor-notes.md" "ca3834a1b0f9e70c4702aa7a367a6bb5" "site/built/instructor-notes.md" "2025-03-28"
 "learners/reference.md" "18f9dcee553dc88dba8caf6436f8ca41" "site/built/reference.md" "2025-03-28"
 
@@ -92,10 +92,13 @@ Let's practice this using the `mers_korea_2015` linelist and contact data from t
 epi_contacts <-
   epicontacts::make_epicontacts(
     linelist = outbreaks::mers_korea_2015$linelist,
-    contacts = outbreaks::mers_korea_2015$contacts
+    contacts = outbreaks::mers_korea_2015$contacts,
+    directed = TRUE
   )
 ```
 
+With the argument `directed = TRUE` we configure a directed graph. These directions incorporate our hypothesis of the **infector-infectee** pair: from the probable source patient to the secondary case.
+
 
 ``` r
 # visualise contact network
@@ -110,7 +113,7 @@ epicontacts::vis_epicontacts(epi_contacts)
 
 Contact data from a transmission chain can provide information on which infected individuals came into contact with others. We expect to have the infector (`from`) and the infectee (`to`) plus additional columns of variables related to their contact, such as location (`exposure`) and date of contact.
 
-Following [tidy data](https://tidyr.tidyverse.org/articles/tidy-data.html#tidy-data) principles, the observation unit in our contact dataset is the **infector-infectee** pair. Although one infector can infect multiple infectees, from contact tracing investigations we may record contacts linked to more than one infector (e.g. within a household). But we should expect to have unique infector-infectee pairs, because typically each infected person will have acquired the infection from one other.
+Following [tidy data](https://tidyr.tidyverse.org/articles/tidy-data.html#tidy-data) principles, the observation unit in our contact data frame is the **infector-infectee** pair. Although one infector can infect multiple infectees, from contact tracing investigations we may record contacts linked to more than one infector (e.g. within a household). But we should expect to have unique infector-infectee pairs, because typically each infected person will have acquired the infection from one other.
 
 To confirm that these pairs are unique, we can check for duplicated infectees:
 
@@ -137,62 +140,83 @@ epi_contacts %>%
 
 :::::::::::::::::::::::::::
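 
 As a complementary sketch in base R, the uniqueness check described above could also be written without `{epicontacts}`. The `toy_contacts` data frame and its ids are hypothetical and only illustrate the idea: an infectee listed more than once points to more than one recorded infector, while a repeated infector-infectee pair would be a duplicated row.
 
 ``` r
 # hypothetical contact data: infectee "D" is linked to two infectors
 toy_contacts <- data.frame(
   from = c("A", "A", "B", "C"),
   to = c("B", "C", "D", "D"),
   stringsAsFactors = FALSE
 )
 
 # infectees that appear more than once in the `to` column
 dup_infectees <- unique(toy_contacts$to[duplicated(toy_contacts$to)])
 
 # rows that repeat an already listed infector-infectee pair
 dup_pairs <- toy_contacts[duplicated(toy_contacts[, c("from", "to")]), ]
 
 dup_infectees
 nrow(dup_pairs)
 ```
 
 Here every pair is unique, but one infectee has two candidate infectors, which is the situation to review before counting secondary cases.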
 
-When each infector-infectee row is unique, the number of entries per infector corresponds to the number of secondary cases generated by that individual.
+Our goal is to get the number of secondary cases caused by each observed infected individual. In the contact data frame, when each infector-infectee pair is unique, the number of rows per infector corresponds to the number of secondary cases generated by that individual.
 
 
 ``` r
-# count secondary cases per infector
-infector_secondary <- epi_contacts %>%
+# count secondary cases per infector in contacts
+epi_contacts %>%
   purrr::pluck("contacts") %>%
   dplyr::count(from, name = "secondary_cases")
 ```
 
-But this output only contains number of secondary cases for reported infectors, not for each of the individuals in the whole `epicontacts` object.
+``` output
+     from secondary_cases
+1    SK_1              26
+2   SK_11               1
+3  SK_118               1
+4   SK_12               1
+5  SK_123               1
+6   SK_14              38
+7   SK_15               4
+8   SK_16              21
+9    SK_6               2
+10  SK_76               2
+11  SK_87               1
+```
+
+But this output only contains the number of secondary cases for reported infectors in the contact data, not for each of the individuals in the whole `<epicontacts>` object.
 
-To get this, first, we can use `epicontacts::get_id()` to get the full list of unique identifiers ("id") from the `epicontacts` class object. Second, join it with the count secondary cases per infector stored in the `infector_secondary` object. Third, replace the missing values with `0` to express no report of secondary cases from them.
+Instead, we can use `epicontacts::get_degree()` to get the **out-degree** of each **node** in the contact network from the `<epicontacts>` class object. In a directed network, the out-degree is the number of outgoing edges (infectees) emanating from a node (infector) ([Nykamp DQ, accessed: 2025](https://mathinsight.org/definition/node_degree)). 
 
 
 ``` r
-all_secondary <- epi_contacts %>%
-  # extract ids in contact *and* linelist using "which" argument
-  epicontacts::get_id(which = "all") %>%
-  # transform vector to dataframe to use left_join()
-  tibble::enframe(name = NULL, value = "from") %>%
-  # join count secondary cases per infectee
-  dplyr::left_join(infector_secondary) %>%
-  # infectee with missing secondary cases are replaced with zero
-  tidyr::replace_na(
-    replace = list(secondary_cases = 0)
-  )
+# Count secondary cases per subject in contacts and linelist
+all_secondary <- epicontacts::get_degree(
+  x = epi_contacts,
+  type = "out",
+  only_linelist = TRUE
+)
 ```
 
-From a histogram of the `all_secondary` object, we can identify the **individual-level variation** in the number of secondary cases. Three cases were related to more than 20 secondary cases, while the complementary cases with less than five or zero secondary cases.
+::::::::::::::::::::: caution
+
+In `epicontacts::get_degree()` we use the `only_linelist = TRUE` argument.
+This counts the number of secondary cases caused by the observed infected individuals,
+which includes subjects in the contacts and linelist data frames.
+
+This assumption may not hold in all situations.
+If you need to consider only the subjects from the contact data,
+use the `only_linelist = FALSE` argument in `epicontacts::get_degree()`.
 
+:::::::::::::::::::::
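 
 To make the idea of the out-degree concrete, here is a minimal base R sketch that does not call `{epicontacts}`. The ids and the `toy_contacts` data frame are hypothetical. It counts outgoing edges per node over two alternative sets of nodes: all linelist ids, or only the ids that appear in the contact data.
 
 ``` r
 # hypothetical linelist ids and contact pairs (not the MERS data)
 toy_linelist_ids <- c("A", "B", "C", "D", "E")
 toy_contacts <- data.frame(
   from = c("A", "A", "B"),
   to = c("B", "C", "D"),
   stringsAsFactors = FALSE
 )
 
 # helper (hypothetical name): number of outgoing edges for each id
 count_out_degree <- function(ids, contacts) {
   vapply(ids, function(id) sum(contacts$from == id), integer(1))
 }
 
 # nodes taken from the linelist: cases without infectees get zero
 out_degree_linelist <- count_out_degree(toy_linelist_ids, toy_contacts)
 
 # nodes taken from the contact data only: "E" is not counted at all
 toy_contact_ids <- unique(c(toy_contacts$from, toy_contacts$to))
 out_degree_contacts <- count_out_degree(toy_contact_ids, toy_contacts)
 
 out_degree_linelist
 out_degree_contacts
 ```
 
 The choice of node set changes how many zeros enter the offspring distribution, which in turn affects the estimated mean and dispersion.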
+
+From a histogram of the `all_secondary` object, we can identify the **individual-level variation** in the number of secondary cases. Three cases were each linked to more than 20 secondary cases, while the remaining cases had fewer than five secondary cases, or none.
 
 <!-- Visualizing the number of secondary cases on a histogram will help us to relate this with the statistical distribution to fit: -->
 
 
 ``` r
 ## plot the distribution
 all_secondary %>%
-  ggplot(aes(secondary_cases)) +
+  tibble::enframe() %>%
+  ggplot(aes(value)) +
   geom_histogram(binwidth = 1) +
   labs(
     x = "Number of secondary cases",
     y = "Frequency"
   )
 ```
 
-<img src="fig/superspreading-estimate-rendered-unnamed-chunk-9-1.png" style="display: block; margin: auto;" />
+<img src="fig/superspreading-estimate-rendered-unnamed-chunk-8-1.png" style="display: block; margin: auto;" />
 
 The number of secondary cases can be used to _empirically_ estimate the **offspring distribution**, which describes the number of secondary _infections_ caused by each case. One candidate statistical distribution used to model the offspring distribution is the **negative binomial** distribution with two parameters:
 
 - **Mean**, which represents the $R_{0}$, the average number of (secondary) cases produced by a single individual in an entirely susceptible population, and
 
 - **Dispersion**, expressed as $k$, which represents the individual-level variation in transmission by single individuals.
 
-<img src="fig/superspreading-estimate-rendered-unnamed-chunk-10-1.png" style="display: block; margin: auto;" />
+<img src="fig/superspreading-estimate-rendered-unnamed-chunk-9-1.png" style="display: block; margin: auto;" />
 
 From the histogram and density plot, we can identify that the offspring distribution is highly skewed or **overdispersed**. In this framework, the superspreading events (SSEs) are not arbitrary or exceptional, but simply realizations from the right-hand tail of the offspring distribution, which we can quantify and analyse ([Lloyd-Smith et al., 2005](https://www.nature.com/articles/nature04153)).
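 
 As a rough numerical illustration of overdispersion, the sketch below compares a negative binomial and a Poisson distribution with the same mean, using the relation variance = mean × (1 + mean / $k$). The values of `mu` and `k` are hypothetical and are not estimates from the MERS data.
 
 ``` r
 # hypothetical parameter values, chosen only for illustration
 mu <- 0.6 # mean number of secondary cases
 k <- 0.1 # dispersion parameter
 
 # variance of the negative binomial versus the Poisson (variance = mean)
 nb_variance <- mu * (1 + mu / k)
 pois_variance <- mu
 
 # probability that a case causes no secondary cases
 p_zero_nb <- stats::dnbinom(0, size = k, mu = mu)
 p_zero_pois <- stats::dpois(0, lambda = mu)
 
 # probability that a case causes ten or more secondary cases
 p_tail_nb <- stats::pnbinom(9, size = k, mu = mu, lower.tail = FALSE)
 p_tail_pois <- stats::ppois(9, lambda = mu, lower.tail = FALSE)
 
 c(nb_variance = nb_variance, pois_variance = pois_variance)
 c(p_zero_nb = p_zero_nb, p_zero_pois = p_zero_pois)
 c(p_tail_nb = p_tail_nb, p_tail_pois = p_tail_pois)
 ```
 
 With a small $k$, most cases cause no secondary cases while a few cause many, which is the pattern seen in the histogram.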
 
@@ -227,11 +251,11 @@ In epidemiology, [negative binomial](https://en.wikipedia.org/wiki/Negative_bino
 
 Calculate the distribution of secondary cases for Ebola using the `ebola_sim_clean` object from the `{outbreaks}` package.
 
-Is the offspring distribution of Ebola skewed or overdispersed?
+- Is the offspring distribution of Ebola skewed or overdispersed?
 
 :::::::::::::::::: hint
 
-**Note:** This dataset has 5829 cases. Running `epicontacts::vis_epicontacts()` may overload your session!
+**Note:** This dataset has 5829 cases. Running `epicontacts::vis_epicontacts()` may overload your session! Try to avoid this step.
 
 ::::::::::::::::::
 
@@ -243,38 +267,29 @@ Is the offspring distribution of Ebola skewed or overdispersed?
 ebola_contacts <-
   epicontacts::make_epicontacts(
     linelist = ebola_sim_clean$linelist,
-    contacts = ebola_sim_clean$contacts
+    contacts = ebola_sim_clean$contacts,
+    directed = TRUE
   )
 
-# count secondary cases
-
-ebola_infector_secondary <- ebola_contacts %>%
-  purrr::pluck("contacts") %>%
-  dplyr::count(from, name = "secondary_cases")
-
-ebola_secondary <- ebola_contacts %>%
-  # extract ids in contact *and* linelist using "which" argument
-  epicontacts::get_id(which = "all") %>%
-  # transform vector to dataframe to use left_join()
-  tibble::enframe(name = NULL, value = "from") %>%
-  # join count secondary cases per infectee
-  dplyr::left_join(ebola_infector_secondary) %>%
-  # infectee with missing secondary cases are replaced with zero
-  tidyr::replace_na(
-    replace = list(secondary_cases = 0)
-  )
+# count secondary cases per subject in contacts and linelist
+ebola_secondary <- epicontacts::get_degree(
+  x = ebola_contacts,
+  type = "out",
+  only_linelist = TRUE
+)
 
 ## plot the distribution
 ebola_secondary %>%
-  ggplot(aes(secondary_cases)) +
+  tibble::enframe() %>%
+  ggplot(aes(value)) +
   geom_histogram(binwidth = 1) +
   labs(
     x = "Number of secondary cases",
     y = "Frequency"
   )
 ```
 
-<img src="fig/superspreading-estimate-rendered-unnamed-chunk-11-1.png" style="display: block; margin: auto;" />
+<img src="fig/superspreading-estimate-rendered-unnamed-chunk-10-1.png" style="display: block; margin: auto;" />
 
 From a visual inspection, the distribution of secondary cases for the Ebola data set in `ebola_sim_clean` is skewed, with secondary cases equal to or lower than 6. We need to complement this observation with a statistical analysis to evaluate overdispersion.
 
@@ -296,7 +311,6 @@ library(fitdistrplus)
 ``` r
 ## fit distribution
 offspring_fit <- all_secondary %>%
-  dplyr::pull(secondary_cases) %>%
   fitdistrplus::fitdist(distr = "nbinom")
 
 offspring_fit
@@ -328,7 +342,7 @@ From the number secondary cases distribution we estimated a dispersion parameter
 
 We can overlap the estimated density values of the fitted negative binomial distribution and the histogram of the number of secondary cases:
 
-<img src="fig/superspreading-estimate-rendered-unnamed-chunk-14-1.png" style="display: block; margin: auto;" />
+<img src="fig/superspreading-estimate-rendered-unnamed-chunk-13-1.png" style="display: block; margin: auto;" />
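 
 The overlap between fitted and observed values can also be inspected numerically. The following base R sketch fits a negative binomial by maximum likelihood with `stats::optim()` and tabulates observed against expected frequencies. It is an illustration under stated assumptions, not the `{fitdistrplus}` implementation, and `toy_counts` is a hypothetical vector of secondary cases.
 
 ``` r
 # hypothetical secondary case counts: many zeros and a few large values
 toy_counts <- c(rep(0, 40), rep(1, 5), 2, 3, 8, 15)
 
 # negative log-likelihood, parameterised on the log scale so that
 # size and mu stay positive during optimisation
 neg_log_lik <- function(log_par) {
   -sum(stats::dnbinom(
     toy_counts,
     size = exp(log_par[1]),
     mu = exp(log_par[2]),
     log = TRUE
   ))
 }
 
 fit <- stats::optim(par = c(0, 0), fn = neg_log_lik)
 size_hat <- exp(fit$par[1])
 mu_hat <- exp(fit$par[2])
 
 # observed versus expected frequency for each number of secondary cases
 values <- 0:max(toy_counts)
 observed <- as.vector(table(factor(toy_counts, levels = values)))
 expected <- length(toy_counts) *
   stats::dnbinom(values, size = size_hat, mu = mu_hat)
 
 comparison <- data.frame(secondary_cases = values, observed, expected)
 head(comparison)
 ```
 
 Large gaps between the `observed` and `expected` columns would suggest that the negative binomial is a poor description of the data.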
 
 :::::::::::::::::::: callout
 
@@ -346,11 +360,11 @@ When $k$ approaches infinity ($k \rightarrow \infty$) the variance equals the me
 
 ::::::::::::::::::::::: challenge
 
-Use the distribution of secondary cases from the `ebola_sim_clean` object from `{outbreaks}` package.
+From the previous challenge, use the distribution of secondary cases from the `ebola_sim_clean` object from the `{outbreaks}` package.
 
-Fit a negative binomial distribution to estimate the mean and dispersion parameter of the offspring distribution.
+Fit a negative binomial distribution to estimate the mean and dispersion parameter of the offspring distribution. Try to quantify the uncertainty of the dispersion parameter by converting its standard error into a 95% confidence interval.
 
-Does the estimated dispersion parameter of Ebola provide evidence of an individual-level variation in transmission?
+- Does the estimated dispersion parameter of Ebola provide evidence of an individual-level variation in transmission?
 
 :::::::::::::: hint
 
@@ -363,7 +377,6 @@ Review how we fitted a negative binomial distribution using the `fitdistrplus::f
 
 ``` r
 ebola_offspring <- ebola_secondary %>%
-  dplyr::pull(secondary_cases) %>%
   fitdistrplus::fitdist(distr = "nbinom")
 
 ebola_offspring
@@ -372,21 +385,21 @@ ebola_offspring
 ``` output
 Fitting of the distribution ' nbinom ' by maximum likelihood 
 Parameters:
-     estimate  Std. Error
-size 2.353899 0.250124609
-mu   0.539300 0.009699219
+      estimate  Std. Error
+size 0.8539443 0.072505326
+mu   0.3675993 0.009497097
 ```
 
 
 ``` r
 ## extract the "size" parameter
 ebola_mid <- ebola_offspring$estimate[["size"]]
 
-## calculate the 95% confidence intervals using the standard error estimate and
+## calculate the 95% confidence intervals using the
+## standard error estimate and
 ## the 0.025 and 0.975 quantiles of the normal distribution.
 
 ebola_lower <- ebola_mid + ebola_offspring$sd[["size"]] * qnorm(0.025)
-
 ebola_upper <- ebola_mid + ebola_offspring$sd[["size"]] * qnorm(0.975)
 
 # ebola_mid
@@ -395,7 +408,7 @@ ebola_upper <- ebola_mid + ebola_offspring$sd[["size"]] * qnorm(0.975)
 ```
 
 From the distribution of the number of secondary cases we estimated a dispersion parameter $k$ of
-2.354, with a 95% Confidence Interval from 1.864 to 2.844.
+0.85, with a 95% Confidence Interval from 0.71 to 1.
 
 For dispersion parameter estimates higher than one we get low distribution variance, hence, low individual-level variation in transmission.
 
@@ -512,8 +525,8 @@ superspreading::proportion_cluster_size(
 ```
 
 ``` output
-       R        k prop_5 prop_10 prop_25
-1 0.5393 2.353899  1.84%      0%      0%
+          R         k prop_5 prop_10 prop_25
+1 0.3675993 0.8539443  2.64%      0%      0%
 ```
 
 The probability of having clusters of five people is 2.6%. At this stage, given these offspring distribution parameters, a backward strategy may not increase the probability of containing and quarantining more onward cases.
@@ -559,7 +572,7 @@ stats::qpois(
 ```
 
 ``` output
-[1] 3
+[1] 2
 ```
 
 Compare these values with the ones reported by [Lloyd-Smith et al., 2005](https://www.nature.com/articles/nature04153). See figure below: