From fe9032699960494d13ea526df6e3e096fd7c889a Mon Sep 17 00:00:00 2001
From: Andree Valle Campos <avallecam@gmail.com>
Date: Thu, 3 Apr 2025 19:48:53 +0100
Subject: [PATCH 1/9] use get_degree to replace wrangling steps

- simplify the epicontacts to fitdistrplus connection
- use only_linelist = TRUE for cases without infectees
- edit some text to facilitate readability
---
 episodes/superspreading-estimate.Rmd | 106 +++++++++++++--------------
 1 file changed, 50 insertions(+), 56 deletions(-)
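
Reviewer note: a minimal base R sketch of why a single out-degree count can stand in for the removed count/join/replace_na steps. It does not call `{epicontacts}`; it only mimics the counting logic under the assumption that out-degree restricted to the linelist amounts to tabulating infectors against the linelist ids. The toy ids and the object names (`linelist_ids`, `contacts`, `old_style`, `out_degree`) are hypothetical and reuse the shape of the reprex added in PATCH 8/9.

```r
# Hypothetical toy data (not from the lesson): three linelist cases and
# four infector-infectee pairs involving five individuals.
linelist_ids <- c("id1", "id2", "id3")
contacts <- data.frame(
  from = c("id1", "id1", "id2", "id4"),
  to = c("id2", "id3", "id4", "id5"),
  stringsAsFactors = FALSE
)

# Old approach in spirit: count rows per infector, then fill in zeros
# for linelist individuals who never appear as an infector.
counted <- table(contacts$from)
old_style <- setNames(integer(length(linelist_ids)), linelist_ids)
shared <- intersect(names(counted), linelist_ids)
old_style[shared] <- as.integer(counted[shared])

# Out-degree in one step: tabulate infectors against the linelist ids,
# so cases without infectees are kept with a count of zero and
# infectors outside the linelist (here "id4") are dropped.
out_degree <- table(factor(contacts$from, levels = linelist_ids))
out_degree <- setNames(as.integer(out_degree), names(out_degree))

# Both routes should give the same named integer vector.
stopifnot(identical(old_style, out_degree))
out_degree
```

Because the result is a plain named integer vector, it can be passed to a fitting function directly, which is why the `dplyr::pull()` step is dropped in this patch.
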
diff --git a/episodes/superspreading-estimate.Rmd b/episodes/superspreading-estimate.Rmd
index 2a5d2f74..c7492177 100644
--- a/episodes/superspreading-estimate.Rmd
+++ b/episodes/superspreading-estimate.Rmd
@@ -99,10 +99,13 @@ Let's practice this using the `mers_korea_2015` linelist and contact data from t
 epi_contacts <-
   epicontacts::make_epicontacts(
     linelist = outbreaks::mers_korea_2015$linelist,
-    contacts = outbreaks::mers_korea_2015$contacts
+    contacts = outbreaks::mers_korea_2015$contacts,
+    directed = TRUE
   )
 ```
 
+With the argument `directed = TRUE` we configure a directed graph. These directions incorporate our hypothesis of the **infector-infectee** pair: from the probable source patient to the secondary case.
+
 ```{r,eval=FALSE}
 # visualise contact network
 epicontacts::vis_epicontacts(epi_contacts)
@@ -130,7 +133,7 @@ withr::with_envvar(c(OPENSSL_CONF = file.path("/dev", "null")), {
 
 Contact data from a transmission chain can provide information on which infected individuals came into contact with others. We expect to have the infector (`from`) and the infectee (`to`) plus additional columns of variables related to their contact, such as location (`exposure`) and date of contact.
 
-Following [tidy data](https://tidyr.tidyverse.org/articles/tidy-data.html#tidy-data) principles, the observation unit in our contact dataset is the **infector-infectee** pair. Although one infector can infect multiple infectees, from contact tracing investigations we may record contacts linked to more than one infector (e.g. within a household). But we should expect to have unique infector-infectee pairs, because typically each infected person will have acquired the infection from one other.
+Following [tidy data](https://tidyr.tidyverse.org/articles/tidy-data.html#tidy-data) principles, the observation unit in our contact data frame is the **infector-infectee** pair. Although one infector can infect multiple infectees, from contact tracing investigations we may record contacts linked to more than one infector (e.g. within a household). But we should expect to have unique infector-infectee pairs, because typically each infected person will have acquired the infection from one other.
 
 To ensure these unique pairs, we can check on replicates for infectees:
 
@@ -144,47 +147,49 @@ epi_contacts %>%
 
 :::::::::::::::::::::::::::
 
-When each infector-infectee row is unique, the number of entries per infector corresponds to the number of secondary cases generated by that individual.
+Our goal is to get the number of secondary cases caused by the observed infected individuals. At the contact data frame, when each infector-infectee pair is unique, the number of rows per infector corresponds to the number of secondary cases generated by that individual.
 
 ```{r}
-# count secondary cases per infector
-infector_secondary <- epi_contacts %>%
+# count secondary cases per infector in contacts
+epi_contacts %>%
   purrr::pluck("contacts") %>%
   dplyr::count(from, name = "secondary_cases")
 ```
 
-But this output only contains number of secondary cases for reported infectors, not for each of the individuals in the whole `epicontacts` object.
+But this output only contains the number of secondary cases for reported infectors in the contact data, not for each of the individuals in the whole `<epicontacts>` object.
 
-To get this, first, we can use `epicontacts::get_id()` to get the full list of unique identifiers ("id") from the `epicontacts` class object. Second, join it with the count secondary cases per infector stored in the `infector_secondary` object. Third, replace the missing values with `0` to express no report of secondary cases from them.
+Instead, we can use `epicontacts::get_degree()` to get the **out-degree** of each **node** in the contact network from the `<epicontacts>` class object. In a directed network, the out-degree is the number of outgoing edges (infectees) emanating from a node (infector) ([Nykamp DQ, accessed: 2025](https://mathinsight.org/definition/node_degree)). 
 
 ```{r,message=FALSE,warning=FALSE}
-all_secondary <- epi_contacts %>%
-  # extract ids in contact *and* linelist using "which" argument
-  epicontacts::get_id(which = "all") %>%
-  # transform vector to dataframe to use left_join()
-  tibble::enframe(name = NULL, value = "from") %>%
-  # join count secondary cases per infectee
-  dplyr::left_join(infector_secondary) %>%
-  # infectee with missing secondary cases are replaced with zero
-  tidyr::replace_na(
-    replace = list(secondary_cases = 0)
-  )
+# Count secondary cases per subject in contacts and linelist
+all_secondary <- epicontacts::get_degree(
+  x = epi_contacts,
+  type = "out",
+  only_linelist = TRUE
+)
 ```
 
-From a histogram of the `all_secondary` object, we can identify the **individual-level variation** in the number of secondary cases. Three cases were related to more than 20 secondary cases, while the complementary cases with less than five or zero secondary cases.
+::::::::::::::::::::: caution
 
-```{r,echo=FALSE,eval=FALSE}
-# arrange in descendant order of secondary cases
-all_secondary %>%
-  dplyr::arrange(dplyr::desc(secondary_cases))
-```
+At `epicontacts::get_degree()` we use the `only_linelist = TRUE` argument.
+This is to count the number of secondary cases caused by the observed infected individuals,
+which includes subjects in contacts and linelist data frames.
+
+This assumption may not work for your all situations.
+If you need to consider only the subjects from the contact data,
+at `epicontacts::get_degree()` we use the `only_linelist = FALSE` argument.
+
+:::::::::::::::::::::
+
+From a histogram of the `all_secondary` object, we can identify the **individual-level variation** in the number of secondary cases. Three cases were related to more than 20 secondary cases, while the complementary cases with less than five or zero secondary cases.
 
 <!-- Visualizing the number of secondary cases on a histogram will help us to relate this with the statistical distribution to fit: -->
 
 ```{r}
 ## plot the distribution
 all_secondary %>%
-  ggplot(aes(secondary_cases)) +
+  tibble::enframe() %>% 
+  ggplot(aes(value)) +
   geom_histogram(binwidth = 1) +
   labs(
     x = "Number of secondary cases",
@@ -279,11 +284,11 @@ In epidemiology, [negative binomial](https://en.wikipedia.org/wiki/Negative_bino
 
 Calculate the distribution of secondary cases for Ebola using the `ebola_sim_clean` object from `{outbreaks}` package.
 
-Is the offspring distribution of Ebola skewed or overdispersed?
+- Is the offspring distribution of Ebola skewed or overdispersed?
 
 :::::::::::::::::: hint
 
-**Note:** This dataset has `r nrow(ebola_sim_clean$linelist)` cases. Running `epicontacts::vis_epicontacts()` may overload your session!
+**Note:** This dataset has `r nrow(ebola_sim_clean$linelist)` cases. Running `epicontacts::vis_epicontacts()` may overload your session! Try to avoid this step.
 
 ::::::::::::::::::
 
@@ -294,30 +299,21 @@ Is the offspring distribution of Ebola skewed or overdispersed?
 ebola_contacts <-
   epicontacts::make_epicontacts(
     linelist = ebola_sim_clean$linelist,
-    contacts = ebola_sim_clean$contacts
+    contacts = ebola_sim_clean$contacts,
+    directed = TRUE
   )
 
-# count secondary cases
-
-ebola_infector_secondary <- ebola_contacts %>%
-  purrr::pluck("contacts") %>%
-  dplyr::count(from, name = "secondary_cases")
-
-ebola_secondary <- ebola_contacts %>%
-  # extract ids in contact *and* linelist using "which" argument
-  epicontacts::get_id(which = "all") %>%
-  # transform vector to dataframe to use left_join()
-  tibble::enframe(name = NULL, value = "from") %>%
-  # join count secondary cases per infectee
-  dplyr::left_join(ebola_infector_secondary) %>%
-  # infectee with missing secondary cases are replaced with zero
-  tidyr::replace_na(
-    replace = list(secondary_cases = 0)
-  )
+# count secondary cases per subject in contacts and linelist
+ebola_secondary <- epicontacts::get_degree(
+  x = ebola_contacts,
+  type = "out",
+  only_linelist = TRUE 
+)
 
 ## plot the distribution
 ebola_secondary %>%
-  ggplot(aes(secondary_cases)) +
+  tibble::enframe() %>% 
+  ggplot(aes(value)) +
   geom_histogram(binwidth = 1) +
   labs(
     x = "Number of secondary cases",
@@ -344,7 +340,6 @@ library(fitdistrplus)
 ```{r}
 ## fit distribution
 offspring_fit <- all_secondary %>%
-  dplyr::pull(secondary_cases) %>%
   fitdistrplus::fitdist(distr = "nbinom")
 
 offspring_fit
@@ -392,10 +387,10 @@ fit_density <-
 # plot offspring distribution with density fit
 ggplot() +
   geom_histogram(
-    data = all_secondary,
+    data = all_secondary %>% tibble::enframe(),
     mapping =
       aes(
-        x = secondary_cases,
+        x = value,
         y = after_stat(density)
       ), fill = "white", color = "black",
     binwidth = 1
@@ -441,11 +436,11 @@ When $k$ approaches infinity ($k \rightarrow \infty$) the variance equals the me
 
 ::::::::::::::::::::::: challenge
 
-Use the distribution of secondary cases from the `ebola_sim_clean` object from `{outbreaks}` package.
+From the previous challenge, use the distribution of secondary cases from the `ebola_sim_clean` object from the `{outbreaks}` package.
 
-Fit a negative binomial distribution to estimate the mean and dispersion parameter of the offspring distribution.
+Fit a negative binomial distribution to estimate the mean and dispersion parameter of the offspring distribution. Try to quantify the uncertainty of the dispersion parameter by converting its standard error into a 95% confidence interval.
 
-Does the estimated dispersion parameter of Ebola provide evidence of an individual-level variation in transmission?
+- Does the estimated dispersion parameter of Ebola provide evidence of an individual-level variation in transmission?
 
 :::::::::::::: hint
 
@@ -457,7 +452,6 @@ Review how we fitted a negative binomial distribution using the `fitdistrplus::f
 
 ```{r}
 ebola_offspring <- ebola_secondary %>%
-  dplyr::pull(secondary_cases) %>%
   fitdistrplus::fitdist(distr = "nbinom")
 
 ebola_offspring
@@ -467,11 +461,11 @@ ebola_offspring
 ## extract the "size" parameter
 ebola_mid <- ebola_offspring$estimate[["size"]]
 
-## calculate the 95% confidence intervals using the standard error estimate and
+## calculate the 95% confidence intervals using the
+## standard error estimate and
 ## the 0.025 and 0.975 quantiles of the normal distribution.
 
 ebola_lower <- ebola_mid + ebola_offspring$sd[["size"]] * qnorm(0.025)
-
 ebola_upper <- ebola_mid + ebola_offspring$sd[["size"]] * qnorm(0.975)
 
 # ebola_mid
@@ -480,7 +474,7 @@ ebola_upper <- ebola_mid + ebola_offspring$sd[["size"]] * qnorm(0.975)
 ```
 
 From the number secondary cases distribution we estimated a dispersion parameter $k$ of
-`r round(ebola_mid, 3)`, with a 95% Confidence Interval from `r round(ebola_lower, 3)` to `r round(ebola_upper, 3)`.
+`r round(ebola_mid, 2)`, with a 95% Confidence Interval from `r round(ebola_lower, 2)` to `r round(ebola_upper, 2)`.
 
 For dispersion parameter estimates higher than one we get low distribution variance, hence, low individual-level variation in transmission.
 

From 09889f1e0eb7a9cfe5ca3c48689a2e835cc498e3 Mon Sep 17 00:00:00 2001
From: Andree Valle Campos <avallecam@gmail.com>
Date: Thu, 3 Apr 2025 19:56:54 +0100
Subject: [PATCH 2/9] fix lintr white spaces

---
 episodes/superspreading-estimate.Rmd | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/episodes/superspreading-estimate.Rmd b/episodes/superspreading-estimate.Rmd
index c7492177..f0f6ca5c 100644
--- a/episodes/superspreading-estimate.Rmd
+++ b/episodes/superspreading-estimate.Rmd
@@ -188,7 +188,7 @@ From a histogram of the `all_secondary` object, we can identify the **individual
 ```{r}
 ## plot the distribution
 all_secondary %>%
-  tibble::enframe() %>% 
+  tibble::enframe() %>%
   ggplot(aes(value)) +
   geom_histogram(binwidth = 1) +
   labs(
@@ -307,12 +307,12 @@ ebola_contacts <-
 ebola_secondary <- epicontacts::get_degree(
   x = ebola_contacts,
   type = "out",
-  only_linelist = TRUE 
+  only_linelist = TRUE
 )
 
 ## plot the distribution
 ebola_secondary %>%
-  tibble::enframe() %>% 
+  tibble::enframe() %>%
   ggplot(aes(value)) +
   geom_histogram(binwidth = 1) +
   labs(

From d0e39b07a3c8aa99852736d8ad22f9e40da8eef8 Mon Sep 17 00:00:00 2001
From: Andree Valle Campos <avallecam@gmail.com>
Date: Thu, 3 Apr 2025 20:06:09 +0100
Subject: [PATCH 3/9] add extra lines to clarify steps

---
 episodes/superspreading-estimate.Rmd | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/episodes/superspreading-estimate.Rmd b/episodes/superspreading-estimate.Rmd
index f0f6ca5c..f89e4cd9 100644
--- a/episodes/superspreading-estimate.Rmd
+++ b/episodes/superspreading-estimate.Rmd
@@ -156,9 +156,9 @@ epi_contacts %>%
   dplyr::count(from, name = "secondary_cases")
 ```
 
-But this output only contains the number of secondary cases for reported infectors in the contact data, not for each of the individuals in the whole `<epicontacts>` object.
+But this output only contains the number of secondary cases for reported infectors in the contact data, not for **all** the individuals in the whole `<epicontacts>` object.
 
-Instead, we can use `epicontacts::get_degree()` to get the **out-degree** of each **node** in the contact network from the `<epicontacts>` class object. In a directed network, the out-degree is the number of outgoing edges (infectees) emanating from a node (infector) ([Nykamp DQ, accessed: 2025](https://mathinsight.org/definition/node_degree)). 
+Instead, from `{epicontacts}` we can use the function `epicontacts::get_degree()`. The argument `type = "out"` get the **out-degree** of each **node** in the contact network from the `<epicontacts>` class object. In a directed network, the out-degree is the number of outgoing edges (infectees) emanating from a node (infector) ([Nykamp DQ, accessed: 2025](https://mathinsight.org/definition/node_degree)). Also, the argument `only_linelist = TRUE` include individuals in contacts and linelist data frames.
 
 ```{r,message=FALSE,warning=FALSE}
 # Count secondary cases per subject in contacts and linelist
@@ -173,10 +173,10 @@ all_secondary <- epicontacts::get_degree(
 
 At `epicontacts::get_degree()` we use the `only_linelist = TRUE` argument.
 This is to count the number of secondary cases caused by the observed infected individuals,
-which includes subjects in contacts and linelist data frames.
+which includes individuals in contacts and linelist data frames.
 
 This assumption may not work for your all situations.
-If you need to consider only the subjects from the contact data,
+If you need to consider only the individuals from the contact data,
 at `epicontacts::get_degree()` we use the `only_linelist = FALSE` argument.
 
 :::::::::::::::::::::

From 30ee34f33010469d5e50e54f755d71bdf1d1bcbb Mon Sep 17 00:00:00 2001
From: Andree Valle Campos <avallecam@gmail.com>
Date: Thu, 3 Apr 2025 20:16:48 +0100
Subject: [PATCH 4/9] replace object name for secondary cases

---
 episodes/superspreading-estimate.Rmd | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/episodes/superspreading-estimate.Rmd b/episodes/superspreading-estimate.Rmd
index f89e4cd9..bcf2250a 100644
--- a/episodes/superspreading-estimate.Rmd
+++ b/episodes/superspreading-estimate.Rmd
@@ -162,7 +162,7 @@ Instead, from `{epicontacts}` we can use the function `epicontacts::get_degree()
 
 ```{r,message=FALSE,warning=FALSE}
 # Count secondary cases per subject in contacts and linelist
-all_secondary <- epicontacts::get_degree(
+all_secondary_cases <- epicontacts::get_degree(
   x = epi_contacts,
   type = "out",
   only_linelist = TRUE
@@ -181,13 +181,13 @@ at `epicontacts::get_degree()` we use the `only_linelist = FALSE` argument.
 
 :::::::::::::::::::::
 
-From a histogram of the `all_secondary` object, we can identify the **individual-level variation** in the number of secondary cases. Three cases were related to more than 20 secondary cases, while the complementary cases with less than five or zero secondary cases.
+From a histogram of the `all_secondary_cases` object, we can identify the **individual-level variation** in the number of secondary cases. Three cases were related to more than 20 secondary cases, while the complementary cases with less than five or zero secondary cases.
 
 <!-- Visualizing the number of secondary cases on a histogram will help us to relate this with the statistical distribution to fit: -->
 
 ```{r}
 ## plot the distribution
-all_secondary %>%
+all_secondary_cases %>%
   tibble::enframe() %>%
   ggplot(aes(value)) +
   geom_histogram(binwidth = 1) +
@@ -339,7 +339,7 @@ library(fitdistrplus)
 
 ```{r}
 ## fit distribution
-offspring_fit <- all_secondary %>%
+offspring_fit <- all_secondary_cases %>%
   fitdistrplus::fitdist(distr = "nbinom")
 
 offspring_fit
@@ -387,7 +387,7 @@ fit_density <-
 # plot offspring distribution with density fit
 ggplot() +
   geom_histogram(
-    data = all_secondary %>% tibble::enframe(),
+    data = all_secondary_cases %>% tibble::enframe(),
     mapping =
       aes(
         x = value,

From f6f3b67d46400298682b817e8376c3be61536763 Mon Sep 17 00:00:00 2001
From: Andree Valle Campos <avallecam@gmail.com>
Date: Thu, 3 Apr 2025 20:42:45 +0100
Subject: [PATCH 5/9] add text edits

---
 episodes/superspreading-estimate.Rmd | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/episodes/superspreading-estimate.Rmd b/episodes/superspreading-estimate.Rmd
index bcf2250a..042b97c4 100644
--- a/episodes/superspreading-estimate.Rmd
+++ b/episodes/superspreading-estimate.Rmd
@@ -172,10 +172,10 @@ all_secondary_cases <- epicontacts::get_degree(
 ::::::::::::::::::::: caution
 
 At `epicontacts::get_degree()` we use the `only_linelist = TRUE` argument.
-This is to count the number of secondary cases caused by the observed infected individuals,
+This is to count the number of secondary cases caused by all the observed infected individuals,
 which includes individuals in contacts and linelist data frames.
 
-This assumption may not work for your all situations.
+This assumption may not work for all situations.
 If you need to consider only the individuals from the contact data,
 at `epicontacts::get_degree()` we use the `only_linelist = FALSE` argument.
 

From a7826e41d484bdc2139fde58de4c77dbef401a59 Mon Sep 17 00:00:00 2001
From: Andree Valle Campos <avallecam@gmail.com>
Date: Tue, 8 Apr 2025 19:24:24 +0100
Subject: [PATCH 6/9] add typo suggestion

Co-authored-by: Joshua Lambert <joshua.lambert@lshtm.ac.uk>
---
 episodes/superspreading-estimate.Rmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/episodes/superspreading-estimate.Rmd b/episodes/superspreading-estimate.Rmd
index 042b97c4..32deca40 100644
--- a/episodes/superspreading-estimate.Rmd
+++ b/episodes/superspreading-estimate.Rmd
@@ -147,7 +147,7 @@ epi_contacts %>%
 
 :::::::::::::::::::::::::::
 
-Our goal is to get the number of secondary cases caused by the observed infected individuals. At the contact data frame, when each infector-infectee pair is unique, the number of rows per infector corresponds to the number of secondary cases generated by that individual.
+Our goal is to get the number of secondary cases caused by the observed infected individuals. In the contact data frame, when each infector-infectee pair is unique, the number of rows per infector corresponds to the number of secondary cases generated by that individual.
 
 ```{r}
 # count secondary cases per infector in contacts

From 6152de78d01c52628bd88b9625b8800e3cdbae56 Mon Sep 17 00:00:00 2001
From: Andree Valle Campos <avallecam@gmail.com>
Date: Thu, 1 May 2025 19:32:54 +0100
Subject: [PATCH 7/9] add text clarification edits

---
 episodes/superspreading-estimate.Rmd | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/episodes/superspreading-estimate.Rmd b/episodes/superspreading-estimate.Rmd
index 32deca40..fff0b298 100644
--- a/episodes/superspreading-estimate.Rmd
+++ b/episodes/superspreading-estimate.Rmd
@@ -158,7 +158,9 @@ epi_contacts %>%
 
 But this output only contains the number of secondary cases for reported infectors in the contact data, not for **all** the individuals in the whole `<epicontacts>` object.
 
-Instead, from `{epicontacts}` we can use the function `epicontacts::get_degree()`. The argument `type = "out"` get the **out-degree** of each **node** in the contact network from the `<epicontacts>` class object. In a directed network, the out-degree is the number of outgoing edges (infectees) emanating from a node (infector) ([Nykamp DQ, accessed: 2025](https://mathinsight.org/definition/node_degree)). Also, the argument `only_linelist = TRUE` include individuals in contacts and linelist data frames.
+Instead, from `{epicontacts}` we can use the function `epicontacts::get_degree()`. The argument `type = "out"` gets the **out-degree** of each **node** in the contact network from the `<epicontacts>` class object. In a directed network, the out-degree is the number of outgoing edges (infectees) emanating from a node (infector) ([Nykamp DQ, accessed: 2025](https://mathinsight.org/definition/node_degree)). 
+
+Also, the argument `only_linelist = TRUE` will only include individuals in the linelist data frame. During outbreak investigations, we expect a registry of **all** the observed infected individuals in the linelist data. However, anyone not linked with a potential infector or infectee will not appear in the contact data. Thus, the argument `only_linelist = TRUE` will protect us against missing this latter set of individuals when counting the number of secondary cases caused by all the observed infected individuals. They will appear in the `<integer>` vector output as `0` secondary cases.
 
 ```{r,message=FALSE,warning=FALSE}
 # Count secondary cases per subject in contacts and linelist
@@ -176,7 +178,10 @@ This is to count the number of secondary cases caused by all the observed infect
 which includes individuals in contacts and linelist data frames.
 
 This assumption may not work for all situations.
-If you need to consider only the individuals from the contact data,
+For example, if during the registry of observed infections, 
+the contact data included more subjects than the ones available in the linelist data,
+then you need to consider only the individuals from the contact data.
+In that situation,
 at `epicontacts::get_degree()` we use the `only_linelist = FALSE` argument.
 
 :::::::::::::::::::::
@@ -288,7 +293,7 @@ Calculate the distribution of secondary cases for Ebola using the `ebola_sim_cle
 
 :::::::::::::::::: hint
 
-**Note:** This dataset has `r nrow(ebola_sim_clean$linelist)` cases. Running `epicontacts::vis_epicontacts()` may overload your session! Try to avoid this step.
+⚠️ **Optional step:** This dataset has `r nrow(ebola_sim_clean$linelist)` cases. Running `epicontacts::vis_epicontacts()` may take several minutes and use significant memory for large outbreaks such as the Ebola linelist. If you're on an older or slower computer, you can skip this step.
 
 ::::::::::::::::::
 

From 94d6e5afe96ab13d1c0658609f8db6153ada4584 Mon Sep 17 00:00:00 2001
From: Andree Valle Campos <avallecam@gmail.com>
Date: Thu, 1 May 2025 13:46:47 -0500
Subject: [PATCH 8/9] add callout box with only_linelist FALSE reprex

Fix #184
---
 episodes/superspreading-estimate.Rmd | 43 +++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/episodes/superspreading-estimate.Rmd b/episodes/superspreading-estimate.Rmd
index fff0b298..c8dca013 100644
--- a/episodes/superspreading-estimate.Rmd
+++ b/episodes/superspreading-estimate.Rmd
@@ -177,13 +177,54 @@ At `epicontacts::get_degree()` we use the `only_linelist = TRUE` argument.
 This is to count the number of secondary cases caused by all the observed infected individuals,
 which includes individuals in contacts and linelist data frames.
 
-This assumption may not work for all situations.
+:::::::::::::::::::::
+
+::::::::::::::::::::: spoiler
+
+### When to use 'only_linelist = FALSE'?
+
+The assumption that
+"the linelist includes every individual who appears in the contact data"
+may not hold in all situations.
+
 For example, if during the registry of observed infections, 
 the contact data included more subjects than the ones available in the linelist data,
 then you need to consider only the individuals from the contact data.
 In that situation,
 at `epicontacts::get_degree()` we use the `only_linelist = FALSE` argument.
 
+Find here a printed [reproducible example](https://reprex.tidyverse.org/):
+
+```r
+# Three subjects on linelist
+sample_linelist <- tibble::tibble(
+  id = c("id1", "id2", "id3")
+)
+
+# Four infector-infectee pairs with five subjects in contact data
+sample_contact <- tibble::tibble(
+  from = c("id1","id1","id2","id4"),
+  to = c("id2","id3","id4","id5")
+)
+
+# make an epicontacts object
+sample_net <- epicontacts::make_epicontacts(
+  linelist = sample_linelist,
+  contacts = sample_contact,
+  directed = TRUE
+)
+
+# count secondary cases per subject from linelist only
+epicontacts::get_degree(x = sample_net, type = "out", only_linelist = TRUE)
+#> id1 id2 id3 
+#>   2   1   0
+
+# count secondary cases per subject from contact only
+epicontacts::get_degree(x = sample_net, type = "out", only_linelist = FALSE)
+#> id1 id2 id4 id3 id5 
+#>   2   1   1   0   0
+```
+
 :::::::::::::::::::::
 
 From a histogram of the `all_secondary_cases` object, we can identify the **individual-level variation** in the number of secondary cases. Three cases were related to more than 20 secondary cases, while the complementary cases with less than five or zero secondary cases.

From 0526a280324a4f81b6fb65c44383687440d803fc Mon Sep 17 00:00:00 2001
From: Andree Valle Campos <avallecam@gmail.com>
Date: Thu, 1 May 2025 15:54:41 -0500
Subject: [PATCH 9/9] specify package source of data

---
 episodes/superspreading-estimate.Rmd | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/episodes/superspreading-estimate.Rmd b/episodes/superspreading-estimate.Rmd
index c8dca013..0f59abee 100644
--- a/episodes/superspreading-estimate.Rmd
+++ b/episodes/superspreading-estimate.Rmd
@@ -344,8 +344,8 @@ Calculate the distribution of secondary cases for Ebola using the `ebola_sim_cle
 ## first, make an epicontacts object
 ebola_contacts <-
   epicontacts::make_epicontacts(
-    linelist = ebola_sim_clean$linelist,
-    contacts = ebola_sim_clean$contacts,
+    linelist = outbreaks::ebola_sim_clean$linelist,
+    contacts = outbreaks::ebola_sim_clean$contacts,
     directed = TRUE
   )