
Commit f854981

differences for PR #169
1 parent 582fb88 commit f854981

9 files changed: +76 -63 lines changed

Diff for: md5sum.txt

+1 -1
@@ -9,7 +9,7 @@
 "episodes/delays-functions.Rmd" "0c5c40267d32e39a63d7f75aee9143e4" "site/built/delays-functions.md" "2025-03-28"
 "episodes/create-forecast.Rmd" "e7c0eb37985d139ed3ffd69ec0ad86db" "site/built/create-forecast.md" "2025-03-28"
 "episodes/severity-static.Rmd" "b22e6e3516c9a3b67f864bec763f0343" "site/built/severity-static.md" "2025-03-28"
-"episodes/superspreading-estimate.Rmd" "cd00e83816dc1bf867a8173f326358b5" "site/built/superspreading-estimate.md" "2025-03-28"
+"episodes/superspreading-estimate.Rmd" "d8de21eaaf48aca9e090f477566b2c2a" "site/built/superspreading-estimate.md" "2025-04-03"
 "episodes/superspreading-simulate.Rmd" "8c0d9627c6ea746a6ddff139926c8664" "site/built/superspreading-simulate.md" "2025-03-28"
 "instructors/instructor-notes.md" "ca3834a1b0f9e70c4702aa7a367a6bb5" "site/built/instructor-notes.md" "2025-03-28"
 "learners/reference.md" "18f9dcee553dc88dba8caf6436f8ca41" "site/built/reference.md" "2025-03-28"

Diff for: network.html

+3 -3
Large diffs are not rendered by default.

Diff for: superspreading-estimate.md

+72 -59
@@ -92,10 +92,13 @@ Let's practice this using the `mers_korea_2015` linelist and contact data from t
 epi_contacts <-
   epicontacts::make_epicontacts(
     linelist = outbreaks::mers_korea_2015$linelist,
-    contacts = outbreaks::mers_korea_2015$contacts
+    contacts = outbreaks::mers_korea_2015$contacts,
+    directed = TRUE
   )
 ```

+With the argument `directed = TRUE` we configure a directed graph. These directions incorporate our hypothesis of the **infector-infectee** pair: from the probable source patient to the secondary case.
+

 ``` r
 # visualise contact network
@@ -110,7 +113,7 @@ epicontacts::vis_epicontacts(epi_contacts)

 Contact data from a transmission chain can provide information on which infected individuals came into contact with others. We expect to have the infector (`from`) and the infectee (`to`) plus additional columns of variables related to their contact, such as location (`exposure`) and date of contact.

-Following [tidy data](https://tidyr.tidyverse.org/articles/tidy-data.html#tidy-data) principles, the observation unit in our contact dataset is the **infector-infectee** pair. Although one infector can infect multiple infectees, from contact tracing investigations we may record contacts linked to more than one infector (e.g. within a household). But we should expect to have unique infector-infectee pairs, because typically each infected person will have acquired the infection from one other.
+Following [tidy data](https://tidyr.tidyverse.org/articles/tidy-data.html#tidy-data) principles, the observation unit in our contact data frame is the **infector-infectee** pair. Although one infector can infect multiple infectees, from contact tracing investigations we may record contacts linked to more than one infector (e.g. within a household). But we should expect to have unique infector-infectee pairs, because typically each infected person will have acquired the infection from one other.

 To ensure these unique pairs, we can check on replicates for infectees:

@@ -137,62 +140,83 @@ epi_contacts %>%

 :::::::::::::::::::::::::::

-When each infector-infectee row is unique, the number of entries per infector corresponds to the number of secondary cases generated by that individual.
+Our goal is to get the number of secondary cases caused by the observed infected individuals. In the contact data frame, when each infector-infectee pair is unique, the number of rows per infector corresponds to the number of secondary cases generated by that individual.


 ``` r
-# count secondary cases per infector
-infector_secondary <- epi_contacts %>%
+# count secondary cases per infector in contacts
+epi_contacts %>%
   purrr::pluck("contacts") %>%
   dplyr::count(from, name = "secondary_cases")
 ```

-But this output only contains number of secondary cases for reported infectors, not for each of the individuals in the whole `epicontacts` object.
+``` output
+     from secondary_cases
+1    SK_1              26
+2   SK_11               1
+3  SK_118               1
+4   SK_12               1
+5  SK_123               1
+6   SK_14              38
+7   SK_15               4
+8   SK_16              21
+9    SK_6               2
+10  SK_76               2
+11  SK_87               1
+```
+
+But this output only contains the number of secondary cases for reported infectors in the contact data, not for each of the individuals in the whole `<epicontacts>` object.

-To get this, first, we can use `epicontacts::get_id()` to get the full list of unique identifiers ("id") from the `epicontacts` class object. Second, join it with the count secondary cases per infector stored in the `infector_secondary` object. Third, replace the missing values with `0` to express no report of secondary cases from them.
+Instead, we can use `epicontacts::get_degree()` to get the **out-degree** of each **node** in the contact network from the `<epicontacts>` class object. In a directed network, the out-degree is the number of outgoing edges (infectees) emanating from a node (infector) ([Nykamp DQ, accessed: 2025](https://mathinsight.org/definition/node_degree)).
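
As a rough illustration of what an out-degree count means, the sketch below computes it in base R for a toy directed edge list. The identifiers and the `contacts`, `ids`, and `out_degree` names are invented for this example; it is not the `epicontacts` implementation, only the same idea under simple assumptions (one row per infector-infectee pair, and a known list of individuals).

```r
# Toy directed edge list: each row is one infector -> infectee pair.
# These identifiers are invented for illustration only.
contacts <- data.frame(
  from = c("A", "A", "B", "A"),
  to   = c("B", "C", "D", "E")
)

# All individuals we want a count for, including those who infected no one.
ids <- c("A", "B", "C", "D", "E")

# Out-degree = number of rows in which an id appears as the infector.
# Using a factor with every id as a level keeps the zero counts.
out_degree <- table(factor(contacts$from, levels = ids))

# Convert the table to a plain named integer vector.
out_degree <- setNames(as.integer(out_degree), names(out_degree))
out_degree
```

Individuals who never appear in `from` get a count of zero, which is the information that a plain count over the contact rows leaves out.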


 ``` r
-all_secondary <- epi_contacts %>%
-  # extract ids in contact *and* linelist using "which" argument
-  epicontacts::get_id(which = "all") %>%
-  # transform vector to dataframe to use left_join()
-  tibble::enframe(name = NULL, value = "from") %>%
-  # join count secondary cases per infectee
-  dplyr::left_join(infector_secondary) %>%
-  # infectee with missing secondary cases are replaced with zero
-  tidyr::replace_na(
-    replace = list(secondary_cases = 0)
-  )
+# Count secondary cases per subject in contacts and linelist
+all_secondary <- epicontacts::get_degree(
+  x = epi_contacts,
+  type = "out",
+  only_linelist = TRUE
+)
 ```

-From a histogram of the `all_secondary` object, we can identify the **individual-level variation** in the number of secondary cases. Three cases were related to more than 20 secondary cases, while the complementary cases with less than five or zero secondary cases.
+::::::::::::::::::::: caution
+
+In `epicontacts::get_degree()` we use the `only_linelist = TRUE` argument.
+This is to count the number of secondary cases caused by the observed infected individuals,
+which includes subjects in contacts and linelist data frames.

+This assumption may not work for all your situations.
+If you need to consider only the subjects from the contact data,
+in `epicontacts::get_degree()` use the `only_linelist = FALSE` argument.

+:::::::::::::::::::::
+
+From a histogram of the `all_secondary` object, we can identify the **individual-level variation** in the number of secondary cases. Three cases were related to more than 20 secondary cases, while the complementary cases with less than five or zero secondary cases.

 <!-- Visualizing the number of secondary cases on a histogram will help us to relate this with the statistical distribution to fit: -->


 ``` r
 ## plot the distribution
 all_secondary %>%
-  ggplot(aes(secondary_cases)) +
+  tibble::enframe() %>%
+  ggplot(aes(value)) +
   geom_histogram(binwidth = 1) +
   labs(
     x = "Number of secondary cases",
     y = "Frequency"
   )
 ```

-<img src="fig/superspreading-estimate-rendered-unnamed-chunk-9-1.png" style="display: block; margin: auto;" />
+<img src="fig/superspreading-estimate-rendered-unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

 The number of secondary cases can be used to _empirically_ estimate the **offspring distribution**, which is the number of secondary _infections_ caused by each case. One candidate statistical distribution used to model the offspring distribution is the **negative binomial** distribution with two parameters:

 - **Mean**, which represents the $R_{0}$, the average number of (secondary) cases produced by a single individual in an entirely susceptible population, and

 - **Dispersion**, expressed as $k$, which represents the individual-level variation in transmission by single individuals.

-<img src="fig/superspreading-estimate-rendered-unnamed-chunk-10-1.png" style="display: block; margin: auto;" />
+<img src="fig/superspreading-estimate-rendered-unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

 From the histogram and density plot, we can identify that the offspring distribution is highly skewed or **overdispersed**. In this framework, the superspreading events (SSEs) are not arbitrary or exceptional, but simply realizations from the right-hand tail of the offspring distribution, which we can quantify and analyse ([Lloyd-Smith et al., 2005](https://www.nature.com/articles/nature04153)).

@@ -227,11 +251,11 @@ In epidemiology, [negative binomial](https://en.wikipedia.org/wiki/Negative_bino

 Calculate the distribution of secondary cases for Ebola using the `ebola_sim_clean` object from `{outbreaks}` package.

-Is the offspring distribution of Ebola skewed or overdispersed?
+- Is the offspring distribution of Ebola skewed or overdispersed?

 :::::::::::::::::: hint

-**Note:** This dataset has 5829 cases. Running `epicontacts::vis_epicontacts()` may overload your session!
+**Note:** This dataset has 5829 cases. Running `epicontacts::vis_epicontacts()` may overload your session! Try to avoid this step.

 ::::::::::::::::::

@@ -243,38 +267,29 @@ Is the offspring distribution of Ebola skewed or overdispersed?
 ebola_contacts <-
   epicontacts::make_epicontacts(
     linelist = ebola_sim_clean$linelist,
-    contacts = ebola_sim_clean$contacts
+    contacts = ebola_sim_clean$contacts,
+    directed = TRUE
   )

-# count secondary cases
-
-ebola_infector_secondary <- ebola_contacts %>%
-  purrr::pluck("contacts") %>%
-  dplyr::count(from, name = "secondary_cases")
-
-ebola_secondary <- ebola_contacts %>%
-  # extract ids in contact *and* linelist using "which" argument
-  epicontacts::get_id(which = "all") %>%
-  # transform vector to dataframe to use left_join()
-  tibble::enframe(name = NULL, value = "from") %>%
-  # join count secondary cases per infectee
-  dplyr::left_join(ebola_infector_secondary) %>%
-  # infectee with missing secondary cases are replaced with zero
-  tidyr::replace_na(
-    replace = list(secondary_cases = 0)
-  )
+# count secondary cases per subject in contacts and linelist
+ebola_secondary <- epicontacts::get_degree(
+  x = ebola_contacts,
+  type = "out",
+  only_linelist = TRUE
+)

 ## plot the distribution
 ebola_secondary %>%
-  ggplot(aes(secondary_cases)) +
+  tibble::enframe() %>%
+  ggplot(aes(value)) +
   geom_histogram(binwidth = 1) +
   labs(
     x = "Number of secondary cases",
     y = "Frequency"
   )
 ```

-<img src="fig/superspreading-estimate-rendered-unnamed-chunk-11-1.png" style="display: block; margin: auto;" />
+<img src="fig/superspreading-estimate-rendered-unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

 From a visual inspection, the distribution of secondary cases for the Ebola data set in `ebola_sim_clean` shows a skewed distribution with secondary cases equal to or lower than 6. We need to complement this observation with a statistical analysis to evaluate for overdispersion.

@@ -296,7 +311,6 @@ library(fitdistrplus)
 ``` r
 ## fit distribution
 offspring_fit <- all_secondary %>%
-  dplyr::pull(secondary_cases) %>%
   fitdistrplus::fitdist(distr = "nbinom")

 offspring_fit
@@ -328,7 +342,7 @@ From the number secondary cases distribution we estimated a dispersion parameter

 We can overlap the estimated density values of the fitted negative binomial distribution and the histogram of the number of secondary cases:

-<img src="fig/superspreading-estimate-rendered-unnamed-chunk-14-1.png" style="display: block; margin: auto;" />
+<img src="fig/superspreading-estimate-rendered-unnamed-chunk-13-1.png" style="display: block; margin: auto;" />

 :::::::::::::::::::: callout

@@ -346,11 +360,11 @@ When $k$ approaches infinity ($k \rightarrow \infty$) the variance equals the me

 ::::::::::::::::::::::: challenge

-Use the distribution of secondary cases from the `ebola_sim_clean` object from `{outbreaks}` package.
+From the previous challenge, use the distribution of secondary cases from the `ebola_sim_clean` object from `{outbreaks}` package.

-Fit a negative binomial distribution to estimate the mean and dispersion parameter of the offspring distribution.
+Fit a negative binomial distribution to estimate the mean and dispersion parameter of the offspring distribution. Try to estimate the uncertainty of the dispersion parameter from the Standard Error to 95% Confidence Intervals.

-Does the estimated dispersion parameter of Ebola provide evidence of an individual-level variation in transmission?
+- Does the estimated dispersion parameter of Ebola provide evidence of an individual-level variation in transmission?

 :::::::::::::: hint

@@ -363,7 +377,6 @@ Review how we fitted a negative binomial distribution using the `fitdistrplus::f

 ``` r
 ebola_offspring <- ebola_secondary %>%
-  dplyr::pull(secondary_cases) %>%
   fitdistrplus::fitdist(distr = "nbinom")

 ebola_offspring
@@ -372,21 +385,21 @@ ebola_offspring
 ``` output
 Fitting of the distribution ' nbinom ' by maximum likelihood
 Parameters:
-     estimate  Std. Error
-size 2.353899 0.250124609
-mu   0.539300 0.009699219
+      estimate  Std. Error
+size 0.8539443 0.072505326
+mu   0.3675993 0.009497097
 ```


 ``` r
 ## extract the "size" parameter
 ebola_mid <- ebola_offspring$estimate[["size"]]

-## calculate the 95% confidence intervals using the standard error estimate and
+## calculate the 95% confidence intervals using the
+## standard error estimate and
 ## the 0.025 and 0.975 quantiles of the normal distribution.

 ebola_lower <- ebola_mid + ebola_offspring$sd[["size"]] * qnorm(0.025)
-
 ebola_upper <- ebola_mid + ebola_offspring$sd[["size"]] * qnorm(0.975)

 # ebola_mid
@@ -395,7 +408,7 @@ ebola_upper <- ebola_mid + ebola_offspring$sd[["size"]] * qnorm(0.975)
 ```

 From the distribution of the number of secondary cases we estimated a dispersion parameter $k$ of
-2.354, with a 95% Confidence Interval from 1.864 to 2.844.
+0.85, with a 95% Confidence Interval from 0.71 to 1.
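
The interval reported in the added line can be re-derived by hand from the updated fit output. The sketch below is a worked check only: the two numbers are copied from the `size` row of the new output in this diff, and the variable names `k_estimate`, `k_se`, `k_lower`, and `k_upper` are made up for the example.

```r
# Point estimate and standard error of the dispersion parameter ("size"),
# copied from the updated fit output shown in this diff.
k_estimate <- 0.8539443
k_se <- 0.072505326

# Normal-approximation 95% interval: the estimate plus the standard error
# times the 0.025 and 0.975 quantiles of the standard normal distribution.
k_lower <- k_estimate + k_se * qnorm(0.025)
k_upper <- k_estimate + k_se * qnorm(0.975)

round(c(lower = k_lower, estimate = k_estimate, upper = k_upper), 2)
```

The bounds round to 0.71 and 1.00, in line with the sentence above; before rounding, the upper bound sits slightly below one.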

 For dispersion parameter estimates higher than one we get low distribution variance, hence, low individual-level variation in transmission.
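
To make the link between $k$ and the variance concrete, the sketch below uses the standard negative binomial relation, variance = mean + mean^2 / k (consistent with the variance approaching the mean as $k$ grows). The helper name `nbinom_variance` is hypothetical, and the estimates are copied from the old and new outputs in this diff.

```r
# Variance of a negative binomial with mean `mu` and dispersion `k`.
# `nbinom_variance` is a helper defined here for illustration.
nbinom_variance <- function(mu, k) mu + mu^2 / k

# Ebola estimates before and after this commit, taken from the outputs above.
var_before <- nbinom_variance(mu = 0.539300, k = 2.353899)
var_after <- nbinom_variance(mu = 0.3675993, k = 0.8539443)

# Variance-to-mean ratio: 1 means no overdispersion relative to the mean;
# larger values mean more individual-level variation.
c(before = var_before / 0.539300, after = var_after / 0.3675993)
```

With the updated estimates the ratio is larger than with the old ones, so the smaller $k$ corresponds to more variation relative to the mean.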

@@ -512,8 +525,8 @@ superspreading::proportion_cluster_size(
 ```

 ``` output
-       R        k prop_5 prop_10 prop_25
-1 0.5393 2.353899  1.84%      0%      0%
+          R         k prop_5 prop_10 prop_25
+1 0.3675993 0.8539443  2.64%      0%      0%
 ```

 The probability of having clusters of five people is 1.8%. At this stage, given these offspring distribution parameters, a backward strategy may not increase the probability of containing and quarantining more onward cases.
@@ -559,7 +572,7 @@ stats::qpois(
 ```

 ``` output
-[1] 3
+[1] 2
 ```

 Compare these values with the ones reported by [Lloyd-Smith et al., 2005](https://www.nature.com/articles/nature04153). See figure below:

Diff for: webshot.png

Binary file (366 KB), not shown.
