use `epicontacts::get_degree()` to replace wrangling steps from epicontacts to fitdistrplus #169
base: main
- simplify the epicontacts to fitdistrplus connection
- use `only_linelist = TRUE` for cases without infectees
- edit some text to facilitate readability
Thank you!

Thank you for your pull request 😃

🤖 This automated message can help you check the rendered files in your submission for clarity. If you have any questions, please feel free to open an issue in {sandpaper}. If you have files that automatically render output (e.g. R Markdown), then you should check for the following:

Rendered Changes

🔍 Inspect the changes: https://github.com/epiverse-trace/tutorials-middle/compare/md-outputs..md-outputs-PR-169

The following changes were observed in the rendered markdown documents:

What does this mean?

If you have source files that require output and figures to be generated (e.g. R Markdown), then it is important to make sure the generated figures and output are reproducible. This output provides a way for you to inspect the output in a diff-friendly manner so that it's easy to see the changes that occur due to new software versions or randomisation.

⏱️ Updated at 2025-04-08 18:28:15 +0000
minor edits
To get this, first, we can use `epicontacts::get_id()` to get the full list of unique identifiers ("id") from the `epicontacts` class object. Second, join it with the count secondary cases per infector stored in the `infector_secondary` object. Third, replace the missing values with `0` to express no report of secondary cases from them.
Instead, from `{epicontacts}` we can use the function `epicontacts::get_degree()`. The argument `type = "out"` get the **out-degree** of each **node** in the contact network from the `<epicontacts>` class object. In a directed network, the out-degree is the number of outgoing edges (infectees) emanating from a node (infector) ([Nykamp DQ, accessed: 2025](https://mathinsight.org/definition/node_degree)). Also, the argument `only_linelist = TRUE` include individuals in contacts and linelist data frames.
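As a rough illustration of the three manual steps described in this hunk, here is a minimal base R sketch that counts secondary cases per infector and assigns `0` to individuals with no recorded infectees. The toy `contacts` and `linelist_ids` objects are hypothetical, and plain base R stands in for the `{epicontacts}` helpers, so this is only an approximation of what an out-degree count restricted to linelist individuals should look like.

```r
# Hypothetical toy data: infector-infectee pairs and linelist identifiers
contacts <- data.frame(
  from = c("id1", "id1", "id2"),
  to = c("id2", "id3", "id4")
)
linelist_ids <- c("id1", "id2", "id3", "id4")

# Step 1: full list of unique identifiers
all_ids <- unique(linelist_ids)

# Steps 2 and 3: count secondary cases per infector (the out-degree).
# Using the identifiers as factor levels keeps individuals who never
# appear as an infector, so they are counted as 0 instead of missing.
secondary_cases <- table(factor(contacts$from, levels = all_ids))

# Convert the table to a named integer vector
out_degree <- setNames(as.integer(secondary_cases), names(secondary_cases))
out_degree
#> id1 id2 id3 id4
#>   2   1   0   0
```

The zero counts for `id3` and `id4` are the part that the manual join-and-replace steps were reconstructing by hand.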
> Also, the argument `only_linelist = TRUE` include individuals in contacts and linelist data frames.
I think you could be a bit clearer on exactly what this means. Is it only including individuals that are in both the contacts and line list data frames, or only the line list irrespective of the contacts data? And why would these two datasets contain different individuals (i.e. is it more likely that the line list or the contacts data is missing individuals)?
Fair point. Suggesting one more paragraph here:
Instead, from `{epicontacts}` we can use the function `epicontacts::get_degree()`. The argument `type = "out"` get the **out-degree** of each **node** in the contact network from the `<epicontacts>` class object. In a directed network, the out-degree is the number of outgoing edges (infectees) emanating from a node (infector) ([Nykamp DQ, accessed: 2025](https://mathinsight.org/definition/node_degree)). Also, the argument `only_linelist = TRUE` include individuals in contacts and linelist data frames.
Instead, from `{epicontacts}` we can use the function `epicontacts::get_degree()`. The argument `type = "out"` gets the **out-degree** of each **node** in the contact network from the `<epicontacts>` class object. In a directed network, the out-degree is the number of outgoing edges (infectees) emanating from a node (infector) ([Nykamp DQ, accessed: 2025](https://mathinsight.org/definition/node_degree)).
Also, the argument `only_linelist = TRUE` will only include individuals in the linelist data frame. During outbreak investigations, we expect a registry of **all** the observed infected individuals in the linelist data. However, anyone not linked with a potential infector or infectee will not appear in the contact data. Thus, the argument `only_linelist = TRUE` will protect us against missing this latter set of individuals when counting the number of secondary cases caused by all the observed infected individuals. They will appear in the `<integer>` vector output as `0` secondary cases.
This assumption may not work for all situations.
If you need to consider only the individuals from the contact data,
at `epicontacts::get_degree()` we use the `only_linelist = FALSE` argument.
This description is not 100% clear to me. If possible, could you expand a bit more on what this means and in what situation the reader might want to use `only_linelist = FALSE`?
Added one sentence with an example:
This assumption may not work for all situations.
If you need to consider only the individuals from the contact data,
at `epicontacts::get_degree()` we use the `only_linelist = FALSE` argument.
This assumption may not work for all situations.
For example, if during the registry of observed infections,
the contact data included more subjects than the ones available in the linelist data,
then you need to consider only the individuals from the contact data.
In that situation,
we set the `only_linelist = FALSE` argument in `epicontacts::get_degree()`.
@joshwlambert would you agree to add this reprex to make the situation more visible? I can add it as a spoiler callout block to make it expandable on demand.
# Three subjects on linelist
sample_linelist <- tibble::tibble(
id = c("id1", "id2", "id3")
)
# Four infector-infectee pairs with five subjects in contact data
sample_contact <- tibble::tibble(
from = c("id1","id1","id2","id4"),
to = c("id2","id3","id4","id5")
)
# make an epicontacts object
sample_net <- epicontacts::make_epicontacts(
linelist = sample_linelist,
contacts = sample_contact,
directed = TRUE
)
# count secondary cases per subject from linelist only
epicontacts::get_degree(x = sample_net, type = "out", only_linelist = TRUE)
#> id1 id2 id3
#> 2 1 0
# count secondary cases per subject from contact only
epicontacts::get_degree(x = sample_net, type = "out", only_linelist = FALSE)
#> id1 id2 id4 id3 id5
#> 2 1 1 0 0
Created on 2025-04-08 with reprex v2.1.1
Should we suggest that `{epicontacts}` provide a Venn diagram-like summary table of the IDs found only in the linelist, in both, and only in the contacts?
It could work like the test drafted below, but covering all the intersections, as a quality control step:
sample_linelist <- tibble::tibble(
id = c("id1", "id2", "id3")
)
sample_contact <- tibble::tibble(
from = c("id1","id1","id2","id4"),
to = c("id2","id3","id4","id5")
)
sample_net <- epicontacts::make_epicontacts(
linelist = sample_linelist,
contacts = sample_contact,
directed = TRUE
)
epi_contacts <- epicontacts::make_epicontacts(
linelist = outbreaks::mers_korea_2015$linelist,
contacts = outbreaks::mers_korea_2015$contacts,
directed = TRUE
)
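# TRUE when the linelist already holds every identifier in the object,
# i.e. the contact data adds no identifiers missing from the linelist
test_venn <- function(x) {
  ids_linelist <- epicontacts::get_id(x = x, which = "linelist")
  # which = "all" returns identifiers from both linelist and contacts
  ids_all <- epicontacts::get_id(x = x, which = "all")
  out <- length(unique(ids_linelist)) >= length(unique(ids_all))
  return(out)
}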
test_venn(x = sample_net)
#> [1] FALSE
test_venn(x = epi_contacts)
#> [1] TRUE
Created on 2025-04-08 with reprex v2.1.1
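One possible shape for the "all the crossings" summary suggested above is sketched below in base R. The helper `summarise_id_overlap()` is hypothetical (it is not an `{epicontacts}` function) and works on plain character vectors, reusing the toy identifiers from the sample data, so treat it as a rough draft of the idea rather than a proposed implementation.

```r
# Hypothetical helper: classify identifiers by where they appear
summarise_id_overlap <- function(ids_linelist, ids_contacts) {
  ids_linelist <- unique(ids_linelist)
  ids_contacts <- unique(ids_contacts)
  list(
    linelist_only = setdiff(ids_linelist, ids_contacts),
    common = intersect(ids_linelist, ids_contacts),
    contacts_only = setdiff(ids_contacts, ids_linelist)
  )
}

# Same toy identifiers as the sample data above
ids_linelist <- c("id1", "id2", "id3")
ids_contacts <- c(
  c("id1", "id1", "id2", "id4"), # infectors ("from")
  c("id2", "id3", "id4", "id5")  # infectees ("to")
)

overlap <- summarise_id_overlap(ids_linelist, ids_contacts)

# Number of identifiers in each region of the "Venn diagram"
lengths(overlap)
#> linelist_only        common contacts_only
#>             0             3             2
```

A non-zero `contacts_only` count would flag the situation where `only_linelist = TRUE` silently drops infectors such as `id4`.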
:::::::::::::::::: hint

**Note:** This dataset has `r nrow(ebola_sim_clean$linelist)` cases. Running `epicontacts::vis_epicontacts()` may overload your session!
**Note:** This dataset has `r nrow(ebola_sim_clean$linelist)` cases. Running `epicontacts::vis_epicontacts()` may overload your session! Try to avoid this step.
> Try to avoid this step.

I'm not sure if we want to provide code in the tutorials that we advise readers not to run. I think this could potentially be reworded, such as:

⚠️ Optional Step: `epicontacts::vis_epicontacts()` provides an interactive network of the outbreak and may take several minutes and use significant memory for large outbreaks such as the Ebola line list. If you're on an older or slower computer, you can skip this step.
I agree. Thanks for proposing an edit. I adapted it here:
**Note:** This dataset has `r nrow(ebola_sim_clean$linelist)` cases. Running `epicontacts::vis_epicontacts()` may overload your session! Try to avoid this step.
⚠️ **Optional step:** This dataset has `r nrow(ebola_sim_clean$linelist)` cases. Running `epicontacts::vis_epicontacts()` may take several minutes and use significant memory for large outbreaks such as the Ebola linelist. If you're on an older or slower computer, you can skip this step.
Nice work @avallecam! I read through the `.Rmd` file changes and everything looks good. I've left a few comments on the file. I haven't rendered the tutorial to see how it looks as a web page, but I'm happy to take a look once this is merged and live, and I'll open an issue if I spot anything that needs changing or fixing.
Co-authored-by: Joshua Lambert <[email protected]>
thanks @joshwlambert. Could you take a look at the edits in response to your questions?
This aims to simplify the `epicontacts` to `fitdistrplus` connection, as discussed in epiverse-trace/superspreading#121 (comment).

Also, fix #170 to use `only_linelist = TRUE` to count the secondary cases from observed infections without onward transmission (infectees).

This also edits some text to define graph concepts and facilitate readability in unrelated sections.