Commit 3b7ffbd

Islbs port (#77)
* documentation for some of the data sets
* data sets added from islbs and devtools::check passed
* updated NEWS.md
* pkgdown.yml updated
1 parent 9b858c9 commit 3b7ffbd

168 files changed: +126893 -1 lines changed

DESCRIPTION

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ License: GPL-3
 Encoding: UTF-8
 LazyData: true
 LazyDataCompression: xz
-RoxygenNote: 7.3.1
+RoxygenNote: 7.3.2
 Suggests:
     broom,
     dplyr,

NEWS.md

Lines changed: 5 additions & 0 deletions
@@ -1,3 +1,8 @@
+# Developmental
+
+* Added new datasets:
+  * `LEAP`, `arenosa`, `cdc`, `cdc.samp`, `census.2010`, `danish.ed.primary`, `danish.ed.validation`, `dds.discr`, `famuss`, `forest.birds`, `frog`, `hyperuricemia`, `hyperuricemia.samp`, `infant_mortality_2022`, `mcas`, `nhanes.samp`, `nhanes.samp.adult`, `nhanes.samp.adult.500`, `opp_insights_colleges`, `opp_insights_colleges_4year`, `prevend`, `prevend.samp`, `sugar.levels.A`, `sugar.levels.B`, `swim`, `tb.interruption`, `thermometry`, `wdi_2022` ported from ISLBS by [@npaterno](https://github.com/npaterno)
+
 # openintro 2.5.0
 
 * Added new datasets:

R/data-LEAP.R

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+#' Patient-level data from the randomized trial Learning Early About Peanut Allergy (LEAP).
+#'
+#' This study examined whether early exposure to peanuts increased tolerance and
+#' protection from developing a peanut allergy in children who are allergic to
+#' eggs or who have severe eczema. Participants between 4 and 11 months old were
+#' randomized to either avoid or consume peanut-based products during the
+#' first three years of life. The longer title of the study is Induction of
+#' Tolerance Through Early Introduction of Peanut in High-Risk Children; it is
+#' listed at \url{https://clinicaltrials.gov/} as study NCT00329784.
+#'
+#' More variables are available at the site in the source.
+#'
+#' @docType data
+#' @format A data frame with 640 rows and 7 columns
+#' \describe{
+#' \item{\code{participant.ID}}{Character vector, unique identifier for each participant.}
+#' \item{\code{stratum}}{Factor, outcome of a skin prick test (SPT) conducted
+#' before randomization, with levels \code{SPT-Negative} (participant
+#' shows no evidence of peanut allergy) and \code{SPT-Positive} (evidence
+#' of a peanut allergy). Participants were
+#' randomized separately within each stratum. The primary analysis of the
+#' study is typically restricted to the SPT-Negative group.}
+#' \item{\code{treatment.group}}{Factor, randomized assignment for each participant,
+#' with levels \code{Peanut Avoidance} and \code{Peanut Consumption}.}
+#' \item{\code{age.months}}{Participant age in months at randomization.}
+#' \item{\code{sex}}{Factor, sex of participant with levels \code{Female} and
+#' \code{Male}.}
+#' \item{\code{primary.ethnicity}}{Factor variable with levels \code{Asian},
+#' \code{Black}, \code{Other}, \code{Mixed}, and \code{White}.}
+#' \item{\code{overall.V60.outcome}}{Factor, indicating whether after 5 years
+#' the participant had an allergic reaction in a peanut-based
+#' oral food challenge (OFC),
+#' with levels \code{FAIL OFC} (allergic reaction) and
+#' \code{PASS OFC} (no allergic reaction).}
+#' }
+#' @source These data are a subset of variables from the file ADSTART0_2015-03-03_14-20-10.txt,
+#' available by downloading study files from
+#' \url{https://www.immport.org/shared/study/SDY660}
+#' @references Du Toit, George, et al. "Randomized trial of peanut consumption in
+#' infants at risk for peanut allergy."
+#' New England Journal of Medicine 372.9 (2015): 803-813.
+#' doi:10.1056/nejmoa1414850
+#'
+"LEAP"

R/data-arenosa.R

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+#' arenosa
+#'
+#' Published results used RNA-Seq to investigate how cold responsiveness differs
+#' in two populations of A. arenosa:
+#' TBG (collected from Triberg, Germany) and
+#' KA (collected from Kasparstein, Austria). Each row corresponds to a gene;
+#' the first column contains the gene name; other columns correspond to expression
+#' measured in a plant sample. Three plants of each population were exposed
+#' to cold (vernalized, denoted by v), and three were not (non-vernalized,
+#' denoted by nv). Expression was measured in gene counts
+#' (i.e., the number of RNA transcripts present in a sample);
+#' the data were then normalized to allow comparison between samples.
+#'
+#' @name arenosa
+#' @docType data
+#' @format A tibble with 1088 rows and 13 variables:
+#' \describe{
+#' \item{\code{gene.name}}{a character vector}
+#' \item{\code{ka.nv.1}}{a numeric vector}
+#' \item{\code{ka.nv.2}}{a numeric vector}
+#' \item{\code{ka.nv.3}}{a numeric vector}
+#' \item{\code{ka.v.1}}{a numeric vector}
+#' \item{\code{ka.v.2}}{a numeric vector}
+#' \item{\code{ka.v.3}}{a numeric vector}
+#' \item{\code{tbg.nv.1}}{a numeric vector}
+#' \item{\code{tbg.nv.2}}{a numeric vector}
+#' \item{\code{tbg.nv.3}}{a numeric vector}
+#' \item{\code{tbg.v.1}}{a numeric vector}
+#' \item{\code{tbg.v.2}}{a numeric vector}
+#' \item{\code{tbg.v.3}}{a numeric vector}
+#' }
+#' @references Pierre Baduel, Brian Arnold, Cara M. Weisman, Ben Hunter, Kirsten Bomblies,
+#' Habitat-Associated Life History and
+#' Stress-Tolerance Variation in Arabidopsis arenosa, Plant Physiology,
+#' Volume 171, Issue 1, May 2016, Pages 437–451
+#' https://doi.org/10.1104/pp.15.01875
+#' @source K Bomblies Harvard University lab.
+#'
+"arenosa"

R/data-cdc.R

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+#' cdc
+#'
+#' A dataset from the 2000 Behavioral Risk Factor Surveillance System (BRFSS)
+#' conducted by the US Centers for Disease Control and Prevention, used to
+#' illustrate inference on demographic data.
+#'
+#' @name cdc
+#' @docType data
+#' @format A data frame with 20,000 rows and 9 variables:
+#' \describe{
+#' \item{\code{genhlth}}{Factor with levels \code{excellent}, \code{very good},
+#' \code{good}, \code{fair}, \code{poor}}
+#' \item{\code{exerany}}{Numeric vector; 1 if the respondent exercised in the
+#' past month and 0 otherwise.}
+#' \item{\code{hlthplan}}{Numeric; 1 if the respondent has some form
+#' of health coverage and 0 otherwise.}
+#' \item{\code{smoke100}}{Numeric; 1 if the respondent has smoked at least 100
+#' cigarettes in their entire life and 0 otherwise.}
+#' \item{\code{height}}{Numeric; respondent's height in inches.}
+#' \item{\code{weight}}{Numeric; respondent's weight in pounds.}
+#' \item{\code{wtdesire}}{Numeric; respondent's desired weight in pounds.}
+#' \item{\code{age}}{Numeric; respondent's age in years.}
+#' \item{\code{gender}}{Factor with two levels, \code{m} and \code{f}}
+#' }
+#' @source \url{https://www.cdc.gov/brfss/index.html}
+#'
+"cdc"

R/data-cdc.samp.R

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+#' cdc.samp
+#'
+#' A sample of 60 individuals from the 2000 Behavioral Risk Factor Surveillance System
+#' (BRFSS) conducted by the US Centers for Disease Control.
+#'
+#' @name cdc.samp
+#' @docType data
+#' @format A tibble with 60 rows and 9 variables:
+#' \describe{
+#' \item{\code{genhlth}}{Factor with levels \code{excellent}, \code{very good},
+#' \code{good}, \code{fair}, \code{poor}}
+#' \item{\code{exerany}}{Numeric vector; 1 if the respondent exercised in the
+#' past month and 0 otherwise.}
+#' \item{\code{hlthplan}}{Numeric vector; 1 if the respondent has some form
+#' of health coverage and 0 otherwise.}
+#' \item{\code{smoke100}}{Numeric; 1 if the respondent has smoked at least 100
+#' cigarettes in their entire life and 0 otherwise.}
+#' \item{\code{height}}{Numeric; respondent's height in inches.}
+#' \item{\code{weight}}{Numeric; respondent's weight in pounds.}
+#' \item{\code{wtdesire}}{Numeric; respondent's desired weight in pounds.}
+#' \item{\code{age}}{Numeric; respondent's age in years.}
+#' \item{\code{gender}}{Factor with two levels, \code{m} and \code{f}}
+#' }
+#' @source \url{http://www.openintro.org/stat/data/cdc.R}
+#'
+"cdc.samp"

R/data-census.2010.R

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+#' census.2010
+#'
+#' United States 2010 infant mortality and number of physicians by state,
+#' including the District of Columbia.
+#'
+#' Data were abstracted from the 2010 Statistical Abstract of the United States.
+#' Due to a lag in recording state level data, the infant mortality data are from
+#' 2009 and the data on physicians from 2007. Both measurements are subject to
+#' change annually, so these data are not current and should not be used for
+#' inference about infant mortality. More current data can be found at the US
+#' Centers for Disease Control and Prevention (\url{https://www.cdc.gov/nchs/pressroom/sosmap/infant_mortality_rates/infant_mortality.htm}), and in the dataset \code{infant_mortality_2022}.
+#'
+#' @name census.2010
+#' @docType data
+#' @format A data frame with 51 rows and 3 columns.
+#' \describe{
+#' \item{\code{state}}{Character vector, US State including the District of Columbia}
+#' \item{\code{inf.mort}}{Numeric vector, number of deaths per 1000 live births between 1 day
+#' and 1 year of age}
+#' \item{\code{doctors}}{Numeric vector, active physicians per 100,000 population}
+#' }
+#' @source \url{https://www.census.gov/library/publications/2009/compendia/statab/129ed/births-deaths-marriages-divorces.html},
+#' \url{https://www.census.gov/library/publications/2009/compendia/statab/129ed/health-nutrition.html}
+#'
+"census.2010"
+
R/data-danish.ed.primary.R

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+#' danish.ed.primary
+#'
+#' Data from a Danish study on triage in an emergency department (ED)
+#'
+#' Data from a prospective cohort study of triage scoring for an emergency
+#' department (ED). The study examined whether the use of patient level
+#' measurements would improve an existing triage score. These data are the
+#' training data (called primary data in the original manuscript) used for model
+#' building. Some variable names have been changed for readability, but the data
+#' on 21 variables for the 6,249 participants are otherwise unchanged.
+#'
+#' @name danish.ed.primary
+#' @docType data
+#' @format A tibble with 6249 rows and 21 variables:
+#' \describe{
+#' \item{\code{mort30}}{numeric, 1 if patient died within 30 days of admission, 0
+#' otherwise}
+#' \item{\code{triage}}{factor, triage score given at arrival to ED.
+#' Values \code{green}, \code{yellow}, \code{orange}, \code{red}, from lowest
+#' to highest priority
+#' for treatment. The value \code{blue} normally denotes severity not
+#' warranting admission to the ED, but no participants coded blue
+#' are in these data.}
+#' \item{\code{age}}{numeric, age in years, rounded to lower integer}
+#' \item{\code{sex}}{factor, values \code{female}, \code{male}}
+#' \item{\code{albumin}}{numeric, serum albumin, in g/L}
+#' \item{\code{creatinine}}{numeric, serum creatinine, in umol/L}
+#' \item{\code{hemaglobin}}{numeric, serum hemoglobin, in mmol/L}
+#' \item{\code{potassium}}{numeric, serum potassium, in mmol/L}
+#' \item{\code{leuk.count}}{numeric, blood leukocyte count, in 10E9/L}
+#' \item{\code{sodium}}{numeric, serum sodium, in mmol/L}
+#' \item{\code{c.react.protein}}{numeric, serum C-reactive protein}
+#' \item{\code{oxygen.sat}}{numeric, peripheral arterial oxygen saturation, as a percent}
+#' \item{\code{resp.rate}}{numeric, respiratory rate per minute}
+#' \item{\code{heart.rate}}{numeric, heart rate, beats/min}
+#' \item{\code{systolic.bp}}{numeric, systolic blood pressure, in mmHg}
+#' \item{\code{glasgow.coma.scale}}{numeric, extent
+#' of impaired consciousness in patients with acute medical condition or
+#' trauma, scored between 3 and 15, 3 being the worst and 15 the best. Score
+#' is based on 3 subscales: best eye, verbal, and motor responses.}
+#' \item{\code{readmit.hosp}}{factor, readmitted to hospital within 30 days,
+#' values \code{yes}, \code{no}}
+#' \item{\code{days.in.hosp}}{numeric, number of days admitted to hospital}
+#' \item{\code{icu.time}}{numeric, number of days in the intensive care unit.
+#' The value 99999 indicates that the patient was not admitted to the ICU}
+#' \item{\code{icu.status}}{factor, patient admitted to ICU, values \code{yes},
+#' \code{no}}
+#' }
+#' @references Kristensen, Michael, et al. "Routine blood tests are associated
+#' with short term mortality and can improve emergency department triage: a cohort
+#' study of >12,000 patients." Scandinavian Journal of Trauma, Resuscitation and
+#' Emergency Medicine 25 (2017): 1-8.
+#' \url{https://sjtrem.biomedcentral.com/articles/10.1186/s13049-017-0458-x?report=reader}
+#' @source \url{doi:10.5061/dryad.m2bq5}
+#'
+"danish.ed.primary"

R/data-danish.ed.validation.R

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+#' Data from a Danish study on triage in an emergency department (ED)
+#'
+#' Data from a prospective cohort study of triage scoring for an emergency
+#' department (ED). The study examined whether the use of patient level
+#' measurements would improve an existing triage score. These data were used as
+#' a test set (called validation in the manuscript) to examine the performance
+#' of the model built using the training (primary) cohort. Some variable names
+#' have been changed for readability and for consistency with the primary dataset,
+#' but the data on 18 variables for the 6,383 participants are otherwise unchanged.
+#' Some variables in the primary dataset do not appear in these data.
+#'
+#' @name danish.ed.validation
+#' @docType data
+#' @format A tibble with 6383 rows and 18 variables:
+#' \describe{
+#' \item{\code{mort30}}{numeric, 1 if patient died within 30 days of admission, 0
+#' otherwise}
+#' \item{\code{triage}}{factor, triage score given at arrival to ED.
+#' Values \code{blue}, \code{green}, \code{yellow}, \code{orange}, \code{red},
+#' from lowest to highest priority
+#' for treatment. The value \code{blue} normally denotes severity not
+#' warranting admission to the ED. Participants coded \code{blue}
+#' are in these data but not in the primary data.}
+#' \item{\code{age}}{numeric, age in years, rounded to lower integer}
+#' \item{\code{sex}}{factor, values \code{female}, \code{male}}
+#' \item{\code{albumin}}{numeric, serum albumin, in g/L}
+#' \item{\code{creatinine}}{numeric, serum creatinine, in umol/L}
+#' \item{\code{hemaglobin}}{numeric, serum hemoglobin, in mmol/L}
+#' \item{\code{potassium}}{numeric, serum potassium, in mmol/L}
+#' \item{\code{leuk.count}}{numeric, blood leukocyte count, in 10E9/L}
+#' \item{\code{sodium}}{numeric, serum sodium, in mmol/L}
+#' \item{\code{c.react.protein}}{numeric, serum C-reactive protein}
+#' \item{\code{oxygen.sat}}{numeric, peripheral arterial oxygen saturation, as a percent}
+#' \item{\code{resp.rate}}{numeric, respiratory rate per minute}
+#' \item{\code{heart.rate}}{numeric, heart rate, beats/min}
+#' \item{\code{systolic.bp}}{numeric, systolic blood pressure, in mmHg}
+#' \item{\code{readmit.hosp}}{factor, readmitted to hospital within 30 days,
+#' with values \code{yes}, \code{no}}
+#' \item{\code{days.in.hosp}}{numeric, number of days admitted to hospital}
+#' \item{\code{icu.status}}{factor, patient admitted to ICU, with values
+#' \code{yes}, \code{no}}
+#' }
+#' @references Kristensen, Michael, et al. "Routine blood tests are associated
+#' with short term mortality and can improve emergency department triage: a cohort
+#' study of >12,000 patients." Scandinavian Journal of Trauma, Resuscitation and
+#' Emergency Medicine 25 (2017): 1-8.
+#' \url{https://sjtrem.biomedcentral.com/articles/10.1186/s13049-017-0458-x?report=reader}
+#' @source \url{doi:10.5061/dryad.m2bq5}
+#'
+"danish.ed.validation"

R/data-dds.dscr.R

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+#' A dataset on disbursements from the California Department of Developmental Services (DDS)
+#'
+#' The dataset represents a sample of 1,000 DDS consumers (out of a total
+#' population of approximately 250,000), and includes information about age,
+#' gender, ethnicity, and the amount of financial support per consumer provided
+#' by the DDS. The dataset is based on recorded attributes of consumers, but has
+#' been altered to maintain consumer privacy. From the Taylor and Mickel paper:
+#' "The data set originated from DDS’s Client Master File. In order to remain in
+#' compliance with California State Legislation, the data have been altered to
+#' protect the rights and privacy of specific individual consumers. The provided
+#' data set is based on actual attributes of consumers."
+#'
+#' @name dds.discr
+#' @docType data
+#' @format A data frame with 1000 rows and 6 variables:
+#' \describe{
+#' \item{\code{id}}{Numeric, unique identification code for each resident}
+#' \item{\code{age.cohort}}{A factor, \code{0-5} years,
+#' \code{6-12} years, \code{13-17} years, \code{18-21} years, \code{22-50} years,
+#' and \code{51+} years}
+#' \item{\code{age}}{Numeric, age measured in years}
+#' \item{\code{gender}}{A factor, with levels \code{Female} and \code{Male}}
+#' \item{\code{expenditures}}{Numeric, annual expenditures by the
+#' State on an individual, measured in USD}
+#' \item{\code{ethnicity}}{Factor, ethnic group, recorded as
+#' \code{American Indian}, \code{Asian}, \code{Black}, \code{Hispanic},
+#' \code{Multi Race}, \code{Native Hawaiian}, \code{Other},
+#' \code{White not Hispanic}}
+#' }
+#' @references Taylor, Stanley A., and Amy E. Mickel. Simpson's paradox: A data
+#' set and discrimination case study exercise. Journal of Statistics Education
+#' 22.1 (2014). www.amstat.org/publications/jse/v22n1/mickel.pdf
+#' Data contained in supplement B of Taylor and Mickel.
+#'
+"dds.discr"
+