Commit dfd927e

authored
Merge pull request #856 from cmu-delphi/release/delphi-epidata-0.3.11
Release Delphi Epidata 0.3.11
2 parents c6e2238 + 6c382f5 commit dfd927e

19 files changed

+631
-78
lines changed

.bumpversion.cfg

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
11
[bumpversion]
2-
current_version = 0.3.10
2+
current_version = 0.3.11
33
commit = False
44
tag = False
55

docs/api/covidcast-signals/dsew-cpr.md

Lines changed: 23 additions & 4 deletions
@@ -17,14 +17,20 @@ grand_parent: COVIDcast Epidata API
1717

1818
The Community Profile Report (CPR) is published by the Data Strategy and Execution Workgroup (DSEW) of the White House COVID-19 Team. For more information, see the [official description and data dictionary at healthdata.gov](https://healthdata.gov/Health/COVID-19-Community-Profile-Report/gqxm-d9w9) for "COVID-19 Community Profile Report".
1919

20-
This data source provides various COVID-19 related metrics, of which we report hospital admissions. Other sources of hospital admissions data in COVIDcast include [HHS](hhs.md) and [medical insurance claims](hospital-admissions.md). The CPR differs from these sources in that it is part of the public health surveillance stream (like HHS, unlike claims) but is available at a daily-county level (like claims, unlike HHS). CPR hospital admissions figures at the state level and above are meant to match those from HHS, but are known to differ. See the Limitations section for details.
20+
This data source provides various COVID-19 related metrics, of which we report hospital admissions and vaccinations.
2121

22-
County, MSA, state, and HHS-level values are pulled directly from CPR; nation-level values are aggregated up from the state level.
22+
For hospital admissions, other sources of data in COVIDcast include [HHS](hhs.md) and [medical insurance claims](hospital-admissions.md). The CPR differs from these sources in that it is part of the public health surveillance stream (like HHS, unlike claims) but is available at a daily-county level (like claims, unlike HHS). CPR hospital admissions figures at the state level and above are meant to match those from HHS, but are known to differ. See the Limitations section for details.
23+
24+
County, MSA, state, and HHS-level values are pulled directly from CPR when available; nation-level values are aggregated up from the state level.
2325

2426
| Signal | Description |
2527
| --- | --- |
2628
| `confirmed_admissions_covid_1d_7dav` | Number of adult and pediatric confirmed COVID-19 hospital admissions occurring each day. Smoothed using a 7-day average. <br/> **Earliest date available:** 2019-12-16 for state, HHS, and nation; 2021-01-06 for MSA and county |
2729
| `confirmed_admissions_covid_1d_prop_7dav` | Number of adult and pediatric confirmed COVID-19 hospital admissions occurring each day, per 100,000 population. Smoothed using a 7-day average. <br/> **Earliest date available:** 2019-12-16 for state, HHS, and nation; 2021-01-06 for MSA and county |
30+
| `people_full_vaccinated` | "People fully vaccinated includes those who have received two doses of the Pfizer-BioNTech or Moderna vaccine and those who have received one dose of the J&J/Janssen vaccine" - from the CPR data dictionary. <br/> **Earliest date available:** 2021-01-15 at any geo level except MSA and 2021-04-01 at the MSA level.|
31+
| `people_booster_doses` |"The count of people who received a booster dose includes anyone who is fully vaccinated and has received another dose of COVID-19 vaccine since 2021-08-13. This includes people who received booster doses and people who received additional doses." - from the CPR data dictionary. <br/> **Earliest date available:** 2021-11-01 for state, HHS, and nation. Not available below state level. |
32+
| `doses_admin_7dav` | "Doses administered shown by date of report, not date of administration. ... [S]ubmitting entities will have the ability to update or delete previously submitted records using new functionality available in CDC’s Data Clearinghouse. Use of this new functionality may result in fluctuations across metrics as historical data are updated or deleted" - from the CPR data dictionary. Smoothed using a 7-day average. <br/> **Earliest date available:** 2021-04-29 for state, HHS, and nation. Not available below state level. |
33+
| `booster_doses_admin_7dav` | "Doses administered shown by date of report, not date of administration. ... [S]ubmitting entities will have the ability to update or delete previously submitted records using new functionality available in CDC’s Data Clearinghouse. Use of this new functionality may result in fluctuations across metrics as historical data are updated or deleted" - from the CPR data dictionary. "[A] booster dose includes anyone who is fully vaccinated and has received another dose of COVID-19 vaccine since August 13, 2021. This includes people who received booster doses and people who received additional doses." - from the CPR data dictionary. Smoothed using a 7-day average.<br/> **Earliest date available:** 2021-11-01 for state, HHS, and nation. Not available below state level. |
2834

2935
## Table of contents
3036
{: .no_toc .text-delta}
@@ -36,14 +42,27 @@ County, MSA, state, and HHS-level values are pulled directly from CPR; nation-le
3642

3743
For counts-based fields like hospital admissions, CPR reports rolling sums for the preceding 7 days. The 7-day average signals are computed by Delphi by dividing each sum by 7 and assigning it to the last date in the included range, so e.g. the signal for June 7 is the average of the underlying data for June 1 through 7, inclusive.
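
The averaging rule described above can be sketched as follows. This is a minimal illustration, not Delphi's pipeline code; the function names and the sample value of 700 are hypothetical.

```python
from datetime import date, timedelta


def seven_day_average(rolling_sums):
    """Turn 7-day rolling sums into 7-day averages.

    `rolling_sums` maps the LAST date of each 7-day window to the sum
    reported for that window (hypothetical input layout).
    """
    return {last_day: total / 7 for last_day, total in rolling_sums.items()}


def window_for(last_day):
    """Return the (first, last) dates covered by the window ending on `last_day`."""
    return last_day - timedelta(days=6), last_day


# Hypothetical rolling sum of 700 admissions for the window ending June 7.
averages = seven_day_average({date(2021, 6, 7): 700})
first_day, last_day = window_for(date(2021, 6, 7))
```

Here the value stored under June 7 is the average over June 1 through June 7, inclusive, matching the example in the text.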
3844

39-
The `confirmed_admissions_covid_1d_7dav` signal mirrors the `Confirmed COVID-19 admissions - last 7 days` CPR field for all geographic resolutions except nation. Nation-level admissions is calculated by summing state-level values.
45+
The `confirmed_admissions_covid_1d_7dav` signal mirrors the `Confirmed COVID-19 admissions - last 7 days` CPR field for all geographic resolutions except nation. Nation-level admissions is calculated by summing state-level values.
46+
47+
The `doses_admin_7dav` and `booster_doses_admin_7dav` signals mirror the `Doses administered - last 7 days` and `Booster doses administered - last 7 days` CPR fields for all geographic resolutions except nation. Nation-level doses are calculated by summing state-level values.
4048

4149
## Limitations
4250

4351
Nation-level estimates may be inaccurate since aggregations are done using state-level smoothed values instead of raw values. Ideally we would aggregate raw values before smoothing, but the raw values are not accessible in this case.
4452

4553
Because DSEW does not provide updates on weekends, estimates are not available for all dates.
4654

55+
Currently, of all the vaccination signals, county-level data is only available for `people_full_vaccinated`. Until 2021-11-15, several states reported vaccinated people not allocated to any individual county. These unallocated counts were reported using a FIPS code ending with `000` for that state, which is never a FIPS code for a real county.
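
A consumer who wants to separate these unallocated rows from real counties could do something like the sketch below. It assumes FIPS codes arrive as 5-character strings (or integers that need zero-padding); the helper names and the sample rows are made up for illustration.

```python
def is_unallocated_fips(fips):
    """True if a county-level FIPS code ends in 000 (not a real county)."""
    code = str(fips).zfill(5)  # restore leading zeros lost in integer parsing
    return code.endswith("000")


def split_rows(rows):
    """Split (fips, value) rows into real-county rows and unallocated rows."""
    real = [r for r in rows if not is_unallocated_fips(r[0])]
    unallocated = [r for r in rows if is_unallocated_fips(r[0])]
    return real, unallocated


# Made-up sample rows: one real county code and one state-level placeholder.
real, unallocated = split_rows([("42003", 1200), ("42000", 350)])
```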
56+
57+
This data source is susceptible to large corrections that can create strange data effects such as negative counts and sudden changes of 1M+ counts from one day to the next. Many of these corrections are documented in the "High-Visibility Data Notes" section in the first tab of the CPR spreadsheet for that day. To locate the correct spreadsheet for some `time_value` R, consult the following table:
58+
59+
| Signal type | CPR date |
60+
| - | - |
61+
| Hospital Admissions | usually R+2, sometimes R+1 |
62+
| Vaccinations | usually R+1, sometimes R+2 |
63+
64+
Not all CPRs have the same lag between the CPR date (listed in the filename) and the date for a particular signal.
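
The table above can be turned into a small lookup that lists which CPR filename dates to try for a given `time_value`, most likely first. This is only a sketch: the dictionary keys and the function name are hypothetical, and the offsets are the "usually / sometimes" values from the table, not a guarantee.

```python
from datetime import date, timedelta

# Day offsets from the table above, most likely candidate first.
CPR_OFFSETS = {
    "hospital_admissions": (2, 1),  # usually R+2, sometimes R+1
    "vaccinations": (1, 2),         # usually R+1, sometimes R+2
}


def candidate_cpr_dates(time_value, signal_type):
    """Return the CPR dates to check for a signal's time_value, in order."""
    return [time_value + timedelta(days=k) for k in CPR_OFFSETS[signal_type]]


candidates = candidate_cpr_dates(date(2021, 11, 1), "vaccinations")
```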
65+
4766
### Differences with HHS reports
4867

4968
An analysis comparing the
@@ -56,7 +75,7 @@ An analysis comparing the
5675

5776
The report is currently updated daily, excluding weekends. However, this is subject to change; DSEW previously issued updates on a twice-weekly schedule. We check for updates daily.
5877

59-
Hospital admissions are reported with a lag of 2 days, but since the CPR is not updated on weekends, lag effectively varies from 2-4 days.
78+
The CPR is prepared with an internal lag of 1-2 days for most signals. The file is usually posted to healthdata.gov the day after the date listed in the filename, excluding weekends and federal holidays. This results in an effective lag in COVIDcast of 2-4 days, or 5 days when Monday is a holiday.
6079

6180
## Source and Licensing
6281

docs/api/covidcast-signals/quidel.md

Lines changed: 43 additions & 37 deletions
@@ -20,7 +20,7 @@ grand_parent: COVIDcast Epidata API
2020
* **Earliest issue available:** July 29, 2020
2121
* **Number of data revisions since May 19, 2020:** 1
2222
* **Date of last change:** October 22, 2020
23-
* **Available for:** hrr, msa, state (see [geography coding docs](../covidcast_geography.md))
23+
* **Available for:** county, hrr, msa, state, HHS, nation (see [geography coding docs](../covidcast_geography.md))
2424
* **Time type:** day (see [date format docs](../covidcast_times.md))
2525
* **License:** [CC BY](../covidcast_licensing.md#creative-commons-attribution)
2626

@@ -68,60 +68,66 @@ $$
6868
p = \frac{100 x}{n}
6969
$$
7070

71-
We estimate p across 3 temporal-spatial aggregation schemes:
71+
We estimate p across 6 temporal-spatial aggregation schemes:
72+
- daily, at the county level;
7273
- daily, at the MSA (metropolitan statistical area) level;
7374
- daily, at the HRR (hospital referral region) level;
74-
- daily, at the state level.
75+
- daily, at the state level;
76+
- daily, at the HHS level;
77+
- daily, at the US national level.
7578

76-
**MSA and HRR levels**: In a given MSA or HRR, suppose $$N$$ COVID tests are taken
77-
in a certain time period, $$X$$ is the number of tests taken with positive
78-
results.
79+
#### Standard Error
7980

80-
For raw signals:
81-
- if $$N \geq 50$$, we simply use:
81+
We assume the estimates for each time point follow a binomial distribution. The
82+
estimated standard error then is:
8283

8384
$$
84-
p = \frac{100 X}{N}
85+
\text{se} = 100 \sqrt{ \frac{\frac{p}{100}(1- \frac{p}{100})}{N} }
8586
$$
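
As a quick numerical check of the standard error formula, here is a minimal Python sketch. The function name is hypothetical; `p` is the percentage defined earlier and `n` is the number of tests.

```python
import math


def binomial_se(p, n):
    """Standard error, in percentage points, of a positivity percentage p from n tests."""
    if n <= 0:
        raise ValueError("n must be positive")
    frac = p / 100  # convert percentage to a proportion
    return 100 * math.sqrt(frac * (1 - frac) / n)


# With 10% positivity from 100 tests, the standard error is about 3 points.
se = binomial_se(10, 100)
```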
8687

87-
For smoothed signals, before taking the temporal pooling average,
88-
- if $$N \geq 50$$, we also use:
88+
#### Smoothing
89+
90+
We apply two kinds of smoothing to produce the smoothed signals:
91+
92+
##### Temporal Smoothing
93+
Smoothed estimates are formed by pooling data over time. That is, daily, for
94+
each location, we first pool all data available in that location over the last 7
95+
days, and we then recompute everything described in the two subsections above.
96+
97+
Pooling in this way makes estimates available in more geographic areas, as many areas
98+
report very few tests per day, but have enough data to report when 7 days are considered.
99+
100+
##### Geographical Smoothing
101+
102+
**County, MSA and HRR levels**: In a given county, MSA or HRR, suppose $$N$$ COVID tests
103+
are taken in a certain time period, and $$X$$ of those tests return positive
104+
results.
105+
106+
107+
For smoothed signals, after taking the temporal pooling,
108+
- if $$N \geq 50$$, we still use:
89109
$$
90110
p = \frac{100 X}{N}
91111
$$
92-
- if $$25 \leq N < 50$$, we lend $$50 - N$$ fake samples from its home state to shrink the
112+
- if $$25 \leq N < 50$$, we borrow $$50 - N$$ fake samples from its parent state to shrink the
93113
estimate to the state's mean, which means:
94114
$$
95115
p = 100 \left( \frac{N}{50} \frac{X}{N} + \frac{50 - N}{50} \frac{X_s}{N_s} \right)
96116
$$
97117
where $$N_s, X_s$$ are the number of COVID tests and the number of COVID tests
98-
taken with positive results taken in its home state in the same time period.
118+
taken with positive results in its parent state in the same time period.
119+
A parent state is defined as the state with the largest proportion of the population
120+
in this county/MSA/HRR.
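
The two cases above can be sketched in Python as follows. This is an illustration of the formulas only, not the indicator's actual code; the function and parameter names are hypothetical, and returning `None` for locations with fewer than 25 tests is an assumption about how such locations would be dropped.

```python
def pooled_positivity(x, n, x_state, n_state, min_obs=50, pool_floor=25):
    """Percent positivity for a county/MSA/HRR, shrunk toward the parent state.

    x, n             -- positive tests and total tests in the location
    x_state, n_state -- positive tests and total tests in the parent state
    """
    if n >= min_obs:
        return 100 * x / n  # enough local data: no pooling
    if n >= pool_floor:
        if n_state <= 0:
            return None  # no parent-state data to borrow from
        weight = n / min_obs  # share of the estimate from local data
        return 100 * (weight * x / n + (1 - weight) * x_state / n_state)
    return None  # assumed: too few tests to report


# 30 local tests (20% positive) pooled with a state at 10% positive.
estimate = pooled_positivity(x=6, n=30, x_state=100, n_state=1000)
```

With 30 local tests, the local rate gets weight 30/50 and the state rate gets weight 20/50, so the estimate falls between the two rates.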
99121

100-
**State level**: the states with fewer than 50 tests are discarded. For the
101-
rest of the states with sufficient samples,
122+
Counties with sample sizes smaller than 50 are merged into megacounties for
123+
the raw signals; counties with sample sizes smaller than 25 are merged into megacounties for
124+
the smoothed signals.
102125

126+
**State level, HHS level, National level**: locations with fewer than 50 tests are discarded. For the remaining locations,
103127
$$
104128
p = \frac{100 X}{N}
105129
$$
106130

107-
#### Standard Error
108-
109-
We assume the estimates for each time point follow a binomial distribution. The
110-
estimated standard error then is:
111-
112-
$$
113-
\text{se} = 100 \sqrt{ \frac{\frac{p}{100}(1- \frac{p}{100})}{N} }
114-
$$
115-
116-
#### Smoothing
117-
118-
Smoothed estimates are formed by pooling data over time. That is, daily, for
119-
each location, we first pool all data available in that location over the last 7
120-
days, and we then recompute everything described in the last two
121-
subsections. Pooling in this way makes estimates available in more geographic
122-
areas, as many areas report very few tests per day, but have enough data to
123-
report when 7 days are considered.
124-
125131
### Lag and Backfill
126132

127133
Because testing centers may report their data to Quidel several days after they
@@ -142,13 +148,13 @@ This data source is based on data provided to us by a lab testing company. They
142148

143149
### Missingness
144150

145-
When fewer than 50 tests are reported in a state on a specific day, no data is
151+
When fewer than 50 tests are reported in a state, an HHS region, or the US as a whole on a specific day, no data is
146152
reported for that area on that day; an API query for all reported states on that
147153
day will not include it.
148154

149-
When fewer than 50 tests are reported in an HRR or MSA on a specific day, and
150-
not enough samples can be filled in from the parent state, no data is reported
151-
for that area on that day; an API query for all reported geographic areas on
155+
When fewer than 50 tests are reported in a county, HRR or MSA on a specific day, and
156+
not enough samples can be filled in from the parent state for smoothed signals specifically,
157+
no data is reported for that area on that day; an API query for all reported geographic areas on
152158
that day will not include it.
153159

154160
## Flu Tests
