forked from jrnold/masteringmetrics
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathNHIS2009.Rmd
91 lines (79 loc) · 3 KB
/
NHIS2009.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
---
output: html_document
editor_options:
chunk_output_type: console
---
# National Health Interview Survey
This reproduces the analyses in Table 1.1 of @AngristPischke2014.
which compares people with and without health insurance in the 2009 National Health Interview Survey (NHIS).
The code is derived from [NHIS2009_hicompare.do](http://masteringmetrics.com/wp-content/uploads/2015/01/NHIS2009_hicompare.do).
Load the prerequisite packages.
```{r libs,message=FALSE}
library("tidyverse")
library("magrittr")
library("haven")
```
Load the data (originally from <http://masteringmetrics.com/wp-content/uploads/2015/01/Data.zip>), and adjust a few of the columns to account for differences in
how Stata and R store data.
```{r NHIS2009}
data("NHIS2009", package = "masteringmetrics")
```
Remove missing values.
```{r NHIS2009_remove_missing}
NHIS2009 <- NHIS2009 %>%
filter(marradult, perweight != 0) %>%
group_by(serial) %>%
mutate(hi_hsb = mean(hi_hsb1, na.rm = TRUE)) %>%
filter(!is.na(hi_hsb), !is.na(hi)) %>%
mutate(female = sum(fml)) %>%
filter(female == 1) %>%
select(-female)
```
For the sample only include married adults between 26 and 59 in age, and remove single person households.
```{r NHIS2009_sample}
NHIS2009 <- NHIS2009 %>%
filter(between(age, 26, 59),
marradult, adltempl >= 1)
```
Keep only single family households.
```{r NHIS2009_single_family}
NHIS2009 <- NHIS2009 %>%
group_by(serial) %>%
filter(length(serial) > 1L) %>%
ungroup()
```
Tables of wives and husbands by health insurance. status.
The weighting following the "analytic" weights in the original `.do` file which weights observations by `perweight` and normalizes the weights so that the sub-samples of males and females have the same number as the original sample.
```{r}
NHIS2009 %>%
group_by(fml) %>%
# normalize person weights to match number of observations in each
# group
mutate(perweight = perweight / sum(perweight) * n()) %>%
group_by(fml, hi) %>%
summarise(n_wt = sum(perweight)) %>%
group_by(fml) %>%
mutate(prop = n_wt / sum(n_wt))
```
Compare sample statistics of mean and women, with and without health insurance.
```{r NHIS2009_diff}
varlist <- c("hlth", "nwhite", "age", "yedu", "famsize", "empl", "inc")
NHIS2009_diff <- NHIS2009 %>%
# rlang::set_attrs with NULL removes attributes from columns.
# this avoids a warning from gather about differing attributes
map_dfc(~ rlang::set_attrs(.x, NULL)) %>%
select(fml, hi, one_of(varlist)) %>%
gather(variable, value, -fml, -hi) %>%
group_by(fml, hi, variable) %>%
summarise(mean = mean(value, na.rm = TRUE), sd = sd(value, na.rm = TRUE)) %>%
gather(stat, value, -fml, -hi, -variable) %>%
unite(stat_hi, stat, hi) %>%
spread(stat_hi, value) %>%
mutate(diff = mean_1 - mean_0)
```
```{r NHIS2009_diff_table,results='asis'}
knitr::kable(NHIS2009_diff, digits = 3)
```
## References
- <http://masteringmetrics.com/wp-content/uploads/2014/12/ReadMe_NHIS.txt>
- <http://masteringmetrics.com/wp-content/uploads/2015/01/NHIS2009_hicompare.do>