---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-"
)
```
# <span style="color: #425682"><b>seine</b></span>: Semiparametric Ecological Inference <a href="https://corymccartan.com/seine/"><img src="man/figures/logo.svg" align="right" height="144" /></a>
<!-- badges: start -->
[R-CMD-check](https://github.com/CoryMcCartan/seine/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
Ecological inference (EI) is the statistical problem of learning individual-level
associations from aggregate-level data.
Without certain *identifying assumptions* and proper *estimation methods*,
researchers can easily draw incorrect conclusions from aggregate data: finding
patterns where none exist, missing important individual-level patterns, or
even concluding that an effect runs in the opposite direction from the true one.
This is known as the *ecological fallacy*.
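The sign-reversal case can be illustrated with a small base-R simulation. This is only a sketch on made-up data: the precinct shares, the group rates, and all variable names are hypothetical, and the naive estimator shown (a plain regression of the aggregate rate on group share, often called Goodman's regression) is not what **seine** does.

```r
# Hypothetical precinct-level data: x is group A's share of each precinct.
x <- seq(0.1, 0.9, length.out = 50)

# Assumed individual-level truth: within every precinct, group A's rate is
# 10 points higher than group B's, but both rates fall as x rises.
rate_b <- 0.7 - 0.4 * x
rate_a <- rate_b + 0.1

# Only the aggregate rate is observed (equal-sized precincts assumed).
y <- x * rate_a + (1 - x) * rate_b

# True overall group rates, weighting each precinct by the group's size.
true_a <- sum(x * rate_a) / sum(x)
true_b <- sum((1 - x) * rate_b) / sum(1 - x)

# Naive ecological regression of y on x: the intercept is read as
# group B's rate, and intercept + slope as group A's rate.
fit <- lm(y ~ x)
naive_b <- unname(coef(fit)[1])
naive_a <- unname(coef(fit)[1] + coef(fit)[2])

round(c(true_a = true_a, naive_a = naive_a,
        true_b = true_b, naive_b = naive_b), 3)
```

In this constructed example the naive regression places group A well below group B, even though group A's rate is higher in every single precinct. The reversal arises because both groups' rates vary with the precinct's composition, which the naive regression assumes away.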
The **seine** package allows researchers to perform modern ecological inference
quickly, accurately, and transparently.
- **Double/debiased machine learning** lets researchers control for confounding
  covariates, which makes the identifying assumptions more plausible.
  Machine learning can be used to estimate the key regression model, avoiding
  the strong parametric assumptions made by existing EI methods.
- **Sensitivity analysis** and **benchmarking** let researchers understand how
violations of their assumptions will affect results.
- A **tidy interface** makes the package modular and easy to use, and works well
with pipe-based workflows.
- **Minimal dependencies** and efficient estimation routines keep everything
**fast** and lightweight.
## Installation
You can install the development version of **seine** with
``` r
# install.packages("remotes")  # if remotes is not already installed
remotes::install_github("CoryMcCartan/seine")
```
## Example
The package contains county-level data from the 1968 U.S. presidential election
for southern states, where Hubert Humphrey (Democrat) ran against Richard Nixon
(Republican) and George Wallace (independent; segregationist).
A natural question for these data is how vote choice varied by race.
To do so, **seine** provides both a formula interface and a tidy interface
through a new `ei_spec()` object. Here, we'll demonstrate the latter.
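```{r}
library(seine)

data(elec_1968)

# Describe the EI problem: racial group columns as predictors, candidate
# vote columns as outcomes, and county-level covariates to adjust for.
spec = ei_spec(
    elec_1968,
    predictors = vap_white:vap_other,
    outcome = pres_dem_hum:pres_ind_wal,
    total = pres_total,
    covariates = c(state, pop_city:pop_rural, farm:educ_coll,
                   inc_00_03k:inc_25_99k)
)

# Fit the ridge regression, then the Riesz representer with the same penalty.
m = ei_ridge(spec)
rr = ei_riesz(spec, penalty = m$penalty)

# Combine both fits into estimates with 95% confidence intervals.
ei_est(regr = m, riesz = rr, data = spec, conf_level = 0.95)
```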
This workflow is explained in more detail on the [Get Started](https://corymccartan.com/seine/articles/seine.html)
page, along with demonstrations of data preprocessing and sensitivity analysis.
## Name
In addition to being an acronym for "semiparametric ecological inference,"
**seine** (pronounced SAYN) refers to a type of fishing net, which is used to
catch fish like tuna in aggregate. A net also visually resembles the
tomography lines that are commonly used to visualize the interaction of the
latent data and the accounting identity in 2×2 ecological inference.
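
For readers who have not met tomography lines before, here is a brief sketch in the standard 2×2 setup. The notation is an assumption made for illustration and is not taken from the package. In unit $i$, let $X_i$ be the observed share of group 1, $Y_i$ the observed aggregate outcome rate, and $\beta_{1i}$, $\beta_{2i}$ the unobserved outcome rates within groups 1 and 2. The accounting identity is

$$
Y_i = \beta_{1i} X_i + \beta_{2i} (1 - X_i).
$$

Solving for $\beta_{2i}$ (when $X_i < 1$) gives

$$
\beta_{2i} = \frac{Y_i}{1 - X_i} - \frac{X_i}{1 - X_i}\,\beta_{1i},
$$

so each unit's data restrict $(\beta_{1i}, \beta_{2i})$ to a line segment inside the unit square $[0, 1]^2$. Plotting these segments for many units produces the criss-crossing, net-like picture.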