
Commit 800de19

docs: draft for JOSS
1 parent 36dd0f7 commit 800de19


2 files changed (+120, -0 lines)


paper.bib

Lines changed: 49 additions & 0 deletions
@misc{byambadalai2024estimatingdistributionaltreatmenteffects,
  title         = {Estimating Distributional Treatment Effects in Randomized Experiments: Machine Learning for Variance Reduction},
  author        = {Undral Byambadalai and Tatsushi Oka and Shota Yasui},
  year          = {2024},
  eprint        = {2407.16037},
  archivePrefix = {arXiv},
  primaryClass  = {econ.EM},
  url           = {https://arxiv.org/abs/2407.16037}
}

@book{fisher1935design,
  title     = {The Design of Experiments},
  author    = {Fisher, Ronald A.},
  year      = {1935},
  publisher = {Oliver and Boyd}
}

@article{2020NumPy-Array,
  author  = {Harris, Charles R. and Millman, K. Jarrod and
             van der Walt, Stéfan J. and Gommers, Ralf and
             Virtanen, Pauli and Cournapeau, David and
             Wieser, Eric and Taylor, Julian and Berg, Sebastian and
             Smith, Nathaniel J. and Kern, Robert and Picus, Matti and
             Hoyer, Stephan and van Kerkwijk, Marten H. and
             Brett, Matthew and Haldane, Allan and
             Fernández del Río, Jaime and Wiebe, Mark and
             Peterson, Pearu and Gérard-Marchant, Pierre and
             Sheppard, Kevin and Reddy, Tyler and Weckesser, Warren and
             Abbasi, Hameer and Gohlke, Christoph and
             Oliphant, Travis E.},
  title   = {Array programming with {NumPy}},
  journal = {Nature},
  year    = {2020},
  volume  = {585},
  pages   = {357--362},
  doi     = {10.1038/s41586-020-2649-2}
}

@article{scikit-learn,
  title   = {Scikit-learn: Machine Learning in {P}ython},
  author  = {Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
             and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
             and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
             Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
  journal = {Journal of Machine Learning Research},
  volume  = {12},
  pages   = {2825--2830},
  year    = {2011}
}

paper.md

Lines changed: 71 additions & 0 deletions
---
title: 'dte_adj: A Python package for Distributional Treatment Effects'
tags:
  - Python
  - Distributional Treatment Effects
  - Variance Reduction
authors:
  - name: Tomu Hirata
    orcid: 0009-0006-3140-291X
    equal-contrib: true
    affiliation: "1, 3"
  - name: Undral Byambadalai
    corresponding: true
    affiliation: 1
  - name: Tatsushi Oka
    corresponding: true
    affiliation: "1, 2"
  - name: Shota Yasui
    corresponding: true
    affiliation: 1
affiliations:
  - name: CyberAgent, Inc., Japan
    index: 1
  - name: Keio University, Japan
    index: 2
  - name: Indeed Technologies Japan, Japan
    index: 3
date: 9 August 2024
bibliography: paper.bib

# Optional fields if submitting to a AAS journal too, see this blog post:
# https://blog.joss.theoj.org/2018/12/a-new-collaboration-with-aas-publishing
aas-doi: 10.3847/xxxxx
aas-journal: International Conference on Machine Learning
---

# Summary

`dte_adj` is a Python package for computing empirical cumulative distribution functions (CDFs) and distributional treatment effects (DTEs) from data obtained in randomized controlled trials. The package also implements a novel method, proposed by @byambadalai2024estimatingdistributionaltreatmenteffects, that reduces the variance of DTE estimates by using pre-treatment covariates.

# Statement of need

Since the groundbreaking work of @fisher1935design, randomized experiments have been essential for understanding the impact of interventions and for shaping policy decisions. A widely used metric in this context is the average treatment effect (ATE). However, examining distributional treatment effects often offers a more nuanced understanding than focusing solely on average effects.

Python has become widely used in the research community because of its flexibility and ease of use. However, no widely adopted Python library exists for computing distributional treatment effects from randomized experiments. While SciPy provides a function for computing the empirical cumulative distribution function, it lacks convenient functions for calculating DTEs or for estimating their variance.

`dte_adj` was developed to fill this gap by offering functionality for (1) computing CDFs from data, (2) calculating DTEs and their confidence bands from the estimated CDFs, and (3) visualizing DTEs. Its methods take NumPy arrays [@2020NumPy-Array], the standard for array computation in Python, as input and return them as output. The main classes also follow the interface of the popular library scikit-learn [@scikit-learn], which makes the package easy to pick up for users with machine learning development experience.

# Functionalities

The high-level functionality of `dte_adj` comprises:

1. Computing CDFs and their variance from numeric arrays
2. Calculating distributional parameters and their confidence bands
3. Visualizing distributional parameters and their confidence bands
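
The first item can be illustrated with a minimal NumPy sketch. It is not the `dte_adj` implementation: the function name `empirical_cdf` is hypothetical, and the sketch assumes i.i.d. outcomes, using the textbook plug-in variance $\hat F(y)\{1-\hat F(y)\}/n$ together with a normal-approximation pointwise band.

```python
from statistics import NormalDist

import numpy as np


def empirical_cdf(outcomes, locations, alpha=0.05):
    """Hypothetical helper (not part of dte_adj).

    For each location y, returns:
      cdf           F_hat(y) = (1/n) * sum_i 1{Y_i <= y}
      variance      plug-in estimate F_hat(y) * (1 - F_hat(y)) / n
      lower, upper  pointwise normal-approximation (1 - alpha) band, clipped to [0, 1]
    """
    outcomes = np.asarray(outcomes, dtype=float)
    locations = np.atleast_1d(np.asarray(locations, dtype=float))
    n = outcomes.size
    # Boolean matrix of shape (n_locations, n): entry (j, i) is 1{Y_i <= y_j}.
    indicators = outcomes[None, :] <= locations[:, None]
    cdf = indicators.mean(axis=1)
    variance = cdf * (1.0 - cdf) / n
    # Two-sided standard normal critical value (about 1.96 for alpha = 0.05).
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * np.sqrt(variance)
    lower = np.clip(cdf - half_width, 0.0, 1.0)
    upper = np.clip(cdf + half_width, 0.0, 1.0)
    return cdf, variance, lower, upper
```

For example, with outcomes `[1, 2, 2, 3, 5]` evaluated at locations `[2, 4]`, the estimated CDF is `[0.6, 0.8]` and the plug-in variance is `[0.048, 0.032]`. A pointwise band like this one does not guarantee simultaneous coverage over all locations; a uniform band needs a larger critical value.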

It currently offers two classes for computing CDFs and their variance:

- `SimpleDistributionEstimator`: computes the standard empirical CDF.
- `AdjustedDistributionEstimator`: computes a CDF estimate with smaller variance by adjusting for pre-treatment covariates, following the method proposed by @byambadalai2024estimatingdistributionaltreatmenteffects.
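
To convey the idea behind covariate adjustment, the sketch below implements a generic cross-fitted, regression-adjusted (augmented) CDF estimator for one treatment arm. It is an assumption-based stand-in, not the estimator in `AdjustedDistributionEstimator`: the function `adjusted_cdf` is hypothetical, and it models $P(Y \le y \mid X)$ with an ordinary-least-squares linear probability model, whereas the referenced method allows flexible machine learning models. Because treatment is randomized, the residual correction keeps the estimate consistent even when this working model is misspecified; the quality of the model mainly affects precision.

```python
import numpy as np


def adjusted_cdf(covariates, treatment_arms, outcomes, arm, locations,
                 n_folds=2, seed=0):
    """Hypothetical sketch (not the dte_adj implementation).

    For each location y it returns
        F_hat(y) = mean_{i in arm}(1{Y_i <= y} - mu_i(y)) + mean_{all i}(mu_i(y)),
    where mu_i(y) predicts P(Y <= y | X = X_i, W = arm) with a linear
    probability model fitted by OLS on arm units outside unit i's fold.
    """
    X = np.asarray(covariates, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D array as a single covariate
    W = np.asarray(treatment_arms)
    Y = np.asarray(outcomes, dtype=float)
    locations = np.atleast_1d(np.asarray(locations, dtype=float))
    n = Y.size
    design = np.column_stack([np.ones(n), X])  # intercept plus covariates
    in_arm = W == arm
    # Randomly assign every unit to one of n_folds folds for cross-fitting.
    rng = np.random.default_rng(seed)
    folds = rng.integers(0, n_folds, size=n)

    estimates = np.empty(locations.size)
    for j, y in enumerate(locations):
        target = (Y <= y).astype(float)  # indicator 1{Y_i <= y}
        mu = np.empty(n)
        for k in range(n_folds):
            # Fit on arm units outside fold k, predict for every unit in fold k.
            train = in_arm & (folds != k)
            held_out = folds == k
            coef, *_ = np.linalg.lstsq(design[train], target[train], rcond=None)
            mu[held_out] = design[held_out] @ coef
        # Arm-level residual mean plus full-sample mean of the predictions.
        estimates[j] = (target[in_arm] - mu[in_arm]).mean() + mu.mean()
    return estimates
```

In a simulation where $Y = X + W + \varepsilon$ with independent standard normal $X$ and $\varepsilon$ and a randomized binary $W$, the true value of $F_{Y(1)}(1)$ is 0.5, and `adjusted_cdf(x, w, y, arm=1, locations=[1.0])` should return a value close to it.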

Both classes implement the following methods for calculating distributional parameters:

- `predict_dte`: computes the distributional treatment effect (DTE) $DTE_{w, w'}(y) := F_{Y(w)}(y) - F_{Y(w')}(y)$, where $w$ and $w'$ are the two treatment arms being compared, $y$ is a value (location) of the outcome variable, and $F_{Y(w)}(y)$ is the cumulative distribution function of the potential outcome $Y(w)$ under treatment arm $w$, evaluated at $y$.
- `predict_pte`: computes the probability treatment effect (PTE) $PTE_{w, w'}(y, h) := \left( F_{Y(w)}(y+h) - F_{Y(w)}(y) \right) - \left( F_{Y(w')}(y+h) - F_{Y(w')}(y) \right)$, where $h > 0$ is the width of each evaluation interval.
- `predict_qte`: computes the quantile treatment effect (QTE) $QTE_{w, w'}(\tau) := F_{Y(w)}^{-1}(\tau) - F_{Y(w')}^{-1}(\tau)$, where $\tau \in (0, 1)$ is the quantile level and $F_{Y(w)}^{-1}$ is the quantile function of $Y(w)$.
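
The three estimands above can be computed from unadjusted empirical CDFs as in the following NumPy sketch. The helper names `dte`, `pte`, and `qte` are hypothetical and do not reproduce the `dte_adj` method signatures; the DTE standard error treats the two arms as independent samples, which is an assumption of this sketch.

```python
import numpy as np


def _ecdf(sample, points):
    """Empirical CDF of `sample` evaluated at each entry of `points`."""
    sample = np.sort(np.asarray(sample, dtype=float))
    # searchsorted(side="right") counts the observations <= each point.
    return np.searchsorted(sample, points, side="right") / sample.size


def dte(treated, control, locations):
    """Hypothetical helper: DTE(y) = F_treated(y) - F_control(y), plus a
    plug-in standard error assuming two independent samples."""
    locations = np.asarray(locations, dtype=float)
    f1, f0 = _ecdf(treated, locations), _ecdf(control, locations)
    se = np.sqrt(f1 * (1 - f1) / len(treated) + f0 * (1 - f0) / len(control))
    return f1 - f0, se


def pte(treated, control, locations, h):
    """Hypothetical helper: PTE(y, h), the difference in probability mass on (y, y + h]."""
    locations = np.asarray(locations, dtype=float)
    mass1 = _ecdf(treated, locations + h) - _ecdf(treated, locations)
    mass0 = _ecdf(control, locations + h) - _ecdf(control, locations)
    return mass1 - mass0


def qte(treated, control, quantiles):
    """Hypothetical helper: QTE(tau) = F_treated^{-1}(tau) - F_control^{-1}(tau).

    method="inverted_cdf" (NumPy >= 1.22) matches inf{y : F(y) >= tau}.
    """
    q1 = np.quantile(treated, quantiles, method="inverted_cdf")
    q0 = np.quantile(control, quantiles, method="inverted_cdf")
    return q1 - q0
```

For example, with `treated = [2, 3, 4, 5]` and `control = [1, 2, 3, 4]`, `dte(treated, control, [2.0, 3.0])` returns effects of `[-0.25, -0.25]` and `qte(treated, control, [0.5])` returns `[1.0]`. A pointwise 95% band for the DTE is then `effect ± 1.96 * se`.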

Lastly, the `dte_adj.plot` module can be used to visualize the distributional parameters. Examples of these visualizations are shown in the figures below.

![Example visualization of the DTE.](docs/source/_static/dte_moment.png)

![Example visualization of the PTE.](docs/source/_static/pte_simple.png)

![Example visualization of the QTE.](docs/source/_static/qte.png)

# Acknowledgements

# References
