docs: draft for JOSS

TomeHirata · TomeHirata · commit 800de19adbf9 · 2024-08-09T15:28:57.000+09:00
diff --git a/paper.bib b/paper.bib
@@ -0,0 +1,49 @@
+@misc{byambadalai2024estimatingdistributionaltreatmenteffects,
+      title={Estimating Distributional Treatment Effects in Randomized Experiments: Machine Learning for Variance Reduction}, 
+      author={Undral Byambadalai and Tatsushi Oka and Shota Yasui},
+      year={2024},
+      eprint={2407.16037},
+      archivePrefix={arXiv},
+      primaryClass={econ.EM},
+      url={https://arxiv.org/abs/2407.16037}, 
+}
+
+@book{fisher1935design,
+  title={The Design of Experiments},
+  author={Fisher, Ronald A.},
+  year={1935},
+  publisher={Oliver and Boyd}
+}
+
+@ARTICLE{2020NumPy-Array,
+  author  = {Harris, Charles R. and Millman, K. Jarrod and
+            van der Walt, Stéfan J and Gommers, Ralf and
+            Virtanen, Pauli and Cournapeau, David and
+            Wieser, Eric and Taylor, Julian and Berg, Sebastian and
+            Smith, Nathaniel J. and Kern, Robert and Picus, Matti and
+            Hoyer, Stephan and van Kerkwijk, Marten H. and
+            Brett, Matthew and Haldane, Allan and
+            Fernández del Río, Jaime and Wiebe, Mark and
+            Peterson, Pearu and Gérard-Marchant, Pierre and
+            Sheppard, Kevin and Reddy, Tyler and Weckesser, Warren and
+            Abbasi, Hameer and Gohlke, Christoph and
+            Oliphant, Travis E.},
+  title   = {Array programming with {NumPy}},
+  journal = {Nature},
+  year    = {2020},
+  volume  = {585},
+  pages   = {357--362},
+  doi     = {10.1038/s41586-020-2649-2}
+}
+
+@article{scikit-learn,
+  title={Scikit-learn: Machine Learning in {P}ython},
+  author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
+          and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
+          and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
+          Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
+  journal={Journal of Machine Learning Research},
+  volume={12},
+  pages={2825--2830},
+  year={2011}
+}
diff --git a/paper.md b/paper.md
@@ -0,0 +1,71 @@
+---
+title: 'dte_adj: A Python package for Distributional Treatment Effects'
+tags:
+  - Python
+  - Distributional Treatment Effects
+  - Variance Reduction
+authors:
+  - name: Tomu Hirata
+    orcid: 0009-0006-3140-291X
+    equal-contrib: true
+    affiliation: "1, 3"
+  - name: Undral Byambadalai 
+    corresponding: true
+    affiliation: 1
+  - name: Tatsushi Oka
+    corresponding: true
+    affiliation: "1, 2"
+  - name: Shota Yasui 
+    corresponding: true
+    affiliation: 1
+affiliations:
+ - name: CyberAgent, Inc., Japan
+   index: 1
+ - name: Keio University, Japan
+   index: 2
+ - name: Indeed Technologies Japan, Japan
+   index: 3
+date: 9 August 2024
+bibliography: paper.bib
+
+# Optional fields if submitting to an AAS journal too, see this blog post:
+# https://blog.joss.theoj.org/2018/12/a-new-collaboration-with-aas-publishing
+aas-doi: 10.3847/xxxxx 
+aas-journal: International Conference on Machine Learning
+---
+
+# Summary
+
+`dte_adj` is a Python package for computing the empirical cumulative distribution function (CDF) and distributional treatment effects (DTE) from data obtained in randomized controlled trials. The package also implements a method, introduced by @byambadalai2024estimatingdistributionaltreatmenteffects, that reduces the variance of DTE estimates using pre-treatment covariates.
+
+# Statement of need
+
+Since the groundbreaking work of @fisher1935design, randomized experiments have been essential for understanding the impact of interventions and for shaping policy decisions. A widely used metric in this context is the Average Treatment Effect (ATE). However, examining distributional treatment effects often gives a more nuanced picture than the average effect alone.
+Python is widely used in the research community for its flexibility and ease of use. However, there is no popular Python library for computing distributional treatment effects from data obtained in randomized experiments. While SciPy provides a function for computing the empirical cumulative distribution function, it lacks convenient functions for calculating the DTE or for estimating the variance of the estimated distribution.
+`dte_adj` was developed to fill this gap by offering functionality for 1) computing the CDF from data, 2) calculating the DTE and its confidence band from the CDF, and 3) visualizing the DTE. The library's methods take and return `numpy` arrays [@2020NumPy-Array], which are widely used for matrix computation in Python. Its main classes also follow the interface of the popular library `scikit-learn` [@scikit-learn], which makes them easy to adopt for users with machine learning development experience.
+
+# Functionalities
+
+The high-level functionalities of `dte_adj` are as follows:
+1. Computing the CDF and its variance from numeric arrays
+2. Calculating distributional parameters and their confidence bands
+3. Visualizing distributional parameters and their confidence bands
+
+It currently offers two classes for computing the CDF and its variance:
+- `SimpleDistributionEstimator`: computes the standard empirical CDF
+- `AdjustedDistributionEstimator`: computes a CDF estimate with smaller variance by adjusting for pre-treatment covariates, as introduced by @byambadalai2024estimatingdistributionaltreatmenteffects
+
+Both classes implement the following methods for calculating distributional parameters:
+- `predict_dte`: computes the Distributional Treatment Effect (DTE) $DTE_{w, w'}(y) := F_{Y(w)}(y) - F_{Y(w')}(y)$, where $y$ is an outcome level, $w$ and $w'$ are treatment arms, and $F_{Y(w)}(y)$ is the cumulative distribution function of the potential outcome $Y(w)$ evaluated at $y$.
+- `predict_pte`: computes the Probability Treatment Effect (PTE) $PTE_{w, w'}(y, h) := \left( F_{Y(w)}(y+h) - F_{Y(w)}(y) \right) - \left( F_{Y(w')}(y+h) - F_{Y(w')}(y) \right)$, where $h > 0$ is the width of each evaluation interval.
+- `predict_qte`: computes the Quantile Treatment Effect (QTE) $QTE_{w, w'}(\tau) := F_{Y(w)}^{-1}(\tau) - F_{Y(w')}^{-1}(\tau)$, where $\tau$ is the quantile level.
+
+Lastly, the `dte_adj.plot` module can be used to visualize the distributional parameters. Example visualizations are shown in the figures below.
+
+![DTE](docs/source/_static/dte_moment.png)
+![PTE](docs/source/_static/pte_simple.png)
+![QTE](docs/source/_static/qte.png)
+
+# Acknowledgements
+
+# References
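
The three estimands defined under Functionalities in `paper.md` can be sketched in plain NumPy as follows. This is an illustrative sketch of the formulas only, not the `dte_adj` implementation or its API. The helper names `ecdf`, `dte`, `pte`, and `qte` are hypothetical. The quantile is taken as the generalized inverse of the empirical CDF, which is an assumption, since the draft does not say how $F^{-1}$ is computed. No variance or confidence band is estimated here.

```python
import numpy as np


def ecdf(sample, locations):
    """Empirical CDF of `sample` evaluated at each point in `locations`."""
    sample = np.sort(np.asarray(sample, dtype=float))
    # side="right" counts the observations that are <= each location.
    return np.searchsorted(sample, locations, side="right") / sample.size


def dte(y_w, y_w_prime, locations):
    """DTE(y) = F_{Y(w)}(y) - F_{Y(w')}(y), using empirical CDFs."""
    return ecdf(y_w, locations) - ecdf(y_w_prime, locations)


def pte(y_w, y_w_prime, locations, h):
    """PTE(y, h): difference in probability mass on the interval (y, y + h]."""
    locations = np.asarray(locations, dtype=float)

    def mass(sample):
        return ecdf(sample, locations + h) - ecdf(sample, locations)

    return mass(y_w) - mass(y_w_prime)


def empirical_quantile(sample, taus):
    """Generalized inverse of the ECDF: smallest y with F(y) >= tau."""
    sample = np.sort(np.asarray(sample, dtype=float))
    idx = np.ceil(np.asarray(taus, dtype=float) * sample.size).astype(int) - 1
    return sample[np.clip(idx, 0, sample.size - 1)]


def qte(y_w, y_w_prime, taus):
    """QTE(tau) = F_{Y(w)}^{-1}(tau) - F_{Y(w')}^{-1}(tau)."""
    return empirical_quantile(y_w, taus) - empirical_quantile(y_w_prime, taus)


# Toy outcomes for two arms (made-up numbers, for illustration only).
y_treated = np.array([2.0, 3.0, 4.0, 5.0])
y_control = np.array([1.0, 2.0, 3.0, 4.0])

print(dte(y_treated, y_control, locations=[2, 4]))
print(pte(y_treated, y_control, locations=[0, 4], h=1.0))
print(qte(y_treated, y_control, taus=[0.25, 0.5, 1.0]))
```

A negative DTE at $y$ means the arm $w$ is less likely than $w'$ to have an outcome at or below $y$, so its outcome distribution is shifted toward larger values at that point.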