
Predictive Uncertainty with Deep Learning and Count Data

This repository contains the official implementation of "Fully Heteroscedastic Count Regression with Deep Double Poisson Networks".

Important Links

Important figures used in the paper, along with the code that generated them, can be found in this directory.

Our implementations of the various "single forward pass" techniques referenced in the paper, along with our "deep ensembles" implementations, can also be found in this repository.

Getting Started

Install Project Dependencies

conda create --name deep-uncertainty python=3.10
conda activate deep-uncertainty
pip install -r requirements.txt

Install Pre-Commit Hook

To install this repo's pre-commit hook with automatic linting and code quality checks, simply execute the following command:

pre-commit install

When you commit new code, the pre-commit hook will run a series of scripts to standardize formatting, along with a flake8 check that flags Python style violations. Any flake8 warnings must be resolved for the commit to go through. If you need to bypass the linters for a specific commit, add the --no-verify flag to your git commit command.
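
For example, to skip the hooks for a single commit (the commit message here is just a placeholder):

git commit -m "WIP: experiment scaffolding" --no-verify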

Downloading Data

Contact the authors for access to the datasets used in the experiments.

Experiments

Reproducibility

Training configs for each model benchmarked in "Fully Heteroscedastic Count Regression with Deep Double Poisson Networks" can be found in the top-level configs directory.

To re-run the experiments from the paper, first contact the authors for the requisite dataset files and ensure they are saved in a top-level data directory. Then run the following command for any valid dataset:

bash deep_uncertainty/scripts/train_models.sh <dataset-name>

The resultant model weights will be saved to chkp/{dataset-name}. Note that some training config settings may need to be adjusted for your hardware (number of GPUs, GPU memory) to ensure that the effective batch size matches what is reported in the paper.
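
As a rough guide, the effective batch size under distributed training is typically (per-GPU batch size) × (number of GPUs) × (gradient accumulation steps); for example, a per-GPU batch size of 32 on 2 GPUs with 2 accumulation steps yields an effective batch size of 128. Lowering the per-GPU batch size to fit in memory can therefore be offset by raising the gradient accumulation steps.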

Models can then be evaluated via

bash deep_uncertainty/scripts/eval_models.sh <dataset-name> <results-dir>

Ensembles are evaluated via

bash deep_uncertainty/scripts/eval_ensembles.sh <dataset-name>

Results will be saved to {results-dir}/{dataset-name}.
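
For example, assuming the COCO-People dataset is referred to as coco-people by the scripts (check the scripts for the exact dataset names) and results should be written to a top-level results directory:

bash deep_uncertainty/scripts/eval_models.sh coco-people results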

Training models

To train a model, first fill out a config (using this config as a template). Then, from the terminal, run

python deep_uncertainty/training/train_model.py --config path/to/your/config.yaml

Logs / saved model weights will be found at the locations specified in your config.

Training on Tabular Datasets

If fitting a model on tabular data, the training script assumes the dataset is stored locally in a .npz file containing X_train, y_train, X_val, y_val, X_test, and y_test arrays (for reference, these files are produced automatically by our synthetic data generating code). Pass the path to this .npz file in the dataset spec key of the config, and also ensure that the dataset type is set to tabular and that the dataset input_dim key is properly specified.
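
For illustration, a file in the expected format can be produced with numpy.savez; the array names below are the split keys listed above, while the file name, shapes, and synthetic values are just placeholders:

import numpy as np

# Placeholder data: any (n_samples, n_features) X and (n_samples,) count-valued y will do.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = rng.poisson(lam=3.0, size=1000)

# 70/15/15 split; the training script expects these array names.
np.savez(
    "my_tabular_dataset.npz",
    X_train=X[:700], y_train=y[:700],
    X_val=X[700:850], y_val=y[700:850],
    X_test=X[850:], y_test=y[850:],
)

The path to the resulting file is what goes in the dataset spec key of your config.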

Training on Image Datasets

The currently-supported image datasets for training models are:

  • MNIST (regressing the digit labels rather than classifying them)
  • COCO-People (All images in COCO containing people, labeled with the count of "person" annotations)

To train a model on either of these datasets, specify "image" for the dataset type key in the config, then set dataset spec to the requisite dataset name (see the options in the ImageDatasetName class here).
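
As a very rough sketch (the key names, nesting, and "mnist" value below are guesses rather than the real schema; copy the actual structure from the linked config template), the dataset-related portion of a config for MNIST might be generated like so:

import yaml  # PyYAML

# Hypothetical key names for illustration only -- mirror the real template instead.
dataset_section = {"dataset": {"type": "image", "spec": "mnist"}}
print(yaml.safe_dump(dataset_section, sort_keys=False))

The same pattern applies to the text datasets described below, with "text" as the type and the requisite dataset name as the spec.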

Training on Text Datasets

The currently-supported text datasets for training models are:

  • Amazon Reviews (we predict each review's rating, 1-5 stars, from its associated text)

To train a model on this dataset, specify "text" for the dataset type key in the config, then set dataset spec to the requisite dataset name (see the options in the TextDatasetName class here).

Evaluating Models

Individual Models

To obtain evaluation metrics for a given model (and have them save to its log directory), use the following command:

python deep_uncertainty/evaluation/eval_model.py \
--log-dir path/to/training/log/dir \
--chkp-path path/to/model.ckpt

Ensembles

Sometimes, we may wish to evaluate an ensemble of models. To do this, first fill out a config using this file as a template. Then run:

python deep_uncertainty/evaluation/eval_ensemble.py --config path/to/config.yaml

Adding New Models

All regression models should inherit from the DiscreteRegressionNN class (found here). This base class is a PyTorch Lightning module, which abstracts away much of the typical neural network training boilerplate. Beyond setting a few class attributes like loss_fn when calling the super-initializer, the only methods you need to write for a new model are:

  • _forward_impl (defines a forward pass through the network)
  • _predict_impl (defines how to make predictions with the network, including any transformations on the output of the forward pass)
  • _point_prediction (defines how to interpret network output as a single point prediction for a regression target)
  • _addl_test_metrics_dict (defines any metrics beyond rmse/mae that are computed during model evaluation)
  • _update_addl_test_metrics_batch (defines how to update additional metrics beyond rmse/mae for each test batch).

See existing model classes like GaussianNN (found here) for an example of these steps.
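
For orientation, a bare-bones new model might look roughly like the sketch below. The method names are those listed above, but the constructor arguments, method signatures, import path, and the PoissonNN example itself are illustrative guesses rather than the real interface; consult DiscreteRegressionNN and GaussianNN for the actual contracts.

import torch
from torch import nn

# The import path below is a guess; use wherever DiscreteRegressionNN actually lives.
from deep_uncertainty.models import DiscreteRegressionNN


class PoissonNN(DiscreteRegressionNN):
    """Toy example: predict a single Poisson rate per input."""

    def __init__(self, backbone: nn.Module, **kwargs):
        # The real base class may expect different constructor arguments.
        super().__init__(loss_fn=nn.PoissonNLLLoss(log_input=True), **kwargs)
        self.backbone = backbone
        self.head = nn.LazyLinear(1)

    def _forward_impl(self, x: torch.Tensor) -> torch.Tensor:
        # Forward pass through the network: backbone features -> log(rate).
        return self.head(self.backbone(x))

    def _predict_impl(self, x: torch.Tensor) -> torch.Tensor:
        # Transform raw network output (log rate) into the predicted Poisson rate.
        return torch.exp(self._forward_impl(x))

    def _point_prediction(self, y_hat: torch.Tensor, training: bool) -> torch.Tensor:
        # Interpret the predicted rate as a single (rounded) point prediction.
        return torch.round(y_hat)

    def _addl_test_metrics_dict(self) -> dict:
        # No metrics beyond RMSE/MAE in this sketch.
        return {}

    def _update_addl_test_metrics_batch(self, x: torch.Tensor, y_hat: torch.Tensor, y: torch.Tensor):
        # Nothing to accumulate, since no additional metrics are tracked.
        pass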
