Add probabilistic extrapolation model for classification accuracy #503
Pull Request: Accuracy Extrapolation Module for PyHealth
Contributor Information
Contribution Type
Dataset Performance Extrapolation Module
Description
This pull request adds a new module to PyHealth that lets users predict how well a model will perform (accuracy, AUROC, etc.) when trained on a larger dataset, using results measured on smaller pilot datasets. The implementation builds on the APEx-GP approach from "A Probabilistic Method to Predict Classifier Accuracy on Larger Datasets given Small Pilot Data" with two significant improvements:
The module is particularly valuable for healthcare ML applications where data collection is expensive and time-consuming, as it helps researchers make informed decisions about whether collecting more data is likely to significantly improve model performance.
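The description does not show the module's API. To make the general idea concrete, here is a minimal NumPy-only sketch. It fits a learning-curve mean of the form a - b * n**(-c) to pilot scores, then models the remaining residuals with a Gaussian process over log sample size, which yields a predicted score and an uncertainty at larger sizes. The function names `power_law_mean_fit` and `extrapolate`, the exponent grid, and the fixed kernel hyperparameters are all hypothetical illustrations. They are not the API of pyhealth/metrics/extrapolation.py, and the sketch does not reproduce APEx-GP or the improvements described above.

```python
import numpy as np


def power_law_mean_fit(sizes, scores, exponents=np.linspace(0.05, 2.0, 40)):
    """Fit score(n) ~= a - b * n**(-c) (hypothetical helper).

    For each candidate exponent c, (a, b) are found by linear least squares;
    the c with the smallest squared error is kept. Returns (a, b, c).
    """
    sizes = np.asarray(sizes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    best = None
    for c in exponents:
        design = np.column_stack([np.ones_like(sizes), -sizes ** (-c)])
        coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
        sse = float(np.sum((design @ coef - scores) ** 2))
        if best is None or sse < best[0]:
            best = (sse, coef[0], coef[1], c)
    _, a, b, c = best
    return a, b, c


def rbf_kernel(x1, x2, length_scale, signal_var):
    """Squared-exponential covariance between two 1-D input arrays."""
    d = x1[:, None] - x2[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)


def extrapolate(pilot_sizes, pilot_scores, target_sizes,
                length_scale=1.0, signal_var=1e-4, noise_var=1e-4):
    """Predict mean score and latent std at target_sizes (hypothetical API).

    Hyperparameters are fixed here; a fuller implementation would fit them,
    e.g. by maximizing the GP marginal likelihood.
    """
    pilot_sizes = np.asarray(pilot_sizes, dtype=float)
    pilot_scores = np.asarray(pilot_scores, dtype=float)
    target_sizes = np.asarray(target_sizes, dtype=float)

    # 1) Parametric learning-curve mean fitted to the pilot points.
    a, b, c = power_law_mean_fit(pilot_sizes, pilot_scores)

    def mean_fn(n):
        return a - b * n ** (-c)

    # 2) GP on the residuals, with inputs on a log-size scale.
    x, xs = np.log(pilot_sizes), np.log(target_sizes)
    resid = pilot_scores - mean_fn(pilot_sizes)
    K = rbf_kernel(x, x, length_scale, signal_var) + noise_var * np.eye(len(x))
    Ks = rbf_kernel(x, xs, length_scale, signal_var)

    # 3) Standard GP posterior via a Cholesky factorization of K.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, resid))
    mu = mean_fn(target_sizes) + Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = signal_var - np.sum(v ** 2, axis=0)
    return mu, np.sqrt(np.maximum(var, 0.0))


if __name__ == "__main__":
    # Synthetic pilot curve for illustration only (not real results).
    sizes = np.array([100, 200, 400, 800])
    scores = 0.9 - 0.5 * sizes ** -0.5
    mu, std = extrapolate(sizes, scores, [1600, 6400])
    print(mu, std)
```

Because the GP works on log sample size, the predicted standard deviation grows as the target moves further beyond the largest pilot size. This matches the practical question of whether a much larger collection effort is worth it: far extrapolations come with wider uncertainty.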
Files Overview
Core Implementation:
pyhealth/metrics/extrapolation.py: Main module implementing GP-based performance extrapolation
pyhealth/metrics/__init__.py: Updated to include the new module exports
pyhealth/utils.py: Added tensor_to_numpy helper function
Examples & Documentation:
pyhealth/metrics/README_EXTRAPOLATION.md: Detailed module documentation
PyHealth/examples/accuracy_extrapolation_example.py: Example usage script
Tests:
pyhealth/unittests/test_extrapolation.py: Unit tests for the module
Dependencies: