Add ECG Processing Support to SHHS dataset for Sleep Signal Analysis #636
Students: Soumya Mazumder (soumyam4), Salazar, Andrew (aas15), Lin, Sharon (xinyiyl2)
Paper Title: WatchSleepNet: A Novel Model and Pretraining Approach for Advancing Sleep Staging with Smartwatches
Paper Link:
Overview
This PR implements support for the Sleep Heart Health Study (SHHS) dataset in PyHealth, enabling researchers to work with polysomnographic signals including EEG and ECG data for sleep-related cardiovascular and neurological research.
Changes
An overview of the SHHS data can be found here - Sleep Heart Health Study (SHHS)
This PR enhances the existing SHHS dataset support in shhs.py by adding a new feature that extracts ECG signals from EDF files.
1. Modification of file shhs.py
A. Fix the function process_EEG_data() - The function was not working because its class inherits from BaseSignalDataset, which is deprecated in recent versions. It has been modified so that it works again.
B. Add new function process_ECG_data() - A new function that extracts ECG signals from the EDF files. Its parameters are shown in the usage example below.
Usage Example
```python
from pyhealth.datasets import SHHSDataset

# Initialize dataset
dataset = SHHSDataset(
    root="/path/to/SHHS/",
    dev=True,  # Development mode for faster testing
    refresh_cache=False,  # Use existing cache
)

# Process EEG data for sleep analysis
eeg_data = dataset.process_EEG_data()
print(f"Processed {len(eeg_data)} patients")

# Process ECG data with flexible annotation handling
success = dataset.process_ECG_data(
    out_dir="/output/path/",
    require_annotations=False,  # Handle missing annotations gracefully
    select_chs=["ECG"],
    target_fs=100,
)
print(f"ECG processing successful: {success}")
```
2. Modification of file utils.py
This file contains the utility functions required for processing different datasets. This PR adds two new functions:
read_edf_data() - reads the polysomnography signals from an EDF file.
Parameters:
data_path: path to EDF file.
label_path: SHHS XML annotation file.
dataset: "SHHS" or "MESA".
select_chs: list of channels to extract.
target_fs: optional downsample frequency.
Returns:
data: (T, C) extracted channel signals.
fs: sampling frequency.
stages: stage array aligned with signal.
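The "stage array aligned with signal" return value can be illustrated with a minimal sketch. It assumes the annotations have already been reduced to one label per 30-second epoch and that unscored samples are marked with -1; both are assumptions made here for illustration, and the helper name `expand_stages` is hypothetical rather than part of this PR.

```python
import numpy as np

EPOCH_SECONDS = 30  # assumption: scoring epochs are 30 seconds long


def expand_stages(epoch_stages, fs, n_samples, fill_value=-1):
    """Return one stage label per signal sample.

    epoch_stages: sequence with one integer label per epoch.
    fs: sampling frequency of the signal in Hz.
    n_samples: number of samples T in the (T, C) signal array.
    fill_value: label used for samples not covered by any epoch (hypothetical choice).
    """
    samples_per_epoch = int(round(fs * EPOCH_SECONDS))
    # Repeat each epoch label once for every sample inside that epoch.
    per_sample = np.repeat(np.asarray(epoch_stages, dtype=np.int64), samples_per_epoch)
    if len(per_sample) >= n_samples:
        # Annotations run past the end of the recording: truncate.
        return per_sample[:n_samples]
    # Recording runs past the annotations: pad the tail with fill_value.
    pad = np.full(n_samples - len(per_sample), fill_value, dtype=np.int64)
    return np.concatenate([per_sample, pad])
```

Padding instead of dropping the unscored tail keeps the label array the same length as the signal, so both can be indexed together; whether the PR's function pads or trims is not stated in this description.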
save_to_npz() - saves the extracted ECG/PPG/sleep staging data to an NPZ file.
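As a rough illustration of what saving to NPZ involves, the sketch below writes a signal array, its sampling frequency, and the per-sample stages with NumPy and reads them back. The function names `save_to_npz_sketch` and `load_npz_sketch` and the key names `data`, `fs`, and `stages` are assumptions chosen to mirror the return values listed above; the actual signature of save_to_npz in this PR may differ.

```python
import os
import tempfile

import numpy as np


def save_to_npz_sketch(out_path, data, fs, stages):
    """Write the signal, sampling frequency and stages to a compressed NPZ file."""
    data = np.asarray(data, dtype=np.float32)
    stages = np.asarray(stages)
    if data.shape[0] != stages.shape[0]:
        raise ValueError("data and stages must have the same number of samples")
    # Key names are hypothetical; pick whatever the downstream loader expects.
    np.savez_compressed(out_path, data=data, fs=np.asarray(fs), stages=stages)


def load_npz_sketch(path):
    """Read back the three arrays written by save_to_npz_sketch."""
    with np.load(path) as npz:
        return npz["data"], float(npz["fs"]), npz["stages"]


# Round trip on a tiny synthetic recording: 6 samples, 2 channels.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo_record.npz")
demo_signal = np.arange(12, dtype=np.float32).reshape(6, 2)
demo_stages = np.array([0, 0, 1, 1, 2, 2])
save_to_npz_sketch(demo_path, demo_signal, 100, demo_stages)
loaded_signal, loaded_fs, loaded_stages = load_npz_sketch(demo_path)
```

Checking the lengths before writing catches misaligned labels at save time rather than during model training.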
3. Creation of new file shhs_test.py
This file includes test cases for both the new and the existing functions of the SHHS dataset.
Testing
python -m pytest test_shhs.py