@Soumya123p Soumya123p commented Dec 1, 2025

Students: Soumya Mazumder (soumyam4), Salazar, Andrew (aas15), Lin, Sharon (xinyiyl2)
Paper Title: WatchSleepNet: A Novel Model and Pretraining Approach for Advancing Sleep Staging with Smartwatches
Paper Link:

Overview

This PR implements support for the Sleep Heart Health Study (SHHS) dataset in PyHealth, enabling researchers to work with polysomnographic signals including EEG and ECG data for sleep-related cardiovascular and neurological research.

Changes

An overview of the SHHS data can be found here - Sleep Heart Health Study (SHHS)
This PR extends the existing SHHS dataset support in shhs.py with a new feature that extracts the ECG signal from EDF files.

1. Modification of file shhs.py

  • A. Fix the function process_EEG_data() - The function currently does not work because it relies on the class BaseSignalDataset, which is deprecated in the recent version. It has been modified so that it works again.

  • B. Add new function process_ECG_data() -
    The new function process_ECG_data() processes ECG signals with the following configurable parameters:

    • require_annotations: whether annotations are required (default: True)
    • select_chs: channels to select (default: ["ECG"])
    • target_fs: target sampling frequency (default: 100 Hz)
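To illustrate what a target_fs parameter implies, here is a minimal sketch of resampling a 1-D signal to a target rate by linear interpolation. It is not the implementation in this PR: the helper name resample_linear and the 125 Hz input rate are assumptions made for the example, and the method applies no anti-aliasing filter. A real pipeline would more likely use a filtered resampler such as scipy.signal.resample_poly.

```python
def resample_linear(signal, fs, target_fs):
    """Resample a 1-D sequence from fs to target_fs by linear interpolation.

    Hypothetical helper for illustration only; no anti-aliasing is applied.
    """
    if fs <= 0 or target_fs <= 0:
        raise ValueError("sampling rates must be positive")
    if not signal or fs == target_fs:
        return list(signal)
    n_out = int(len(signal) * target_fs / fs)  # number of output samples
    out = []
    for i in range(n_out):
        pos = i * fs / target_fs               # position in input-sample units
        lo = int(pos)                          # input sample at or before pos
        hi = min(lo + 1, len(signal) - 1)      # next input sample, clamped at the end
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out


# One second of a ramp sampled at an assumed 125 Hz, resampled to 100 Hz.
one_second = [float(i) for i in range(125)]
resampled = resample_linear(one_second, fs=125, target_fs=100)
print(len(resampled))
```

Linear interpolation is chosen here only because it needs no dependencies. It handles non-integer rate ratios such as 125 Hz to 100 Hz, but it can alias high-frequency content when downsampling.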

Usage Example
```python
from pyhealth.datasets import SHHSDataset

# Initialize dataset
dataset = SHHSDataset(
    root="/path/to/SHHS/",
    dev=True,  # Development mode for faster testing
    refresh_cache=False,  # Use existing cache
)

# Process EEG data for sleep analysis
eeg_data = dataset.process_EEG_data()
print(f"Processed {len(eeg_data)} patients")

# Process ECG data with flexible annotation handling
success = dataset.process_ECG_data(
    out_dir="/output/path/",
    require_annotations=False,  # Handle missing annotations gracefully
    select_chs=["ECG"],
    target_fs=100,
)
print(f"ECG processing successful: {success}")
```

2. Modification of file utils.py

This file contains the utility functions required for processing different datasets. I have added two new functions here -

read_edf_data() - reads and processes the polysomnography signals.
Parameters:
  • data_path: path to the EDF file.
  • label_path: path to the SHHS XML annotation file.
  • dataset: "SHHS" or "MESA".
  • select_chs: list of channels to extract.
  • target_fs: optional target frequency for downsampling.
Returns:
  • data: extracted channel signals, shape (T, C).
  • fs: sampling frequency.
  • stages: sleep-stage array aligned with the signal.

save_to_npz() - saves the extracted ECG/PPG/sleep staging data to an NPZ file.
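One way to picture a "stage array aligned with the signal" is per-epoch sleep-stage labels repeated so that every signal sample has a label. The sketch below assumes 30-second scoring epochs and pads any unscored tail with -1. Both choices, and the helper name expand_stages, are assumptions for illustration and may differ from what read_edf_data() actually does.

```python
EPOCH_SEC = 30  # assumed scoring-epoch length in seconds


def expand_stages(epoch_stages, fs, n_samples, epoch_sec=EPOCH_SEC):
    """Repeat each per-epoch label so there is one label per signal sample.

    Hypothetical helper: the result is truncated, or padded with -1
    (meaning "unscored"), so that its length equals n_samples.
    """
    samples_per_epoch = int(fs * epoch_sec)
    per_sample = []
    for stage in epoch_stages:
        per_sample.extend([stage] * samples_per_epoch)
    if len(per_sample) < n_samples:
        per_sample.extend([-1] * (n_samples - len(per_sample)))
    return per_sample[:n_samples]


# Three scored epochs at 100 Hz cover 9000 samples; the signal has 9500.
stages = expand_stages([0, 2, 2], fs=100, n_samples=9500)
print(len(stages), stages[0], stages[3000], stages[-1])
```

With array inputs, numpy.repeat performs the same per-epoch expansion in a single call.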

3. Creation of new file shhs_test.py

This file contains test cases for both the new and the existing functions of the SHHS dataset.

Testing
python -m pytest test_shhs.py

Collaborator

@jhnwu3 jhnwu3 left a comment

Will poke Jathurshan about this later, but we've been looking into a "precache for preprocessing" argument here, and I think you've roughly implemented the idea. We were wondering whether process_EEG_data() could be made optional here, and whether it could also take PyHealth processors as part of its arguments?
