Support BIDS compliant eyetracking data I/O (BEP 020) #1512

Open
scott-huberty wants to merge 43 commits into mne-tools:main from scott-huberty:eyetrack


@scott-huberty scott-huberty commented Feb 5, 2026

Seeing what it will take to read/write eyetracking data according to BIDS-specification BEP 020.

I had to hack around the internals a bit because the BIDS spec for eyetracking data differs a bit from what we've seen with other modalities (AFAIK). For example:

  1. instead of storing data in a single binary file, the data for each eye gets written to its own separate text file.
  2. eyetracking data does not get its own modality/datatype. If eyetracking data is recorded on its own, it gets stored in a behavior directory ('beh'). If eyetracking data is collected alongside another modality such as EEG, it gets written to the 'eeg' directory.
  3. regardless of the directory it is written to (e.g. 'beh', 'eeg', 'func'), eyetracking text files are given the _physio suffix. When reading, to figure out if a <match>_physio.tsv file contains eyetracking data (and not some other physiological data type), you must open its corresponding JSON sidecar file and inspect the PhysioType field.
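
The sidecar check described in point 3 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper name `is_eyetracking_physio` is hypothetical, and the assumption that eyetracking recordings carry the value "eyetrack" in the PhysioType field should be verified against the current BEP 020 text.

```python
import json
from pathlib import Path


def is_eyetracking_physio(physio_fname):
    """Return True if the JSON sidecar of a *_physio file marks it as eyetracking.

    Hypothetical helper for illustration only.
    """
    physio_fname = Path(physio_fname)
    # Drop every extension (.tsv or .tsv.gz) to locate the JSON sidecar.
    base = physio_fname.name.split(".", 1)[0]
    sidecar = physio_fname.with_name(f"{base}.json")
    if not sidecar.exists():
        return False
    metadata = json.loads(sidecar.read_text(encoding="utf-8"))
    # Assumption: eyetracking recordings are flagged with the value "eyetrack".
    return metadata.get("PhysioType") == "eyetrack"
```

Any other value in PhysioType (or a missing sidecar) makes the helper return False, so the file would be treated as generic physiological data.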

Needs

@scott-huberty scott-huberty marked this pull request as draft February 5, 2026 18:54
Comment thread mne_bids/read.py
Comment thread mne_bids/path.py
Comment on lines +1896 to 1899
# With the addition of 'physioevents' files, we need to explicitly prepend an
# underscore to the search suffix to avoid finding multiple candidates.
search_str_complete = str(search_dir / f"{search_str_filename}*_{search_suffix}")

Collaborator Author

A BIDS dataset with eyetracking data will have both <match>_events.tsv and <match>_physioevents.tsv (see example directory tree below).

As a result, on main, bids_path.find_matching_sidecar(suffix="events", extension=".tsv") could return both <match>_events.tsv and <match>_physioevents.tsv.

I therefore think it is necessary to prepend the underscore as I have done here, which fixes that issue.

This change does break an existing test (tests/test_path -k test_find_matching_sidecar). However, I'd argue that the change improves behavior and removes the need for that test. For example, when these two files are present:

<match>_coordsystem.json
<match>_2coordsystem.json

find_matching_sidecar(suffix='coordsystem', extension='.json') now returns a single file path instead of raising an error. I think this is the desirable behavior.

|bids/
|--- README
|--- dataset_description.json
|--- participants.json
|--- participants.tsv
|--- sub-01/
|------ ses-01/
|--------- sub-01_ses-01_scans.tsv
|--------- beh/
|------------ sub-01_ses-01_task-foo_run-01_recording-eye1_channels.tsv
|------------ sub-01_ses-01_task-foo_run-01_recording-eye1_events.json
|------------ sub-01_ses-01_task-foo_run-01_recording-eye1_events.tsv
|------------ sub-01_ses-01_task-foo_run-01_recording-eye1_physio.json
|------------ sub-01_ses-01_task-foo_run-01_recording-eye1_physio.tsv
|------------ sub-01_ses-01_task-foo_run-01_recording-eye1_physioevents.json
|------------ sub-01_ses-01_task-foo_run-01_recording-eye1_physioevents.tsv
|------------ sub-01_ses-01_task-foo_run-01_recording-eye2_physio.json
|------------ sub-01_ses-01_task-foo_run-01_recording-eye2_physio.tsv
|------------ sub-01_ses-01_task-foo_run-01_recording-eye2_physioevents.json
|------------ sub-01_ses-01_task-foo_run-01_recording-eye2_physioevents.tsv
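
The effect of the leading underscore can be illustrated with the standard library alone. The sketch below uses `fnmatch` on bare filenames and is a simplification: the real search string in mne_bids/path.py is built from the search directory and the entity string, and the helper `match_sidecars` and the shortened coordsystem filenames are hypothetical.

```python
from fnmatch import fnmatch

FILES = [
    "sub-01_ses-01_task-foo_run-01_recording-eye1_events.tsv",
    "sub-01_ses-01_task-foo_run-01_recording-eye1_physioevents.tsv",
    "sub-01_coordsystem.json",
    "sub-01_2coordsystem.json",
]


def match_sidecars(filenames, suffix, extension, require_underscore):
    """Return filenames matching a suffix, with or without a leading '_'."""
    separator = "_" if require_underscore else ""
    pattern = f"*{separator}{suffix}{extension}"
    return [name for name in filenames if fnmatch(name, pattern)]


# Without the underscore, "events" also matches "physioevents".
loose = match_sidecars(FILES, "events", ".tsv", require_underscore=False)
# With the underscore, only the true *_events.tsv file is a candidate.
strict = match_sidecars(FILES, "events", ".tsv", require_underscore=True)
# The same holds for the coordsystem example.
coordsystem = match_sidecars(FILES, "coordsystem", ".json", require_underscore=True)
```

Here `loose` holds two candidates, while `strict` and `coordsystem` each hold exactly one.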

Collaborator Author

Per my comment above, in b24d5f9 I refactored test_find_matching_sidecar to simulate a new scenario that still triggers the "Expected to find a single File" error when using find_matching_sidecar.

scott-huberty and others added 3 commits February 16, 2026 10:45
Per my comment at mne-tools#1512 (comment) ... Since I updated the behavior of find_matching_sidecar in mne-tools#1512, this test no longer failed under the conditions that it simulated.

In my comment I argue that this is actually an improvement. As such, I have adjusted this test to simulate a new scenario that will trigger the same error.
Eyetracking BIDS mandates that physioevents TSV files do not have headers
Comment thread mne_bids/write.py Outdated
@scott-huberty scott-huberty marked this pull request as ready for review February 19, 2026 06:53
codecov Bot commented Feb 19, 2026

Codecov Report

❌ Patch coverage is 86.21190% with 95 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.22%. Comparing base (e66a9ea) to head (e8aa431).
⚠️ Report is 18 commits behind head on main.

Files with missing lines              Patch %   Lines
mne_bids/physio/eyetracking.py        80.28%    68 Missing ⚠️
mne_bids/tests/test_eyetracking.py    89.70%    14 Missing ⚠️
mne_bids/physio/generic.py            84.61%    6 Missing ⚠️
mne_bids/read.py                      91.11%    4 Missing ⚠️
mne_bids/tsv_handler.py               92.68%    3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1512      +/-   ##
==========================================
- Coverage   96.96%   96.22%   -0.74%     
==========================================
  Files          43       47       +4     
  Lines       10617    11265     +648     
==========================================
+ Hits        10295    10840     +545     
- Misses        322      425     +103     



scott-huberty commented Feb 19, 2026

I'd say this is a decent first draft so I am opening this up for review.

This is a hefty PR, and I am in no particular rush to push it through. I'd rather sleep well knowing that a few others took a close look, particularly wrt the API, even if it takes a while for folks to get around to reviewing this.

For reviewers, I'd start by looking at read_raw_bids and write_raw_bids to see how I am wiring the eyetracking support into the codebase. I'd also take a look at the tutorial I added, to get an idea of how the user must configure their BIDSPath in order to write stand-alone eyetracking data, how to write eyetracking data collected with another modality, how to write calibration information, and how to read all this information back from disk.

EDIT: I have it on my TODO to bolster the codecov, but will wait until after initial review.

@scott-huberty
Collaborator Author

Re-designating this as draft, pending upstream discussion which may clarify or change the Eyetracking spec. I want to make sure that this PR follows the current/eventual spec before asking for reviewer time.

x_coordinate, y_coordinate data are REQUIRED and their column names MUST be x_coordinate, y_coordinate

if pupil data is present, its column name MUST be pupil_size
instead of appending RecordedEye
@scott-huberty scott-huberty marked this pull request as ready for review February 27, 2026 19:55
Comment thread mne_bids/path.py
Comment on lines 1641 to +1663
 def _parse_ext(raw_fname):
     """Split a filename into its name and extension."""
-    raw_fname = str(raw_fname)
-    fname, ext = os.path.splitext(raw_fname)
+    # Some callsites in our codebase pass _parse_ext(None) ...
+    if not raw_fname:
+        return "", ""
+    raw_fname = Path(raw_fname)
+    fname, exts = raw_fname.with_suffix(""), raw_fname.suffixes
+    while fname.suffix:
+        fname = fname.with_suffix("")
     # BTi data is the only file format that does not have a file extension
-    if ext == "" or "c,rf" in fname:
+    if not exts or "c,rf" in str(fname):
         logger.info(
             'Found no extension for raw file, assuming "BTi" format '
             "and appending extension .pdf"
         )
         ext = ".pdf"
-    # If ending on .gz, check whether it is an .nii.gz file
-    elif ext == ".gz" and raw_fname.endswith(".nii.gz"):
-        ext = ".nii.gz"
-        fname = fname[:-4]  # cut off the .nii
-    return fname, ext
+    elif len(exts) == 1:
+        ext = exts[0]
+    else:  # >1 extension e.g. .nii.gz, .tsv.gzc
+        ext = "".join(raw_fname.suffixes)
+        fname = raw_fname.name[: -len(ext)]  # cut off the .nii.gz, tsv.gz etc.
+    # TODO: Should we return Path obj, and refactor call sites if needed?
+    return str(fname), ext

Collaborator Author

I need _parse_ext to handle *.tsv.gz files, so I refactored this function to handle multi-extension filenames in general (whereas before it would only handle the .nii.gz case). While I was at it, I modernized the code by moving from os.path to pathlib.Path.
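
As a rough, standalone illustration of the multi-extension idea (not the function in this PR), the joined `suffixes` of a pathlib path can be cut off the filename in one step. The helper name `split_multi_extension` is hypothetical, and `PurePosixPath` is used only so the result does not depend on the operating system.

```python
from pathlib import PurePosixPath


def split_multi_extension(fname):
    """Split a filename into (name without extensions, joined extensions)."""
    path = PurePosixPath(fname)
    # e.g. [".tsv", ".gz"] -> ".tsv.gz"; an empty list gives "".
    ext = "".join(path.suffixes)
    # Slice by length so that an empty extension leaves the name untouched.
    stem = path.name[: len(path.name) - len(ext)]
    return str(path.with_name(stem)), ext


print(split_multi_extension("sub-01_recording-eye1_physio.tsv.gz"))
print(split_multi_extension("/data/sub-01_bold.nii.gz"))
print(split_multi_extension("recording.edf"))
```

One caveat of this approach: every dot in the filename starts a new suffix, so a name such as v1.2_data.tsv would be split at the first dot rather than at .tsv.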

assert (dest / "session" / "segment.dat").read_bytes() == b"\x00\x01\x02"


@pytest.mark.xfail(reason="Need to discuss this with devs.")

@scott-huberty scott-huberty Feb 27, 2026

Per my comment on _parse_ext, I refactored _parse_ext to handle multi-extension filenames.

However, on L232 we seem to test that copyfile_eeglab(raw_fname, new_name) can copy <match>.set to <match>.set.set without error.

I think this is because on main:

_parse_ext("/CONVERTED_test_raw.set.set") 

will (incorrectly!) return ('/CONVERTED_test_raw.set', '.set'), which just happened to play well with copyfile_eeglab.

If I change one line in this test:

new_name = bids_root / f"CONVERTED_{fname}.set"

To

new_name = bids_root / f"CONVERTED_{fname}"

Then this test will pass again.
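
The difference between the two approaches on a doubled extension can be checked with the standard library alone. This sketch does not call _parse_ext; it only compares `os.path.splitext`, which splits off the last extension, with pathlib's `suffixes`, which collects all of them.

```python
import os
from pathlib import PurePosixPath

fname = "/CONVERTED_test_raw.set.set"

# os.path.splitext only splits off the final extension.
old_style = os.path.splitext(fname)
print(old_style)  # ('/CONVERTED_test_raw.set', '.set')

# pathlib reports every suffix, so the joined extension is ".set.set".
all_suffixes = PurePosixPath(fname).suffixes
print(all_suffixes)  # ['.set', '.set']
```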

Comment thread mne_bids/tsv_handler.py
Comment on lines +201 to +203
def _from_compressed_tsv(fname, dtypes=None):
"""Wrap _from_tsv and then read column names from corresponding JSON."""
fname = Path(fname)

Collaborator Author

Physiological (including eyetracking) data files need to be saved into gzipped text files. So I added functionality to be able to read/write such files.
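
For readers unfamiliar with the format, here is a minimal round-trip sketch of a gzipped, headerless TSV whose column names live in the JSON sidecar. It is not the implementation in mne_bids/tsv_handler.py: both function names are hypothetical, values are read back as plain strings, and the sidecar here contains only a Columns field.

```python
import csv
import gzip
import json
from pathlib import Path


def _sidecar_for(fname):
    """Return the JSON sidecar path for a .tsv or .tsv.gz file."""
    return fname.with_name(fname.name.split(".", 1)[0] + ".json")


def write_headerless_tsv_gz(fname, columns):
    """Write columns to a gzipped TSV without a header; names go to the JSON."""
    fname = Path(fname)
    rows = zip(*columns.values())
    with gzip.open(fname, "wt", encoding="utf-8", newline="") as fid:
        csv.writer(fid, delimiter="\t", lineterminator="\n").writerows(rows)
    sidecar = _sidecar_for(fname)
    sidecar.write_text(json.dumps({"Columns": list(columns)}), encoding="utf-8")


def read_headerless_tsv_gz(fname):
    """Read the gzipped TSV back, taking column names from the JSON sidecar."""
    fname = Path(fname)
    names = json.loads(_sidecar_for(fname).read_text(encoding="utf-8"))["Columns"]
    with gzip.open(fname, "rt", encoding="utf-8", newline="") as fid:
        rows = list(csv.reader(fid, delimiter="\t"))
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}
```

Calling `write_headerless_tsv_gz(path, {"x_coordinate": [...], "y_coordinate": [...], "pupil_size": [...]})` followed by `read_headerless_tsv_gz(path)` returns the same columns, with each value as a string.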

Comment thread mne_bids/write.py
 # would like to overwrite the existing dataset
-if bids_path.fpath.exists():
+# TODO: Why did I gatekeep eyetracking from this path. Remove and see if it breaks.
+if bids_path.fpath.exists() and not is_eyetracking_only:

Collaborator Author

The reason that I gatekeep this path from eyetracking-only data is that bids_path.fpath will point to the _physio.tsv.gz file, which already exists at this point because it gets created earlier in this function (by write_eyetrack_tsvs).

Maybe there is a cleaner way to go about this but I will need to think about it.

@scott-huberty
Collaborator Author

Note to self and reviewers:

This PR is huge. I am going to split out parts of this into standalone PRs, to reduce the diff here and aid focused reviews (e.g. the compressed TSV I/O code, the _parse_ext refactoring code, etc).
