Support BIDS compliant eyetracking data I/O (BEP 020)#1512
Support BIDS compliant eyetracking data I/O (BEP 020)#1512scott-huberty wants to merge 43 commits intomne-tools:mainfrom
Conversation
for more information, see https://pre-commit.ci
| # With the addition of 'physioevents' files, we need to explicitly prepend an | ||
| # underscore to the search suffix to avoid finding multiple candidates. | ||
| search_str_complete = str(search_dir / f"{search_str_filename}*_{search_suffix}") | ||
|
|
There was a problem hiding this comment.
A BIDS dataset with eyetracking data will have both <match>_events.tsv and <match>_physioevents.tsv (see example directory tree below).
Thus, on main, doing bids_path.find_matching_sidecar(suffix="events", extension=".tsv") Could return both <match>_events.tsv and <match>_physioevents.tsv.
Thus I think it is necessary to add the underscore like I have here, which fixes the aforementioned issue.
This change does break a currently existing test ( tests/test_path -k test_find_matching_sidecar). Though I'd argue that my change here improves behavior and negates the need for that test, e.g. in the case that these two files are present:
<match>_coordsystem.json
<match>_2coordsystem.json
Doing find_matching_sidecar(suffix='coordsystem', extension='.json')` will now return a single file path and will no longer raise an error. I think that this is the desirable behavior.
|bids/
|--- README
|--- dataset_description.json
|--- participants.json
|--- participants.tsv
|--- sub-01/
|------ ses-01/
|--------- sub-01_ses-01_scans.tsv
|--------- beh/
|------------ sub-01_ses-01_task-foo_run-01_recording-eye1_channels.tsv
|------------ sub-01_ses-01_task-foo_run-01_recording-eye1_events.json
|------------ sub-01_ses-01_task-foo_run-01_recording-eye1_events.tsv
|------------ sub-01_ses-01_task-foo_run-01_recording-eye1_physio.json
|------------ sub-01_ses-01_task-foo_run-01_recording-eye1_physio.tsv
|------------ sub-01_ses-01_task-foo_run-01_recording-eye1_physioevents.json
|------------ sub-01_ses-01_task-foo_run-01_recording-eye1_physioevents.tsv
|------------ sub-01_ses-01_task-foo_run-01_recording-eye2_physio.json
|------------ sub-01_ses-01_task-foo_run-01_recording-eye2_physio.tsv
|------------ sub-01_ses-01_task-foo_run-01_recording-eye2_physioevents.json
|------------ sub-01_ses-01_task-foo_run-01_recording-eye2_physioevents.tsv
There was a problem hiding this comment.
Per my comment above, in b24d5f9 I refactored test_find_matching_sidecar to simulate a new scenario that will still trigger the "Expected to find a single File" error when using find_matching_sidecar
Per my comment at mne-tools#1512 (comment) ... Since I updated the behavior of find_matching_sidecar in mne-tools#1512 , this test no longer failed under the conditions that this test simulated. In my comment I argue that this is actually an improvement. As such I have adjustd this test to simulate a new scenario that will trigger the same error.
for more information, see https://pre-commit.ci
Eyetracking BIDS mandates that physioevents TSV files do not have headers
…data) e.g. it should be sub-foo_ses-bar_events.tsv NOT sub-foo_ses-bar_recording-eye1_events.tsv
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1512 +/- ##
==========================================
- Coverage 96.96% 96.22% -0.74%
==========================================
Files 43 47 +4
Lines 10617 11265 +648
==========================================
+ Hits 10295 10840 +545
- Misses 322 425 +103 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I'd say this is a decent first draft so I am opening this up for review. This is a hefty PR, and I am in no particular rush to push it through. I'd rather sleep well knowing that a few others took a close look, particularly wrt the API, even if it takes a while for folks to get around to reviewing this. For reviewers, I'd start by looking at EDIT: I have it on my TODO to bolster the codecov, but will wait until after initial review. |
|
Re-designating this as draft, pending upstream discussion which may clarify or change the Eyetracking spec. I want to make sure that this PR follows the current/eventual spec before asking for reviewer time. |
x_coordinate, y_coordinate data are REQUIRED and their column names MUST be x_coordinate, y_coordiante if pupil data is present, its column name MUST be pupil_size
insted of appending RecordedEye
| def _parse_ext(raw_fname): | ||
| """Split a filename into its name and extension.""" | ||
| raw_fname = str(raw_fname) | ||
| fname, ext = os.path.splitext(raw_fname) | ||
| # Some callsites in our codebase pass _parse_ext(None) ... | ||
| if not raw_fname: | ||
| return "", "" | ||
| raw_fname = Path(raw_fname) | ||
| fname, exts = raw_fname.with_suffix(""), raw_fname.suffixes | ||
| while fname.suffix: | ||
| fname = fname.with_suffix("") | ||
| # BTi data is the only file format that does not have a file extension | ||
| if ext == "" or "c,rf" in fname: | ||
| if not exts or "c,rf" in str(fname): | ||
| logger.info( | ||
| 'Found no extension for raw file, assuming "BTi" format ' | ||
| "and appending extension .pdf" | ||
| ) | ||
| ext = ".pdf" | ||
| # If ending on .gz, check whether it is an .nii.gz file | ||
| elif ext == ".gz" and raw_fname.endswith(".nii.gz"): | ||
| ext = ".nii.gz" | ||
| fname = fname[:-4] # cut off the .nii | ||
| return fname, ext | ||
| elif len(exts) == 1: | ||
| ext = exts[0] | ||
| else: # >1 extension e.g. .nii.gz, .tsv.gzc | ||
| ext = "".join(raw_fname.suffixes) | ||
| fname = raw_fname.name[: -len(ext)] # cut off the .nii.gz, tsv.gz etc. | ||
| # TODO: Should we return Path obj, and refactor call sites if needed? | ||
| return str(fname), ext |
There was a problem hiding this comment.
I need _parse_ext to to handle *_tsv.gz files so I refactored this function to generally be able to handle multi-extension filenames (whereas before it would only handle this for the .nii.gz case). While I was add it, I modernized the code by moving from os.path to pathlib.Path
| assert (dest / "session" / "segment.dat").read_bytes() == b"\x00\x01\x02" | ||
|
|
||
|
|
||
| @pytest.mark.xfail(reason="Need to discuss this with devs.") |
There was a problem hiding this comment.
per my comment on _parse_ext, I refactored _parse_ext to handle multi-extension filenames.
However on L232, we seem to test that copyfile_eeglab(raw_fname, new_name) will be able to copy <match>.set to <match>.set.set without error. .
I think this is because on main:
_parse_ext("/CONVERTED_test_raw.set.set") Will (incorrectly!) return ('/CONVERTED_test_raw.set', '.set') .. Which just happened to play well with copyfile_eeglab.
if I can change one line in this test:
new_name = bids_root / f"CONVERTED_{fname}.set"To
new_name = bids_root / f"CONVERTED_{fname}"Then this test will pass again.
| def _from_compressed_tsv(fname, dtypes=None): | ||
| """Wrap _from_tsv and then read column names from corresponding JSON.""" | ||
| fname = Path(fname) |
There was a problem hiding this comment.
Physiological (including eyetracking) data files need to be saved into gzipped text files. So I added functionality to be able to read/write such files.
| # would like to overwrite the existing dataset | ||
| if bids_path.fpath.exists(): | ||
| # TODO: Why did I gatekeep eyetracking from this path. Remove and see if it breaks. | ||
| if bids_path.fpath.exists() and not is_eyetracking_only: |
There was a problem hiding this comment.
The reason that I gatekeep this path from eyetracking-only data is because, bids_path.fpath will point to the _physio.tsv.gz file, which already exists at this point because it gets created earlier in this function (by write_eyetrack_tsvs).
Maybe there is a cleaner way to go about this but I will need to think about it.
|
Note to self and reviewers: This PR is huge. I am going to split out parts of this into standalone PR's, to reduce the diff here and aid focused reviews (e.g. the compressed TSV I/O code, the |
Seeing what it will take to read/write eyetracking data according to BIDS-specification BEP 020.
I had to hack around the internals a bit because the BIDS spec for eyetracking data differs a bit from what we've seen with other modalities (AFAIK). For example:
'beh'). If eyetracking data is collected alongside another modality such as EEG, it gets written to the'eeg'directory.'beh','eeg','func'), eyetracking text files are given the_physiosuffix. When reading, to figure out if a<match>_physio.tsvfile contains eyetracking data (and not some other physiological data type), you must open its corresponding JSON sidecar file and inspect thePhsyioTypefield.Needs