Fallback recovery for "EGI epoch first/last samps could not be parsed" (RuntimeError in read_raw_egi) #13951

@The1per

Description of the problem

mne.io.read_raw_egi raises a RuntimeError when reading certain EGI .mff files
because n_samps_epochs (from epochs.xml) does not match the actual block samples in signal1.bin.

This mismatch can arise from improperly closed recordings, asynchronous writes during PSG/PNS recording, pauses or impedance checks during recording that create irregular fractional epochs, and possibly other causes.
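
To make the mismatch concrete, here is a minimal sketch with made-up numbers. It is not MNE's code: the function name epoch_matches_blocks is hypothetical, and it assumes the epoch times are in microseconds and that there is a single epoch.

```python
def epoch_matches_blocks(begin_us, end_us, sfreq, samples_per_block):
    """Compare samples implied by one epoch with samples stored in blocks.

    Hypothetical helper for illustration only; times are assumed to be
    in microseconds.
    """
    # Samples implied by the epoch's time span in epochs.xml
    n_samps_epoch = (end_us - begin_us) * sfreq // 1_000_000
    # Samples actually present in the binary blocks of signal1.bin
    n_samps_block = sum(samples_per_block)
    return n_samps_epoch, n_samps_block, n_samps_epoch == n_samps_block


# A cleanly closed 10 s recording at 1000 Hz: both counts agree.
print(epoch_matches_blocks(0, 10_000_000, 1000, [4000, 4000, 2000]))

# Same epochs.xml, but the last block was cut short: the counts disagree.
print(epoch_matches_blocks(0, 10_000_000, 1000, [4000, 4000, 1500]))
```

The second case is the situation that triggers the error: epochs.xml claims more samples than the binary file contains.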

I wrote a lightweight fallback function that reuses MNE's internal _get_blocks to calculate the exact duration based on the actual binary payload, recreating a valid epochs.xml on the fly - so n_samps_epochs == n_samps_block.

Would you be open to a Pull Request adding this as a fallback mechanism (e.g., recover_epochs=True) in read_raw_egi?

As a permanent solution within MNE, rather than rewriting the XML on disk, we could modify the validation block in mne/io/egi/egimff.py (inside _read_header, lines 97-108). Instead of raising a RuntimeError when bad == True, it could dynamically overwrite the epochs dictionary in memory with n_samps_block and signal_blocks["n_blocks"] (perhaps guarded by a recover_epochs=True parameter). This would repair the epoch information without altering the user's original raw data on disk.
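
A rough sketch of what that in-memory fallback could look like, written as a standalone function so it runs without MNE. Everything here is an assumption for illustration: the function name validate_or_recover_epochs, the dictionary keys, and the exact error and warning text are not taken from MNE's source, and the real check in _read_header may be structured differently.

```python
import warnings


def validate_or_recover_epochs(epochs, samples_block, recover_epochs=False):
    """Hypothetical stand-in for the epoch validation step.

    epochs : dict with "first_samps", "last_samps", "first_blocks" and
        "last_blocks" lists (key names assumed for this sketch).
    samples_block : list of per-block sample counts from signal1.bin.
    """
    n_samps_epochs = sum(
        last - first
        for first, last in zip(epochs["first_samps"], epochs["last_samps"])
    )
    n_samps_block = sum(samples_block)
    if n_samps_epochs == n_samps_block:
        return epochs  # metadata and binary payload agree, nothing to do
    if not recover_epochs:
        raise RuntimeError(
            f"EGI epoch samples ({n_samps_epochs}) do not match "
            f"block samples ({n_samps_block})"
        )
    warnings.warn(
        "epochs.xml disagrees with signal1.bin; using a single epoch "
        "rebuilt from the binary blocks",
        RuntimeWarning,
    )
    # Replace the epoch info in memory only; nothing is written to disk.
    return {
        "first_samps": [0],
        "last_samps": [n_samps_block],
        "first_blocks": [0],
        "last_blocks": [len(samples_block)],
    }
```

Collapsing everything into one epoch discards any pause information the original epochs.xml carried, which is one reason an opt-in parameter plus a warning seems safer than a silent fallback.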

Thanks

Steps to reproduce

import xml.etree.ElementTree as ET
import re
import shutil
from pathlib import Path
from mne.io.egi.general import _get_blocks

def _recover_egi_epochs_xml(mff_path):
    """Recover or recreate a corrupted epochs.xml for EGI .mff files.

    This function parses the binary `signal1.bin` using MNE's internal `_get_blocks` 
    to calculate the exact number of blocks and samples. It then retrieves the time 
    divisor from `info.xml` and calculates the precise ``endTime`` in microseconds 
    to reconstruct a valid ``epochs.xml``.
    
    Parameters
    ----------
    mff_path : str | Path
        The path to the .mff directory.
    """
    mff_path = Path(mff_path)
    signal_bin = mff_path / "signal1.bin"
    epochs_xml = mff_path / "epochs.xml"
    info_xml_path = mff_path / "info.xml"
    
    if not signal_bin.exists() or not info_xml_path.exists():
        raise FileNotFoundError(f"Missing essential MFF files in {mff_path}")

    # 1. Use MNE's built-in parser to get exact block and sample counts
    # This guarantees our new epochs.xml will perfectly match MNE's expectations
    signal_blocks = _get_blocks(str(signal_bin))
    total_samples = int(signal_blocks["samples_block"].sum())
    last_block_idx = int(signal_blocks["n_blocks"])
    sfreq = int(signal_blocks["sfreq"])

    # 2. Extract record time format from info.xml to determine fractional multiplier
    tree = ET.parse(info_xml_path)
    root = tree.getroot()
    record_time_elem = root.find(".//{http://www.egi.com/info_mff}recordTime")
    if record_time_elem is None:
        raise ValueError("Could not find <recordTime> in info.xml")
        
    record_time = record_time_elem.text
    match = re.match(r".*\.(\d{6}(?:\d{3})?)[+-]", record_time)
    if not match:
        raise ValueError(f"Unexpected recordTime format: {record_time}")
        
    frac = match.group(1)
    # Determine if microseconds (len 6) or nanoseconds (len 9)
    div = 1000 if len(frac) == 6 else 1000000

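    # 3. Calculate exact duration at the same resolution as recordTime
    # NOTE: despite the variable name, this is microseconds only when div == 1000;
    # for a nanosecond recordTime (div == 1000000) the multiplier is 1e9 (nanoseconds).
    duration_us = int((total_samples / sfreq) * (div * 1000))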

    # 4. Backup existing corrupted epochs.xml (if it exists)
    if epochs_xml.exists():
        backup_epochs_xml = mff_path.parent / f"{mff_path.name}__epochs_corrupted.xml"
        if not backup_epochs_xml.exists():
            shutil.copy2(epochs_xml, backup_epochs_xml)

    # 5. Write the corrected epochs.xml
    xml_str = (
        f'<?xml version="1.0" encoding="utf-8"?>\n'
        f'<epochs xmlns="http://www.egi.com/epochs_mff">\n'
        f'    <epoch>\n'
        f'        <beginTime>0</beginTime>\n'
        f'        <endTime>{duration_us}</endTime>\n'
        f'        <firstBlock>1</firstBlock>\n'
        f'        <lastBlock>{last_block_idx}</lastBlock>\n'
        f'    </epoch>\n'
        f'</epochs>\n'
    )
    epochs_xml.write_text(xml_str, encoding="utf-8")
    print(f"Successfully created/recovered epochs.xml for {mff_path.name} with {total_samples} samples and {last_block_idx} blocks.")
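
One way the function above might be used today, without any change to MNE, is a small retry wrapper. This is only a sketch: read_with_recovery is a hypothetical name, and the reader and the recovery function are passed in as arguments so the wrapper itself has no MNE dependency. It matches on the error text shown under "Actual results".

```python
def read_with_recovery(mff_path, reader, recover):
    """Try reader(mff_path); on the epoch-parsing error, recover and retry once.

    reader and recover are callables supplied by the caller (hypothetical
    wrapper, not part of MNE).
    """
    try:
        return reader(mff_path)
    except RuntimeError as err:
        # Only handle the specific epoch mismatch; re-raise anything else.
        if "could not be parsed" not in str(err):
            raise
        recover(mff_path)  # rewrites epochs.xml (after backing it up)
        return reader(mff_path)


# Intended usage (requires MNE and the function defined above):
# raw = read_with_recovery(
#     "recording.mff", mne.io.read_raw_egi, _recover_egi_epochs_xml
# )
```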

Link to data

No response

Expected results

MNE either loads the file (e.g. while emitting a warning), or offers a recover_epochs=True parameter that uses its internal _get_blocks to rebuild the epoch information on the fly to salvage the clinical data.

Actual results

RuntimeError: EGI epoch first/last samps could not be parsed:
[0]
[17680358]

Additional information
