FIX: Write text files with utf-8 encoding instead of utf-8-sig (#1531)

Open
scott-huberty wants to merge 3 commits into mne-tools:main from scott-huberty:encoding

Conversation

@scott-huberty
Collaborator

Fixes #1530 cc @sappelhoff @hoechenberger

Hope you don't mind that I added 2 new constants to config.py. I think this makes the code intent clearer and will make updating these encodings easier in the future.
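For context on what the fix changes, here is a minimal sketch (not code from this PR) of the difference between the two codecs when writing. The file name and TSV content are made up for illustration; only the standard library is used.

```python
import codecs
import tempfile
from pathlib import Path


def first_bytes(encoding: str) -> bytes:
    """Write a tiny TSV with the given encoding and return its first 3 bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file name and content, for illustration only.
        path = Path(tmp) / "participants.tsv"
        path.write_text("participant_id\tage\nsub-01\t30\n", encoding=encoding)
        return path.read_bytes()[:3]


# "utf-8-sig" prepends the UTF-8 byte order mark (EF BB BF) on write.
with_bom = first_bytes("utf-8-sig")
# Plain "utf-8" starts directly with the file content.
without_bom = first_bytes("utf-8")

print(with_bom == codecs.BOM_UTF8)
print(without_bom)
```

Reading such a file back with `utf-8-sig` strips the BOM if present, so the writing side is where the two codecs differ visibly on disk.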

@codecov

codecov Bot commented Mar 4, 2026

Codecov Report

❌ Patch coverage is 96.55172% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 96.98%. Comparing base (bbc83e7) to head (a2f71e2).
⚠️ Report is 6 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| mne_bids/utils.py | 80.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1531      +/-   ##
==========================================
- Coverage   97.00%   96.98%   -0.02%     
==========================================
  Files          43       43              
  Lines       10669    10679      +10     
==========================================
+ Hits        10349    10357       +8     
- Misses        320      322       +2     


Member

@hoechenberger left a comment

Hi, great so far!

Except for JSON reading: we should NOT support reading JSON with a BOM, as this simply isn't valid JSON. We should fail hard in this case.

As for the changelog update: I think the only change relevant to users is TSV writing. I would be specific about this and omit that other text files are affected as well. But this is just my personal view.
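To illustrate the JSON point above, a minimal sketch of what "fail hard" can look like with the standard library alone. The helper name `load_strict` and the payload are hypothetical and not taken from MNE-BIDS. The key detail is decoding with plain `utf-8`, which leaves a leading BOM in the text as U+FEFF, so `json.loads` rejects it.

```python
import codecs
import json

# Example payload; the key and value are made up for illustration.
payload = json.dumps({"Name": "example"}).encode("utf-8")
bom_payload = codecs.BOM_UTF8 + payload


def load_strict(raw: bytes) -> dict:
    """Hypothetical helper: decode as plain UTF-8, then parse.

    Plain "utf-8" does not strip a BOM, so a BOM-prefixed document
    reaches json.loads as text starting with U+FEFF and is rejected.
    """
    text = raw.decode("utf-8")
    return json.loads(text)


ok = load_strict(payload)

try:
    load_strict(bom_payload)
    rejected = False
except json.JSONDecodeError as err:
    rejected = True
    print(f"Rejected BOM-prefixed JSON: {err}")
```

Note that passing raw `bytes` straight to `json.loads` would behave differently, because the `json` module detects the encoding itself and tolerates a UTF-8 BOM in that case. Decoding explicitly is what makes the check strict.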

bruAristimunha added a commit to bruAristimunha/mne-bids that referenced this pull request Mar 9, 2026
np.loadtxt with encoding="utf-8-sig" crashes on TSV files that contain
Latin-1 characters such as µ (micro-sign, 0xB5), which is common in
European datasets for channel units like "µV".

Add a try/except UnicodeDecodeError that retries with latin-1 encoding
and emits a warning. This is related to open issue mne-tools#1530 and PR mne-tools#1531.

Discovered via OpenNeuro datasets during eegdash batch ingestion.
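The retry described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the referenced commit: it uses `Path.read_text` as a standard-library stand-in for the `np.loadtxt` call, and the function name `read_tsv_text`, the warning category, and the file content are hypothetical.

```python
import tempfile
import warnings
from pathlib import Path


def read_tsv_text(path, encoding="utf-8-sig", fallback="latin-1"):
    """Hypothetical helper: read text, retrying with a fallback encoding."""
    try:
        return Path(path).read_text(encoding=encoding)
    except UnicodeDecodeError:
        warnings.warn(
            f"{path} is not valid {encoding}; retrying with {fallback}.",
            RuntimeWarning,
        )
        return Path(path).read_text(encoding=fallback)


with tempfile.TemporaryDirectory() as tmp:
    tsv = Path(tmp) / "channels.tsv"
    # The micro sign is the single byte 0xB5 in Latin-1, which is not a
    # valid start byte in UTF-8, so the first read attempt fails.
    tsv.write_bytes("name\tunits\nFp1\t\u00b5V\n".encode("latin-1"))

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        text = read_tsv_text(tsv)

print(repr(text))
print(len(caught))
```

One trade-off worth keeping in mind: Latin-1 maps every byte to a character, so the fallback never raises, even for files that are in some other encoding. The warning is the only signal that the content may have been misread.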
@scott-huberty
Collaborator Author

> Except for JSON reading: we should NOT support reading JSON with a BOM, as this simply isn't valid JSON. We should fail hard in this case.

OK! Addressed in a2f71e2. I expanded my two encoding constants into a little class, now that we have different encoding rules for TSV vs. JSON I/O. To me this feels like a clean approach, but if folks think it is overkill, feel free to let me know.
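For readers who have not opened the diff, one way such a class could look is sketched below. All names here (`TextEncoding`, `TSV_ENCODING`, `JSON_ENCODING`) are hypothetical and not taken from a2f71e2, and the assumption that TSV reading stays BOM-tolerant while JSON reading is strict is inferred from the discussion above rather than from the code.

```python
import codecs
from dataclasses import dataclass


@dataclass(frozen=True)
class TextEncoding:
    """Hypothetical container pairing a read codec with a write codec."""

    read: str
    write: str


# Assumption: TSV reading tolerates a BOM, TSV writing never emits one.
TSV_ENCODING = TextEncoding(read="utf-8-sig", write="utf-8")
# Assumption: JSON is strict in both directions, so a BOM is not stripped.
JSON_ENCODING = TextEncoding(read="utf-8", write="utf-8")

# A BOM-prefixed TSV header decodes cleanly with the tolerant read codec.
raw = codecs.BOM_UTF8 + "name\tunits".encode(TSV_ENCODING.write)
decoded = raw.decode(TSV_ENCODING.read)
print(repr(decoded))
```

Keeping the read and write codecs side by side per file type makes the asymmetry explicit at each call site, which seems to be the intent described in the comment.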

Development

Successfully merging this pull request may close these issues.

MNE-BIDS should not include a BOM when writing TSV files

2 participants