FIX: Write text files with utf-8 encoding instead of utf-8-sig #1531
scott-huberty wants to merge 3 commits into mne-tools:main
Conversation
Codecov Report
❌ Patch coverage is

Additional details and impacted files:

@@           Coverage Diff            @@
##             main    #1531    +/-   ##
==========================================
- Coverage   97.00%   96.98%   -0.02%
==========================================
  Files          43       43
  Lines       10669    10679      +10
==========================================
+ Hits        10349    10357       +8
- Misses        320      322       +2
hoechenberger left a comment:
Hi, great so far!
Except for JSON reading: we should NOT support reading JSON with a BOM, as this simply isn't valid JSON. We should fail hard in this case.
As for the changelog update: I would think that the only change relevant to users is TSV writing. I would be specific about this and omit mentioning that other text files are affected as well. But this is just my personal view.
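Failing hard on a BOM-prefixed JSON file could look like the following sketch. This is not the PR's actual code; the `read_json_strict` helper name and error message are illustrative.

```python
import codecs
import json


def read_json_strict(fname):
    """Read a JSON file, failing hard if it starts with a UTF-8 BOM.

    JSON must not carry a byte-order mark (RFC 8259), so we reject it
    explicitly instead of silently stripping it via the "utf-8-sig" codec.
    """
    with open(fname, "rb") as fid:
        raw = fid.read()
    if raw.startswith(codecs.BOM_UTF8):
        raise ValueError(
            f"{fname} starts with a UTF-8 byte-order mark, which is not "
            "valid JSON. Please re-save the file as plain UTF-8."
        )
    return json.loads(raw.decode("utf-8"))
```

Checking the raw bytes up front gives a clear, actionable error message rather than the generic `JSONDecodeError` that `json.loads` would raise on the stray `\ufeff` character.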
np.loadtxt with encoding="utf-8-sig" crashes on TSV files that contain Latin-1 characters such as µ (micro-sign, 0xB5), which is common in European datasets for channel units like "µV". Add a try/except UnicodeDecodeError that retries with latin-1 encoding and emits a warning. This is related to open issue mne-tools#1530 and PR mne-tools#1531. Discovered via OpenNeuro datasets during eegdash batch ingestion.
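The retry described above could be sketched as follows. The `loadtxt_with_fallback` wrapper name and warning text are illustrative, not taken from the PR:

```python
import warnings

import numpy as np


def loadtxt_with_fallback(fname, **kwargs):
    """Load a text file as UTF-8, falling back to Latin-1 with a warning.

    Handles TSV files containing Latin-1 bytes such as 0xB5 ("µ", common
    in channel units like "µV"), which are not valid UTF-8 and make
    np.loadtxt raise UnicodeDecodeError.
    """
    try:
        return np.loadtxt(fname, encoding="utf-8-sig", **kwargs)
    except UnicodeDecodeError:
        warnings.warn(
            f"{fname} is not valid UTF-8; retrying with latin-1 encoding. "
            "Consider re-saving the file as UTF-8.",
            RuntimeWarning,
        )
        return np.loadtxt(fname, encoding="latin-1", **kwargs)
```

Latin-1 maps every byte value to a code point, so the retry cannot raise another decode error, though it may silently mis-decode files that are actually in some other encoding.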
OK! Addressed in a2f71e2. I expanded my two encoding constants into a little class, now that we have different encoding rules for TSV vs JSON I/O. To me this feels like a clean approach, but if folks think it is overkill, feel free to let me know.
Fixes #1530 cc @sappelhoff @hoechenberger
Hope you don't mind that I added 2 new constants to config.py. I think this makes the code intent clearer and will make updating these encodings easier in the future.