fix: Support reading from files that have an UTF-8 Byte Order Mark #670

ldemidov · 2025-07-08T22:52:58Z

Adds support for reading files that start with a UTF-8 BOM. Windows text editors commonly write this marker, and it should be skipped because serde deserialization will not handle those bytes.

We have encountered this issue with our Windows customers, who may create UTF-8 BOM files without knowing it. Although we fixed it with a custom FileSource implementation, it would be nice to have this upstream to help others who may run into the same issue.

This PR came from the discussion in #565.
Unlike that PR, this one handles only UTF-8 BOMs, not other encodings, and does not pull in any new dependencies.

Adds a test with a UTF-8 BOM text file.
Updates FileSourceFile to skip the 3 BOM bytes if they are detected.
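
For readers who want to see the idea in isolation, here is a minimal sketch of BOM skipping using only the standard library. It is not the code from this PR: `strip_utf8_bom` and `read_text_skipping_bom` are hypothetical helper names chosen for illustration, and the real change lives inside FileSourceFile.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// The three bytes that encode U+FEFF (the byte order mark) in UTF-8.
const UTF8_BOM: &[u8] = &[0xEF, 0xBB, 0xBF];

/// Hypothetical helper (not part of the crate): returns `bytes` without a
/// leading UTF-8 BOM. Input that does not start with the full three-byte
/// sequence is returned unchanged.
fn strip_utf8_bom(bytes: &[u8]) -> &[u8] {
    if bytes.starts_with(UTF8_BOM) {
        &bytes[UTF8_BOM.len()..]
    } else {
        bytes
    }
}

/// Hypothetical helper (not part of the crate): reads a file as UTF-8 text,
/// skipping a leading BOM so a downstream parser never sees it.
fn read_text_skipping_bom(path: &Path) -> io::Result<String> {
    let bytes = fs::read(path)?;
    let text = std::str::from_utf8(strip_utf8_bom(&bytes))
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    Ok(text.to_owned())
}
```

Checking for the exact three-byte prefix keeps the behavior narrow: files in other encodings are left alone, which matches the scope described above.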


coveralls · 2025-07-09T15:33:06Z

Pull Request Test Coverage Report for Build 16156062432

Details

6 of 6 (100.0%) changed or added relevant lines in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.1%) to 64.77%

Totals
Change from base Build 16154172080:	0.1%
Covered Lines:	945
Relevant Lines:	1459

💛 - Coveralls

ldemidov added 2 commits July 8, 2025 18:34

Adds test (currently failing) to show that a UTF-8 file with BOM is not parsed properly when read using FileSourceFile

0680a28

Skip the UTF-8 BOM bytes when reading the file contents in FileSource.

588bd66

ldemidov mentioned this pull request Jul 8, 2025

Added BOM sniffing check to file load #565

Open

ldemidov changed the title ~~Support-utf8-bom-files~~ fix: Support reading from files that have an UTF-8 Byte Order Mark Jul 8, 2025

epage merged commit d8bdf0f into rust-cli:main Jul 9, 2025
14 of 15 checks passed
