Bug: email parser mishandles old-style boundaries #519

sebbASF · 2020-08-17T22:45:36Z

The code in Python's email package that parses boundary strings strips enclosing angle brackets (<>). This breaks parsing of some messages, for example the unit-test corpus file tomcat-ancient-boundary.mbox, which has the following boundary:

Content-Type: multipart/mixed; boundary="<<001-3e1dcd5a-119e>>"

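Once parsed, the boundary becomes "<001-3e1dcd5a-119e>", which no longer matches the delimiter lines in the message body. The value is unquoted twice: once to remove the double quotes, and again inside email.utils.collapse_rfc2231_value, whose unquote step also strips one pair of enclosing angle brackets.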
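The double unquoting can be reproduced with a small in-memory message that uses the same boundary. This is a sketch: the message body is made up for illustration (it is not the corpus file), and the parsing result shown is the behaviour described in the linked Python issues for the default compat32 parser, so it may differ on a Python version that has since changed `Message.get_boundary()`.

```python
import email
import email.utils

# Boundary value from tomcat-ancient-boundary.mbox; the body is made up for illustration.
RAW = (
    'Content-Type: multipart/mixed; boundary="<<001-3e1dcd5a-119e>>"\n'
    '\n'
    '--<<001-3e1dcd5a-119e>>\n'
    'Content-Type: text/plain\n'
    '\n'
    'hello\n'
    '--<<001-3e1dcd5a-119e>>--\n'
)

# Unquoting the raw parameter value twice: the first call removes the
# double quotes, the second removes one pair of enclosing angle brackets.
once = email.utils.unquote('"<<001-3e1dcd5a-119e>>"')
twice = email.utils.unquote(once)
print(once)   # <<001-3e1dcd5a-119e>>
print(twice)  # <001-3e1dcd5a-119e>

# The parser searches for "--" + get_boundary(), so it never sees the
# "--<<001-3e1dcd5a-119e>>" delimiter lines and keeps the body as plain text.
msg = email.message_from_string(RAW)
print(msg.get_boundary())
print(msg.is_multipart())
print([type(d).__name__ for d in msg.defects])
```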

There are two Python bug reports for this:
https://bugs.python.org/issue28945
https://bugs.python.org/issue29020
but, unfortunately, no fix is in sight.

It is possible to monkey-patch the library by supplying a replacement for the function email.utils.collapse_rfc2231_value.

It might make sense to add this as an importer option (at least initially), so that messages that are currently missed can be imported.

Attached is some test code to demonstrate the fix.

parse_email.py.zip

sebbASF added the bug label Aug 20, 2020

sebbASF mentioned this issue Feb 7, 2022

Bug: email parser mishandles old-style boundaries apache/incubator-ponymail-foal#231 (Open)
