You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The code that parses boundary strings strips <>. This breaks parsing of some messages, for example the unit test corpus file tomcat-ancient-boundary.mbox which has the following boundary:
The code that parses boundary strings strips <>. This breaks parsing of some messages, for example the unit test corpus file tomcat-ancient-boundary.mbox which has the following boundary:
Content-Type: multipart/mixed; boundary="<<001-3e1dcd5a-119e>>"
Once parsed, the boundary becomes "<001-3e1dcd5a-119e>" which does not match.
There are two bugs for this:
https://bugs.python.org/issue28945
https://bugs.python.org/issue29020
but unfortunately no fix in sight.
It's possible to monkey-patch the library by providing a replacement copy of the method email.utils.collapse_rfc2231_value.
It might make sense to add this as an option (at least initially) for the importer so that missing messages could be imported.
Attached is some test code to demonstrate the fix.
parse_email.py.zip
The text was updated successfully, but these errors were encountered: