
Bug: email parser mishandles old-style boundaries #519

Open
sebbASF opened this issue Aug 17, 2020 · 0 comments

sebbASF commented Aug 17, 2020

The Python email library code that parses boundary strings strips one pair of enclosing angle brackets (<>) from the value. This breaks parsing of some messages, for example the unit test corpus file tomcat-ancient-boundary.mbox, which has the following boundary:

Content-Type: multipart/mixed; boundary="<<001-3e1dcd5a-119e>>"

Once parsed, the boundary becomes "<001-3e1dcd5a-119e>", which does not match the boundary actually used in the message.
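
A minimal sketch of how the bracket loss can arise, assuming (as the linked Python bugs suggest) that the value ends up being unquoted more than once. `email.utils.unquote` removes either a pair of double quotes or a pair of angle brackets, so a second pass eats the outer `<>`:

```python
from email.utils import unquote

raw = '"<<001-3e1dcd5a-119e>>"'

# First pass: the surrounding double quotes are removed.
once = unquote(raw)

# Second pass: the value now starts with '<' and ends with '>',
# so one pair of angle brackets is removed as well.
twice = unquote(once)

print(once)
print(twice)
```

The second value is the one that no longer matches the boundary in the message.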

Two upstream Python bugs cover this:
https://bugs.python.org/issue28945
https://bugs.python.org/issue29020
Unfortunately, there is no fix in sight.

It's possible to monkey-patch the library by providing a replacement copy of the method email.utils.collapse_rfc2231_value.
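
A rough sketch of what such a monkey-patch could look like. This is not the attached test code: the names `unquote_keep_angles`, `collapse_rfc2231_value_keep_angles` and `install_patch` are hypothetical, and the replacement is assumed to mirror the standard library function except that it only strips double quotes.

```python
import email.utils


def unquote_keep_angles(value):
    """Strip one pair of surrounding double quotes; leave <...> untouched."""
    if len(value) > 1 and value.startswith('"') and value.endswith('"'):
        return value[1:-1].replace('\\\\', '\\').replace('\\"', '"')
    return value


def collapse_rfc2231_value_keep_angles(value, errors='replace',
                                       fallback_charset='us-ascii'):
    """Replacement for email.utils.collapse_rfc2231_value (hypothetical)."""
    # Plain (non-RFC 2231) values: only remove quotes, never angle brackets.
    if not isinstance(value, tuple) or len(value) != 3:
        return unquote_keep_angles(value)
    # RFC 2231 values arrive as (charset, language, text).
    charset, _language, text = value
    if charset is None:
        charset = fallback_charset
    rawbytes = bytes(text, 'raw-unicode-escape')
    try:
        return str(rawbytes, charset, errors)
    except LookupError:
        # Unknown charset: fall back to the undecoded text.
        return unquote_keep_angles(text)


def install_patch():
    """Swap the library function for the replacement (process-wide)."""
    email.utils.collapse_rfc2231_value = collapse_rfc2231_value_keep_angles
```

Calling `install_patch()` before parsing affects every user of `email.utils.collapse_rfc2231_value` in the process, which is one reason to keep it behind an explicit opt-in.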

It might make sense to add this patch as an option (at least initially) for the importer, so that messages that were previously missed could be imported.

Attached is some test code to demonstrate the fix.

parse_email.py.zip
