
Account for missing and invalid address headers #134


Open: wants to merge 4 commits into base `develop`

seanthegeek


Python 3.13 made strict email address format validation the [default](python/cpython@4a153a1), and the change was backported to older Python releases for [security reasons](GHSA-5mwm-wccq-xqcp).

As a result, when an address header is missing or malformed, `mail-parser` returns `[('', '')]` for it (issues SpamScope#132 and SpamScope#133).
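A minimal reproduction of the missing-header case, assuming (for illustration only; the PR does not show the old code) that an absent header ends up being parsed as an empty string:

```python
from email.utils import getaddresses

# Assumption: an absent header is handed to the parser as an empty string.
empty_result = getaddresses([""])

# With strict parsing (the default on Python 3.13 and on the patched older
# releases) this is the placeholder [('', '')]; without the change it is [].
print(empty_result)

# Either way, the result contains no real address.
print(any(address for _, address in empty_result))
```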

To fix these issues while maintaining the security of the default `strict=True` option in `email.utils.parseaddr` and `email.utils.getaddresses`, the following changes were made to `mail-parser`:

- The existing `ADDRESSES_HEADERS` constant now includes only headers that can contain multiple addresses:
  - `bcc`
  - `cc`
  - `reply-to`
  - `to`
- A new `ADDRESS_HEADERS` constant lists headers that can contain only one address:
  - `delivered-to`
  - `from`
  - `sender`
- Header parsing is only attempted if the header exists and has a value (Closes SpamScope#133, "Junk data returned for address headers that are not used")
- Headers in the `ADDRESS_HEADERS` list are parsed with `email.utils.parseaddr` instead of `email.utils.getaddresses`, so each one yields a single `(name, address)` tuple instead of a list of tuples
- If a header in either list is invalid, the string `Invalid {} header` (where `{}` is the header name) is added to the `defects` list and `has_defects` is set to `True`
- If `email.utils.parseaddr` considers a header in `ADDRESS_HEADERS` invalid, the header is parsed manually to show what the defective header is meant to display in mail clients (Closes SpamScope#132, "An email address will not be parsed if the Real Name is also an email address")

## Demo email 1

```email
From: [email protected] <[email protected]>
To: [email protected]
Subject: Example Email

Hello world!
```
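Demo email 1's `From` header uses an unquoted email address as its display name. The sketch below compares strict and legacy parsing of a header with the same shape; the address is a hypothetical stand-in, and the `strict` keyword is only checked for because Python releases without the change do not accept it.

```python
import inspect
from email.utils import parseaddr

# Hypothetical stand-in for Demo email 1's From header.
value = "alice@example.com <alice@example.com>"

supports_strict = "strict" in inspect.signature(parseaddr).parameters
if supports_strict:
    strict_result = parseaddr(value)             # strict=True is the default
    lax_result = parseaddr(value, strict=False)  # legacy behaviour
else:
    strict_result = None                          # this Python predates the change
    lax_result = parseaddr(value)

print(strict_result, lax_result)
```

Strict mode rejects the header because it parses as more than one address; that rejection is what now triggers the `Invalid from header` defect and the manual fallback.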

## Demo 1 JSON output before the changes

```json
{
  "from": [
    [
      "",
      ""
    ]
  ],
  "delivered-to": [
    [
      "",
      ""
    ]
  ],
  "cc": [
    [
      "",
      ""
    ]
  ],
  "body": "Hello world!",
  "to_domains": [
    "",
    "example.com"
  ],
  "reply-to": [
    [
      "",
      ""
    ]
  ],
  "subject": "Example Email",
  "bcc": [
    [
      "",
      ""
    ]
  ],
  "to": [
    [
      "",
      "[email protected]"
    ]
  ],
  "has_defects": false
}
```

## Demo 1 JSON output after the changes

```json
{
  "to": [
    [
      "",
      "[email protected]"
    ]
  ],
  "body": "Hello world!",
  "from": [
    "[email protected]",
    "[email protected]"
  ],
  "subject": "Example Email",
  "to_domains": [
    "example.com"
  ],
  "has_defects": true,
  "defects": [
    "Invalid from header"
  ],
  "defects_categories": []
}
```

## Demo email 2

```email
From: [email protected]
To: [email protected]
Subject: Example Email

Hello world!
```

## Demo 2 JSON output before the changes

```json
{
  "from": [
    [
      "",
      ""
    ]
  ],
  "delivered-to": [
    [
      "",
      ""
    ]
  ],
  "cc": [
    [
      "",
      ""
    ]
  ],
  "body": "Hello world!",
  "to_domains": [
    "",
    "example.com"
  ],
  "reply-to": [
    [
      "",
      ""
    ]
  ],
  "subject": "Example Email",
  "bcc": [
    [
      "",
      ""
    ]
  ],
  "to": [
    [
      "",
      "[email protected]"
    ]
  ],
  "has_defects": false
}
```

## Demo 2 JSON output after the changes

```json
{
  "to": [
    [
      "",
      "[email protected]"
    ]
  ],
  "body": "Hello world!",
  "from": [
    "",
    "[email protected]"
  ],
  "subject": "Example Email",
  "to_domains": [
    "example.com"
  ],
  "has_defects": false,
  "defects": [],
  "defects_categories": []
}
```
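The flat two-element `from` array in the output above follows from `parseaddr` returning a single tuple, which JSON encodes as an array, while `getaddresses` still yields a list of tuples for `to`. A small illustration with hypothetical addresses:

```python
import json
from email.utils import getaddresses, parseaddr

# Hypothetical addresses: parseaddr() returns one (name, address) tuple and
# getaddresses() returns a list of such tuples; json encodes tuples as arrays.
headers = {
    "from": parseaddr("alice@example.com"),
    "to": getaddresses(["bob@example.com"]),
}
print(json.dumps(headers))
# {"from": ["", "alice@example.com"], "to": [["", "bob@example.com"]]}
```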
seanthegeek marked this pull request as draft on March 24, 2025 02:25
seanthegeek (Author) commented Mar 24, 2025

Switched this to a draft because some attributes aren't working correctly. I'll look into this tomorrow.

seanthegeek (Author) commented:

Fixed

seanthegeek marked this pull request as ready for review on March 24, 2025 12:34
seanthegeek changed the title from "Account for empty and invalid address headers" to "Account for missing and invalid address headers" on Mar 24, 2025