Description
Hi! 👋
What's the problem this feature will solve?
When following the pyproject.toml specification and using build backends such as PDM-Backend (which uses the pyproject-metadata package), an author whose name contains non-ASCII characters (e.g., João Palmeiro) and who has an email address is written as =?utf-8?q?Jo=C3=A3o_Palmeiro?= <[email protected]> in the Author-email core metadata field.
The Author-email core metadata field is used to populate the Author sidebar field on a package page.
The pyproject-metadata package uses the email.utils.formataddr() function to process the values of the authors field of the pyproject.toml file. This function encodes names following RFC 2047 if they contain non-ASCII characters (the default charset is utf-8), and it is this encoded value (e.g., =?utf-8?q?Jo=C3=A3o_Palmeiro?=) that is written to metadata files like PKG-INFO:
```
Metadata-Version: 2.1
Name: template-python-pdm-package
Version: 0.0.0
Summary: Opinionated Python + PDM template for new packages.
Author-Email: =?utf-8?q?Jo=C3=A3o_Palmeiro?= <[email protected]>
...
```
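The encoding step can be reproduced with the standard library alone. This is a minimal sketch that calls email.utils.formataddr() directly rather than going through pyproject-metadata (whose internals are not reproduced here); the email addresses are hypothetical placeholders:

```python
from email.utils import formataddr

# Hypothetical author entries, as they might appear in the authors field of pyproject.toml.
authors = [
    ("João Palmeiro", "joao@example.com"),
    ("Foo Bar", "foo@example.com"),
]

for name, email in authors:
    # formataddr() RFC 2047-encodes the display name only when it contains
    # non-ASCII characters (charset defaults to "utf-8"); ASCII names pass through.
    print(formataddr((name, email)))
```

For the first entry this prints =?utf-8?q?Jo=C3=A3o_Palmeiro?= <joao@example.com>: an RFC 2047 "encoded word" (=?charset?encoding?text?=), in this case quoted-printable, where _ stands for a space.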
As a concrete example, please check the FastAPI package page on PyPI: instead of Sebastián Ramírez, the author's name appears as =?utf-8?q?Sebasti=C3=A1n_Ram=C3=ADrez?=.
In my opinion, given that the specification refers to RFC 822, and that using the email.utils.formataddr() function or the pyproject-metadata package in build backends (current or future) is a valid approach, Warehouse/PyPI should decode RFC 2047-encoded author names. This way, authors' names would be displayed as expected in the Author sidebar field regardless of the build backend, that is, with the same characters used in the pyproject.toml file.
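Decoding is the inverse operation and is also available in the standard library. A minimal sketch using email.header.decode_header() and make_header() on the FastAPI value quoted above (the same pair of calls the proposed filter change relies on):

```python
from email.header import decode_header, make_header

# Encoded display name as it appears in the FastAPI example above.
encoded = "=?utf-8?q?Sebasti=C3=A1n_Ram=C3=ADrez?="

# decode_header() splits the value into (bytes-or-str, charset) chunks, one per
# encoded word; make_header() rebuilds a Header whose str() is the decoded text.
decoded = str(make_header(decode_header(encoded)))
print(decoded)  # Sebastián Ramírez

# Names without encoded words come back unchanged, so decoding unconditionally
# should be safe for values written by backends that do not encode names.
plain = str(make_header(decode_header("Sebastián Ramírez")))
print(plain)  # Sebastián Ramírez
```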
Describe the solution you'd like
Instead of =?utf-8?q?Jo=C3=A3o_Palmeiro?=, I would like to see João Palmeiro in the Author sidebar field on a package page, regardless of the build backend used (this is not an issue when using Hatchling, for example).
So, I propose the following changes (or similar ones) to the format_email filter and its unit test:
```diff
+from email.header import decode_header, make_header
 def format_email(metadata_email: str) -> tuple[str, str]:
     """
     Return the name and email address from a metadata RFC-822 string.
+    RFC 2047-encoded names are supported and decoded accordingly.
     Use Jinja's `first` and `last` to access each part in a template.
     TODO: Support more than one email address, per RFC-822.
     """
     emails = []
     for name, email in getaddresses([metadata_email]):
+        name = str(make_header(decode_header(name)))
         if "@" not in email:
             return name, ""
         emails.append((name, email))
     return emails[0][0], emails[0][1]
```
```diff
 @pytest.mark.parametrize(
     ("meta_email", "expected_name", "expected_email"),
     [
         ("not-an-email-address", "", ""),
         ("[email protected]", "", "[email protected]"),
         ('"Foo Bar" <[email protected]>', "Foo Bar", "[email protected]"),
+        ('=?utf-8?q?Jo=C3=A3o_Bar?= <[email protected]>', "João Bar", "[email protected]"),
     ],
 )
 def test_format_email(meta_email, expected_name, expected_email):
     name, email = filters.format_email(meta_email)
     assert name == expected_name
     assert email == expected_email
```
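For quick experimentation outside Warehouse, the patched filter can be run on its own. This is a minimal standalone sketch, assuming getaddresses comes from email.utils (the diff does not show that import); the example.com addresses are hypothetical placeholders, and the last case mimics a backend that writes a non-ASCII name without encoding it:

```python
from email.header import decode_header, make_header
from email.utils import getaddresses


def format_email(metadata_email: str) -> tuple[str, str]:
    """
    Return the name and email address from a metadata RFC-822 string.

    Standalone sketch of the proposed change (not Warehouse's actual module):
    RFC 2047-encoded display names are decoded, plain names pass through.
    """
    emails = []
    for name, email in getaddresses([metadata_email]):
        # Decode RFC 2047 encoded words such as "=?utf-8?q?...?=".
        name = str(make_header(decode_header(name)))
        if "@" not in email:
            return name, ""
        emails.append((name, email))
    return emails[0][0], emails[0][1]


cases = [
    "not-an-email-address",
    "foo@example.com",
    '"Foo Bar" <foo@example.com>',
    "=?utf-8?q?Jo=C3=A3o_Bar?= <foo@example.com>",
    "João Bar <foo@example.com>",  # unencoded non-ASCII name
]
for case in cases:
    print(format_email(case))
```

Because decoding runs before the "@" check, the early-return path also yields a decoded name; like the original, the sketch still returns only the first address.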
Let me know what you think and whether I can open a PR. Thanks!
Additional context
References:
- https://backend.pdm-project.org/
- https://github.com/pypa/pyproject-metadata
- https://docs.python.org/3/library/email.header.html
- https://packaging.python.org/en/latest/specifications/core-metadata/#author-email
- https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#authors-maintainers
- https://packaging.python.org/en/latest/specifications/pyproject-toml/#authors-maintainers
Related issues/discussions: