Display RFC 2047-encoded author names correctly in the sidebar of a package page

Hi! 👋 

**What's the problem this feature will solve?**

When following the [`pyproject.toml` specification](https://packaging.python.org/en/latest/specifications/pyproject-toml/#authors-maintainers) with a build backend such as [PDM-Backend](https://backend.pdm-project.org/) (which uses the [pyproject-metadata](https://github.com/pypa/pyproject-metadata) package), an author who has an email address and a name containing non-ASCII characters (e.g., João Palmeiro) is written to the `Author-email` core metadata field as `=?utf-8?q?Jo=C3=A3o_Palmeiro?= <joaopalmeiro@proton.me>`.

The `Author-email` core metadata field is used to populate the [`Author` sidebar field](https://github.com/pypi/warehouse/blob/b419f327fe25e75e0c21a20860712418db3bd025/warehouse/templates/includes/packaging/project-data.html#L127-L128) for a package page.

The pyproject-metadata package [leverages the `email.utils.formataddr()` function](https://github.com/pypa/pyproject-metadata/blob/154e6b670bd047d9869db36584be89df32df0614/pyproject_metadata/__init__.py#L360-L363) to process the values of the `authors` field of the `pyproject.toml` file. This function encodes names following RFC 2047 if they have non-ASCII characters (the default `charset` is `utf-8`), and it is this value (e.g., `=?utf-8?q?Jo=C3=A3o_Palmeiro?=`) that is written to metadata files like `PKG-INFO`:

```plain
Metadata-Version: 2.1
Name: template-python-pdm-package
Version: 0.0.0
Summary: Opinionated Python + PDM template for new packages.
Author-Email: =?utf-8?q?Jo=C3=A3o_Palmeiro?= <joaopalmeiro@proton.me>
...
```
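The encoding step described above can be reproduced with the standard library alone. This is a minimal sketch, not the pyproject-metadata code itself: it calls `email.utils.formataddr()` directly and then parses the result with `email.utils.getaddresses()`, which is the function the `format_email` filter relies on. The variable names are illustrative only.

```python
from email.utils import formataddr, getaddresses

# A display name with non-ASCII characters is RFC 2047-encoded
# (the default charset is utf-8).
encoded = formataddr(("João Palmeiro", "joaopalmeiro@proton.me"))
print(encoded)

# An ASCII-only display name is written as-is.
plain = formataddr(("Foo Bar", "foo@bar.com"))
print(plain)

# getaddresses() splits the name from the address,
# but it does not decode the RFC 2047 encoded word.
name, address = getaddresses([encoded])[0]
print(name)
print(address)
```

The parsed name is still the encoded word, which is why it reaches the sidebar template undecoded.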

As a concrete example, see the [FastAPI package](https://pypi.org/project/fastapi/):

<img width="1582" alt="image" src="https://github.com/user-attachments/assets/705c879b-07b0-4b2a-8702-e028a59b355f">

Instead of [Sebastián Ramírez](https://github.com/fastapi/fastapi/blob/a3f42718de8a6da2e89d476df92e6e81d7eb24e1/pyproject.toml#L12), the author's name appears as `=?utf-8?q?Sebasti=C3=A1n_Ram=C3=ADrez?=`.

Given that the [specification talks about RFC-822](https://packaging.python.org/en/latest/specifications/core-metadata/#author-email) and that using the `email.utils.formataddr()` function or the pyproject-metadata package in build backends (current or future ones) is a valid approach, I believe Warehouse/PyPI should decode RFC 2047-encoded author names. That way, authors' names are displayed in the `Author` sidebar field as expected, that is, with the characters used in the `pyproject.toml` file, independently of the build backend.

**Describe the solution you'd like**

Instead of `=?utf-8?q?Jo=C3=A3o_Palmeiro?=`, I would like to see `João Palmeiro` in the `Author` sidebar field on a package page regardless of the build backend used (given that this is not an issue when using [Hatchling](https://pypi.org/project/hatchling/), for example).

So, I propose the following changes (or similar ones) to the [`format_email` filter](https://github.com/pypi/warehouse/blob/b419f327fe25e75e0c21a20860712418db3bd025/warehouse/filters.py#L178-L189) and its [unit test](https://github.com/pypi/warehouse/blob/b419f327fe25e75e0c21a20860712418db3bd025/tests/unit/test_filters.py#L274-L285):

```diff
+ from email.header import decode_header, make_header


def format_email(metadata_email: str) -> tuple[str, str]:
    """
    Return the name and email address from a metadata RFC-822 string.
+   RFC 2047-encoded names are supported and decoded accordingly.
    Use Jinja's `first` and `last` to access each part in a template.
    TODO: Support more than one email address, per RFC-822.
    """
    emails = []
    for name, email in getaddresses([metadata_email]):
+       name = str(make_header(decode_header(name)))
        if "@" not in email:
            return name, ""
        emails.append((name, email))
    return emails[0][0], emails[0][1]
```

```diff
@pytest.mark.parametrize(
    ("meta_email", "expected_name", "expected_email"),
    [
        ("not-an-email-address", "", ""),
        ("foo@bar.com", "", "foo@bar.com"),
        ('"Foo Bar" <foo@bar.com>', "Foo Bar", "foo@bar.com"),
+       ('=?utf-8?q?Jo=C3=A3o_Bar?= <joao@bar.com>', "João Bar", "joao@bar.com"),
    ],
)
def test_format_email(meta_email, expected_name, expected_email):
    name, email = filters.format_email(meta_email)
    assert name == expected_name
    assert email == expected_email
```
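To try the idea outside of Warehouse, the proposed change can be sketched as a standalone function. This is a simplified stand-in under stated assumptions, not the actual filter: `decode_author` is a hypothetical name, only the first address is considered, and `author@example.com` is a placeholder address.

```python
from email.header import decode_header, make_header
from email.utils import getaddresses


def decode_author(metadata_email: str) -> tuple[str, str]:
    """Hypothetical standalone version of the proposed `format_email` change.

    Returns the (name, email) pair of the first address, with an
    RFC 2047-encoded name decoded to plain text.
    """
    for name, email in getaddresses([metadata_email]):
        # decode_header() splits the name into (bytes or str, charset) parts;
        # make_header() joins them back into a single readable string.
        name = str(make_header(decode_header(name)))
        if "@" not in email:
            return name, ""
        return name, email
    return "", ""


# The FastAPI author name from the example above, with a placeholder address.
fastapi_author = decode_author(
    "=?utf-8?q?Sebasti=C3=A1n_Ram=C3=ADrez?= <author@example.com>"
)
print(fastapi_author)

# Names that are not encoded pass through unchanged.
print(decode_author('"Foo Bar" <foo@bar.com>'))
```

One point to consider for a PR: `decode_header()` and `make_header()` can raise on malformed encoded words, so the real filter may need to fall back to the raw name in that case.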

Let me know what you think and if I can open a PR. Thanks!

**Additional context**

References:

- https://backend.pdm-project.org/
- https://github.com/pypa/pyproject-metadata
- https://docs.python.org/3/library/email.header.html
- https://packaging.python.org/en/latest/specifications/core-metadata/#author-email
- https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#authors-maintainers
- https://packaging.python.org/en/latest/specifications/pyproject-toml/#authors-maintainers

Related issues/discussions:

- https://github.com/pypi/warehouse/issues/9400
- https://github.com/pypi/warehouse/issues/14813
- https://discuss.python.org/t/core-metadata-email-fields-unicode/7421/8

Display RFC 2047-encoded author names correctly in the sidebar of a package page #16496
