ENH: add support for symbolic links in package repository #728

Merged 2 commits into mesonbuild:main from the sdist-symlinks branch on May 2, 2025

Conversation

dnicolodi
Member

@dnicolodi dnicolodi commented Mar 30, 2025

Symbolic links are included as regular files in the sdist archive. Only symbolic links pointing to files within the archive are supported. Replaces #713.

@dnicolodi dnicolodi force-pushed the sdist-symlinks branch 2 times, most recently from ef8925e to 9ac7f0a Compare March 30, 2025 14:11
@rgommers rgommers added the enhancement New feature or request label Apr 24, 2025
Contributor

@rgommers rgommers left a comment

Thanks @dnicolodi, this looks pretty good to me. The one thing I'd suggest is to emit warnings for symlinks that get ignored, rather than silently skipping over them. Something like:

WARNING: symlink pointing outside package ignored: symlinks-1.0.0/baz.py

WDYT?

@dnicolodi
Member Author

I think the warning is a good idea. It is probably a good idea to also warn about other archive members that are not handled: anything that is not a file, and symbolic links to directories, even if within the archive. The reason for not adding the warning was that the latest release that (accidentally) supported symlinks didn't warn either. I'll add the warnings.
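
For illustration, the behaviour discussed here could be sketched roughly as follows. This is a minimal, self-contained sketch and not the code from this PR: the helper names `symlink_target`, `build_demo_archive` and `regular_files`, the `pkg-1.0` archive layout, and the warning text are all assumptions made for the example. It only relies on the standard-library `tarfile` module.

```python
import io
import posixpath
import tarfile
import warnings


def symlink_target(member: tarfile.TarInfo, root: str):
    """Return the archive path a symlink points to, or None if unsupported."""
    if posixpath.isabs(member.linkname):
        return None
    target = posixpath.normpath(
        posixpath.join(posixpath.dirname(member.name), member.linkname))
    # Reject targets that escape the top-level directory of the archive.
    if not target.startswith(root + '/'):
        return None
    return target


def build_demo_archive() -> bytes:
    """Build a small in-memory archive: one file, one good link, one bad link."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w') as tar:
        data = b'print("hello")\n'
        info = tarfile.TarInfo('pkg-1.0/foo.py')
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
        for name, linkname in [('pkg-1.0/bar.py', 'foo.py'),
                               ('pkg-1.0/baz.py', '../outside.py')]:
            link = tarfile.TarInfo(name)
            link.type = tarfile.SYMTYPE
            link.linkname = linkname
            tar.addfile(link)
    return buf.getvalue()


def regular_files(archive: bytes, root: str) -> dict:
    """Map member names to contents, turning supported symlinks into files."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
        for member in tar.getmembers():
            if member.isfile():
                out[member.name] = tar.extractfile(member).read()
            elif member.issym():
                target = symlink_target(member, root)
                try:
                    source = tar.getmember(target) if target else None
                except KeyError:
                    source = None  # dangling link
                if source is None or not source.isfile():
                    warnings.warn(f'symbolic link ignored: {member.name}')
                    continue
                out[member.name] = tar.extractfile(source).read()
    return out
```

With the demo archive, `bar.py` ends up as a regular file with the contents of `foo.py`, while `baz.py` is skipped with a warning because its target lies outside `pkg-1.0/`.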

@dnicolodi dnicolodi force-pushed the sdist-symlinks branch 3 times, most recently from de51b39 to 4365161 Compare April 24, 2025 20:50
@dnicolodi
Member Author

Warnings added.

Symbolic links are included as regular files in the sdist archive.
Only symbolic links pointing to files within the archive are
supported.
member = copy.copy(meson_dist.getmember(target))
member.name = name
except KeyError:
warnings.warn(

Nice PR. I am curious - what would be the suggested approach to disable this warning?

In the original PR I saw nanoarrow discussed, so I just wanted to clarify the use case there. With nanoarrow being a multi-language project, the repo is laid out like:

c/
    <some_c_sources>
    meson.build
python/
    <some_python_sources>
    meson.build

So the point of the symlink in python/subprojects is to allow the python module to be built on the sources in the c directory. When the sdist is created, we are happy to not have any symlinked content included, as we expect the user to have the c sources installed on the system.

So in our case the warnings are unnecessary.

FWIW that comment applies to really all of the Arrow libraries (pyarrow, nanoarrow, adbc-driver-<driver_name>) as they are all multi-language projects

Member Author

The warning is just a warning that tells you that your symbolic links will not be included in the sdist, as the sdist format does not allow them. Would you prefer meson-python to drop the symlinks silently?

Anyhow, that source layout, unless there is a pyproject.toml in the root directory, like this

meson.build
pyproject.toml
c/
    <some_c_sources>
    meson.build
python/
    <some_python_sources>
    meson.build

would result in an invalid Python sdist when created with meson-python. Thus the fact that the warning is emitted in your case is completely irrelevant: you cannot use the result of the operation emitting a warning for anything useful, thus maybe do not perform the operation at all.

I think this way of organizing the project is problematic: it bundles the sources for the C and Python parts in the same repository, thus coupling them tightly, but at the same time wants to treat the Python part as a separate project which is loosely coupled to the C part at build time. I don't know how you came to choose this organization, but I have the impression that it creates way more problems than it solves.

What I would have done is to have the C and Python parts in two separate repositories. The Python part would bring in the C part either as a git submodule or as a Meson subproject or as a Meson subproject referenced as a git submodule.

But it is your project, so you can organize it as you like, and if you like mono-repos, you can have one. However, you are choosing to work against established best practices, thus do not expect tools to bend their workings to accommodate your choice.

Member Author

FWIW that comment applies to really all of the Arrow libraries (pyarrow, nanoarrow, adbc-driver-<driver_name>) as they are all multi-language projects

If that is the case, their repository layout cannot be packaged into a Python sdist, as symbolic links are not supported by the sdist standard. Anyway, their repository structure does not work for creating an sdist with meson-python. Thus I don't see the problem: these projects choose to make things complicated for themselves and will need to find an ad-hoc way to generate a valid sdist from their monorepo.

Member

It's also possible to only publish wheels, and not sdists, which is no doubt a tempting proposition to much scientific code that has painful sdist-building workflows (esp. doesn't work well at all on Windows)

you cannot use the result of the operation emitting a warning for anything useful, thus maybe do not perform the operation at all.

... but in this case that's exactly what their plan is, right? Not to use it, but only have it exist as a convenience when already building inside the monorepo?

In the case of nanoarrow, what we do is use a symlink for local development (so that a user can test C/python changes all at once), but for a sdist we end up using a custom dist script that replaces the symlink with a copy of the files needed to build a standalone Python library.

https://github.com/apache/arrow-nanoarrow/blob/4c1e484ca7d575250444e0a8eee5884e13489104/python/generate_dist.py
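
As a rough idea of what such a dist script can look like, here is a minimal sketch. It is not nanoarrow's `generate_dist.py`: the function name `materialize_symlinks` is hypothetical, and the sketch only handles links whose targets exist on disk when the script runs. It assumes the script is registered with Meson's `meson.add_dist_script()`, which runs it with the `MESON_DIST_ROOT` environment variable pointing at the staged dist tree.

```python
import os
import pathlib
import shutil


def materialize_symlinks(root: pathlib.Path) -> list:
    """Replace every resolvable symlink under root with a copy of its target."""
    replaced = []
    # Collect the links first so the tree is not modified while it is walked.
    links = [p for p in sorted(root.rglob('*')) if p.is_symlink()]
    for link in links:
        if not link.exists():
            # Dangling link: leave it alone, there is nothing to copy.
            continue
        target = link.resolve()
        link.unlink()
        if target.is_dir():
            shutil.copytree(target, link)
        else:
            shutil.copy2(target, link)
        replaced.append(link)
    return replaced


if __name__ == '__main__':
    # Only act when run by `meson dist`, which sets MESON_DIST_ROOT.
    dist_root = os.environ.get('MESON_DIST_ROOT')
    if dist_root:
        materialize_symlinks(pathlib.Path(dist_root))
```

A link that dangles inside the staged dist tree (for example one pointing at a sibling directory that is not part of the dist) would need to be copied from the original source tree instead, which this sketch does not attempt.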

  • I personally think that people who use monorepos are often (though not always) the same type of people that don't even want to create sdists at all

While the majority of users I would think are fine with pre-packaged wheels, I would be hesitant to rely on that exclusively. Using PyArrow as an example, there are legitimate use cases where end users do not want all of the standard functionality the package bundles with its C/C++ extensions, so they end up building from source with their own build options. I know that the version of PyArrow provided in lambda containers turns off some build features that the standard wheel provides, because AWS Lambda has certain size restrictions and the default build is too large.

Contributor

A dist script for sdist generation sounds quite reasonable.

Re the repo layout: this is actually pretty common especially for older C and C++ projects with optional Python bindings. It'd be good to support that use case explicitly via a test package; we've encountered it multiple times by now (not suggesting it needs to be part of this PR).

Thinking about it more: I like the nanoarrow repo setup I think. It's probably easier to deal with a C++ library in .. as a subproject than it would be to use the top-level meson.build file as the entry point for C++ and Python packages at once.

Contributor

I tested this PR with nanoarrow's main branch - sdist and wheel builds work as expected, and no warnings are visible. So I'll go ahead with merging this PR. We can open a new issue to keep track of the use case and adding a test package and/or some docs (I'm interested in doing that, but no time for it right now).

Contributor

done in gh-744

Thanks @rgommers !

@rgommers rgommers added this to the v0.19.0 milestone May 2, 2025
@dnicolodi dnicolodi mentioned this pull request May 2, 2025
Contributor

@rgommers rgommers left a comment

Thanks @dnicolodi, this looks good to me so time to give it a go!

@rgommers rgommers modified the milestones: v0.19.0, v0.18.0 May 2, 2025
@rgommers rgommers merged commit 34efb74 into mesonbuild:main May 2, 2025
39 checks passed
@rgommers
Contributor

rgommers commented May 3, 2025

Going through the release process turned up an issue - our own sdist (which I didn't test before🤦🏼) has problems:

Created /home/rgommers/code/meson-python/.mesonpy-f9vct4_v/meson-dist/meson-python-undefined.tar.gz
meson-python: warning: symbolic link with absolute path target, pointing outside the archive, or dangling ignored: meson-python-undefined/docs/changelog.rst
meson-python: warning: symbolic link with absolute path target, pointing outside the archive, or dangling ignored: meson-python-undefined/tests/packages/symlinks/baz.py
meson-python: warning: symbolic link with absolute path target, pointing outside the archive, or dangling ignored: meson-python-undefined/tests/packages/symlinks/qux.py
meson-python: warning: symbolic link with absolute path target, pointing outside the archive, or dangling ignored: meson-python-undefined/tests/packages/symlinks/submodule/__init__.py
meson-python: warning: symbolic link with absolute path target, pointing outside the archive, or dangling ignored: meson-python-undefined/tests/packages/symlinks/submodule/bbb.py

I haven't uploaded the release nor pushed the tag yet. It'd be good to resolve this first, I'll have a look now.

The 0.17.1 sdist on PyPI is also missing the changelog.rst file, so this isn't a new problem. That said, it doesn't point to outside the archive, so something seems wrong.

@rgommers
Contributor

rgommers commented May 3, 2025

The problem seems to be related to version handling, which messes with the TarInfo names:

In [1]: dist_name
Out[1]: 'meson_python-0.18.0'

In [2]: meson_version
Out[2]: 'undefined'

In [3]: meson_dist.getmembers()[:30]
Out[3]: 
[<TarInfo 'meson-python-undefined' at 0x76fa1e53af80>,
 <TarInfo 'meson_python-0.18.0/.cirrus.yml' at 0x76fa1e53aec0>,
 <TarInfo 'meson-python-undefined/.github' at 0x76fa1e53b040>,
 <TarInfo 'meson-python-undefined/.github/workflows' at 0x76fa1e53b100>,
 <TarInfo 'meson_python-0.18.0/.github/workflows/docs.yml' at 0x76fa1e53b280>,
 <TarInfo 'meson_python-0.18.0/.github/workflows/tests.yml' at 0x76fa1e53b1c0>,
 <TarInfo 'meson_python-0.18.0/.gitignore' at 0x76fa1e53b400>,
 <TarInfo 'meson_python-0.18.0/.mailmap' at 0x76fa1e53b340>,
 <TarInfo 'meson_python-0.18.0/.pre-commit-config.yaml' at 0x76fa1e53b4c0>,
 <TarInfo 'meson_python-0.18.0/.readthedocs.yaml' at 0x76fa1e53b580>,
 <TarInfo 'meson_python-0.18.0/CHANGELOG.rst' at 0x76fa1e53b700>,
 <TarInfo 'meson_python-0.18.0/LICENSE' at 0x76fa1e53b7c0>,
 <TarInfo 'meson-python-undefined/LICENSES' at 0x76fa1e53b640>,
 <TarInfo 'meson_python-0.18.0/LICENSES/MIT.txt' at 0x76fa1e53b880>,
 <TarInfo 'meson_python-0.18.0/README.rst' at 0x76fa1e53ba00>,
 <TarInfo 'meson_python-0.18.0/RELEASE.rst' at 0x76fa1e53b940>,
 <TarInfo 'meson-python-undefined/ci' at 0x76fa1e53bac0>,
 <TarInfo 'meson_python-0.18.0/ci/alpine-3.docker' at 0x76fa1e53bc40>,
 <TarInfo 'meson_python-0.18.0/ci/archlinux.docker' at 0x76fa1e53bdc0>,
 <TarInfo 'meson_python-0.18.0/ci/debian-11.docker' at 0x76fa1e53bd00>,
 <TarInfo 'meson_python-0.18.0/ci/debian-12.docker' at 0x76fa1e53be80>,
 <TarInfo 'meson_python-0.18.0/ci/debian-unstable.docker' at 0x76fa1e53bf40>,
 <TarInfo 'meson_python-0.18.0/ci/fedora-41.docker' at 0x76fa1e3f0100>,
 <TarInfo 'meson_python-0.18.0/ci/manylinux.docker' at 0x76fa1e3f0040>,
 <TarInfo 'meson_python-0.18.0/ci/miniconda.docker' at 0x76fa1e3f01c0>,
 <TarInfo 'meson_python-0.18.0/ci/opensuse-15.docker' at 0x76fa1e53bb80>,
 <TarInfo 'meson_python-0.18.0/codecov.yml' at 0x76fa1e3f0340>,
 <TarInfo 'meson-python-undefined/docs' at 0x76fa1e3f0280>,
 <TarInfo 'meson_python-0.18.0/docs/about.rst' at 0x76fa1e3f0400>,
 <TarInfo 'meson-python-undefined/docs/changelog.rst' at 0x76fa1e3f0580>]

In [4]: target
Out[4]: 'meson-python-undefined/CHANGELOG.rst'

In [5]: name
Out[5]: 'meson-python-undefined/docs/changelog.rst'

The problem is that the list of meson_dist members being iterated over is being modified during iteration, by:

                    # Rewrite the path to match the sdist distribution name.
                    stem = member.name.split('/', 1)[1]
                    member.name = '/'.join((dist_name, stem))

That's probably also why the added test in this PR passed: whether there's a problem or not depends on iteration order.

@rgommers
Contributor

rgommers commented May 3, 2025

Okay, problem one is solved by inserting member = copy.copy(member).
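
A minimal reproduction of the aliasing, using only the standard-library `tarfile` module: `getmembers()` hands out the same `TarInfo` objects that `getmember()` later searches by name, so renaming one in place hides it from lookups under its old name, while renaming a copy does not. The archive contents and the helper names `make_archive` and `lookup_after_rename` are made up for this example.

```python
import copy
import io
import tarfile


def make_archive() -> bytes:
    """Build a tiny in-memory archive with two empty files."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w') as tar:
        for name in ('pkg-undefined/a.txt', 'pkg-undefined/b.txt'):
            tar.addfile(tarfile.TarInfo(name))
    return buf.getvalue()


def lookup_after_rename(archive: bytes, use_copy: bool) -> bool:
    """Rename the first member, then look it up under its original name."""
    with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
        member = tar.getmembers()[0]
        original = member.name
        if use_copy:
            # Work on a copy so the TarFile's own member list stays untouched.
            member = copy.copy(member)
        member.name = 'pkg-1.0/' + original.split('/', 1)[1]
        try:
            tar.getmember(original)
        except KeyError:
            return False
        return True


archive = make_archive()
print(lookup_after_rename(archive, use_copy=False))  # in-place rename
print(lookup_after_rename(archive, use_copy=True))   # rename on a copy
```

This is also why the outcome depends on iteration order: a symlink processed before its target has been renamed still finds it, one processed afterwards does not.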

Problem two is these warnings remaining:

Created /home/rgommers/code/meson-python/.mesonpy-jxzty9ou/meson-dist/meson-python-undefined.tar.gz
meson-python: warning: symbolic link with absolute path target, pointing outside the archive, or dangling ignored: meson-python-undefined/tests/packages/symlinks/baz.py
meson-python: warning: symbolic link with absolute path target, pointing outside the archive, or dangling ignored: meson-python-undefined/tests/packages/symlinks/qux.py

That's very minor though - and it doesn't really matter if we put them back or not as regular files, because it doesn't affect the test suite. The one reason to add a dist script to include them in our sdist is to silence the warnings. Worth doing still though, because warnings like that are distracting.

ntBre added a commit to astral-sh/ruff that referenced this pull request May 9, 2025
Summary
--

This should resolve the formatter ecosystem errors we've been seeing lately.
mesonbuild/meson-python#728 added the links, which I
think are intentionally broken for testing purposes.

Test Plan
--

Ecosystem check on this PR
QuLogic added a commit to QuLogic/matplotlib that referenced this pull request May 10, 2025
Version 0.18 should restore handling of symlinks:
mesonbuild/meson-python#728
oscargus pushed a commit to matplotlib/matplotlib that referenced this pull request May 10, 2025
Version 0.18 should restore handling of symlinks:
mesonbuild/meson-python#728