
ENH: add support for symbolic links in package repository #728


Merged: 2 commits, May 2, 2025
33 changes: 33 additions & 0 deletions mesonpy/__init__.py
@@ -14,6 +14,7 @@
import argparse
import collections
import contextlib
import copy
import difflib
import functools
import importlib.machinery
@@ -23,6 +24,7 @@
import os
import pathlib
import platform
import posixpath
import re
import shutil
import subprocess
@@ -937,6 +939,33 @@

        with tarfile.open(meson_dist_path, 'r:gz') as meson_dist, mesonpy._util.create_targz(sdist_path) as sdist:
            for member in meson_dist.getmembers():
                # Recursively resolve symbolic links. The source distribution
                # archive format specification allows for symbolic links as
                # long as the target path does not include a '..' component.
                # This makes symbolic links support unusable in most cases,
                # therefore include the symbolic link targets as regular files
                # in all cases.
                while member.issym():
                    name = member.name
                    target = posixpath.normpath(posixpath.join(posixpath.dirname(member.name), member.linkname))
                    try:
                        # This can be implemented using the .replace() method
                        # in Python 3.12 and later. The .replace() method was
                        # added as part of PEP 706 and back-ported to Python
                        # 3.9 and later in patch releases, thus it cannot be
                        # relied upon until the minimum supported Python
                        # version is 3.12.
                        member = copy.copy(meson_dist.getmember(target))
                        member.name = name
                    except KeyError:
                        warnings.warn(

Nice PR. I am curious - what would be the suggested approach to disable this warning?

In the original PR I saw nanoarrow discussed, so I just wanted to clarify the use case there. With nanoarrow being a multi-language project, the repo is laid out like:

c/
    <some_c_sources>
    meson.build
python/
    <some_python_sources>
    meson.build

So the point of the symlink in python/subprojects is to allow the python module to be built on the sources in the c directory. When the sdist is created, we are happy to not have any symlinked content included, as we expect the user to have the c sources installed on the system.

So in our case the warnings are unnecessary
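
For readers with the same question, here is a minimal sketch of how warnings like these could be filtered. It assumes they are ordinary `UserWarning`s raised through `warnings.warn()` with messages that start with `symbolic link`, as in the diff above. The helper `emit_example_warnings` is hypothetical and only imitates those messages. Whether a filter installed this way reaches a build started by an isolated build frontend is not confirmed anywhere in this thread.

```python
import warnings


def emit_example_warnings() -> None:
    """Hypothetical stand-in that imitates the messages shown in the diff."""
    warnings.warn('symbolic link pointing to a directory ignored: pkg/link', stacklevel=1)
    warnings.warn('special file in the source archive ignored: pkg/fifo', stacklevel=1)


with warnings.catch_warnings(record=True) as caught:
    # Record every warning, then drop those whose message starts with
    # 'symbolic link'. The message argument is a regular expression that
    # is matched against the beginning of the warning text.
    warnings.simplefilter('always')
    warnings.filterwarnings('ignore', message='symbolic link')
    emit_example_warnings()

remaining = [str(w.message) for w in caught]
print(remaining)
```

The `@pytest.mark.filterwarnings('ignore:symbolic link')` marker used by the test added in this PR relies on the same message prefix.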


FWIW that comment applies to really all of the Arrow libraries (pyarrow, nanoarrow, adbc-driver-<driver_name>) as they are all multi-language projects

Member Author

The warning is just a warning that tells you that your symbolic links will not be included in the sdist, as the sdist format does not allow them. Would you prefer meson-python to drop the symlinks silently?

Anyhow, that source layout, unless there is a pyproject.toml in the root directory, like this

meson.build
pyproject.toml
c/
    <some_c_sources>
    meson.build
python/
    <some_python_sources>
    meson.build

would result in an invalid Python sdist when created with meson-python. Thus the fact that the warning is emitted in your case is completely irrelevant: you cannot use the result of the operation emitting a warning for anything useful, thus maybe do not perform the operation at all.

I think this way of organizing the project is problematic: it bundles the sources for the C and Python parts in the same repository, thus coupling them tightly, but at the same time wants to treat the Python part as a separate project which is loosely coupled to the C part at build time. I don't know how you came to choose this organization, but I have the impression that it creates way more problems than it solves.

What I would have done is to have the C and Python parts in two separate repositories. The Python part would bring in the C part either as a git submodule or as a Meson subproject or as a Meson subproject referenced as a git submodule.

But it is your project, so you can organize it as you like, and if you like mono-repos, you can have one. However, you are choosing to work against established best practices, thus do not expect tools to bend their workings to accommodate your choice.

Member Author

FWIW that comment applies to really all of the Arrow libraries (pyarrow, nanoarrow, adbc-driver-<driver_name>) as they are all multi-language projects

If that is the case, their repository layout cannot be packaged into a Python sdist as symbolic links are not supported by the sdist standard. Anyway, their repository structure does not work for creating an sdist with meson-python. Thus I don't see the problem: these projects choose to make things complicated for themselves and will need to find an ad-hoc way to generate a valid sdist from their monorepo.

Member

It's also possible to only publish wheels, and not sdists, which is no doubt a tempting proposition for much scientific code that has painful sdist-building workflows (esp. ones that don't work well at all on Windows).

you cannot use the result of the operation emitting a warning for anything useful, thus maybe do not perform the operation at all.

... but in this case that's exactly what their plan is, right? Not to use it, but only have it exist as a convenience when already building inside the monorepo?


In the case of nanoarrow, what we do is use a symlink for local development (so that a user can test C/python changes all at once), but for a sdist we end up using a custom dist script that replaces the symlink with a copy of the files needed to build a standalone Python library.

https://github.com/apache/arrow-nanoarrow/blob/4c1e484ca7d575250444e0a8eee5884e13489104/python/generate_dist.py
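
The idea of swapping a development symlink for a copy of its target can be sketched as follows. This is not the nanoarrow script linked above; `replace_symlink_with_copy` is a hypothetical helper, and the `c/` and `python/subprojects/` paths only mirror the layout described earlier in this thread. In a real Meson dist script the root directory would presumably come from the `MESON_DIST_ROOT` environment variable rather than a temporary directory.

```python
import os
import pathlib
import shutil
import tempfile


def replace_symlink_with_copy(link: pathlib.Path) -> None:
    """Replace a symbolic link with a copy of the file or directory it points to."""
    # strict=True raises an error for dangling links instead of guessing.
    target = link.resolve(strict=True)
    link.unlink()
    if target.is_dir():
        shutil.copytree(target, link)
    else:
        shutil.copy2(target, link)


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    # Build a tiny monorepo: C sources in c/, a symlink in python/subprojects/.
    (root / 'c').mkdir()
    (root / 'c' / 'lib.c').write_text('int answer(void) { return 42; }\n')
    (root / 'python' / 'subprojects').mkdir(parents=True)
    link = root / 'python' / 'subprojects' / 'c'
    # Creating symlinks may require extra privileges on Windows.
    link.symlink_to(os.path.join('..', '..', 'c'), target_is_directory=True)

    replace_symlink_with_copy(link)
    still_a_link = link.is_symlink()
    copied = (link / 'lib.c').read_text()
```

After the call, `python/subprojects/c` is a regular directory, so the tree no longer depends on anything outside `python/`.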

  • I personally think that people who use monorepos are often (though not always) the same type of people that don't even want to create sdists at all

While I would think the majority of users are fine with pre-packaged wheels, I would be hesitant to rely on that exclusively. Using PyArrow as an example, there are legitimate use cases where end users do not want all of the standard functionality the package bundles with its C/C++ extensions, so they end up building from source with their own build options. I know that the version of PyArrow provided in lambda containers turns off some build features that the standard wheel provides, because AWS Lambda has certain size restrictions and the default build is too large.

Contributor

A dist script for sdist generation sounds quite reasonable.

Re the repo layout: this is actually pretty common especially for older C and C++ projects with optional Python bindings. It'd be good to support that use case explicitly via a test package; we've encountered it multiple times by now (not suggesting it needs to be part of this PR).

Thinking about it more: I think I like the nanoarrow repo setup. It's probably easier to deal with a C++ library in .. as a subproject than it would be to use the top-level meson.build file as the entry point for C++ and Python packages at once.

Contributor

I tested this PR with nanoarrow's main branch - sdist and wheel builds work as expected, and no warnings are visible. So I'll go ahead with merging this PR. We can open a new issue to keep track of the use case and adding a test package and/or some docs (I'm interested in doing that, but no time for it right now).

Contributor

done in gh-744


Thanks @rgommers !

                            'symbolic link with absolute path target, pointing outside the '
                            f'archive, or dangling ignored: {name}', stacklevel=1)
                        break
                    if member.isdir():
                        warnings.warn(
                            f'symbolic link pointing to a directory ignored: {name}', stacklevel=1)

Codecov / codecov/patch: added line #L966 in mesonpy/__init__.py was not covered by tests.

                if member.isfile():
                    file = meson_dist.extractfile(member.name)

@@ -971,6 +1000,10 @@

                    sdist.addfile(member, file)

                elif not member.isdir() and not member.issym():
                    warnings.warn(
                        f'special file in the source archive ignored: {member.name}', stacklevel=1)

Codecov / codecov/patch: added line #L1004 in mesonpy/__init__.py was not covered by tests.

            # Add 'PKG-INFO'.
            member = tarfile.TarInfo(f'{dist_name}/PKG-INFO')
            member.uid = member.gid = 0
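
The symbolic link resolution in the hunks above can be tried in isolation with a small in-memory archive. This is a sketch under stated assumptions, not the meson-python implementation: `build_archive` and `resolve` are hypothetical helpers, the member names are made up, and `resolve` lets `KeyError` propagate where the diff warns and skips the member.

```python
import copy
import io
import posixpath
import tarfile


def build_archive() -> bytes:
    """Create an in-memory tar archive with one regular file and one symlink to it."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode='w') as archive:
        data = b'print("hello")\n'
        regular = tarfile.TarInfo('pkg/sub/real.py')
        regular.size = len(data)
        archive.addfile(regular, io.BytesIO(data))

        link = tarfile.TarInfo('pkg/link.py')
        link.type = tarfile.SYMTYPE
        link.linkname = 'sub/real.py'  # relative to the directory of the link
        archive.addfile(link)
    return buffer.getvalue()


def resolve(archive: tarfile.TarFile, member: tarfile.TarInfo) -> tarfile.TarInfo:
    """Follow symlinks inside the archive until a non-link member is reached.

    Raises KeyError when the target is not a member of the archive, which
    covers dangling links and targets outside the archive.
    """
    while member.issym():
        name = member.name
        target = posixpath.normpath(posixpath.join(posixpath.dirname(name), member.linkname))
        # Copy the target's metadata, but keep the name of the link.
        member = copy.copy(archive.getmember(target))
        member.name = name
    return member


with tarfile.open(fileobj=io.BytesIO(build_archive()), mode='r') as archive:
    resolved = resolve(archive, archive.getmember('pkg/link.py'))
    resolved_name = resolved.name
    resolved_is_file = resolved.isfile()
    resolved_size = resolved.size
```

The resolved member keeps the link's name but carries the type and size of the regular file, which is what allows it to be written to the sdist as an ordinary file.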
3 changes: 3 additions & 0 deletions tests/packages/symlinks/__init__.py
@@ -0,0 +1,3 @@
# SPDX-FileCopyrightText: 2025 The meson-python developers
#
# SPDX-License-Identifier: MIT
1 change: 1 addition & 0 deletions tests/packages/symlinks/baz.py
16 changes: 16 additions & 0 deletions tests/packages/symlinks/meson.build
@@ -0,0 +1,16 @@
# SPDX-FileCopyrightText: 2025 The meson-python developers
#
# SPDX-License-Identifier: MIT

project('symlinks', version: '1.0.0')

py = import('python').find_installation()

py.install_sources(
    '__init__.py',
    'submodule/__init__.py',
    'submodule/aaa.py',
    'submodule/bbb.py',
    subdir: 'symlinks',
    preserve_path: true,
)
7 changes: 7 additions & 0 deletions tests/packages/symlinks/pyproject.toml
@@ -0,0 +1,7 @@
# SPDX-FileCopyrightText: 2021 The meson-python developers
#
# SPDX-License-Identifier: MIT

[build-system]
build-backend = 'mesonpy'
requires = ['meson-python']
1 change: 1 addition & 0 deletions tests/packages/symlinks/qux.py
1 change: 1 addition & 0 deletions tests/packages/symlinks/submodule/__init__.py
6 changes: 6 additions & 0 deletions tests/packages/symlinks/submodule/aaa.py
@@ -0,0 +1,6 @@
# SPDX-FileCopyrightText: 2025 The meson-python developers
#
# SPDX-License-Identifier: MIT

def foo():
    return 42
1 change: 1 addition & 0 deletions tests/packages/symlinks/submodule/bbb.py
19 changes: 19 additions & 0 deletions tests/test_sdist.py
@@ -216,3 +216,22 @@ def test_reproducible(package_pure, tmp_path):

    assert sdist_path_a == sdist_path_b
    assert tmp_path.joinpath('a', sdist_path_a).read_bytes() == tmp_path.joinpath('b', sdist_path_b).read_bytes()

@pytest.mark.filterwarnings('ignore:symbolic link')
def test_symlinks(tmp_path, sdist_symlinks):
    with tarfile.open(sdist_symlinks, 'r:gz') as sdist:
        names = {member.name for member in sdist.getmembers()}
        mtimes = {member.mtime for member in sdist.getmembers()}

    assert names == {
        'symlinks-1.0.0/PKG-INFO',
        'symlinks-1.0.0/meson.build',
        'symlinks-1.0.0/pyproject.toml',
        'symlinks-1.0.0/__init__.py',
        'symlinks-1.0.0/submodule/__init__.py',
        'symlinks-1.0.0/submodule/aaa.py',
        'symlinks-1.0.0/submodule/bbb.py',
    }

    # All the archive members have a valid mtime.
    assert 0 not in mtimes