-
Notifications
You must be signed in to change notification settings - Fork 77
ENH: add support for symbolic links in package repository #728
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ef8925e
to
9ac7f0a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dnicolodi, this looks pretty good to me. The one thing I'd suggest is to emit warnings for symlinks that get ignored, rather than silently skipping over them. Something like:
WARNING: symlink pointing outside package ignored: symlinks-1.0.0/baz.py
WDYT?
I think the warning is a good idea. Probably it is a good idea to warn also for other archive members that are not handled: anything that is not a file and symbolic links to directories even if withing the archive. The reason for not adding the warning was that the latest release that (accidentally) supported symlinks didn't warn either. I'll add the warnings. |
de51b39
to
4365161
Compare
Warnings added. |
Symbolic links are included as regular files in the sdist archive. Only symbolic links pointing to files withing the archive are supported.
4365161
to
8a59030
Compare
member = copy.copy(meson_dist.getmember(target)) | ||
member.name = name | ||
except KeyError: | ||
warnings.warn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice PR. I am curious - what would be the suggested approach to disable this warning?
In the original PR I saw nanoarrow discussed, so I just wanted to clarify the use case there. With nanoarrow being a multi-language project, the repo is laid out like:
c/
<some_c_sources>
meson.build
python/
<some_python_sources>
meson.build
So the point of the symlink in python/subprojects is to allow the python module to be built on the sources in the c
directory. When the sdist is created, we are happy to not have any symlinked content included, as we expect the user to have the c sources installed on the system.
So in our case the warnings are unnecessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW that comment applies to really allow of the Arrow libraries (pyarrow, nanoarrow, adbc-driver-<driver_name>) as they are all multi-language projects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The warning is just a warning that tells you that your symbolic links will not be included in the sdist, as the sdist format does not allow them. Would you prefer meson-python to drop the symlinks silently?
Anyhow, That source layout, unless there is a pyproject.toml
in the root directory, like this
meson.build
pyproject.toml
c/
<some_c_sources>
meson.build
python/
<some_python_sources>
meson.build
would result in an invalid Python sdist when created with meson-python. Thus the fact that the warning is emitted in your case is completely irrelevant: you cannot use the result of the operation emitting a warning for anything useful, thus maybe do not perform the operation at all.
I think this way of organizing the project is problematic: it bundles the sources for the C and Python parts in the same repository, thus coupling them tightly, but at the same time wants to treat the Python part as a separate project which is loosely coupled to the C part at build time. I don't know how you came to choose this organization, but I have the impression that it creates way more problems that it solves.
What I would have done is to have the C and Python parts in two separate repositories. The Python part would bring in the C part either as a git submodule or as a Meson subproject or as a Meson subproject referenced as a git submodule.
But it is your project, so you can organize it as you like, and if you like mono-repos, you can have one. However, you are choosing to work against established best practices, thus do not expect tools to bend their workings to accommodate your choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW that comment applies to really allow of the Arrow libraries (pyarrow, nanoarrow, adbc-driver-<driver_name>) as they are all multi-language projects
If that the case, their repository layout cannot be packaged into a Python sdist as symbolic links are not supported by the sdist standard. Anyway, their repository structure does not work for creating a sdist with meson-python. Thus I don't see the problem: these project choose to make things complicated for themselves and will need to find an ad-hoc way to generate a valid sdist from their monorepo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also possible to only publish wheels, and not sdists, which is no doubt a tempting proposition to much scientific code that has painful sdist-building workflows (esp. doesn't work well at all on Windows)
you cannot use the result of the operation emitting a warning for anything useful, thus maybe do not perform the operation at all.
... but in this case that's exactly what their plan is, right? Not to use it, but only have it exist as a convenience when already building inside the monorepo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of nanoarrow, what we do is use a symlink for local development (so that a user can test C/python changes all at once), but for a sdist we end up using a custom dist script that replaces the symlink with a copy of the files needed to build a standalone Python library.
- I personally think that people who use monorepos are often (though not always) the same type of people that don't even want to create sdists at all
While the majority of users I would think are fine with pre-packaged wheels, I would be hesitant to rely on that exclusively. Using PyArrow as an example, there are legitimate use cases where end users do not want all of the standard functionality the package bundles with its C/C++ extensions, so they end up building from source with their own build options. I know that the version of PyArrow provided in lambda containers turns off some build features that the standard wheel provides, because AWS Lambda has certain size restrictions and the default build is too large.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A dist script for sdist generation sounds quite reasonable.
Re the repo layout: this is actually pretty common especially for older C and C++ projects with optional Python bindings. It'd be good to support that use case explicitly via a test package; we've encountered it multiple times by now (not suggesting it needs to be part of this PR).
Thinking about it more: I like the nanoarrow
repo setup I think. It's probably easier to deal with a C++ library in ..
as a subproject than it would be to use the top-level meson.build
file as the entry point for C++ and Python packages at once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this PR with nanoarrow
's main
branch - sdist and wheel builds work as expected, and no warnings are visible. So I'll go ahead with merging this PR. We can open a new issue to keep track of the use case and adding a test package and/or some docs (I'm interested in doing that, but no time for it right now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in gh-744
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rgommers !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dnicolodi, this looks good to me so time to give it a go!
Going through the release process turned up an issue - our own sdist (which I didn't test before🤦🏼) has problems:
I haven't uploaded the release nor pushed the tag yet. It'd be good to resolve this first, I'll have a look now. The 0.17.1 sdist on PyPI is also missing the |
The problem seems to be related to version handling, which messes with the TarInfo names:
The problem is that the list of # Rewrite the path to match the sdist distribution name.
stem = member.name.split('/', 1)[1]
member.name = '/'.join((dist_name, stem)) That's probably also why the added test in this PR passed: whether there's a problem or not depends on iteration order. |
Okay, problem one is solved by inserting Problem two is these warnings remaining:
That's very minor though - and it doesn't really matter if we put them back or not as regular files, because it doesn't affect the test suite. The one reason to add a dist script to include them in our sdist is to silence the warnings. Worth doing still though, because warnings like that are distracting. |
Summary -- This should resolve the formatter ecosystem errors we've been seeing lately. mesonbuild/meson-python#728 added the links, which I think are intentionally broken for testing purposes. Test Plan -- Ecosystem check on this PR
Summary -- This should resolve the formatter ecosystem errors we've been seeing lately. mesonbuild/meson-python#728 added the links, which I think are intentionally broken for testing purposes. Test Plan -- Ecosystem check on this PR
Version 0.18 should restore handling of symlinks: mesonbuild/meson-python#728
Version 0.18 should restore handling of symlinks: mesonbuild/meson-python#728
Symbolic links are included as regular files in the sdist archive. Only symbolic links pointing to files withing the archive are supported. Replaces #713.