Skip to content

gh-132933: Allow zipapp target to overwrite filtered source file #132934

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Jolmberg
Copy link

@Jolmberg Jolmberg commented Apr 25, 2025

@bedevere-app
Copy link

bedevere-app bot commented Apr 25, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@picnixz
Copy link
Member

picnixz commented Apr 25, 2025

I wrongly said "no NEWS" but since the fix has been backported to 3.13, we need a NEWS entry as 3.13 is a stable release (and because we already shipped a release of 3.14; actually everything that changes from something that has been shipped in another release should have a NEWS entry when possible)

Copy link
Member

@pfmoore pfmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced we should be supporting the approach of putting the target in the same directory as the sources are coming from. If you can get the logic to work correctly and cleanly, I'll consider adding it, as I don't want to arbitrarily break people's workflows. But please understand, that there's a reason the existing check is simplistic, and it's clearly stated in the comments - it's to catch common errors, not to make the approach robust.

Lib/zipapp.py Outdated
raise ZipAppError(
f"The target archive {target} overwrites one of the source files.")
if filter:
arcname = target.relative_to(source)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a valid check. source and target could be on different drives (on Windows), in which case relative_to will fail with a ValueError.

The logic here seems backwards to me. Rather than trying to replicate the filtering done in the archive writing loop, we should be calculating files_to_add more precisely, based on the filter. That wouldn't fix the problem of target being on a different drive, though - there's a fundamental problem with the whole idea here, as filter is defined to work on the arcname of the source file, and target may not even have a defined arcname, because logically it isn't a source file.

Ultimately, the correct answer here is what I've said elsewhere - don't put your output file in the directory you're creating the zipapp from. That's a much more straightforward approach than trying to exclude it using a filter.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time with this! You are right of course, that check is wrong and can fail even within a single drive I believe. I've rewritten the code to filter files_to_add right away and then just use your original target check.

I can appreciate your sentiment about not writing the output file to the source directory, but I would prefer not to mess with my build system and I still think this change is for the better. I use the filter as a whitelist, precisely because I know that I can't trust the source directory to always be clean. I think it's pretty risk free™.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, thinking some more about it, I'm not sure I see how the relative_to check could fail. It's only performed after we've confirmed that target is equal to one of the paths inside source. But there could be intricacies to the Path class that I'm missing, and I think the new code is a cleaner way of doing this either way.

@bedevere-app
Copy link

bedevere-app bot commented Apr 25, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@bedevere-app
Copy link

bedevere-app bot commented Apr 25, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

files_to_add = {f: f2 for f in files_to_add
if filter(f2 := f.relative_to(source))}
else:
files_to_add = {f: f.relative_to(source) for f in files_to_add}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comprehension doesn't add to readability, IMO. I suggest something more like:

files_to_add = []
for file in sorted(source.rglob("*")):
    relative_name = file.relative_to(source)
    if filter is None or filter(relative_name):
        files_to_add.append((file, relative_name))

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do away with the comprehension, but building files_to_add as a list of tuples makes it more difficult to search for target in it. A simple target in files_to_add will no longer work.
One option is to again use a dictionary for files_to_add. Another is to do the check for target as part of this loop, like this:

files_to_add = []
for file in sorted(source.rglob("*")):
    relative_name = file.relative_to(source)
    if filter is None or filter(relative_name):
        if file == target:
            raise ZipAppError(...)
        files_to_add.append((file, relative_name))

I have no strong opinion. Doing the target check separately as target in files_to_add is more idiomatic but doing it as part of the loop (as above) is probably a tiny bit more efficient.

@@ -0,0 +1 @@
The zipapp module now applies the provided filter to the list of files to be added before determining whether the target file overwrites a source file. This allows the target file to be written inside the source directory provided it is excluded by the filter.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree strongly with this news item, as it suggests that writing the target into the source directory is intended usage.

Maybe

The zipapp module now applies the filter when creating the list of files to add, rather than waiting until the file is being added to the archive

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I wanted to get a bit of rationale in there, but I agree that it could come off as a recommendation, and it got long-winded too. I'll change it.

@@ -154,15 +159,14 @@ def create_archive(source, target=None, interpreter=None, main=None,
raise ZipAppError(
f"The target archive {target} overwrites one of the source files.")


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unintended blank line?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will fix!

@@ -113,6 +113,18 @@ def test_target_overwrites_source_file(self):
with self.assertRaises(zipapp.ZipAppError):
zipapp.create_archive(source, target)

def test_target_overwrites_filtered_source_file(self):
# The target cannot be one of the files to add.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# The target cannot be one of the files to add.
# If there's a filter that excludes the target,
# the overwrite check shouldn't trigger.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants