Fix sdist directory name extraction — rstrip misused as suffix removal #10741

Open
bysiber wants to merge 1 commit into python-poetry:main from bysiber:fix/sdist-suffix-removal

@bysiber (Contributor) commented Feb 20, 2026

Problem

_prepare_sdist uses archive.name.rstrip(suffix) to strip the file extension and derive the expected directory name inside the extracted sdist. However, str.rstrip() treats its argument as a set of characters to strip, not as a suffix string.

For .tar.gz archives:

  • archive.suffix → ".gz"
  • "package-1.0.tar.gz".rstrip(".gz") strips {'.', 'g', 'z'} → "package-1.0.tar" (leaves .tar)

For .zip archives with names ending in strip-set characters:

  • "zipp-24.0.0.zip".rstrip(".zip") strips {'.', 'z', 'i', 'p'} → gives "zipp-24.0.0" here only because the version ends in 0; a stem ending in any of those characters would be over-stripped

This means the fallback directory lookup (when the sdist extracts to multiple entries) always looks for the wrong name, fails the is_dir() check, and falls through to using archive_dir directly.
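
The character-set behavior described above is easy to check in isolation. The snippet below is a minimal sketch using only built-in string methods; the name "unzip.zip" is made up purely to show over-stripping and is not a real sdist name.

```python
# str.rstrip takes a *set of characters* to remove, not a suffix string.
tar_name = "package-1.0.tar.gz"
zip_name = "zipp-24.0.0.zip"

# Only the trailing run of '.', 'g', 'z' is removed, so ".tar" survives.
tar_rstrip = tar_name.rstrip(".gz")

# Stripping stops at the first character outside {'.', 'z', 'i', 'p'},
# which for this name is the final "0" of the version.
zip_rstrip = zip_name.rstrip(".zip")

# Made-up name: the stem itself ends in strip-set characters and is eaten.
over_stripped = "unzip.zip".rstrip(".zip")

print(tar_rstrip)
print(zip_rstrip)
print(over_stripped)
```

Note that the .zip result for this particular name comes out as the expected stem, because the version ends in a digit. The .tar.gz case is the one that reliably produces the wrong directory name.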

Fix

Replace rstrip(suffix) with explicit suffix matching against known archive extensions (.tar.gz, .tar.bz2, .tar.xz, .tar, .zip).
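
As a rough, standalone illustration of the approach (not the exact code in the PR), the explicit matching could look like the sketch below. The function name `strip_archive_suffix` and the constant are hypothetical, and the early return after the first match is an assumption of this sketch.

```python
# Order matters: multi-part suffixes must be tried before the bare ".tar".
KNOWN_ARCHIVE_SUFFIXES = (".tar.gz", ".tar.bz2", ".tar.xz", ".tar", ".zip")


def strip_archive_suffix(name: str) -> str:
    """Return name without its archive extension (hypothetical helper)."""
    for ext in KNOWN_ARCHIVE_SUFFIXES:
        if name.endswith(ext):
            # Slice off exactly the matched suffix and stop looking.
            return name[: -len(ext)]
    # Unknown extension: leave the name untouched.
    return name


print(strip_archive_suffix("package-1.0.tar.gz"))
print(strip_archive_suffix("zipp-24.0.0.zip"))
```

Names with an extension outside the list, such as a `.tgz` file, are returned unchanged by this sketch.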

Summary by Sourcery

Bug Fixes:

  • Correct sdist directory name extraction by removing known archive suffixes explicitly for .tar.* and .zip files instead of using rstrip(), preventing fallback to the wrong directory.

@sourcery-ai (bot) commented Feb 20, 2026

Reviewer's Guide

Adjusts sdist directory name derivation to strip known archive suffixes correctly instead of misusing rstrip, and adds documentation describing a separate EnvCommandError vs CalledProcessError issue and its fix intent.

File-Level Changes

Change: Fix sdist extraction directory name derivation by correctly removing archive suffixes.
File: src/poetry/installation/chef.py
Details:
  • Replace use of str.rstrip(suffix) on archive.name with explicit suffix detection and slicing for common archive types
  • .tar.gz, .tar.bz2, .tar.xz, .tar, and .zip are now handled via an ordered endswith check to compute the stem
  • Fallback behavior remains: if the derived sdist directory does not exist, continue using archive_dir as before

Change: Add a PR body markdown file documenting an EnvCommandError vs CalledProcessError bug and its intended fix.
File: pr_body_1.md
Details:
  • Introduce a new pr_body_1.md file explaining that Executor._remove currently catches CalledProcessError even though run_pip re-raises EnvCommandError
  • Describe how this leads to dead code and unintended fatal failures when a dependency was already uninstalled externally
  • Document the proposed fix at a high level: catch EnvCommandError instead and remove the unused CalledProcessError import


@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • The sdist stem extraction would be more robust and self-documenting if it used Path.suffixes or Path.with_suffix logic rather than a hard-coded list of extensions, so you don’t need to keep the list in sync with future supported archive types.
  • The new pr_body_1.md file looks like a temporary PR description artifact and probably shouldn’t be committed to the repository; consider removing it from the change set.
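
One thing worth keeping in mind with the `Path.suffixes` / `Path.with_suffix` idea: pathlib treats every dot in the file name as a suffix boundary, including the dots in the version. The following sketch only uses standard pathlib calls and shows what those attributes return for a typical sdist name.

```python
from pathlib import PurePosixPath

name = "package-1.0.tar.gz"

# Every dot-separated part after the first counts as a suffix,
# so the version's ".0" shows up alongside ".tar" and ".gz".
suffixes = PurePosixPath(name).suffixes

# with_suffix("") removes only the last suffix, leaving ".tar" in place.
without_last = PurePosixPath(name).with_suffix("").name

print(suffixes)
print(without_last)
```

Any generic approach built on `suffixes` therefore still needs some list of known archive or compression extensions to decide where the version ends and the extension begins.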
## Individual Comments

### Comment 1
<location> `src/poetry/installation/chef.py:113-114` </location>
<code_context>
-                sdist_dir = archive_dir / archive.name.rstrip(suffix)
+                # Remove known archive suffixes properly — rstrip treats
+                # its argument as a character set, not a suffix string.
+                stem = archive.name
+                for ext in (".tar.gz", ".tar.bz2", ".tar.xz", ".tar", ".zip"):
+                    if stem.endswith(ext):
+                        stem = stem[: -len(ext)]
</code_context>

<issue_to_address>
**suggestion:** Consider handling additional common archive suffixes or deriving them more generically.

This implementation only accounts for a small, fixed set of suffixes and misses common variants like `.tgz` or `.tbz2`. You could either expand the list of handled extensions or use `Path(archive.name).suffixes` to strip multi-part suffixes more generically (e.g., remove all trailing suffixes when the last one is a known compression type) so the logic is more robust to other sdist archive names.

Suggested implementation:

```python
            if len(elements) == 1 and elements[0].is_dir():
                sdist_dir = elements[0]
            else:
                # Derive the stem by stripping common (possibly multi-part) archive
                # and compression suffixes more generically. This handles variants
                # like .tar.gz, .tar.bz2, .tgz, .tbz2, etc.
                stem = archive.name
                suffixes = Path(stem).suffixes

                if suffixes:
                    compression_suffixes = {".gz", ".bz2", ".xz", ".zip"}
                    archive_suffixes = {".tar", ".zip", ".tgz", ".tbz2"}

                    # Strip trailing compression suffixes, and a preceding .tar if present
                    while suffixes and suffixes[-1] in compression_suffixes:
                        last = suffixes.pop()
                        stem = stem[: -len(last)]
                        if suffixes and suffixes[-1] == ".tar":
                            last_tar = suffixes.pop()
                            stem = stem[: -len(last_tar)]

                    # If we still have a trailing archive-like suffix, strip that too
                    if suffixes and suffixes[-1] in archive_suffixes:
                        last = suffixes.pop()
                        stem = stem[: -len(last)]

                sdist_dir = archive_dir / stem
                if not sdist_dir.is_dir():
                    sdist_dir = archive_dir

```

This change uses `Path(stem).suffixes`. If `Path` is not already imported in this module, add:

- `from pathlib import Path`

near the other imports at the top of `src/poetry/installation/chef.py`.
</issue_to_address>

str.rstrip() treats its argument as a set of characters, not a
suffix string.  For a .tar.gz archive, archive.suffix returns
'.gz', so rstrip('.gz') strips the characters {'.', 'g', 'z'}
from the right, giving 'package-1.0.tar' instead of 'package-1.0'.

For .zip files the character-stripping behavior is just as fragile:
'zipp-24.0.0.zip'.rstrip('.zip') only comes out right because the
version ends in '0'; a stem ending in characters from the strip set
would be over-stripped.

Replace rstrip with proper suffix matching against known archive
extensions.
@bysiber bysiber force-pushed the fix/sdist-suffix-removal branch from 346ddf2 to 4f10e54 Compare February 20, 2026 06:55
@radoering (Member) left a comment

Good catch. However, I think that there is a better solution:
We can use poetry.core.packages.utils.splitext to get the suffix and then use removesuffix().
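
A rough sketch of what this suggestion might look like follows. It does not import poetry-core; `split_archive_ext` is a hypothetical stand-in written for this example, under the assumption that the real helper behaves like a tar-aware `os.path.splitext`. `sdist_stem` is likewise an illustrative name. `str.removesuffix` requires Python 3.9 or newer.

```python
import posixpath


def split_archive_ext(name: str) -> tuple[str, str]:
    """Hypothetical stand-in: like splitext, but keeps '.tar' with the extension."""
    base, ext = posixpath.splitext(name)
    if base.lower().endswith(".tar"):
        # Move the ".tar" part from the base into the extension.
        ext = base[-4:] + ext
        base = base[:-4]
    return base, ext


def sdist_stem(name: str) -> str:
    """Remove the full archive extension as a suffix, not as a character set."""
    _, suffix = split_archive_ext(name)
    # removesuffix is a no-op when suffix is empty or does not match.
    return name.removesuffix(suffix)


print(sdist_stem("package-1.0.tar.gz"))
print(sdist_stem("zipp-24.0.0.zip"))
```

Compared with a hard-coded list, this keeps the knowledge of what counts as an archive extension in one shared helper, and `removesuffix` makes the intent explicit.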
