Fix sdist directory name extraction — rstrip misused as suffix removal#10741
Fix sdist directory name extraction — rstrip misused as suffix removal#10741bysiber wants to merge 1 commit intopython-poetry:mainfrom
Conversation
Reviewer's GuideAdjusts sdist directory name derivation to strip known archive suffixes correctly instead of misusing rstrip, and adds documentation describing a separate EnvCommandError vs CalledProcessError issue and its fix intent. File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The sdist stem extraction would be more robust and self-documenting if it used
Path.suffixesorPath.with_suffixlogic rather than a hard-coded list of extensions, so you don’t need to keep the list in sync with future supported archive types. - The new
pr_body_1.mdfile looks like a temporary PR description artifact and probably shouldn’t be committed to the repository; consider removing it from the change set.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The sdist stem extraction would be more robust and self-documenting if it used `Path.suffixes` or `Path.with_suffix` logic rather than a hard-coded list of extensions, so you don’t need to keep the list in sync with future supported archive types.
- The new `pr_body_1.md` file looks like a temporary PR description artifact and probably shouldn’t be committed to the repository; consider removing it from the change set.
## Individual Comments
### Comment 1
<location> `src/poetry/installation/chef.py:113-114` </location>
<code_context>
- sdist_dir = archive_dir / archive.name.rstrip(suffix)
+ # Remove known archive suffixes properly — rstrip treats
+ # its argument as a character set, not a suffix string.
+ stem = archive.name
+ for ext in (".tar.gz", ".tar.bz2", ".tar.xz", ".tar", ".zip"):
+ if stem.endswith(ext):
+ stem = stem[: -len(ext)]
</code_context>
<issue_to_address>
**suggestion:** Consider handling additional common archive suffixes or deriving them more generically.
This implementation only accounts for a small, fixed set of suffixes and misses common variants like `.tgz` or `.tbz2`. You could either expand the list of handled extensions or use `Path(archive.name).suffixes` to strip multi-part suffixes more generically (e.g., remove all trailing suffixes when the last one is a known compression type) so the logic is more robust to other sdist archive names.
Suggested implementation:
```python
if len(elements) == 1 and elements[0].is_dir():
sdist_dir = elements[0]
else:
# Derive the stem by stripping common (possibly multi-part) archive
# and compression suffixes more generically. This handles variants
# like .tar.gz, .tar.bz2, .tgz, .tbz2, etc.
stem = archive.name
suffixes = Path(stem).suffixes
if suffixes:
compression_suffixes = {".gz", ".bz2", ".xz", ".zip"}
archive_suffixes = {".tar", ".zip", ".tgz", ".tbz2"}
# Strip trailing compression suffixes, and a preceding .tar if present
while suffixes and suffixes[-1] in compression_suffixes:
last = suffixes.pop()
stem = stem[: -len(last)]
if suffixes and suffixes[-1] == ".tar":
last_tar = suffixes.pop()
stem = stem[: -len(last_tar)]
# If we still have a trailing archive-like suffix, strip that too
if suffixes and suffixes[-1] in archive_suffixes:
last = suffixes.pop()
stem = stem[: -len(last)]
sdist_dir = archive_dir / stem
if not sdist_dir.is_dir():
sdist_dir = archive_dir
```
This change uses `Path(stem).suffixes`. If `Path` is not already imported in this module, add:
- `from pathlib import Path`
near the other imports at the top of `src/poetry/installation/chef.py`.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| stem = archive.name | ||
| for ext in (".tar.gz", ".tar.bz2", ".tar.xz", ".tar", ".zip"): |
There was a problem hiding this comment.
suggestion: Consider handling additional common archive suffixes or deriving them more generically.
This implementation only accounts for a small, fixed set of suffixes and misses common variants like .tgz or .tbz2. You could either expand the list of handled extensions or use Path(archive.name).suffixes to strip multi-part suffixes more generically (e.g., remove all trailing suffixes when the last one is a known compression type) so the logic is more robust to other sdist archive names.
Suggested implementation:
if len(elements) == 1 and elements[0].is_dir():
sdist_dir = elements[0]
else:
# Derive the stem by stripping common (possibly multi-part) archive
# and compression suffixes more generically. This handles variants
# like .tar.gz, .tar.bz2, .tgz, .tbz2, etc.
stem = archive.name
suffixes = Path(stem).suffixes
if suffixes:
compression_suffixes = {".gz", ".bz2", ".xz", ".zip"}
archive_suffixes = {".tar", ".zip", ".tgz", ".tbz2"}
# Strip trailing compression suffixes, and a preceding .tar if present
while suffixes and suffixes[-1] in compression_suffixes:
last = suffixes.pop()
stem = stem[: -len(last)]
if suffixes and suffixes[-1] == ".tar":
last_tar = suffixes.pop()
stem = stem[: -len(last_tar)]
# If we still have a trailing archive-like suffix, strip that too
if suffixes and suffixes[-1] in archive_suffixes:
last = suffixes.pop()
stem = stem[: -len(last)]
sdist_dir = archive_dir / stem
if not sdist_dir.is_dir():
sdist_dir = archive_dirThis change uses Path(stem).suffixes. If Path is not already imported in this module, add:
from pathlib import Path
near the other imports at the top of src/poetry/installation/chef.py.
str.rstrip() treats its argument as a set of characters, not a
suffix string. For a .tar.gz archive, archive.suffix returns
'.gz', so rstrip('.gz') strips the characters {'.', 'g', 'z'}
from the right, giving 'package-1.0.tar' instead of 'package-1.0'.
For .zip files the character-stripping behavior can be even worse
— e.g. 'zipp-24.0.0.zip'.rstrip('.zip') over-strips because the
package name itself ends with characters in the strip set.
Replace rstrip with proper suffix matching against known archive
extensions.
346ddf2 to
4f10e54
Compare
radoering
left a comment
There was a problem hiding this comment.
Good catch. However, I think that there is a better solution:
We can use poetry.core.packages.utils.splitext to get the suffix and then use removesuffix().
Problem
_prepare_sdistusesarchive.name.rstrip(suffix)to strip the file extension and derive the expected directory name inside the extracted sdist. However,str.rstrip()treats its argument as a set of characters to strip, not as a suffix string.For
.tar.gzarchives:archive.suffix→".gz""package-1.0.tar.gz".rstrip(".gz")strips{'.', 'g', 'z'}→"package-1.0.tar"(leaves.tar)For
.ziparchives with names ending in strip-set characters:"zipp-24.0.0.zip".rstrip(".zip")strips{'.', 'z', 'i', 'p'}→ could over-strip the package name itselfThis means the fallback directory lookup (when the sdist extracts to multiple entries) always looks for the wrong name, fails the
is_dir()check, and falls through to usingarchive_dirdirectly.Fix
Replace
rstrip(suffix)with explicit suffix matching against known archive extensions (.tar.gz,.tar.bz2,.tar.xz,.tar,.zip).Summary by Sourcery
Bug Fixes: