Enable multi-arch package download and filtering in download_prerelease_packages.py#5028
Enable multi-arch package download and filtering in download_prerelease_packages.py#5028araravik-psd wants to merge 8 commits intomainfrom
Conversation
| "amd_torch_device", | ||
| "amd_torchvision_device", | ||
| "rocm_sdk_device", | ||
| } |
There was a problem hiding this comment.
drive-by: we may want a placeholder here for amd_torchaudio_device (and in a few other scripts/workflows) so we remember if pytorch/audio#4180 gets further along
There was a problem hiding this comment.
Updated with a placeholder
| ) | ||
|
|
||
| parser.add_argument( | ||
| "--list-packages-multi-arch", |
There was a problem hiding this comment.
| "--list-packages-multi-arch", | |
| "--list-multi-arch-packages", |
?
There was a problem hiding this comment.
Also updated the tests with the argument change
https://gist.github.com/araravik-psd/58a570bccd944fc8a5df7a96d6c6abe3
| "--multi-arch", | ||
| action=argparse.BooleanOptionalAction, | ||
| default=False, | ||
| help="--multi-arch requires prefix like v4/whl/", |
There was a problem hiding this comment.
This help is confusing since the option is boolean.
There was a problem hiding this comment.
Updated the help text here.
There was a problem hiding this comment.
We may want to rename this script?
There was a problem hiding this comment.
I suggest renaming to download_release_artifacts.py. As this is not only for prereleases.
There was a problem hiding this comment.
Well it's not artifacts, those are rather the tarballs we push to the artifact buckets. It is rather download_python_packages isn't it?
There was a problem hiding this comment.
yes that is more accurate, changing the script name to download_python_packages.py
| @@ -167,6 +189,64 @@ | |||
| } | |||
|
|
|||
|
|
|||
| def is_allowed_multi_arch_package(filename: str) -> bool: | |||
There was a problem hiding this comment.
Why is this distinction important here? Multi-arch will always be in a different index / suffic.
There was a problem hiding this comment.
The distinction is not for separating storage layouts, multi-arch already uses a dedicated prefix.
The filtering here is used to restrict downloads/listings to promotable package families only, while excluding unrelated artifacts that may exist in the same bucket/prefix.
In particular, multi-arch introduces additional dynamic package families (for example amd_torch_device_* and rocm_sdk_device_*) that are not part of the legacy single-arch allow-list, so the helper centralizes that filtering logic for the multi-arch flow.
I am mainly trying to keep multi-arch package selection independent from the legacy single-arch categorization logic and not overload categorize_package() with different semantics used. I hope this is good, if you would like me to consolidate all together I can do that too
There was a problem hiding this comment.
I think this could actually reuse quite some code from other scripts, see my comments in the upload script PR.
There was a problem hiding this comment.
Ok I will create an issue to reuse build_tools/github_actions/publish_rocm_to_release_buckets.py functionality in the next version of the script and update this PR with the issue.
Adds support for downloading and listing multi-arch packages from flat S3 layouts (e.g. v4/whl/) and introduces consistent filtering using both existing and multi-arch package allow lists.
Key Changes
Package filtering
Multi-arch arch filtering
multi-arch package listing
multi-arch package downloads
New functionality
CLI updates and new Variables added:
Added flags:
--multi-arch
--list-packages-multi-arch
Notes
Multi-arch flow assumes a flat prefix layout (no per-arch directories)
Single-arch flow remains unchanged and continues to use existing logic
Filtering is now consistent across listing and download flows
Testing
Listing functionality for both multi-arch and single arch are added below:
https://gist.github.com/araravik-psd/58a570bccd944fc8a5df7a96d6c6abe3
Downloading multi-arch packages
https://gist.github.com/araravik-psd/38682418ed118f8aa06d023b35d44911