[AMD] Clean patch and move script to amd/ #973
zyzshishui wants to merge 4 commits into radixark:main
Code Review
This pull request refactors the ROCm Dockerfile for MI350/MI355, switching to a new base image and updating the build process for core components like AITER and TransformerEngine. It also updates the Qwen3-4B training script and removes several obsolete run scripts. The reviewer suggests optimizing the Dockerfile by combining apt-get commands and Python dependency installations to reduce image layers, and recommends using pattern matching instead of hardcoded line numbers in sed commands to improve robustness.
RUN apt update
# Install build tools and diagnostics utilities.
RUN apt install -y build-essential cmake dnsutils ethtool git nvtop rsync
It is best practice to combine apt-get update and apt-get install in a single RUN instruction. This removes an image layer and, more importantly, prevents Docker's layer cache from pairing a stale package index with a changed install line. apt-get is preferred over apt in non-interactive scripts because apt does not guarantee a stable command-line interface. Passing --no-install-recommends and removing /var/lib/apt/lists/* in the same instruction keeps the image small.
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        cmake \
        dnsutils \
        ethtool \
        git \
        nvtop \
        rsync \
    && rm -rf /var/lib/apt/lists/*
    git submodule sync --recursive && \
    git submodule update --init --recursive && \
    # Temporary fixes for the current ROCm 7.2 image/toolchain combination.
    sed -i '459 s/if.*:/if False:/' aiter/ops/triton/attention/pa_mqa_logits.py && \
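The review summary recommends matching on content rather than on a hardcoded line number in sed commands like the one above. A minimal sketch of that idea follows. The file contents and the `use_fast_path` condition are hypothetical stand-ins, since the real condition at line 459 of pa_mqa_logits.py is not shown in this review; only the addressing technique is the point.

```shell
# Hypothetical stand-in for the file being patched. The real condition on
# line 459 of pa_mqa_logits.py is not visible in this review.
demo_file="$(mktemp)"
printf '%s\n' \
  'def kernel(x):' \
  '    if use_fast_path(x):' \
  '        return fast(x)' \
  '    return slow(x)' > "$demo_file"

# Select the line by a pattern address instead of a line number, so the
# patch still lands on the right line if upstream adds or removes lines.
# The .bak suffix keeps -i portable between GNU and BSD sed.
sed -i.bak '/if use_fast_path(x):/ s/if.*:/if False:/' "$demo_file"
rm -f "$demo_file.bak"

# sed exits 0 even when nothing matched, so verify the patch explicitly
# and stop instead of continuing with an unpatched file.
grep -q 'if False:' "$demo_file" || { echo "patch did not apply" >&2; exit 1; }

patched_line="$(grep -n 'if False:' "$demo_file")"
echo "$patched_line"
```

The explicit grep check matters in a Dockerfile because a silently skipped patch would otherwise only surface later, at build or run time of the patched component.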
RUN rm -rf /usr/lib/python3/dist-packages/jwt /usr/lib/python3/dist-packages/PyJWT* && \
    pip install -r /tmp/requirements.txt

# Pin numpy 1.x for Megatron compatibility.
RUN pip install "numpy<2"
The numpy installation can be combined with the requirements.txt installation to reduce the number of layers and ensure all dependencies are resolved in a single step.
RUN rm -rf /usr/lib/python3/dist-packages/jwt /usr/lib/python3/dist-packages/PyJWT* && \
    pip install -r /tmp/requirements.txt "numpy<2"
New Dockerfile WIP