Skip to content

[AMD] Clean patch and move script to amd/#973

Open
zyzshishui wants to merge 4 commits intoradixark:mainfrom
zyzshishui:upgrade
Open

[AMD] Clean patch and move script to amd/#973
zyzshishui wants to merge 4 commits intoradixark:mainfrom
zyzshishui:upgrade

Conversation

@zyzshishui
Copy link
Copy Markdown
Contributor

New Dockerfile WIP

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the ROCm Dockerfile for MI350/MI355, switching to a new base image and updating the build process for core components like AITER and TransformerEngine. It also updates the Qwen3-4B training script and removes several obsolete run scripts. The reviewer suggests optimizing the Dockerfile by combining apt-get commands and Python dependency installations to reduce image layers, and recommends using pattern matching instead of hardcoded line numbers in sed commands to improve robustness.

Comment on lines +59 to +61
RUN apt update
# Install build tools and diagnostics utilities.
RUN apt install -y build-essential cmake dnsutils ethtool git nvtop rsync
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

It is a best practice to combine apt-get update and apt-get install into a single RUN command. This reduces the number of image layers and ensures that the package list is fresh when installing. Additionally, using apt-get is preferred for non-interactive scripts, and cleaning up /var/lib/apt/lists/* helps keep the image size small.

RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    cmake \
    dnsutils \
    ethtool \
    git \
    nvtop \
    rsync \
    && rm -rf /var/lib/apt/lists/*

git submodule sync --recursive && \
git submodule update --init --recursive && \
# Temporary fixes for the current ROCm 7.2 image/toolchain combination.
sed -i '459 s/if.*:/if False:/' aiter/ops/triton/attention/pa_mqa_logits.py && \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using a hardcoded line number (459) in sed is fragile. If the upstream file aiter/ops/triton/attention/pa_mqa_logits.py changes, this command might modify the wrong line or fail. It is safer to use a pattern match to target the specific code block.

Comment on lines +135 to +139
RUN rm -rf /usr/lib/python3/dist-packages/jwt /usr/lib/python3/dist-packages/PyJWT* && \
pip install -r /tmp/requirements.txt

# Pin numpy 1.x for Megatron compatibility.
RUN pip install "numpy<2"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The numpy installation can be combined with the requirements.txt installation to reduce the number of layers and ensure all dependencies are resolved in a single step.

RUN rm -rf /usr/lib/python3/dist-packages/jwt /usr/lib/python3/dist-packages/PyJWT* && \
    pip install -r /tmp/requirements.txt "numpy<2"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant