Gaudi RHOAI notebook image 1.19.0 update #589

Merged
merged 3 commits into from
Jan 15, 2025
7 changes: 4 additions & 3 deletions enterprise/redhat/openshift-ai/gaudi/README.md
@@ -6,9 +6,10 @@ Intel® Gaudi AI Software Tools for OpenShift AI(RedHat OpenShift Data Science/R

| Notebook Container Name | Tools | Image Name |
| -----------------------------| ------------- | ------------- |
- | Intel Gaudi Notebook Container 1.17.0-495 | [Intel® Gaudi Software Stack*](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html), [Intel® Gaudi PyTorch](https://docs.habana.ai/en/latest/PyTorch/index.html), [Intel® Gaudi vLLM](https://github.com/HabanaAI/vllm-fork.git), [Intel® Gaudi DeepSpeed](https://github.com/HabanaAI/DeepSpeed) | [`registry.connect.redhat.com/intel/gaudi-notebooks:1.17.0-495-rhel-9.2`](registry.connect.redhat.com/intel/gaudi-notebooks@sha256:a62baf968caa7dd23b7f4cdcddc26e109d894f1436e247b4ea1e2fb4a5c94d54) |
- | Intel Gaudi Notebook Container 1.17.1-40 | [Intel® Gaudi Software Stack*](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html), [Intel® Gaudi PyTorch](https://docs.habana.ai/en/latest/PyTorch/index.html), [Intel® Gaudi vLLM](https://github.com/HabanaAI/vllm-fork.git), [Intel® Gaudi DeepSpeed](https://github.com/HabanaAI/DeepSpeed) | [`registry.connect.redhat.com/intel/gaudi-notebooks:1.17.1-40-rhel-9.2`](registry.connect.redhat.com/intel/gaudi-notebooks@sha256:00ca535956b7fcdd91e71bc4a3cd4493ddcaceea9b8d7bb95a7edc0e1cb0bac4) |
- | Intel Gaudi Notebook Container 1.18.0-524 | [Intel® Gaudi Software Stack*](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html), [Intel® Gaudi PyTorch](https://docs.habana.ai/en/latest/PyTorch/index.html), [Intel® Gaudi vLLM](https://github.com/HabanaAI/vllm-fork.git), [Intel® Gaudi DeepSpeed](https://github.com/HabanaAI/DeepSpeed) | [`registry.connect.redhat.com/intel/gaudi-notebooks:1.18.0-524-rhel-9.2`](registry.connect.redhat.com/intel/gaudi-notebooks@sha256:142b11253e5708ff9744c895868b2adda2f6f01c40127b71f1aca3d7a6e6bc29) |
+ | Intel Gaudi Notebook Container 1.19.0-561 | [Intel® Gaudi Software Stack*](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html), [Intel® Gaudi PyTorch](https://docs.habana.ai/en/latest/PyTorch/index.html), [Intel® Gaudi vLLM](https://github.com/HabanaAI/vllm-fork.git), [Intel® Gaudi DeepSpeed](https://github.com/HabanaAI/DeepSpeed) | [`registry.connect.redhat.com/intel/gaudi-notebooks:1.19.0-561-rhel-9.4`](https://catalog.redhat.com/software/containers/66e2072057f53c17d73d67e1?architecture=amd64&tag=1.18.0-524-rhel-9.2&image=676b2fd90e7869ebe618b78b&gti-tabs=registry-tokens), [`registry.connect.redhat.com/intel/gaudi-notebooks:1.19.0-561-rhel-9.2`](https://catalog.redhat.com/software/containers/66e2072057f53c17d73d67e1?architecture=amd64&tag=1.18.0-524-rhel-9.2&image=676b32a224dabcac4cf84cb3&gti-tabs=registry-tokens) |
+ | Intel Gaudi Notebook Container 1.18.0-524 | [Intel® Gaudi Software Stack*](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html), [Intel® Gaudi PyTorch](https://docs.habana.ai/en/latest/PyTorch/index.html), [Intel® Gaudi vLLM](https://github.com/HabanaAI/vllm-fork.git), [Intel® Gaudi DeepSpeed](https://github.com/HabanaAI/DeepSpeed) | [`registry.connect.redhat.com/intel/gaudi-notebooks:1.18.0-524-rhel-9.2`](https://catalog.redhat.com/software/containers/66e2072057f53c17d73d67e1?architecture=amd64&tag=1.18.0-524-rhel-9.2&image=67181ccca33f4e501721789f&gti-tabs=registry-tokens) |
+ | Intel Gaudi Notebook Container 1.17.1-40 | [Intel® Gaudi Software Stack*](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html), [Intel® Gaudi PyTorch](https://docs.habana.ai/en/latest/PyTorch/index.html), [Intel® Gaudi vLLM](https://github.com/HabanaAI/vllm-fork.git), [Intel® Gaudi DeepSpeed](https://github.com/HabanaAI/DeepSpeed) | [`registry.connect.redhat.com/intel/gaudi-notebooks:1.17.1-40-rhel-9.2`](https://catalog.redhat.com/software/containers/66e2072057f53c17d73d67e1?architecture=amd64&tag=1.18.0-524-rhel-9.2&image=66fc3bf38081186cfc015a0d&gti-tabs=registry-tokens) |
+ | Intel Gaudi Notebook Container 1.17.0-495 | [Intel® Gaudi Software Stack*](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html), [Intel® Gaudi PyTorch](https://docs.habana.ai/en/latest/PyTorch/index.html), [Intel® Gaudi vLLM](https://github.com/HabanaAI/vllm-fork.git), [Intel® Gaudi DeepSpeed](https://github.com/HabanaAI/DeepSpeed) | [`registry.connect.redhat.com/intel/gaudi-notebooks:1.17.0-495-rhel-9.2`](https://catalog.redhat.com/software/containers/66e2072057f53c17d73d67e1?architecture=amd64&tag=1.18.0-524-rhel-9.2&image=66e0ac59261d52855750518c&gti-tabs=registry-tokens) |

## Run Gaudi Notebook Containers

6 changes: 4 additions & 2 deletions enterprise/redhat/openshift-ai/gaudi/crd-sample.yaml
@@ -16,7 +16,7 @@ apiVersion: aitools.intel/v1
kind: GaudiAIToolsContainer
metadata:
labels:
- gaudi-software-version: '1.17.1-40'
+ gaudi-software-version: '1.19.0-561'
name: intel-gaudi
spec:
nameOverride: ""
@@ -25,6 +25,8 @@ spec:
registry: registry.connect.redhat.com
repo: intel/gaudi-notebooks
tags:
- - gaudi_software: "1.17.1-40"
+ - gaudi_software: "1.19.0-561"
rhel_os: "9.2"
+ - gaudi_software: "1.19.0-561"
+ rhel_os: "9.4"
namespace: redhat-ods-applications
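
Reviewer note: the `tags` entries above appear to line up with the image names in the README table, which follow the pattern `<gaudi_software>-rhel-<rhel_os>`. The sketch below only illustrates that naming under this assumption. The helper name `image_ref` is hypothetical and is not part of this PR or of the operator that consumes the CRD.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): build an image reference from the
# registry, repo, gaudi_software and rhel_os fields of the CRD sample.
image_ref() {
  local registry="$1" repo="$2" gaudi_software="$3" rhel_os="$4"
  printf '%s/%s:%s-rhel-%s\n' "$registry" "$repo" "$gaudi_software" "$rhel_os"
}

# The two tag entries added in this PR.
image_ref registry.connect.redhat.com intel/gaudi-notebooks 1.19.0-561 9.2
image_ref registry.connect.redhat.com intel/gaudi-notebooks 1.19.0-561 9.4
```

Under that assumption, the two calls print the two 1.19.0-561 image names added to the README table.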
12 changes: 4 additions & 8 deletions enterprise/redhat/openshift-ai/gaudi/docker/Dockerfile.rhel9.2
@@ -69,6 +69,7 @@ RUN mkdir -p /licenses && \
ENV PYTHON_VERSION=3.10
COPY install-python310.sh .
RUN ./install-python310.sh rhel9.2 && rm install-python310.sh
+ RUN echo "/usr/local/lib" > /etc/ld.so.conf.d/python.conf && ldconfig
ENV LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

COPY install_efa.sh .
@@ -194,17 +195,11 @@ RUN dnf install --allowerasing -y \
gperftools-devel && \
dnf clean all && rm -rf /var/cache/yum

- RUN dnf config-manager --add-repo https://yum.repos.intel.com/mkl/setup/intel-mkl.repo -y && \
- dnf install --allowerasing -y intel-mkl-64bit-2020.4-912 && \
- dnf clean all && rm -rf /var/cache/yum

# Set LD_PRELOAD after all required installations to
# avoid warnings during docker creation
ENV LD_PRELOAD=/lib64/libtcmalloc.so.4
ENV TCMALLOC_LARGE_ALLOC_REPORT_THRESHOLD=7516192768

- RUN rm -rf /tmp/*

USER 1001

COPY --chown=1001:0 install_packages.sh .
@@ -238,9 +233,10 @@ RUN python -m pip install -r requirements.txt && \
RUN cd ${APP_ROOT}/ && \
git clone https://github.com/HabanaAI/vllm-fork.git && \
cd vllm-fork && \
- VLLM_TARGET_DEVICE=hpu pip install -e .
+ git checkout habana_main && \
+ pip install -r requirements-hpu.txt && \
+ VLLM_TARGET_DEVICE=hpu python setup.py develop

WORKDIR ${APP_ROOT}/src
ENV NOTEBOOK_SAMPLES_LINK="https://raw.githubusercontent.com/intel/ai-containers/refs/heads/main/enterprise/redhat/openshift-ai/gaudi/demo/oneapi-sample.ipynb"

ENTRYPOINT ["bash", "-c", "/opt/app-root/builder/run"]
70 changes: 27 additions & 43 deletions enterprise/redhat/openshift-ai/gaudi/docker/Dockerfile.rhel9.4
@@ -29,10 +29,15 @@ RUN echo "[CRB]" > /etc/yum.repos.d/CentOS-Linux-CRB.repo && \
echo "gpgkey=https://www.centos.org/keys/RPM-GPG-KEY-CentOS-Official-SHA256" >> /etc/yum.repos.d/CentOS-Linux-CRB.repo && \
echo "gpgcheck=1" >> /etc/yum.repos.d/CentOS-Linux-CRB.repo

- RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm && \
- dnf clean all && rm -rf /var/cache/yum

RUN dnf install -y \
+ python3-dnf-plugin-versionlock && \
+ dnf versionlock add redhat-release* && \
+ dnf clean all

+ RUN dnf update -y && dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm && \
+ dnf clean all

+ RUN dnf update -y && dnf install -y \
clang \
cmake3 \
cpp \
@@ -49,8 +54,8 @@ RUN dnf install -y \
lsof \
python3-devel \
openssh-clients \
- openssl-1:3.0.7-28.el9_4 \
- openssl-devel-1:3.0.7-28.el9_4 \
+ openssl \
+ openssl-devel \
libjpeg-devel \
openssh-server \
lsb_release \
@@ -66,24 +71,15 @@ RUN dnf install -y \
python3.11-pip \
python3.11-devel \
python3.11-rpm \
- ffmpeg-free \
- python3-dnf-plugin-versionlock && \
+ ffmpeg-free && \
# update pkgs (except OS version) for resolving potentials CVEs
- dnf versionlock add redhat-release* openssl* libcurl-minimal curl-minimal ima-evm-utils python3-rpm rpm* && \
- dnf update -y && \
+ dnf versionlock add python3-rpm rpm* && \
dnf clean all && rm -rf /var/cache/yum && \
rm -f /etc/ssh/ssh_host_*_key*

RUN mkdir -p /licenses && \
wget -O /licenses/LICENSE https://raw.githubusercontent.com/intel/ai-containers/main/LICENSE

- RUN alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 2 && \
- alternatives --install /usr/bin/python3 python3 /usr/bin/python3.9 1 && \
- alternatives --set python3 /usr/bin/python3.11 && \
- alternatives --install /usr/bin/pip3 pip3 /usr/bin/pip3.11 2 && \
- alternatives --install /usr/bin/pip3 pip3 /usr/bin/pip3.9 1 && \
- alternatives --set pip3 /usr/bin/pip3.11

COPY install_efa.sh .
RUN ./install_efa.sh && rm install_efa.sh && rm -rf /etc/ld.so.conf.d/efa.conf /etc/profile.d/efa.sh

@@ -193,32 +189,18 @@ RUN echo "[CRB]" > /etc/yum.repos.d/CentOS-Linux-CRB.repo && \
echo "gpgkey=https://www.centos.org/keys/RPM-GPG-KEY-CentOS-Official-SHA256" >> /etc/yum.repos.d/CentOS-Linux-CRB.repo && \
echo "gpgcheck=1" >> /etc/yum.repos.d/CentOS-Linux-CRB.repo

- RUN dnf install --allowerasing -y \
- curl-7.76.1-29.el9_4.1 \
- cairo-devel \
- numactl-devel \
- iproute \
- which \
- zlib-devel \
- lapack-devel \
- openblas-devel \
- numactl \
- gperftools-devel && \
- dnf clean all && rm -rf /var/cache/yum

- RUN echo "[oneAPI]" >> /etc/yum.repos.d/oneAPI.repo && \
- echo "name=Intel® oneAPI repository" >> /etc/yum.repos.d/oneAPI.repo && \
- echo "baseurl=https://yum.repos.intel.com/oneapi" >> /etc/yum.repos.d/oneAPI.repo && \
- echo 'enabled=1' >> /etc/yum.repos.d/oneAPI.repo && \
- echo "gpgcheck=1" >> /etc/yum.repos.d/oneAPI.repo && \
- echo "repo_gpgcheck=1" >> /etc/yum.repos.d/oneAPI.repo && \
- echo "gpgkey=https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB" >> /etc/yum.repos.d/oneAPI.repo

- RUN dnf install --allowerasing -y intel-oneapi-mkl-2024.2.0 && \
+ RUN dnf update -y && dnf install --nodocs --setopt=install_weak_deps=false --allowerasing -y \
+ cairo-devel \
+ numactl-devel \
+ iproute \
+ which \
+ zlib-devel \
+ lapack-devel \
+ openblas-devel \
+ numactl \
+ gperftools-devel && \
dnf clean all && rm -rf /var/cache/yum

- ENV LD_LIBRARY_PATH=/opt/intel/oneapi/mkl/2024.2/lib:${LD_LIBRARY_PATH}

RUN rm -rf /tmp/*

USER 1001
@@ -260,9 +242,11 @@ RUN python -m pip install -r requirements.txt && \
RUN cd ${APP_ROOT}/ && \
git clone https://github.com/HabanaAI/vllm-fork.git && \
cd vllm-fork && \
- VLLM_TARGET_DEVICE=hpu pip install -e .
+ git checkout habana_main && \
+ pip install -r requirements-hpu.txt && \
+ VLLM_TARGET_DEVICE=hpu python setup.py develop

WORKDIR ${APP_ROOT}/src
ENV JUPYTER_PRELOAD_REPOS="https://github.com/IntelAI/oneAPI-samples"
ENV REPO_BRANCH="main"
ENV NOTEBOOK_SAMPLES_LINK="https://raw.githubusercontent.com/intel/ai-containers/refs/heads/main/enterprise/redhat/openshift-ai/gaudi/demo/oneapi-sample.ipynb"

ENTRYPOINT ["bash", "-c", "/opt/app-root/builder/run"]
22 changes: 11 additions & 11 deletions enterprise/redhat/openshift-ai/gaudi/docker/docker-compose.yaml
@@ -22,12 +22,12 @@ services:
https_proxy: ${https_proxy}
no_proxy: ""
ARTIFACTORY_URL: ${ARTIFACTORY_URL:-vault.habana.ai}
- VERSION: ${VERSION:-1.18.0}
- REVISION: ${REVISION:-524}
+ VERSION: ${VERSION:-1.19.0}
+ REVISION: ${REVISION:-561}
context: .
target: gaudi-base
dockerfile: Dockerfile.rhel${RHEL_OS:-9.2}
- image: ${REGISTRY}/${REPO}:b-${GITHUB_RUN_NUMBER:-0}-gaudi-base-${VERSION:-1.18.0}-${REVISION:-524}-rhel-${RHEL_OS:-9.2}
+ image: ${REGISTRY}/${REPO}:b-${GITHUB_RUN_NUMBER:-0}-gaudi-base-${VERSION:-1.19.0}-${REVISION:-561}-rhel-${RHEL_OS:-9.2}
entrypoint: ["/bin/bash", "-c"]
command: >
"hl-smi"
@@ -37,17 +37,17 @@ services:
BASE_IMAGE: ${BASE_IMAGE:-registry.access.redhat.com/ubi9/ubi}
BASE_TAG: ${RHEL_OS:-9.2}
BASE_NAME: rhel${RHEL_OS:-rhel9.2}
- PT_VERSION: ${PT_VERSION:-2.4.0}
+ PT_VERSION: ${PT_VERSION:-2.5.1}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
no_proxy: ""
ARTIFACTORY_URL: ${ARTIFACTORY_URL:-vault.habana.ai}
- VERSION: ${VERSION:-1.18.0}
- REVISION: ${REVISION:-524}
+ VERSION: ${VERSION:-1.19.0}
+ REVISION: ${REVISION:-561}
context: .
target: gaudi-pytorch
dockerfile: Dockerfile.rhel${RHEL_OS:-9.2}
- image: ${REGISTRY}/${REPO}:b-${GITHUB_RUN_NUMBER:-0}-gaudi-pytorch-${VERSION:-1.18.0}-${REVISION:-524}-rhel-${RHEL_OS:-9.2}
+ image: ${REGISTRY}/${REPO}:b-${GITHUB_RUN_NUMBER:-0}-gaudi-pytorch-${VERSION:-1.19.0}-${REVISION:-561}-rhel-${RHEL_OS:-9.2}
entrypoint: ["/bin/bash", "-c"]
command: >
"python -c 'import torch'"
@@ -57,17 +57,17 @@ services:
BASE_IMAGE: ${BASE_IMAGE:-registry.access.redhat.com/ubi9/ubi}
BASE_TAG: ${RHEL_OS:-9.2}
BASE_NAME: ${BASE_NAME:-rhel9.2}
- PT_VERSION: ${PT_VERSION:-2.4.0}
+ PT_VERSION: ${PT_VERSION:-2.5.1}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
no_proxy: ""
ARTIFACTORY_URL: ${ARTIFACTORY_URL:-vault.habana.ai}
- VERSION: ${VERSION:-1.18.0}
- REVISION: ${REVISION:-524}
+ VERSION: ${VERSION:-1.19.0}
+ REVISION: ${REVISION:-561}
context: .
target: gaudi-notebooks
dockerfile: Dockerfile.rhel${RHEL_OS:-9.2}
- image: ${REGISTRY}/${REPO}:b-${GITHUB_RUN_NUMBER:-0}-gaudi-notebook-${VERSION:-1.18.0}-${REVISION:-524}-rhel-${RHEL_OS:-9.2}
+ image: ${REGISTRY}/${REPO}:b-${GITHUB_RUN_NUMBER:-0}-gaudi-notebook-${VERSION:-1.19.0}-${REVISION:-561}-rhel-${RHEL_OS:-9.2}
entrypoint: ["/bin/bash", "-c"]
command: >
"python -m jupyter notebook --version"
Original file line number Diff line number Diff line change
@@ -24,12 +24,12 @@ case "${_BASE_NAME}" in
echo "Skip install Python3.10 from source on Ubuntu22.04"
exit 0
;;
- *debian* | *ubuntu*)
+ *ubuntu*)
apt update
apt install -y libsqlite3-dev libreadline-dev
;;
*rhel*)
- yum install -y sqlite-devel readline-devel xz-devel
+ dnf install -y sqlite-devel readline-devel xz-devel
;;
*tencentos3.1*)
dnf install -y sqlite-devel readline-devel zlib-devel xz-devel bzip2-devel libffi-devel
@@ -42,21 +42,6 @@ case "${_BASE_NAME}" in
make && make install
ln -s /etc/pki/tls/cert.pem /usr/local/openssl-1.1.1w/ssl/cert.pem

- PATH=$PATH:/usr/local/protoc/bin:/usr/local/openssl-1.1.1w/bin
- LD_LIBRARY_PATH=/usr/local/openssl-1.1.1w/lib:$LD_LIBRARY_PATH
- _SSL_LIB="--with-openssl=/usr/local/openssl-1.1.1w"
- ;;
- *amzn2*)
- yum install -y sqlite-devel readline-devel
- wget -nv -O /opt/openssl-1.1.1w.tar.gz https://github.com/openssl/openssl/releases/download/OpenSSL_1_1_1w/openssl-1.1.1w.tar.gz &&
- cd /opt/ &&
- tar xzf openssl-1.1.1w.tar.gz &&
- rm -rf openssl-1.1.1w.tar.gz &&
- cd openssl-1.1.1w &&
- ./config --prefix=/usr/local/openssl-1.1.1w shared zlib &&
- make && make install
- ln -s /etc/pki/tls/cert.pem /usr/local/openssl-1.1.1w/ssl/cert.pem

PATH=$PATH:/usr/local/protoc/bin:/usr/local/openssl-1.1.1w/bin
LD_LIBRARY_PATH=/usr/local/openssl-1.1.1w/lib:$LD_LIBRARY_PATH
_SSL_LIB="--with-openssl=/usr/local/openssl-1.1.1w"
@@ -75,9 +60,6 @@ make -j && make altinstall
# post install
case "${_BASE_NAME}" in
*rhel9*)
- alternatives --install /usr/bin/python3 python3 /usr/local/bin/python3.10 2 &&
- alternatives --install /usr/bin/python3 python3 /usr/bin/python3.9 1 &&
- alternatives --set python3 /usr/local/bin/python3.10
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
;;
*tencentos3.1*)
@@ -88,16 +70,6 @@ case "${_BASE_NAME}" in
alternatives --set python3 /usr/local/bin/python3.10
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
;;
- *amzn2*)
- update-alternatives --install /usr/bin/python3 python3 /usr/local/bin/python3.10 3 &&
- update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 2 &&
- update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.7 1
- ;;
- *debian*)
- update-alternatives --install /usr/bin/python3 python3 /usr/local/bin/python3.10 3
- update-alternatives --install /usr/bin/python3 python3 /usr/local/bin/python3.8 2
- update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.7 1
- ;;
esac

python3 -m pip install --upgrade pip setuptools
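
Reviewer note: after this change the script dispatches on `_BASE_NAME` with glob patterns, and the `*debian*` and `*amzn2*` arms are gone. The sketch below is only an illustration of how that `case` matching selects a package manager for the arms visible in these hunks. The function name `pkg_manager_for` is hypothetical, and the early-exit arm for Ubuntu 22.04 that precedes `*ubuntu*` in the real script is not reproduced here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): mirror the glob-based dispatch on the
# base image name, using only the arms visible in the diff above.
pkg_manager_for() {
  case "$1" in
    *ubuntu*)       echo "apt" ;;      # apt update / apt install in the script
    *rhel*)         echo "dnf" ;;      # switched from yum to dnf in this PR
    *tencentos3.1*) echo "dnf" ;;
    *)              echo "unknown" ;;  # e.g. amzn2 and debian, removed here
  esac
}

pkg_manager_for rhel9.4
pkg_manager_for amzn2
```

Because the patterns are globs, any base name containing `rhel` (for example `rhel9.2` or `rhel9.4`) takes the `dnf` arm.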
Original file line number Diff line number Diff line change
@@ -27,9 +27,6 @@ case "${BASE_NAME}" in
*rhel8*)
os_string="rhel86"
;;
- *amzn2*)
- os_string="amzn2"
- ;;
*tencentos*)
os_string="tencentos31"
;;
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# LLM Packages
- deepspeed @ git+https://github.com/HabanaAI/DeepSpeed.git@1.17.1
+ deepspeed @ git+https://github.com/HabanaAI/DeepSpeed.git@1.19.0

# Datascience and useful extensions
kafka-python~=2.0.2