Gaudi RHOAI notebook image 1.19.0 update (#589)
Signed-off-by: sharvil10 <[email protected]>
sharvil10 authored Jan 15, 2025
1 parent 92a56c2 commit 0b6a6b0
Showing 8 changed files with 53 additions and 101 deletions.
7 changes: 4 additions & 3 deletions enterprise/redhat/openshift-ai/gaudi/README.md
@@ -6,9 +6,10 @@ Intel® Gaudi AI Software Tools for OpenShift AI(RedHat OpenShift Data Science/R

| Notebook Container Name | Tools | Image Name |
| -----------------------------| ------------- | ------------- |
- | Intel Gaudi Notebook Container 1.17.0-495 | [Intel® Gaudi Software Stack*](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html), [Intel® Gaudi PyTorch](https://docs.habana.ai/en/latest/PyTorch/index.html), [Intel® Gaudi vLLM](https://github.com/HabanaAI/vllm-fork.git), [Intel® Gaudi DeepSpeed](https://github.com/HabanaAI/DeepSpeed) | [`registry.connect.redhat.com/intel/gaudi-notebooks:1.17.0-495-rhel-9.2`](registry.connect.redhat.com/intel/gaudi-notebooks@sha256:a62baf968caa7dd23b7f4cdcddc26e109d894f1436e247b4ea1e2fb4a5c94d54) |
- | Intel Gaudi Notebook Container 1.17.1-40 | [Intel® Gaudi Software Stack*](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html), [Intel® Gaudi PyTorch](https://docs.habana.ai/en/latest/PyTorch/index.html), [Intel® Gaudi vLLM](https://github.com/HabanaAI/vllm-fork.git), [Intel® Gaudi DeepSpeed](https://github.com/HabanaAI/DeepSpeed) | [`registry.connect.redhat.com/intel/gaudi-notebooks:1.17.1-40-rhel-9.2`](registry.connect.redhat.com/intel/gaudi-notebooks@sha256:00ca535956b7fcdd91e71bc4a3cd4493ddcaceea9b8d7bb95a7edc0e1cb0bac4) |
- | Intel Gaudi Notebook Container 1.18.0-524 | [Intel® Gaudi Software Stack*](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html), [Intel® Gaudi PyTorch](https://docs.habana.ai/en/latest/PyTorch/index.html), [Intel® Gaudi vLLM](https://github.com/HabanaAI/vllm-fork.git), [Intel® Gaudi DeepSpeed](https://github.com/HabanaAI/DeepSpeed) | [`registry.connect.redhat.com/intel/gaudi-notebooks:1.18.0-524-rhel-9.2`](registry.connect.redhat.com/intel/gaudi-notebooks@sha256:142b11253e5708ff9744c895868b2adda2f6f01c40127b71f1aca3d7a6e6bc29) |
+ | Intel Gaudi Notebook Container 1.19.0-561 | [Intel® Gaudi Software Stack*](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html), [Intel® Gaudi PyTorch](https://docs.habana.ai/en/latest/PyTorch/index.html), [Intel® Gaudi vLLM](https://github.com/HabanaAI/vllm-fork.git), [Intel® Gaudi DeepSpeed](https://github.com/HabanaAI/DeepSpeed) | [`registry.connect.redhat.com/intel/gaudi-notebooks:1.19.0-561-rhel-9.4`](https://catalog.redhat.com/software/containers/66e2072057f53c17d73d67e1?architecture=amd64&tag=1.18.0-524-rhel-9.2&image=676b2fd90e7869ebe618b78b&gti-tabs=registry-tokens), [`registry.connect.redhat.com/intel/gaudi-notebooks:1.19.0-561-rhel-9.2`](https://catalog.redhat.com/software/containers/66e2072057f53c17d73d67e1?architecture=amd64&tag=1.18.0-524-rhel-9.2&image=676b32a224dabcac4cf84cb3&gti-tabs=registry-tokens) |
+ | Intel Gaudi Notebook Container 1.18.0-524 | [Intel® Gaudi Software Stack*](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html), [Intel® Gaudi PyTorch](https://docs.habana.ai/en/latest/PyTorch/index.html), [Intel® Gaudi vLLM](https://github.com/HabanaAI/vllm-fork.git), [Intel® Gaudi DeepSpeed](https://github.com/HabanaAI/DeepSpeed) | [`registry.connect.redhat.com/intel/gaudi-notebooks:1.18.0-524-rhel-9.2`](https://catalog.redhat.com/software/containers/66e2072057f53c17d73d67e1?architecture=amd64&tag=1.18.0-524-rhel-9.2&image=67181ccca33f4e501721789f&gti-tabs=registry-tokens) |
+ | Intel Gaudi Notebook Container 1.17.1-40 | [Intel® Gaudi Software Stack*](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html), [Intel® Gaudi PyTorch](https://docs.habana.ai/en/latest/PyTorch/index.html), [Intel® Gaudi vLLM](https://github.com/HabanaAI/vllm-fork.git), [Intel® Gaudi DeepSpeed](https://github.com/HabanaAI/DeepSpeed) | [`registry.connect.redhat.com/intel/gaudi-notebooks:1.17.1-40-rhel-9.2`](https://catalog.redhat.com/software/containers/66e2072057f53c17d73d67e1?architecture=amd64&tag=1.18.0-524-rhel-9.2&image=66fc3bf38081186cfc015a0d&gti-tabs=registry-tokens) |
+ | Intel Gaudi Notebook Container 1.17.0-495 | [Intel® Gaudi Software Stack*](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html), [Intel® Gaudi PyTorch](https://docs.habana.ai/en/latest/PyTorch/index.html), [Intel® Gaudi vLLM](https://github.com/HabanaAI/vllm-fork.git), [Intel® Gaudi DeepSpeed](https://github.com/HabanaAI/DeepSpeed) | [`registry.connect.redhat.com/intel/gaudi-notebooks:1.17.0-495-rhel-9.2`](https://catalog.redhat.com/software/containers/66e2072057f53c17d73d67e1?architecture=amd64&tag=1.18.0-524-rhel-9.2&image=66e0ac59261d52855750518c&gti-tabs=registry-tokens) |

## Run Gaudi Notebook Containers

6 changes: 4 additions & 2 deletions enterprise/redhat/openshift-ai/gaudi/crd-sample.yaml
@@ -16,7 +16,7 @@ apiVersion: aitools.intel/v1
kind: GaudiAIToolsContainer
metadata:
labels:
- gaudi-software-version: '1.17.1-40'
+ gaudi-software-version: '1.19.0-561'
name: intel-gaudi
spec:
nameOverride: ""
@@ -25,6 +25,8 @@ spec:
registry: registry.connect.redhat.com
repo: intel/gaudi-notebooks
tags:
- - gaudi_software: "1.17.1-40"
+ - gaudi_software: "1.19.0-561"
  rhel_os: "9.2"
+ - gaudi_software: "1.19.0-561"
+ rhel_os: "9.4"
namespace: redhat-ods-applications
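
The sample resource now lists two tag entries for the same software version, one per RHEL release. The README table in this commit names images as `<gaudi_software>-rhel-<rhel_os>`, so a minimal sketch of how the fields could map to image references looks like the following. The helper name `image_ref` is hypothetical, and whether the operator joins the fields exactly this way is an assumption, not something this diff shows.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an image reference from the CR fields.
# The "<gaudi_software>-rhel-<rhel_os>" tag layout is taken from the README
# table; the operator's real composition logic is assumed, not confirmed.
image_ref() {
  local registry="$1" repo="$2" gaudi_software="$3" rhel_os="$4"
  printf '%s/%s:%s-rhel-%s\n' "$registry" "$repo" "$gaudi_software" "$rhel_os"
}

# One reference per entry under `tags:` in the sample above.
image_ref registry.connect.redhat.com intel/gaudi-notebooks 1.19.0-561 9.2
image_ref registry.connect.redhat.com intel/gaudi-notebooks 1.19.0-561 9.4
```

Both printed references match the 1.19.0-561 image names listed in the README table.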
12 changes: 4 additions & 8 deletions enterprise/redhat/openshift-ai/gaudi/docker/Dockerfile.rhel9.2
@@ -69,6 +69,7 @@ RUN mkdir -p /licenses && \
ENV PYTHON_VERSION=3.10
COPY install-python310.sh .
RUN ./install-python310.sh rhel9.2 && rm install-python310.sh
RUN echo "/usr/local/lib" > /etc/ld.so.conf.d/python.conf && ldconfig
ENV LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

COPY install_efa.sh .
@@ -194,17 +195,11 @@ RUN dnf install --allowerasing -y \
gperftools-devel && \
dnf clean all && rm -rf /var/cache/yum

RUN dnf config-manager --add-repo https://yum.repos.intel.com/mkl/setup/intel-mkl.repo -y && \
dnf install --allowerasing -y intel-mkl-64bit-2020.4-912 && \
dnf clean all && rm -rf /var/cache/yum

# Set LD_PRELOAD after all required installations to
# avoid warnings during docker creation
ENV LD_PRELOAD=/lib64/libtcmalloc.so.4
ENV TCMALLOC_LARGE_ALLOC_REPORT_THRESHOLD=7516192768

RUN rm -rf /tmp/*

USER 1001

COPY --chown=1001:0 install_packages.sh .
@@ -238,9 +233,10 @@ RUN python -m pip install -r requirements.txt && \
RUN cd ${APP_ROOT}/ && \
git clone https://github.com/HabanaAI/vllm-fork.git && \
cd vllm-fork && \
VLLM_TARGET_DEVICE=hpu pip install -e .
git checkout habana_main && \
pip install -r requirements-hpu.txt && \
VLLM_TARGET_DEVICE=hpu python setup.py develop

WORKDIR ${APP_ROOT}/src
ENV NOTEBOOK_SAMPLES_LINK="https://raw.githubusercontent.com/intel/ai-containers/refs/heads/main/enterprise/redhat/openshift-ai/gaudi/demo/oneapi-sample.ipynb"

ENTRYPOINT ["bash", "-c", "/opt/app-root/builder/run"]
70 changes: 27 additions & 43 deletions enterprise/redhat/openshift-ai/gaudi/docker/Dockerfile.rhel9.4
@@ -29,10 +29,15 @@ RUN echo "[CRB]" > /etc/yum.repos.d/CentOS-Linux-CRB.repo && \
echo "gpgkey=https://www.centos.org/keys/RPM-GPG-KEY-CentOS-Official-SHA256" >> /etc/yum.repos.d/CentOS-Linux-CRB.repo && \
echo "gpgcheck=1" >> /etc/yum.repos.d/CentOS-Linux-CRB.repo

RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm && \
dnf clean all && rm -rf /var/cache/yum

RUN dnf install -y \
python3-dnf-plugin-versionlock && \
dnf versionlock add redhat-release* && \
dnf clean all

RUN dnf update -y && dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm && \
dnf clean all

RUN dnf update -y && dnf install -y \
clang \
cmake3 \
cpp \
@@ -49,8 +54,8 @@ RUN dnf install -y \
lsof \
python3-devel \
openssh-clients \
openssl-1:3.0.7-28.el9_4 \
openssl-devel-1:3.0.7-28.el9_4 \
openssl \
openssl-devel \
libjpeg-devel \
openssh-server \
lsb_release \
@@ -66,24 +71,15 @@ RUN dnf install -y \
python3.11-pip \
python3.11-devel \
python3.11-rpm \
ffmpeg-free \
python3-dnf-plugin-versionlock && \
ffmpeg-free && \
# update pkgs (except OS version) for resolving potentials CVEs
dnf versionlock add redhat-release* openssl* libcurl-minimal curl-minimal ima-evm-utils python3-rpm rpm* && \
dnf update -y && \
dnf versionlock add python3-rpm rpm* && \
dnf clean all && rm -rf /var/cache/yum && \
rm -f /etc/ssh/ssh_host_*_key*

RUN mkdir -p /licenses && \
wget -O /licenses/LICENSE https://raw.githubusercontent.com/intel/ai-containers/main/LICENSE

RUN alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 2 && \
alternatives --install /usr/bin/python3 python3 /usr/bin/python3.9 1 && \
alternatives --set python3 /usr/bin/python3.11 && \
alternatives --install /usr/bin/pip3 pip3 /usr/bin/pip3.11 2 && \
alternatives --install /usr/bin/pip3 pip3 /usr/bin/pip3.9 1 && \
alternatives --set pip3 /usr/bin/pip3.11

COPY install_efa.sh .
RUN ./install_efa.sh && rm install_efa.sh && rm -rf /etc/ld.so.conf.d/efa.conf /etc/profile.d/efa.sh

@@ -193,32 +189,18 @@ RUN echo "[CRB]" > /etc/yum.repos.d/CentOS-Linux-CRB.repo && \
echo "gpgkey=https://www.centos.org/keys/RPM-GPG-KEY-CentOS-Official-SHA256" >> /etc/yum.repos.d/CentOS-Linux-CRB.repo && \
echo "gpgcheck=1" >> /etc/yum.repos.d/CentOS-Linux-CRB.repo

RUN dnf install --allowerasing -y \
curl-7.76.1-29.el9_4.1 \
cairo-devel \
numactl-devel \
iproute \
which \
zlib-devel \
lapack-devel \
openblas-devel \
numactl \
gperftools-devel && \
dnf clean all && rm -rf /var/cache/yum

RUN echo "[oneAPI]" >> /etc/yum.repos.d/oneAPI.repo && \
echo "name=Intel® oneAPI repository" >> /etc/yum.repos.d/oneAPI.repo && \
echo "baseurl=https://yum.repos.intel.com/oneapi" >> /etc/yum.repos.d/oneAPI.repo && \
echo 'enabled=1' >> /etc/yum.repos.d/oneAPI.repo && \
echo "gpgcheck=1" >> /etc/yum.repos.d/oneAPI.repo && \
echo "repo_gpgcheck=1" >> /etc/yum.repos.d/oneAPI.repo && \
echo "gpgkey=https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB" >> /etc/yum.repos.d/oneAPI.repo

RUN dnf install --allowerasing -y intel-oneapi-mkl-2024.2.0 && \
RUN dnf update -y && dnf install --nodocs --setopt=install_weak_deps=false --allowerasing -y \
cairo-devel \
numactl-devel \
iproute \
which \
zlib-devel \
lapack-devel \
openblas-devel \
numactl \
gperftools-devel && \
dnf clean all && rm -rf /var/cache/yum

ENV LD_LIBRARY_PATH=/opt/intel/oneapi/mkl/2024.2/lib:${LD_LIBRARY_PATH}

RUN rm -rf /tmp/*

USER 1001
@@ -260,9 +242,11 @@ RUN python -m pip install -r requirements.txt && \
RUN cd ${APP_ROOT}/ && \
git clone https://github.com/HabanaAI/vllm-fork.git && \
cd vllm-fork && \
VLLM_TARGET_DEVICE=hpu pip install -e .
git checkout habana_main && \
pip install -r requirements-hpu.txt && \
VLLM_TARGET_DEVICE=hpu python setup.py develop

WORKDIR ${APP_ROOT}/src
ENV JUPYTER_PRELOAD_REPOS="https://github.com/IntelAI/oneAPI-samples"
ENV REPO_BRANCH="main"
ENV NOTEBOOK_SAMPLES_LINK="https://raw.githubusercontent.com/intel/ai-containers/refs/heads/main/enterprise/redhat/openshift-ai/gaudi/demo/oneapi-sample.ipynb"

ENTRYPOINT ["bash", "-c", "/opt/app-root/builder/run"]
22 changes: 11 additions & 11 deletions enterprise/redhat/openshift-ai/gaudi/docker/docker-compose.yaml
@@ -22,12 +22,12 @@ services:
https_proxy: ${https_proxy}
no_proxy: ""
ARTIFACTORY_URL: ${ARTIFACTORY_URL:-vault.habana.ai}
- VERSION: ${VERSION:-1.18.0}
- REVISION: ${REVISION:-524}
+ VERSION: ${VERSION:-1.19.0}
+ REVISION: ${REVISION:-561}
context: .
target: gaudi-base
dockerfile: Dockerfile.rhel${RHEL_OS:-9.2}
- image: ${REGISTRY}/${REPO}:b-${GITHUB_RUN_NUMBER:-0}-gaudi-base-${VERSION:-1.18.0}-${REVISION:-524}-rhel-${RHEL_OS:-9.2}
+ image: ${REGISTRY}/${REPO}:b-${GITHUB_RUN_NUMBER:-0}-gaudi-base-${VERSION:-1.19.0}-${REVISION:-561}-rhel-${RHEL_OS:-9.2}
entrypoint: ["/bin/bash", "-c"]
command: >
"hl-smi"
@@ -37,17 +37,17 @@ services:
BASE_IMAGE: ${BASE_IMAGE:-registry.access.redhat.com/ubi9/ubi}
BASE_TAG: ${RHEL_OS:-9.2}
BASE_NAME: rhel${RHEL_OS:-rhel9.2}
- PT_VERSION: ${PT_VERSION:-2.4.0}
+ PT_VERSION: ${PT_VERSION:-2.5.1}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
no_proxy: ""
ARTIFACTORY_URL: ${ARTIFACTORY_URL:-vault.habana.ai}
- VERSION: ${VERSION:-1.18.0}
- REVISION: ${REVISION:-524}
+ VERSION: ${VERSION:-1.19.0}
+ REVISION: ${REVISION:-561}
context: .
target: gaudi-pytorch
dockerfile: Dockerfile.rhel${RHEL_OS:-9.2}
- image: ${REGISTRY}/${REPO}:b-${GITHUB_RUN_NUMBER:-0}-gaudi-pytorch-${VERSION:-1.18.0}-${REVISION:-524}-rhel-${RHEL_OS:-9.2}
+ image: ${REGISTRY}/${REPO}:b-${GITHUB_RUN_NUMBER:-0}-gaudi-pytorch-${VERSION:-1.19.0}-${REVISION:-561}-rhel-${RHEL_OS:-9.2}
entrypoint: ["/bin/bash", "-c"]
command: >
"python -c 'import torch'"
@@ -57,17 +57,17 @@ services:
BASE_IMAGE: ${BASE_IMAGE:-registry.access.redhat.com/ubi9/ubi}
BASE_TAG: ${RHEL_OS:-9.2}
BASE_NAME: ${BASE_NAME:-rhel9.2}
- PT_VERSION: ${PT_VERSION:-2.4.0}
+ PT_VERSION: ${PT_VERSION:-2.5.1}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
no_proxy: ""
ARTIFACTORY_URL: ${ARTIFACTORY_URL:-vault.habana.ai}
- VERSION: ${VERSION:-1.18.0}
- REVISION: ${REVISION:-524}
+ VERSION: ${VERSION:-1.19.0}
+ REVISION: ${REVISION:-561}
context: .
target: gaudi-notebooks
dockerfile: Dockerfile.rhel${RHEL_OS:-9.2}
- image: ${REGISTRY}/${REPO}:b-${GITHUB_RUN_NUMBER:-0}-gaudi-notebook-${VERSION:-1.18.0}-${REVISION:-524}-rhel-${RHEL_OS:-9.2}
+ image: ${REGISTRY}/${REPO}:b-${GITHUB_RUN_NUMBER:-0}-gaudi-notebook-${VERSION:-1.19.0}-${REVISION:-561}-rhel-${RHEL_OS:-9.2}
entrypoint: ["/bin/bash", "-c"]
command: >
"python -m jupyter notebook --version"
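
The compose file relies on `${VAR:-default}` interpolation, which behaves like the same expansion in a POSIX shell: the default applies when the variable is unset or empty. The sketch below reproduces the `image:` tag pattern in plain shell to show what the new defaults resolve to. The function name `compose_tag` is hypothetical, and the fallback values for `REGISTRY` and `REPO` are placeholders added only for this example; the compose file itself gives them no defaults.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the `image:` pattern from docker-compose.yaml.
# "example.registry" and "example/repo" are placeholders for this sketch only.
compose_tag() {
  local target="$1"
  printf '%s/%s:b-%s-%s-%s-%s-rhel-%s\n' \
    "${REGISTRY:-example.registry}" "${REPO:-example/repo}" \
    "${GITHUB_RUN_NUMBER:-0}" "$target" \
    "${VERSION:-1.19.0}" "${REVISION:-561}" "${RHEL_OS:-9.2}"
}

# With nothing set, every default applies.
( unset REGISTRY REPO GITHUB_RUN_NUMBER VERSION REVISION RHEL_OS
  compose_tag gaudi-base )
# prints example.registry/example/repo:b-0-gaudi-base-1.19.0-561-rhel-9.2

# Overriding RHEL_OS changes only the OS suffix of the tag.
( unset REGISTRY REPO GITHUB_RUN_NUMBER VERSION REVISION
  RHEL_OS=9.4
  compose_tag gaudi-notebook )
# prints example.registry/example/repo:b-0-gaudi-notebook-1.19.0-561-rhel-9.4
```

In the compose file, the same `RHEL_OS` variable also selects the Dockerfile through `Dockerfile.rhel${RHEL_OS:-9.2}`.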
32 changes: 2 additions & 30 deletions enterprise/redhat/openshift-ai/gaudi/docker/install-python310.sh
@@ -24,12 +24,12 @@ case "${_BASE_NAME}" in
echo "Skip install Python3.10 from source on Ubuntu22.04"
exit 0
;;
*debian* | *ubuntu*)
*ubuntu*)
apt update
apt install -y libsqlite3-dev libreadline-dev
;;
*rhel*)
yum install -y sqlite-devel readline-devel xz-devel
dnf install -y sqlite-devel readline-devel xz-devel
;;
*tencentos3.1*)
dnf install -y sqlite-devel readline-devel zlib-devel xz-devel bzip2-devel libffi-devel
@@ -42,21 +42,6 @@ case "${_BASE_NAME}" in
make && make install
ln -s /etc/pki/tls/cert.pem /usr/local/openssl-1.1.1w/ssl/cert.pem

PATH=$PATH:/usr/local/protoc/bin:/usr/local/openssl-1.1.1w/bin
LD_LIBRARY_PATH=/usr/local/openssl-1.1.1w/lib:$LD_LIBRARY_PATH
_SSL_LIB="--with-openssl=/usr/local/openssl-1.1.1w"
;;
*amzn2*)
yum install -y sqlite-devel readline-devel
wget -nv -O /opt/openssl-1.1.1w.tar.gz https://github.com/openssl/openssl/releases/download/OpenSSL_1_1_1w/openssl-1.1.1w.tar.gz &&
cd /opt/ &&
tar xzf openssl-1.1.1w.tar.gz &&
rm -rf openssl-1.1.1w.tar.gz &&
cd openssl-1.1.1w &&
./config --prefix=/usr/local/openssl-1.1.1w shared zlib &&
make && make install
ln -s /etc/pki/tls/cert.pem /usr/local/openssl-1.1.1w/ssl/cert.pem

PATH=$PATH:/usr/local/protoc/bin:/usr/local/openssl-1.1.1w/bin
LD_LIBRARY_PATH=/usr/local/openssl-1.1.1w/lib:$LD_LIBRARY_PATH
_SSL_LIB="--with-openssl=/usr/local/openssl-1.1.1w"
@@ -75,9 +60,6 @@ make -j && make altinstall
# post install
case "${_BASE_NAME}" in
*rhel9*)
alternatives --install /usr/bin/python3 python3 /usr/local/bin/python3.10 2 &&
alternatives --install /usr/bin/python3 python3 /usr/bin/python3.9 1 &&
alternatives --set python3 /usr/local/bin/python3.10
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
;;
*tencentos3.1*)
@@ -88,16 +70,6 @@ case "${_BASE_NAME}" in
alternatives --set python3 /usr/local/bin/python3.10
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
;;
*amzn2*)
update-alternatives --install /usr/bin/python3 python3 /usr/local/bin/python3.10 3 &&
update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 2 &&
update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.7 1
;;
*debian*)
update-alternatives --install /usr/bin/python3 python3 /usr/local/bin/python3.10 3
update-alternatives --install /usr/bin/python3 python3 /usr/local/bin/python3.8 2
update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.7 1
;;
esac

python3 -m pip install --upgrade pip setuptools
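
The script dispatches on glob patterns in a `case` statement, so narrowing `*debian* | *ubuntu*` to `*ubuntu*` means a Debian base name no longer reaches the `apt` branch. A simplified sketch of that matching follows. The function `pkg_tool_for` and the `unknown` fallback are hypothetical; the real script has more branches (for example `*tencentos3.1*`), and its handling of unmatched names is not visible in this diff.

```shell
#!/usr/bin/env bash
# Hypothetical, reduced version of the dispatch in install-python310.sh.
# Only the two patterns touched by this commit are modelled here.
pkg_tool_for() {
  case "$1" in
    *ubuntu*) echo "apt" ;;
    *rhel*)   echo "dnf" ;;
    *)        echo "unknown" ;;  # assumed fallback, for illustration only
  esac
}

pkg_tool_for rhel9.2      # prints dnf
pkg_tool_for ubuntu24.04  # prints apt
pkg_tool_for debian12     # prints unknown
```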
Original file line number Diff line number Diff line change
@@ -27,9 +27,6 @@ case "${BASE_NAME}" in
*rhel8*)
os_string="rhel86"
;;
*amzn2*)
os_string="amzn2"
;;
*tencentos*)
os_string="tencentos31"
;;
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# LLM Packages
- deepspeed @ git+https://github.com/HabanaAI/DeepSpeed.git@1.17.1
+ deepspeed @ git+https://github.com/HabanaAI/DeepSpeed.git@1.19.0

# Datascience and useful extensions
kafka-python~=2.0.2
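
The DeepSpeed requirement is a direct reference of the form `name @ git+URL@ref`, where the text after the last `@` is the git ref that pip checks out. As a rough sketch, the pinned ref can be read back with a shell suffix expansion. The helper name `pinned_ref` is hypothetical, and it assumes the line has no trailing marker or comment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the git ref pinned in a "name @ git+URL@ref" line.
# ${line##*@} strips the longest prefix ending in "@", leaving only the ref.
pinned_ref() {
  local line="$1"
  printf '%s\n' "${line##*@}"
}

pinned_ref 'deepspeed @ git+https://github.com/HabanaAI/DeepSpeed.git@1.19.0'
# prints 1.19.0
```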