Add testing DSS with Canonical Kubernetes (New) #1793

Draft
wants to merge 13 commits into
base: main
30 changes: 23 additions & 7 deletions contrib/checkbox-dss-validation/README.md
@@ -21,7 +21,7 @@ lxd init --auto
git clone https://github.com/canonical/checkbox
cd checkbox/contrib/checkbox-dss-validation
snapcraft
sudo snap install --dangerous --classic ./checkbox-dss_3.0_amd64.snap
sudo snap install --dangerous --classic ./checkbox-dss_3.1_amd64.snap
```

Make sure that the provider service is running and active:
@@ -38,21 +38,37 @@ Some test need dependencies, and a helper script is available to install them:
checkbox-dss.install-deps
```

By default this will install the `data-science-stack` snap from the `latest/stable`
channel. To instead install from `latest/edge` use:
By default this will install the `data-science-stack` snap from the `1.0/stable`
channel. To instead install from `1.1/edge` use:

```shell
checkbox-dss.install-deps --dss-snap-channel latest/edge
checkbox-dss.install-deps --dss-snap-channel 1.1/edge
```

Furthermore, the default `microk8s` snap channel is `1.28/stable` in classic mode,
but this can be customized as
Furthermore, by default, `microk8s` snap from channel `1.28/stable` is installed
in classic mode.
The channel for `microk8s` can be customized as
(please note that this snap must be `--classic` to enable GPU support):

```shell
checkbox-dss.install-deps --microk8s-snap-channel 1.31/stable
checkbox-dss.install-deps --microk8s-snap-channel 1.32/stable
```

It is also possible to install [Canonical Kubernetes](https://snapcraft.io/k8s)
**instead** of `microk8s`.
**Exactly one** of
`microk8s` or Canonical Kubernetes must be installed.
Canonical Kubernetes will be installed instead of `microk8s` if the argument
`--canonical-k8s-snap-channel` is provided with a value:

```shell
checkbox-dss.install-deps --canonical-k8s-snap-channel 1.32/stable --dss-snap-channel 1.1/edge
```

When Canonical Kubernetes is installed, the `helm` snap will also be installed from its
default channel in `--classic` mode.
`helm` is used to enable NVIDIA GPU support in the Canonical Kubernetes cluster.

These validations also need the `kubectl` snap installed, and the default channel
used for that is `1.29/stable`, but can be customized as shown previously by passing
the appropriate channel name for `--kubectl-snap-channel`.
43 changes: 37 additions & 6 deletions contrib/checkbox-dss-validation/bin/install-deps
@@ -1,7 +1,7 @@
#!/bin/bash
set -e

dss_snap_channel="latest/stable"
dss_snap_channel="1.0/stable"
microk8s_snap_channel="1.28/stable"
kubectl_snap_channel="1.29/stable"

@@ -35,21 +35,41 @@ setup_microk8s_snap() {
deployment/coredns \
deployment/hostpath-provisioner
sudo microk8s.kubectl -n kube-system rollout status ds/calico-node

# hack as redirecting stdout anywhere but /dev/null throws a permission denied error
# see: https://forum.snapcraft.io/t/eksctl-cannot-write-to-stdout/17254/4
sudo microk8s.kubectl config view --raw | tee "${SNAP_REAL_HOME}/.kube/config" >/dev/null
}

setup_kubectl_snap() {
# This is needed to overcome the following bug within microk8s:
# https://github.com/canonical/microk8s/issues/4453
echo -e "\nInstalling kubectl snap from channel $1"
sudo snap install kubectl --classic --channel="$1"
}

setup_canonical_k8s_snap() {
echo -e "\nInstalling Canonical k8s snap from channel $1"
sudo snap install k8s --classic --channel "$1"
sudo k8s bootstrap
sudo k8s enable local-storage

# Directory needed for kubeconfig
mkdir -p "${SNAP_REAL_HOME}/.kube"

# hack as redirecting stdout anywhere but /dev/null throws a permission denied error
# see: https://forum.snapcraft.io/t/eksctl-cannot-write-to-stdout/17254/4
sudo microk8s.kubectl config view --raw | tee "${SNAP_REAL_HOME}/.kube/config" >/dev/null
sudo k8s config | tee "${SNAP_REAL_HOME}/.kube/config" >/dev/null
}

help_function() {
echo "This script is used install all dependencies for checkbox-dss to run; defaults for optional arguments are shown in usage"
echo "Usage: checkbox-dss.install-deps [--dss-snap-channel $dss_snap_channel] [--microk8s-snap-channel $microk8s_snap_channel] [--kubectl-snap-channel $kubectl_snap_channel]"
echo "This script is used to install all dependencies for checkbox-dss to run;"
echo "defaults for optional arguments are shown in usage except for"
echo "'--canonical-k8s-snap-channel':"
echo "if this arg is not provided, then microk8s will be installed from its default channel,"
echo "otherwise Canonical k8s will be installed from the given channel instead of microk8s."
echo "Usage: checkbox-dss.install-deps [--dss-snap-channel $dss_snap_channel] [--canonical-k8s-snap-channel NO-DEFAULT | --microk8s-snap-channel $microk8s_snap_channel] [--kubectl-snap-channel $kubectl_snap_channel]"
exit 2
}

main() {
@@ -67,12 +87,23 @@ main() {
kubectl_snap_channel="$2"
shift 2
;;
--canonical-k8s-snap-channel)
canonical_k8s_snap_channel="$2"
shift 2
;;
*) help_function ;;
esac
done

echo -e "\n Step 1/4: Setting up microk8s"
setup_microk8s_snap "$microk8s_snap_channel"
if [[ -n "$canonical_k8s_snap_channel" ]]; then
echo -e "\n Step 1/4: Setting up Canonical k8s and helm"
setup_canonical_k8s_snap "$canonical_k8s_snap_channel"
else
echo -e "\n Step 1/4: Setting up microk8s and helm"
setup_microk8s_snap "$microk8s_snap_channel"
fi
echo -e "\n Installing the helm snap"
sudo snap install helm --classic

echo -e "\n Step 2/4: Setting up kubectl"
setup_kubectl_snap "$kubectl_snap_channel"
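
The branch in `main()` above takes the Canonical Kubernetes path whenever `--canonical-k8s-snap-channel` is non-empty, while the README states that exactly one of the two Kubernetes snaps must be installed. A minimal sketch of how that rule could be made explicit is shown here; `select_k8s_snap` is a hypothetical helper that is not part of `install-deps`, and it only prints the choice instead of installing anything.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of install-deps):
# prints "<snap name> <channel>" for the Kubernetes snap to install,
# and returns 2 if both channel options were passed or a value is missing.
select_k8s_snap() {
    microk8s_channel=""
    canonical_channel=""
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --microk8s-snap-channel | --canonical-k8s-snap-channel)
                # Guard against an option given without a value
                if [ "$#" -lt 2 ]; then
                    echo "missing value for $1" >&2
                    return 2
                fi
                if [ "$1" = "--microk8s-snap-channel" ]; then
                    microk8s_channel="$2"
                else
                    canonical_channel="$2"
                fi
                shift 2
                ;;
            *) shift ;; # other options are ignored in this sketch
        esac
    done

    if [ -n "$microk8s_channel" ] && [ -n "$canonical_channel" ]; then
        echo "choose either microk8s or Canonical Kubernetes, not both" >&2
        return 2
    fi

    if [ -n "$canonical_channel" ]; then
        echo "k8s $canonical_channel"
    else
        # 1.28/stable is the microk8s default used by the script
        echo "microk8s ${microk8s_channel:-1.28/stable}"
    fi
}

select_k8s_snap --canonical-k8s-snap-channel 1.32/stable
```

Rejecting the combination, rather than silently preferring one option, is a design choice of this sketch; the script in this change lets the Canonical Kubernetes option win.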
@@ -21,7 +21,7 @@ set -eou pipefail

NAMESPACE="${1:-"gpu-operator-resources"}"
sleep 10
kubectl -n "$NAMESPACE" rollout status ds/gpu-operator-node-feature-discovery-worker
kubectl -n "$NAMESPACE" rollout status ds/gpu-operator-feature-discovery
sleep 10
kubectl -n "$NAMESPACE" rollout status ds/nvidia-device-plugin-daemonset
sleep 10
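
The rollout check above alternates fixed `sleep 10` pauses with `kubectl rollout status`. As an illustration only, the same waiting pattern can be written as a bounded retry loop; `retry_until` is a hypothetical helper and is not used by this script.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of the rollout script):
# run a command up to <attempts> times, sleeping <delay> seconds between
# failed attempts. Returns 0 on the first success, 1 if every attempt fails.
retry_until() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example with a harmless command; in the rollout script the command
# could be something like:
#   kubectl -n "$NAMESPACE" get ds/nvidia-device-plugin-daemonset
retry_until 3 0 true && echo "ready"
```

A loop like this bounds the total wait and returns as soon as the resource exists, whereas a fixed sleep always waits the full interval.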
@@ -4,7 +4,6 @@ flags: simple
imports: from com.canonical.certification import executable
requires:
executable.name == 'dss'
executable.name == 'microk8s'
_summary: Check that the DSS environment initializes
estimated_duration: 2m
command:
@@ -254,31 +253,37 @@ command: run_dss.sh remove pytorch-intel

# NVIDIA CUDA jobs ############################################################

id: microk8s_nvidia_gpu_addon/enable
id: nvidia_gpu_operator/install
category_id: dss-regress
flags: simple
imports:
from com.canonical.certification import executable
from com.canonical.certification import graphics_card
requires:
graphics_card.vendor == 'NVIDIA Corporation'
executable.name == 'microk8s'
executable.name == 'helm'
executable.name == 'kubectl'
depends: dss/initialize
_summary: Enable NVIDIA GPU addon in microk8s
_summary: Install NVIDIA GPU operator in the Kubernetes cluster
estimated_duration: 10m
command:
set -eou pipefail
OPERATOR_VERSION="24.6.2"
microk8s enable gpu --driver=operator --version="${OPERATOR_VERSION}"
check_cuda_rollout.sh
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install --wait --generate-name -n gpu-operator-resources --create-namespace nvidia/gpu-operator --kubeconfig ~/.kube/config
sleep 15
check_nvidia_gpu_rollout.sh
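
The removed `microk8s enable gpu` line pinned the operator with `OPERATOR_VERSION="24.6.2"`, while the `helm install` line takes whatever chart version is current. The sketch below shows one way the pin could be kept using helm's `--version` option. `gpu_operator_install_cmd` is a hypothetical helper that only prints the command, and the exact chart version string (for example, whether it carries a leading `v`) is an assumption that should be checked with `helm search repo nvidia/gpu-operator --versions`.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of the job definition):
# prints the helm command for installing the NVIDIA GPU operator,
# adding --version only when a chart version is given.
gpu_operator_install_cmd() {
    namespace="${1:-gpu-operator-resources}"
    version="${2:-}"
    cmd="helm install --wait --generate-name -n ${namespace} --create-namespace nvidia/gpu-operator"
    if [ -n "$version" ]; then
        # Pin the chart version; the value format is an assumption
        cmd="${cmd} --version ${version}"
    fi
    printf '%s\n' "$cmd"
}

gpu_operator_install_cmd gpu-operator-resources v24.6.2
```

Printing the command keeps the example free of side effects; a job would run the command directly instead.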

id: nvidia_gpu_addon/validations_succeed
id: nvidia_gpu_operator/validations_succeed
category_id: dss-regress
flags: simple
imports: from com.canonical.certification import executable
requires: executable.name == 'kubectl'
depends: microk8s_nvidia_gpu_addon/enable
imports:
from com.canonical.certification import executable
from com.canonical.certification import graphics_card
requires:
graphics_card.vendor == 'NVIDIA Corporation'
executable.name == 'kubectl'
depends: nvidia_gpu_operator/install
_summary: NVIDIA GPU validations should succeed
estimated_duration: 10s
command:
@@ -292,7 +297,7 @@ category_id: dss-regress
flags: simple
imports: from com.canonical.certification import executable
requires: executable.name == 'dss'
depends: nvidia_gpu_addon/validations_succeed
depends: nvidia_gpu_operator/validations_succeed
_summary: Check that dss status reports that NVIDIA GPU acceleration is enabled
estimated_duration: 5s
command:
@@ -26,8 +26,8 @@ include:
dss/create_pytorch_intel_notebook
xpu/pytorch_can_use_xpu
dss/remove_pytorch_intel_notebook
microk8s_nvidia_gpu_addon/enable
nvidia_gpu_addon/validations_succeed
nvidia_gpu_operator/install
nvidia_gpu_operator/validations_succeed
dss/status_nvidia_gpu
dss/create_pytorch_cuda_notebook
cuda/pytorch_can_use_cuda
2 changes: 1 addition & 1 deletion contrib/checkbox-dss-validation/snap/snapcraft.yaml
@@ -2,7 +2,7 @@ name: checkbox-dss
summary: Checkbox tests for validating the Data Science Stack
description: |
Collection of tests to be run on devices that are part of the dss-validation project
version: '3.0'
version: '3.1'
confinement: classic
grade: stable
