Add testing DSS with Canonical Kubernetes (New) #1793

Closed
wants to merge 76 commits into from
76 commits
ef06ea0
Add installing Canonical k8s instead of microk8s
motjuste Mar 14, 2025
0bdaf9b
Add job to enable NVIDIA GPU in Canonical k8s
motjuste Mar 14, 2025
8666050
Fix script name for checking nvidia gpu rollout
motjuste Mar 14, 2025
fb12328
Change nvidia gpu val. to depend on dss/initialize
motjuste Mar 14, 2025
0d5ce9d
Remove requirement for microk8s from dss/initialize
motjuste Mar 14, 2025
f34be48
Remove sudo in helm commands and pass kubeconfig
motjuste Mar 14, 2025
5714624
Update default dss channel to 1.0/stable
motjuste Mar 14, 2025
0f41093
Bump version of snap to 3.1
motjuste Mar 14, 2025
a7af76e
Update README documenting Canonical K8s
motjuste Mar 14, 2025
82b0c01
Add sleep before checking gpu rollout
motjuste Mar 14, 2025
a3a3610
Change to checking rollout status of parent daemonset
motjuste Mar 14, 2025
7746bf0
Unify enabling NVIDIA GPU in k8s
motjuste Mar 14, 2025
612a608
Minor update to summary of job
motjuste Mar 14, 2025
b703052
Merge branch 'main' into CHECKBOX-1781-add-canonical-k8s
motjuste Apr 22, 2025
7026636
Pin nvidia operator ver & in-line checking rollout
motjuste Apr 24, 2025
d6b9f87
Inline checking intel gpu plugin rollout
motjuste Apr 24, 2025
9279b2b
Merge branch 'main' into CHECKBOX-1781-add-canonical-k8s
motjuste Apr 24, 2025
f218899
Inline enabling intel gpu with kubectl apply
motjuste Apr 25, 2025
7eacbc8
In-line reading and passing kubeconfig
motjuste Apr 25, 2025
917e7ae
Move gpu setup in k8s to Python script
motjuste Apr 25, 2025
2796e1d
Fix commands passed to subprocess
motjuste Apr 25, 2025
5363802
Add workflow to build checkbox-dss snap
motjuste Apr 28, 2025
68d3de2
Fix formatting
motjuste Apr 28, 2025
75ba576
Add running checkbox-dss build on push and pr
motjuste Apr 28, 2025
5296d16
Remove unused jobs for publishing snap
motjuste Apr 28, 2025
66ae700
Fix command to update helm repo
motjuste Apr 28, 2025
e54b954
Remove unnecessary pointing to ~/.kube/config
motjuste Apr 28, 2025
0f82d42
Use snap build workflow in regression tests
motjuste Apr 28, 2025
4649eb5
Remove matrix in checkbox-dss snap build
motjuste Apr 28, 2025
3c07d73
Add downloading snap artifact
motjuste Apr 28, 2025
63f796c
Fix missing runs-on property
motjuste Apr 28, 2025
02c4bdf
Remove tmp job and download snap artifact for test
motjuste Apr 28, 2025
b24d049
Take dss and k8s type / channel as input
motjuste Apr 28, 2025
1e1b2e1
Set name of downloaded snap artifact
motjuste Apr 28, 2025
91ca218
Fix destination path to download built snap
motjuste Apr 28, 2025
9cc6948
Remove tmp step to verify loc of snap artifact
motjuste Apr 28, 2025
523cd01
Fix template-injection vulnerability from zizmor
motjuste Apr 28, 2025
c5692b3
Fix path to download the snap artifact
motjuste Apr 28, 2025
381e853
Accept pre-built snap in testflinger job
motjuste Apr 29, 2025
7562ffe
Remove unused REPLACE_BRANCH template param
motjuste Apr 29, 2025
e248ea4
Re-enable submitting job to testflinger
motjuste Apr 29, 2025
4a5c82f
Fix typo in testflinger job def
motjuste Apr 29, 2025
b62249d
Remove agent path in attachments
motjuste Apr 29, 2025
1f86262
Switch one queue to a more reliable device
motjuste Apr 29, 2025
9aeb632
Temporarily disable testflinger job
motjuste Apr 29, 2025
a9a99e6
Rename snap file to match attachment
motjuste Apr 29, 2025
6476a5f
List entire directory for debugging
motjuste Apr 29, 2025
eb5909b
Remove working directory
motjuste Apr 29, 2025
3f121de
Add info about inputs in job name
motjuste Apr 29, 2025
fb15559
Move channel info to run-name
motjuste Apr 29, 2025
02b0e64
Use timeout and retry helpers in gpu setup script
motjuste Apr 29, 2025
218f672
Read version from env, or use default; add tests
motjuste Apr 30, 2025
a0e6305
Add tests for installing intel gpu plugin
motjuste Apr 30, 2025
8eee8fc
Refactor installing intel gpu plugin
motjuste Apr 30, 2025
d430570
Add tests for installing nvidia gpu operator
motjuste Apr 30, 2025
c21a0bf
Refactor install nvidia gpu operator
motjuste Apr 30, 2025
b2e8a89
Add flag --is-microk8s for nvidia gpu install
motjuste Apr 30, 2025
f7ca5ae
Refactor test for nvidia gpu install
motjuste Apr 30, 2025
2f73aa0
Add configuring nvidia gpu operator for microk8s
motjuste Apr 30, 2025
02138ef
Add job to install nvidia gpu on microk8s
motjuste Apr 30, 2025
2b00d26
Revert "Add job to install nvidia gpu on microk8s"
motjuste Apr 30, 2025
a7f337f
Remove --is-microk8s argument
motjuste Apr 30, 2025
2bc9701
Add detect_if_microk8s
motjuste Apr 30, 2025
7d8abff
Add detecting microk8s in install nvidia gpu
motjuste Apr 30, 2025
676cb5d
Use subprocess.run to accept input
motjuste Apr 30, 2025
a14dcca
Add some informative print-outs with version etc
motjuste Apr 30, 2025
28737d0
Flush the print-outs to show them early
motjuste Apr 30, 2025
2c02f58
Add sleep before checking intel gpu label
motjuste Apr 30, 2025
6eb0d44
Don't catch TimeoutError in detecting microk8s
motjuste Apr 30, 2025
effdefb
Add sleep before checking nvidia validations
motjuste Apr 30, 2025
1965adb
Fix wrong timeouts for installing gpu
motjuste Apr 30, 2025
b6ff79c
Add extra rollout check with sleep before starting
motjuste Apr 30, 2025
c510057
Actually catch FileNotFoundError in detecting microk8s
motjuste Apr 30, 2025
fb191ff
Increase sleep before checking nvidia gpu rollout
motjuste May 5, 2025
62841cd
Refresh snapd before installing deps
motjuste May 5, 2025
3b033ed
Mark __main__ block for no coverage
motjuste May 5, 2025
65 changes: 65 additions & 0 deletions .github/workflows/checkbox-dss-build.yaml
@@ -0,0 +1,65 @@
name: checkbox-dss Snap build
permissions:
  contents: read
on:
  push:
    branches: [ main ]
    paths:
      - contrib/checkbox-dss-validation/checkbox-provider-dss/**
      - .github/workflows/checkbox-dss-build.yaml
  pull_request:
    branches: [ main ]
    paths:
      - contrib/checkbox-dss-validation/checkbox-provider-dss/**
      - .github/workflows/checkbox-dss-build.yaml
  workflow_dispatch:
  workflow_call:
    outputs:
      artifact-url:
        value: ${{ jobs.snap_frontend_native.outputs.artifact-url }}

jobs:
  snap_frontend_native:
    outputs:
      artifact-url: ${{ steps.upload_artifact.outputs.artifact-url }}
    runs-on:
      group: "Canonical self-hosted runners"
      labels: ["self-hosted", "linux", "jammy", "large", "X64"]
    timeout-minutes: 1200 #20h, this will timeout sooner due to inner timeouts
    name: Checkbox DSS validation snap
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
        with:
          fetch-depth: 0
          persist-credentials: false

      - id: snap_build
        uses: Wandalen/wretry.action@71a909ebf09f3ffdc6f42a17bd54ecb43481da49
        name: Building the snaps
        timeout-minutes: 600 # 10hours
        with:
          action: snapcore/[email protected]
          attempt_delay: 600000 # 10min
          attempt_limit: 5
          with: |
            path: contrib/checkbox-dss-validation/
            snapcraft-channel: 8.x/stable

      - uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02
        name: Upload logs on failure
        if: failure()
        with:
          name: snapcraft-log-checkbox-dss-snap
          path: |
            /home/runner/.cache/snapcraft/log/
            /home/runner/.local/state/snapcraft/log/
            contrib/checkbox-dss-validation/checkbox*.txt

      - id: upload_artifact
        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02
        name: Upload the snaps as artifact
        with:
          name: checkbox-dss.snap
          path: ${{ steps.snap_build.outputs.snap }}

# NOTE:@motjuste: We currently don't publish the checkbox-dss Snap
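
The build step in this workflow is wrapped in a retry action configured with `attempt_limit: 5` and `attempt_delay: 600000` (10 minutes, per the inline comment). As a rough illustration of those semantics, and not of how the action itself is implemented, here is a minimal shell sketch. The names `retry` and `flaky_build` are hypothetical, and the delay is set to 0 seconds so the demo finishes immediately.

```shell
#!/bin/bash
# Hypothetical helper (not part of this PR): run a command up to `limit`
# times, sleeping `delay` seconds between failed attempts.
retry() {
    local limit="$1" delay="$2"
    shift 2
    local attempt=1
    while true; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$limit" ]; then
            echo "giving up after ${attempt} attempts: $*" >&2
            return 1
        fi
        echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Stand-in for a flaky build: fails on the first two calls, then succeeds.
counter_file="$(mktemp)"
echo 0 > "$counter_file"
flaky_build() {
    local n
    n=$(( $(cat "$counter_file") + 1 ))
    echo "$n" > "$counter_file"
    [ "$n" -ge 3 ]
}

retry 5 0 flaky_build && echo "built after $(cat "$counter_file") attempts"
```

With the workflow's values the equivalent call would be `retry 5 600 <build command>`, which is why the outer job timeout is set so generously.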
72 changes: 49 additions & 23 deletions .github/workflows/testflinger-contrib-dss-regression.yaml
@@ -1,39 +1,51 @@
name: Data Science Stack (DSS) Regression Testing
run-name: Testing DSS ${{ inputs.dss_channel }} on ${{ inputs.k8s_type }} ${{ inputs.k8s_channel }}
permissions:
contents: read
on:
workflow_dispatch:
# schedule:
# - cron: "0 7 * * 1" # every Monday 07:00 UTC
# push:
# branches:
# - main
# pull_request:
# branches:
# - main
inputs:
dss_channel:
description: "Channel of the DSS snap to test"
default: latest/edge
required: false
type: choice
options:
- latest/edge
- latest/stable
- 1.1/edge
- 1.1/stable
- 1.0/edge
- 1.0/stable
k8s_type:
description: "The type of K8s to deploy"
default: "canonical-k8s"
required: false
type: choice
options:
- canonical-k8s
- microk8s
k8s_channel:
description: "Channel of the K8s snap to deploy"
default: "1.32-classic/stable"
type: string

env:
BRANCH: ${{ github.head_ref || github.ref_name }}

jobs:
dss-snap-frontend-native-build:
uses: ./.github/workflows/checkbox-dss-build.yaml

regression-tests:
needs: dss-snap-frontend-native-build
name: Regression tests
runs-on: [testflinger]
defaults:
run:
working-directory: contrib/checkbox-dss-validation
strategy:
fail-fast: false
matrix:
dss_channel:
- 1/stable
- 1/candidate
- latest/edge
microk8s_channel:
- 1.28/stable
- 1.31/stable
queue:
- name: dell-precision-3470-c30322 #ADL iGPU + NVIDIA GPU
- name: dell-precision-3470-c30320 #ADL iGPU + NVIDIA GPU
provision_data: "distro: jammy"
- name: dell-precision-5680-c31665 #RPL iGPU + Arc Pro A60M dGPU
provision_data: "url: http://10.102.196.9/somerville/Platforms/jellyfish-muk/X96_A00/dell-bto-jammy-jellyfish-muk-X96-20230419-19_A00.iso"
@@ -44,15 +56,29 @@ jobs:
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
persist-credentials: false
- uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093
name: Download the artifact with the built snap
with:
name: checkbox-dss.snap
- name: Rename snap artifact to match attachment
run: |
ls -lah checkbox-dss*.snap
mv checkbox-dss*.snap checkbox-dss.snap
ls -lah checkbox-dss*.snap
- name: Build job file from template
env:
DSS_CHANNEL: ${{ inputs.dss_channel }}
K8S_TYPE: ${{ inputs.k8s_type }}
K8S_CHANNEL: ${{ inputs.k8s_channel }}
run: |
sed -e "s|REPLACE_BRANCH|${BRANCH}|" \
-e "s|REPLACE_QUEUE|${{ matrix.queue.name }}|" \
sed -e "s|REPLACE_QUEUE|${{ matrix.queue.name }}|" \
-e "s|REPLACE_PROVISION_DATA|${{ matrix.queue.provision_data }}|" \
-e "s|REPLACE_DSS_CHANNEL|${{ matrix.dss_channel }}|" \
-e "s|REPLACE_MICROK8S_CHANNEL|${{ matrix.microk8s_channel }}|" \
-e "s|REPLACE_DSS_CHANNEL|${DSS_CHANNEL}|" \
-e "s|REPLACE_K8S_TYPE|${K8S_TYPE}|" \
-e "s|REPLACE_K8S_CHANNEL|${K8S_CHANNEL}|" \
${GITHUB_WORKSPACE}/contrib/checkbox-dss-validation/testflinger/job-def.yaml > \
${GITHUB_WORKSPACE}/job.yaml
cat ${GITHUB_WORKSPACE}/job.yaml
- name: Submit testflinger job
uses: canonical/testflinger/.github/actions/submit@a5c430ce76f981b5f344c65d82201a27f1e8c18a
with:
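
The "Build job file from template" step above passes the workflow inputs through `env:` and lets the shell expand them inside `sed`, rather than interpolating `${{ inputs.* }}` directly into the script text. The following is a minimal, self-contained sketch of that substitution. The template contents and the `render_job` name are assumptions for illustration, since the real `job-def.yaml` is not shown in this diff.

```shell
#!/bin/bash
# Render a job template by replacing REPLACE_* placeholders with values held
# in shell variables. Using "|" as the sed delimiter lets channel names such
# as "1.32-classic/stable" pass through without escaping the "/".
render_job() {
    local template="$1"
    sed -e "s|REPLACE_QUEUE|${QUEUE}|" \
        -e "s|REPLACE_DSS_CHANNEL|${DSS_CHANNEL}|" \
        -e "s|REPLACE_K8S_TYPE|${K8S_TYPE}|" \
        -e "s|REPLACE_K8S_CHANNEL|${K8S_CHANNEL}|" \
        "$template"
}

# Hypothetical template standing in for testflinger/job-def.yaml.
template="$(mktemp)"
cat > "$template" <<'EOF'
job_queue: REPLACE_QUEUE
dss: REPLACE_DSS_CHANNEL
k8s: REPLACE_K8S_TYPE REPLACE_K8S_CHANNEL
EOF

# Illustrative values; in the workflow these come from `env:` and the matrix.
QUEUE="example-queue"
DSS_CHANNEL="latest/edge"
K8S_TYPE="canonical-k8s"
K8S_CHANNEL="1.32-classic/stable"

render_job "$template"
```

One caveat of this approach: a value containing `|` or `&` would still be interpreted by `sed`, so free-form inputs such as `k8s_channel` are safest when they are validated or restricted to known channel names.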
30 changes: 23 additions & 7 deletions contrib/checkbox-dss-validation/README.md
@@ -21,7 +21,7 @@ lxd init --auto
git clone https://github.com/canonical/checkbox
cd checkbox/contrib/checkbox-dss-validation
snapcraft
sudo snap install --dangerous --classic ./checkbox-dss_3.0_amd64.snap
sudo snap install --dangerous --classic ./checkbox-dss_3.1_amd64.snap
```

Make sure that the provider service is running and active:
@@ -38,21 +38,37 @@ Some test need dependencies, and a helper script is available to install them:
checkbox-dss.install-deps
```

By default this will install the `data-science-stack` snap from the `latest/stable`
channel. To instead install from `latest/edge` use:
By default this will install the `data-science-stack` snap from the `1.0/stable`
channel. To instead install from `1.1/edge` use:

```shell
checkbox-dss.install-deps --dss-snap-channel latest/edge
checkbox-dss.install-deps --dss-snap-channel 1.1/edge
```

Furthermore, the default `microk8s` snap channel is `1.28/stable` in classic mode,
but this can be customized as
Furthermore, by default, `microk8s` snap from channel `1.28/stable` is installed
in classic mode.
The channel for `microk8s` can be customized as
(please note that this snap must be `--classic` to enable GPU support):

```shell
checkbox-dss.install-deps --microk8s-snap-channel 1.31/stable
checkbox-dss.install-deps --microk8s-snap-channel 1.32/stable
```

It is also possible to install [Canonical Kubernetes](https://snapcraft.io/k8s)
**instead** of `microk8s`.
**Exactly one** of `microk8s` or Canonical Kubernetes must be installed.
Canonical Kubernetes will be installed instead of `microk8s` if the argument
`--canonical-k8s-snap-channel` is provided with a value:

```shell
checkbox-dss.install-deps --canonical-k8s-snap-channel 1.32/stable --dss-snap-channel 1.1/edge
```

When Canonical Kubernetes is installed, the `helm` snap will also be installed from the
default channel in `--classic` mode.
`helm` will be used to enable NVIDIA GPU in the Canonical Kubernetes cluster.
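
The requirement that exactly one Kubernetes flavour is present can be checked after installation. The sketch below is an assumption for illustration and not part of `checkbox-dss`: the function name `exactly_one_k8s` is hypothetical, and the sample listing uses made-up versions and revisions in place of real `snap list` output.

```shell
#!/bin/bash
# Hypothetical check: succeed only when exactly one of the `microk8s` or
# `k8s` snaps appears in `snap list` output read from stdin.
exactly_one_k8s() {
    local count
    # Skip the header row, then count rows whose first column is either snap.
    count=$(awk 'NR > 1 && ($1 == "microk8s" || $1 == "k8s") { n++ } END { print n + 0 }')
    [ "$count" -eq 1 ]
}

# Canned listing standing in for `snap list`; values are illustrative only.
sample_snap_list() {
    cat <<'EOF'
Name     Version  Rev   Tracking             Publisher     Notes
helm     3.0.0    100   latest/stable        snapcrafters  classic
k8s      1.32.0   200   1.32-classic/stable  canonical     classic
kubectl  1.29.0   300   1.29/stable          canonical     classic
EOF
}

if sample_snap_list | exactly_one_k8s; then
    echo "exactly one Kubernetes snap is installed"
fi
```

On a real machine the same function would be used as `snap list | exactly_one_k8s`.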

These validations also need the `kubectl` snap installed, and the default channel
used for that is `1.29/stable`, but can be customized as shown previously by passing
the appropriate channel name for `--kubectl-snap-channel`.
45 changes: 39 additions & 6 deletions contrib/checkbox-dss-validation/bin/install-deps
@@ -1,7 +1,7 @@
#!/bin/bash
set -e

dss_snap_channel="latest/stable"
dss_snap_channel="1.0/stable"
microk8s_snap_channel="1.28/stable"
kubectl_snap_channel="1.29/stable"

@@ -35,24 +35,46 @@ setup_microk8s_snap() {
deployment/coredns \
deployment/hostpath-provisioner
sudo microk8s.kubectl -n kube-system rollout status ds/calico-node

# hack as redirecting stdout anywhere but /dev/null throws a permission denied error
# see: https://forum.snapcraft.io/t/eksctl-cannot-write-to-stdout/17254/4
sudo microk8s.kubectl config view --raw | tee "${SNAP_REAL_HOME}/.kube/config" >/dev/null
}

setup_kubectl_snap() {
# This is needed to overcome the following bug within microk8s:
# https://github.com/canonical/microk8s/issues/4453
echo -e "\nInstalling kubectl snap from channel $1"
sudo snap install kubectl --classic --channel="$1"
}

setup_canonical_k8s_snap() {
echo -e "\nInstalling Canonical k8s snap from channel $1"
sudo snap install k8s --classic --channel "$1"
sudo k8s bootstrap
sudo k8s enable local-storage

# Directory needed for kubeconfig
mkdir -p "${SNAP_REAL_HOME}/.kube"

# hack as redirecting stdout anywhere but /dev/null throws a permission denied error
# see: https://forum.snapcraft.io/t/eksctl-cannot-write-to-stdout/17254/4
sudo microk8s.kubectl config view --raw | tee "${SNAP_REAL_HOME}/.kube/config" >/dev/null
sudo k8s config | tee "${SNAP_REAL_HOME}/.kube/config" >/dev/null
}
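
Both setup functions write the kubeconfig by piping the command's output through `tee` and discarding `tee`'s own stdout, because, as the comment above notes, redirecting the snap command's stdout straight to a file fails with a permission error. A minimal sketch of the same pattern with a stand-in command follows. The names `write_kubeconfig` and `fake_config` are hypothetical, and the destination is a temporary directory rather than `${SNAP_REAL_HOME}`.

```shell
#!/bin/bash
# Write a command's stdout to a file via tee, discarding tee's own stdout.
# The command writes into a pipe, and tee is the process that opens the file.
write_kubeconfig() {
    local dest="$1"
    shift
    mkdir -p "$(dirname "$dest")"
    "$@" | tee "$dest" >/dev/null
}

# Stand-in for `sudo k8s config`; prints a tiny placeholder document.
fake_config() {
    printf 'apiVersion: v1\nkind: Config\n'
}

dest="$(mktemp -d)/.kube/config"
write_kubeconfig "$dest" fake_config
cat "$dest"
```

Creating the parent directory first mirrors the `mkdir -p "${SNAP_REAL_HOME}/.kube"` call in `setup_canonical_k8s_snap`.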

help_function() {
echo "This script is used install all dependencies for checkbox-dss to run; defaults for optional arguments are shown in usage"
echo "Usage: checkbox-dss.install-deps [--dss-snap-channel $dss_snap_channel] [--microk8s-snap-channel $microk8s_snap_channel] [--kubectl-snap-channel $kubectl_snap_channel]"
echo "This script is used to install all dependencies for checkbox-dss to run;"
echo "defaults for optional arguments are shown in usage except for"
echo "'--canonical-k8s-snap-channel':"
echo "if this arg is not provided, then microk8s will be installed from its default channel,"
echo "otherwise Canonical k8s will be installed from the given channel instead of microk8s."
echo "Usage: checkbox-dss.install-deps [--dss-snap-channel $dss_snap_channel] [--canonical-k8s-snap-channel NO-DEFAULT | --microk8s-snap-channel $microk8s_snap_channel] [--kubectl-snap-channel $kubectl_snap_channel]"
exit 2
}

main() {
sudo snap refresh snapd

while [ $# -ne 0 ]; do
case $1 in
--dss-snap-channel)
@@ -67,12 +89,23 @@ main() {
kubectl_snap_channel="$2"
shift 2
;;
--canonical-k8s-snap-channel)
canonical_k8s_snap_channel="$2"
shift 2
;;
*) help_function ;;
esac
done

echo -e "\n Step 1/4: Setting up microk8s"
setup_microk8s_snap "$microk8s_snap_channel"
if [[ -n "$canonical_k8s_snap_channel" ]]; then
echo -e "\n Step 1/4: Setting up Canonical k8s and helm"
setup_canonical_k8s_snap "$canonical_k8s_snap_channel"
else
echo -e "\n Step 1/4: Setting up microk8s and helm"
setup_microk8s_snap "$microk8s_snap_channel"
fi
echo -e "\n Installing the helm snap"
sudo snap install helm --classic

echo -e "\n Step 2/4: Setting up kubectl"
setup_kubectl_snap "$kubectl_snap_channel"
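
As far as the visible hunks show, when both `--microk8s-snap-channel` and `--canonical-k8s-snap-channel` are passed, the Canonical k8s branch is taken and the microk8s channel is silently ignored. The sketch below shows one way the parsing loop could reject that combination explicitly. It is an illustration under that assumption and not the script's current behaviour, and `choose_k8s` is a hypothetical name.

```shell
#!/bin/bash
# Sketch: parse the two channel flags and refuse to accept both at once.
# Prints the flavour and channel that would be installed.
choose_k8s() {
    local microk8s_channel="" canonical_channel=""
    while [ $# -ne 0 ]; do
        # Every flag handled here takes a value, so a lone flag is an error.
        if [ $# -lt 2 ]; then
            echo "missing value for $1" >&2
            return 2
        fi
        case "$1" in
            --microk8s-snap-channel) microk8s_channel="$2" ;;
            --canonical-k8s-snap-channel) canonical_channel="$2" ;;
            *) echo "unknown argument: $1" >&2; return 2 ;;
        esac
        shift 2
    done
    if [ -n "$microk8s_channel" ] && [ -n "$canonical_channel" ]; then
        echo "pass only one of --microk8s-snap-channel or --canonical-k8s-snap-channel" >&2
        return 2
    fi
    if [ -n "$canonical_channel" ]; then
        echo "canonical-k8s ${canonical_channel}"
    else
        # Same default as microk8s_snap_channel in install-deps.
        echo "microk8s ${microk8s_channel:-1.28/stable}"
    fi
}

choose_k8s --canonical-k8s-snap-channel 1.32-classic/stable
choose_k8s
```

Checking the argument count before `shift 2` also avoids looping forever when a flag is given without a value and the script is not running under `set -e`.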

This file was deleted.
