diff --git a/content/posts/2022/2022-01-27-running-kubeflow-inside-kind-with-gpu-support/index.md b/content/posts/2022/2022-01-27-running-kubeflow-inside-kind-with-gpu-support/index.md
index 314c2199..ccb09dc1 100644
--- a/content/posts/2022/2022-01-27-running-kubeflow-inside-kind-with-gpu-support/index.md
+++ b/content/posts/2022/2022-01-27-running-kubeflow-inside-kind-with-gpu-support/index.md
@@ -30,6 +30,15 @@ _This blog post is intended more as personal notes than instructions, so take ev
 - You'll need an up-to-date version of the [Docker runtime](https://docs.docker.com/engine/install/ubuntu/), mine is `20.10.12`.
 - You'll want the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/overview.html).
 - You should also install [kubectl](https://kubernetes.io/docs/tasks/tools/), [kustomize](https://kustomize.io/) and [helm](https://helm.sh/) for interacting with our Kubernetes cluster.
+- Make sure NVIDIA Fabric Manager is installed (EC2 instances do not have it installed by default). The following commands install it:
+```
+driver_version=$(nvidia-smi | grep -oP "(?<=Driver Version: )[0-9.]+")
+driver_major=$(echo ${driver_version} | cut -d. -f1)
+
+apt-get install nvidia-fabricmanager-${driver_major} -y
+apt-mark hold nvidia-fabricmanager-${driver_major}
+systemctl enable nvidia-fabricmanager.service
+```
 
 If you can run the following example you're all set.
 
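
Review note: the added commands derive the package name from the driver's major version, so it may help to see that step in isolation. The sketch below is not part of the patch. It assumes the `nvidia-smi` header contains a `Driver Version: X.Y.Z` field, uses a hypothetical sample line in place of real hardware, and swaps `grep -oP` for `sed` so it also runs where PCRE support is missing. The helper names `parse_driver_version` and `fabricmanager_package` are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: shows how the package name in the patch is derived.

# Read text shaped like the nvidia-smi header on stdin and print the driver version.
parse_driver_version() {
  sed -n 's/.*Driver Version: \([0-9.]*\).*/\1/p' | head -n 1
}

# Map a full driver version (e.g. 470.82.01) to the Fabric Manager package name.
fabricmanager_package() {
  local version="$1"
  # ${version%%.*} drops everything from the first dot, leaving the major version.
  printf 'nvidia-fabricmanager-%s\n' "${version%%.*}"
}

# Hypothetical sample header line, standing in for real `nvidia-smi` output.
sample='| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |'

version=$(printf '%s\n' "$sample" | parse_driver_version)
package=$(fabricmanager_package "$version")
echo "$package"   # prints: nvidia-fabricmanager-470
```

On a real GPU host, replacing the `sample` pipeline with `nvidia-smi | parse_driver_version` should yield the same kind of value that the patch passes to `apt-get install`.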