Commit 6e3c4b3

Support MIG configuration (#43)
* Adds vgpu_nvidia_driver_install_enabled to vgpu role

  This is useful if you already have an nvidia driver installed.
* Add example of configuring mig without vGPU
* Bump version of MIG manager
* Don't try and enable MIG mode if already enabled

  The command to enable will exit with non-zero status if you attempt to enable mig mode and it is already enabled.
* Keep the linter happy
* Improve readme
* Break longline
1 parent 942989c commit 6e3c4b3

5 files changed: 64 additions and 10 deletions

roles/vgpu/README.md

Lines changed: 47 additions & 7 deletions
@@ -1,24 +1,57 @@
 # stackhpc.linux.vgpu

+This role can configure vGPUs or Multi Instance GPU (MIG) on NVIDIA cards.
+
 ## Prerequisites

-- [Download Nvidia GRID driver](https://docs.nvidia.com/grid/latest/grid-software-quick-start-guide/index.html#redeeming-pak-and-downloading-grid-software) (This requires a login).
-- The location of this file can be customised with the `vgpu_driver_url` variable:
-* e.g to use an artifact uploaded to a http server:
-`vgpu_driver_url: http://seed/pulp/content/nvidia/NVIDIA-GRID-Linux-KVM-525.85.07-525.85.05-528.24.zip`
-* e.g to use file the control host:
-`vgpu_driver_url: "{{ lookup('env', 'HOME'}}/NVIDIA-GRID-Linux-KVM-525.85.07-525.85.05-528.24.zip"`
+### Multi Instance GPU (MIG)
+
+When creating MIG devices with no vGPU instances layered on top, there are no special requirements.
+
+### VGPUs:

 - Enable IOMUU
 - Make sure the related options are enabled in the BIOS
 - Intel CPUs require the intel_iommu kernel command line argument

-## Enabling SR-IOV on dell hardware
+
+#### Enabling SR-IOV on dell hardware

 ```
 /opt/dell/srvadmin/bin/idracadm7 set BIOS.IntegratedDevices.SriovGlobalEnable Enabled
 /opt/dell/srvadmin/bin/idracadm7 jobqueue create BIOS.Setup.1-1
 ```
+
+## Drivers
+
+The role will attempt to install a driver from ``vgpu_driver_url``. Currently this only works with
+the data center drivers such as the
+[Nvidia GRID drivers](https://docs.nvidia.com/grid/latest/grid-software-quick-start-guide/index.html#redeeming-pak-and-downloading-grid-software)
+or the [AI enterprise drivers](https://www.nvidia.com/en-gb/data-center/products/ai-enterprise/);
+both of which can be obtained from the NVIDIA licensing portal. The use of data centre drivers is not mandatory
+if you only want to use MIG without vGPUs.
+
+The location of this file can be customised with the `vgpu_driver_url` variable, e.g to use an artifact uploaded to a http server:
+
+```
+vgpu_driver_url: http://seed/pulp/content/nvidia/NVIDIA-GRID-Linux-KVM-525.85.07-525.85.05-528.24.zip
+```
+
+e.g to use a file on the control host:
+
+```
+vgpu_driver_url: "{{ lookup('env', 'HOME'}}/NVIDIA-GRID-Linux-KVM-525.85.07-525.85.05-528.24.zip"
+```
+
+At this moment in time, the role only supports zip archives, Future work may add support for other packaging formats such as: .deb and .rpm, and .run.
+
+It is possible to install a driver via some other means by setting the ``vgpu_nvidia_driver_install_enabled`` configuration option, e.g:
+```
+---
+vgpu_nvidia_driver_install_enabled: false
+```
+
+This will cause the role to assume that the driver is already installed.

 ## Running the role

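The driver options described in the new Drivers section could be set in a variables file along the following lines. This is a minimal sketch, not the role's documented usage: where the file lives (for example a `group_vars` file for the GPU hosts) is an assumption, and the `lookup()` call is written with the closing parenthesis that the README snippet appears to omit.

```yaml
---
# Variant 1: the role installs the driver from a zip archive on the control host.
# The closing parenthesis of lookup() is restored here; the README snippet lacks it.
vgpu_driver_url: "{{ lookup('env', 'HOME') }}/NVIDIA-GRID-Linux-KVM-525.85.07-525.85.05-528.24.zip"

# Variant 2: the driver was installed by other means. With this flag set to false
# the install block is skipped, and so is the assertion in validate.yml that
# vgpu_driver_url is non-empty, so vgpu_driver_url can be left at its default.
# vgpu_nvidia_driver_install_enabled: false
```

Only one of the two variants would normally be active for a given host.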
@@ -72,6 +105,13 @@ vgpu_definitions:
         index: 0
       - mdev_type: nvidia-697
         index: 1
+  # Configuring a MIG without creating VGPUs. You may also want to set
+  # vgpu_nvidia_driver_install_enabled: false if you have installed the nvidia
+  # driver by some other means.
+  - pci_address: "0000:17:00.0"
+    mig_devices:
+      "1g.10gb": 1
+      "2g.20gb": 3
 ```


roles/vgpu/defaults/main.yml

Lines changed: 5 additions & 2 deletions
@@ -1,4 +1,7 @@
 ---
+# Whether to install the nvidia driver. Set to false if you want to install the driver
+# via some other means.
+vgpu_nvidia_driver_install_enabled: true
 vgpu_driver_url: ""
 vgpu_driver_force_install: false
 vgpu_driver_dkms: false
@@ -13,5 +16,5 @@ vgpu_mig_definitions: []
 vgpu_definitions: "{{ vgpu_mig_definitions }}"

 # Packages providing nvidia-mig-manager
-vgpu_nvidia_mig_manager_rpm_url: https://github.com/NVIDIA/mig-parted/releases/download/v0.5.1/nvidia-mig-manager-0.5.1-1.x86_64.rpm
-vgpu_nvidia_mig_manager_deb_url: https://github.com/NVIDIA/mig-parted/releases/download/v0.5.1/nvidia-mig-manager_0.5.1-1_amd64.deb
+vgpu_nvidia_mig_manager_rpm_url: https://github.com/NVIDIA/mig-parted/releases/download/v0.12.1/nvidia-mig-manager-0.12.1-1.x86_64.rpm
+vgpu_nvidia_mig_manager_deb_url: https://github.com/NVIDIA/mig-parted/releases/download/v0.12.1/nvidia-mig-manager_0.12.1-1_amd64.deb

roles/vgpu/tasks/configure-gpu.yml

Lines changed: 10 additions & 1 deletion
@@ -10,11 +10,20 @@
   become: true
   when: vgpu_definition.mig_devices is defined

+- name: Collect mig status
+  ansible.builtin.command: nvidia-smi -i {{ vgpu_definition.pci_address }} --query-gpu='mig.mode.current' --format csv,noheader
+  changed_when: false
+  register: mig_status_result
+  when:
+    - vgpu_definition.mig_devices is defined
+
 - name: Enable mig mode
   ansible.builtin.command: nvidia-smi -i {{ vgpu_definition.pci_address }} -mig 1
   changed_when: false
   become: true
-  when: vgpu_definition.mig_devices is defined
+  when:
+    - vgpu_definition.mig_devices is defined
+    - mig_status_result.stdout != "Enabled"

 - name: Template nvidia-sriov service
   ansible.builtin.template:

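A possible variation on the guard in this file, offered as a sketch and not as part of the commit: the comparison can be made tolerant of stray whitespace with the Jinja2 `trim` filter, and the enable task can report a change whenever it actually runs. The task names, variables and `nvidia-smi` options are taken from the diff; the `=` spelling of `--format` and the switch to `changed_when: true` are assumptions about what suits this project and its linter configuration.

```yaml
- name: Collect mig status
  ansible.builtin.command: >-
    nvidia-smi -i {{ vgpu_definition.pci_address }}
    --query-gpu=mig.mode.current --format=csv,noheader
  changed_when: false  # read-only query, never reports a change
  register: mig_status_result
  when: vgpu_definition.mig_devices is defined

- name: Enable mig mode
  ansible.builtin.command: nvidia-smi -i {{ vgpu_definition.pci_address }} -mig 1
  # Assumption: the task only runs when MIG is not yet enabled, so a run is a change.
  changed_when: true
  become: true
  when:
    - vgpu_definition.mig_devices is defined
    # Skip the enable command when the GPU already reports MIG as enabled,
    # since per the commit message it exits non-zero in that case.
    - mig_status_result.stdout | trim != "Enabled"
```

Because the first condition is evaluated before the second, the `stdout` lookup is only reached when the status task has actually run.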
roles/vgpu/tasks/install.yml

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@
     filename: "{{ vgpu_driver_url_components.path | basename }}"
     install_script: "{{ find_result.files.0.path }}"
     ansible_become: true
+  when: vgpu_nvidia_driver_install_enabled | bool
   block:
     - name: Ensure target directory exists
       ansible.builtin.file:

roles/vgpu/tasks/validate.yml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,3 +3,4 @@
33
ansible.builtin.assert:
44
that: vgpu_driver_url | length > 0
55
fail_msg: "Please ensure you set the variable: vgpu_driver_url"
6+
when: vgpu_nvidia_driver_install_enabled | bool
