Add defaults for GPU PCI passthrough #1586

Alex-Welsh · 2025-03-28T15:50:00Z

This change is designed to drastically simplify GPU PCI passthrough configuration.

The idea is to keep a data dictionary of common GPU types and use templating to write that data to the Nova config files, based on a simple group-to-GPU map.

Configuring passthrough is as simple as creating a dictionary like this:

gpu_group_map:
  compute_a100:
    - a100_80
  compute_v100:
    - v100_32
  compute_multi_gpu:
    - a100_80
    - v100_32

(and passing the group names through to Kolla-Ansible).

The pci-passthrough.yml playbook manages host configuration (and is pre-hooked to overcloud host configure).

Templates for nova-compute.conf, nova-api.conf, and nova-scheduler.conf have been added.
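As a rough illustration of the templating idea (not the PR's actual templates), the sketch below expands a group-to-GPU map into `[pci]` sections. The `GPU_TYPES` dictionary, the `render_pci_section` helper, the `type-PCI` device type, and the product IDs are assumptions made for this example; check real IDs with `lspci -nn` on the hypervisor. Only the `[pci]` options `device_spec` and `alias` are taken from Nova itself.

```python
import json

# Hypothetical data dictionary of GPU types. The keys match the names used in
# gpu_group_map; the product IDs are illustrative and must be verified locally.
GPU_TYPES = {
    "a100_80": {"vendor_id": "10de", "product_id": "20b5"},
    "v100_32": {"vendor_id": "10de", "product_id": "1db6"},
}

# The group-to-GPU map from the PR description.
GPU_GROUP_MAP = {
    "compute_a100": ["a100_80"],
    "compute_v100": ["v100_32"],
    "compute_multi_gpu": ["a100_80", "v100_32"],
}


def render_pci_section(group, include_device_spec=True):
    """Return a [pci] section for the hosts in `group`.

    device_spec (passthrough_whitelist in older Nova releases) is only
    meaningful on compute hosts, so it can be switched off for API hosts.
    """
    lines = ["[pci]"]
    for name in GPU_GROUP_MAP[group]:
        gpu = GPU_TYPES[name]
        ids = {"vendor_id": gpu["vendor_id"], "product_id": gpu["product_id"]}
        if include_device_spec:
            lines.append("device_spec = " + json.dumps(ids))
        # device_type is an assumption; SR-IOV capable cards may need type-PF.
        alias = dict(ids, device_type="type-PCI", name=name)
        lines.append("alias = " + json.dumps(alias))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_pci_section("compute_multi_gpu"))
```

In this sketch the compute hosts get both `device_spec` and `alias`, while an API host would call the helper with `include_device_spec=False`. The scheduler typically also needs `PciPassthroughFilter` in `[filter_scheduler] enabled_filters`, which is not shown here.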

The changes have now been tested in a production environment.

priteau · 2025-04-03T13:55:27Z

This should be using the stackhpc.linux collection if possible: stackhpc/ansible-collection-linux#28

Alex-Welsh · 2025-04-08T10:10:12Z

doc/source/operations/gpu-in-openstack.rst

This file may fit better under configuration/ than operations/. Opinions welcome.

Alex-Welsh · 2025-04-08T10:28:39Z

This should be using the stackhpc.linux collection if possible: stackhpc/ansible-collection-linux#28

Agreed, though I'd rather get this merged so we can start using it, then update it once the collection supports this.

Alex-Welsh · 2025-04-08T10:30:24Z

I tried out the changes on a client deployment with three GPU types, and they worked very well.

MoteHue

Just some docs changes now.
I've used this at a different customer site, and it worked a treat.

MoteHue · 2025-04-09T10:26:31Z

doc/source/operations/gpu-in-openstack.rst

+Once host configuration is complete, deploy the OpenStack services:
+.. code-block:: console
+
+    kayobe overcloud service deploy -kt nova --kolla-limit compute_a100,compute_v100,compute_multi_gpu


Suggested change

Once host configuration is complete, deploy the OpenStack services:

.. code-block:: console

    kayobe overcloud service deploy -kt nova --kolla-limit compute_a100,compute_v100,compute_multi_gpu

Once host configuration is complete, deploy Nova:

.. code-block:: console

    kayobe overcloud service deploy -kt nova

This needs to target the controllers too, for the Nova scheduler.

MoteHue · 2025-04-09T10:30:06Z

doc/source/operations/gpu-in-openstack.rst

+This can be also defined in the openstack-config repository
+
+add extra_specs to flavor in etc/openstack-config/openstack-config.yml:


Suggested change

This can be also defined in the openstack-config repository

add extra_specs to flavor in etc/openstack-config/openstack-config.yml:

This can be also defined in the openstack-config repository.

Add extra_specs to flavor in etc/openstack-config/openstack-config.yml:

product-auto-label bot added the size: m, ansible, and kolla labels Mar 28, 2025

Alex-Welsh force-pushed the pci-passthrough-defaults branch from 115d514 to 8a06ffb Compare March 28, 2025 15:53

Alex-Welsh requested a review from MoteHue March 28, 2025 16:02

product-auto-label bot added size: l and removed size: m labels Mar 28, 2025

Alex-Welsh force-pushed the pci-passthrough-defaults branch from 0856eb0 to 9cd7720 Compare March 28, 2025 17:06

MoteHue reviewed Apr 7, 2025

MoteHue requested changes Apr 7, 2025

Alex-Welsh commented Apr 7, 2025

MoteHue requested changes Apr 7, 2025

Alex-Welsh commented Apr 8, 2025

Alex-Welsh force-pushed the pci-passthrough-defaults branch from 2c3562c to bbd2eaa Compare April 8, 2025 10:26

Add defaults for GPU PCI passthrough configuration

bbd2eaa

Alex-Welsh marked this pull request as ready for review April 8, 2025 10:28

Alex-Welsh requested a review from a team as a code owner April 8, 2025 10:28

Alex-Welsh changed the title ~~WIP: Add defaults for GPU PCI passthrough configuration~~ Add defaults for GPU PCI passthrough Apr 8, 2025

MoteHue requested changes Apr 9, 2025
