Add defaults for GPU PCI passthrough #1586

Open
wants to merge 1 commit into
base: stackhpc/2024.1

Alex-Welsh
Member

@Alex-Welsh Alex-Welsh commented Mar 28, 2025

This change is designed to drastically simplify PCI passthrough GPU configuration.

The idea is that we have a data dictionary containing common GPU types, and templating that writes that data to the Nova config files based on a simple group-to-GPU map.

Configuring passthrough is as simple as creating a dictionary like this:

gpu_group_map:
  compute_a100:
    - a100_80
  compute_v100:
    - v100_32
  compute_multi_gpu:
    - a100_80
    - v100_32

(and passing the group names through to Kolla-Ansible).
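
As a rough illustration of the idea (not the templates in this PR), the expansion from the group map to Nova `[pci]` options could be sketched in Python like this. The `GPU_DATA` dictionary, its field names, the PCI IDs, the device types, and the function name are all assumptions made for the example; the actual data dictionary and Jinja templates in this change may look different, and real IDs should be checked with `lspci -nn` on the hypervisor.

```python
import json

# Hypothetical GPU data dictionary. Keys, field names, and values are
# illustrative assumptions, not the ones shipped in this PR.
GPU_DATA = {
    "a100_80": {"vendor_id": "10de", "product_id": "20b5", "device_type": "type-PF"},
    "v100_32": {"vendor_id": "10de", "product_id": "1db6", "device_type": "type-PCI"},
}

# The group-to-GPU map from the PR description.
GPU_GROUP_MAP = {
    "compute_a100": ["a100_80"],
    "compute_v100": ["v100_32"],
    "compute_multi_gpu": ["a100_80", "v100_32"],
}


def pci_config_lines(group, group_map=GPU_GROUP_MAP, gpu_data=GPU_DATA):
    """Return the lines of a [pci] section for the hosts in one group."""
    lines = ["[pci]"]
    for gpu in group_map[group]:
        spec = gpu_data[gpu]
        # device_spec tells nova-compute which devices it may hand out.
        device_spec = {
            "vendor_id": spec["vendor_id"],
            "product_id": spec["product_id"],
        }
        # alias gives the device a name that flavors can request.
        alias = dict(spec, name=gpu)
        lines.append("device_spec = " + json.dumps(device_spec))
        lines.append("alias = " + json.dumps(alias))
    return lines


if __name__ == "__main__":
    print("\n".join(pci_config_lines("compute_multi_gpu")))
```

A host in `compute_multi_gpu` gets one `device_spec` and one `alias` entry per GPU type, which is why a group only needs to list GPU names rather than repeat vendor and product IDs.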

The pci-passthrough.yml playbook manages host configuration and is set up as a pre-hook for overcloud host configure.

Templates for nova-compute.conf, nova-api.conf, and nova-scheduler.conf have been added.

The changes have now been tested in a production environment.

@Alex-Welsh Alex-Welsh force-pushed the pci-passthrough-defaults branch from 115d514 to 8a06ffb Compare March 28, 2025 15:53
@Alex-Welsh Alex-Welsh requested a review from MoteHue March 28, 2025 16:02
@Alex-Welsh Alex-Welsh force-pushed the pci-passthrough-defaults branch from 0856eb0 to 9cd7720 Compare March 28, 2025 17:06
@priteau
Member

priteau commented Apr 3, 2025

This should be using the stackhpc.linux collection if possible: stackhpc/ansible-collection-linux#28

Member Author

This file may fit better under configuration/ than operations/. Opinions welcome.

@Alex-Welsh Alex-Welsh force-pushed the pci-passthrough-defaults branch from 2c3562c to bbd2eaa Compare April 8, 2025 10:26
@Alex-Welsh
Member Author

> This should be using the stackhpc.linux collection if possible: stackhpc/ansible-collection-linux#28

Agreed, though I'd rather get this merged so we can start using it, then update it once the collection supports this.

@Alex-Welsh Alex-Welsh marked this pull request as ready for review April 8, 2025 10:28
@Alex-Welsh Alex-Welsh requested a review from a team as a code owner April 8, 2025 10:28
@Alex-Welsh Alex-Welsh changed the title from "WIP: Add defaults for GPU PCI passthrough configuration" to "Add defaults for GPU PCI passthrough" Apr 8, 2025
@Alex-Welsh
Member Author

I tried out the changes on a client deployment with three GPU types, and they worked very well.

Contributor

@MoteHue MoteHue left a comment

Just some docs changes now.
I've used this at a different customer site and it worked a treat.

Comment on lines +73 to +76
Once host configuration is complete, deploy the OpenStack services:

.. code-block:: console

   kayobe overcloud service deploy -kt nova --kolla-limit compute_a100,compute_v100,compute_multi_gpu
Contributor

Suggested change
- Once host configuration is complete, deploy the OpenStack services:
- .. code-block:: console
-    kayobe overcloud service deploy -kt nova --kolla-limit compute_a100,compute_v100,compute_multi_gpu
+ Once host configuration is complete, deploy Nova:
+ .. code-block:: console
+    kayobe overcloud service deploy -kt nova

This needs to target the controllers as well, for the Nova scheduler.

Comment on lines +87 to +89
This can be also defined in the openstack-config repository

add extra_specs to flavor in etc/openstack-config/openstack-config.yml:
Contributor

Suggested change
- This can be also defined in the openstack-config repository
- add extra_specs to flavor in etc/openstack-config/openstack-config.yml:
+ This can also be defined in the openstack-config repository.
+ Add extra_specs to the flavor in etc/openstack-config/openstack-config.yml:
