
Commit d67c456

Update docs for GPU support with KVM (#526)
* Update docs for GPU support with KVM
* Apply suggestions from code review

Co-authored-by: Suresh Kumar Anaparti <[email protected]>
1 parent a5d29c0 commit d67c456


4 files changed

+137
-57
lines changed


source/adminguide/hosts.rst

Lines changed: 32 additions & 0 deletions
@@ -223,6 +223,38 @@ Following hypervisor-specific documentations can be referred for different maxim
223223
Guest Instance limit check is not done while deploying an Instance on a KVM hypervisor host.
224224

225225

226+
.. _discovering-gpu-devices-on-kvm-hosts:
227+
228+
Discovering GPU Devices on KVM Hosts
229+
------------------------------------
230+
231+
For KVM, the operator needs to ensure that IOMMU is enabled and the necessary
232+
drivers are installed. If vGPU is to be used, the operator also needs to ensure
233+
that the vGPU type is supported by the host and that the vGPU devices of that
234+
type have been created on the host. For more information on how to prepare the
235+
host for GPU passthrough, see
236+
`Managing GPU devices in virtual machines <https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/9/html/configuring_and_managing_virtualization/assembly_managing-gpu-devices-in-virtual-machines_configuring-and-managing-virtualization>`_.
237+
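Before triggering discovery, it can help to confirm that IOMMU is actually active on the host. The following is a minimal sketch, not part of CloudStack; it assumes the standard Linux sysfs location ``/sys/kernel/iommu_groups``, which contains one numbered subdirectory per IOMMU group when IOMMU is enabled. The function names ``iommu_group_count`` and ``iommu_enabled`` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not shipped with CloudStack.
# Count the IOMMU groups exposed by the kernel. When IOMMU is enabled, the
# kernel creates one numbered subdirectory per group under
# /sys/kernel/iommu_groups; a missing or empty directory usually means
# IOMMU is disabled (in firmware or on the kernel command line).
iommu_group_count() {
    local groups_dir="${1:-/sys/kernel/iommu_groups}"
    if [ ! -d "$groups_dir" ]; then
        echo 0
        return 0
    fi
    # Count only the direct children (one per IOMMU group).
    find "$groups_dir" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
}

# Succeeds (exit status 0) if at least one IOMMU group exists.
iommu_enabled() {
    [ "$(iommu_group_count "${1:-}")" -gt 0 ]
}
```

On a KVM host, ``iommu_enabled || echo "IOMMU appears to be disabled"`` gives a quick check; the optional directory argument only exists so the helpers can be tried against a test directory.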
238+
Once the host is configured with the GPU devices, the operator can trigger the
238+
discovery of the GPU devices on the host by using the ``discoverGPUdevices``
239+
command in CloudMonkey (cmk) or the ``Discover GPU devices`` button on the host details page in the UI.
240+
This sends a request to the CloudStack agent to discover the GPU devices on
241+
the host.
243+
244+
The CloudStack agent uses the ``gpudiscovery.sh`` script to discover the GPU
245+
devices on a KVM host. The script is located in the
246+
``/usr/share/cloudstack-common/scripts/vm/`` directory on the host.
247+
248+
.. note::
249+
The script can be run manually to debug the discovery of the GPU devices on a host.
250+
251+
.. parsed-literal::
252+
253+
sudo /usr/share/cloudstack-common/scripts/vm/gpudiscovery.sh
254+
255+
The script outputs the GPU devices found on the host in JSON format. The operator
256+
can also modify the script to customize how GPU devices are discovered on the host.
257+
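When debugging discovery manually, a quick way to confirm that the script ran successfully and produced well-formed JSON is to pipe its output through Python's ``json.tool`` module. This is a minimal sketch under stated assumptions: it assumes ``python3`` is installed on the host and that the script writes a single JSON document to standard output. The output schema is not described here, so no fields are interpreted. The function name ``run_gpu_discovery`` and the ``GPU_DISCOVERY_SCRIPT`` variable are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper, not shipped with CloudStack. Run it as root
# (for example via sudo), like the discovery script itself.
GPU_DISCOVERY_SCRIPT="${GPU_DISCOVERY_SCRIPT:-/usr/share/cloudstack-common/scripts/vm/gpudiscovery.sh}"

# Run the discovery script and pretty-print its JSON output.
# Returns non-zero if the script fails or its output is not valid JSON.
run_gpu_discovery() {
    local script="${1:-$GPU_DISCOVERY_SCRIPT}"
    local output
    if ! output="$("$script")"; then
        echo "discovery script failed: $script" >&2
        return 1
    fi
    # python3 -m json.tool exits non-zero when its input is not valid JSON.
    if ! printf '%s\n' "$output" | python3 -m json.tool; then
        echo "discovery script did not produce valid JSON" >&2
        return 1
    fi
}
```

Keeping the script path as an optional argument means a locally modified copy of the script can be checked the same way before it replaces the original.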
226258

227259
Changing Host Password
228260
----------------------

source/adminguide/service_offerings.rst

Lines changed: 11 additions & 11 deletions
@@ -289,22 +289,22 @@ To create a new compute offering:
289289
- Preferred: The instance will be deployed in dedicated infrastructure if
290290
possible. Otherwise, the instance can be deployed in shared infrastructure.
291291

292-
- **GPU**: Assign a physical GPU(GPU-passthrough) or a portion of a physical
292+
- **GPU Card**: Assign a physical GPU (GPU passthrough) or a portion of a physical
293293
GPU card (vGPU) to the guest instance. It allows graphical applications to run on the instance.
294294
Select the card from the supported list of cards.
295-
The options given are NVIDIA GRID K1 and NVIDIA GRID K2. These are vGPU
296-
capable cards that allow multiple vGPUs on a single physical GPU. If you
297-
want to use a card other than these, follow the instructions in the
298-
**"GPU and vGPU support for CloudStack Guest instances"** page in the
299-
Cloudstack Version 4.4 Design Docs found in the Cloudstack Wiki.
300295

301-
- **vGPU Type**: Represents the type of virtual GPU to be assigned to a
296+
- **GPU Profile**: Represents the type of virtual GPU to be assigned to a
302297
guest instance. In this case, only a portion of a physical GPU card (vGPU) is
303298
assigned to the guest instance.
304-
Additionally, the **passthrough vGPU** type is defined to represent a physical GPU
305-
device. A **passthrough vGPU** can directly be assigned to a single guest instance.
306-
In this case, a physical GPU device is exclusively allotted to a single
307-
guest instance.
299+
Additionally, the **passthrough** type is defined to represent a physical GPU
300+
device. A **passthrough** profile can be assigned directly to a single guest instance.
301+
In this case, the physical GPU devices are allocated exclusively to that guest instance.
302+
303+
- **GPU Count**: The number of GPUs to be assigned to the guest instance.
304+
This applies only to the KVM hypervisor.
305+
306+
- **GPU Display**: Whether to use the GPU device attached to the guest instance for display.
307+
This applies only to the KVM hypervisor.
308308

309309
- **Public**: Indicate whether the compute offering should be
310310
available to all domains or only some domains. Choose Yes to make it

source/adminguide/usage.rst

Lines changed: 13 additions & 4 deletions
@@ -249,20 +249,29 @@ max.account.cpus Maximum number of CPU cores that can be used
249249
Default is 40.
250250
max.account.ram (MB) Maximum RAM that can be used for an Account.
251251
Default is 40960.
252+
max.account.gpus Maximum number of GPUs that can be used for an Account.
253+
Default is 20.
252254
max.account.primary.storage (GB) Maximum primary storage space that can be used for an Account.
253255
Default is 200.
254256
max.account.secondary.storage (GB) Maximum secondary storage space that can be used for an Account.
255257
Default is 400.
256-
max.project.cpus Maximum number of CPU cores that can be used for an Account.
258+
max.project.cpus Maximum number of CPU cores that can be used for a Project.
257259
Default is 40.
258-
max.project.ram (MB) Maximum RAM that can be used for an Account.
260+
max.project.ram (MB) Maximum RAM that can be used for a Project.
259261
Default is 40960.
260-
max.project.primary.storage (GB) Maximum primary storage space that can be used for an Account.
262+
max.project.gpus Maximum number of GPUs that can be used for a Project.
263+
Default is 20.
264+
max.project.primary.storage (GB) Maximum primary storage space that can be used for a Project.
261265
Default is 200.
262-
max.project.secondary.storage (GB) Maximum secondary storage space that can be used for an Account.
266+
max.project.secondary.storage (GB) Maximum secondary storage space that can be used for a Project.
263267
Default is 400.
264268
=================================== =================================================================
265269

270+
By default, GPU devices are not detached when an Instance is stopped, so the
271+
GPU devices of stopped Instances are counted towards the resource limits.
272+
To avoid this, the administrator can set the ``gpu.detach.on.stop`` global
273+
setting to ``true`` to detach the GPU devices when the Instance is stopped.
274+
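The effect of ``gpu.detach.on.stop`` on limit accounting can be illustrated with a small calculation. This is an illustrative sketch that only restates the rule above; it is not CloudStack's accounting code, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of the counting rule, not CloudStack code.
# Arguments: GPUs held by running Instances, GPUs held by stopped
# Instances, and the value of gpu.detach.on.stop ("true" or "false").
gpus_counted_towards_limit() {
    local running="${1:-0}" stopped="${2:-0}" detach_on_stop="${3:-false}"
    if [ "$detach_on_stop" = "true" ]; then
        # Stopped Instances release their GPUs, so only running ones count.
        echo "$running"
    else
        # GPUs stay attached to stopped Instances and still count.
        echo $(( running + stopped ))
    fi
}

# Succeeds if requesting more GPUs keeps the total within the limit
# (for example max.account.gpus).
within_gpu_limit() {
    local used="${1:-0}" requested="${2:-0}" limit="${3:-0}"
    [ $(( used + requested )) -le "$limit" ]
}
```

For example, with the default ``max.account.gpus`` of 20, an Account whose running Instances hold 16 GPUs and whose stopped Instances hold 4 has 20 GPUs counted while ``gpu.detach.on.stop`` is ``false``, so a request for one more GPU exceeds the limit. With the setting at ``true``, only 16 are counted and the request fits.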
266275
The administrator can also set limits for specific tagged host and storage
267276
resources for an account or domain. Such tags must be specified in the following
268277
global settings:

source/adminguide/virtual_machines.rst

Lines changed: 81 additions & 42 deletions
@@ -1593,39 +1593,54 @@ CloudStack meet the intensive graphical processing requirement by means of the
15931593
high computation power of GPU/vGPU, and CloudStack users can run multimedia
15941594
rich applications, such as Auto-CAD, that they otherwise enjoy at their desk on
15951595
a virtualized environment.
1596-
CloudStack leverages the XenServer support for NVIDIA GRID Kepler 1 and 2 series
1597-
to run GPU/vGPU enabled Instances. NVIDIA GRID cards allows sharing a single GPU cards
1598-
among multiple Instances by creating vGPUs for each Instance. With vGPU technology, the
1599-
graphics commands from each Instance are passed directly to the underlying dedicated
1600-
GPU, without the intervention of the hypervisor. This allows the GPU hardware
1601-
to be time-sliced and shared across multiple Instances. XenServer hosts use the GPU
1602-
cards in following ways:
1603-
1604-
**GPU passthrough**: GPU passthrough represents a physical GPU which can be
1596+
1597+
For KVM, CloudStack leverages libvirt's PCI passthrough feature to assign a
1598+
physical GPU to a guest Instance. For vGPU profiles, depending on the vGPU type,
1599+
CloudStack uses mediated devices or Virtual Functions (VFs) to assign a virtual
1600+
GPU to a guest Instance. It is the responsibility of the operator to ensure that
1601+
GPU devices are in the correct state and available for use on the host. If the
1602+
operator wants to use vGPU profiles, they need to ensure that the vGPU type is
1603+
supported by the host and has been created on the host.
1604+
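For the mediated-device case, the Linux kernel's VFIO mediated device framework lists the vGPU types that each parent device supports under ``/sys/class/mdev_bus/<device>/mdev_supported_types/``, with an ``available_instances`` file per type. A minimal sketch, assuming that standard sysfs layout, to confirm that the intended vGPU type is supported on a KVM host; the helper name ``list_mdev_types`` is hypothetical and SR-IOV Virtual Functions are not covered.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not shipped with CloudStack.
# List mediated-device (mdev) types advertised by vGPU-capable parent
# devices. Output: one line per type, "<parent device> <type id> <available instances>".
list_mdev_types() {
    local root="${1:-/sys/class/mdev_bus}"
    local parent type_dir avail
    for parent in "$root"/*; do
        # Skip entries (or an unmatched glob) without mdev support.
        [ -d "$parent/mdev_supported_types" ] || continue
        for type_dir in "$parent"/mdev_supported_types/*; do
            [ -d "$type_dir" ] || continue
            # available_instances = how many more devices of this type can be created.
            avail="$(cat "$type_dir/available_instances" 2>/dev/null || echo '?')"
            echo "$(basename "$parent") $(basename "$type_dir") $avail"
        done
    done
}
```

Running ``list_mdev_types`` with no argument reads the live sysfs tree; an empty result suggests that no parent device currently exposes mediated-device types.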
1605+
For XenServer, CloudStack leverages the XenServer support for NVIDIA GRID
1606+
Kepler 1 and 2 series to run GPU/vGPU enabled Instances.
1607+
1608+
Some NVIDIA cards allow sharing a single GPU card among multiple Instances by
1609+
creating vGPUs for each Instance. With vGPU technology, the graphics commands
1610+
from each Instance are passed directly to the underlying dedicated GPU, without
1611+
the intervention of the hypervisor. This allows the GPU hardware to be
1612+
time-sliced and shared across multiple Instances. The GPU cards are used in the
1613+
following ways:
1614+
1615+
**passthrough**: GPU passthrough represents a physical GPU which can be
16051616
directly assigned to an Instance. GPU passthrough can be used on a hypervisor alongside
16061617
GRID vGPU, with some restrictions: A GRID physical GPU can either host GRID
16071618
vGPUs or be used as passthrough, but not both at the same time.
16081619

1609-
**GRID vGPU**: GRID vGPU enables multiple Instances to share a single physical GPU.
1620+
**vGPU**: vGPU enables multiple Instances to share a single physical GPU.
16101621
The Instances run an NVIDIA driver stack and get direct access to the GPU. GRID
16111622
physical GPUs are capable of supporting multiple virtual GPU devices (vGPUs)
1612-
that can be assigned directly to guest Instances. Guest Instances use GRID virtual GPUs in
1623+
that can be assigned directly to guest Instances. Guest Instances use vGPUs in
16131624
the same manner as a physical GPU that has been passed through by the
16141625
hypervisor: an NVIDIA driver loaded in the guest Instance provides direct access to
16151626
the GPU for performance-critical fast paths, and a paravirtualized interface to
1616-
the GRID Virtual GPU Manager, which is used for nonperformant management
1617-
operations. NVIDIA GRID Virtual GPU Manager for XenServer runs in dom0.
1627+
the NVIDIA vGPU Manager, which is used for nonperformant management
1628+
operations. NVIDIA vGPU Manager for XenServer runs in dom0.
1629+
16181630
CloudStack provides you with the following capabilities:
16191631

1620-
- Adding XenServer hosts with GPU/vGPU capability provisioned by the administrator.
1632+
- Adding hosts with GPU/vGPU capability provisioned by the administrator.
1633+
(supported only for XenServer and KVM hosts)
16211634

1622-
- Creating a Compute Offering with GPU/vGPU capability.
1635+
- Creating a Compute Offering with GPU/vGPU capability. For KVM, it is possible to
1636+
specify the GPU count and whether to use the GPU for display. For XenServer,
1637+
the GPU count is ignored and only one device is assigned to the guest Instance.
16231638

16241639
- Deploying an Instance with GPU/vGPU capability.
16251640

16261641
- Destroying an Instance with GPU/vGPU capability.
16271642

1628-
- Allowing an user to add GPU/vGPU support to an Instance without GPU/vGPU support by
1643+
- Allowing a user to add GPU/vGPU support to an Instance without GPU/vGPU support by
16291644
changing the Service Offering and vice-versa.
16301645

16311646
- Migrating Instances (cold migration) with GPU/vGPU capability.
@@ -1635,57 +1650,78 @@ CloudStack provides you with the following capabilities:
16351650
- Querying hosts to obtain information about the GPU cards, supported vGPU types
16361651
in case of GRID cards, and capacity of the cards.
16371652

1653+
- Limiting the number of GPUs that an account, domain, or project can use.
1654+
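The per-hypervisor GPU count behaviour described for compute offerings can be stated as a small function. This is an illustrative sketch that only restates the documented rule; the function name is hypothetical and it is not how CloudStack implements the check.

```bash
#!/usr/bin/env bash
# Hypothetical illustration, not CloudStack code.
# Number of GPU devices a guest Instance receives for a given hypervisor
# and the GPU count set in its compute offering.
effective_gpu_count() {
    local hypervisor="${1:-}" requested="${2:-1}"
    case "$hypervisor" in
        KVM)       echo "$requested" ;;  # KVM honours the GPU count
        XenServer) echo 1 ;;             # XenServer ignores it: one device
        *)
            echo "GPU/vGPU is supported only on KVM and XenServer" >&2
            return 1
            ;;
    esac
}
```

For instance, an offering with a GPU count of 4 yields 4 devices on KVM but a single device on XenServer.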
16381655
Prerequisites and System Requirements
16391656
-------------------------------------
16401657

16411658
Before proceeding, ensure that you have these prerequisites:
16421659

1643-
- The vGPU-enabled XenServer 6.2 and later versions.
1644-
For more information, see `Citrix 3D Graphics Pack <https://www.citrix.com/go/private/vgpu.html>`_.
1660+
- CloudStack does not restrict the deployment of GPU-enabled Instances with
1661+
guest OS types that are not supported for GPU/vGPU functionality. The deployment
1662+
succeeds and a GPU/vGPU is allocated to the Instance; however, without the
1663+
required guest OS drivers, the Instance cannot leverage the GPU resources.
1664+
Therefore, use GPU-enabled service offerings only with supported guest operating systems.
1665+
1666+
- NVIDIA GRID K1 (16 GiB of video RAM) and K2 (8 GiB of video RAM) cards support
1667+
homogeneous virtual GPUs, which means that at any given time, the vGPUs resident on
1668+
a single physical GPU must all be of the same type. However, this restriction
1669+
does not extend across physical GPUs on the same card. Each physical GPU on a
1670+
K1 or K2 may host different types of virtual GPU at the same time. For example,
1671+
a GRID K2 card has two physical GPUs and supports four types of virtual GPU:
1672+
GRID K200, GRID K220Q, GRID K240Q, and GRID K260Q.
1673+
1674+
- The NVIDIA driver must be installed to enable vGPU operation, as it is for a physical NVIDIA GPU.
16451675

1646-
- GPU/vGPU functionality is supported for following HVM guest operating systems:
1647-
For more information, see `Citrix 3D Graphics Pack <https://www.citrix.com/go/private/vgpu.html>`_.
16481676

1649-
- Windows 7 (x86 and x64)
1677+
For XenServer:
16501678

1651-
- Windows Server 2008 R2
1679+
- vGPU-enabled XenServer 6.2 or later.
1680+
For more information, see `Citrix 3D Graphics Pack <https://www.citrix.com/go/private/vgpu.html>`_.
16521681

1653-
- Windows Server 2012
1682+
- GPU/vGPU functionality is supported for the following HVM guest operating systems:
1683+
For more information, see `Citrix 3D Graphics Pack <https://www.citrix.com/go/private/vgpu.html>`_.
16541684

1655-
- Windows 8 (x86 and x64)
1685+
- Windows 7 (x86 and x64)
16561686

1657-
- Windows 8.1 ("Blue") (x86 and x64)
1687+
- Windows Server 2008 R2
16581688

1659-
- Windows Server 2012 R2 (server equivalent of "Blue")
1689+
- Windows Server 2012
16601690

1661-
- CloudStack does not restrict the deployment of GPU-enabled Instances with guest OS types that are not supported by XenServer for GPU/vGPU functionality. The deployment would be successful and a GPU/vGPU will also get allocated for Instances; however, due to missing guest OS drivers, Instance would not be able to leverage GPU resources. Therefore, it is recommended to use GPU-enabled service offering only with supported guest OS.
1691+
- Windows 8 (x86 and x64)
16621692

1663-
- NVIDIA GRID K1 (16 GiB video RAM) AND K2 (8 GiB of video RAM) cards supports homogeneous virtual GPUs, implies that at any given time, the vGPUs resident on a single physical GPU must be all of the same type. However, this restriction doesn't extend across physical GPUs on the same card. Each physical GPU on a K1 or K2 may host different types of virtual GPU at the same time. For example, a GRID K2 card has two physical GPUs, and supports four types of virtual GPU; GRID K200, GRID K220Q, GRID K240Q, AND GRID K260Q.
1693+
- Windows 8.1 ("Blue") (x86 and x64)
16641694

1665-
- NVIDIA driver must be installed to enable vGPU operation as for a physical NVIDIA GPU.
1695+
- Windows Server 2012 R2 (server equivalent of "Blue")
16661696

1667-
- XenServer tools are installed in the Instance to get maximum performance on XenServer, regardless of type of vGPU you are using. Without the optimized networking and storage drivers that the XenServer tools provide, remote graphics applications running on GRID vGPU will not deliver maximum performance.
1697+
- XenServer tools must be installed in the Instance to get maximum performance on
1698+
XenServer, regardless of the type of vGPU in use. Without the optimized
1699+
networking and storage drivers that the XenServer tools provide, remote
1700+
graphics applications running on GRID vGPU will not deliver maximum performance.
16681701

1669-
- To deliver high frames from multiple heads on vGPU, install XenDesktop with HDX 3D Pro remote graphics.
1702+
- To deliver high frame rates from multiple heads on vGPU, install XenDesktop with
1703+
HDX 3D Pro remote graphics.
16701704
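The GRID K1/K2 homogeneity rule from the prerequisites above applies to each physical GPU separately, since different physical GPUs on the same card may host different types. A sketch of that per-physical-GPU check, with hypothetical function names and the vGPU type names taken from the GRID K2 example; it is not part of CloudStack's allocator.

```bash
#!/usr/bin/env bash
# Hypothetical illustration, not CloudStack code.
# Succeeds if all given vGPU types (those resident on ONE physical GPU)
# are identical. No arguments counts as homogeneous.
is_homogeneous() {
    local first="${1:-}" t
    for t in "$@"; do
        [ "$t" = "$first" ] || return 1
    done
    return 0
}

# Succeeds if a new vGPU of type $1 may be placed on a physical GPU that
# already hosts the vGPU types given in the remaining arguments.
can_place_vgpu() {
    local new_type="${1:-}"
    shift
    is_homogeneous "$new_type" "$@"
}
```

For example, ``can_place_vgpu "GRID K220Q" "GRID K200"`` fails because that physical GPU already hosts a different type, while the second physical GPU of the same K2 card could still accept it.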

16711705
Before continuing with configuration, consider the following:
16721706

1673-
- Deploying Instances GPU/vGPU capability is not supported if hosts are not available with enough GPU capacity.
1674-
1675-
- A Service Offering cannot be created with the GPU values that are not supported by CloudStack UI. However, you can make an API call to achieve this.
1707+
- Deploying Instances with GPU/vGPU capability is not supported if hosts are
1708+
not available with enough GPU capacity.
16761709

1677-
- Dynamic scaling is not supported. However, you can choose to deploy an Instance without GPU support, and at a later point, you can change the system offering to upgrade to the one with vGPU. You can achieve this by offline upgrade: stop the Instance, upgrade the Service Offering to the one with vGPU, then start the Instance.
1710+
- Dynamic scaling is not supported. However, you can deploy an Instance
1711+
without GPU support and later change its Service Offering to one with
1712+
vGPU. You can achieve this through an offline upgrade: stop the Instance,
1713+
change the Service Offering to the one with vGPU, then start the
1714+
Instance.
16781715

16791716
- Live migration of GPU/vGPU-enabled Instances is not supported.
16801717

1681-
- Limiting GPU resources per Account/Domain is not supported.
1682-
16831718
- Disabling GPU at the Cluster level is not supported.
16841719

16851720
- Notification thresholds for GPU resources are not supported.
16861721
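The offline upgrade described above can be scripted with CloudMonkey (``cmk``). This is a minimal sketch, assuming cmk's usual mapping of CloudStack API names to verb/noun commands (``stopVirtualMachine`` becomes ``cmk stop virtualmachine``, ``changeServiceForVirtualMachine`` becomes ``cmk change serviceforvirtualmachine``). The function name ``offline_gpu_upgrade`` and the ``CMK`` override variable are hypothetical, and the IDs are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper. Each step maps to a CloudStack API:
# stopVirtualMachine, changeServiceForVirtualMachine, startVirtualMachine.
# Set CMK=echo for a dry run that only prints the commands.
offline_gpu_upgrade() {
    local vm_id="${1:-}" offering_id="${2:-}"
    local cmk="${CMK:-cmk}"
    if [ -z "$vm_id" ] || [ -z "$offering_id" ]; then
        echo "usage: offline_gpu_upgrade <vm-id> <gpu-offering-id>" >&2
        return 2
    fi
    # Stop first: dynamic scaling to a GPU offering is not supported.
    "$cmk" stop virtualmachine id="$vm_id" || return 1
    "$cmk" change serviceforvirtualmachine id="$vm_id" serviceofferingid="$offering_id" || return 1
    "$cmk" start virtualmachine id="$vm_id"
}
```

A dry run such as ``CMK=echo offline_gpu_upgrade <vm-id> <offering-id>`` prints the three commands without contacting the management server, which is a convenient way to review them first.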

1687-
Supported GPU Devices
1688-
---------------------
1722+
1723+
Supported GPU Devices for XenServer
1724+
-----------------------------------
16891725

16901726
.. cssclass:: table-striped table-bordered table-hover
16911727

@@ -1710,14 +1746,17 @@ GPU/vGPU Assignment Workflow
17101746

17111747
CloudStack follows the sequence of operations below to provide GPU/vGPU support for Instances:
17121748

1713-
#. Ensure that XenServer host is ready with GPU installed and configured.
1714-
For more information, see `Citrix 3D Graphics Pack <https://www.citrix.com/go/private/vgpu.html>`_.
1749+
#. Ensure that the host is ready with GPU installed and configured.
1750+
1751+
- For XenServer, see `XenServer Documentation <https://docs.xenserver.com/en-us/citrix-hypervisor/graphics/hv-graphics-config>`_.
1752+
1753+
- For KVM, see `Discovering GPU Devices on KVM Hosts <hosts.html#discovering-gpu-devices-on-kvm-hosts>`_.
17151754

17161755
#. Add the host to CloudStack.
17171756
CloudStack queries the host to detect whether it is GPU-enabled.
17181757

17191758
#. Create a compute offering with GPU/vGPU support:
1720-
For more information, see `Creating a New Compute Offering <#creating-a-new-compute-offering>`__..
1759+
For more information, see `Creating a New Compute Offering <service_offerings.html#creating-a-new-compute-offering>`_.
17211760

17221761
#. Continue with any of the following operations:
17231762
