Feature: Add support for GPU with KVM hosts #11143
Pull Request Overview
This PR enables GPU support for KVM hosts by updating both backend utilities and the compute offering UI to attach and configure GPU cards and vGPU profiles.
- Removed a stray comment in the script utility header.
- Updated `AddComputeOffering.vue` to let users select GPU cards, vGPU profiles, GPU count, and display options.
Reviewed Changes
Copilot reviewed 153 out of 153 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
utils/src/main/java/com/cloud/utils/script/Script.java | Removed an extraneous comment line at the top of the file. |
ui/src/views/offering/AddComputeOffering.vue | Renamed form fields for GPU card and profile selection, added count/display controls and data-fetch methods. |
Comments suppressed due to low confidence (1)
ui/src/views/offering/AddComputeOffering.vue:262
- The form field name 'vgpuprofile' may conflict with the API parameter 'vgpuprofileid'. Consider renaming it to 'vgpuprofileid' to maintain consistency and avoid confusion when mapping form values to request parameters.
<a-form-item name="vgpuprofile" ref="vgpuprofile" :label="$t('label.vgpu.profile')" v-if="!isSystem && form.gpucardid && vgpuProfiles.length > 0">
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
## main #11143 +/- ##
============================================
+ Coverage 17.00% 17.17% +0.16%
- Complexity 14727 14983 +256
============================================
Files 5832 5869 +37
Lines 517620 521513 +3893
Branches 62996 63474 +478
============================================
+ Hits 88008 89553 +1545
- Misses 419673 421894 +2221
- Partials 9939 10066 +127
Great initiative @vishesh92; do you have any spec or documentation about it?
I am still working on it.
@blueorangutan package
@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Just to clarify: do you already have the spec/documentation and are working on the PR, or do you not have it yet and plan to create it?
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 14034
@blueorangutan package
@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 14410
@blueorangutan package
@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 14414
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 14415
@blueorangutan test
@sureshanaparti a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests
[SF] Trillian test result (tid-13937)
This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.
@blueorangutan package
@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 14435
[SF] Trillian test result (tid-13924)
This PR allows GPU devices to be attached to an Instance on KVM via PCI, mdev, or VF. The operator can discover the GPU devices on a KVM host and create a Compute Offering with GPU support based on the devices available on that host. Users can then use that Compute Offering to launch Instances with GPU devices.
Description
This PR allows GPU devices to be attached to an Instance on KVM via PCI, mdev, or VF.
CWiki Design doc: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Support+for+GPU+with+KVM+hosts
Doc PR: apache/cloudstack-documentation#526
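For readers unfamiliar with how a GPU reaches a KVM guest, the sketch below renders the libvirt `<hostdev>` elements that correspond to the attachment modes named in the description: a PCI address for a whole card or an SR-IOV VF, and a UUID for an mdev. This is an illustration under assumptions, not the code in this PR: the class `HostdevXmlSketch` and its methods are hypothetical names, and the PR's actual XML generation may differ.

```java
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; class and method names are hypothetical, not taken from the PR.
final class HostdevXmlSketch {

    // PCI address in domain:bus:slot.function form, e.g. 0000:01:00.0
    private static final Pattern PCI_ADDRESS = Pattern.compile(
            "([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\\.([0-7])");

    // A whole PCI device and an SR-IOV virtual function (VF) are both identified by a PCI address.
    static String pciHostdev(String busAddress) {
        Matcher m = PCI_ADDRESS.matcher(busAddress);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a PCI address: " + busAddress);
        }
        return "<hostdev mode='subsystem' type='pci' managed='yes'>\n"
             + "  <source>\n"
             + String.format("    <address domain='0x%s' bus='0x%s' slot='0x%s' function='0x%s'/>\n",
                     m.group(1), m.group(2), m.group(3), m.group(4))
             + "  </source>\n"
             + "</hostdev>";
    }

    // A mediated device (mdev) is identified by its UUID rather than a PCI address.
    static String mdevHostdev(String mdevUuid) {
        UUID uuid = UUID.fromString(mdevUuid); // rejects clearly malformed input
        return "<hostdev mode='subsystem' type='mdev' model='vfio-pci'>\n"
             + "  <source>\n"
             + "    <address uuid='" + uuid + "'/>\n"
             + "  </source>\n"
             + "</hostdev>";
    }

    public static void main(String[] args) {
        System.out.println(pciHostdev("0000:01:00.0"));
        System.out.println(mdevHostdev("4b20d080-1b54-4048-85b3-a6a62d165c01"));
    }
}
```

The PCI address and UUID used in `main` are made-up sample values.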
Generated summary
This pull request introduces several changes across multiple files, focusing on enhancing GPU-related functionality, adding new properties for VM hooks, and updating resource management capabilities. The most significant updates include the addition of GPU properties and event types, the introduction of new VM shell script properties, and modifications to resource limits and types to support GPU devices.
GPU-related enhancements:
- `api/src/main/java/com/cloud/agent/api/VgpuTypesInfo.java`: Added new fields such as `deviceType`, `busAddress`, `vendorId`, and `vmName` to support detailed GPU device information. Also included getter and setter methods for these fields and updated constructors to accommodate the new properties.
- `api/src/main/java/com/cloud/agent/api/to/GPUDeviceTO.java`: Introduced new fields like `gpuCount` and `gpuDevices` to manage GPU device details and added corresponding getter/setter methods. Updated constructors to handle the new fields.
- `api/src/main/java/com/cloud/event/EventTypes.java`: Added new GPU-related event types (`EVENT_GPU_CARD_CREATE`, `EVENT_VGPU_PROFILE_CREATE`, etc.) and mapped them to corresponding entities such as `GpuCard` and `VgpuProfile`.

VM hook properties:
- `agent/src/main/java/com/cloud/agent/properties/AgentProperties.java`: Added new shell script properties (`AGENT_HOOKS_LIBVIRT_VM_XML_TRANSFORMER_SHELL_SCRIPT`, `AGENT_HOOKS_LIBVIRT_VM_ON_START_SHELL_SCRIPT`, etc.) for VM lifecycle hooks, enabling execution of shell scripts for VM state changes.

Resource management updates:
- `api/src/main/java/com/cloud/capacity/Capacity.java`: Updated GPU capacity type ID from `19` to `11`.
- `api/src/main/java/com/cloud/configuration/Resource.java`: Added a new resource type for GPUs (`gpu`).
- `api/src/main/java/com/cloud/user/ResourceLimitService.java`: Introduced new configuration keys for GPU limits at the account, domain, and project levels (`DefaultMaxAccountGpus`, `DefaultMaxDomainGpus`, etc.). Added methods to check, increment, and decrement GPU resource limits.

Miscellaneous updates:
- `.github/workflows/ci.yml`: Added a new smoke test for deploying VMs with vGPU enabled (`smoke/test_deploy_vgpu_enabled_vm`).
- `api/src/main/java/org/apache/cloudstack/api/ApiConstants.java`: Added constants for GPU-related attributes such as `BUS_ADDRESS` and `DEVICE_NAME`.
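
The check, increment, and decrement operations mentioned for GPU resource limits can be pictured with a minimal in-memory sketch. This is not the `ResourceLimitService` API: `GpuLimitSketch`, its method names, and the use of `-1` to mean "unlimited" are assumptions made for illustration, and the real service also covers domain and project levels and persists its counts.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; names are hypothetical and this is not CloudStack's ResourceLimitService.
final class GpuLimitSketch {

    // Assumption for this sketch: -1 means "no limit".
    static final long UNLIMITED = -1L;

    private final long maxGpusPerAccount;
    private final Map<String, Long> usedByAccount = new HashMap<>();

    GpuLimitSketch(long maxGpusPerAccount) {
        this.maxGpusPerAccount = maxGpusPerAccount;
    }

    // Number of GPUs currently counted against the account.
    synchronized long used(String account) {
        return usedByAccount.getOrDefault(account, 0L);
    }

    // Check: would allocating `requested` more GPUs stay within the limit?
    synchronized boolean canAllocate(String account, long requested) {
        if (requested < 0) {
            throw new IllegalArgumentException("requested must be non-negative");
        }
        return maxGpusPerAccount == UNLIMITED
                || used(account) + requested <= maxGpusPerAccount;
    }

    // Increment: called when an Instance with GPUs is deployed.
    synchronized void increment(String account, long count) {
        if (!canAllocate(account, count)) {
            throw new IllegalStateException("GPU limit exceeded for account " + account);
        }
        usedByAccount.merge(account, count, Long::sum);
    }

    // Decrement: called when GPUs are released; the count never drops below zero.
    synchronized void decrement(String account, long count) {
        usedByAccount.put(account, Math.max(0L, used(account) - count));
    }

    public static void main(String[] args) {
        GpuLimitSketch limits = new GpuLimitSketch(2);
        limits.increment("demo-account", 2);
        System.out.println(limits.canAllocate("demo-account", 1));
        limits.decrement("demo-account", 1);
        System.out.println(limits.canAllocate("demo-account", 1));
    }
}
```

Checking before incrementing, under a single lock, keeps two concurrent deployments from both passing the check and jointly exceeding the limit.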
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Screenshots (if appropriate):
How Has This Been Tested?
This was tested locally on my laptop with passthrough of a consumer graphics card (NVIDIA RTX 3050). Due to unavailability of actual hardware, I wasn't able to test with vGPU profiles or mdev.
Framework level testing was done using the simulator plugin.
How did you try to break this feature and the system with this change?