Skip to content

Conversation

umfranci
Copy link
Collaborator

Updates to the verify_max_gpu_provision test to dynamically validate GPU count based on actual node capabilities instead of hardcoding it to 8 GPUs. This fixes test failures for VMs with less than 8 GPUs and cases where container policies don't provide the required information.

  • Changed test to dynamically use node.capability.gpu_count
  • Minimum requirement reduced from 8 to 2 GPUs to test "multiple GPU" scenarios
  • Test now adapts to actual hardware capabilities rather than fixed assumptions

)
def verify_max_gpu_provision(self, node: Node, log: Logger) -> None:
_gpu_provision_check(8, node, log)
actual_gpu_count = node.capability.gpu_count
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case is meant to validate a VM with the maximum GPU configuration. If it's modified as shown, it won't perform the verification correctly. If VM sizes with 8 GPUs are unavailable but still get scheduled, please check for a bug in the VM size capability calculation. If the GPU information is missing in some policy, please set it to 0 to skip this test case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, thanks @squirrelsc - will try to check more on this but it seems if there is no capability returned for a VM from Azure, it assumes the one provided in the requirement and proceeds with the test.

In either case, can we have a check at the start of the test to skip the execution if actual GPU count is less than 8?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the request. If there's no capacity with 8 GPUs, the test case will be skipped—is that the check you're referring to?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants