fix: resolve ND H100 & ND H200 loading timeout (#7584) #7594
Build #20251223.22_merge_147558809 had test failures
Details
- Failed: 2 (2.99%)
- Passed: 65 (97.01%)
- Other: 0 (0.00%)
- Total: 67
Annotations
Check failure on line 127 in Build log
azure-pipelines / AKS Linux VHD Build - PR check-in gate
Build log #L127
Script failed with exit code: 2
Check failure on line 1 in Test_Ubuntu2204_ChronyRestarts_Taints_And_Tolerations
azure-pipelines / AKS Linux VHD Build - PR check-in gate
Test_Ubuntu2204_ChronyRestarts_Taints_And_Tolerations
Failed
Raw output
=== RUN Test_Ubuntu2204_ChronyRestarts_Taints_And_Tolerations
=== PAUSE Test_Ubuntu2204_ChronyRestarts_Taints_And_Tolerations
=== CONT Test_Ubuntu2204_ChronyRestarts_Taints_And_Tolerations
test_helpers.go:322: TAGS {Name:Test_Ubuntu2204_ChronyRestarts_Taints_And_Tolerations ImageName:2204gen2containerd OS:ubuntu Arch:amd64 Airgap:false NonAnonymousACR:false GPU:false WASM:false BootstrapTokenFallback:false KubeletCustomConfig:false Scriptless:false VHDCaching:false}
vmss.go:173: creating VMSS "zkzo-2025-12-23-ubuntu2204chronyrestartstaintsandtolerati" with BootstrapConfigMutator/NBC in resource group MC_abe2e-westus3_abe2e-kubenet-v2-b7c70_westus3
vmss.go:182: VMSS portal link: https://ms.portal.azure.com/#@microsoft.onmicrosoft.com/resource/subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/MC_abe2e-westus3_abe2e-kubenet-v2-b7c70_westus3/providers/Microsoft.Compute/virtualMachineScaleSets/zkzo-2025-12-23-ubuntu2204chronyrestartstaintsandtolerati/overview
vmss.go:188: Managed cluster portal link: https://ms.portal.azure.com/#@microsoft.onmicrosoft.com/resource/subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/MC_abe2e-westus3_abe2e-kubenet-v2-b7c70_westus3/providers/Microsoft.ContainerService/managedClusters/abe2e-kubenet-v2-b7c70/overview
vmss.go:285: VM will be automatically deleted after the test finishes, to preserve it for debugging purposes set KEEP_VMSS=true or pause the test with a breakpoint before the test finishes or failed
vmss.go:285: SSH Instructions: (may take a few minutes for the VM to be ready for SSH)
========================
kubectl --kubeconfig <(az aks get-credentials --subscription "8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8" --resource-group "abe2e-westus3" --name "abe2e-kubenet-v2-b7c70" -f -) exec -it debug-mariner-tolerated-z5fgw -- bash -c "chroot /proc/1/root /bin/bash -c 'ssh -i sshkey10224075 -o PasswordAuthentication=no -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o ConnectTimeout=5 [email protected]'"
vmss.go:330: VM reached running state
vmss.go:235: created VMSS zkzo-2025-12-23-ubuntu2204chronyrestartstaintsandtolerati in resource group MC_abe2e-westus3_abe2e-kubenet-v2-b7c70_westus3
kube.go:141: waiting for node zkzo-2025-12-23-ubuntu2204chronyrestartstaintsandtolerati to be ready in k8s API
kube.go:177: node zkzo-2025-12-23-ubuntu2204chronyrestartstaintsandtolerati000000 is ready. Taints: [{"key":"testkey1","value":"value1","effect":"NoSchedule"},{"key":"testkey2","value":"value2","effect":"NoSchedule"},{"key":"node.cloudprovider.kubernetes.io/uninitialized","value":"true","effect":"NoSchedule"}] Conditions: [{"type":"MemoryPressure","status":"False","lastHeartbeatTime":"2025-12-23T20:16:26Z","lastTransitionTime":"2025-12-23T20:16:25Z","reason":"KubeletHasSufficientMemory","message":"kubelet has sufficient memory available"},{"type":"DiskPressure","status":"False","lastHeartbeatTime":"2025-12-23T20:16:26Z","lastTransitionTime":"2025-12-23T20:16:25Z","reason":"KubeletHasNoDiskPressure","message":"kubelet has no disk pressure"},{"type":"PIDPressure","status":"False","lastHeartbeatTime":"2025-12-23T20:16:26Z","lastTransitionTime":"2025-12-23T20:16:25Z","reason":"KubeletHasSufficientPID","message":"kubelet has sufficient PID available"},{"type":"Ready","status":"True","lastHeartbeatTime":"2025-12-23T20:16:26Z","lastTransitionTime":"2025-12-23T20:16:26Z","reason":"KubeletReady","message":"kubelet is posting ready status"}]
kube.go:143: waited for node zkzo-2025-12-23-ubuntu2204chronyrestartstaintsandtolerati to be ready in k8s API for 276.079811ms
strings.go:37: Node zkzo-2025-12-23-ubuntu2204chronyrestartstaintsandtolerati took 1m to be created and 0s to be ready
test_helpers.go:223: Choosing the private ACR "privateacre2ewestus3" for the vm validation
test_helpers.go:735: SSH connectivity to 10.224.0.75 verified successfully
validators.go:1647: truncated pod name "zkzo-2