fix: resolve ND H100 & ND H200 loading timeout (#7584)

455fc4a
Open

fix: resolve ND H100 & ND H200 loading timeout (#7584) #7594

Azure Pipelines / AKS Linux VHD Build - PR check-in gate failed Dec 23, 2025 in 1h 15m 32s

Build #20251223.22_merge_147558809 had test failures

Details

Tests

  • Failed: 2 (2.99%)
  • Passed: 65 (97.01%)
  • Other: 0 (0.00%)
  • Total: 67

Annotations

Check failure on line 127 in Build log

@azure-pipelines azure-pipelines / AKS Linux VHD Build - PR check-in gate

Build log #L127

Script failed with exit code: 2

Check failure on line 1 in Test_Ubuntu2204_ChronyRestarts_Taints_And_Tolerations

@azure-pipelines azure-pipelines / AKS Linux VHD Build - PR check-in gate

Test_Ubuntu2204_ChronyRestarts_Taints_And_Tolerations

Failed
Raw output
=== RUN   Test_Ubuntu2204_ChronyRestarts_Taints_And_Tolerations
=== PAUSE Test_Ubuntu2204_ChronyRestarts_Taints_And_Tolerations
=== CONT  Test_Ubuntu2204_ChronyRestarts_Taints_And_Tolerations
    test_helpers.go:322: TAGS {Name:Test_Ubuntu2204_ChronyRestarts_Taints_And_Tolerations ImageName:2204gen2containerd OS:ubuntu Arch:amd64 Airgap:false NonAnonymousACR:false GPU:false WASM:false BootstrapTokenFallback:false KubeletCustomConfig:false Scriptless:false VHDCaching:false}
    vmss.go:173: creating VMSS "zkzo-2025-12-23-ubuntu2204chronyrestartstaintsandtolerati" with BootstrapConfigMutator/NBC in resource group MC_abe2e-westus3_abe2e-kubenet-v2-b7c70_westus3
    vmss.go:182: VMSS portal link: https://ms.portal.azure.com/#@microsoft.onmicrosoft.com/resource/subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/MC_abe2e-westus3_abe2e-kubenet-v2-b7c70_westus3/providers/Microsoft.Compute/virtualMachineScaleSets/zkzo-2025-12-23-ubuntu2204chronyrestartstaintsandtolerati/overview
    vmss.go:188: Managed cluster portal link: https://ms.portal.azure.com/#@microsoft.onmicrosoft.com/resource/subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/MC_abe2e-westus3_abe2e-kubenet-v2-b7c70_westus3/providers/Microsoft.ContainerService/managedClusters/abe2e-kubenet-v2-b7c70/overview
    vmss.go:285: VM will be automatically deleted after the test finishes, to preserve it for debugging purposes set KEEP_VMSS=true or pause the test with a breakpoint before the test finishes or failed
    vmss.go:285: SSH Instructions: (may take a few minutes for the VM to be ready for SSH)
        ========================
        kubectl --kubeconfig <(az aks get-credentials --subscription "8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8" --resource-group "abe2e-westus3"  --name "abe2e-kubenet-v2-b7c70" -f -) exec -it debug-mariner-tolerated-z5fgw -- bash -c "chroot /proc/1/root /bin/bash -c 'ssh -i sshkey10224075 -o PasswordAuthentication=no -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o ConnectTimeout=5 [email protected]'"
    vmss.go:330: VM reached running state
    vmss.go:235: created VMSS zkzo-2025-12-23-ubuntu2204chronyrestartstaintsandtolerati in resource group MC_abe2e-westus3_abe2e-kubenet-v2-b7c70_westus3
    kube.go:141: waiting for node zkzo-2025-12-23-ubuntu2204chronyrestartstaintsandtolerati to be ready in k8s API
    kube.go:177: node zkzo-2025-12-23-ubuntu2204chronyrestartstaintsandtolerati000000 is ready. Taints: [{"key":"testkey1","value":"value1","effect":"NoSchedule"},{"key":"testkey2","value":"value2","effect":"NoSchedule"},{"key":"node.cloudprovider.kubernetes.io/uninitialized","value":"true","effect":"NoSchedule"}] Conditions: [{"type":"MemoryPressure","status":"False","lastHeartbeatTime":"2025-12-23T20:16:26Z","lastTransitionTime":"2025-12-23T20:16:25Z","reason":"KubeletHasSufficientMemory","message":"kubelet has sufficient memory available"},{"type":"DiskPressure","status":"False","lastHeartbeatTime":"2025-12-23T20:16:26Z","lastTransitionTime":"2025-12-23T20:16:25Z","reason":"KubeletHasNoDiskPressure","message":"kubelet has no disk pressure"},{"type":"PIDPressure","status":"False","lastHeartbeatTime":"2025-12-23T20:16:26Z","lastTransitionTime":"2025-12-23T20:16:25Z","reason":"KubeletHasSufficientPID","message":"kubelet has sufficient PID available"},{"type":"Ready","status":"True","lastHeartbeatTime":"2025-12-23T20:16:26Z","lastTransitionTime":"2025-12-23T20:16:26Z","reason":"KubeletReady","message":"kubelet is posting ready status"}]
    kube.go:143: waited for node zkzo-2025-12-23-ubuntu2204chronyrestartstaintsandtolerati to be ready in k8s API for 276.079811ms
    strings.go:37: Node zkzo-2025-12-23-ubuntu2204chronyrestartstaintsandtolerati took 1m to be created and 0s to be ready
    test_helpers.go:223: Choosing the private ACR "privateacre2ewestus3" for the vm validation
    test_helpers.go:735: SSH connectivity to 10.224.0.75 verified successfully
    validators.go:1647: truncated pod name "zkzo-2