@kaessert commented Nov 20, 2025

Description of your changes

Implements a health check system for standard Kubernetes resources that complements the existing Ready condition check. Standard Kubernetes resources are now evaluated with resource-specific health logic before falling back to the Ready condition.

Supported resources:

  • Deployment: checks replica counts and Available condition
  • StatefulSet: checks replica counts and that the current and update revisions match
  • DaemonSet: checks pod scheduling and availability
  • Service: checks LoadBalancer ingress assignment
  • Ingress: checks load balancer assignment
  • Secret, ConfigMap, ServiceAccount: ready as soon as they exist
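The per-resource checks listed above can be sketched roughly as follows. This is a minimal Python illustration, not the function's actual code. It assumes resources arrive as plain dicts shaped like the unstructured objects a composition function observes. The function names, the extra observedGeneration guard, and the assumption that StatefulSets use RollingUpdate without a partition are all illustrative choices, not details taken from this PR.

```python
from typing import Any, Dict

# Assumption: resources are plain dicts shaped like unstructured objects.
Resource = Dict[str, Any]


def _status(obj: Resource) -> Dict[str, Any]:
    return obj.get("status") or {}


def _spec(obj: Resource) -> Dict[str, Any]:
    return obj.get("spec") or {}


def _condition_true(obj: Resource, cond_type: str) -> bool:
    """True if status.conditions has cond_type with status "True"."""
    for cond in _status(obj).get("conditions") or []:
        if cond.get("type") == cond_type:
            return cond.get("status") == "True"
    return False


def _generation_observed(obj: Resource) -> bool:
    """Assumed extra guard: the controller has seen the latest spec."""
    generation = (obj.get("metadata") or {}).get("generation", 0)
    return _status(obj).get("observedGeneration", 0) >= generation


def deployment_healthy(obj: Resource) -> bool:
    desired = _spec(obj).get("replicas", 1)  # spec.replicas defaults to 1
    status = _status(obj)
    return (
        _generation_observed(obj)
        and status.get("updatedReplicas", 0) >= desired
        and status.get("availableReplicas", 0) >= desired
        and _condition_true(obj, "Available")
    )


def statefulset_healthy(obj: Resource) -> bool:
    # Assumes RollingUpdate without a partition, so a finished rollout
    # ends with currentRevision == updateRevision.
    desired = _spec(obj).get("replicas", 1)
    status = _status(obj)
    return (
        _generation_observed(obj)
        and status.get("readyReplicas", 0) >= desired
        and bool(status.get("updateRevision"))
        and status.get("currentRevision") == status.get("updateRevision")
    )


def daemonset_healthy(obj: Resource) -> bool:
    status = _status(obj)
    desired = status.get("desiredNumberScheduled", 0)
    return (
        _generation_observed(obj)
        and status.get("updatedNumberScheduled", 0) >= desired
        and status.get("numberAvailable", 0) >= desired
    )


def service_healthy(obj: Resource) -> bool:
    # Only LoadBalancer Services wait for an external address;
    # a missing spec.type means the Kubernetes default, ClusterIP.
    if _spec(obj).get("type", "ClusterIP") != "LoadBalancer":
        return True
    ingress = (_status(obj).get("loadBalancer") or {}).get("ingress") or []
    return len(ingress) > 0


def ingress_healthy(obj: Resource) -> bool:
    ingress = (_status(obj).get("loadBalancer") or {}).get("ingress") or []
    return len(ingress) > 0


def exists_healthy(obj: Resource) -> bool:
    # Secret, ConfigMap, ServiceAccount: being observed at all means ready.
    return True
```

For example, a Deployment with `spec.replicas: 3` only counts as healthy once three replicas are both updated and available and its Available condition is True.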

Health checks are looked up through a registry, so support for more resource types can be added without changing the dispatch logic. All other resources (Crossplane managed resources, custom resources) continue to use the Ready condition check.

For additional testing, I deployed https://github.com/upbound/configuration-app-model with the following change, which drops the explicit krm.kcl.dev/ready annotation:

 _metadata = lambda name: str -> any {
     { annotations = {
-        "krm.kcl.dev/composition-resource-name" = name } | ({
-            "krm.kcl.dev/ready" = "True"
-        } if _isResourceReady(name) else {})
+        "krm.kcl.dev/composition-resource-name" = name }
     }
 }

Then I deployed the example with kubectl apply -f examples/app/example.yaml and observed that everything was healthy:

$> crossplane beta trace app example
NAME                               SYNCED   READY   STATUS
App/example (default)              True     True    Available
├─ Deployment/app1 (default)       -        -
├─ ServiceAccount/app1 (default)   -        -
└─ Service/app1 (default)          -        -

I also rendered the example I added using the function in development mode and saw the XR become ready there as well.

To confirm that everything still works with only standard Crossplane resources, I also ran the e2e tests for https://github.com/upbound/configuration-aws-network after swapping the function for my custom build:

-    package: xpkg.upbound.io/crossplane-contrib/function-auto-ready
-    version: '>=v0.0.0'
+    package: xpkg.upbound.io/solutions/function-auto-ready
+    version: v0.0.0-tk-v2

and observed the e2e tests pass:

$> up test run --e2e tests/e2etest-network/
  ✓   Parsing tests
  ✓   Collecting resources
  ✓   Collecting resources
  ✓   Generating language schemas
  ✓   Checking dependencies
  ✓   Building functions
  ✓   Building configuration package
  ✓   Creating development control plane in Spaces
  ✓   Ensuring repository exists
  ✓   Pushing function package xpkg.upbound.io/solutions/configuration-aws-network_network
  ✓   Pushing configuration image xpkg.upbound.io/solutions/configuration-aws-network:v0.0.0-1763648947
  ✓   Installing package on development control plane
 ▄ Waiting for package to be ready (1s)
 ....
2025/11/20 15:30:17 Skipping update step because the root resource does not exist
2025/11/20 15:30:17 Skipping update step because the skip-delete option is set to true
2025/11/20 15:30:17 Skipping import step because the skip-import option is set to true
2025/11/20 15:30:17 Written test files: /var/folders/h4/sh6rlg_53s17_cjkfp90v3r00000gn/T/e2e-test-network4223355159
2025/11/20 15:30:17 Running chainsaw tests at /var/folders/h4/sh6rlg_53s17_cjkfp90v3r00000gn/T/e2e-test-network4223355159
2025/11/20 15:30:17 Loading default configuration...
2025/11/20 15:30:17 - Using test file: 00-apply.yaml
2025/11/20 15:30:17 - ApplyTimeout 5s
2025/11/20 15:30:17 - AssertTimeout 30s
2025/11/20 15:30:17 - CleanupTimeout 30s
2025/11/20 15:30:17 - DeleteTimeout 15s
2025/11/20 15:30:17 - ErrorTimeout 30s
2025/11/20 15:30:17 - ExecTimeout 5s
2025/11/20 15:30:17 - Parallel 1
2025/11/20 15:30:17 Loading tests...
2025/11/20 15:30:17 - apply (/var/folders/h4/sh6rlg_53s17_cjkfp90v3r00000gn/T/e2e-test-network4223355159/case)
2025/11/20 15:30:17 Running tests...
=== RUN   chainsaw
=== PAUSE chainsaw
=== CONT  chainsaw
=== RUN   chainsaw/apply
=== PAUSE chainsaw/apply
=== CONT  chainsaw/apply
    | 15:30:18 | apply | @chainsaw                | CREATE    | OK    | v1/Namespace @ chainsaw-included-pelican
    | 15:30:18 | apply | Apply Resources          | TRY       | BEGIN |
    | 15:30:18 | apply | Apply Resources          | CMD       | RUN   |
        === COMMAND
        /bin/sh -c echo "Checking webhook health before proceeding..."
        curl -sL https://raw.githubusercontent.com/crossplane/uptest/main/hack/check_endpoints.sh -o /tmp/check_endpoints.sh && chmod +x /tmp/check_endpoints.sh
        /tmp/check_endpoints.sh
    | 15:30:20 | apply | Apply Resources          | SCRIPT    | LOG   |
        === STDOUT
        Checking webhook health before proceeding...
        === STDERR
        Error from server (Forbidden): endpointslices.discovery.k8s.io is forbidden: User "upbound:user:tobias.kasser.upbound.io" cannot list resource "endpointslices" in API group "discovery.k8s.io" in the namespace "default"
    | 15:30:20 | apply | Apply Resources          | SCRIPT    | DONE  |
    | 15:30:20 | apply | Apply Resources          | SLEEP     | RUN   |
    | 15:30:30 | apply | Apply Resources          | SLEEP     | DONE  |
    | 15:30:30 | apply | Apply Resources          | APPLY     | RUN   | aws.platform.upbound.io/v1alpha1/Network @ default/configuration-aws-network
    | 15:30:31 | apply | Apply Resources          | CREATE    | OK    | aws.platform.upbound.io/v1alpha1/Network @ default/configuration-aws-network
    | 15:30:31 | apply | Apply Resources          | APPLY     | DONE  | aws.platform.upbound.io/v1alpha1/Network @ default/configuration-aws-network
    | 15:30:31 | apply | Apply Resources          | CMD       | RUN   |
        === COMMAND
        /bin/sh -c echo "Running annotation script with retry logic"
        retry_annotate() {
          local max_attempts=10
          local delay=5
          local attempt=1
          local cmd="$1"

          while [ $attempt -le $max_attempts ]; do
            echo "Annotation attempt $attempt/$max_attempts for: $cmd"
            if eval "$cmd"; then
              echo "Annotation successful on attempt $attempt"
              return 0
            else
              echo "Annotation failed on attempt $attempt"
              if [ $attempt -lt $max_attempts ]; then
                echo "Retrying in ${delay}s..."
                sleep $delay
              fi
              ((attempt++))
            fi
          done
          echo "Annotation failed after $max_attempts attempts"
          return 1
        }
        retry_annotate "${KUBECTL} annotate --namespace default  network.aws.platform.upbound.io/configuration-aws-network upjet.upbound.io/test=true --overwrite"
    | 15:30:34 | apply | Apply Resources          | SCRIPT    | LOG   |
        === STDOUT
        Running annotation script with retry logic
        Annotation attempt 1/10 for: /usr/local/bin/kubectl annotate --namespace default  network.aws.platform.upbound.io/configuration-aws-network upjet.upbound.io/test=true --overwrite
        network.aws.platform.upbound.io/configuration-aws-network annotated
        Annotation successful on attempt 1
    | 15:30:34 | apply | Apply Resources          | SCRIPT    | DONE  |
    | 15:30:34 | apply | Apply Resources          | TRY       | END   |
    | 15:30:34 | apply | Assert Status Conditions | TRY       | BEGIN |
    | 15:30:34 | apply | Assert Status Conditions | ASSERT    | RUN   | aws.platform.upbound.io/v1alpha1/Network @ default/configuration-aws-network
NAME                                                                            RESOURCE                                     SYNCED   READY   STATUS
Network/configuration-aws-network (default)                                                                                  True     False   Creating: Unready resources: igw, mrt, route, and 13 more
├─ InternetGateway/configuration-aws-network-178a43130a66 (default)             igw                                          False    -       ReconcileError: cannot resolve references: mg.Spec.ForProvider.VPCID: referenced field was empty (referenced resource may not yet be ready)
├─ MainRouteTableAssociation/configuration-aws-network-4cc5519bb7a7 (default)   mrt                                          False    -       ReconcileError: cannot resolve references: mg.Spec.ForProvider.RouteTableID: referenced field was empty (referenced resource may not yet be ready)
├─ RouteTableAssociation/configuration-aws-network-036850f2e33e (default)       rta-us-west-2a-192-168-128-0-18-private      False    -       ReconcileError: cannot resolve references: mg.Spec.ForProvider.RouteTableID: referenced field was empty (referenced resource may not yet be ready)
├─ RouteTableAssociation/configuration-aws-network-0f379ee397c9 (default)       rta-us-west-2b-192-168-192-0-18-private      False    -       ReconcileError: cannot resolve references: mg.Spec.ForProvider.RouteTableID: referenced field was empty (referenced resource may not yet be ready)
├─ RouteTableAssociation/configuration-aws-network-6ace773906d0 (default)       rta-us-west-2a-192-168-0-0-18-public         False    -       ReconcileError: cannot resolve references: mg.Spec.ForProvider.RouteTableID: referenced field was empty (referenced resource may not yet be ready)
├─ RouteTableAssociation/configuration-aws-network-ead624ac9a0d (default)       rta-us-west-2b-192-168-64-0-18-public        False    -       ReconcileError: cannot resolve references: mg.Spec.ForProvider.RouteTableID: referenced field was empty (referenced resource may not yet be ready)
├─ RouteTable/configuration-aws-network-638f7808b124 (default)                  rt                                           False    -       ReconcileError: cannot resolve references: mg.Spec.ForProvider.VPCID: referenced field was empty (referenced resource may not yet be ready)
├─ Route/configuration-aws-network-e12df2eea278 (default)                       route                                        False    -       ReconcileError: cannot resolve references: mg.Spec.ForProvider.GatewayID: referenced field was empty (referenced resource may not yet be ready)
├─ SecurityGroupRule/configuration-aws-network-6c73eca6552d (default)           sgr-mysql                                    False    -       ReconcileError: cannot resolve references: mg.Spec.ForProvider.SecurityGroupID: referenced field was empty (referenced resource may not yet be ready)
├─ SecurityGroupRule/configuration-aws-network-f43679bd68a6 (default)           sgr-postgres                                 False    -       ReconcileError: cannot resolve references: mg.Spec.ForProvider.SecurityGroupID: referenced field was empty (referenced resource may not yet be ready)
├─ SecurityGroup/configuration-aws-network-9e60e68ade13 (default)               sg                                           False    -       ReconcileError: cannot resolve references: mg.Spec.ForProvider.VPCID: referenced field was empty (referenced resource may not yet be ready)
├─ Subnet/configuration-aws-network-07a1f2798d67 (default)                      subnet-us-west-2a-192-168-128-0-18-private   False    -       ReconcileError: cannot resolve references: mg.Spec.ForProvider.VPCID: referenced field was empty (referenced resource may not yet be ready)
├─ Subnet/configuration-aws-network-520ded92e531 (default)                      subnet-us-west-2b-192-168-192-0-18-private   False    -       ReconcileError: cannot resolve references: mg.Spec.ForProvider.VPCID: referenced field was empty (referenced resource may not yet be ready)
├─ Subnet/configuration-aws-network-521e449cc233 (default)                      subnet-us-west-2b-192-168-64-0-18-public     False    -       ReconcileError: cannot resolve references: mg.Spec.ForProvider.VPCID: referenced field was empty (referenced resource may not yet be ready)
├─ Subnet/configuration-aws-network-dbb334d3d9fc (default)                      subnet-us-west-2a-192-168-0-0-18-public      False    -       ReconcileError: cannot resolve references: mg.Spec.ForProvider.VPCID: referenced field was empty (referenced resource may not yet be ready)
└─ VPC/configuration-aws-network-651da1fd9c11 (default)                         vpc                                          True     False   Creating
...
NAME                                                                            RESOURCE                                     SYNCED   READY   STATUS
Network/configuration-aws-network (default)                                                                                  True     False   Creating: Unready resources: rta-us-west-2a-192-168-0-0-18-public
├─ InternetGateway/configuration-aws-network-178a43130a66 (default)             igw                                          True     True    Available
├─ MainRouteTableAssociation/configuration-aws-network-4cc5519bb7a7 (default)   mrt                                          True     True    Available
├─ RouteTableAssociation/configuration-aws-network-036850f2e33e (default)       rta-us-west-2a-192-168-128-0-18-private      True     True    Available
├─ RouteTableAssociation/configuration-aws-network-0f379ee397c9 (default)       rta-us-west-2b-192-168-192-0-18-private      True     True    Available
├─ RouteTableAssociation/configuration-aws-network-6ace773906d0 (default)       rta-us-west-2a-192-168-0-0-18-public         True     True    Available
├─ RouteTableAssociation/configuration-aws-network-ead624ac9a0d (default)       rta-us-west-2b-192-168-64-0-18-public        True     True    Available
├─ RouteTable/configuration-aws-network-638f7808b124 (default)                  rt                                           True     True    Available
├─ Route/configuration-aws-network-e12df2eea278 (default)                       route                                        True     True    Available
├─ SecurityGroupRule/configuration-aws-network-6c73eca6552d (default)           sgr-mysql                                    True     True    Available
├─ SecurityGroupRule/configuration-aws-network-f43679bd68a6 (default)           sgr-postgres                                 True     True    Available
├─ SecurityGroup/configuration-aws-network-9e60e68ade13 (default)               sg                                           True     True    Available
├─ Subnet/configuration-aws-network-07a1f2798d67 (default)                      subnet-us-west-2a-192-168-128-0-18-private   True     True    Available
├─ Subnet/configuration-aws-network-520ded92e531 (default)                      subnet-us-west-2b-192-168-192-0-18-private   True     True    Available
├─ Subnet/configuration-aws-network-521e449cc233 (default)                      subnet-us-west-2b-192-168-64-0-18-public     True     True    Available
├─ Subnet/configuration-aws-network-dbb334d3d9fc (default)                      subnet-us-west-2a-192-168-0-0-18-public      True     True    Available
└─ VPC/configuration-aws-network-651da1fd9c11 (default)                         vpc                                          True     True    Available
    | 15:33:09 | apply | Assert Status Conditions | ASSERT    | DONE  | aws.platform.upbound.io/v1alpha1/Network @ default/configuration-aws-network
    | 15:33:09 | apply | Assert Status Conditions | TRY       | END   |
    | 15:33:09 | apply | @chainsaw                | CLEANUP   | SKIP  |
=== NAME  chainsaw
    | 15:33:09 | chainsaw | @chainsaw | CLEANUP   | SKIP  |
--- PASS: chainsaw (0.00s)
    --- PASS: chainsaw/apply (172.06s)
PASS
2025/11/20 15:33:09 Tests Summary:
2025/11/20 15:33:09 - Passed: 1
2025/11/20 15:33:09 - Failed: 0
2025/11/20 15:33:09 - Skipped: 0
2025/11/20 15:33:09 Skipping test 01-update.yaml
2025/11/20 15:33:09 Skipping test 02-import.yaml
2025/11/20 15:33:09 Loading default configuration...
2025/11/20 15:33:09 - Using test file: 03-delete.yaml
2025/11/20 15:33:09 - ApplyTimeout 5s
2025/11/20 15:33:09 - AssertTimeout 30s
2025/11/20 15:33:09 - CleanupTimeout 30s
2025/11/20 15:33:09 - DeleteTimeout 15s
2025/11/20 15:33:09 - ErrorTimeout 30s
2025/11/20 15:33:09 - ExecTimeout 5s
2025/11/20 15:33:09 - Parallel 1
2025/11/20 15:33:09 Loading tests...
2025/11/20 15:33:09 - delete (/var/folders/h4/sh6rlg_53s17_cjkfp90v3r00000gn/T/e2e-test-network4223355159/case)
2025/11/20 15:33:09 Running tests...
=== RUN   chainsaw
=== PAUSE chainsaw
=== CONT  chainsaw
=== RUN   chainsaw/delete
=== PAUSE chainsaw/delete
=== CONT  chainsaw/delete
    | 15:33:10 | delete | @chainsaw        | CREATE    | OK    | v1/Namespace @ chainsaw-next-macaque
    | 15:33:10 | delete | Delete Resources | TRY       | BEGIN |
    | 15:33:10 | delete | Delete Resources | CMD       | RUN   |
        === COMMAND
        /bin/sh -c retry_kubectl() {
          local max_attempts=10
          local delay=5
          local attempt=1
          local cmd="$1"

          while [ $attempt -le $max_attempts ]; do
            echo "Kubectl attempt $attempt/$max_attempts for: $cmd"
            if eval "$cmd"; then
              echo "Kubectl operation successful on attempt $attempt"
              return 0
            else
              echo "Kubectl operation failed on attempt $attempt"
              if [ $attempt -lt $max_attempts ]; then
                echo "Retrying in ${delay}s..."
                sleep $delay
              fi
              ((attempt++))
            fi
          done
          echo "Kubectl operation failed after $max_attempts attempts"
          return 1
        }
        retry_kubectl "${KUBECTL} delete network.aws.platform.upbound.io/configuration-aws-network --wait=false --namespace default --ignore-not-found"
    | 15:33:12 | delete | Delete Resources | SCRIPT    | LOG   |
        === STDOUT
        Kubectl attempt 1/10 for: /usr/local/bin/kubectl delete network.aws.platform.upbound.io/configuration-aws-network --wait=false --namespace default --ignore-not-found
        network.aws.platform.upbound.io "configuration-aws-network" deleted
        Kubectl operation successful on attempt 1
    | 15:33:12 | delete | Delete Resources | SCRIPT    | DONE  |
    | 15:33:12 | delete | Delete Resources | TRY       | END   |
    | 15:33:12 | delete | Assert Deletion  | TRY       | BEGIN |
    | 15:33:12 | delete | Assert Deletion  | CMD       | RUN   |
        === COMMAND
        /bin/sh -c ${KUBECTL} wait --namespace default --for=delete network.aws.platform.upbound.io/configuration-aws-network --timeout 1h15m0s
    | 15:33:13 | delete | Assert Deletion  | SCRIPT    | DONE  |
    | 15:33:13 | delete | Assert Deletion  | TRY       | END   |
    | 15:33:13 | delete | @chainsaw        | CLEANUP   | SKIP  |
  ✓   Collecting resources
  ✓   Generating language schemas
  ✓   Checking dependencies
  ✓   Building functions
  ✓   Building configuration package
  ✓   Creating development control plane in Spaces
  ✓   Ensuring repository exists
  ✓   Pushing function package xpkg.upbound.io/solutions/configuration-aws-network_network
  ✓   Pushing configuration image xpkg.upbound.io/solutions/configuration-aws-network:v0.0.0-1763648947
  ✓   Installing package on development control plane
  ✓   Waiting for package to be ready
  ✓   Finding test resources
Cleanup summary: 0 deleted, 0 remaining after 0 attempts
Tearing down test control plane...

Test control plane deleted

SUCCESS:
SUCCESS: Tests Summary:
SUCCESS: ------------------
SUCCESS: Total Tests Executed: 1
SUCCESS: Passed tests:         1
SUCCESS: Failed tests:         0

Some of the logs were removed for brevity and because of length limits.

Signed-off-by: Tobias Kässer <[email protected]>
@kaessert force-pushed the tk/v2-standard-kube-resources branch from 58cd1f2 to 50bf50e on November 20, 2025 14:35