Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -191,7 +191,7 @@ undeploy: ## Undeploy manager from the K8s cluster specified in ~/.kube/config.

.PHONY: chart
chart: manifests kustomize
cd config/manager && $(KUSTOMIZE) edit set image controller=$(shell echo ${IMG} | sed 's/\(:.*\)v/\1/')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sed command is not the source of the problem reported in #453 .

bash-5.3$ echo foo:1234 | sed 's/\(:.*\)v/\1/'
foo:1234

cd config/manager && $(KUSTOMIZE) edit set image controller=${IMG}
$(KUSTOMIZE) build config/default > chart/templates/operator.yaml
@cat config/samples/postcreate-hooks/openshift-crds.yaml > /tmp/hooks.yaml
@kubectl create secret generic postcreate-hooks --from-file=/tmp/hooks.yaml --dry-run=client --output=yaml > chart/templates/builtin-hooks.yaml
Expand Down
5 changes: 5 additions & 0 deletions config/manager/manager.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,11 @@ spec:
matchLabels:
control-plane: controller-manager
replicas: 1
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
maxSurge: 1
template:
metadata:
annotations:
Expand Down
158 changes: 158 additions & 0 deletions test/e2e/test-controller-image-update.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing invokes this test.

bash-5.3$ find * .github/workflows/* -type f -exec grep test-controller-image-update \{\} \; -print -exec echo \; 
bash-5.3$ 

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is not marked "executable".

bash-5.3$ ls -l test/e2e/test-c*
-rw-r--r--  1 mspreitz  staff  6175 Oct 21 14:24 test/e2e/test-controller-image-update.sh
-rwxr-xr-x  1 mspreitz  staff  2124 Dec 14 21:37 test/e2e/test-custom-cluster-name.sh

Copy link
Contributor

@MikeSpreitzer MikeSpreitzer Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing this file in the current edition of main (commit a6acd41) fails. I have attached a log.

fail1214a.log

Oddly, that helm get manifest output includes the following:

---
# Source: kubeflex-operator/templates/operator.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: manager
    app.kubernetes.io/created-by: kubeflex
    app.kubernetes.io/instance: controller-manager
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/name: deployment
    app.kubernetes.io/part-of: kubeflex
    control-plane: controller-manager
  name: kubeflex-controller-manager
  namespace: kubeflex-system

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha! The trick is that the helm get manifest output does NOT include the kubeflex-system Namespace.

Original file line number Diff line number Diff line change
@@ -0,0 +1,158 @@
#!/usr/bin/env bash
# Copyright 2024 The KubeStellar Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -x # echo so that users can understand what is happening
set -e # exit on error

SRC_DIR="$(cd $(dirname "${BASH_SOURCE[0]}") && pwd)"
source "${SRC_DIR}/setup-shell.sh"

:
: -------------------------------------------------------------------------
: Test that controller manager image updates properly with make install-local-chart
: This test verifies the fix for issue #453 where pods weren't restarting
: with new images after running make install-local-chart
:

echo "=== Testing Controller Manager Image Update Fix ==="

# Verify kubeflex is installed and running
echo "1. Verifying kubeflex installation..."
if ! kubectl get namespace kubeflex-system >/dev/null 2>&1; then
echo "ERROR: kubeflex-system namespace not found. Please install kubeflex first."
exit 1
fi

# Get the current image before making changes
echo "2. Getting current controller manager image..."
CURRENT_IMAGE=$(kubectl get deployment kubeflex-controller-manager -n kubeflex-system -o jsonpath='{.spec.template.spec.containers[?(@.name=="manager")].image}')
if [ -z "$CURRENT_IMAGE" ]; then
echo "ERROR: Could not get current image from deployment"
exit 1
fi
echo "Current image: $CURRENT_IMAGE"

# Get current pod names to verify they change
echo "3. Getting current pod names..."
CURRENT_PODS=$(kubectl get pods -n kubeflex-system -l control-plane=controller-manager -o jsonpath='{.items[*].metadata.name}' | tr ' ' '\n' | sort | tr '\n' ' ')
echo "Current pods (sorted): $CURRENT_PODS"

# Set a unique image tag to force a change
echo "4. Setting unique image tag for testing..."
export IMAGE_TAG="e2e-test-$(date +%s)"
echo "Using image tag: $IMAGE_TAG"

# Run the make install-local-chart command
echo "5. Running make install-local-chart..."
make install-local-chart

# Wait for the deployment to update with proper timeout and status checking
echo "6. Waiting for deployment to update..."
if ! kubectl rollout status deployment/kubeflex-controller-manager -n kubeflex-system --timeout=180s; then
echo "ERROR: Deployment rollout failed or timed out"
echo "Deployment status:"
kubectl describe deployment kubeflex-controller-manager -n kubeflex-system
echo "Pod status:"
kubectl get pods -n kubeflex-system -l control-plane=controller-manager
exit 1
fi

# Get the new image
echo "7. Getting new controller manager image..."
NEW_IMAGE=$(kubectl get deployment kubeflex-controller-manager -n kubeflex-system -o jsonpath='{.spec.template.spec.containers[?(@.name=="manager")].image}')
if [ -z "$NEW_IMAGE" ]; then
echo "ERROR: Could not get new image from deployment"
exit 1
fi
echo "New image: $NEW_IMAGE"

# Get new pod names
echo "8. Getting new pod names..."
NEW_PODS=$(kubectl get pods -n kubeflex-system -l control-plane=controller-manager -o jsonpath='{.items[*].metadata.name}' | tr ' ' '\n' | sort | tr '\n' ' ')
echo "New pods (sorted): $NEW_PODS"

# Wait for deployment rollout to complete and all pods to be ready
echo "9. Waiting for deployment rollout to complete..."
if ! kubectl rollout status deployment/kubeflex-controller-manager -n kubeflex-system --timeout=300s; then
echo "ERROR: Deployment rollout failed or timed out"
echo "Deployment status:"
kubectl describe deployment kubeflex-controller-manager -n kubeflex-system
exit 1
fi

# Wait for all pods to be ready
echo "10. Waiting for all pods to be ready..."
if ! kubectl wait --for=condition=Ready pods -l control-plane=controller-manager -n kubeflex-system --timeout=120s; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested commit 81bb206 --- the one created by merging this PR into main --- hacked to run this test as part of the E2E suite. This test failed here because it waited on a Pod from the old Deployment.

10. Waiting for all pods to be ready...
+ kubectl wait --for=condition=Ready pods -l control-plane=controller-manager -n kubeflex-system --timeout=120s
pod/kubeflex-controller-manager-57d5b6b7cb-x7mpp condition met
error: timed out waiting for the condition on pods/kubeflex-controller-manager-656c9ddc78-2xmc5

I have attached the full log.

fail1215a.log

echo "ERROR: Not all pods are ready within timeout"
echo "Pod status:"
kubectl get pods -n kubeflex-system -l control-plane=controller-manager
echo "Pod events:"
kubectl describe pods -n kubeflex-system -l control-plane=controller-manager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not show all the events about Pods. In particular, it does not show events about Pods that no longer exist when this command starts to run.

exit 1
fi

# Verify the fix worked
echo "11. Verifying the fix..."

# Check if image changed
if [ "$CURRENT_IMAGE" = "$NEW_IMAGE" ]; then
echo " FAILURE: Image did not change"
echo " Current: $CURRENT_IMAGE"
echo " New: $NEW_IMAGE"
exit 1
fi

# Check if pods changed (new pods created)
if [ "$CURRENT_PODS" = "$NEW_PODS" ]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing pod names as space-separated strings might not reliably detect pod changes if pod ordering changes or there are extra pods. Please do comparison with sorted lists or a more robust set comparison.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure @pdettori will use a set approach for this !! Thx for pointing out !

echo " FAILURE: Pods did not change (no new pods created)"
echo " Current pods (sorted): $CURRENT_PODS"
echo " New pods (sorted): $NEW_PODS"
exit 1
fi

# Check if new image contains our test tag
if [[ "$NEW_IMAGE" != *"$IMAGE_TAG"* ]]; then
echo " FAILURE: New image does not contain expected tag"
echo " Expected tag: $IMAGE_TAG"
echo " New image: $NEW_IMAGE"
exit 1
fi

# Check if all pods are running
POD_STATUS=$(kubectl get pods -n kubeflex-system -l control-plane=controller-manager -o jsonpath='{.items[*].status.phase}')
if [[ "$POD_STATUS" != *"Running"* ]]; then
echo " FAILURE: Pods are not in Running state"
echo " Pod status: $POD_STATUS"
exit 1
fi

# Verify deployment is available
echo "12. Verifying deployment is available..."
if ! kubectl wait --for=condition=Available deployment/kubeflex-controller-manager -n kubeflex-system --timeout=60s; then
echo " FAILURE: Deployment is not available"
kubectl describe deployment kubeflex-controller-manager -n kubeflex-system
exit 1
fi

echo " SUCCESS: Controller manager image update test passed!"
echo " Image changed from '$CURRENT_IMAGE' to '$NEW_IMAGE'"
echo " New pods created: $NEW_PODS"
echo " All pods are running"
echo " Deployment is available"

:
: -------------------------------------------------------------------------
: SUCCESS: Controller manager image updates properly with make install-local-chart
: