-
Notifications
You must be signed in to change notification settings - Fork 53
🐛 fix: resolve controller manager image update issue #519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nothing invokes this test. bash-5.3$ find * .github/workflows/* -type f -exec grep test-controller-image-update \{\} \; -print -exec echo \;
bash-5.3$
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This file is not marked "executable". bash-5.3$ ls -l test/e2e/test-c*
-rw-r--r-- 1 mspreitz staff 6175 Oct 21 14:24 test/e2e/test-controller-image-update.sh
-rwxr-xr-x 1 mspreitz staff 2124 Dec 14 21:37 test/e2e/test-custom-cluster-name.sh
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Testing this file in the current edition of Oddly, that
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Aha! The trick is that the |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,158 @@ | ||
| #!/usr/bin/env bash | ||
| # Copyright 2024 The KubeStellar Authors. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| set -x # echo so that users can understand what is happening | ||
| set -e # exit on error | ||
|
|
||
| SRC_DIR="$(cd $(dirname "${BASH_SOURCE[0]}") && pwd)" | ||
| source "${SRC_DIR}/setup-shell.sh" | ||
|
|
||
| : | ||
| : ------------------------------------------------------------------------- | ||
| : Test that controller manager image updates properly with make install-local-chart | ||
| : This test verifies the fix for issue #453 where pods weren't restarting | ||
| : with new images after running make install-local-chart | ||
| : | ||
|
|
||
| echo "=== Testing Controller Manager Image Update Fix ===" | ||
|
|
||
| # Verify kubeflex is installed and running | ||
| echo "1. Verifying kubeflex installation..." | ||
| if ! kubectl get namespace kubeflex-system >/dev/null 2>&1; then | ||
| echo "ERROR: kubeflex-system namespace not found. Please install kubeflex first." | ||
| exit 1 | ||
| fi | ||
|
|
||
| # Get the current image before making changes | ||
| echo "2. Getting current controller manager image..." | ||
| CURRENT_IMAGE=$(kubectl get deployment kubeflex-controller-manager -n kubeflex-system -o jsonpath='{.spec.template.spec.containers[?(@.name=="manager")].image}') | ||
| if [ -z "$CURRENT_IMAGE" ]; then | ||
| echo "ERROR: Could not get current image from deployment" | ||
| exit 1 | ||
| fi | ||
| echo "Current image: $CURRENT_IMAGE" | ||
|
|
||
| # Get current pod names to verify they change | ||
| echo "3. Getting current pod names..." | ||
| CURRENT_PODS=$(kubectl get pods -n kubeflex-system -l control-plane=controller-manager -o jsonpath='{.items[*].metadata.name}' | tr ' ' '\n' | sort | tr '\n' ' ') | ||
| echo "Current pods (sorted): $CURRENT_PODS" | ||
|
|
||
| # Set a unique image tag to force a change | ||
| echo "4. Setting unique image tag for testing..." | ||
| export IMAGE_TAG="e2e-test-$(date +%s)" | ||
| echo "Using image tag: $IMAGE_TAG" | ||
|
|
||
| # Run the make install-local-chart command | ||
| echo "5. Running make install-local-chart..." | ||
| make install-local-chart | ||
|
|
||
| # Wait for the deployment to update with proper timeout and status checking | ||
| echo "6. Waiting for deployment to update..." | ||
| if ! kubectl rollout status deployment/kubeflex-controller-manager -n kubeflex-system --timeout=180s; then | ||
| echo "ERROR: Deployment rollout failed or timed out" | ||
| echo "Deployment status:" | ||
| kubectl describe deployment kubeflex-controller-manager -n kubeflex-system | ||
| echo "Pod status:" | ||
| kubectl get pods -n kubeflex-system -l control-plane=controller-manager | ||
| exit 1 | ||
| fi | ||
|
|
||
| # Get the new image | ||
| echo "7. Getting new controller manager image..." | ||
| NEW_IMAGE=$(kubectl get deployment kubeflex-controller-manager -n kubeflex-system -o jsonpath='{.spec.template.spec.containers[?(@.name=="manager")].image}') | ||
| if [ -z "$NEW_IMAGE" ]; then | ||
| echo "ERROR: Could not get new image from deployment" | ||
| exit 1 | ||
| fi | ||
| echo "New image: $NEW_IMAGE" | ||
|
|
||
| # Get new pod names | ||
| echo "8. Getting new pod names..." | ||
| NEW_PODS=$(kubectl get pods -n kubeflex-system -l control-plane=controller-manager -o jsonpath='{.items[*].metadata.name}' | tr ' ' '\n' | sort | tr '\n' ' ') | ||
| echo "New pods (sorted): $NEW_PODS" | ||
|
|
||
| # Wait for deployment rollout to complete and all pods to be ready | ||
| echo "9. Waiting for deployment rollout to complete..." | ||
| if ! kubectl rollout status deployment/kubeflex-controller-manager -n kubeflex-system --timeout=300s; then | ||
| echo "ERROR: Deployment rollout failed or timed out" | ||
| echo "Deployment status:" | ||
| kubectl describe deployment kubeflex-controller-manager -n kubeflex-system | ||
| exit 1 | ||
| fi | ||
|
|
||
| # Wait for all pods to be ready | ||
| echo "10. Waiting for all pods to be ready..." | ||
| if ! kubectl wait --for=condition=Ready pods -l control-plane=controller-manager -n kubeflex-system --timeout=120s; then | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tested commit 81bb206 --- the one created by merging this PR into I have attached the full log. |
||
| echo "ERROR: Not all pods are ready within timeout" | ||
| echo "Pod status:" | ||
| kubectl get pods -n kubeflex-system -l control-plane=controller-manager | ||
| echo "Pod events:" | ||
| kubectl describe pods -n kubeflex-system -l control-plane=controller-manager | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This does not show all the events about Pods. In particular, it does not show events about Pods that no longer exist when this command starts to run. |
||
| exit 1 | ||
| fi | ||
|
|
||
| # Verify the fix worked | ||
| echo "11. Verifying the fix..." | ||
|
|
||
| # Check if image changed | ||
| if [ "$CURRENT_IMAGE" = "$NEW_IMAGE" ]; then | ||
| echo " FAILURE: Image did not change" | ||
| echo " Current: $CURRENT_IMAGE" | ||
| echo " New: $NEW_IMAGE" | ||
| exit 1 | ||
| fi | ||
|
|
||
| # Check if pods changed (new pods created) | ||
| if [ "$CURRENT_PODS" = "$NEW_PODS" ]; then | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Comparing pod names as space-separated strings might not reliably detect pod changes if pod ordering changes or there are extra pods. Please do comparison with sorted lists or a more robust set comparison.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure @pdettori will use a set approach for this !! Thx for pointing out ! |
||
| echo " FAILURE: Pods did not change (no new pods created)" | ||
| echo " Current pods (sorted): $CURRENT_PODS" | ||
| echo " New pods (sorted): $NEW_PODS" | ||
| exit 1 | ||
| fi | ||
|
|
||
| # Check if new image contains our test tag | ||
| if [[ "$NEW_IMAGE" != *"$IMAGE_TAG"* ]]; then | ||
| echo " FAILURE: New image does not contain expected tag" | ||
| echo " Expected tag: $IMAGE_TAG" | ||
| echo " New image: $NEW_IMAGE" | ||
| exit 1 | ||
| fi | ||
|
|
||
| # Check if all pods are running | ||
| POD_STATUS=$(kubectl get pods -n kubeflex-system -l control-plane=controller-manager -o jsonpath='{.items[*].status.phase}') | ||
| if [[ "$POD_STATUS" != *"Running"* ]]; then | ||
| echo " FAILURE: Pods are not in Running state" | ||
| echo " Pod status: $POD_STATUS" | ||
| exit 1 | ||
| fi | ||
|
|
||
| # Verify deployment is available | ||
| echo "12. Verifying deployment is available..." | ||
| if ! kubectl wait --for=condition=Available deployment/kubeflex-controller-manager -n kubeflex-system --timeout=60s; then | ||
| echo " FAILURE: Deployment is not available" | ||
| kubectl describe deployment kubeflex-controller-manager -n kubeflex-system | ||
| exit 1 | ||
| fi | ||
|
|
||
| echo " SUCCESS: Controller manager image update test passed!" | ||
| echo " Image changed from '$CURRENT_IMAGE' to '$NEW_IMAGE'" | ||
| echo " New pods created: $NEW_PODS" | ||
| echo " All pods are running" | ||
| echo " Deployment is available" | ||
|
|
||
| : | ||
| : ------------------------------------------------------------------------- | ||
| : SUCCESS: Controller manager image updates properly with make install-local-chart | ||
| : | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This
sedcommand is not the source of the problem reported in #453 .