-
Notifications
You must be signed in to change notification settings - Fork 52
🐛 fix: resolve controller manager image update issue #519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 fix: resolve controller manager image update issue #519
Conversation
|
Hi @RayyanSeliya. Thanks for your PR. I'm waiting for a kubestellar member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test |
| fi | ||
|
|
||
| # Check if pods changed (new pods created) | ||
| if [ "$CURRENT_PODS" = "$NEW_PODS" ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparing pod names as space-separated strings might not reliably detect pod changes if pod ordering changes or there are extra pods. Please do comparison with sorted lists or a more robust set comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure @pdettori will use a set approach for this !! Thx for pointing out !
Signed-off-by: RayyanSeliya <[email protected]>
Signed-off-by: RayyanSeliya <[email protected]>
03b1c25 to
fd557af
Compare
|
/lgtm |
|
LGTM label has been added. DetailsGit tree hash: 3528c0c0dae40b77f1776ff3773a4219f1002efc |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: pdettori The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
This PR introduced bug #588 |
|
|
||
| .PHONY: chart | ||
| chart: manifests kustomize | ||
| cd config/manager && $(KUSTOMIZE) edit set image controller=$(shell echo ${IMG} | sed 's/\(:.*\)v/\1/') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sed command is not the source of the problem reported in #453 .
bash-5.3$ echo foo:1234 | sed 's/\(:.*\)v/\1/'
foo:1234There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing invokes this test.
bash-5.3$ find * .github/workflows/* -type f -exec grep test-controller-image-update \{\} \; -print -exec echo \;
bash-5.3$ There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is not marked "executable".
bash-5.3$ ls -l test/e2e/test-c*
-rw-r--r-- 1 mspreitz staff 6175 Oct 21 14:24 test/e2e/test-controller-image-update.sh
-rwxr-xr-x 1 mspreitz staff 2124 Dec 14 21:37 test/e2e/test-custom-cluster-name.shThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing this file in the current edition of main (commit a6acd41) fails. I have attached a log.
Oddly, that helm get manifest output includes the following:
---
# Source: kubeflex-operator/templates/operator.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app.kubernetes.io/component: manager
app.kubernetes.io/created-by: kubeflex
app.kubernetes.io/instance: controller-manager
app.kubernetes.io/managed-by: kustomize
app.kubernetes.io/name: deployment
app.kubernetes.io/part-of: kubeflex
control-plane: controller-manager
name: kubeflex-controller-manager
namespace: kubeflex-system
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha! The trick is that the helm get manifest output does NOT include the kubeflex-system Namespace.
| echo "Pod status:" | ||
| kubectl get pods -n kubeflex-system -l control-plane=controller-manager | ||
| echo "Pod events:" | ||
| kubectl describe pods -n kubeflex-system -l control-plane=controller-manager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not show all the events about Pods. In particular, it does not show events about Pods that no longer exist when this command starts to run.
|
|
||
| # Wait for all pods to be ready | ||
| echo "10. Waiting for all pods to be ready..." | ||
| if ! kubectl wait --for=condition=Ready pods -l control-plane=controller-manager -n kubeflex-system --timeout=120s; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested commit 81bb206 --- the one created by merging this PR into main --- hacked to run this test as part of the E2E suite. This test failed here because it waited on a Pod from the old Deployment.
10. Waiting for all pods to be ready...
+ kubectl wait --for=condition=Ready pods -l control-plane=controller-manager -n kubeflex-system --timeout=120s
pod/kubeflex-controller-manager-57d5b6b7cb-x7mpp condition met
error: timed out waiting for the condition on pods/kubeflex-controller-manager-656c9ddc78-2xmc5
I have attached the full log.
Summary
This PR fixes issue #453 where the
kubeflex-controller-managerpod was not updating with new images after runningmake install-local-chart. The root cause was a problematicsedcommand in the Makefile'scharttarget that caused image references to remain as<placeholder>instead of the actual image name.Related issue(s)
Fixes #453
Problem
After running
make install-local-chart, the kubeflex-controller-manager pod was not restarting with the new image, continuing to run the old version. The deployment object was not being updated with the new image reference, causing the pod to continue running the old version.Evidence from the original issue:
ko.local/manager:f72a332kubeflex-controller-manager-699d869487-p6wsxRoot Cause
The issue was in the Makefile's
charttarget:The
sedcommand was designed to remove 'v' from version tags, butIMAGE_TAGuses git commit hashes (likef72a332), not version tags with 'v'. This caused the image reference to remain as<placeholder>in the generated chart.Solution
1. Fixed Makefile chart target
# FIXED - Simplified to use the full image reference cd config/manager && $(KUSTOMIZE) edit set image controller=${IMG}2. Enhanced Deployment configuration
RollingUpdatestrategy for better update control3. Added comprehensive E2E test
test/e2e/test-controller-image-update.shto verify the fixTesting
The fix has been thoroughly tested with the following results:
✅ Image Update: Image reference updates correctly with new tags
✅ Pod Recreation: New pods are created during deployment
✅ Deployment Stability: All pods are running and ready
✅ E2E Test Passes: Comprehensive test verifies the complete workflow
Test Output: