
[BUG] modifying maxSurge doesn't work under some settings #1666

Open
myname4423 opened this issue Jul 19, 2024 · 1 comment · May be fixed by #1929
Labels: kind/bug Something isn't working
myname4423 commented Jul 19, 2024

What happened:
Test 1:
step 1: create a CloneSet with the following settings (replicas=4, paused=false, maxSurge=100%):
(screenshot omitted)
step 2: update the cloneset.spec.template to release a new version [As expected]:
get the cloneset:
(screenshot omitted)
the log:
I0719 09:37:04.282526 1 cloneset_sync_utils.go:124] Calculate diffs for CloneSet bg/echoserver, replicas=4, partition=0, maxSurge=4, maxUnavailable=0, allPods=4, newRevisionPods=0, newRevisionActivePods=0, oldRevisionPods=4, oldRevisionActivePods=4, unavailableNewRevisionCount=0, unavailableOldRevisionCount=4, preDeletingNewRevisionCount=0, preDeletingOldRevisionCount=0, toDeleteNewRevisionCount=0, toDeleteOldRevisionCount=0, enabledPreparingUpdateAsUpdate=false, useDefaultIsPodUpdate=false. Result: {scaleUpNum:4 scaleUpNumOldRevision:0 scaleDownNum:0 scaleDownNumOldRevision:0 scaleUpLimit:4 deleteReadyLimit:0 useSurge:4 useSurgeOldRevision:0 updateNum:4 updateMaxUnavailable:0}

Test 2:
step 1: create a CloneSet with the following settings (replicas=4, paused=false; same as test 1, but with maxSurge = 50%):
(screenshot omitted)
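For context on the numbers in the logs below: a percentage maxSurge is resolved against replicas with round-up before the diff calculation runs. This is a minimal sketch of that resolution (the helper name is hypothetical; the real controller uses the Kubernetes intstr utilities):

```go
package main

import (
	"fmt"
	"math"
)

// scaledMaxSurge resolves a percentage maxSurge against the replica
// count, rounding up -- the same convention Kubernetes uses for
// percentage surge values. Hypothetical helper, not Kruise code.
func scaledMaxSurge(percent float64, replicas int) int {
	return int(math.Ceil(percent / 100 * float64(replicas)))
}

func main() {
	fmt.Println(scaledMaxSurge(50, 4))  // 2 -- matches maxSurge=2 in the test 2 log
	fmt.Println(scaledMaxSurge(100, 4)) // 4 -- matches maxSurge=4 in the test 1 log
}
```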

step 2: update the cloneset.spec.template to release a new version [As expected]:
get the cloneset:
(screenshot omitted)
the log:
I0719 09:28:51.366992 1 cloneset_sync_utils.go:124] Calculate diffs for CloneSet bg/echoserver, replicas=4, partition=0, maxSurge=2, maxUnavailable=0, allPods=4, newRevisionPods=0, newRevisionActivePods=0, oldRevisionPods=4, oldRevisionActivePods=4, unavailableNewRevisionCount=0, unavailableOldRevisionCount=4, preDeletingNewRevisionCount=0, preDeletingOldRevisionCount=0, toDeleteNewRevisionCount=0, toDeleteOldRevisionCount=0, enabledPreparingUpdateAsUpdate=false, useDefaultIsPodUpdate=false. Result: {scaleUpNum:2 scaleUpNumOldRevision:0 scaleDownNum:0 scaleDownNumOldRevision:0 scaleUpLimit:2 deleteReadyLimit:0 useSurge:2 useSurgeOldRevision:0 updateNum:4 updateMaxUnavailable:0}

So far, so good.

step 3: update the maxSurge from 50% to 100%
get the cloneset:
(screenshot omitted)
complete yaml:

apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps.kruise.io/v1alpha1","kind":"CloneSet","metadata":{"annotations":{},"labels":{"app":"echoserver"},"name":"echoserver","namespace":"bg"},"spec":{"minReadySeconds":99999,"replicas":4,"selector":{"matchLabels":{"app":"echoserver"}},"template":{"metadata":{"labels":{"app":"echoserver","version":"base"}},"spec":{"containers":[{"env":[{"name":"version","value":"base"},{"name":"PORT","value":"8080"},{"name":"POD_NAME","valueFrom":{"fieldRef":{"fieldPath":"metadata.name"}}},{"name":"POD_NAMESPACE","valueFrom":{"fieldRef":{"fieldPath":"metadata.namespace"}}},{"name":"POD_IP","valueFrom":{"fieldRef":{"fieldPath":"status.podIP"}}},{"name":"NODE_NAME","value":"version2"}],"image":"cilium/echoserver:latest","imagePullPolicy":"IfNotPresent","name":"echoserver","ports":[{"containerPort":8080,"name":"http"}]}]}},"updateStrategy":{"maxSurge":"100%","maxUnavailable":0,"partition":0,"paused":false,"type":"ReCreate"}}}
  creationTimestamp: "2024-07-19T09:28:24Z"
  generation: 3
  labels:
    app: echoserver
  name: echoserver
  namespace: bg
  resourceVersion: "14120354"
  uid: e9b9f64c-eb05-48a8-89be-4db735f2a011
spec:
  minReadySeconds: 99999
  replicas: 4
  revisionHistoryLimit: 10
  scaleStrategy: {}
  selector:
    matchLabels:
      app: echoserver
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: echoserver
        version: base
    spec:
      containers:
      - env:
        - name: version
          value: base
        - name: PORT
          value: "8080"
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: NODE_NAME
          value: version2
        image: cilium/echoserver:latest
        imagePullPolicy: IfNotPresent
        name: echoserver
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
  updateStrategy:
    maxSurge: 100%
    maxUnavailable: 0
    partition: 0
    type: ReCreate
status:
  availableReplicas: 0
  collisionCount: 0
  currentRevision: echoserver-54bf67658f
  expectedUpdatedReplicas: 4
  labelSelector: app=echoserver
  observedGeneration: 3
  readyReplicas: 6
  replicas: 6
  updateRevision: echoserver-5464fddcfb
  updatedReadyReplicas: 2
  updatedReplicas: 2
the log:
I0719 09:32:09.518530 1 cloneset_sync_utils.go:124] Calculate diffs for CloneSet bg/echoserver, replicas=4, partition=0, maxSurge=4, maxUnavailable=0, allPods=6, newRevisionPods=2, newRevisionActivePods=2, oldRevisionPods=4, oldRevisionActivePods=4, unavailableNewRevisionCount=2, unavailableOldRevisionCount=4, preDeletingNewRevisionCount=0, preDeletingOldRevisionCount=0, toDeleteNewRevisionCount=0, toDeleteOldRevisionCount=0, enabledPreparingUpdateAsUpdate=false, useDefaultIsPodUpdate=false. Result: {scaleUpNum:0 scaleUpNumOldRevision:0 scaleDownNum:0 scaleDownNumOldRevision:0 scaleUpLimit:0 deleteReadyLimit:0 useSurge:2 useSurgeOldRevision:0 updateNum:2 updateMaxUnavailable:2}

We can see that useSurge is still 2, instead of 4 as in test 1. The CloneSet still has only 50% of its pods on the new version, rather than 100% as expected.

What you expected to happen:
Test 2 should produce a result consistent with test 1, i.e. when maxSurge changes from 50% to 100%, the controller is expected to surge up to 100% of replicas with new-version pods, instead of doing nothing as it does now.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:
The "bug" is probably caused by this logic in the function calculateDiffsWithExpectation:

if scaleSurge >= updateSurge {

useSurge is calculated as min(maxSurge, max(scaleSurge, updateSurge)), where maxSurge is 4 and scaleSurge is 0, while updateSurge is:

  • in test 1, min(|4|, |-4|) = 4, so the final useSurge is 4
  • in test 2, min(|4|, |-2|) = 2, so the final useSurge is 2, which causes this bug
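The arithmetic above can be reproduced with a minimal sketch (this is not the actual Kruise code; the parameter names for the two operands of the inner min are my guesses):

```go
package main

import "fmt"

func abs(x int) int {
	if x < 0 {
		return -x
	}
	return x
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// useSurge mirrors min(maxSurge, max(scaleSurge, updateSurge)),
// with updateSurge = min(|totalDiff|, |currentDiff|) as described above.
func useSurge(maxSurge, scaleSurge, totalDiff, currentDiff int) int {
	updateSurge := minInt(abs(totalDiff), abs(currentDiff))
	return minInt(maxSurge, maxInt(scaleSurge, updateSurge))
}

func main() {
	// Test 1: fresh rollout with maxSurge=4, no new-revision pods yet.
	fmt.Println(useSurge(4, 0, 4, -4)) // 4

	// Test 2 after raising maxSurge to 100%: two new-revision pods already
	// exist, so updateSurge is capped at 2 and never climbs back to 4.
	fmt.Println(useSurge(4, 0, 4, -2)) // 2
}
```

This makes the failure mode visible: once some new-revision pods exist, the second operand shrinks, so raising maxSurge mid-rollout has no effect on useSurge.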

Environment:

  • Kruise version:
  • Kubernetes version (use kubectl version):
  • Install details (e.g. helm install args):
  • Others:
@myname4423 myname4423 added the kind/bug Something isn't working label Jul 19, 2024
@myname4423 myname4423 changed the title [BUG] YOUR_TITLE [BUG] modifying maxSurge doesn't work under some settings Jul 19, 2024
@myname4423 myname4423 reopened this Jul 19, 2024

stale bot commented Nov 16, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Nov 16, 2024
@furykerry furykerry removed the wontfix This will not be worked on label Nov 18, 2024
@zmberg zmberg added this to the 1.9 milestone Dec 25, 2024
@AiRanthem AiRanthem linked a pull request Feb 18, 2025 that will close this issue