modify: support rollback #513
Hi @huww98. Thanks for your PR. I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test.
Implementation of kubernetes/enhancements#5482
Force-pushed from 56ff79c to 19cef74.
/ok-to-test
```go
// Check if we should change our target
_, inUncertainState := ctrl.uncertainPVCs.Load(pvcKey)
if (status != nil && status.Status == v1.PersistentVolumeClaimModifyVolumeInProgress && inUncertainState) || pvcSpecVacName == "" {
	// ...
}
```
Shouldn't `status.Status == v1.PersistentVolumeClaimModifyVolumeInProgress` alone be a sufficient condition for not modifying the volume any further?
That condition exists to ensure that, for modifications failing with transient errors, we keep trying to reconcile to whatever value was recorded in `target` and do not start over with the new value of `vac`.
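A minimal sketch of the target-selection rule under discussion, written as a standalone Go function rather than the controller's real code. `ModifyStatus`, `chooseTarget`, and the `"gold"`/`"silver"` VAC names are hypothetical; the real controller works on `v1.PersistentVolumeClaim` and tracks uncertainty in `ctrl.uncertainPVCs`. It shows why the extra `inUncertainState` check matters: an in-progress modification that ended in a final error may switch to the new spec value, while one in an uncertain state keeps retrying the recorded target.

```go
package main

import "fmt"

// ModifyStatus is a hypothetical, simplified stand-in for
// pvc.Status.ModifyVolumeStatus.
type ModifyStatus struct {
	Status string // e.g. "InProgress" or "Infeasible"
	Target string // VAC name recorded by the last modification attempt
}

const statusInProgress = "InProgress"

// chooseTarget returns the VAC name to reconcile towards ("" means nothing to do).
// It keeps the recorded target when a modification is in progress and its
// outcome is uncertain (after a transient error the driver may have partially
// applied it), or when the spec no longer names a VAC.
func chooseTarget(specVAC string, status *ModifyStatus, uncertain bool) string {
	if status != nil && ((status.Status == statusInProgress && uncertain) || specVAC == "") {
		return status.Target
	}
	return specVAC
}

func main() {
	st := &ModifyStatus{Status: statusInProgress, Target: "gold"}
	fmt.Println(chooseTarget("silver", st, true))  // gold: uncertain, keep retrying the recorded target
	fmt.Println(chooseTarget("silver", st, false)) // silver: final error, follow the new spec
	fmt.Println(chooseTarget("", st, false))       // gold: spec cleared, keep the recorded target
}
```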
```go
	t.Fatalf("expected error to be %v, got %v", finalErr, err)
}
// should clear uncertain state
assertUncertain(false)
```
Can we also add cases to verify what the conditions should be in each step?
Now the `ModifyVolumeError` condition overwrites `ModifyingVolume`, so we always get only `ModifyVolumeError` after a failed modification. I don't think it is very interesting to check it here.
I added stronger assertions to `TestMarkControllerModifyVolumeStatus`.
@huww98, let me know when this PR is ready for another round of review. Thanks.
Validated that the change introduced in this PR does not break existing modify-volume workflows by building a custom resizer image that includes this change and deploying the latest version of the EBS CSI Driver. I ran the external storage test suite and manually tested a few edge cases.
LGTM, apart from the outstanding feedback from @gnufied and @sunnylovestiramisu that still needs to be addressed.
We should keep retrying the previously specified target.
Force-pushed from ef281b6 to eff5e89.
Support rollback to VAC A if modifying from A to B failed with a final error. This works just like modifying it again to C after a final error.

The significant changes in the sync logic:
- Always retry if `pvc.Status.ModifyVolumeStatus` is not nil, which means the last transaction did not finish successfully.
- Keep reconciling to the previous target if the spec is rolled back to nil, until it succeeds or we get an infeasible error. Then we leave the volume in its current state and stop reconciling it, since the user may no longer care about it.
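The sync rules in this commit message can be sketched as a single decision function. This is a simplified model under stated assumptions, not the controller's code: `ModifyStatus` and `decide` are hypothetical names, statuses are plain strings mirroring the `InProgress`/`Infeasible` values of `modifyVolumeStatus`, and uncertain-state handling and backoff are left out.

```go
package main

import "fmt"

// ModifyStatus is a hypothetical, simplified mirror of pvc.Status.ModifyVolumeStatus.
type ModifyStatus struct {
	Status string // "InProgress", "Infeasible", ...
	Target string // VAC name the last attempt tried to apply
}

const statusInfeasible = "Infeasible"

// decide returns the VAC name to modify the volume to, or "" when the
// controller should leave the PVC alone.
func decide(specVAC, currentVAC string, status *ModifyStatus) string {
	if specVAC != "" {
		// A non-nil status means the last transaction did not finish
		// successfully, so retry even if the spec equals the current VAC.
		// This is what lets a failed A->B be rolled back to A.
		if status != nil || specVAC != currentVAC {
			return specVAC
		}
		return ""
	}
	// Spec rolled back to nil: keep reconciling the previous target until it
	// succeeds (status cleared) or fails as infeasible, then stop.
	if status == nil || status.Status == statusInfeasible {
		return ""
	}
	return status.Target
}

func main() {
	failed := &ModifyStatus{Status: "InProgress", Target: "B"}
	fmt.Printf("%q\n", decide("A", "A", failed)) // "A": rollback retries A
	fmt.Printf("%q\n", decide("A", "A", nil))    // "": already at A, nothing to do
	fmt.Printf("%q\n", decide("", "A", failed))  // "B": spec cleared, keep the previous target
	fmt.Printf("%q\n", decide("", "A", &ModifyStatus{Status: statusInfeasible, Target: "B"})) // "": give up
}
```

Treating rollback to A as an ordinary modification to the spec value is what makes it behave "just like modifying it again to C".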
`pvcModifier` should return an independent object from the start of the call chain to avoid interference between test cases.
Start modify @ t1:

| Type | Status | LastProbeTime | LastTransitionTime | Reason | Message |
|---|---|---|---|---|---|
| ModifyingVolume | True | t1 | t1 | | Modifying volume to `<VAC name>` is in progress. |

Final error @ t2:
- Use the gRPC code for `Reason`.
- Update `LastTransitionTime` only if `Reason` changes.

| Type | Status | LastProbeTime | LastTransitionTime | Reason | Message |
|---|---|---|---|---|---|
| ModifyingVolume | True | t2 | t1 | | Modifying volume to `<VAC name>` failed. Waiting for retry. |
| ModifyVolumeError | True | t2 | t2 | Internal | Final error. |

Retry @ t3:
- Update the `LastProbeTime` of `ModifyingVolume`.

| Type | Status | LastProbeTime | LastTransitionTime | Reason | Message |
|---|---|---|---|---|---|
| ModifyingVolume | True | t3 | t1 | | Modifying volume to `<VAC name>` is in progress. |
| ModifyVolumeError | True | t2 | t2 | Internal | Final error. |

Non-final error @ t4:

| Type | Status | LastProbeTime | LastTransitionTime | Reason | Message |
|---|---|---|---|---|---|
| ModifyingVolume | True | t4 | t1 | | Modifying volume to `<VAC name>` is still in progress. |
| ModifyVolumeError | True | t4 | t4 | DeadlineExceeded | Progress: 10%. |
@huww98, you didn't address:
@gnufied, I think we have reached agreement on this: `status.modifyStatus` should not be cleared.
Okay, let's move past this issue for now. I was mainly talking about the case of VAC being … But I think I found a bug in the handling of non-existent VACs that lets users abuse quota trivially. Example flow:
Make it harder to abuse VAC quota by changing the spec to a non-existent VAC while the volume is partially modified. This basically reverts 6c8c10f.
It is a bit weird if we leave … I'm wondering whether we can just deprecate the `Pending` status. Maybe we should leave …
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: gnufied, huww98.
/test pull-kubernetes-csi-external-resizer-1-33-on-kubernetes-master
What type of PR is this?
/kind feature
What this PR does / why we need it:
Support rollback to VAC A if modifying from A to B failed with a final error.
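This works just like modifying it again to C after a final error.

The significant changes in the sync logic:
- Always retry if `pvc.Status.ModifyVolumeStatus` is not nil, which means the last transaction did not finish successfully.
- Keep reconciling to the previous target if the spec is rolled back to nil, until it succeeds or we get an infeasible error. Then we leave the volume in its current state and stop reconciling it, since the user may no longer care about it.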
See kubernetes/enhancements#5482 for details
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
The first 2 commits are already included in my other PRs. Please review the last commit. I will rebase when the previous PRs are merged.
Does this PR introduce a user-facing change?: