Add optional pause prior to completing lifecycle action to allow PVC cleanup #651

Closed
@mjseid

Describe the feature
Add an optional wait after successful pod eviction before completing the ASG lifecycle hook. It could default to 0, but in my case it appears the nodes are getting terminated before all needed drain tasks are complete.
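A minimal sketch of the requested behavior, written in Python for illustration only (it is not this project's code). The function name `complete_after_delay` and its parameters are hypothetical. The `client` argument is assumed to behave like a boto3 Auto Scaling client, whose `complete_lifecycle_action` call takes the keyword arguments shown.

```python
import time


def complete_after_delay(client, *, hook_name, asg_name, instance_id,
                         delay_seconds=0.0, sleep=time.sleep):
    """Pause for delay_seconds, then complete the ASG lifecycle action.

    client is assumed to expose complete_lifecycle_action(**kwargs), as a
    boto3 "autoscaling" client does. sleep is injectable so the delay can
    be tested without actually waiting.
    """
    if delay_seconds < 0:
        raise ValueError("delay_seconds must be >= 0")
    # A default of 0 means no pause, so existing behavior is unchanged.
    if delay_seconds > 0:
        sleep(delay_seconds)
    return client.complete_lifecycle_action(
        LifecycleHookName=hook_name,
        AutoScalingGroupName=asg_name,
        LifecycleActionResult="CONTINUE",
        InstanceId=instance_id,
    )
```

The delay would need to stay below the lifecycle hook's heartbeat timeout, otherwise Auto Scaling applies the hook's default result before the call is made.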

Is the feature request related to a problem?
I came across this project as a potential solution to a common issue with the EBS CSI storage driver. If a node is ungracefully terminated (e.g. by an ASG instance refresh) without being drained, pods with PVCs will not be able to come up on new nodes until the 6-minute force-detach happens in the controller.

I've installed this project, and it does successfully evict the pods with PVCs and wait for a completed lifecycle hook prior to terminating the nodes, but my stateful pods are still getting the PVC multi-attach error and having to wait the 6 minutes. If I manually drain the nodes and then manually delete them, I do not see this issue, so I believe the ASG is just terminating the nodes too quickly, before the controller can fully detach any PVCs.
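If the root cause is that volumes are still attached when the node terminates, a variation on a fixed pause is to poll until nothing is attached, with a timeout as a safety net. The sketch below is illustrative Python only. The name `wait_for_detach` is hypothetical, and `list_attached_volumes` is an assumed caller-supplied function that returns the volumes still attached to a node (for example, by listing Kubernetes VolumeAttachment objects whose `spec.nodeName` matches).

```python
import time


def wait_for_detach(node_name, list_attached_volumes, timeout_seconds=120.0,
                    poll_seconds=5.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until no volumes remain attached to node_name, or time out.

    list_attached_volumes is a hypothetical callable: node name in,
    collection of still-attached volume identifiers out.
    Returns True if everything detached, False if the timeout elapsed.
    """
    deadline = clock() + timeout_seconds
    while True:
        if not list_attached_volumes(node_name):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
```

The lifecycle action would then be completed once this returns, whether or not the detach finished, so a stuck volume cannot block node termination indefinitely.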

Describe alternatives you've considered
I can get pods with PVCs to move appropriately if I set the lifecycle hook heartbeat timeout to 60 seconds with a default action of CONTINUE, and then remove the "autoscaling:CompleteLifecycleAction" privilege from the IAM role for this project. The handler then evicts the pods but can't complete the lifecycle action, and Auto Scaling continues with deleting the node after the 60-second timeout.
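For reference, the hook settings in this workaround could be expressed as parameters for the Auto Scaling `PutLifecycleHook` API. This is an illustrative sketch: the helper name `termination_hook_params` and the example ASG and hook names are hypothetical, and the 30 to 7200 second bounds reflect the documented range for `HeartbeatTimeout`.

```python
def termination_hook_params(asg_name, hook_name, heartbeat_timeout=60):
    """Build keyword arguments for a terminating lifecycle hook that
    falls through to CONTINUE when nobody completes the action in time."""
    if not 30 <= heartbeat_timeout <= 7200:
        raise ValueError("heartbeat_timeout must be between 30 and 7200 seconds")
    return {
        "AutoScalingGroupName": asg_name,
        "LifecycleHookName": hook_name,
        "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
        "HeartbeatTimeout": heartbeat_timeout,
        # Applied by Auto Scaling once the heartbeat timeout expires.
        "DefaultResult": "CONTINUE",
    }


# Example usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   params = termination_hook_params("my-asg", "drain-hook")
#   boto3.client("autoscaling").put_lifecycle_hook(**params)
```

With the IAM permission removed as described, the 60-second timeout effectively becomes the fixed upper bound on how long the node survives after the hook fires.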

It works for my use case, but only because I have a small number of pods per node and they evict quickly. It would be better to just inject this wait time prior to completing the lifecycle action.

Labels

Type: Question
