Other Pod(s) Status Clean up & Pod(s) stuck in Terminating state #89
Conversation
```diff
 }

-func shouldDeletePod(pod *corev1.Pod, orphaned, pending, evicted, successful, failed time.Duration) bool {
+func shouldDeleteTerminatingPod(pod *corev1.Pod, orphaned, pending, evicted, terminating, successful, failed time.Duration) bool {
```
why do you need to pass the orphaned, pending, evicted, successful, and failed durations if you only use terminating?
```diff
 if !podFinishTime.IsZero() {
 	age := time.Since(podFinishTime)
 	if terminating > 0 && age >= terminating {
+		log.Println("Pod(s) Which Are In Terminating State")
```
use log.Printf instead of 3 log lines
@lwolf I added these logs for troubleshooting and thought I had already removed those lines; I will update this.
```diff
 if pod.Status.Phase == corev1.PodFailed && pod.Status.Reason == "Evicted" && evicted > 0 {
 	return true
 }
+if pod.Status.Phase == corev1.PodFailed && pod.Status.Reason == "OutOfpods" && evicted > 0 {
```
please add a comment about the OutOfpods and OutOfcpu reasons, or a link to the docs describing their behavior.
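For context on the requested comment: when the kubelet rejects a pod at node admission because the node has run out of a resource, the pod is left in the `Failed` phase with `Status.Reason` of the form `OutOf<resource>` (e.g. `OutOfpods`, `OutOfcpu`). The helper below is hypothetical, written only to illustrate what such a documented check could look like:

```go
package main

import (
	"fmt"
	"strings"
)

// isNodeAdmissionFailure is a hypothetical helper illustrating the comment
// the reviewer asks for: the kubelet rejects a pod during admission when the
// node lacks the corresponding resource, leaving the pod in the Failed phase
// with Status.Reason of the form "OutOf<resource>", e.g. "OutOfpods" or
// "OutOfcpu". Such pods are never rescheduled in place and can be cleaned up.
func isNodeAdmissionFailure(reason string) bool {
	return strings.HasPrefix(reason, "OutOf")
}

func main() {
	for _, r := range []string{"OutOfpods", "OutOfcpu", "Evicted"} {
		fmt.Printf("%s: %v\n", r, isNodeAdmissionFailure(r))
	}
}
```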
```diff
 for name, tc := range testCases {
 	t.Run(name, func(t *testing.T) {
-		result := shouldDeletePod(tc.podSpec, tc.orphaned, tc.pending, tc.evicted, tc.successful, tc.failed)
+		result := shouldDeletePod(tc.podSpec, tc.orphaned, tc.pending, tc.evicted, tc.terminated, tc.successful, tc.failed)
```
please add a test scenario for every case that you're adding
@lwolf yes, I will add test cases for each scenario and push the changes one by one in this PR.
```diff
-func shouldDeletePod(pod *corev1.Pod, orphaned, pending, evicted, successful, failed time.Duration) bool {
+func shouldDeleteTerminatingPod(pod *corev1.Pod, orphaned, pending, evicted, terminating, successful, failed time.Duration) bool {
 	// terminating pods which got hanged, those with or without owner references, but in Evicted state
+	// - uses c.deleteEvictedAfter, this one is tricky, because there is no timestamp of eviction.
```
I assume this comment is just a copy-paste from shouldDeletePod, because it doesn't make sense here.
```diff
+// In case a Pod is stuck in the Terminating state, delete it with force.
+// Not a good way to find the root cause of why the pod is stuck in Terminating.
+func (c *Kleaner) DeletePodWithForce(pod *corev1.Pod) {
```
as we discussed in the issue, I'd prefer not to have force-deletion in the codebase
@lwolf yes, but we could keep it behind a flag. It would be useful when a Calico issue blocks workloads on a node: pods stuck in the Terminating state could be cleared until the real issue is fixed.
no, the issue should be solved by node draining, not force-deletion.
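The drain-based alternative the maintainer suggests would look roughly like the commands below (node and pod names are placeholders; shown only to contrast the two approaches, not as part of this PR):

```sh
# Preferred: cordon and drain the unhealthy node so the scheduler replaces
# its workloads elsewhere; terminating pods are handled by the drain.
kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# Last resort only: force-delete a single stuck pod. This removes the API
# object immediately without confirming the container is gone on the node.
kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force
```

Draining keeps the kubelet and container runtime in agreement about what is running, which is why it is preferred over force-deleting individual pods.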
Added the below changes.
Issue: #88