Other Pod(s) Status Clean up & Pod(s) stuck in Terminating state #89
Conversation
```diff
 }

-func shouldDeletePod(pod *corev1.Pod, orphaned, pending, evicted, successful, failed time.Duration) bool {
+func shouldDeleteTerminatingPod(pod *corev1.Pod, orphaned, pending, evicted, terminating, successful, failed time.Duration) bool {
```
why do you need to pass the orphaned, pending, evicted, successful, and failed durations if you only use terminating?
```diff
 if !podFinishTime.IsZero() {
 	age := time.Since(podFinishTime)
 	if terminating > 0 && age >= terminating {
+		log.Println("Pod(s) Which Are In Terminating State")
```
use log.Printf instead of 3 log lines
@lwolf I added these logs for troubleshooting and thought I had already removed those lines; I will update this.
```diff
 if pod.Status.Phase == corev1.PodFailed && pod.Status.Reason == "Evicted" && evicted > 0 {
 	return true
 }
+if pod.Status.Phase == corev1.PodFailed && pod.Status.Reason == "OutOfpods" && evicted > 0 {
```
please add a comment about the OutOfpods and OutOfcpu reasons, or a link to the docs describing their behavior.
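For context on the requested comment: when the kubelet rejects a pod at node admission because the node has run out of a resource, the pod is left in the `Failed` phase with `Status.Reason` of the form `OutOf<resource>` (e.g. `OutOfpods`, `OutOfcpu`). The helper below is hypothetical, written only to illustrate what such a documented check could look like:

```go
package main

import (
	"fmt"
	"strings"
)

// isNodeAdmissionFailure is a hypothetical helper illustrating the comment
// the reviewer asks for: the kubelet rejects a pod during admission when the
// node lacks the corresponding resource, leaving the pod in the Failed phase
// with Status.Reason of the form "OutOf<resource>", e.g. "OutOfpods" or
// "OutOfcpu". Such pods are never rescheduled in place and can be cleaned up.
func isNodeAdmissionFailure(reason string) bool {
	return strings.HasPrefix(reason, "OutOf")
}

func main() {
	for _, r := range []string{"OutOfpods", "OutOfcpu", "Evicted"} {
		fmt.Printf("%s: %v\n", r, isNodeAdmissionFailure(r))
	}
}
```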
```diff
 for name, tc := range testCases {
 	t.Run(name, func(t *testing.T) {
-		result := shouldDeletePod(tc.podSpec, tc.orphaned, tc.pending, tc.evicted, tc.successful, tc.failed)
+		result := shouldDeletePod(tc.podSpec, tc.orphaned, tc.pending, tc.evicted, tc.terminated, tc.successful, tc.failed)
```
please add a test scenario for every case that you're adding
@lwolf yes, I will add test cases for each scenario and push the changes one by one in this PR.
```diff
-func shouldDeletePod(pod *corev1.Pod, orphaned, pending, evicted, successful, failed time.Duration) bool {
+func shouldDeleteTerminatingPod(pod *corev1.Pod, orphaned, pending, evicted, terminating, successful, failed time.Duration) bool {
 	// terminating pods which got hanged, those with or without owner references, but in Evicted state
+	// - uses c.deleteEvictedAfter, this one is tricky, because there is no timestamp of eviction.
```
I assume this comment is just a copy-paste from shouldDeletePod, because it doesn't make sense here.
```diff
+// In case a Pod is stuck in the Terminating state, delete it with force.
+// Not a good way to find the root cause of why the pod is stuck in Terminating.
+func (c *Kleaner) DeletePodWithForce(pod *corev1.Pod) {
```
as we discussed in the issue, I'd prefer not to have force-deletion in the codebase
@lwolf yes, but we could keep it behind a flag. It would be useful when a Calico issue blocks workloads on a node: pods stuck in the Terminating state could be cleared until the real issue is fixed.
no, the issue should be solved by node draining, not force-deletion.
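The drain-based alternative the maintainer suggests would look roughly like the commands below (node and pod names are placeholders; shown only to contrast the two approaches, not as part of this PR):

```sh
# Preferred: cordon and drain the unhealthy node so the scheduler replaces
# its workloads elsewhere; terminating pods are handled by the drain.
kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# Last resort only: force-delete a single stuck pod. This removes the API
# object immediately without confirming the container is gone on the node.
kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force
```

Draining keeps the kubelet and container runtime in agreement about what is running, which is why it is preferred over force-deleting individual pods.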
Added the below changes.
Issue: #88