Unmount fuse on termination #159
This issue has already been discussed extensively in multiple places (e.g., buildbarn/bb-clientd#2, Slack threads). Let me get straight to the point: I am not planning on merging a change like this. The reasons are as follows:
Effort spent writing this fix could also have been spent root-causing the actual source of the lockups in Kubelet. That would also make you a rockstar, as it would fix many other use cases people have for FUSE on Kubernetes. See kubernetes/kubernetes#7890 for more details.
Sidenote: the only reason we do perform unmounts for NFSv4 mounts is that stale NFS mounts are far worse than the ones left behind by FUSE. Because NFS is a network-based protocol, the kernel will end up performing timed retries for a prolonged period of time, causing processes in user space to hang. This differs from FUSE, where any attempt to access a stale FUSE mount causes system calls to fail immediately with ENXIO. I wouldn't be amazed if these lockups in Kubelet are merely caused by some retry loop in Kubelet itself, which fails to account for stale FUSE mounts being present. All that likely needs to be done is to inject an umount() system call in between those retries to guarantee forward progress.
Thanks for your response! Sorry if I touched a nerve. I made the tweak pretty quickly after seeing the issue while testing FUSE in my cluster, and seeing the TODO note. I appreciate the added context here. Would it be more helpful if I updated the TODO to mark this as deliberate?
Sure thing! Go for it!
Giving it some more thought: I think that the TODO in the code is still valid. It is about shutting down the FUSE server goroutine gracefully, which is distinct from unmounting. That still applies, so the TODO should remain intact. Documenting explicitly why we don't unmount seems fine though.
I've submitted kubernetes/kubernetes#129550 after combing through some of the Kubernetes source and (I believe) identifying the offending lines. Making sure I understand the intent of the original TODO, it doesn't look like The
Terminates the FUSE mount on server termination. Without this, pods can potentially fail to get cleaned up by the kubelet due to the remaining FUSE mount.