
Conversation

@alexnovak
Contributor

Terminates the FUSE mount on server termination. Without this, pods can fail to get cleaned up by the kubelet due to the remaining FUSE mount.
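
For context, a minimal sketch of the shape of the change (assuming go-fuse v2's `fuse.Server`; the function name and the wiring around it are illustrative, not the actual bb_worker code):

```go
package sketch

import (
	"context"
	"log"

	"github.com/hanwen/go-fuse/v2/fuse"
)

// unmountOnShutdown detaches the FUSE mount once ctx is cancelled, so that
// the kernel mount does not outlive the worker process. How ctx and server
// are obtained is left out; this is only a sketch of the idea.
func unmountOnShutdown(ctx context.Context, server *fuse.Server) {
	go func() {
		<-ctx.Done()
		// Unmount() asks the kernel to detach the mount, which also causes
		// the Serve() loop to return.
		if err := server.Unmount(); err != nil {
			log.Printf("Failed to unmount FUSE file system: %v", err)
		}
	}()
}
```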

@EdSchouten
Member

This issue has already been discussed extensively in multiple places (e.g., buildbarn/bb-clientd#2, Slack threads). Let me get straight to the point: I am not planning on merging a change like this. The reasons are as follows:

  • In general, leaving the stale FUSE mount behind is in fact desirable, as it effectively causes the mount path to be marked as dead. If an unmount were to take place and the underlying directory happened to be writable, any lingering build actions would unintentionally create new output files in the directory underneath the mount. These files would subsequently become hidden when bb_worker is restarted and a new mount is created on top, which might make a systems administrator wonder where disk space has gone.

  • The change you've provided is incomplete, as there is no absolute guarantee that this code path gets run. If Kubernetes SIGKILLs bb_worker and then destroys the pod, you still end up locking up Kubelet. Coming up with a 'best effort' workaround for something for which a reliable fix can be designed is, in my opinion, in poor taste.

The effort spent writing this fix could also have been spent root-causing the actual source of the lockups in Kubelet. That would also make you a rockstar, as it would fix many other use cases people have for FUSE on Kubernetes. See kubernetes/kubernetes#7890 for more details.

@EdSchouten EdSchouten closed this Jan 8, 2025
@EdSchouten
Member

EdSchouten commented Jan 8, 2025

Sidenote: the only reason we do perform unmounts for NFSv4 mounts is that stale NFS mounts are far worse than the ones left behind by FUSE. Because NFS is a network based protocol, the kernel will end up performing timed retries for a prolonged period of time, causing processes in user space to hang. This differs from FUSE, where any attempt to access a stale mount causes system calls to fail immediately with ENXIO.

I wouldn't be amazed if these lockups in Kubelet are merely caused by some retry loop in Kubelet itself that fails to account for stale FUSE mounts being present. All that likely needs to be done is to inject a umount() system call in between those retries to guarantee forward progress.
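
To illustrate the kind of change I have in mind (a sketch only, not actual Kubelet code; the path handling and retry policy are made up):

```go
package sketch

import (
	"log"
	"os"
	"time"

	"golang.org/x/sys/unix"
)

// cleanupStaleMount retries removing a directory and, between attempts,
// lazily detaches any stale FUSE mount that may be blocking the removal.
// This is a sketch of the "umount() between retries" idea, not Kubelet code.
func cleanupStaleMount(path string) error {
	var err error
	for attempt := 0; attempt < 5; attempt++ {
		if err = os.Remove(path); err == nil {
			return nil
		}
		// A stale FUSE mount makes the directory undeletable. Detach it
		// lazily so that the next attempt can make forward progress.
		if unmountErr := unix.Unmount(path, unix.MNT_DETACH); unmountErr != nil {
			log.Printf("Failed to unmount %s: %v", path, unmountErr)
		}
		time.Sleep(time.Second)
	}
	return err
}
```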

@alexnovak
Contributor Author

Thanks for your response! Sorry if I touched a nerve. I made the tweak pretty quickly after hitting the issue while testing FUSE in my cluster and noticing the TODO note. I appreciate the added context here. Would it be more helpful if I updated the TODO to mark this as deliberate?

@EdSchouten
Member

Sure thing! Go for it!

@EdSchouten EdSchouten reopened this Jan 8, 2025
@EdSchouten
Member

Giving it some more thought: I think the TODO in the code is still valid. It is about shutting down the FUSE server goroutine gracefully, which is distinct from unmounting, so the TODO should remain intact.

Documenting explicitly why we don't unmount seems fine though.

@alexnovak
Contributor Author

I've submitted kubernetes/kubernetes#129550 after combing through some of the Kubernetes source and (I believe) identifying the offending lines.

To make sure I understand the intent of the original TODO: it doesn't look like Serve offers any particularly graceful way of handling a context delivered to the server. In its core loop, it looks like the only ways to convince it to stop are an unmount or an unhandled error:
https://github.com/hanwen/go-fuse/blob/v2.5.1/fuse/server.go#L474-L503

The exit towards the top of the function seems to be a carve-out for limiting concurrency, and doesn't appear to be reachable through any public function. I'm guessing that in your ideal world we'd have another way to trigger exit behavior here?
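
In other words, the only shutdown lever available today appears to be the unmount itself. Roughly this pattern is what I see (a sketch assuming go-fuse v2's `fuse.Server`; the wiring is illustrative, not bb_worker's actual code): the serve loop only returns once the mount goes away.

```go
package sketch

import (
	"log"

	"github.com/hanwen/go-fuse/v2/fuse"
)

// runUntilUnmount runs the FUSE serve loop and blocks until it exits. In
// practice the loop only returns once the file system is unmounted (via
// server.Unmount() or externally) or hits a fatal error; there is no
// context- or channel-based way to stop it. Sketch only, not bb_worker code.
func runUntilUnmount(server *fuse.Server) {
	go server.Serve()
	// Wait() blocks until the serve loop has terminated.
	server.Wait()
	log.Print("FUSE serve loop has terminated")
}
```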
