Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

devices/console: Fix a bug which would cause libkrun to hang on exit #261

Merged
merged 1 commit into from
Feb 6, 2025

Conversation

mtjhrc
Copy link
Collaborator

@mtjhrc mtjhrc commented Jan 28, 2025

This fixes a hang observed when the VM is about to exit. The problem was the console device was joining a thread on vm shutdown (or device reset) which would never exit.

To replicate the hang, the stdin must not be a terminal (for example you can use a pipe).
Before this fix, this would hang:
echo hello | ./chroot_vm rootfs_fedora /bin/cat
(Btw libkrun used 100% (1 core) of CPU when it was stuck in this state - I don't know why exactly.)

This seems to be a regression introduced by the commit: 4076b7: devices/console: implement reset method

fn wait_until_readable(&self, _stopfd: Option<&EventFd>) {
let mut poll_fds = [PollFd::new(self.sigint_evt.as_raw_fd(), PollFlags::POLLIN)];
fn wait_until_readable(&self, stopfd: Option<&EventFd>) {
let mut poll_fds = Vec::new();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let mut poll_fds = Vec::new();
let mut poll_fds = Vec::with_capacity(2);

maybe?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

(Side note/rant: The performance doesn't really matter here, but it's a shame Rust has no easy way to just stack allocate a fixed capacity vector using only the standard library, which would be the best choice in situations like this 😞 )

Copy link
Member

@tylerfanelli tylerfanelli Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need to be a vector? If the size is fixed, is [PollFd; 2] stack allocated? I believe so.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need to be a vector? If the size is fixed, is [PollFd; 2] stack allocated? I believe so.

Well, the size is not fixed, only the capacity is. I could use [PollFd; 2] but then if you want to poll only 1 descriptor you have to initialize the other one to some invalid PollFd and conditionally pass a slice of either 1 or 2 values into the poll(). Or you could just have 2 different calls to the poll function in an if-else.

There is also another place where we do the same - impl PortInput for PortInputFd:

fn wait_until_readable(&self, stopfd: Option<&EventFd>) {
let mut poll_fds = Vec::new();
poll_fds.push(PollFd::new(self.as_raw_fd(), PollFlags::POLLIN));
if let Some(stopfd) = stopfd {
poll_fds.push(PollFd::new(stopfd.as_raw_fd(), PollFlags::POLLIN));
}
poll(&mut poll_fds, -1).expect("Failed to poll");
}

I looked into this more, and it seems the argument in the trait stopfd: Option<&EventFd>, doesn't really have to be an Option and we can always pass a StopFd. I'd rather change it in a separate refactoring/cleanup PR along with some other changes to the console device though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair, I'm mainly just commenting on the heap allocation of Vec. Sure, there'd be some extra logic involved (perhaps you could also try [Option<PollFd>; 2] but that also requires some logic in the handlers) but it may give you the stack allocation you're looking for.

Make PortInputEmpty and PortInputSigInt `wait_until_readable` implementations
poll the stopfd EventFd to ensure the thread eventually exits.
(The thread is joined in Port::shutdown method causing the hang).

To replicate the hang, the stdin must not be a terminal. For example you can use
a pipe - `echo hello | ./chroot_vm rootfs_fedora /bin/cat` would hang without
this fix.

This seems to be a regresion introduced by the commit:
4076b7: devices/console: implement reset method

Signed-off-by: Matej Hrica <[email protected]>
Copy link
Contributor

@slp slp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed to fix the issue here. Thanks!

@slp slp merged commit e9281c6 into containers:main Feb 6, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants