Skip to content

Added function get_enabled to export vring enable status. #284

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 0 commits into from

Conversation

WeiChungHsu
Copy link

Summary of the PR

Added function get_enabled to export vring enable status.

Please summarize here why the changes in this PR are needed.
I am working on snapshot/restore feature for vhost-device-vsock crate. And during snapshot, system sends "FrontendReq::SET_VRING_ENABLE, enable false" to vhost-device-vsock device. Then the vring was disabled. However, the vring tx/rx queue still execute iter() which will return error and then system (vring) hang because vring has been disabled.

My proposed solution to solve this snapshot/restore hang issue is adding vring enable status checking before access/iter() the vring. Therefore, we need this get_enabled function to check if vring is enabled or disabled. I believe in the future, other vhost device may have similar hang issue when it executes snapshot.

Requirements

Before submitting your PR, please make sure you addressed the following
requirements:

  • All commits in this PR have Signed-Off-By trailers (with
    git commit -s), and the commit message has max 60 characters for the
    summary and max 75 characters for each description line.
  • All added/changed functionality has a corresponding unit/integration
    test.
  • All added/changed public-facing functionality has entries in the "Upcoming
    Release" section of CHANGELOG.md (if no such section exists, please create one).
  • Any newly added unsafe code is properly documented.

@Ablu
Copy link
Collaborator

Ablu commented Mar 10, 2025

Do you have an example how the corresponding change looks on the vhost-device-vsock side?

@WeiChungHsu
Copy link
Author

Do you have an example how the corresponding change looks on the vhost-device-vsock side?

When the Vring is disabled (and queue set to not ready), the vhu_vsock_thread.rs still try to iterate and read RX queue, therefore, it hang at this line:
https://github.com/rust-vmm/vhost-device/blob/main/vhost-device-vsock/src/vhu_vsock_thread.rs#L591-L594

Furthermore, it hang because this Queue iter() operation return error at this line:
https://github.com/rust-vmm/vm-virtio/blob/main/virtio-queue/src/queue.rs#L592-L597

My solution in vhu_vsock_thread.rs:
let mut vring_mut = vring.get_mut();

if !vring_mut.get_enabled() {
    return Ok(true);
}

    let queue = vring_mut.get_queue_mut();

    while let Some(mut avail_desc) = queue
        .iter(atomic_mem.memory())
        .map_err(|_| Error::IterateQueue)?
        .next()

@Ablu
Copy link
Collaborator

Ablu commented Mar 11, 2025

Furthermore, it hang because this Queue iter() operation return error at this line:
https://github.com/rust-vmm/vm-virtio/blob/main/virtio-queue/src/queue.rs#L592-L597

Hm. But shouldn't it error out in that case versus hanging? I do not fully understand where the hang would happen.

Ideally, I think it would be great if we can avoid easy footguns by designing the API in a way that avoids them altogether. I assume chances are that people will forget this check in other implementations too?

@stefano-garzarella
Copy link
Member

@WeiChungHsu can you fix CI issues and rebase this?

@WeiChungHsu
Copy link
Author

Do you have an example how the corresponding change looks on the vhost-device-vsock side?

Hi,

Couple days ago, I created a PR for vhost-device-vsock side. It used this new vring function to check if vring is enabled/disabled.
rust-vmm/vhost-device#827

Sorry for my late reply.

@WeiChungHsu
Copy link
Author

@WeiChungHsu can you fix CI issues and rebase this?

Yes, I am working on it now.

@WeiChungHsu WeiChungHsu force-pushed the main branch 3 times, most recently from e613696 to 8b80093 Compare March 21, 2025 23:01
@stefano-garzarella
Copy link
Member

@WeiChungHsu please avoid merging from main, instead rebase this PR on current main.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants