66 commits
0f69a09
Added a first draft of the controller delay stuff
gurasinghMS Jul 14, 2025
d823483
Created the fault injection controller
gurasinghMS Jul 15, 2025
7546230
Made some progress on invoking the fault function from the local object
gurasinghMS Jul 16, 2025
537ba84
Adding an AdminHandler middle layer to deal with relay of messages
gurasinghMS Jul 18, 2025
0f25ae9
Adding some more interception logic to the fault injection work. Shou…
gurasinghMS Jul 18, 2025
0a69f13
2 things here. AdminState should not have public members I am not sur…
gurasinghMS Jul 19, 2025
8d3d244
Pausing some progress here real quick to get a delay option out first
gurasinghMS Jul 22, 2025
0d19265
Able to now intercept commands coming to the admin queue
gurasinghMS Jul 23, 2025
4714d44
Tests in a working state with the admin queue intercept
gurasinghMS Jul 25, 2025
8c32e06
Files are now sectioned out properly with proper imports etc
gurasinghMS Jul 25, 2025
3ca4701
Removing any tracing debug statements
gurasinghMS Jul 25, 2025
3ad4d3d
Removing some unused variables
gurasinghMS Jul 25, 2025
9314e20
Removing more unused stuff
gurasinghMS Jul 25, 2025
ef37035
Removing non required changes + cleanup
gurasinghMS Jul 25, 2025
c91042e
No longer using mesh system to notify the submission queue of the mos…
gurasinghMS Jul 25, 2025
305aa00
First draft of PR with the fault injection nvme controller
gurasinghMS Jul 28, 2025
effd368
Removing unused debugging statements
gurasinghMS Jul 28, 2025
1321401
Removing some of the tracing statements again
gurasinghMS Jul 28, 2025
3bccdc6
Removing unused code
gurasinghMS Jul 28, 2025
6532188
Removing more unused code here
gurasinghMS Jul 28, 2025
4f51062
Removing tracing from pci.rs
gurasinghMS Jul 28, 2025
1d8b1ee
Cleaned up the submission queue fault injection work
gurasinghMS Jul 28, 2025
ee9c9d3
Pausing work to figure out a bug with the admin doorbell write through
gurasinghMS Jul 29, 2025
a2a045c
removing the fault injection queue. It is no longer needed
gurasinghMS Jul 29, 2025
02f3d18
Pausing work to check up on how to set up the fault injection function
gurasinghMS Jul 29, 2025
fbe20e8
Saving work
gurasinghMS Jul 29, 2025
00ca6d9
Added the fault controller function. It is able to success…
gurasinghMS Jul 29, 2025
3bf6fb1
Committing before making a draft PR on the main repo
gurasinghMS Jul 30, 2025
804e9e1
Addressing PR comments
gurasinghMS Jul 30, 2025
9f658ed
Fault function is now returning an Option<Command> type. Better enabl…
gurasinghMS Jul 31, 2025
fb01ef0
Added the capability to change/fault an admin command
gurasinghMS Jul 31, 2025
0071f19
Changing test for fault injection
gurasinghMS Jul 31, 2025
82f0225
Responding to some copilot comments
gurasinghMS Jul 31, 2025
41e9488
Removed a lot of the junk / unused values from the code. Looking much…
gurasinghMS Jul 31, 2025
00c6249
Fixed driver tests showcasing delay and fault functionality
gurasinghMS Jul 31, 2025
b88e503
Aborting attempt to implement controller reset functionality
gurasinghMS Jul 31, 2025
5a3dc70
Update vm/devices/storage/nvme/src/fault_injection/pci.rs
gurasinghMS Jul 31, 2025
c657b83
Update vm/devices/storage/disk_nvme/nvme_driver/src/tests.rs
gurasinghMS Jul 31, 2025
170fd3c
Update vm/devices/storage/nvme/src/fault_injection/pci.rs
gurasinghMS Jul 31, 2025
21c6b10
Responded to comments and much more overall PR cleanup
gurasinghMS Aug 1, 2025
3d5b8c5
Even more cleanup for the admin queue runner
gurasinghMS Aug 1, 2025
13c3456
Changes based on PR comments
gurasinghMS Aug 1, 2025
db7e2ba
Renamed controller tests and added cfg test tag
gurasinghMS Aug 1, 2025
6a9cd65
Added a type alias for fault function passed in to the controller
gurasinghMS Aug 1, 2025
4d20aab
Moved task_control in cargo file
gurasinghMS Aug 1, 2025
52d072c
FaultInjectionController is now behind a feature directive. Tried mov…
gurasinghMS Aug 1, 2025
2f97ec0
Update vm/devices/storage/disk_nvme/nvme_driver/src/tests.rs
gurasinghMS Aug 1, 2025
5d8daec
Fixing cargo xtask fmt
gurasinghMS Aug 1, 2025
dd2815b
Added better error handling in the admin handler during command proce…
gurasinghMS Aug 1, 2025
67965f1
Adding an issue for non-implemented controller reset functionality
gurasinghMS Aug 1, 2025
de60e21
Removed the cfg test tag from the fault injection test function
gurasinghMS Aug 1, 2025
b7421ef
Forking over the entire nvme controller
gurasinghMS Aug 5, 2025
6193c15
Forked over the original nvme controller to a new crate called nvme_t…
gurasinghMS Aug 6, 2025
feed61b
Saving changes before rebase
gurasinghMS Aug 6, 2025
03f7763
Minor changes to the Cargo files for cleanup post emulator fork
gurasinghMS Aug 6, 2025
9f437e8
Minor changes to the Cargo files for cleanup post emulator fork
gurasinghMS Aug 6, 2025
e681005
Responding based on comments in the PR
gurasinghMS Aug 7, 2025
891b9fb
Fixing xtask fmt issues
gurasinghMS Aug 7, 2025
a1cb2e0
Added 2 unit tests each testing for admin queue fault functionality
gurasinghMS Aug 7, 2025
a349c19
Responding to comments from PR
gurasinghMS Aug 7, 2025
cd10492
Merge branch 'main' into controller_delay
gurasinghMS Aug 8, 2025
2ed201b
Changed the comment indicating that the allow_dma switch is false and…
gurasinghMS Aug 11, 2025
4c66197
Merge branch 'main' into controller_delay
gurasinghMS Aug 11, 2025
c9d9a8f
Merge branch 'main' into controller_delay
gurasinghMS Aug 11, 2025
74f62c7
Removing the copy trait from Completions as per guidance in the comments
gurasinghMS Aug 11, 2025
82507fe
Removing unused nvme_test member
gurasinghMS Aug 12, 2025
36 changes: 36 additions & 0 deletions Cargo.lock
@@ -4688,6 +4688,7 @@ name = "nvme_driver"
version = "0.0.0"
dependencies = [
"anyhow",
"async-trait",
"chipset_device",
"disklayer_ram",
"event-listener",
@@ -4699,6 +4700,7 @@ dependencies = [
"mesh",
"nvme",
"nvme_spec",
"nvme_test",
"pal_async",
"parking_lot",
"pci_core",
@@ -4736,6 +4738,40 @@ dependencies = [
"zerocopy 0.8.24",
]

[[package]]
name = "nvme_test"
version = "0.0.0"
dependencies = [
"async-trait",
"chipset_device",
"device_emulators",
"disk_backend",
"event-listener",
"futures",
"futures-concurrency",
"guestmem",
"guid",
"inspect",
"mesh",
"nvme_common",
"nvme_resources",
"nvme_spec",
"pal_async",
"parking_lot",
"pci_core",
"pci_resources",
"scsi_buffers",
"task_control",
"thiserror 2.0.12",
"tracelimit",
"tracing",
"unicycle",
"user_driver",
"vm_resource",
"vmcore",
"zerocopy 0.8.24",
]

[[package]]
name = "object"
version = "0.36.7"
3 changes: 2 additions & 1 deletion Cargo.toml
@@ -45,7 +45,7 @@ members = [
"petri/petri-tool",
"vm/loader/igvmfilegen",
"vm/vmgs/vmgs_lib",
"vm/vmgs/vmgstool",
"vm/vmgs/vmgstool",
]
exclude = [
"xsync",
@@ -248,6 +248,7 @@ nvme_common = { path = "vm/devices/storage/nvme_common" }
nvme_driver = { path = "vm/devices/storage/disk_nvme/nvme_driver" }
nvme_resources = { path = "vm/devices/storage/nvme_resources" }
nvme_spec = { path = "vm/devices/storage/nvme_spec" }
nvme_test = { path = "vm/devices/storage/nvme_test" }
storage_string = { path = "vm/devices/storage/storage_string" }
vmswitch = { path = "vm/devices/net/vmswitch" }
pci_bus = { path = "vm/devices/pci/pci_bus" }
2 changes: 2 additions & 0 deletions vm/devices/storage/disk_nvme/nvme_driver/Cargo.toml
@@ -36,6 +36,8 @@ pci_core.workspace = true
scsi_buffers.workspace = true
test_with_tracing.workspace = true
user_driver_emulated_mock.workspace = true
nvme_test.workspace = true
async-trait.workspace = true

guid.workspace = true

110 changes: 110 additions & 0 deletions vm/devices/storage/disk_nvme/nvme_driver/src/tests.rs
@@ -10,13 +10,19 @@ use inspect::Inspect;
use inspect::InspectMut;
use nvme::NvmeControllerCaps;
use nvme_spec::Cap;
use nvme_spec::Command;
use nvme_spec::Completion;
use nvme_spec::nvm::DsmRange;
use nvme_test::FaultConfiguration;
use nvme_test::QueueFaultBehavior;
use pal_async::DefaultDriver;
use pal_async::async_test;
use pal_async::timer::PolledTimer;
use parking_lot::Mutex;
use pci_core::msi::MsiInterruptSet;
use scsi_buffers::OwnedRequestBuffers;
use std::sync::Arc;
use std::time::Duration;
use test_with_tracing::test;
use user_driver::DeviceBacking;
use user_driver::DeviceRegisterIo;
@@ -26,9 +32,57 @@ use user_driver_emulated_mock::DeviceTestMemory;
use user_driver_emulated_mock::EmulatedDevice;
use user_driver_emulated_mock::Mapping;
use vmcore::vm_task::SingleDriverBackend;
use vmcore::vm_task::VmTaskDriver;
use vmcore::vm_task::VmTaskDriverSource;
use zerocopy::IntoBytes;

struct AdminQueueFault {
    pub driver: VmTaskDriver,
}

#[async_trait::async_trait]
impl nvme_test::QueueFault for AdminQueueFault {
    async fn fault_submission_queue(&self, mut command: Command) -> QueueFaultBehavior<Command> {
        tracing::info!("Faulting submission queue using cid sequence number mismatch");
        let opcode = nvme_spec::AdminOpcode(command.cdw0.opcode());
        match opcode {
            nvme_spec::AdminOpcode::CREATE_IO_COMPLETION_QUEUE => {
                // Overwrite the previous cid to cause a panic.
                command.cdw0.set_cid(0);
                QueueFaultBehavior::Update(command)
            }
            _ => QueueFaultBehavior::Default,
        }
    }

    async fn fault_completion_queue(
        &self,
        _completion: Completion,
    ) -> QueueFaultBehavior<Completion> {
        tracing::info!("Faulting completion queue using delay");
        PolledTimer::new(&self.driver)
            .sleep(Duration::from_millis(100))
            .await;
        QueueFaultBehavior::Default
    }
}

#[async_test]
#[should_panic(expected = "assertion `left == right` failed: cid sequence number mismatch:")]
async fn test_nvme_command_fault(driver: DefaultDriver) {
    let task_driver = VmTaskDriverSource::new(SingleDriverBackend::new(driver.clone())).simple();

    test_nvme_fault_injection(
        driver,
        FaultConfiguration {
            admin_fault: Some(Box::new(AdminQueueFault {
                driver: task_driver,
            })),
        },
    )
    .await;
}

#[async_test]
async fn test_nvme_driver_direct_dma(driver: DefaultDriver) {
test_nvme_driver(driver, true).await;
@@ -309,6 +363,62 @@ async fn test_nvme_save_restore_inner(driver: DefaultDriver) {
// .unwrap();
}

async fn test_nvme_fault_injection(driver: DefaultDriver, fault_configuration: FaultConfiguration) {
    const MSIX_COUNT: u16 = 2;
    const IO_QUEUE_COUNT: u16 = 64;
    const CPU_COUNT: u32 = 64;

    // Arrange: Create 8MB of space. First 4MB for the device and second 4MB for the payload.
    let pages = 1024; // 4MB
    let device_test_memory = DeviceTestMemory::new(pages * 2, false, "test_nvme_driver");
    let guest_mem = device_test_memory.guest_memory(); // Access to 0-8MB
    let dma_client = device_test_memory.dma_client(); // Access 0-4MB
    let payload_mem = device_test_memory.payload_mem(); // allow_dma is false, so this will follow the 'normal' test path (i.e. with bounce buffering behind the scenes)

    // Arrange: Create the NVMe controller and driver.
    let driver_source = VmTaskDriverSource::new(SingleDriverBackend::new(driver));
    let mut msi_set = MsiInterruptSet::new();
    let nvme = nvme_test::NvmeFaultController::new(
        &driver_source,
        guest_mem.clone(),
        &mut msi_set,
        &mut ExternallyManagedMmioIntercepts,
        nvme_test::NvmeFaultControllerCaps {
            msix_count: MSIX_COUNT,
            max_io_queues: IO_QUEUE_COUNT,
            subsystem_id: Guid::new_random(),
        },
        fault_configuration,
    );

    nvme.client() // 2MB namespace
        .add_namespace(1, disklayer_ram::ram_disk(2 << 20, false).unwrap())
        .await
        .unwrap();
    let device = NvmeTestEmulatedDevice::new(nvme, msi_set, dma_client.clone());
    let driver = NvmeDriver::new(&driver_source, CPU_COUNT, device, false)
        .await
        .unwrap();
    let namespace = driver.namespace(1).await.unwrap();

    // Act: Write 1024 bytes of data to disk starting at LBA 1.
    let buf_range = OwnedRequestBuffers::linear(0, 16384, true); // 32 blocks
    payload_mem.write_at(0, &[0xcc; 4096]).unwrap();
    namespace
        .write(
            0,
            1,
            2,
            false,
            &payload_mem,
            buf_range.buffer(&payload_mem).range(),
        )
        .await
        .unwrap();

    driver.shutdown().await;
}

#[derive(Inspect)]
pub struct NvmeTestEmulatedDevice<T: InspectMut, U: DmaClient> {
device: EmulatedDevice<T, U>,
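For readers skimming the test changes in `tests.rs`, here is a minimal, synchronous sketch of the pattern `AdminQueueFault` exercises: a hook inspects each submitted command and answers with either `Update` (substitute a modified command) or `Default` (pass the original through). Everything in it is a stand-in for illustration. `Command`, `FaultBehavior`, `fault_submission`, and `apply_fault` are hypothetical simplifications, not the `nvme_test` API, which is async and may define more variants than the two that appear in this diff. The only outside fact relied on is that `0x05` is the NVMe admin opcode for Create I/O Completion Queue.

```rust
/// Hypothetical, trimmed-down stand-in for `nvme_spec::Command`.
#[derive(Debug, Clone, PartialEq)]
pub struct Command {
    pub opcode: u8,
    pub cid: u16,
}

/// Mirrors the two `QueueFaultBehavior` variants used by the test in this PR.
/// (Assumption: the real enum may have additional variants.)
#[derive(Debug, PartialEq)]
pub enum FaultBehavior<T> {
    /// Replace the intercepted item with the contained one.
    Update(T),
    /// Leave the intercepted item untouched.
    Default,
}

/// NVMe admin opcode for Create I/O Completion Queue.
pub const CREATE_IO_COMPLETION_QUEUE: u8 = 0x05;

/// Fault hook: clobber the cid of queue-creation commands, ignore the rest.
pub fn fault_submission(mut command: Command) -> FaultBehavior<Command> {
    if command.opcode == CREATE_IO_COMPLETION_QUEUE {
        command.cid = 0;
        FaultBehavior::Update(command)
    } else {
        FaultBehavior::Default
    }
}

/// What a queue handler could do with the hook's answer (hypothetical helper).
pub fn apply_fault(original: Command, behavior: FaultBehavior<Command>) -> Command {
    match behavior {
        FaultBehavior::Update(updated) => updated,
        FaultBehavior::Default => original,
    }
}
```

In this sketch a command with opcode `0x05` and cid 7 comes out of `apply_fault` with cid 0, while any other command is returned unchanged. In the real test that cid rewrite is what trips the driver's "cid sequence number mismatch" assertion.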
2 changes: 1 addition & 1 deletion vm/devices/storage/nvme/Cargo.toml
@@ -26,12 +26,12 @@ guid.workspace = true
inspect.workspace = true
mesh.workspace = true
pal_async.workspace = true
task_control.workspace = true
async-trait.workspace = true
event-listener.workspace = true
futures.workspace = true
futures-concurrency.workspace = true
parking_lot.workspace = true
task_control.workspace = true
thiserror.workspace = true
tracelimit.workspace = true
tracing.workspace = true
2 changes: 1 addition & 1 deletion vm/devices/storage/nvme_spec/src/lib.rs
@@ -209,7 +209,7 @@ open_enum! {
}

#[repr(C)]
#[derive(Debug, IntoBytes, Immutable, KnownLayout, FromBytes)]
#[derive(Debug, Clone, IntoBytes, Immutable, KnownLayout, FromBytes)]
pub struct Completion {
pub dw0: u32,
pub dw1: u32,
45 changes: 45 additions & 0 deletions vm/devices/storage/nvme_test/Cargo.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

[package]
name = "nvme_test"
edition.workspace = true
rust-version.workspace = true

[dependencies]
disk_backend.workspace = true
nvme_common.workspace = true
nvme_resources.workspace = true
nvme_spec.workspace = true
scsi_buffers.workspace = true

device_emulators.workspace = true
pci_core.workspace = true
pci_resources.workspace = true

chipset_device.workspace = true
guestmem.workspace = true
vmcore.workspace = true
vm_resource.workspace = true

guid.workspace = true
inspect.workspace = true
mesh.workspace = true
pal_async.workspace = true
async-trait.workspace = true
event-listener.workspace = true
futures.workspace = true
futures-concurrency.workspace = true
parking_lot.workspace = true
task_control.workspace = true
thiserror.workspace = true
tracelimit.workspace = true
tracing.workspace = true
unicycle.workspace = true
zerocopy = { workspace = true, features = ["alloc"] }

[dev-dependencies]
user_driver.workspace = true

[lints]
workspace = true
78 changes: 78 additions & 0 deletions vm/devices/storage/nvme_test/src/error.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.

//! Error and result related types.

use crate::spec;
use std::error::Error;

/// An NVMe error, consisting of a status code and optional error source.
#[derive(Debug)]
pub struct NvmeError {
    status: spec::Status,
    source: Option<Box<dyn Error + Send + Sync>>,
}

impl NvmeError {
    pub fn new(status: spec::Status, source: impl Into<Box<dyn Error + Send + Sync>>) -> Self {
        Self {
            status,
            source: Some(source.into()),
        }
    }
}

impl Error for NvmeError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_ref().map(|x| x.as_ref() as _)
    }
}

impl std::fmt::Display for NvmeError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self.status.status_code_type() {
            spec::StatusCodeType::GENERIC => {
                write!(f, "general error {:#x?}", self.status)
            }
            spec::StatusCodeType::COMMAND_SPECIFIC => {
                write!(f, "command-specific error {:#x?}", self.status)
            }
            spec::StatusCodeType::MEDIA_ERROR => {
                write!(f, "media error {:#x?}", self.status)
            }
            _ => write!(f, "{:#x?}", self.status),
        }
    }
}

impl From<spec::Status> for NvmeError {
    fn from(status: spec::Status) -> Self {
        NvmeError {
            status,
            source: None,
        }
    }
}

/// The result of an NVMe command.
#[derive(Default)]
pub struct CommandResult {
    pub status: spec::Status,
    pub dw: [u32; 2],
}

impl<T: Into<NvmeError>> From<T> for CommandResult {
    fn from(status: T) -> Self {
        CommandResult::new(status, [0; 2])
    }
}

impl CommandResult {
    pub fn new(status: impl Into<NvmeError>, dw: [u32; 2]) -> Self {
        let status = status.into();
        Self {
            status: status.status,
            dw,
        }
    }
}
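The conversion chain in `error.rs` (a bare status or an `NvmeError` both turn into a `CommandResult`) may be easier to follow in isolation. The sketch below reproduces it with only the standard library. `Status` here is a hypothetical stand-in for `spec::Status`, its two constants are illustrative, and `create_queue` and `complete` are invented helpers that do not appear in the PR. The `Display` text is also simplified relative to the real implementation.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical stand-in for `spec::Status`; the real type lives in `nvme_spec`.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Status(pub u16);

impl Status {
    pub const SUCCESS: Status = Status(0x0);
    /// Generic status 02h, "Invalid Field in Command".
    pub const INVALID_FIELD: Status = Status(0x2);
}

/// A status code plus an optional underlying cause.
#[derive(Debug)]
pub struct NvmeError {
    status: Status,
    source: Option<Box<dyn Error + Send + Sync>>,
}

impl NvmeError {
    pub fn new(status: Status, source: impl Into<Box<dyn Error + Send + Sync>>) -> Self {
        Self { status, source: Some(source.into()) }
    }
}

impl fmt::Display for NvmeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "nvme error {:#x?}", self.status)
    }
}

impl Error for NvmeError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_deref().map(|e| e as &(dyn Error + 'static))
    }
}

impl From<Status> for NvmeError {
    fn from(status: Status) -> Self {
        NvmeError { status, source: None }
    }
}

/// Status and two result dwords, as posted in a completion entry.
#[derive(Debug, Default)]
pub struct CommandResult {
    pub status: Status,
    pub dw: [u32; 2],
}

/// Anything convertible to `NvmeError` becomes a result with zeroed dwords.
impl<T: Into<NvmeError>> From<T> for CommandResult {
    fn from(status: T) -> Self {
        let err: NvmeError = status.into();
        CommandResult { status: err.status, dw: [0; 2] }
    }
}

/// Hypothetical handler: rejects queue id 0 and echoes any other id in dw0.
pub fn create_queue(qid: u16) -> Result<[u32; 2], NvmeError> {
    if qid == 0 {
        return Err(NvmeError::new(Status::INVALID_FIELD, "queue id 0 is reserved"));
    }
    Ok([u32::from(qid), 0])
}

/// Hypothetical glue: fold a handler result into the value that gets posted.
pub fn complete(result: Result<[u32; 2], NvmeError>) -> CommandResult {
    match result {
        Ok(dw) => CommandResult { status: Status::SUCCESS, dw },
        Err(err) => err.into(),
    }
}
```

One consequence worth noticing, in the sketch as in the code under review: the `source` attached to an `NvmeError` is dropped when it is converted to a `CommandResult`, so only the status code survives. Any logging of the underlying cause has to happen before that conversion.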