
Conversation

@iximeow iximeow commented Oct 28, 2025

This tries doing a bunch of random operations against an NVMe device and checks each operation against a limited model of what the result of that operation should be.

The initial stab at this is what caught #965, and it caught a bug in an intermediate state of #953 (which other phd tests did notice anyway). This fuzzing would probably be best with actual I/O operations mixed in, and I think that *should* be relatively straightforward to add from here, but as-is it's useful!

This would probably be best phrased as a `cargo-fuzz` test to at least get coverage-guided fuzzing. Because of the statefulness of NVMe, I think either way we'd want the model of expected device state and a pick-actions-then-run execution to further guide `cargo-fuzz` into useful parts of the device state.

The initial approach at this allowed for device reset and migration at arbitrary times via a separate thread. Once that required synchronizing the model of device state, it was effectively interleaved with "guest" operations on the device, and in practice admin commands are serialized by the `NvmeCtrl` state lock anyway. It may be more interesting to revisit with concurrent I/O operations on submission/completion queues.
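For concreteness, a minimal sketch of the pick-actions-then-run shape described above. The `DeviceModel` and `Action` types and the `apply` methods are hypothetical stand-ins, not the test's actual API; only the `Pcg64` seeding matches the code under review.

```rust
use rand::SeedableRng;
use rand_pcg::Pcg64;

// Hypothetical names throughout; the real test's types differ.
let mut rng = Pcg64::seed_from_u64(seed);
let mut model = DeviceModel::default();
for _ in 0..1_000 {
    // Pick only from actions that make sense in the model's current
    // state, so the run spends its budget in interesting device states.
    let action = Action::pick(&mut rng, &model);
    let actual = device.apply(&action);   // run against the emulated NVMe device
    let expected = model.apply(&action);  // advance the model in lockstep
    assert_eq!(actual, expected, "device diverged from model on {action:?}");
}
```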

@iximeow iximeow requested review from hawkw and pfmooney October 28, 2025 18:20

```rust
let mut rng = Pcg64::seed_from_u64(seed);

for _ in 0..1_000 {
```
@iximeow (Member Author):

`cargo test` gets through this in about 4 seconds, but `cargo test --release` is almost immediate; I'd been running this at 100k iterations there instead. Even a thousand seems like an OK place to be for CI?

Member:

mayhaps an env var?
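A minimal sketch of that suggestion; the `PROPOLIS_FUZZ_ITERS` variable name is an invention for illustration, not an agreed-upon interface:

```rust
// Hypothetical: CI keeps the cheap default, local runs can crank it up.
let iters: u64 = std::env::var("PROPOLIS_FUZZ_ITERS")
    .ok()
    .and_then(|v| v.parse().ok())
    .unwrap_or(1_000);
for _ in 0..iters {
    // ... one randomized operation per iteration, as in the loop above ...
}
```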

@iximeow iximeow requested review from jordanhendricks and removed request for pfmooney November 13, 2025 21:38
```rust
/// any point as `PciNvme` technically does. (In practice, reset immediately
/// locks the inner `NvmeCtrl` to do the reset, so administrative operations
/// are effectively serialized anyway.)
struct FuzzCtx {
```
@iximeow (Member Author):

I'd talked with Patrick just a bit about the idea of having a more composable fuzzing framework as part of Propolis (the library). The implementation in this file is simultaneously:

- an NVMe driver, which is driven ~randomly to exercise `propolis/src/hw/nvme`
- a model of the NVMe controller state, based on the driver's operations (what queues are created, is the device initialized, etc.)
- a model of the synthesis of the NVMe spec and the controller model: given the current state, which NVMe operations should have which outcomes, and which state transitions are permissible?

You could swap out the word "NVMe" above for any other device (the chipset would be interesting!). A reasonable thing to want would be fuzzing a pair of NVMe devices concurrently on the same PCI bridge, or poking a disk and a NIC concurrently, or configuring a pair of NVMe devices to write on each other's queues, or ...

In the limit this seems to me like assembling an increasingly chaotic VM and configuring how chaotic it should be. I don't plan on adjusting this in that direction right now, but it seems like an interesting future direction this could take.
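One hedged sketch of what that composable, per-device shape could look like; every name here is hypothetical, and none of it is an existing Propolis API:

```rust
use rand::RngCore;

// Hypothetical trait: one implementation per device type (NVMe, chipset, ...).
trait FuzzDevice {
    /// Operations the driver can issue against the device.
    type Action;
    /// Model of expected device state, advanced in lockstep with the device.
    type Model: Default;

    /// Pick an action that is valid from the model's current state.
    fn pick(rng: &mut dyn RngCore, model: &Self::Model) -> Self::Action;
    /// Apply the action to the emulated device and to the model, returning
    /// whether their outcomes agreed.
    fn step(&mut self, action: Self::Action, model: &mut Self::Model) -> bool;
}
```

Fuzzing two devices together would then amount to interleaving `step` calls on two `FuzzDevice` instances sharing one RNG.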

Member:

I like this idea, but I also agree that we should try to get this in before generalizing it.

@jordanhendricks (Contributor) left a comment:

I gave this a look and broadly it looks good, but I think I'm lacking some understanding of how we intend to use this. Is the idea that we will run a fuzzer in CI? Or as a standalone tool?

@hawkw (Member) left a comment:

neat!


Comment on lines +116 to +117:

```rust
// 64 MB feels like a reasonable (but very tiny!) size for a test
// disk.
```
Member:

are there compelling reasons to vary the block size and size of the test disk, in future?

Comment on lines +160 to +163:

```rust
// I/O submission/completion queues are interleaved (for fun more than
// anything else). With 256 KiB of memory for queues we can have up to
// 64 I/O queues in the form of 32 submission and 32 completion queues.
```
Member:

turbo nitpick: mayhaps we could have a `const IO_QUEUE_REGION_SIZE = 256` and derive the 32 et al. from that (and use that 256 in the address calculation above)? that way there's one constant to mess with and everything else just works.
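A minimal sketch of that derivation, assuming a 4 KiB per-queue footprint (the names and the footprint are assumptions, picked so the numbers match the comment above):

```rust
// One constant to mess with; everything else falls out of it.
const IO_QUEUE_REGION_SIZE: usize = 256 * 1024; // 256 KiB reserved for queues
const QUEUE_BYTES: usize = 4 * 1024;            // assumed per-queue footprint
const MAX_IO_QUEUES: usize = IO_QUEUE_REGION_SIZE / QUEUE_BYTES; // 64
const MAX_IO_QUEUE_PAIRS: usize = MAX_IO_QUEUES / 2;             // 32 SQ/CQ pairs
```

The address calculation above would then also use `IO_QUEUE_REGION_SIZE` rather than a bare 256.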
