Hypervisor Testing Research Papers

A systematic collection of research papers on hypervisor testing and fuzzing, covering virtual device testing, vCPU emulation, hypercall interfaces, and nested virtualization. This repository accompanies our survey paper “Hypervisor Testing: Techniques, Challenges, and Future Directions”. Contributions via pull requests are welcome.

Paper Collection Methodology

We followed a rigorous literature review protocol adapted from Kitchenham’s guidelines:

Database Search: ACM Digital Library, IEEE Xplore, USENIX, DBLP, Semantic Scholar

Search Query:

("Hypervisor" OR "VMM" OR "QEMU" OR "KVM" OR "Xen" OR "Hyper-V" OR "VirtualBox" OR "Virtual Device")
AND ("Fuzzing" OR "Fuzz Testing" OR "Security Testing" OR "Vulnerability Detection" OR "Symbolic Execution")
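
The query above is a conjunction of two OR-groups, so a record matches only if it mentions at least one target term and at least one method term. The sketch below applies that logic to a title or abstract string. The function names and the whole-word, case-insensitive matching rule are assumptions made for illustration; each database listed above applies its own matching rules.

```python
import re

# Term groups copied from the search query above.
TARGET_TERMS = ["Hypervisor", "VMM", "QEMU", "KVM", "Xen", "Hyper-V",
                "VirtualBox", "Virtual Device"]
METHOD_TERMS = ["Fuzzing", "Fuzz Testing", "Security Testing",
                "Vulnerability Detection", "Symbolic Execution"]


def contains_any(text, terms):
    """Return True if any term occurs in text as a whole word or phrase.

    Matching is case-insensitive (an assumption, not a database rule).
    """
    for term in terms:
        pattern = r"(?<![A-Za-z0-9])" + re.escape(term) + r"(?![A-Za-z0-9])"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False


def matches_query(text):
    """Apply the (target terms) AND (method terms) structure of the query."""
    return contains_any(text, TARGET_TERMS) and contains_any(text, METHOD_TERMS)


if __name__ == "__main__":
    print(matches_query("ViDeZZo: Dependency-aware Virtual Device Fuzzing"))
    print(matches_query("Nyx-Net: Network Fuzzing with Incremental Snapshots"))
```

A title that names no target term does not match on its own, which is one reason the protocol also relies on snowballing.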

Venue Filter: Top-tier security (S&P, USENIX Security, CCS, NDSS), systems (OSDI, SOSP, EuroSys, ATC), and software engineering (ICSE, FSE, ASE) conferences.

Snowballing: Backward (references) and forward (Google Scholar citations) until saturation.

Tool Collection: GitHub search with star ranking and activity filtering.

All Papers (By Year)

2026

EuroSys

NecoFuzz: Effective Fuzzing of Nested Virtualization via Fuzz-Harness Virtual Machines
- Authors: Ishii et al. (University of Tokyo)
- Target: KVM, Xen, VirtualBox (Nested Virtualization)
- Findings: 6 vulnerabilities (CVE-2023-30456, CVE-2024-21106)

NDSS

HyperMirage: Direct State Manipulation in Hybrid Virtual CPU Fuzzing [pdf]
- Authors: Andreas et al.
- Target: Multiple hypervisors (vCPU emulation)

2025

NDSS

Truman: Constructing Device Behavior Models from OS Drivers to Fuzz Virtual Devices [pdf]
- Authors: Ma et al.
- Target: QEMU, VirtualBox, VMware Workstation Pro, Parallels
- Findings: 54 new bugs, 6 CVEs

ICSE

InSVDF: Interface-State-Aware Virtual Device Fuzzing [pdf]
- Authors: Zhang et al.
- Target: QEMU
- Findings: 2 new vulnerabilities, 1 CVE

2024

USENIX Security

HYPERPILL: Fuzzing for Hypervisor-bugs by Leveraging the Hardware Virtualization Interface [pdf]
- Authors: Bulekov et al.
- Target: Multiple hypervisors (universal approach)

2023

S&P (IEEE Symposium on Security and Privacy)

ViDeZZo: Dependency-aware Virtual Device Fuzzing [pdf]
- Authors: Liu et al.
- Target: QEMU, VirtualBox (28 virtual devices across 4 architectures)
- Findings: 24 existing + 28 new bugs, 7 patches accepted

ASE

VD-Guard: DMA Guided Fuzzing for Hypervisor Virtual Device [pdf]
- Authors: Liu et al.
- Target: QEMU, VirtualBox
- Findings: 4 new vulnerabilities, 3 CVEs

DSN

IRIS: A Record and Replay Framework to Enable Hardware-assisted Virtualization Fuzzing [pdf]
- Authors: Cesarano et al. (Federico II University of Naples)
- Target: Xen hypervisor
- GitHub: https://github.com/dessertlab/iris

2022

USENIX Security

Morphuzz: Bending (Input) Space to Fuzz Virtual Devices [pdf]
- Authors: Bulekov et al.
- Target: QEMU

MundoFuzz: Hypervisor Fuzzing with Statistical Coverage Testing and Grammar Inference [pdf]
- Authors: Myung et al.
- Target: Multiple hypervisors

EuroSys

Nyx-Net: Network Fuzzing with Incremental Snapshots [pdf]
- Authors: Schumilo et al.
- Target: Network services (extends Nyx framework)
- Findings: Bugs in Lighttpd, MySQL client, Firefox IPC

2021

USENIX Security

Nyx: Greybox Hypervisor Fuzzing using Fast Snapshots and Affine Types [pdf]
- Authors: Schumilo et al.
- Target: QEMU/KVM, bhyve

CCS

V-Shuttle: Scalable and Semantics-Aware Hypervisor Virtual Device Fuzzing [pdf]
- Authors: Pan et al.
- Target: QEMU, VirtualBox
- Findings: 35 new bugs, 17 CVEs

HyperFuzzer: An Efficient Hybrid Fuzzer for Virtual CPUs [pdf]
- Authors: Ge et al. (Microsoft Research)
- Target: Microsoft Hyper-V (vCPU emulation)
- Findings: 11 previously unknown bugs

Black Hat USA

hAFL1: Our Journey of Fuzzing Hyper-V and Discovering a Critical 0-Day [slides]
- Authors: Harpaz & Hadar (Guardicore, SafeBreach)
- Target: Microsoft Hyper-V (vmswitch.sys)
- Findings: CVE-2021-28476 (CVSS 9.9)

2020

NDSS

HYPER-CUBE: High-Dimensional Hypervisor Fuzzing [pdf]
- Authors: Schumilo et al.
- Target: QEMU, VirtualBox, ACRN, bhyve, VMware Fusion
- Findings: 54 novel bugs, 43 CVEs

2017

RAID

VDF: Targeted Evolutionary Fuzz Testing of Virtual Devices [pdf]
- Authors: Henderson et al.
- Target: QEMU

Papers by Testing Target

Virtual Device Testing

Virtual devices are the primary attack surface of hypervisors, exposing interfaces for MMIO/PIO operations, DMA transfers, and interrupt handling.

HYPER-CUBE: High-Dimensional Hypervisor Fuzzing (NDSS ‘20) [pdf]
Nyx: Greybox Hypervisor Fuzzing using Fast Snapshots and Affine Types (USENIX Security ‘21) [pdf]
V-Shuttle: Scalable and Semantics-Aware Hypervisor Virtual Device Fuzzing (CCS ‘21) [pdf]
hAFL1: Our Journey of Fuzzing Hyper-V and Discovering a Critical 0-Day (Black Hat USA ‘21)
Morphuzz: Bending (Input) Space to Fuzz Virtual Devices (USENIX Security ‘22) [pdf]
ViDeZZo: Dependency-aware Virtual Device Fuzzing (S&P ‘23) [pdf]
VD-Guard: DMA Guided Fuzzing for Hypervisor Virtual Device (ASE ‘23) [pdf]
Truman: Constructing Device Behavior Models from OS Drivers to Fuzz Virtual Devices (NDSS ‘25) [pdf]
InSVDF: Interface-State-Aware Virtual Device Fuzzing (ICSE ‘25) [pdf]
VDF: Targeted Evolutionary Fuzz Testing of Virtual Devices (RAID ‘17) [pdf]

vCPU Emulation Testing

vCPU emulation involves instruction decoding, operand handling, privilege checks, and exception injection. Vulnerabilities can cause incorrect guest execution or enable guest-to-host escape.

HyperFuzzer: An Efficient Hybrid Fuzzer for Virtual CPUs (CCS ‘21) [pdf]
HyperMirage: Direct State Manipulation in Hybrid Virtual CPU Fuzzing (NDSS ‘26)

Hypercall and VM-Exit Testing

Hypercalls provide a direct interface for guest-to-hypervisor communication, while VM-exits transfer control to the hypervisor for privileged operations.

MundoFuzz: Hypervisor Fuzzing with Statistical Coverage Testing and Grammar Inference (USENIX Security ‘22) [pdf]
HYPERPILL: Fuzzing for Hypervisor-bugs by Leveraging the Hardware Virtualization Interface (USENIX Security ‘24) [pdf]

Nested Virtualization Testing

Nested virtualization enables running hypervisors inside VMs, introducing additional complexity in VMCS shadowing, nested page table management, and VM-exit handling.

IRIS: A Record and Replay Framework to Enable Hardware-assisted Virtualization Fuzzing (DSN ‘23) [pdf]
NecoFuzz: Effective Fuzzing of Nested Virtualization via Fuzz-Harness Virtual Machines (EuroSys ‘26) [pdf]

Papers by Technique

Coverage-Guided Fuzzing

Approaches that use code coverage feedback to guide input generation and explore new execution paths.

HYPER-CUBE: High-Dimensional Hypervisor Fuzzing (NDSS ‘20) [pdf]
Nyx: Greybox Hypervisor Fuzzing using Fast Snapshots and Affine Types (USENIX Security ‘21) [pdf]
Nyx-Net: Network Fuzzing with Incremental Snapshots (EuroSys ‘22) [pdf]
MundoFuzz: Hypervisor Fuzzing with Statistical Coverage Testing and Grammar Inference (USENIX Security ‘22) [pdf]
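
The coverage-feedback loop that these tools share can be sketched in a few lines. This is a toy illustration, not the implementation of any tool listed here: `toy_target` is a hypothetical stand-in for an instrumented hypervisor component, and the single-byte mutation is far simpler than real mutation strategies.

```python
import random


def toy_target(data: bytes) -> set:
    """Hypothetical instrumented target: returns the set of branch IDs hit."""
    covered = {0}
    if len(data) > 0 and data[0] == 0x7F:
        covered.add(1)
        if len(data) > 1 and data[1] == 0x45:
            covered.add(2)
    return covered


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    pos = rng.randrange(len(buf))
    buf[pos] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed: bytes, iterations: int, rng: random.Random):
    """Keep a mutated input only if it covers a branch not seen before."""
    corpus = [seed]
    global_coverage = set(toy_target(seed))
    for _ in range(iterations):
        parent = rng.choice(corpus)
        child = mutate(parent, rng)
        coverage = toy_target(child)
        if not coverage <= global_coverage:  # child reached something new
            global_coverage |= coverage
            corpus.append(child)
    return corpus, global_coverage


if __name__ == "__main__":
    corpus, coverage = fuzz(b"\x00\x00", 20000, random.Random(0))
    print(len(corpus), sorted(coverage))
```

Retaining inputs that reach new branches lets the fuzzer solve the nested check one byte at a time instead of guessing both bytes at once.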

Grammar and Dependency-Aware Fuzzing

Approaches that leverage protocol specifications, message dependencies, or device behavior models to generate semantically valid inputs.

V-Shuttle: Scalable and Semantics-Aware Hypervisor Virtual Device Fuzzing (CCS ‘21) [pdf]
ViDeZZo: Dependency-aware Virtual Device Fuzzing (S&P ‘23) [pdf]
MundoFuzz: Hypervisor Fuzzing with Statistical Coverage Testing and Grammar Inference (USENIX Security ‘22) [pdf]
Truman: Constructing Device Behavior Models from OS Drivers to Fuzz Virtual Devices (NDSS ‘25) [pdf]
hAFL1: Our Journey of Fuzzing Hyper-V and Discovering a Critical 0-Day (Black Hat USA ‘21)

DMA-Centric Approaches

Approaches that specifically target DMA (Direct Memory Access) handling in virtual devices.

Morphuzz: Bending (Input) Space to Fuzz Virtual Devices (USENIX Security ‘22) [pdf]
VD-Guard: DMA Guided Fuzzing for Hypervisor Virtual Device (ASE ‘23) [pdf]
InSVDF: Interface-State-Aware Virtual Device Fuzzing (ICSE ‘25) [pdf]

Hybrid Fuzzing with Symbolic Execution

Approaches that combine fuzzing with symbolic execution to systematically explore complex code paths.

HyperFuzzer: An Efficient Hybrid Fuzzer for Virtual CPUs (CCS ‘21) [pdf]
- Uses “Nimble Symbolic Execution” with Intel PT for efficient vCPU testing

Trace-Based and Replay Approaches

Approaches that use execution traces or record-and-replay mechanisms.

VDF: Targeted Evolutionary Fuzz Testing of Virtual Devices (RAID ‘17) [pdf]
IRIS: A Record and Replay Framework to Enable Hardware-assisted Virtualization Fuzzing (DSN ‘23) [pdf]

Universal and Black-Box Approaches

Approaches designed to work across multiple hypervisors without requiring source code access or hypervisor-specific modifications.

HYPERPILL: Fuzzing for Hypervisor-bugs by Leveraging the Hardware Virtualization Interface (USENIX Security ‘24) [pdf]
NecoFuzz: Effective Fuzzing of Nested Virtualization via Fuzz-Harness Virtual Machines (EuroSys ‘26) [pdf]

Target Hypervisors Summary

Hypervisor	Papers
QEMU/KVM	HYPER-CUBE, Nyx, Morphuzz, V-Shuttle, ViDeZZo, VD-Guard, Truman, InSVDF, VDF, NecoFuzz
VirtualBox	HYPER-CUBE, V-Shuttle, ViDeZZo, VD-Guard, Truman, NecoFuzz
Hyper-V	HyperFuzzer, hAFL1
Xen	IRIS, NecoFuzz
VMware	HYPER-CUBE (Fusion), Truman (Workstation Pro)
bhyve	HYPER-CUBE, Nyx
ACRN	HYPER-CUBE
Parallels	Truman

Bug Discovery Statistics

Tool	Venue	New Bugs	CVEs
HYPER-CUBE	NDSS ‘20	54	43
V-Shuttle	CCS ‘21	35	17
HyperFuzzer	CCS ‘21	11	-
hAFL1	Black Hat ‘21	1	1 (CVSS 9.9)
ViDeZZo	S&P ‘23	28 (+24 existing)	- (7 patches accepted)
VD-Guard	ASE ‘23	4	3
Truman	NDSS ‘25	54	6
InSVDF	ICSE ‘25	2	1
NecoFuzz	EuroSys ‘26	6	2

Open-Source Tools

Tool	Repository	Status
HYPER-CUBE	RUB-SysSec/hypercube	Available
Nyx	nyx-fuzz/Nyx	Available
Morphuzz	QEMU upstream	Merged
V-Shuttle	hustdebug/v-shuttle	Available
ViDeZZo	HexHive/ViDeZZo	Available
IRIS	dessertlab/iris	Available
Truman	truman	Available

Related Resources

Foundational Tools

kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels (USENIX Security ‘17) [pdf] - Foundation for many hypervisor fuzzers
AFL: American Fuzzy Lop - Core mutation strategies used by many tools
Intel PT: Hardware tracing used for coverage feedback

Seven-Dimensional Taxonomy

We propose a unified taxonomy for classifying hypervisor testing techniques. Each dimension represents an orthogonal design axis.

Dimension	Question	Options
D1: Target	What component is tested?	Virtual devices, Hypercalls/VM-exits, vCPU emulation, Core subsystems
D2: Input Model	What is the input abstraction?	Raw bytes, Structured messages, I/O op sequences, Instruction+CPU state, Full VM state
D3: Input Source	Where do seeds come from?	Pattern/random, Trace-based, Specification-based, Inference-based, Driver-derived
D4: Instrumentation	How is execution observed?	Compile-time, Hardware tracing (Intel PT), Dynamic binary instrumentation, Emulation-based
D5: Feedback	What signals guide fuzzing?	Code coverage, State coverage, Interface coverage, Differential/semantic, Hybrid
D6: Execution & Reset	How is state managed?	VM snapshot, Fork-based (CoW), Full reboot, Nested virtualization
D7: Oracle	What counts as a bug?	Crash/hang, Sanitizers, Invariant violation, Differential divergence
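
One way to make the taxonomy machine-checkable is to record each tool as one choice per dimension and validate it against the option lists in the table. The sketch below assumes a single option per dimension, which is a simplification; `ToolProfile` and the tool named `ExampleFuzz` are hypothetical and do not classify any paper in this collection.

```python
from dataclasses import dataclass

# Allowed options per dimension, copied from the table above.
TAXONOMY = {
    "target": {"Virtual devices", "Hypercalls/VM-exits", "vCPU emulation",
               "Core subsystems"},
    "input_model": {"Raw bytes", "Structured messages", "I/O op sequences",
                    "Instruction+CPU state", "Full VM state"},
    "input_source": {"Pattern/random", "Trace-based", "Specification-based",
                     "Inference-based", "Driver-derived"},
    "instrumentation": {"Compile-time", "Hardware tracing (Intel PT)",
                        "Dynamic binary instrumentation", "Emulation-based"},
    "feedback": {"Code coverage", "State coverage", "Interface coverage",
                 "Differential/semantic", "Hybrid"},
    "reset": {"VM snapshot", "Fork-based (CoW)", "Full reboot",
              "Nested virtualization"},
    "oracle": {"Crash/hang", "Sanitizers", "Invariant violation",
               "Differential divergence"},
}


@dataclass(frozen=True)
class ToolProfile:
    """One tool described along dimensions D1 to D7."""
    name: str
    target: str
    input_model: str
    input_source: str
    instrumentation: str
    feedback: str
    reset: str
    oracle: str

    def __post_init__(self):
        # Reject any value that is not an option listed for its dimension.
        for dimension, options in TAXONOMY.items():
            value = getattr(self, dimension)
            if value not in options:
                raise ValueError(f"{self.name}: {value!r} is not a valid {dimension}")


if __name__ == "__main__":
    example = ToolProfile(
        name="ExampleFuzz",  # hypothetical tool, for illustration only
        target="Virtual devices",
        input_model="I/O op sequences",
        input_source="Pattern/random",
        instrumentation="Compile-time",
        feedback="Code coverage",
        reset="Fork-based (CoW)",
        oracle="Sanitizers",
    )
    print(example)
```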

Design Trade-offs

Four fundamental trade-offs govern hypervisor testing tool design:

Trade-off 1: Generality vs. Depth

Universal fuzzers (HYPERPILL): Work across multiple hypervisors but achieve shallower testing
Specialized fuzzers (V-Shuttle, HyperFuzzer): Achieve deeper testing through target-specific optimizations
Principle: Start broad, then go deep. Use universal approaches for initial assessment, then specialize

Trade-off 2: Structure vs. Speed

Richer input models (grammar-based, driver-derived): More valid inputs but higher generation overhead
Simpler models (raw bytes): Higher throughput but more invalid inputs rejected by parsers
Principle: Match input complexity to protocol complexity

Trade-off 3: Observability vs. Deployability

Maximum observability (emulation-based): 10-100x overhead but universal support
Hardware tracing (Intel PT): <5% overhead but requires specific hardware
Principle: Use minimum sufficient instrumentation

Trade-off 4: Reset Fidelity vs. Throughput

Fork-based (Morphuzz, ViDeZZo): Sub-millisecond reset but only user-space state
Snapshot-based (Nyx): 1-10 ms reset with full VM state isolation
Principle: Isolate what matters. Use fork for device fuzzing and snapshots for cross-device testing
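
A rough calculation shows how reset cost bounds throughput. The sketch assumes a single core, a fixed execution time per test case, and one reset per test case; the 1 ms execution time and the 0.5 ms fork cost are illustrative assumptions (the text above only says "sub-millisecond"), not measurements from any paper.

```python
def execs_per_second(exec_ms: float, reset_ms: float) -> float:
    """Upper bound on executions per second when every test case pays one reset."""
    return 1000.0 / (exec_ms + reset_ms)


if __name__ == "__main__":
    EXEC_MS = 1.0  # assumed time to run one test case
    for label, reset_ms in [("fork, 0.5 ms reset (assumed)", 0.5),
                            ("snapshot, 1 ms reset", 1.0),
                            ("snapshot, 10 ms reset", 10.0)]:
        print(f"{label}: {execs_per_second(EXEC_MS, reset_ms):.0f} execs/s")
```

Under these assumptions the reset dominates once it costs more than the test case itself, so the slower end of the snapshot range only pays off when full VM state isolation is actually needed.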

Open Challenges

Challenge	Current Limitation	Potential Approach
State Space Explosion	Exponential growth in device states	Abstract interpretation, state hashing
Semantic Validity	Manual specification effort doesn’t scale	LLM-assisted inference, driver analysis
Coverage Noise	Non-deterministic signals from interrupts/timers	Statistical filtering, deterministic replay
Cross-Platform Portability	Architecture-specific tools (x86-centric)	Hardware interface abstraction
Scalable Triage	Manual crash analysis at scale	Automated root cause clustering
Emerging Architectures	Limited ARM/RISC-V support	ARM CoreSight, portable frameworks
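
As one concrete reading of the "statistical filtering" entry for coverage noise, the sketch below replays the same input several times and keeps only the edges that appear in a minimum fraction of the runs. The function name and the threshold parameter are assumptions for illustration; this is not the method of any specific paper in the collection.

```python
from collections import Counter


def stable_edges(runs, min_fraction=1.0):
    """Return edges seen in at least min_fraction of repeated runs of one input.

    runs is a list of edge-ID collections, one per replay of the same input.
    """
    if not runs:
        return set()
    counts = Counter(edge for run in runs for edge in set(run))
    threshold = min_fraction * len(runs)
    return {edge for edge, seen in counts.items() if seen >= threshold}


if __name__ == "__main__":
    # Edge 9 and edge 3 appear only in some replays, e.g. due to a timer interrupt.
    replays = [{1, 2, 3}, {1, 2, 9}, {1, 2, 3}]
    print(sorted(stable_edges(replays)))
    print(sorted(stable_edges(replays, min_fraction=0.6)))
```

A strict threshold discards real but rarely triggered edges, so the fraction is a tuning knob rather than a fixed constant.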

Research Gaps by Attack Surface

Attack Surface	Papers	Gap Analysis
Virtual Devices	12 (71%)	Well-studied but complex protocols (NVMe, virtio-gpu) underexplored
vCPU Emulation	2 (12%)	Severely underexplored - instruction set extensions (AVX-512, SGX) untested
Hypercalls/VM-Exit	2 (12%)	Severely underexplored - systematic hypercall sequence testing missing
Core Subsystems	0 (0%)	Completely unexplored - MMU virtualization, scheduling, IOMMU

Evaluation Guidelines

Common Pitfalls (from our survey analysis)

Pitfall	Prevalence	Recommendation
Throughput without coverage context	41%	Report effective coverage rate alongside throughput
Device count without complexity classification	53%	Classify devices by complexity (simple/medium/complex)
CVE count without severity/deduplication	65%	Report bugs with root cause and CVSS severity
Snapshot configuration details omitted	47%	Specify guest memory, timing, enabled devices
Non-standardized time budgets	59%	Use 1h for quick comparison, 24h for thorough evaluation
Missing or inadequate baselines	35%	Compare against at least one prior tool
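
For the deduplication recommendation, a common baseline is to bucket crashes by a hash of the top stack frames. The sketch below assumes crashes are already symbolized into lists of function names; the frame names, the choice of three frames, and the helper names are hypothetical. Stack hashing can both split one root cause across buckets and merge distinct ones, so it complements root cause analysis rather than replacing it.

```python
import hashlib


def crash_bucket(stack_frames, top_n=3):
    """Hash the top_n innermost frames into a short bucket identifier."""
    key = "|".join(stack_frames[:top_n])
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]


def deduplicate(crashes, top_n=3):
    """Group (crash_id, stack_frames) pairs by bucket identifier."""
    buckets = {}
    for crash_id, frames in crashes:
        buckets.setdefault(crash_bucket(frames, top_n), []).append(crash_id)
    return buckets


if __name__ == "__main__":
    # Hypothetical symbolized stacks, innermost frame first.
    crashes = [
        ("crash-1", ["dma_copy", "dev_write", "mmio_dispatch", "main_loop"]),
        ("crash-2", ["dma_copy", "dev_write", "mmio_dispatch", "vcpu_thread"]),
        ("crash-3", ["ring_pop", "dev_read", "mmio_dispatch", "main_loop"]),
    ]
    for bucket, ids in deduplicate(crashes).items():
        print(bucket, ids)
```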

Recommended Reporting Checklist

Category	Required Information
Target	Hypervisor name/version; device list with complexity; commit hash
Configuration	Guest memory size; snapshot timing; enabled devices; instrumentation flags
Metrics	Edge coverage over time; throughput with context; per-device breakdown
Bugs	Deduplication method; root cause classification; severity (CVSS)
Reproducibility	Seeds and configurations; Docker/VM image; expected coverage range
Baselines	At least one prior tool on same targets/budget
Statistics	Multiple runs (>=5); mean and variance; significance tests
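
The Statistics row can be met with the standard library alone when the number of runs is small. The sketch below reports mean and sample variance, the Vargha-Delaney A12 effect size, and an exact two-sided permutation test on the difference of means. The choice of these particular statistics is an assumption (the checklist only asks for "significance tests"), and the coverage numbers in the usage example are made up.

```python
import statistics
from itertools import combinations


def a12(xs, ys):
    """Vargha-Delaney A12: chance that a run from xs beats one from ys (ties count half)."""
    wins = sum((x > y) + 0.5 * (x == y) for x in xs for y in ys)
    return wins / (len(xs) * len(ys))


def permutation_p_value(xs, ys):
    """Exact two-sided permutation test on the absolute difference of means."""
    pooled = list(xs) + list(ys)
    n, m = len(xs), len(ys)
    total = sum(pooled)
    observed = abs(statistics.mean(xs) - statistics.mean(ys))
    extreme = 0
    splits = 0
    # Enumerate every way to relabel n of the pooled runs as "tool A".
    for indices in combinations(range(n + m), n):
        group_sum = sum(pooled[i] for i in indices)
        diff = abs(group_sum / n - (total - group_sum) / m)
        if diff >= observed - 1e-12:
            extreme += 1
        splits += 1
    return extreme / splits


if __name__ == "__main__":
    # Made-up final edge counts from five runs of two hypothetical tools.
    tool_a = [10, 11, 12, 13, 14]
    tool_b = [1, 2, 3, 4, 5]
    print(statistics.mean(tool_a), statistics.variance(tool_a))
    print(a12(tool_a, tool_b), permutation_p_value(tool_a, tool_b))
```

With five runs per tool there are only 252 relabelings, so exact enumeration is cheap; for much larger run counts a sampled permutation test or a library implementation would be the practical choice.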

Contributing

Contributions are welcome:

Adding new papers
Updating paper information (links, findings)
Suggesting improvements to categorization

License

This documentation is licensed under CC BY-NC 4.0. Individual papers retain their original copyrights.
