nbd-vram

This is `alpha` software. `USE AT OWN RISK`

If you run into trouble booting at any point, switch to a TTY session and run the uninstall script (sudo bash uninstall.sh) from the directory you originally installed NBD-VRAM from. That will get you back to a working state and give you the opportunity to file an issue for me to take a look at. Please include as much information as possible in your bug report: system specifications, display adapter configuration (single GPU, hybrid, modes, etc.), and concise logs where possible.

Use your NVIDIA GPU's VRAM as swap space on Linux.

Built for hybrid graphics laptops with soldered memory and no upgrade path. The display runs off the integrated AMD/ATI GPU. The NVIDIA card sits idle most of the time, its VRAM completely unused. This puts that VRAM to work as high-priority swap.

Tested on: AMD/ATI + RTX 3070 Laptop (GA104M, 16 GB RAM, 8 GB VRAM), driver 580.159.03, kernel 6.17, Pop!_OS. Allocated 7 GB for swap. End result including zram and SSD swap: ~46 GB total addressable memory, tripled from stock. Overflow order: RAM fills, then VRAM absorbs the spill (PCIe), then zram compresses the rest (CPU), then SSD only if everything else is exhausted.

How it works

A small daemon allocates VRAM via the CUDA driver API, then serves it as a block device using the NBD (Network Block Device) protocol over a Unix socket. The kernel's built-in nbd driver connects to it and exposes /dev/nbdX. From there it's a normal swap device.

Data path: kernel swap subsystem -> /dev/nbdX -> nbd kernel driver -> Unix socket -> nbd-vram daemon -> cuMemcpyHtoD/DtoH -> GPU VRAM.
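
To make the data path concrete, here is a minimal sketch of what a daemon in this position does for each request: decode an NBD request header, then copy bytes into or out of the backing store. It is written in Python for brevity and is not the project's C code. A bytearray stands in for VRAM (where the daemon would call cuMemcpyHtoD/cuMemcpyDtoH), and handle_request is a hypothetical name. The header layout and magic numbers follow the published NBD protocol; the handshake and the flush, trim, and disconnect commands are left out.

```python
import struct

# Wire constants from the NBD protocol (all fields are big-endian).
NBD_REQUEST_MAGIC = 0x25609513
NBD_SIMPLE_REPLY_MAGIC = 0x67446698
NBD_CMD_READ = 0
NBD_CMD_WRITE = 1
NBD_EINVAL = 22

# Request: magic, command flags, type, cookie (handle), offset, length = 28 bytes.
REQUEST = struct.Struct(">IHHQQI")
# Simple reply: magic, error, cookie = 16 bytes.
REPLY = struct.Struct(">IIQ")


def handle_request(store: bytearray, header: bytes, payload: bytes = b"") -> bytes:
    """Serve one NBD read or write against `store` and return the reply bytes."""
    magic, _flags, cmd, cookie, offset, length = REQUEST.unpack(header)
    if magic != NBD_REQUEST_MAGIC:
        raise ValueError("bad NBD request magic")

    def reply(error: int, data: bytes = b"") -> bytes:
        return REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, error, cookie) + data

    if offset + length > len(store):
        return reply(NBD_EINVAL)  # request runs past the end of the device
    if cmd == NBD_CMD_WRITE:
        if len(payload) != length:
            return reply(NBD_EINVAL)
        store[offset:offset + length] = payload  # daemon: host-to-device copy
        return reply(0)
    if cmd == NBD_CMD_READ:
        # daemon: device-to-host copy, then send the data after the reply header
        return reply(0, bytes(store[offset:offset + length]))
    return reply(NBD_EINVAL)  # other commands are not handled in this sketch


if __name__ == "__main__":
    vram = bytearray(8192)  # stand-in for the VRAM allocation
    page = b"\xab" * 4096
    write = REQUEST.pack(NBD_REQUEST_MAGIC, 0, NBD_CMD_WRITE, 1, 4096, len(page))
    read = REQUEST.pack(NBD_REQUEST_MAGIC, 0, NBD_CMD_READ, 2, 4096, len(page))
    handle_request(vram, write, page)
    assert handle_request(vram, read)[REPLY.size:] == page
    print("round trip ok")
```

The cookie is echoed back unchanged so the kernel can match replies to requests, which is what allows several requests to be in flight at once.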

No kernel module to write or maintain. No NVIDIA kernel symbols. Survives kernel and driver updates without rebuilding anything.

What it's for

NBD-VRAM is a free, low-wear tier of swap that makes a machine feel better under memory pressure. The benchmarks below don't fully capture why, because what you actually feel day to day is latency. When a process touches a paged-out page it stalls until swap answers: on an NVMe waking from a power-saving sleep that stall is milliseconds, felt as a stutter; on VRAM it is microseconds, with no stutter at all. It is the same idea as any swap, but the machine stays smoother, and the memory it uses was otherwise sitting idle.

Best of all, there is nothing to tune: the thread count auto-scales to your CPU.

Three main situations where you can benefit from NBD-VRAM:

Sporadic pressure: a background app paging in, or switching back to something you left open hours ago. Single faults are answered in ~250 us instead of milliseconds, so no stutter.

Concurrent pressure: parallel compiles (make -j), dozens of browser tabs, VMs, data jobs spilling RAM. Many CPUs fault at once, and the daemon's threads fan the I/O across all its connections and keep up.

Spare your SSD: swap is write-heavy, and sending that churn to VRAM instead of finite NAND saves write cycles, which matters most on the soldered-storage laptops this is built for.

Why not the NVIDIA P2P API?

The "obvious" approach is nvidia_p2p_get_pages_persistent, which pins VRAM pages in BAR1 so the CPU can access them directly via ioremap_wc. Every existing project that tried this route has hit the same wall: the NVIDIA driver returns EINVAL on consumer GeForce GPUs. That holds for both the persistent and non-persistent variants and for both flag values. It's gated at the RM level for Quadro/datacenter SKUs only, regardless of driver version.

The other approach - directly ioremap_wc the BAR1 physical address without going through the P2P API - also doesn't work. The GPU's internal page tables only have ~16 MiB of BAR1 mapped (just the display framebuffer). Reads from the rest return zeros. mkswap appears to succeed, then swapon fails because the swap header isn't actually there.

The NBD approach sidesteps all of this. cuMemcpyHtoD and cuMemcpyDtoH work on any CUDA GPU without any special permissions.

Requirements

NVIDIA GPU with CUDA support (any consumer RTX/GTX card)
NVIDIA driver with libcuda.so.1 (no CUDA toolkit needed)
Linux kernel 3.0+ (nbd module, built into most distros); the PR_SET_IO_FLUSHER safeguard described under Memory safety needs 5.6 or newer
nbd-client package
gcc, make

Install

git clone https://github.com/c0dejedi/nbd-vram
cd nbd-vram
sudo ./install.sh
sudo systemctl start vram-swap-nbd

Verify:

swapon --show
# NAME       TYPE      SIZE USED PRIO
# /dev/nbd0  partition   7G   0B 1500

The service is enabled on install, so it comes up automatically on every boot.

Configuration

Edit /etc/systemd/system/vram-swap-nbd.service:

# how much VRAM to use
Environment=VRAM_SETUP_SIZE_MB=7168
# swap priority (higher = used first)
Environment=VRAM_SWAP_PRIORITY=1500
# worker threads; install.sh sets this to nproc
Environment=VRAM_NBD_THREADS=8
# nbd connections; keep equal to threads
Environment=VRAM_NBD_CONNECTIONS=8

The daemon tries the requested size first and backs off in 512 MiB steps if the GPU is short on memory - so it will grab as much as it can even if the display compositor is already loaded. VRAM_SETUP_SIZE_MB is the ceiling, not a hard requirement.
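
The back-off described above can be sketched in a few lines. This is an illustration in Python, not the daemon's C code: try_alloc is a hypothetical callback standing in for the real VRAM allocation call, and giving up with 0 once the size reaches zero is an assumption, since the text does not say where the daemon stops.

```python
from typing import Callable

STEP_MIB = 512  # back-off step stated in the text


def allocate_with_backoff(requested_mib: int,
                          try_alloc: Callable[[int], bool],
                          step_mib: int = STEP_MIB) -> int:
    """Return the first size in MiB that try_alloc accepts, or 0 if none does."""
    size = requested_mib
    while size > 0:
        if try_alloc(size):
            return size
        size -= step_mib  # the GPU is short on memory: ask for 512 MiB less
    return 0


if __name__ == "__main__":
    free_mib = 6000  # pretend the compositor already holds the rest
    granted = allocate_with_backoff(7168, lambda mib: mib <= free_mib)
    print(f"requested 7168 MiB, got {granted} MiB")
```

With 6000 MiB free, the attempts are 7168, 6656, 6144 and finally 5632 MiB, which is the first size that fits.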

VRAM_NBD_THREADS / VRAM_NBD_CONNECTIONS are auto-set to nproc at install and should match each other. More connections let the daemon drain concurrent swap I/O in parallel; the benefit saturates around your physical core count, and single-stream workloads do not use it at all (see Performance).

After changing, run sudo systemctl daemon-reload && sudo systemctl restart vram-swap-nbd.
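
Since the thread and connection counts should match, a quick check before restarting can catch a half-edited unit file. The sketch below is a convenience helper, not part of the project: parse_environment and threads_match_connections are hypothetical names, and the parser only understands simple unquoted KEY=VALUE words, ignoring anything from a # token onward.

```python
def parse_environment(unit_text: str) -> dict:
    """Collect KEY=VALUE pairs from the Environment= lines of a unit file."""
    env = {}
    for line in unit_text.splitlines():
        line = line.strip()
        if not line.startswith("Environment="):
            continue
        for token in line[len("Environment="):].split():
            if token.startswith("#"):
                break  # ignore a trailing annotation
            key, sep, value = token.partition("=")
            if sep:
                env[key] = value
    return env


def threads_match_connections(env: dict) -> bool:
    """True when both settings are present and equal."""
    threads = env.get("VRAM_NBD_THREADS")
    return threads is not None and threads == env.get("VRAM_NBD_CONNECTIONS")


if __name__ == "__main__":
    unit_path = "/etc/systemd/system/vram-swap-nbd.service"
    try:
        with open(unit_path) as fh:
            print(threads_match_connections(parse_environment(fh.read())))
    except FileNotFoundError:
        print("unit file not found; is NBD-VRAM installed?")
```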

Power management

The installer asks whether to enable power-aware management on first install. If enabled, the service automatically stops when you unplug from AC (or when battery drops below a threshold), and restarts when power is restored. Manual systemctl stop is always respected and won't be overridden.

To change settings after install, edit /etc/nbd-vram.conf. Changes take effect on the next poll (within 60 seconds) or immediately on the next AC plug/unplug event.
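
For readers curious what a power check of this kind involves, here is a rough sketch under stated assumptions. It does not reproduce nbd-vram-power-check.sh or the keys in /etc/nbd-vram.conf. It reads the standard sysfs power_supply attributes (type, online, capacity); read_power_state, should_run and min_battery_percent are hypothetical names, and the rule "run on AC, otherwise only above the threshold if one is set" is one reading of the behaviour described above. Chargers that sysfs reports with a type other than Mains are ignored here.

```python
from pathlib import Path
from typing import Optional, Tuple


def read_power_state(root: Path = Path("/sys/class/power_supply")) -> Tuple[bool, Optional[int]]:
    """Return (on_ac, battery_percent); battery_percent is None without a battery."""
    mains_online = False
    battery_percent = None
    for supply in sorted(root.glob("*")):
        type_file = supply / "type"
        if not type_file.is_file():
            continue
        kind = type_file.read_text().strip()
        if kind == "Mains":
            online = supply / "online"
            if online.is_file() and online.read_text().strip() == "1":
                mains_online = True
        elif kind == "Battery":
            capacity = supply / "capacity"
            if capacity.is_file():
                battery_percent = int(capacity.read_text().strip())
    # A machine with no battery at all is treated as permanently on AC.
    return (mains_online or battery_percent is None), battery_percent


def should_run(on_ac: bool, battery_percent: Optional[int],
               min_battery_percent: Optional[int] = None) -> bool:
    """Decide whether VRAM swap should be active for the given power state."""
    if on_ac:
        return True
    if min_battery_percent is None or battery_percent is None:
        return False  # no threshold configured: stop as soon as AC is unplugged
    return battery_percent >= min_battery_percent


if __name__ == "__main__":
    on_ac, pct = read_power_state()
    print(on_ac, pct, should_run(on_ac, pct, min_battery_percent=30))
```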

Using the GPU at the same time

NBD-VRAM uses your VRAM, so while it is active that memory is not available to anything else on the card. The installer asks how much to allocate and suggests an amount based on your setup: if this card does not drive a display (a hybrid laptop, or a workstation with a separate display GPU) it recommends nearly all of the VRAM, since the card is otherwise idle; if it does drive your display, it leaves headroom for the desktop and games. Whatever you choose, a GPU app started afterwards only gets the slice that is left, and if memory is being swapped while the GPU is busy the card does both jobs at once - rendering and serving swap copies - and neither is happy.

If you want to use the GPU heavily at the same time, pick one:

Leave headroom: choose a smaller allocation when the installer asks, or change VRAM_SETUP_SIZE_MB in /etc/systemd/system/vram-swap-nbd.service later (e.g. 4096 keeps 4 GB for the GPU).

Stop it while you need the card: run sudo systemctl stop vram-swap-nbd, then start it again afterwards. Swapped pages migrate back to RAM and other swap first.

This is the natural trade-off of swapping to VRAM: it is free memory, right up until you want the GPU for something else.

Smoke test (without installing)

sudo bash test-nbd.sh

Allocates VRAM, connects the NBD device, does a 1 MiB write/readback check, activates swap, then prints teardown instructions. install.sh handles teardown automatically if a test instance is running.

To stress the full partition after the smoke test passes:

sudo bash test-fill.sh

Writes the entire VRAM partition with zeros, verifies a sample read back, then auto-restores swap on exit.
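
The write/readback idea behind both scripts is simple enough to sketch. The version below is an illustration, not a copy of test-nbd.sh or test-fill.sh: write_readback_ok is a hypothetical name, and the demo runs against a temporary file rather than a block device. Against a real device, a read that follows a write may be served from the page cache, so a strict check would bypass the cache; that step is omitted here. The check overwrites data, so never point it at a device that is currently active as swap.

```python
import os
import tempfile


def write_readback_ok(path: str, size: int = 1 << 20, offset: int = 0) -> bool:
    """Write `size` random bytes at `offset`, read them back, and compare."""
    pattern = os.urandom(size)
    fd = os.open(path, os.O_RDWR)
    try:
        written = 0
        while written < size:  # pwrite may write fewer bytes than asked
            written += os.pwrite(fd, pattern[written:], offset + written)
        os.fsync(fd)
        chunks = []
        got = 0
        while got < size:  # pread may return short reads as well
            chunk = os.pread(fd, size - got, offset + got)
            if not chunk:
                break  # hit end of file or device early
            chunks.append(chunk)
            got += len(chunk)
        return b"".join(chunks) == pattern
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as scratch:
        print("1 MiB readback ok:", write_readback_ok(scratch.name))
```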

Memory safety

The daemon backs a swap device, which creates two subtle deadlock risks under heavy memory pressure. Both froze early builds; both are now handled. Together they are why the daemon stays responsive while the entire VRAM swap fills at zero free RAM.

The kernel must never page out the daemon's own memory, or a fault on it would route back through the busy daemon and hang. The daemon pins all its pages in RAM with mlockall(MCL_CURRENT | MCL_FUTURE).
While serving a swap write the daemon still needs to allocate memory, and at zero free RAM that allocation would normally trigger reclaim - which is itself waiting on the very write the daemon is doing. The daemon marks itself with prctl(PR_SET_IO_FLUSHER) (the same mechanism the kernel uses for the nbd socket, and what NFS-Ganesha and libfuse use) so its allocations never fall into that trap. It also runs with OOMScoreAdjust=-1000, so the kernel never kills it under pressure.

The daemon is multi-threaded - one worker and one connection per CPU - so it keeps up with concurrent swap traffic instead of saturating and stalling the system.
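
Two of these safeguards can be observed from outside the daemon. A process that has called mlockall reports a non-zero VmLck value in /proc/<pid>/status, and OOMScoreAdjust=-1000 shows up in /proc/<pid>/oom_score_adj. The sketch below reads both; locked_kib and check_daemon are hypothetical helper names, and the io-flusher flag is not checked because it is not exposed in these two files.

```python
import re
from pathlib import Path


def locked_kib(status_text: str) -> int:
    """Extract the VmLck value (KiB of locked memory) from /proc/<pid>/status text."""
    match = re.search(r"^VmLck:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else 0


def check_daemon(pid: int, proc: Path = Path("/proc")) -> dict:
    """Report locked memory and OOM score adjustment for one process."""
    base = proc / str(pid)
    return {
        "locked_kib": locked_kib((base / "status").read_text()),
        "oom_score_adj": int((base / "oom_score_adj").read_text()),
    }


if __name__ == "__main__":
    import os
    # Inspect this Python process as a harmless demo; pass the daemon's PID instead.
    print(check_daemon(os.getpid()))
```

The daemon's PID can be obtained with systemctl show -p MainPID --value vram-swap-nbd. For a healthy daemon the expectation is a locked_kib greater than zero and an oom_score_adj of -1000.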

Performance

Tested on RTX 3070 Laptop (8 GB VRAM), Ryzen 9 5900HX, kernel 6.17, Pop!_OS, against NVMe cryptswap (dm-crypt, PCIe 4.0). I/O used O_DIRECT. Each test was run 3 times; the numbers and the gif for each are from a representative (median) run.

NBD-VRAM turns otherwise-idle VRAM into a fast, zero-wear tier of swap. Its strengths are the ones that matter for everyday use: microsecond latency for the sporadic page faults that make a machine feel laggy, no SSD wear, and stable behaviour under heavy pressure. It sits above your SSD swap in priority, so it absorbs pressure first - for free.

Run any of these yourself (state is restored on exit; fio/ioping auto-install):

sudo bash benchmarks/bench-latency.sh         # per-operation latency
sudo bash benchmarks/bench-iops-parallel.sh   # 4K IOPS under concurrent load
sudo bash benchmarks/bench-iops.sh            # 4K IOPS, light/sporadic access
sudo bash benchmarks/bench-throughput.sh      # sequential dd
sudo bash benchmarks/bench-pressure.sh        # survival under heavy pressure

Latency

Device	min	avg	max
NVMe	115 us	8.7 ms	10.1 ms
NBD-VRAM	90 us	257 us	437 us

About 34x lower average latency. An NVMe drive sleeps between sporadic requests (APST power saving) and wakes cold almost every time, paying a multi-millisecond penalty. VRAM has no power states - it answers in microseconds, every time. This is the case that dominates real desktop use: memory pressure is usually individual 4K page faults arriving seconds apart, each one stalling a process until swap responds. At ~9 ms per fault you feel it; at 257 us you don't.
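
A quick back-of-the-envelope calculation shows where the ratio comes from and how the stalls add up. It uses only the averages from the table and assumes, as a simplification, that faults are served one after another and each costs the average latency; total_stall_s is a hypothetical helper name.

```python
NVME_AVG_S = 8.7e-3   # 8.7 ms average from the table
VRAM_AVG_S = 257e-6   # 257 us average from the table


def total_stall_s(faults: int, avg_latency_s: float) -> float:
    """Time a process spends stalled if `faults` page faults are served in turn."""
    return faults * avg_latency_s


ratio = NVME_AVG_S / VRAM_AVG_S

if __name__ == "__main__":
    print(f"average latency ratio: {ratio:.1f}x")
    for name, avg in (("NVMe", NVME_AVG_S), ("NBD-VRAM", VRAM_AVG_S)):
        print(f"{name}: 1000 faults stall for {total_stall_s(1000, avg):.3f} s")
```

Under that assumption, a thousand sporadic faults cost about 8.7 s of stalls on the NVMe and about 0.26 s on VRAM.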

Concurrent 4K IOPS

Under concurrent pressure - every CPU faulting at once, as in a parallel build or a wall of browser tabs - the daemon's worker threads spread requests across all its connections:

Device	IOPS	bandwidth
NVMe	240k	936 MiB/s
NBD-VRAM	312k	1219 MiB/s

Multi-threading is what gets NBD-VRAM here: single-threaded it manages ~77k on this workload, multi-threaded ~312k - a ~4x gain, and well past what swap demand needs. NVMe's concurrent IOPS swings hard with drive temperature: throttled under this sustained run it fell to 240k (below NBD-VRAM), while a cool drive measures far higher (~860k). NBD-VRAM holds steady around 310k either way.
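
The IOPS and bandwidth columns are two views of the same measurement, since every operation moves one 4 KiB block. The short sketch below does the conversion and the speedup arithmetic; bandwidth_mib_s is a hypothetical helper name, and the IOPS inputs are the rounded figures quoted above, so results can differ slightly from the table.

```python
BLOCK_BYTES = 4096  # one 4K page per operation


def bandwidth_mib_s(iops: float, block_bytes: int = BLOCK_BYTES) -> float:
    """Convert an IOPS figure to MiB/s for a fixed block size."""
    return iops * block_bytes / 2**20


speedup = 312_000 / 77_000  # multi-threaded vs single-threaded daemon

if __name__ == "__main__":
    print(f"NBD-VRAM: {bandwidth_mib_s(312_000):.0f} MiB/s")
    print(f"NVMe:     {bandwidth_mib_s(240_000):.0f} MiB/s")
    print(f"multi-threading gain: {speedup:.2f}x")
```

312k IOPS works out to about 1219 MiB/s, matching the table. 240k IOPS works out to about 938 MiB/s, which suggests the measured NVMe figure was slightly under 240k before rounding.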

Single-stream 4K IOPS

Device	read IOPS	write IOPS	bandwidth
NVMe	59k	59k	229 MiB/s
NBD-VRAM	42k	42k	165 MiB/s

One process faulting at a time - the light case. Only one request is ever in flight, so thread count makes no difference here, and both devices are far quicker than sporadic access needs.

Sequential throughput

Device	write	read
NVMe	2.8 GB/s	3.1 GB/s
NBD-VRAM	1.9 GB/s	2.7 GB/s

Big sequential streams are an NVMe's home turf, and they are not how swap behaves - the kernel moves random 4K pages, not multi-megabyte streams. Shown for completeness; ~2 GB/s is plenty for swapping a stream back in.

Surviving heavy pressure

Fill the entire 7 GB VRAM swap while free RAM sits at zero, and the machine stays responsive - no freeze, no deadlock. Earlier single-threaded builds hard-froze here; the multi-threaded daemon plus the two safeguards in Memory safety keep it alive.

GPU under load stress test

The nastier test I ran was using the GPU while it was swapping: a 3D render plus a CUDA compute load on the NVIDIA card, with RAM driven to zero so swap floods the same VRAM. It degrades gracefully rather than falling over - the GPU app keeps rendering (the card pegged near 100%), the daemon keeps serving swap, and the machine stays usable, just laggy. Nothing crashed.

The only hard limit is capacity, not stability: while the daemon holds its VRAM, a GPU app gets only what is left, so a large allocation simply fails to start. That is a tuning question, not a crash - see Using the GPU at the same time.

SSD wear

Swap is write-heavy, and SSD NAND has a finite number of write cycles. Sending that churn to VRAM (DRAM-like, no wear) instead of your SSD spares its endurance - which matters most on exactly the soldered-everything laptops this is built for.

Uninstall

sudo bash uninstall.sh

License

MIT - Sean Lobjoit (c0dejedi)
