This repository provides an open-source version of the DOCA GPUNetIO and DOCA Verbs libraries. The features included here are limited to enabling GPUDirect Async Kernel-Initiated (GDAKI) network communication technology over RDMA protocols (InfiniBand and RoCE) using a DOCA-like API in an open-source environment.
The table below highlights the key differences between this DOCA GPUNetIO open-source project and the full DOCA GPUNetIO SDK:
| Item | DOCA Full SDK | DOCA Open Source |
|---|---|---|
| Verbs CPU control path | Closed-source shared library | Open-source C++ files |
| GPUNetIO CPU control path | Closed-source shared library | Open-source C++ files |
| GPUNetIO GPU data path for RDMA Verbs one-sided | Yes | Yes |
| GPUNetIO GPU data path for RDMA Verbs two-sided | Yes | No |
| GPUNetIO GPU data path for Ethernet | Yes | No |
| GPUNetIO GPU data path for DMA | Yes | No |
The Full SDK is more comprehensive and includes additional features that are not part of this open-source release. It is important to note, however, that the CUDA header files for the GPUNetIO Verbs data path are identical between the open-source and full versions.
The overarching goal of DOCA GPUNetIO (both Open Source and Full) is to consolidate multiple GDAKI implementations into a unified driver and library with consistent host- and device-side interfaces. This common foundation can be shared across current and future consumers of GDAKI technology such as NVSHMEM, NCCL, and GPUDirect. This approach promotes knowledge sharing while reducing the engineering effort required for long-term maintenance.
CPU control path:
- Interfaces to create and manage completion queues (CQs) and queue pairs (QPs) in CPU/GPU memory.
- Support for connecting QPs over Reliable Connection (RC) transport.
- Move CQ/QP resources between CPU and GPU memory.
- Compatibility with standard verbs resources (MRs, PDs, context, device attributes, etc.).
GPU data path:
- Device-side APIs to post direct work requests (WRs) and poll completion responses (CQEs).
- Directly ring NIC doorbells from the GPU (update registers).
For a deep dive into features, see the official DOCA GPUNetIO documentation and DOCA Verbs documentation.
To enable GDAKI technology with the DOCA API, an application must be divided into two phases: a CPU control path phase, which initializes devices, allocates memory, and performs other setup tasks, and a GPU data path phase, in which a CUDA kernel is launched and calls GPUNetIO CUDA functions.
CPU control path:
- Open an RDMA device context: ibv_open_device.
- Allocate a PD: ibv_alloc_pd.
- Register memory regions: ibv_reg_mr.
- Create a GPUNetIO handler: doca_gpu_create.
- Create CQ and QP using doca_verbs_* functions.
- Connect QPs with remote peers using doca_verbs_qp_modify.
- Export QPs and CQs to GPU memory using doca_gpu_verbs_export_cq and doca_gpu_verbs_export_qp.
GPU data path:
- Launch a GPU kernel.
- Post work requests using:
  - High-level API in the CUDA header files doca_gpunetio_dev_verbs_onesided.cuh and doca_gpunetio_dev_verbs_counter.cuh (functions starting with doca_gpu_dev_verbs_*).
  - Low-level API (advanced users) in CUDA header files such as doca_gpunetio_dev_verbs_qp.cuh and doca_gpunetio_dev_verbs_cq.cuh (functions such as doca_gpu_dev_verbs_wqe_prepare_* and doca_gpu_dev_verbs_submit).
- Poll completions with doca_gpu_dev_verbs_poll_cq_*.
Mixing high- and low-level APIs is not recommended.
Some systems do not support direct NIC doorbell ringing from GPU SMs. In this case, a CUDA kernel can post WQEs and poll CQEs in GPU memory, but it cannot update the network card registers.
In such scenarios, DOCA GPUNetIO GDAKI can still be used by enabling CPU-assisted mode: the GPU notifies a CPU thread, which rings the NIC doorbell on its behalf. This mode provides a reliable fallback (with lower performance) and requires a CPU thread to periodically call doca_gpu_verbs_cpu_proxy_progress().
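The CPU-assisted mode can be pictured as a simple polling loop running on a dedicated CPU thread. The sketch below is only an illustration of that pattern: run_cpu_proxy_loop and the stop flag are hypothetical names, and because the arguments of doca_gpu_verbs_cpu_proxy_progress() are not described here, the call is hidden behind a generic callback rather than invoked directly.

```cpp
#include <atomic>
#include <cstdint>
#include <functional>

// Hypothetical helper: repeatedly invokes `progress` until `stop` becomes
// true, then returns how many times it was invoked.
// In a real application `progress` would wrap the call to
// doca_gpu_verbs_cpu_proxy_progress(); its arguments are an assumption left
// to the caller, which is why a std::function is used here.
inline std::uint64_t run_cpu_proxy_loop(const std::function<void()>& progress,
                                        const std::atomic<bool>& stop) {
    std::uint64_t iterations = 0;
    while (!stop.load(std::memory_order_acquire)) {
        progress();   // give the proxy a chance to ring pending doorbells
        ++iterations;
    }
    return iterations;
}
```

One plausible way to use it is to start the loop on its own thread before launching the CUDA kernel and to set the stop flag once the kernel has completed, so the proxy keeps making progress for as long as the GPU may post work.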
To build the host-side library libdoca_gpunetio.so:
cd doca-gpunetio
make -j
This generates a lib directory containing the shared library.
Logging is handled by the DOCA_LOG macro, which relies on syslog and supports the following log levels:
- EMERG
- ALERT
- CRIT
- ERR
- WARNING
- NOTICE
- INFO
- DEBUG
By default, the EMERG level (0) is set. To print DOCA_LOG messages at more verbose levels, set the DOCA_GPUNETIO_LOG environment variable to the corresponding level number.
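To make the level numbering concrete, here is a minimal sketch of how a value such as the one in DOCA_GPUNETIO_LOG could be mapped onto the standard syslog priorities (LOG_EMERG = 0 through LOG_DEBUG = 7). It is not the library's implementation: parse_log_level and should_log are hypothetical helpers, and falling back to EMERG on a missing or invalid value is an assumption made for the example.

```cpp
#include <cstdlib>
#include <syslog.h>

// Hypothetical helper: parses a log-level string such as the value of
// DOCA_GPUNETIO_LOG. Returns LOG_EMERG (0) when the value is missing or is
// not a number in the range 0..7 (assumed fallback, not confirmed behavior).
inline int parse_log_level(const char* value) {
    if (value == nullptr || *value == '\0') return LOG_EMERG;
    char* end = nullptr;
    long level = std::strtol(value, &end, 10);
    if (*end != '\0' || level < LOG_EMERG || level > LOG_DEBUG) return LOG_EMERG;
    return static_cast<int>(level);
}

// syslog priorities become less severe as the number grows, so a message is
// printed when its level is numerically at or below the configured threshold.
inline bool should_log(int message_level, int threshold) {
    return message_level <= threshold;
}
```

With this scheme, calling parse_log_level(std::getenv("DOCA_GPUNETIO_LOG")) with the variable set to 6 yields LOG_INFO, which lets INFO and all more severe messages through while filtering out DEBUG.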
Two examples are included to demonstrate usage and measure performance.
Make sure to build libdoca_gpunetio.so before compiling examples.
All examples require both a client and a server running on network-connected machines. GPU timers can be enabled per operation by setting #define KERNEL_DEBUG_TIMES 1 (useful for debugging, not recommended for performance testing).
Additional samples are available in the NVIDIA DOCA Full Samples repository.
The following command lines assume the samples run on systems where the GPU is at PCIe address 8A:00.0 and the NIC interface is mlx5_0.
This example is a GDAKI benchmark similar to perftest ib_write_bw, in which the client launches a CUDA kernel to execute the high-level doca_gpu_dev_verbs_put operation.
The server does not launch any CUDA kernel: when the user presses Ctrl+C, the server validates the data received from the client.
Build:
cd doca-gpunetio/examples/gpunetio_verbs_put_bw
make -j
Run (server):
LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/path/to/doca-gpunetio/lib DOCA_GPUNETIO_LOG=6 ./gpunetio_verbs_put_bw -g 8A:00.0 -d mlx5_0
Run (client):
LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/path/to/doca-gpunetio/lib DOCA_GPUNETIO_LOG=6 ./gpunetio_verbs_put_bw -g 8A:00.0 -d mlx5_0 -c 192.168.1.64
Modes:
- CUDA Thread execution scope (default).
- CUDA Warp execution scope: add -e 1.
- CPU proxy mode: add -p 1.
Validation success message (server):
Validation successful! Data received correctly from client.
This example is a GDAKI benchmark similar to perftest ib_write_lat, in which both the client and the server launch CUDA kernels that use the low-level APIs.
Build:
cd doca-gpunetio/examples/gpunetio_verbs_write_lat
make -j
Run (server):
LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/path/to/doca-gpunetio/lib DOCA_GPUNETIO_LOG=6 ./gpunetio_verbs_write_lat -g 8A:00.0 -d mlx5_0
Run (client):
LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/path/to/doca-gpunetio/lib DOCA_GPUNETIO_LOG=6 ./gpunetio_verbs_write_lat -g 8A:00.0 -d mlx5_0 -c <server_ip_address>
If you use this software in your work, please cite the official DOCA GPUNetIO documentation.
This project is developed internally and released as open source. We currently do not accept external contributions.
We appreciate community discussion and feedback in support of DOCA GPUNetIO Open users and developers. We ask that users:
- Review the DOCA SDK Programming Guide for system configuration, technology explanation, API details, and more.
- Ask questions on the NVIDIA DOCA Support Forum.
- Report issues on the GitHub Issues board.
See the LICENSE.txt file.