A synthetic benchmarking tool to measure peak capabilities of vulkan devices. It only measures the peak metrics that can be achieved using vector operations and does not represent a real-world use case.
Download Windows/Linux/MacOS Executable for Intel/AMD/Nvidia/Apple GPU
https://github.com/nihui/vkpeak/releases
vkpeak.exevkpeak will choose the default vulkan device.
If you need to specify device id, then
vkpeak.exe 0The only parameter 0 is the device id.
If you encounter a crash or error, try upgrading your GPU driver:
- Intel: https://downloadcenter.intel.com/product/80939/Graphics-Drivers
- AMD: https://www.amd.com/en/support
- NVIDIA: https://www.nvidia.com/Download/index.aspx
- Clone this project with all submodules
git clone https://github.com/nihui/vkpeak.git
cd vkpeak
git submodule update --init --recursive- Build with CMake
- You can pass -DVulkan_LIBRARY=<path to your macOS/lib/MoltenVK.xcframework/macos-arm64_x86_64/libMoltenVK.a> option to link static MoltenVK library on MacOS, MoltenVK is part of Vulkan SDK from https://vulkan.lunarg.com/
mkdir build
cd build
cmake ..
cmake --build . -j 4NVIDIA RTX5060Ti 16GB
device = NVIDIA GeForce RTX 5060 Ti
fp32-scalar = 17137.46 GFLOPS
fp32-vec4 = 16910.07 GFLOPS
fp16-scalar = 12730.03 GFLOPS
fp16-vec4 = 12715.02 GFLOPS
fp16-matrix = 101485.35 GFLOPS
fp64-scalar = 398.59 GFLOPS
fp64-vec4 = 394.08 GFLOPS
int32-scalar = 12703.68 GIOPS
int32-vec4 = 12181.98 GIOPS
int16-scalar = 12690.05 GIOPS
int16-vec4 = 12208.29 GIOPS
int64-scalar = 3104.59 GIOPS
int64-vec4 = 2666.86 GIOPS
int8-dotprod = 16101.59 GIOPS
int8-matrix = 202947.80 GIOPS
bf16-dotprod = 0.00 GFLOPS
bf16-matrix = 0.00 GFLOPS
fp8-matrix = 0.00 GFLOPS
bf8-matrix = 0.00 GFLOPS
copy-h2h = 18.17 GBPS
copy-h2d = 17.93 GBPS
copy-d2h = 18.09 GBPS
copy-d2d = 190.70 GBPS
AMD RX9060XT 16GB
device = AMD Radeon Graphics (RADV GFX1200)
fp32-scalar = 17606.54 GFLOPS
fp32-vec4 = 12155.22 GFLOPS
fp16-scalar = 16921.16 GFLOPS
fp16-vec4 = 27833.48 GFLOPS
fp16-matrix = 105337.66 GFLOPS
fp64-scalar = 442.80 GFLOPS
fp64-vec4 = 437.55 GFLOPS
int32-scalar = 2804.59 GIOPS
int32-vec4 = 2796.74 GIOPS
int16-scalar = 15034.62 GIOPS
int16-vec4 = 26356.38 GIOPS
int64-scalar = 932.14 GIOPS
int64-vec4 = 768.53 GIOPS
int8-dotprod = 53893.32 GIOPS
int8-matrix = 194476.41 GIOPS
bf16-dotprod = 24427.68 GFLOPS
bf16-matrix = 105099.82 GFLOPS
fp8-matrix = 205061.72 GFLOPS
bf8-matrix = 208234.02 GFLOPS
copy-h2h = 21.05 GBPS
copy-h2d = 21.17 GBPS
copy-d2h = 23.70 GBPS
copy-d2d = 145.23 GBPS
- https://github.com/Tencent/ncnn for fast neural network inference on ALL PLATFORMS