Overview

This project is a work-in-progress. Everything is subject to change.

NVBench is a C++17 library designed to simplify CUDA kernel benchmarking. It features:

- Parameter sweeps: A powerful and flexible "axis" system explores a kernel's configuration space. Parameters may be dynamic numbers/strings or static types.
- Runtime customization: A rich command-line interface allows redefining parameter axes, selecting CUDA devices, changing output formats, and more.
- Throughput calculations: Compute and report:
  - Item throughput (elements/second)
  - Global memory bandwidth usage (bytes/second and per-device percentage of peak bandwidth)
- Multiple output formats: Currently supports markdown (default) and CSV output.
- Manual timer mode: (Optional) Explicitly start/stop timing in a benchmark implementation.
- Multiple measurement types:
  - Cold Measurements:
    - Each sample runs the benchmark once with a clean device L2 cache.
    - GPU and CPU times are reported.
  - Batch Measurements:
    - Executes the benchmark multiple times back-to-back and records total time.
    - Reports the average execution time (total time / number of executions).
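The arithmetic behind the throughput and batch figures listed above can be sketched in plain C++. This is only an illustration of the formulas, not NVBench's implementation: the `measurement` struct, its fields, and the helper functions are hypothetical names introduced here, and the peak bandwidth is assumed to be supplied by the caller.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record of one batch measurement; not an NVBench type.
struct measurement {
  std::int64_t elements;     // items processed per execution
  std::int64_t bytes_moved;  // global memory bytes read + written per execution
  double total_seconds;      // total time for all back-to-back executions
  std::int64_t executions;   // number of executions in the batch
};

// Average execution time: total time / number of executions.
double average_seconds(const measurement& m) {
  assert(m.executions > 0);
  return m.total_seconds / static_cast<double>(m.executions);
}

// Item throughput in elements/second.
double item_throughput(const measurement& m) {
  return static_cast<double>(m.elements) / average_seconds(m);
}

// Global memory bandwidth in bytes/second.
double bandwidth(const measurement& m) {
  return static_cast<double>(m.bytes_moved) / average_seconds(m);
}

// Achieved bandwidth as a percentage of an assumed device peak (bytes/second).
double percent_of_peak(const measurement& m, double peak_bytes_per_second) {
  assert(peak_bytes_per_second > 0.0);
  return 100.0 * bandwidth(m) / peak_bytes_per_second;
}
```

For example, a batch of 4 executions taking 0.5 seconds in total averages 0.125 seconds per execution; if each execution processes 1024 elements and moves 4096 bytes, that works out to 8192 elements/second and 32768 bytes/second.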

Getting Started

Minimal Benchmark

A basic kernel benchmark can be created with just a few lines of CUDA C++:

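#include <nvbench/nvbench.cuh>

// my_kernel and num_blocks are placeholders for your own kernel and grid size.
void my_benchmark(nvbench::state& state) {
  // NVBench times the work submitted inside the lambda passed to exec().
  state.exec([](nvbench::launch& launch) {
    // Launch on the stream that NVBench provides.
    my_kernel<<<num_blocks, 256, 0, launch.get_stream()>>>();
  });
}
NVBENCH_BENCH(my_benchmark);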

See Benchmarks for information on customizing benchmarks and implementing parameter sweeps.
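To give a feel for what a parameter sweep covers, the sketch below enumerates a configuration space in plain C++, assuming a sweep visits every combination of axis values. It does not use NVBench's API: `int_axis`, `config`, and `enumerate_configs` are hypothetical names, and the iteration order (last axis varies fastest) is an arbitrary choice for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a named axis of integer values; not an NVBench type.
struct int_axis {
  std::string name;
  std::vector<std::int64_t> values;
};

// One configuration: one value per axis, in axis order.
using config = std::vector<std::int64_t>;

// Returns every combination of axis values (the Cartesian product).
// The last axis varies fastest. Returns an empty list if there are no axes
// or if any axis has no values.
std::vector<config> enumerate_configs(const std::vector<int_axis>& axes) {
  std::vector<config> result;
  if (axes.empty()) { return result; }
  for (const auto& axis : axes) {
    if (axis.values.empty()) { return result; }
  }

  // idx[i] is the position of the current value within axes[i].
  std::vector<std::size_t> idx(axes.size(), 0);
  while (true) {
    config current;
    current.reserve(axes.size());
    for (std::size_t i = 0; i < axes.size(); ++i) {
      current.push_back(axes[i].values[idx[i]]);
    }
    result.push_back(std::move(current));

    // Advance like an odometer, starting from the last axis.
    std::size_t i = axes.size();
    while (i > 0) {
      --i;
      if (++idx[i] < axes[i].values.size()) { break; }
      idx[i] = 0;                      // this axis wrapped around
      if (i == 0) { return result; }   // every axis wrapped: sweep is done
    }
  }
}
```

With two axes, say "Elements" with values {1024, 4096} and "BlockSize" with values {128, 256, 512}, this yields six configurations, so the number of benchmark runs grows multiplicatively with each added axis.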

Command Line Interface

Each benchmark executable produced by NVBench provides a rich set of command-line options for configuring benchmark execution at runtime. See the CLI overview and CLI axis specification for more information.

Examples

This repository provides a number of examples that demonstrate various NVBench features and use cases.

To get started using NVBench with your own kernels, consider trying out the NVBench Demo Project. nvbench_demo provides a simple CMake project that uses NVBench to build an example benchmark. It's a great way to experiment with the library without a lot of investment.

License

NVBench is released under the Apache 2.0 License with LLVM exceptions. See LICENSE.

Scope and Related Projects

NVBench measures the CPU and CUDA GPU execution time of a single host-side critical region per benchmark. It is intended for regression testing and parameter tuning of individual kernels. For in-depth analysis of end-to-end performance of multiple applications, the NVIDIA Nsight tools are more appropriate.

NVBench is focused on evaluating the performance of CUDA kernels and is not optimized for CPU microbenchmarks. This may change in the future, but for now, consider using Google Benchmark for high-resolution CPU benchmarks.
