Samples for CUDA developers that demonstrate features in the CUDA Toolkit
Updated Jul 26, 2024 - C
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Deep learning in Rust, with shape-checked tensors and neural networks
CUDA Core Compute Libraries
Safe Rust wrapper around the CUDA toolkit
🚀 Your YOLO Deployment Powerhouse. With the synergy of TensorRT Plugins, CUDA Kernels, and CUDA Graphs, experience lightning-fast inference speeds.
CUDA Kernel Benchmarking Library
Simple utilities to enable code reuse and portability between CUDA C/C++ and standard C/C++.
Kernel Tuner
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
Amplifier allows .NET developers to easily run complex applications with intensive mathematical computation on Intel CPU/GPU, NVIDIA, AMD without writing any additional C kernel code. Write your function in .NET and Amplifier will take care of running it on your favorite hardware.
From zero to hero: CUDA for accelerating maths and machine learning on GPU.
Some CUDA design patterns and a bit of template magic for CUDA
Spiking Neural Networks in C++ with strong GPU acceleration through CUDA
CUDA kernel author's tools
Open source cross-platform compiler for compute-intensive loops used in AI algorithms, from Microsoft Research
High-speed GEMV kernels, with up to 2.7x speedup over the PyTorch baseline.
Triton implementation of FlashAttention2 that adds Custom Masks.
A tool for examining GPU scheduling behavior.