Stars
Domain-specific language designed to streamline the development of high-performance GPU, CPU, and accelerator kernels
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
Development repository for the Triton language and compiler
FlagTree is a unified compiler for multiple AI chips, which is forked from triton-lang/triton.
FlagGems is an operator library for large language models implemented in the Triton Language.
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
Compilable official NVIDIA OpenCL sample code, https://developer.nvidia.com/opencl
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
A series of GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including: elementwise, reduce, s…
C++ Parallel Computing and Asynchronous Networking Framework
This is an implementation of sgemm_kernel on L1d cache.
Correlation demo in OpenCL that uses local memory.
The official rendering library for PAG (Portable Animated Graphics) files that renders After Effects animations natively across multiple platforms.
A simple high performance CUDA GEMM implementation.