Stars
A lightweight, single-header C++11 Jinja2 template engine for LLM chat templates.
A domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Development repository for the Triton language and compiler
FlagTree is a unified compiler, forked from triton-lang/triton, that supports multiple AI chip backends for custom deep learning operations.
FlagGems is an operator library for large language models implemented in the Triton Language.
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
Compilable official NVIDIA OpenCL sample code, https://developer.nvidia.com/opencl
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
A series of GPU optimization topics explaining in detail how to optimize CUDA kernels. It introduces several basic kernel optimizations, including: elementwise, reduce, s…
C++ Parallel Computing and Asynchronous Networking Framework
This is an implementation of sgemm_kernel on L1d cache.
Correlation demo in OpenCL that uses local memory.
The official rendering library for PAG (Portable Animated Graphics) files that renders After Effects animations natively across multiple platforms.