Stars: 8 starred repositories written in C++
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…
Fast, Flexible and Portable Structured Generation