-
tilelang Public
Forked from tile-ai/tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
Python Other Updated May 15, 2026 -
Megakernels Public
Forked from HazyResearch/Megakernels
Kernels, of the mega variety :)
Python MIT License Updated May 12, 2026 -
cutlass Public
Forked from NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
-
ai-infra-notes Public
Reading notes on the open source code of AI infrastructure (sglang, llm, cutlass, hpc, etc.)
6 Updated Mar 21, 2026 -
flash-attention Public
Forked from Dao-AILab/flash-attention
Fast and memory-efficient exact attention
-
mirage Public
Forked from mirage-project/mirage
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
C++ Apache License 2.0 Updated Jan 8, 2026 -
flux Public
Forked from bytedance/flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
C++ Apache License 2.0 Updated Dec 19, 2025 -
tvm Public
Forked from apache/tvm
Open Machine Learning Compiler Framework
Python Apache License 2.0 Updated Nov 27, 2025 -
mlc-llm Public
Forked from mlc-ai/mlc-llm
Universal LLM Deployment Engine with ML Compilation
Python Apache License 2.0 Updated Nov 26, 2025 -
sglang Public
Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
-
nvshmem Public
Forked from NVIDIA/nvshmem
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…
C++ Other Updated Sep 10, 2025 -
flashinfer Public
Forked from flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Cuda Apache License 2.0 Updated Aug 14, 2025 -
TensorRT-LLM Public
Forked from NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…
C++ Apache License 2.0 Updated Jul 14, 2025 -
SageAttention Public
Forked from thu-ml/SageAttention
Quantized Attention achieves speedups of 2-5x and 3-11x compared to FlashAttention and xformers, without losing end-to-end metrics across language, image, and video models.
-
marlin Public
Forked from IST-DASLab/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Cuda Apache License 2.0 Updated Jun 29, 2025 -
hpc Public
Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc.)
-
pocket-ai Public
A portable toolkit for deploying edge AI and HPC (OpenCL, Vulkan, SIMD, task scheduling)
-
TinyNeuralNetwork Public
Forked from alibaba/TinyNeuralNetwork
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
Python MIT License Updated Mar 4, 2025 -
lighteval Public
Forked from huggingface/lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
-
vllm Public
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
-
tflite_micro Public
Forked from tensorflow/tflite-micro
Infrastructure to enable deployment of ML models to low-power resource-constrained embedded targets (including microcontrollers and digital signal processors).
C++ Apache License 2.0 Updated Jul 17, 2024 -
ecas Public
ECAS is a library for edge AI computing acceleration.
-
patterns Public
A collection of architectural patterns and design patterns.
-
cpy Public
Notes on calling C from Python and Python from C.
C++ Apache License 2.0 Updated Aug 15, 2021 -
deeplearning-paper-notes Public
Reading notes on deep learning papers (2013-2018)