Stars
AccelOpt: Self-improving Agents for AI Accelerator Kernel Optimization
Our first fully AI generated deep learning system
An unbiased CPU benchmark by OpenCV that provides an evaluation of different CPUs under real-world computer vision and AI workloads.
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
Light Image Video Generation Inference Framework
SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation
Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"
Clean, Robust, and Unified PyTorch implementation of popular Deep Reinforcement Learning (DRL) algorithms (Q-learning, Duel DDQN, PER, C51, Noisy DQN, PPO, DDPG, TD3, SAC, ASL)
FlagGems is an operator library for large language models implemented in the Triton Language.
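CUDA & Triton Learning Project: exploring Flash Attention implementations
微舆: a multi-agent public opinion analysis assistant that anyone can use. It breaks through information cocoons, restores the true picture of public opinion, predicts future trends, and supports decision-making. Built from scratch, with no dependency on any framework.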
Simple high-throughput inference library
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
[ICML 2025] RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
Wan: Open and Advanced Large-Scale Video Generative Models
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Triton for an OpenCL backend, using mlir-translate to get OpenCL source code
The simplest, fastest repository for training/finetuning small-sized VLMs.
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
woct0rdho / triton-windows
Forked from triton-lang/triton
Fork of the Triton language and compiler for Windows support and easy installation
DeepEP: an efficient expert-parallel communication library
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.