Stars
Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation
Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"
Clean, Robust, and Unified PyTorch implementation of popular Deep Reinforcement Learning (DRL) algorithms (Q-learning, Duel DDQN, PER, C51, Noisy DQN, PPO, DDPG, TD3, SAC, ASL)
FlagGems is an operator library for large language models implemented in the Triton Language.
CUDA & Triton Learning Project: exploring Flash Attention implementations
微舆 (WeiYu): a multi-agent public-opinion analysis assistant for everyone. It breaks information cocoons, reconstructs the full picture of public opinion, predicts future trends, and supports decision-making. Built from scratch, with no dependency on any framework.
Simple high-throughput inference library
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
[ICML 2025] RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention achieving a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
Wan: Open and Advanced Large-Scale Video Generative Models
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
A domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
A Triton OpenCL backend that uses mlir-translate to emit OpenCL source code
The simplest, fastest repository for training/finetuning small-sized VLMs.
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
woct0rdho / triton-windows
Forked from triton-lang/triton. A fork of the Triton language and compiler with Windows support and easy installation.
DeepEP: an efficient expert-parallel communication library
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
A flexible framework for experimenting with heterogeneous LLM inference and fine-tuning optimizations
A collection of benchmarks to measure basic GPU capabilities
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores, via the WMMA API and MMA PTX instructions (see the sketch after this list).
CUDA Matrix Multiplication Optimization
An Easy-to-understand TensorOp Matmul Tutorial
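
The HGEMM entries above reference the WMMA API. As a minimal sketch of that technique (not taken from any of the listed repos), here is one warp computing a 16x16 output tile with tensor cores; row-major A, column-major B, and dimensions that are multiples of 16 are simplifying assumptions.

```cuda
// Illustrative HGEMM tile via the WMMA API: each warp computes one
// 16x16 tile of C = A * B, accumulating in float.
// Assumptions (for brevity): A is row-major MxK, B is column-major KxN,
// and M, N, K are multiples of 16.
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

__global__ void hgemm_wmma_tile(const half* A, const half* B, float* C,
                                int M, int N, int K) {
    // Map each warp to one 16x16 output tile.
    int warpM = (blockIdx.x * blockDim.x + threadIdx.x) / warpSize;
    int warpN = blockIdx.y * blockDim.y + threadIdx.y;
    int aRow = warpM * 16;
    int bCol = warpN * 16;
    if (aRow >= M || bCol >= N) return;

    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;
    wmma::fill_fragment(c_frag, 0.0f);

    // March along K in 16-wide steps, issuing one tensor-core MMA per step.
    for (int k = 0; k < K; k += 16) {
        wmma::load_matrix_sync(a_frag, A + aRow * K + k, K);  // tile of A
        wmma::load_matrix_sync(b_frag, B + bCol * K + k, K);  // tile of B
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    }

    // Write the accumulated 16x16 tile back to row-major C.
    wmma::store_matrix_sync(C + aRow * N + bCol, c_frag, N,
                            wmma::mem_row_major);
}
```

The MMA-PTX variants those repos describe follow the same tiling structure but replace the WMMA fragment loads/stores with hand-written `ldmatrix`/`mma.sync` instructions for finer control over register layout.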