Stars
Use Garry Tan's exact Claude Code setup: 15 opinionated tools that serve as CEO, Designer, Eng Manager, Release Manager, Doc Engineer, and QA
AI agents running research on single-GPU nanochat training automatically
Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
Official inference framework for 1-bit LLMs
Clspv is a compiler for OpenCL C to Vulkan compute shaders
Official codebase for the MLSys 2026 paper "IntAttention: A Fully Integer Attention Pipeline for Efficient Edge Inference". It enables high-fidelity and high-speed LLM/ViT deployment on ARM CPUs.
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE, WebAssembly, VSX, RISC-V)
AI Edge Quantizer: flexible post-training quantization for LiteRT models.
The awesome collection of OpenClaw skills. 5,400+ skills filtered and categorized from the official OpenClaw Skills Registry.🦞
Lets coding agents use ncu (NVIDIA Nsight Compute) skills to analyze CUDA programs automatically!
Build resilient language agents as graphs.
llmbasedos — Local-First OS Where Your AI Agents Wake Up and Work
AccelOpt: Self-improving Agents for AI Accelerator Kernel Optimization
Our first fully AI generated deep learning system
An unbiased CPU benchmark by OpenCV that provides an evaluation of different CPUs under real-world computer vision and AI workloads.
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
Lightweight image and video generation inference framework
SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation
Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"
Clean, Robust, and Unified PyTorch implementation of popular Deep Reinforcement Learning (DRL) algorithms (Q-learning, Duel DDQN, PER, C51, Noisy DQN, PPO, DDPG, TD3, SAC, ASL)
FlagGems is an operator library for large language models implemented in the Triton Language.
CUDA & Triton Learning Project: an exploration of implementing Flash Attention
微舆: a multi-agent public opinion analysis assistant anyone can use. It breaks information cocoons, reconstructs the full picture of public sentiment, predicts future trends, and supports decision-making! Built from scratch, with no framework dependencies.
Simple high-throughput inference library