Starred repositories
Helpful kernel tutorials and examples for tile-based GPU programming
An API-compatible, drop-in replacement for Apple's Foundation Models framework with support for custom language model providers.
AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming
Post-training with Tinker
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …
Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild
(WIP) A small but powerful, homemade PyTorch from scratch.
Hierarchical Reasoning Model Official Release
An extremely fast Python type checker and language server, written in Rust.
Artificial Neural Engine Machine Learning Library
Free, simple, fast interactive diagrams for any GitHub repository
Supporting PyTorch models with the Google AI Edge TFLite runtime.
LiteRT, successor to TensorFlow Lite, is Google's on-device framework for high-performance ML & GenAI deployment on edge platforms via efficient conversion, runtime, and optimization.
QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.
Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure
🚀🚀 [LLM] Train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏
ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python.
A minimal GPU design in Verilog to learn how GPUs work from the ground up
ModernBERT model optimized for Apple Neural Engine.
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O