Starred repositories
- A library to analyze PyTorch traces.
- Official project page for Deep Delta Learning (https://huggingface.co/papers/2601.00417).
- RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…
- A cross-platform, all-in-one desktop assistant for Claude Code, Codex, OpenCode & Gemini CLI.
- Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
- Visual Skills Pack for Obsidian: generate Canvas, Excalidraw, and Mermaid diagrams from text with Claude Code.
- An agentic skills framework & software development methodology that works.
- Multi-Level Triton Runner supporting Python, IR, PTX, and cubin.
- The simplest, fastest repository for training/finetuning medium-sized GPTs.
- Securely synchronize files with your devices on iOS using Syncthing.
- CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…
- Paper2Slides: From Paper to Presentation in One Click
- An AI-native PPT generator built on nano banana pro 🍌, aiming at a true "Vibe PPT": upload any template image; upload any assets with smart parsing; auto-generate a PPT from a one-sentence prompt, an outline, or per-page descriptions; verbally revise specified regions; export an editable PPT in one click.
- Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷
- Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.
- cuTile is a programming model for writing parallel kernels for NVIDIA GPUs.
- The official implementation of the NeurIPS 2025 Oral paper "Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free".
- Ring attention implementation with flash attention.
- Helpful tools and examples for working with flex-attention (see the sketch after this list).
- My learning notes for ML systems.
- Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16–32 tokens.
- Evaluate and enhance your LLM deployments for real-world inference needs.
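
For context on the flex-attention item above, here is a minimal sketch of the PyTorch `flex_attention` API that the repo builds on (`torch.nn.attention.flex_attention`, torch >= 2.5). The shapes, the causal `mask_mod`, and the ALiBi-style `score_mod` are illustrative assumptions, not code taken from that repo.

```python
# Minimal flex_attention sketch; assumes a CUDA-enabled PyTorch >= 2.5 build.
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

B, H, S, D = 2, 4, 256, 64  # batch, heads, sequence length, head dim (arbitrary)
q, k, v = (torch.randn(B, H, S, D, device="cuda") for _ in range(3))

# mask_mod returns True where attention is allowed; here, standard causal masking.
def causal(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

block_mask = create_block_mask(causal, B=None, H=None, Q_LEN=S, KV_LEN=S, device="cuda")

# score_mod rewrites each pre-softmax score; here, an ALiBi-style distance penalty
# with a per-head slope of 2^-(h+1).
def alibi(score, b, h, q_idx, kv_idx):
    return score - (q_idx - kv_idx) * torch.exp2(-(h + 1.0))

out = flex_attention(q, k, v, score_mod=alibi, block_mask=block_mask)
print(out.shape)  # torch.Size([2, 4, 256, 64])
```

The appeal of this API is that mask and bias variants are expressed as small Python callables rather than hand-written kernels; under `torch.compile` they are fused into a single attention kernel.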