- Guangzhou, China
Starred repositories
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
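📖 From zero background to passing the interview: 22 lessons to thoroughly understand large language models | Learn MiniMind: a systematic study of the full LLM training pipeline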
The absolute trainer to light up AI agents.
Harvard's classic introductory Transformer tutorial, annotated-transformer-Chinese: a Chinese edition with Chinese-annotated PyTorch code implementing the Transformer paper "Attention is All You Need", translated from harvardnlp/annotated-transformer
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
Agentic RL on Any Harness at Scale
Online playground for OpenAI tokenizers
🎨 Local-first, open-source Claude Design alternative. 🖥️ Native desktop app. ⚡ 259+ Skills · ✨ 142+ Design Systems 🖼️ Web · desktop · mobile prototypes · slides · images · videos · HyperFrames 📦 Sa…
Triton kernels and PyTorch ops for Block Attention Residuals (AttnRes)
A self-learning tutorial for CUDA high-performance programming.
A kernel library written in tilelang
Inference payload processor for llm-d
FlashKDA: high-performance Kimi Delta Attention kernels
Efficient and unified implementations for TopK-based sparse attention
Design principles for agent ergonomics. Higher accuracy with lower token cost than both MCP and regular CLI.
Learn LLM internals step by step - from tokenization to attention to inference optimization.
Official specification for Token-Oriented Object Notation (TOON)
CUDA kernels for linear attention variants, written in CuTe DSL and CUTLASS C++.
Use Codex from Claude Code to review code or delegate tasks.
Fast, accurate & comprehensive text measurement & layout
Harness Engineering study guide: an in-depth learning archive, from conceptual understanding to independent practice
Run Anthropic's Claude Code CLI with OpenAI models such as GPT-5-Codex, GPT-5.1, and others via a local LiteLLM proxy.
🔥 LeetCode for PyTorch — practice implementing softmax, attention, GPT-2 and more from scratch with instant auto-grading. Jupyter-based, self-hosted or try online.
LMCache: Supercharge Your LLM with the Fastest KV Cache Layer
LLAMA Turboquant implementation with CUDA support