withlin 🧸
  • Guangzhou, China

Starred repositories

The best agent harness.

TypeScript 10,375 574 Updated Jun 15, 2026

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Python 10,571 1,064 Updated Jul 1, 2024

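📖 From zero background to passing the interview: master large language models in 22 lessons | Learn MiniMind: a systematic study of the full LLM training pipeline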

TypeScript 372 34 Updated Apr 1, 2026

The absolute trainer to light up AI agents.

Python 17,310 1,515 Updated Apr 29, 2026

Notes and assignments from my study of CS336

Python 912 29 Updated May 2, 2026

C++ 97 3 Updated Jul 20, 2025

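Harvard's classic introductory Transformer tutorial, annotated-transformer-Chinese: a Chinese edition of the PyTorch implementation of the Transformer paper Attention Is All You Need, with Chinese annotations, translated from harvardnlp/annotated-transformer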

Jupyter Notebook 90 12 Updated Jan 19, 2025

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python 4,278 332 Updated Jun 13, 2026

Agentic RL on Any Harness at Scale

Python 554 57 Updated Jun 13, 2026

Online playground for OpenAI tokenizers

TypeScript 1,618 176 Updated Apr 24, 2025

OpenKB: Open LLM Knowledge Base

Python 2,059 230 Updated Jun 15, 2026

🎨 Local-first, open-source Claude Design alternative. 🖥️ Native desktop app. ⚡ 259+ Skills · ✨ 142+ Design Systems 🖼️ Web · desktop · mobile prototypes · slides · images · videos · HyperFrames 📦 Sa…

TypeScript 64,915 7,274 Updated Jun 15, 2026

Triton kernels and PyTorch ops for Block Attention Residuals (AttnRes)

Python 82 6 Updated May 29, 2026

A self-learning tutorial for CUDA High Performance Programming.

JavaScript 1,016 103 Updated Jan 14, 2026

A kernel library written in tilelang

Python 1,586 138 Updated Apr 23, 2026

Inference payload processor for llm-d

Go 9 21 Updated Jun 14, 2026

FlashKDA: high-performance Kimi Delta Attention kernels

Cuda 449 38 Updated May 26, 2026

Efficient and unified implementations for TopK-based sparse attention

Cuda 35 1 Updated Apr 20, 2026

Design principles for agent ergonomics. Higher accuracy with lower token cost than both MCP and regular CLI.

TypeScript 861 34 Updated Jun 11, 2026

Learn LLM internals step by step - from tokenization to attention to inference optimization.

1,071 94 Updated Jun 14, 2026

Official specification for Token-Oriented Object Notation (TOON)

JavaScript 294 34 Updated Jun 12, 2026

CUDA kernels for linear attention variants, written in CuTe DSL and CUTLASS C++.

Python 519 64 Updated Jun 12, 2026

Use Codex from Claude Code to review code or delegate tasks.

JavaScript 20,980 1,268 Updated Jun 14, 2026

Fast, accurate & comprehensive text measurement & layout

TypeScript 48,425 2,695 Updated Jun 12, 2026

Harness Engineering study guide: an in-depth learning archive, from understanding the concepts to independent practice

Shell 3,786 335 Updated Jun 11, 2026

Run Anthropic's Claude Code CLI with OpenAI models such as GPT-5-Codex, GPT-5.1, and others via a local LiteLLM proxy.

Python 235 25 Updated Jan 4, 2026

🔥 LeetCode for PyTorch — practice implementing softmax, attention, GPT-2 and more from scratch with instant auto-grading. Jupyter-based, self-hosted or try online.

Jupyter Notebook 4,165 355 Updated May 25, 2026

LMCache: Supercharge Your LLM with the Fastest KV Cache Layer

Python 9,063 1,320 Updated Jun 15, 2026

LLAMA Turboquant implementation with CUDA support

C++ 657 71 Updated Jun 4, 2026