FirwoodLin
  • Shanghai, China

OmX - Oh My codeX: Your codex is not alone. Add hooks, agent teams, HUDs, and so much more.

TypeScript 16,387 1,555 Updated Apr 5, 2026

Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs

HTML 896 128 Updated Mar 15, 2026

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 3,995 317 Updated Apr 3, 2026

An agentic skills framework & software development methodology that works.

Shell 135,921 11,405 Updated Apr 2, 2026

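VocoType is a privacy-preserving voice input tool that runs locally on-device: a hotkey converts speech to text in real time and types it into the current application. It supports speech-to-text MCP, AI text refinement, custom replacement dictionaries, and transcription of audio and video recordings, making voice input more efficient and more secure.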

Python 514 52 Updated Mar 23, 2026

My research experience

11,136 575 Updated Mar 7, 2026

Curated collection of papers in machine learning systems

532 36 Updated Feb 7, 2026

A framework for efficient model inference with omni-modality models

Python 4,125 693 Updated Apr 5, 2026

GPT-SoVITS ONNX Inference Engine & Model Converter

Python 1,474 101 Updated Apr 1, 2026

🧨 TradeTrap: Are LLM-based Trading Agents Truly Reliable and Faithful?

Python 74 13 Updated Nov 27, 2025

Ring attention implementation with flash attention

Python 1,001 97 Updated Sep 10, 2025

A high-performance inference engine for LLMs, optimized for diverse AI accelerators.

C++ 1,167 168 Updated Apr 5, 2026

My learning notes for ML SYS.

Python 5,896 383 Updated Apr 3, 2026

NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer

Cuda 173 14 Updated Feb 11, 2026

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 1,276 131 Updated Apr 5, 2026

Nano vLLM

Python 12,698 1,875 Updated Nov 3, 2025

From scratch implementation of a vision language model in pure PyTorch

Jupyter Notebook 258 32 Updated May 6, 2024

[DAC2024, TensorSSA] A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning

C++ 2 Updated Sep 7, 2023

[DAC2025] Tropical: Enhancing SLO Attainment in Disaggregated LLM Serving via SLO-Aware Multiplexing

Python 1 Updated Jan 26, 2025

DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit

C++ 95 8 Updated Mar 31, 2026

[DAC2024] A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning

C++ 15 1 Updated Jan 13, 2024

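AIInfra (AI infrastructure) refers to the AI system stack, from low-level hardware such as chips up to the upper-level software stack, that supports training and inference of large AI models.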

Jupyter Notebook 6,621 872 Updated Dec 22, 2025

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python 4,058 299 Updated Mar 26, 2026

Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.

424 27 Updated Mar 3, 2025

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

679 23 Updated Feb 24, 2026

Awesome Eino Projects for Learning | A collection of projects for learning the Eino AI development framework

Go 15 1 Updated Apr 14, 2025

🐈️ The Chunzhen (纯真) IP database in IPIP.net format. Make qqwry.ipdb Great Again!!!

JavaScript 567 80 Updated Dec 11, 2025

Simulate keyboard input with a GUI; works around fields that block pasting

Python 242 16 Updated Mar 10, 2026

Lab2A-D, Lab3A-B, and Lab4A-B in different branches tagged these names so you can easily handle individual parts

Go 173 16 Updated Dec 20, 2023

Master programming by recreating your favorite technologies from scratch.

Markdown 486,321 45,755 Updated Feb 21, 2026