FirwoodLin
  • Shanghai, China

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 3,986 314 Updated Apr 1, 2026

An agentic skills framework & software development methodology that works.

Shell 128,877 10,573 Updated Mar 31, 2026

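VocoType is a privacy-safe voice input tool that runs locally on-device: a hotkey converts speech to text in real time and types it into the current application. It supports speech-to-text MCP, AI text refinement, custom replacement dictionaries, and transcription of audio and video recordings, making voice input more efficient and secure.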

Python 498 53 Updated Mar 23, 2026

My research experience

11,072 572 Updated Mar 7, 2026

Curated collection of papers in machine learning systems

530 36 Updated Feb 7, 2026

A framework for efficient model inference with omni-modality models

Python 4,078 667 Updated Apr 1, 2026

GPT-SoVITS ONNX Inference Engine & Model Converter

Python 1,465 98 Updated Jan 28, 2026

🧨 TradeTrap: Are LLM-based Trading Agents Truly Reliable and Faithful?

Python 74 13 Updated Nov 27, 2025

Ring attention implementation with flash attention

Python 999 96 Updated Sep 10, 2025

A high-performance inference engine for LLMs, optimized for diverse AI accelerators.

C++ 1,162 164 Updated Apr 1, 2026

My learning notes for ML SYS.

Python 5,820 377 Updated Apr 1, 2026

NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer

Cuda 172 14 Updated Feb 11, 2026

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 1,267 131 Updated Apr 1, 2026

Nano vLLM

Python 12,620 1,837 Updated Nov 3, 2025

From scratch implementation of a vision language model in pure PyTorch

Jupyter Notebook 257 31 Updated May 6, 2024

[DAC2024, TensorSSA] A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning

C++ 2 Updated Sep 7, 2023

[DAC2025] Tropical: Enhancing SLO Attainment in Disaggregated LLM Serving via SLO-Aware Multiplexing

Python 1 Updated Jan 26, 2025

DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit

C++ 95 8 Updated Mar 31, 2026

[DAC2024] A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning

C++ 15 1 Updated Jan 13, 2024

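AIInfra (AI infrastructure) refers to the AI system stack, from underlying hardware such as chips up to the upper-layer software stack, that supports training and inference of large AI models.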

Jupyter Notebook 6,589 866 Updated Dec 22, 2025

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python 4,049 297 Updated Mar 26, 2026

Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.

420 26 Updated Mar 3, 2025

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

678 23 Updated Feb 24, 2026

Awesome Eino Projects for Learning: a collection of projects for learning the Eino AI development framework

Go 15 1 Updated Apr 14, 2025

🐈️ The Chunzhen (纯真) IP database in IPIP.net format. Make qqwry.ipdb Great Again!!!

JavaScript 566 79 Updated Dec 11, 2025

Simulate keyboard input with a GUI to get around fields that block pasting

Python 242 16 Updated Mar 10, 2026

Lab2A-D, Lab3A-B, and Lab4A-B live in separate branches tagged with these names, so you can easily work on individual parts

Go 173 16 Updated Dec 20, 2023

Master programming by recreating your favorite technologies from scratch.

Markdown 485,166 45,633 Updated Feb 21, 2026

A Redis server and distributed cluster implemented in Go

Go 3,831 602 Updated Sep 14, 2025

A course to build distributed key-value service based on TiKV model

Go 3,902 1,082 Updated May 3, 2025