View FirwoodLin's full-sized avatar
  • Shanghai, China

Highlights

  • Pro


LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 3,978 313 Updated Mar 30, 2026

An agentic skills framework & software development methodology that works.

Shell 124,508 10,145 Updated Mar 26, 2026

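VocoType is a privacy-safe voice input tool that runs locally on-device: a hotkey converts speech to text in real time and types it into the current application. It supports speech-to-text MCP, AI text refinement, custom replacement dictionaries, and transcription of audio and video recordings, making voice input more efficient and more secure.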

Python 485 52 Updated Mar 23, 2026

My research experience

11,050 569 Updated Mar 7, 2026

Curated collection of papers in machine learning systems

530 36 Updated Feb 7, 2026

A framework for efficient model inference with omni-modality models

Python 4,014 653 Updated Mar 30, 2026

GPT-SoVITS ONNX Inference Engine & Model Converter

Python 1,460 98 Updated Jan 28, 2026

🧨 TradeTrap: Are LLM-based Trading Agents Truly Reliable and Faithful?

Python 74 13 Updated Nov 27, 2025

Ring attention implementation with flash attention

Python 998 96 Updated Sep 10, 2025

A high-performance inference engine for LLMs, optimized for diverse AI accelerators.

C++ 1,158 163 Updated Mar 30, 2026

My learning notes for ML SYS.

Python 5,809 377 Updated Mar 19, 2026

NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer

Cuda 172 14 Updated Feb 11, 2026

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 1,262 131 Updated Mar 30, 2026

Nano vLLM

Python 12,537 1,815 Updated Nov 3, 2025

From-scratch implementation of a vision language model in pure PyTorch

Jupyter Notebook 257 31 Updated May 6, 2024

[DAC2024, TensorSSA] A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning

C++ 2 Updated Sep 7, 2023

[DAC2025] Tropical: Enhancing SLO Attainment in Disaggregated LLM Serving via SLO-Aware Multiplexing

Python 1 Updated Jan 26, 2025

DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit

C++ 95 8 Updated Mar 29, 2026

[DAC2024] A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning

C++ 15 1 Updated Jan 13, 2024

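AIInfra (AI infrastructure) refers to the AI system stack, from underlying hardware such as chips up to the upper-level software stack, that supports training and inference of large AI models.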

Jupyter Notebook 6,561 864 Updated Dec 22, 2025

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python 4,041 296 Updated Mar 26, 2026

Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.

420 26 Updated Mar 3, 2025

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

678 23 Updated Feb 24, 2026

Awesome Eino Projects for Learning | A collection of projects for learning the Eino AI development framework

Go 15 1 Updated Apr 14, 2025

🐈️ The Chunzhen (纯真) IP database in IPIP.net format. Make qqwry.ipdb Great Again!!!

JavaScript 566 79 Updated Dec 11, 2025

Simulate keyboard input with a GUI; works around paste restrictions

Python 241 16 Updated Mar 10, 2026

Lab2A-D, Lab3A-B, and Lab4A-B live in separate branches tagged with these names, so you can easily work on individual parts

Go 173 16 Updated Dec 20, 2023

Master programming by recreating your favorite technologies from scratch.

Markdown 484,637 45,595 Updated Feb 21, 2026

A Redis server and distributed cluster implemented in Go.

Go 3,830 602 Updated Sep 14, 2025

A course on building a distributed key-value service based on the TiKV model

Go 3,901 1,082 Updated May 3, 2025