Weili17

Starred repositories

Use Garry Tan's exact Claude Code setup: 23 opinionated tools that serve as CEO, Designer, Eng Manager, Release Manager, Doc Engineer, and QA

TypeScript 60,763 8,013 Updated Apr 1, 2026

Linux kernel source tree

C 226,425 61,321 Updated Mar 31, 2026

Offline optimization of your disaggregated Dynamo graph

Python 242 89 Updated Apr 1, 2026

Machine Learning Engineering Open Book

Python 17,590 1,115 Updated Mar 16, 2026

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 3,892 548 Updated Mar 13, 2026

Nano vLLM

Python 12,625 1,842 Updated Nov 3, 2025
Jupyter Notebook 602 27 Updated Aug 23, 2024

High Performance LLM Inference Operator Library

C++ 806 78 Updated Feb 5, 2026

[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding

Python 145 12 Updated Dec 4, 2024

Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.

Python 4,956 446 Updated Apr 1, 2026

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

Python 3,037 259 Updated Apr 1, 2026

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 344,535 68,327 Updated Apr 1, 2026

A hyperparameter optimization framework

Python 13,813 1,291 Updated Apr 1, 2026

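Sharing AI Infra knowledge and coding exercises: getting started with the PyTorch/vLLM/SGLang frameworks ⚡️, performance acceleration 🚀, LLM fundamentals 🧠, AI software and hardware 🔧, and more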

Jupyter Notebook 1,481 117 Updated Mar 28, 2026

A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …

Python 2,314 327 Updated Apr 1, 2026

Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.

Python 559 75 Updated Mar 31, 2026

A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM

Python 323 61 Updated Apr 1, 2026

Provides pre-built flash-attention package wheels for Linux and Windows platforms using GitHub Actions

Python 1,190 62 Updated Apr 1, 2026

Extending eBPF Programmability and Observability to GPUs (merged into https://github.com/eunomia-bpf/bpftime)

C++ 298 13 Updated Nov 24, 2025

eBPF Developer Tutorial: Learning eBPF Step by Step with Examples

C 4,014 572 Updated Mar 11, 2026

ebpf-go is a pure-Go library to read, modify and load eBPF programs and attach them to various hooks in the Linux kernel.

Go 7,631 840 Updated Apr 1, 2026

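Under the conventional paradigm of jointly optimizing recommender-system algorithms and systems, leading companies have nearly exhausted the gains available from any single task or single business line. Starting in 2019 we turned our attention to extracting and fusing multiple kinds of information and proposed the OneRec algorithm, which aims to integrate knowledge from a wide variety of platform and external information, break down data silos, and greatly expand the "Extra World Knowledge" available to recommendation. The algorithms implemented so far cover behavioral data, content descriptions, social information, knowledge graphs, and more. In OneRec, the integration of each kind of information with the overall algorithm is pluggable…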

Python 228 32 Updated Jan 13, 2026

[Pytorch] Generative retrieval model using semantic IDs from "Recommender Systems with Generative Retrieval"

Python 764 106 Updated Apr 1, 2026
CMake 44 66 Updated Apr 1, 2026

Distributed Compiler based on Triton for Parallel Systems

Python 1,400 135 Updated Mar 11, 2026

Unified Collective Communication Library

C 300 127 Updated Mar 31, 2026

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python 690 40 Updated Mar 8, 2026