Weili17

Starred repositories

Use Garry Tan's exact Claude Code setup: 23 opinionated tools that serve as CEO, Designer, Eng Manager, Release Manager, Doc Engineer, and QA

TypeScript 68,319 9,495 Updated Apr 9, 2026

Linux kernel source tree

C 227,744 61,491 Updated Apr 9, 2026

Offline optimization of your disaggregated Dynamo graph

Python 252 96 Updated Apr 9, 2026

Machine Learning Engineering Open Book

Python 17,651 1,119 Updated Mar 16, 2026

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 3,947 557 Updated Mar 13, 2026

Nano vLLM

Python 12,773 1,896 Updated Nov 3, 2025
Jupyter Notebook 602 28 Updated Aug 23, 2024

High Performance LLM Inference Operator Library

C++ 817 81 Updated Apr 9, 2026

[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding

Python 145 12 Updated Dec 4, 2024

Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.

Python 5,010 455 Updated Apr 9, 2026

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

Python 3,062 265 Updated Apr 9, 2026

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 353,281 71,300 Updated Apr 9, 2026

A hyperparameter optimization framework

Python 13,898 1,296 Updated Apr 8, 2026

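Sharing AI Infra knowledge and code exercises: getting started with the PyTorch/vLLM/SGLang frameworks ⚡️, performance acceleration 🚀, LLM fundamentals 🧠, AI software and hardware 🔧, and more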

Jupyter Notebook 1,587 128 Updated Apr 8, 2026

A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …

Python 2,420 340 Updated Apr 9, 2026

Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.

Python 562 77 Updated Apr 2, 2026

A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM

Python 330 63 Updated Apr 9, 2026

Provides pre-built flash-attention package wheels for Linux and Windows platforms using GitHub Actions

Python 1,239 63 Updated Apr 9, 2026

Extending eBPF Programmability and Observability to GPUs (merged into https://github.com/eunomia-bpf/bpftime)

C++ 299 13 Updated Nov 24, 2025

eBPF Developer Tutorial: Learning eBPF Step by Step with Examples

C 4,037 573 Updated Mar 11, 2026

ebpf-go is a pure-Go library to read, modify and load eBPF programs and attach them to various hooks in the Linux kernel.

Go 7,650 848 Updated Apr 2, 2026

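Under the conventional paradigm of jointly optimizing recommender-system algorithms and systems, leading companies have nearly exhausted the gains available from any single task or single business line. Since 2019 we have focused on extracting and fusing multiple kinds of information, and proposed the OneRec algorithm, which aims to integrate knowledge from all kinds of platform and external information, break down data silos, and greatly expand the "Extra World Knowledge" available to recommendation. The algorithms implemented so far cover behavior data, content descriptions, social information, knowledge graphs, and more. In OneRec, the integration of each kind of information with the overall algorithm is pluggable…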

Python 228 32 Updated Jan 13, 2026

[Pytorch] Generative retrieval model using semantic IDs from "Recommender Systems with Generative Retrieval"

Python 767 108 Updated Apr 1, 2026
CMake 45 66 Updated Apr 9, 2026

Distributed Compiler based on Triton for Parallel Systems

Python 1,403 136 Updated Mar 11, 2026

Unified Collective Communication Library

C 300 128 Updated Apr 9, 2026

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python 696 41 Updated Mar 8, 2026