A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …

Python · 2,425 stars · 341 forks · Updated Apr 10, 2026

Tencent / AngelSlim

Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.

Python · 562 stars · 78 forks · Updated Apr 2, 2026

vllm-project / speculators

A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM

Python · 331 stars · 63 forks · Updated Apr 9, 2026

mjun0812 / flash-attention-prebuild-wheels

Provides pre-built flash-attention package wheels for Linux and Windows, built using GitHub Actions

Python · 1,240 stars · 63 forks · Updated Apr 10, 2026

eunomia-bpf / eGPU

Forked from eunomia-bpf/bpftime

Extending eBPF Programmability and Observability to GPUs (merged into https://github.com/eunomia-bpf/bpftime)

C++ · 300 stars · 14 forks · Updated Nov 24, 2025

Weili17

Starred repositories

garrytan / gstack

torvalds / linux

meituan-longcat / SGLang-FluentLLM

ai-dynamo / aiconfigurator

stas00 / ml-engineering

sgl-project / mini-sglang

GeeeekExplorer / nano-vllm

apoorvumang / prompt-lookup-decoding

Tencent / hpc-ops

Infini-AI-Lab / MagicDec

inclusionAI / AReaL

alibaba / ROLL

openclaw / openclaw

optuna / optuna

CalvinXKY / InfraTech

thinking-machines-lab / batch_invariant_ops

NVIDIA / Model-Optimizer

Tencent / AngelSlim

vllm-project / speculators

mjun0812 / flash-attention-prebuild-wheels

eunomia-bpf / eGPU

eunomia-bpf / bpf-developer-tutorial

cilium / ebpf

xuanjixiao / onerec

EdoardoBotta / RQ-VAE-Recommender

rapidsai / rapids-cmake

ByteDance-Seed / Triton-distributed

StarryVae / RDMA-tutorial

openucx / ucc

tile-ai / TileRT

Starred topics

TensorFlow

Vue.js

Android