Stars

LLM Inference

19 repositories

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ 907 102 Updated Jul 10, 2025

[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding

Python 135 9 Updated Dec 4, 2024

Quantized Attention on GPU

Python 44 Updated Nov 22, 2024

Materials for learning SGLang

702 51 Updated Dec 15, 2025

A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

Python 16,260 1,193 Updated Dec 25, 2025

Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)

Python 345 45 Updated Apr 22, 2025

FlashInfer: Kernel Library for LLM Serving

Python 4,358 616 Updated Dec 25, 2025

Fast inference from large language models via speculative decoding

Python 872 93 Updated Aug 22, 2024
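The idea behind speculative decoding can be sketched in a few lines: a cheap draft model proposes several tokens autoregressively, the expensive target model verifies them in one pass and keeps the longest agreeing prefix, then contributes one token of its own. This is a minimal greedy-matching toy (models stand in as plain functions), not the repository's actual implementation:

```python
def speculative_decode(target_next, draft_next, prompt, gamma=4, max_new=8):
    """Toy greedy speculative decoding.

    target_next/draft_next: functions mapping a token list to the next token.
    gamma: number of tokens the draft proposes per round (hypothetical name).
    """
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # Draft proposes `gamma` tokens autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(gamma):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # Target verifies: accept the longest prefix it agrees with.
        for t in proposal:
            if target_next(out) == t:
                out.append(t)
            else:
                break
        # Target always contributes one token (correction or bonus).
        out.append(target_next(out))
    return out[len(prompt):len(prompt) + max_new]
```

When the draft agrees with the target, each round advances up to `gamma + 1` tokens for a single (batched) target verification, which is where the speedup comes from; real implementations verify under the target's probability distribution rather than by exact greedy match.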

Tile primitives for speedy kernels

Cuda 3,017 220 Updated Dec 9, 2025

LLM inference in C/C++

C++ 91,979 14,240 Updated Dec 25, 2025

Compare different hardware platforms via the Roofline Model for LLM inference tasks.

Jupyter Notebook 119 5 Updated Mar 13, 2024
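The Roofline Model behind that comparison reduces to one formula: attainable throughput is the minimum of peak compute and memory bandwidth times arithmetic intensity. A minimal sketch, with illustrative A100-like numbers assumed rather than taken from the repository:

```python
def roofline(peak_flops, mem_bw, arithmetic_intensity):
    """Attainable FLOP/s = min(peak compute, memory bandwidth * arithmetic intensity)."""
    return min(peak_flops, mem_bw * arithmetic_intensity)

# Illustrative hardware numbers (assumed for the example):
PEAK = 312e12   # FP16 tensor-core peak, FLOP/s
BW = 2.0e12     # HBM bandwidth, bytes/s
RIDGE = PEAK / BW  # intensity (FLOP/byte) at which the kernel becomes compute-bound
```

LLM decode is dominated by matrix-vector products with intensity near 1 FLOP/byte, far below the ridge point, which is why single-stream decoding is memory-bandwidth-bound on essentially all accelerators.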

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).

Python 2,078 233 Updated Dec 18, 2025

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

Python 6,169 568 Updated Aug 22, 2025

PyTorch native quantization and sparsity for training and inference

Python 2,592 389 Updated Dec 25, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 66,167 12,181 Updated Dec 25, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,832 1,038 Updated Dec 24, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 6,000 780 Updated Dec 23, 2025
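"Fine-grained scaling" means each small block of values gets its own quantization scale, so a single outlier only degrades its own block rather than the whole tensor. This toy uses symmetric int8-style rounding in pure Python to illustrate the idea; it is not the repository's FP8 kernel:

```python
def quantize_blockwise(x, block=4, qmax=127):
    """Per-block symmetric quantization: one scale per `block` values."""
    scales, q = [], []
    for i in range(0, len(x), block):
        blk = x[i:i + block]
        s = max(abs(v) for v in blk) / qmax or 1.0  # avoid zero scale
        scales.append(s)
        q.extend(round(v / s) for v in blk)
    return q, scales

def dequantize_blockwise(q, scales, block=4):
    return [q[i] * scales[i // block] for i in range(len(q))]
```

With per-tensor scaling, the 100.0 outlier below would force a scale that crushes the small values everywhere; with per-block scales, blocks without the outlier keep fine resolution.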

Expert Parallelism Load Balancer

Python 1,323 195 Updated Mar 24, 2025
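The load-balancing problem here is assigning MoE experts with uneven traffic to GPUs so per-GPU load is even. A classic baseline (not necessarily this project's algorithm) is greedy longest-processing-time placement: sort experts by load and always hand the next one to the least-loaded GPU:

```python
import heapq

def balance_experts(expert_loads, num_gpus):
    """Greedy LPT placement: heaviest experts first, each to the lightest GPU.

    expert_loads: list of per-expert traffic; returns {gpu: [expert ids]}.
    """
    heap = [(0.0, g, []) for g in range(num_gpus)]  # (total load, gpu id, experts)
    heapq.heapify(heap)
    for eid, load in sorted(enumerate(expert_loads), key=lambda p: -p[1]):
        total, g, experts = heapq.heappop(heap)  # least-loaded GPU
        experts.append(eid)
        heapq.heappush(heap, (total + load, g, experts))
    return {g: experts for _, g, experts in heap}
```

Production balancers additionally consider expert replication and hierarchical (node/GPU) topology, but the greedy heuristic already gets within a small constant factor of optimal makespan.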

A lightweight data processing framework built on DuckDB and 3FS.

Python 4,876 431 Updated Mar 5, 2025