ByteDance · Shanghai, China (UTC +08:00)
https://fangjiarui.github.io/
https://www.zhihu.com/people/feifeibear
LinkedIn: in/fangjiarui
LLM Inference
A highly optimized LLM inference acceleration engine for Llama and its variants.
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
A flexible framework for experimenting with heterogeneous LLM inference and fine-tuning optimizations
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
FlashInfer: Kernel Library for LLM Serving
Fast inference from large language models via speculative decoding
Compare hardware platforms for LLM inference tasks via the Roofline Model.
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
Simple and efficient PyTorch-native transformer text generation in <1000 lines of Python.
PyTorch native quantization and sparsity for training and inference
A high-throughput and memory-efficient inference and serving engine for LLMs
DeepEP: an efficient expert-parallel communication library
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
A lightweight data processing framework built on DuckDB and 3FS.