ranjiewwen
🎯 Focusing
  • algorithmic engineer
  • Chengdu

Organizations

@DIP-ML-AI


Accelerate inference without tears

Python · 374 stars · 23 forks · Updated Jan 23, 2026

Efficient AI Inference & Serving

Python · 481 stars · 31 forks · Updated Jan 8, 2024

📰 Must-read papers and blogs on Speculative Decoding ⚡️

1,177 stars · 72 forks · Updated Mar 31, 2026

Simple and efficient PyTorch-native transformer text generation in <1000 lines of Python.

Python · 6,190 stars · 572 forks · Updated Aug 22, 2025

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python · 24,648 stars · 2,758 forks · Updated Aug 12, 2024

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python · 7,209 stars · 397 forks · Updated Jul 11, 2024

Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)

Python · 630 stars · 43 forks · Updated Dec 30, 2024

TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones

Python · 1,309 stars · 79 forks · Updated Feb 5, 2026

llm-export can export LLM models to ONNX.

Python · 347 stars · 40 forks · Updated Oct 24, 2025

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Jupyter Notebook · 2,720 stars · 197 forks · Updated Jun 25, 2024

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).

Python · 2,253 stars · 269 forks · Updated Feb 20, 2026

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python · 5,124 stars · 358 forks · Updated Mar 26, 2026

Official inference library for Mistral models

Jupyter Notebook · 10,753 stars · 1,033 forks · Updated Feb 26, 2026

Inference Llama 2 in one file of pure C

C · 19,350 stars · 2,485 forks · Updated Aug 6, 2024

High-speed Large Language Model Serving for Local Deployment

C++ · 9,245 stars · 546 forks · Updated Jan 24, 2026

Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052

C++ · 478 stars · 37 forks · Updated Mar 15, 2024

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Python · 1,325 stars · 82 forks · Updated Mar 6, 2025

A lightweight LLM inference framework

C++ · 751 stars · 95 forks · Updated Apr 7, 2024

A series of large language models developed by Baichuan Intelligent Technology

Python · 4,114 stars · 293 forks · Updated Nov 8, 2024

Chinese version of llm-numbers

131 stars · 6 forks · Updated Dec 25, 2023

Numbers every LLM developer should know

4,290 stars · 140 forks · Updated Jan 16, 2024

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python · 13,264 stars · 2,252 forks · Updated Apr 4, 2026

Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)

Python · 2,694 stars · 287 forks · Updated Aug 14, 2024

The official Python library for the OpenAI API

Python · 30,368 stars · 4,687 forks · Updated Apr 4, 2026

Simple, safe way to store and distribute tensors

Python · 3,676 stars · 303 forks · Updated Apr 2, 2026

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python · 3,488 stars · 307 forks · Updated Jul 17, 2025

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python · 7,749 stars · 679 forks · Updated Apr 4, 2026

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python · 3,995 stars · 316 forks · Updated Apr 3, 2026