🎯
Focusing
  • Algorithmic engineer
  • Chengdu

Organizations

@DIP-ML-AI


Accelerate inference without tears

Python 375 23 Updated Jan 23, 2026

Efficient AI Inference & Serving

Python 479 31 Updated Jan 8, 2024

📰 Must-read papers and blogs on Speculative Decoding ⚡️

1,206 75 Updated Apr 18, 2026

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Python 6,204 572 Updated Aug 22, 2025

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 24,739 2,761 Updated Aug 12, 2024

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python 7,225 398 Updated Jul 11, 2024

Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)

Python 630 43 Updated Dec 30, 2024

TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones

Python 1,315 79 Updated Feb 5, 2026

llm-export can export LLM models to ONNX.

Python 350 40 Updated Oct 24, 2025

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Jupyter Notebook 2,729 201 Updated Jun 25, 2024

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).

Python 2,311 272 Updated Feb 20, 2026

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python 5,188 372 Updated Apr 20, 2026

Official inference library for Mistral models

Jupyter Notebook 10,783 1,040 Updated Apr 20, 2026

Inference Llama 2 in one file of pure C

C 19,454 2,525 Updated Aug 6, 2024

High-speed Large Language Model Serving for Local Deployment

C++ 9,400 570 Updated Jan 24, 2026

Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052

C++ 479 37 Updated Mar 15, 2024

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Python 1,333 83 Updated Mar 6, 2025

A lightweight LLM inference framework

C++ 752 95 Updated Apr 7, 2024

A series of large language models developed by Baichuan Intelligent Technology

Python 4,109 293 Updated Nov 8, 2024

Chinese version of llm-numbers

134 6 Updated Dec 25, 2023

Numbers every LLM developer should know

4,300 140 Updated Jan 16, 2024

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 13,516 2,333 Updated Apr 30, 2026

Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)

Python 2,692 286 Updated Aug 14, 2024

The official Python library for the OpenAI API

Python 30,644 4,758 Updated Apr 29, 2026

Simple, safe way to store and distribute tensors

Python 3,733 312 Updated Apr 28, 2026

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python 3,520 315 Updated Jul 17, 2025

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 7,832 692 Updated Apr 29, 2026

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 4,035 322 Updated Apr 30, 2026