Stars: Inference engine (8 repositories)

Running large language models on a single GPU for throughput-oriented scenarios.

Python · 9,382 stars · 588 forks · Updated Oct 28, 2024

Fast inference engine for Transformer models

C++ · 4,194 stars · 434 forks · Updated Dec 5, 2025

🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools

Python · 3,218 stars · 609 forks · Updated Dec 17, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python · 12,416 stars · 1,963 forks · Updated Dec 17, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 65,646 stars · 12,034 forks · Updated Dec 18, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python · 21,597 stars · 3,786 forks · Updated Dec 18, 2025

Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.

Go · 157,819 stars · 13,943 forks · Updated Dec 17, 2025

Checkpoint-engine is a simple middleware for updating model weights in LLM inference engines

Python · 864 stars · 70 forks · Updated Dec 11, 2025