-
TensorRT-LLM Public
Forked from NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
C++ Other Updated Jun 9, 2026 -
sglang Public
Forked from sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
Python Apache License 2.0 Updated May 29, 2026 -
Model-Optimizer Public
Forked from NVIDIA/Model-Optimizer
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …
Python Apache License 2.0 Updated May 1, 2026 -