Stars
Agentic RL on Any Harness at Scale
A safetensors extension to efficiently store sparse quantized tensors on disk
DFlash: Block Diffusion for Flash Speculative Decoding
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, …
🚀 PyTorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support
Ongoing research training transformer models at scale
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
3x Faster Inference; Unofficial implementation of EAGLE Speculative Decoding
Contrib repository for the OpenTelemetry Collector
An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.
Context OpenTelemetry Collector processor
The Triton backend for the ONNX Runtime.
Splits Keras models with TensorFlow backends into two or more submodels.
DSPy: The framework for programming—not prompting—language models
ONNXMLTools enables conversion of models to ONNX
onnxruntime-extensions: A specialized pre- and post-processing library for ONNX Runtime
Triton Model Analyzer is a CLI tool that helps you understand the compute and memory requirements of Triton Inference Server models.
🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models.
Set of tools to assess and improve LLM security.
Master programming by recreating your favorite technologies from scratch.
ONNX Runtime: cross-platform, high-performance ML inferencing and training accelerator
The Triton TensorRT-LLM Backend
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
🐍 A Python lib for (de)serializing Python objects to/from JSON
Large Language Model Text Generation Inference