Stars
Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it instantly.
A lightweight, local-first, and 🆓 experiment tracking library from Hugging Face 🤗
Algorithm powering the For You feed on X
🚀 2.3x faster than MinIO for 4KB object payloads. RustFS is an open-source, S3-compatible high-performance object storage system supporting migration and coexistence with other S3-compatible platfor…
Development repository for the Triton language and compiler
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
Open source framework to vibecode and prototype voice agents with Gradium APIs
Review-first terminal diff viewer for agentic coders
🎨 NeMo Data Designer: Generate high-quality synthetic data from scratch or from seed data.
High-performance linear attention kernel library built on TileLang
Model Express is a Rust-based component meant to be placed next to existing model inference systems to speed up their startup times and improve overall performance.
A Datacenter Scale Distributed Inference Serving Framework
Desktop app to manage markdown knowledge bases
Evaluate and improve models and agents using environments
Harbor is a framework for running agent evaluations and creating and using RL environments.
A benchmark for LLMs on complicated tasks in the terminal
Scalable toolkit for efficient model reinforcement
Agentic RL on Any Harness at Scale
MoBA: Mixture of Block Attention for Long-Context LLMs
🎥 Make videos programmatically with React
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
PyTorch building blocks for the OLMo ecosystem
Open-source framework for the research and development of foundation models.
Developer Asset Hub for NVIDIA Nemotron — A one-stop resource for training recipes, usage cookbooks, datasets, and full end-to-end reference examples to build with Nemotron models
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …
The agent that grows with you