A high-throughput and memory-efficient inference and serving engine for LLMs
Updated Apr 6, 2026 - Python
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
Parallax is a distributed model serving framework that lets you build your own AI cluster anywhere
Cross-platform FlashAttention-2 Triton implementation for Turing+ GPUs with custom configuration mode
LLM fine-tuning with LoRA + NVFP4/MXFP8 on NVIDIA DGX Spark (Blackwell GB10)
GPU-accelerated WhisperX on NVIDIA Blackwell (SM_121) - DGX Spark compatible
Multi-model LLM serving for NVIDIA DGX Spark with vLLM, web UI, and tool calling
An empirical benchmark study of LLM inference with KV cache offloading, using vLLM and LMCache on NVIDIA GB200 with high-bandwidth NVLink-C2C.
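The offloading pattern that study benchmarks can be illustrated with a toy two-tier cache (a hypothetical sketch, not vLLM or LMCache code): KV blocks evicted from the limited GPU tier are moved to host memory instead of being dropped, so they can be fetched back on demand rather than recomputed.

```python
# Toy illustration of KV cache offloading. All names are hypothetical;
# this is not vLLM or LMCache code. Hot blocks live in a small "GPU"
# tier; on overflow, the least recently used block is offloaded to a
# larger "CPU" tier and restored on access.
from collections import OrderedDict

class OffloadingKVCache:
    def __init__(self, gpu_capacity: int):
        self.gpu = OrderedDict()   # hot tier, kept in LRU order
        self.cpu = {}              # offload tier (host memory stand-in)
        self.gpu_capacity = gpu_capacity

    def put(self, block_id, kv_block):
        self.gpu[block_id] = kv_block
        self.gpu.move_to_end(block_id)            # mark most recently used
        while len(self.gpu) > self.gpu_capacity:
            victim, data = self.gpu.popitem(last=False)  # evict LRU block
            self.cpu[victim] = data                      # offload, don't discard

    def get(self, block_id):
        if block_id in self.gpu:
            self.gpu.move_to_end(block_id)
            return self.gpu[block_id]
        # GPU-tier miss: pull the block back from the offload tier
        data = self.cpu.pop(block_id)
        self.put(block_id, data)
        return data

cache = OffloadingKVCache(gpu_capacity=2)
cache.put("seq0/blk0", [0.1, 0.2])
cache.put("seq0/blk1", [0.3, 0.4])
cache.put("seq1/blk0", [0.5, 0.6])   # evicts seq0/blk0 to the CPU tier
assert "seq0/blk0" in cache.cpu
assert cache.get("seq0/blk0") == [0.1, 0.2]  # restored from the offload tier
```

The design choice being benchmarked is exactly this trade: paying a host-to-device transfer on a miss instead of either recomputing attention history or holding every sequence's KV blocks in GPU memory.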
SharQ: Bridging Activation Sparsity and FP4 Quantization for LLM Inference
HY-World 1.5: A Systematic Framework for Interactive World Modeling with Real-Time Latency and Geometric Consistency
🔧 Fine-tune large language models efficiently on NVIDIA DGX Spark with LoRA adapters and optimized quantization for high performance.
An RTX 5090 (Blackwell) compatible version of Style-Bert-VITS2. Enables GPU operation in a native Windows environment, with PyTorch nightly cu128, triton-windows integration, and an automatic CPU/GPU fallback mechanism.
🧭 Enhance navigation with VLN-YuanNav, a visual-language model using advanced memory and decision-making for effective exploration.
🚀 Build and explore OpenAI's GPT-OSS model from scratch in Python, unlocking the mechanics of large language models.
Enterprise-grade Sovereign AI Stack optimized for NVIDIA Blackwell (sm_120) & vLLM. Features 256K context window, 5.8k tok/s prefill, and integrated observability via Langfuse.