🧭 Enhance navigation with VLN-YuanNav, a visual-language model using advanced memory and decision-making for effective exploration.
📊 Summarize merged PRs daily with vLLM, ensuring you stay updated on key changes and enhancements in your projects.
🔧 Fine-tune large language models efficiently on NVIDIA DGX Spark with LoRA adapters and optimized quantization for high performance.
🚀 Build and explore OpenAI's GPT-OSS model from scratch in Python, unlocking the mechanics of large language models.
A high-throughput and memory-efficient inference and serving engine for LLMs
An AI memory system that never forgets. Qdrant vectors + FalkorDB knowledge graph + neural reranking, self-hosted on 3 GPUs.
🚀 Deploy and manage vLLM with ready-made skills for modular automation, adhering to the Anthropic skills template for seamless integration.
High-performance LLM inference engine in C++/CUDA for NVIDIA Blackwell GPUs (RTX 5090)
SharQ: Bridging Activation Sparsity and FP4 Quantization for LLM Inference
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that orchestrate inference execution in a performant way.
Production LLM deployment specs for NVIDIA Blackwell GPUs (RTX Pro 6000, DGX Spark). Includes vLLM configurations, benchmarks, load balancer, and throughput calculators for NVFP4/FP8/MoE models.
Parallax is a distributed model serving framework that lets you build your own AI cluster anywhere
Deploy Nemotron 3 Nano 30B on NVIDIA DGX Spark using TensorRT-LLM (Blackwell GB10, NVFP4 quantization, OpenAI-compatible API)
Deploy Nemotron 3 Nano 30B with 1M context window on NVIDIA DGX Spark using llama.cpp (Blackwell sm_121, Q4_0 KV cache quantization)
Production C++ FIX 4.4 execution gateway for YUCLAW. Compiles on ARM64 DGX Spark Blackwell. Graduated execution levels 0-4. Real risk controls at the hardware layer.
Technical insights from r/LocalLLaMA — vLLM, FP8, NVFP4, Blackwell GPU benchmarks, and more. Unverified community knowledge, generated by Nemotron 9B. Issues welcome.