A high-throughput and memory-efficient inference and serving engine for LLMs
Updated Apr 6, 2026 - Python
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
Parallax is a distributed model serving framework that lets you build your own AI cluster anywhere
Cross-platform FlashAttention-2 Triton implementation for Turing+ GPUs with custom configuration mode
LLM fine-tuning with LoRA + NVFP4/MXFP8 on NVIDIA DGX Spark (Blackwell GB10)
GPU-accelerated WhisperX on NVIDIA Blackwell (SM_121) - DGX Spark compatible
Multi-model LLM serving for NVIDIA DGX Spark with vLLM, web UI, and tool calling
An empirical benchmark study of LLM inference with KV cache offloading, using vLLM and LMCache on NVIDIA GB200 with high-bandwidth NVLink-C2C.
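The offloading pattern that study benchmarks can be illustrated with a toy two-tier cache (a hypothetical sketch, not vLLM or LMCache code): KV blocks evicted from the limited GPU tier are moved to host memory instead of being dropped, so they can be fetched back on demand rather than recomputed.

```python
# Toy illustration of KV cache offloading. All names are hypothetical;
# this is not vLLM or LMCache code. Hot blocks live in a small "GPU"
# tier; on overflow, the least recently used block is offloaded to a
# larger "CPU" tier and restored on access.
from collections import OrderedDict

class OffloadingKVCache:
    def __init__(self, gpu_capacity: int):
        self.gpu = OrderedDict()   # hot tier, kept in LRU order
        self.cpu = {}              # offload tier (host memory stand-in)
        self.gpu_capacity = gpu_capacity

    def put(self, block_id, kv_block):
        self.gpu[block_id] = kv_block
        self.gpu.move_to_end(block_id)            # mark most recently used
        while len(self.gpu) > self.gpu_capacity:
            victim, data = self.gpu.popitem(last=False)  # evict LRU block
            self.cpu[victim] = data                      # offload, don't discard

    def get(self, block_id):
        if block_id in self.gpu:
            self.gpu.move_to_end(block_id)
            return self.gpu[block_id]
        # GPU-tier miss: pull the block back from the offload tier
        data = self.cpu.pop(block_id)
        self.put(block_id, data)
        return data

cache = OffloadingKVCache(gpu_capacity=2)
cache.put("seq0/blk0", [0.1, 0.2])
cache.put("seq0/blk1", [0.3, 0.4])
cache.put("seq1/blk0", [0.5, 0.6])   # evicts seq0/blk0 to the CPU tier
assert "seq0/blk0" in cache.cpu
assert cache.get("seq0/blk0") == [0.1, 0.2]  # restored from the offload tier
```

The design choice being benchmarked is exactly this trade: paying a host-to-device transfer on a miss instead of either recomputing attention history or holding every sequence's KV blocks in GPU memory.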
SharQ: Bridging Activation Sparsity and FP4 Quantization for LLM Inference
HY-World 1.5: A Systematic Framework for Interactive World Modeling with Real-Time Latency and Geometric Consistency
🔧 Fine-tune large language models efficiently on NVIDIA DGX Spark with LoRA adapters and optimized quantization for high performance.
An RTX 5090 (Blackwell) compatible version of Style-Bert-VITS2. Enables GPU operation in a native Windows environment, with PyTorch nightly cu128, triton-windows integration, and an automatic CPU/GPU fallback mechanism.
🧭 Enhance navigation with VLN-YuanNav, a visual-language model using advanced memory and decision-making for effective exploration.
🚀 Build and explore OpenAI's GPT-OSS model from scratch in Python, unlocking the mechanics of large language models.
Enterprise-grade Sovereign AI Stack optimized for NVIDIA Blackwell (sm_120) & vLLM. Features 256K context window, 5.8k tok/s prefill, and integrated observability via Langfuse.