DeepSeek-V3, R1 671B on 8xH100 Throughput Benchmarks
Updated Mar 13, 2025 - Python
Cog Single GPU Quantized Implementation of Step-Video-T2V
Production-grade GPU acceleration for robot learning. 10-20× faster training on NVIDIA H100/A100. Nsight validated.
One-offs.
LLaMA-Factory FP8 training environment for NVIDIA Hopper GPUs. Fixes common configuration issues that cause a 2x slowdown with FP8 mixed precision.