fp16

Here are 42 public repositories matching this topic...

An AI-powered MLOps assistant for effortless model compression. Upload PyTorch models to chat with a local LLM expert, receive hardware-aware optimization advice, and perform one-click FP16/INT8 quantization to reduce model size and latency.

  • Updated Sep 11, 2025
  • HTML

A reproducible GPU benchmarking lab that compares FP16 vs FP32 training on MNIST using PyTorch, CuPy, and Nsight profiling tools. This project blends performance engineering with cinematic storytelling—featuring NVTX-tagged training loops, fused CuPy kernels, and a profiler-driven README that narrates the GPU’s inner workings frame by frame.

  • Updated Apr 25, 2026
  • Python

A minimal, high-performance starter kit for running AI model inference on NVIDIA GPUs using CUDA. Includes environment setup, sample kernels, and guidance for integrating ONNX/TensorRT pipelines for fast, optimized inference on modern GPU hardware.

  • Updated Nov 2, 2025
  • Cuda

Build, run, and setup scripts for the complete TensorRT-LLM pipeline on RTX A6000 Ada (SM89). Reproducible path from HuggingFace checkpoint to deployable .engine file, with FP16 baseline and FP8 quantization. Companion material to the 4-part blog series on ai-box.eu — in preparation for the NVIDIA TensorRT Edge-LLM ecosystem.

  • Updated May 16, 2026
  • Shell
