DSinghania13 / ModelShrink

An AI-powered MLOps assistant for effortless model compression. Upload PyTorch models to chat with a local LLM expert, receive hardware-aware optimization advice, and perform one-click FP16/INT8 quantization to reduce model size and latency.

deep-learning pytorch gpu-acceleration quantization fp16 int8 model-compression flask-app mlops inference-optimization edge-ai huggingface model-optimization ai-copilot open-source-ai devloper-tools llm-assistant

Updated Sep 11, 2025
HTML

tk-yasuno / deepseek-v3-quantization-analysis

Comprehensive performance analysis of DeepSeek V3 quantization levels (FP16, Q8_0, Q4_0) on 16GB GPU environments.

quantization model-evaluation fp16 gpu-performance latency-analysis model-quantization inference-acceleration model-optimization llm-inference llm-optimization deepseek-v3 throughput-analysis

Updated Sep 27, 2025
Python

JohnClaw / llama-3.2-1b.vb

Llama 3.2 1B FP16 CPU inference in a single file of pure VB.NET

inference llama vb-net vbnet fp16 inference-engine visual-basic-net basic-programming model-serving cpu-inference llm llms visual-basic-dot-net llm-serving llm-inference llama3 llama3-2

Updated Jun 14, 2025
Visual Basic .NET

JohnG4489 / halffloat

Algorithmic implementation of the FP16 (IEEE-754) format in C

c algorithm embedded portable c99 floating-point numerical fp16 binary16 iee-754

Updated Feb 26, 2026
C

DanhLent / ieee754-fpu-16bit

16-bit Floating Point Unit implementation in Verilog

asic fpga verilog floating-point ieee754 digital-design half-precision fp16 fpu

Updated Jan 11, 2026
Verilog

angelolamonaca / PyTorch-Precision-Converter

A flexible utility for converting tensor precision in PyTorch models and safetensors files, enabling efficient deployment across various platforms.

machine-learning deep-learning utilities deployment toolkit optimization pytorch fp16 model-compression model-conversion model-checkpoint bf16 tensor-precision efficient-deployment

Updated Aug 24, 2023
Python

kadykov / speech-note-config

Custom Speech Note configuration with FP16 and q8_0 WhisperCpp models for improved speech-to-text accuracy.

configuration speech-recognition speech-to-text whisper fp16 custom-models whisper-cpp speech-note

Updated Apr 18, 2026

Dartayous / FP16-vs-FP32-A-GPU-Lab-in-Frames

A reproducible GPU benchmarking lab that compares FP16 vs FP32 training on MNIST using PyTorch, CuPy, and Nsight profiling tools. This project blends performance engineering with cinematic storytelling—featuring NVTX-tagged training loops, fused CuPy kernels, and a profiler-driven README that narrates the GPU’s inner workings frame by frame.

performance-engineering deep-learning reproducible-research cuda pytorch fp16 cupy mixed-precision nsight gpu-benchmark nvtx fp32 tensor-core

Updated Apr 25, 2026
Python

ModelPiper / PiperSR

First super-resolution model designed for Apple Neural Engine. 2x upscale, real-time, on-device. Built by Ben Racicot.

machine-learning real-time image-processing super-resolution upscale fp16 coreml on-device apple-silicon apple-neural-engine

Updated Mar 27, 2026
Python

floriankark / transformer

Transformer implementation in PyTorch, trained on an NVIDIA A100 in FP16

pytorch transformer attention fp16 attention-is-all-you-need byte-pair-encoding a100

Updated Jan 19, 2025
Python

ChaseDreamInfinity / gpu-inference-acceleration-lab

Hands-on GPU inference acceleration experiments across Jetson and cloud GPUs using ONNX Runtime, TensorRT, FP16, and model benchmarks.

benchmark cuda nvidia jetson fp16 tensorrt onnx edge-ai onnxruntime orin siglip a10g

Updated May 10, 2026
Python

FlosMume / CUDA-AI-Inference-Starter

A minimal, high-performance starter kit for running AI model inference on NVIDIA GPUs using CUDA. Includes environment setup, sample kernels, and guidance for integrating ONNX/TensorRT pipelines for fast, optimized inference on modern GPU hardware.

benchmarking deep-learning cuda nvidia high-performance-computing fp16 int8 inference-engine onnx tensor-rt devtech model-optimization gpu-inference ai-deployment adge-ai

Updated Nov 2, 2025
Cuda

umitkacar / onnx-tensorrt-optimization

40x faster AI inference: ONNX to TensorRT optimization with FP16/INT8 quantization, multi-GPU support, and deployment

Updated Nov 14, 2025
Python

LeonByte / SearchEngine

Multimodal search engine using CLIP embeddings for bidirectional image-text retrieval.

search-engine scikit-learn foss jupyter-notebook python3 memory-management image-to-text batch-processing text-to-image fp16 gpu-optimization multimodal-deep-learning flickr8k-dataset bidirectional-search local-first sentence-transformers poetry-python gradio-interface clip-vit-b-16

Updated Sep 9, 2025
Jupyter Notebook

custom-build-robots / tensorrt-llm-edge-prep

Build, run, and setup scripts for the complete TensorRT-LLM pipeline on RTX A6000 Ada (SM89). Reproducible path from HuggingFace checkpoint to deployable .engine file, with FP16 baseline and FP8 quantization. Companion material to the 4-part blog series on ai-box.eu — in preparation for the NVIDIA TensorRT Edge-LLM ecosystem.

inference nvidia quantization rtx fp16 ai-agents edge-ai llm ada-architecture fp8 tensorrt-llm

Updated May 16, 2026
Shell

DKrishna007 / pointpillars-3d-detection-jetson

PointPillars TensorRT FP16 on Jetson AGX Orin | 42 FPS / <24ms | ByteTrack 3D MOT 78.4% MOTA | 3.8x speedup

lidar jetson fp16 tensorrt 3d-object-detection edge-ai pointpillars bytetrack

Updated May 18, 2026
Python

shogo82148 / floats

The floats package provides types for handling multi-precision floating-point numbers.

fp16 binary16 binary32 binary64 fp32 fp64 binary128 fp128 binary256 fp256

Updated May 19, 2026
Go

yasser1-0 / FP16-vs-FP32-A-GPU-Lab-in-Frames

🎬 Explore GPU training efficiency with FP32 vs FP16 in this modular lab, utilizing Tensor Core acceleration for deep learning insights.

performance-engineering deep-learning reproducible-research cuda pytorch fp16 cupy mixed-precision nsight gpu-benchmark nvtx fp32 tensor-core

Updated Feb 20, 2026
Python

obsidianplusplus / YOLOv5-TensorRT-Accelerator

High-performance YOLOv5 inference framework accelerated by TensorRT with dynamic optimization

cuda fp16 tensorrt int8 pycuda yolov5 dynamic-shapes-cuda-stream

Updated Mar 29, 2025
Python

Adxell / CNN-downcasting-fashion-model

The goal of this repository is to downcast a Fashion MNIST model from FP32 to FP16 using PyTorch

cnn-model fp16 fp32

Updated Sep 22, 2025
Python

fp16

There are 42 public repositories matching this topic; those captured on this page are listed above.
