Efficient in-memory representation for ONNX, in Python
DA2Lite is an automated model compression toolkit for PyTorch.
This repository covers a complete deep learning application development workflow, using classic handwritten character recognition as the example, built on the LeNet network. Inference is implemented with torch, onnxruntime, and openvino 💖
Don't Think It Twice: Exploit Shift Invariance for Efficient Online Streaming Inference of CNNs
Model quantization techniques for efficient LLM inference. Experiments with INT8, INT4, and mixed-precision quantization.
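The INT8 scheme mentioned above can be sketched framework-agnostically. This is a generic symmetric per-tensor quantizer written for illustration (not code from that repository), using NumPy only; real toolkits add per-channel scales and calibration:

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: map floats into [-127, 127]."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from INT8 values and the scale."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 0.75], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)  # close to w, within one quantization step
```

The round-trip error is bounded by half a quantization step (scale / 2), which is why INT8 usually costs little accuracy while quartering FP32 storage.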
Arbitrary Numbers
PyTorch implementation of normalization-free LLMs investigating entropic behavior to find desirable activation functions
ptdeco is a library for model optimization by matrix decomposition built on top of PyTorch
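The matrix-decomposition idea behind such libraries can be illustrated with a plain truncated SVD (a generic sketch, not ptdeco's actual API): a weight matrix is replaced by two thinner factors, trading a small approximation error for fewer parameters:

```python
import numpy as np

def low_rank(W, rank):
    """Factor W ≈ A @ B with inner dimension `rank` via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # (m, rank): left factor, scaled by singular values
    B = Vt[:rank]               # (rank, n): right factor
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4)) @ rng.standard_normal((4, 16))  # exactly rank 4
A, B = low_rank(W, 4)
# A and B hold 8*4 + 4*16 = 96 values versus 128 in W
```

Because this W is exactly rank 4, the factorization reconstructs it to numerical precision; for real layers one picks the smallest rank that keeps accuracy acceptable.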
Convert and optimize BirdNET models for ONNX Runtime inference on GPUs, CPUs, and embedded devices
TinyML & Edge AI: On-device inference, model quantization, embedded ML, ultra-low-power AI for microcontrollers and IoT devices.
Mobile AI: iOS CoreML, Android TFLite, on-device inference, ONNX, TensorRT, and ML deployment for smartphones.
A lightweight, mobile-optimized Neural Machine Translation (NMT) framework in PyTorch. LingoLite features a modern transformer architecture with state-of-the-art optimizations for efficient multilingual translation on resource-constrained devices.
Semantic model router with parallel LLM classification, prompt caching, and vision short-circuiting. Optimizes request routing with sub-100ms overhead for Open WebUI.
Tools and experiments for converting Human Activity Recognition (HAR) models to TensorFlow Lite for efficient on-device inference on mobile and wearable devices.
A minimal reproducibility study of https://arxiv.org/abs/1911.05248, with experiments on compression of deep neural networks.
An end-to-end project with API deployment for predicting Spain's electricity shortfall.
Detects spam messages using a Long Short-Term Memory (LSTM) model with Word2Vec word embeddings. The model was tuned with grid search, reaching a best accuracy of 95.65%.
Comprehensive performance analysis of DeepSeek V3 quantization levels (FP16, Q8_0, Q4_0) on 16GB GPU environments.
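The trade-off between those quantization levels comes down to bits per weight. A rough back-of-the-envelope calculator, assuming the nominal effective sizes of llama.cpp's GGUF block formats (~8.5 bits/weight for Q8_0 and ~4.5 for Q4_0, per-block scales included) and DeepSeek V3's published 671B total parameters:

```python
def model_size_gb(n_params, bits_per_weight):
    """Approximate weight footprint in GB (decimal) for a given precision."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 671e9  # DeepSeek V3 total parameter count (published figure)
for name, bits in [("FP16", 16.0), ("Q8_0", 8.5), ("Q4_0", 4.5)]:
    print(f"{name:5s} ~{model_size_gb(N_PARAMS, bits):6.0f} GB")
```

Even at Q4_0 the full model far exceeds 16 GB of VRAM, which is why such analyses hinge on how many layers can be offloaded to the GPU rather than whether the whole model fits.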
Optimized IDKL Model for Visible-Infrared Person Re-Identification focusing on efficiency for resource-constrained hardware.