EVF is a web application framework for managing and optimizing deep learning models for edge devices, with GPU support and real-time monitoring capabilities.
A unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, and speculative decoding. It compresses deep learning models for downstream deployment frameworks …
A 28× Compressed Wav2Lip for Efficient Talking Face Generation [ICCV'23 Demo] [MLSys'23 Workshop] [NVIDIA GTC'23]
A library for training, compressing, and deploying computer vision models (including ViT) on edge devices
Compressed LLMs for Efficient Text Generation [ICLR'24 Workshop]
TAO Toolkit deep learning networks with PyTorch backend
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your device. New: Code Llama support!
The official NetsPresso Python package.
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
A Compressed Stable Diffusion for Efficient Text-to-Image Generation [ECCV'24]
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
LLaMA/RWKV ONNX models, quantization, and test cases
A timeline of the latest AI models for audio generation, starting in 2023!
Development repository for the Triton language and compiler
Reference implementation of the Transformer architecture optimized for Apple Neural Engine (ANE)
Stable Diffusion with Core ML on Apple Silicon
PyTorch implementation of "Emotional Voice Conversion using Multitask Learning with Text-to-Speech", accepted to ICASSP 2020