Stars
Code repo for the paper "SpinQuant: LLM quantization with learned rotations"
[EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models
Qualcomm® AI Hub Models is our collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.) and ready to deploy on Qualcomm® devices.
On-device AI across mobile, embedded and edge for PyTorch
Large Language Model Text Generation Inference
A high-throughput and memory-efficient inference and serving engine for LLMs