Stars
A frontier collection and survey of vision-language model papers and model GitHub repositories. Continuously updated.
[ICML 2026] Gated Relational Alignment via Confidence-based Distillation for Efficient VLMs
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
A framework for few-shot evaluation of language models.
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Code repo for the paper "SpinQuant: LLM quantization with learned rotations"
[ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.
[NeurIPS 2025] PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
[NER 2025 Spotlight] WaveFormer: A Lightweight Transformer Model for sEMG-based Gesture Recognition