LLM/VLM
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA), built toward GPT-4V-level capabilities and beyond.
EVA Series: Visual Representation Fantasies from BAAI
Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)
Use PEFT or full-parameter training for CPT/SFT/DPO/GRPO of 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, LLaVA, …)
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
SuperCLUE: A comprehensive benchmark for general-purpose foundation models in Chinese
Qwen3-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
✨✨Latest Advances on Multimodal Large Language Models
✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
Awesome-LLM: a curated list of Large Language Models
A collection of available inference solutions for LLMs
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
The official repo of Qwen-VL (通义千问-VL), the chat & pretrained large vision-language model proposed by Alibaba Cloud.
Famous Vision Language Models and Their Architectures
Collection of AWESOME vision-language models for vision tasks
[T-IV] This repository collects research papers on large vision-language models in autonomous driving and intelligent transportation systems. The repository is continuously updated.
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
A book on the foundations of LLMs
R1-Onevision, a vision-language model capable of deep CoT reasoning.
Online playground for OpenAI tokenizers
This repository provides a valuable reference for researchers in multimodality; start your exploration of RL-based reasoning MLLMs here!
Eagle: Frontier Vision-Language Models with Data-Centric Strategies
The Next Step Forward in Multimodal LLM Alignment
Modeling, training, eval, and inference code for OLMo
Build, evaluate, and train general multi-agent assistants with ease
The simplest, fastest repository for training/finetuning small VLMs.