Stars

LLM/VLM

30 repositories

VisionLLM Series

Python 1,131 59 Updated Feb 27, 2025

LLM101n: Let's build a Storyteller

35,946 1,962 Updated Aug 1, 2024

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 24,209 2,688 Updated Aug 12, 2024

EVA Series: Visual Representation Fantasies from BAAI

Python 2,627 188 Updated Aug 1, 2024

Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)

Jupyter Notebook 94 5 Updated Oct 19, 2024

DataComp for Language Models

HTML 1,402 129 Updated Sep 9, 2025

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …

Python 11,827 1,084 Updated Dec 24, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 64,432 7,814 Updated Dec 24, 2025

SuperCLUE: A Comprehensive Benchmark for General-Purpose Foundation Models in Chinese

3,269 112 Updated Sep 8, 2025

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 17,379 1,455 Updated Nov 28, 2025

✨✨Latest Advances on Multimodal Large Language Models

17,054 1,098 Updated Dec 23, 2025

✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

Python 150 11 Updated Oct 21, 2025

Awesome-LLM: a curated list of Large Language Models

25,858 2,226 Updated Jul 31, 2025

A collection of all available inference solutions for the LLMs

93 5 Updated Mar 1, 2025

✨✨ [NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,465 180 Updated Mar 28, 2025

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 6,445 479 Updated Aug 7, 2024

Famous Vision Language Models and Their Architectures

Markdown 1,123 52 Updated Feb 24, 2025

Collection of AWESOME vision-language models for vision tasks

3,039 229 Updated Oct 14, 2025

Explainability for Vision Transformers

Python 1,022 108 Updated Mar 12, 2022

[T-IV] This repository collects research papers of large Vision Language Models in Autonomous driving and Intelligent Transportation System. The repository will be continuously updated to track the…

435 31 Updated Apr 1, 2025

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

Jupyter Notebook 64 6 Updated Jul 22, 2025

A book for Learning the Foundations of LLMs

15,022 1,387 Updated Dec 12, 2025

R1-onevision, a visual language model capable of deep CoT reasoning.

Python 574 16 Updated Apr 13, 2025

Online playground for OpenAI tokenizers

TypeScript 1,462 160 Updated Apr 24, 2025

This repository provides a valuable reference for researchers in the field of multimodality. Start your exploration of RL-based reasoning MLLMs here!

1,313 58 Updated Dec 7, 2025

Eagle: Frontier Vision-Language Models with Data-Centric Strategies

Python 912 48 Updated Oct 25, 2025

The Next Step Forward in Multimodal LLM Alignment

Python 193 8 Updated May 1, 2025

Modeling, training, eval, and inference code for OLMo

Python 6,249 691 Updated Nov 24, 2025

Build, evaluate and train General Multi-Agent Assistance with ease

Python 1,080 109 Updated Dec 24, 2025

The simplest, fastest repository for training/finetuning small-sized VLMs.

Python 4,441 434 Updated Oct 27, 2025