Sichuan University
- Shenzhen ⇌ Chengdu, China
https://xuyang-liu16.github.io/
- @xuyang_liu16
Lists (13)
CoT
Diffusion Acceleration
🌟 KV Cache Compression
😵‍💫 Mitigating Hallucination
🔥 MLLMs
Foundation MLLMs, including image LLMs and video LLMs.
✨ Token Compression
🚀 Token Compression for MLLM
Token compression methods for MLLMs, including both training-based and training-free approaches.
📹 Training-free VideoLLMs
Training-free video LLMs extend image LLMs to video understanding without requiring additional fine-tuning on video data.
Stars
Paper list of streaming video understanding
🔥 OneThinker: All-in-one Reasoning Model for Image and Video
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
Code and data for VTCBench, a vision-text compression benchmark for Vision Language Models.
Fast, memory-efficient attention column reduction (e.g., sum, mean, max)
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
Official Implementation for the paper: "ToG-Bench: Task-Oriented Spatio-Temporal Grounding in Egocentric Videos"
[CVPR 2025 🔥]A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
1+1 > 2: Detector-Empowered Video Large Language Model for Spatio-Temporal Grounding and Reasoning
The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"
WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning
Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning"
[ICML 2025 Oral] An official implementation of VideoRoPE & VideoRoPE++
Accelerating Streaming Video Large Language Models via Hierarchical Token Compression
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
[arxiv'25] TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding
Downscaling Intelligence: Exploring Perception and Reasoning Bottlenecks in Small Multimodal Models
MR. Video: MapReduce is the Principle for Long Video Understanding
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
[AAAI2025] Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark