[ACL-2026] MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search tools.

Python 458 25 Updated Apr 7, 2026

X-PLUG / MobileAgent

Mobile-Agent: The Powerful GUI Agent Family

Python 8,826 887 Updated May 14, 2026

NJU-LINK / MT-Video-Bench

The Source Code for MT-Video-Bench @ ACL Findings 2026

Python 20 2 Updated Jan 20, 2026

NJU-LINK / IF-VidCap

The Source Code for IF-VidCap @ICLR 2026

Python 19 1 Updated Oct 22, 2025

NVlabs / OmniVinci

OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.

Python 672 52 Updated Feb 26, 2026

bytedance / video-SALMONN-2

video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, developed by the Department of Electronic Engineering at Tsin…

Python 197 25 Updated Feb 23, 2026

NJU-LINK / OmniVideoBench

The Source Code for OmniVideoBench @ICLR 2026

Python 73 4 Updated Feb 12, 2026

QwenLM / Qwen3-Omni

Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,832 265 Updated Apr 23, 2026

Zulko / moviepy

Video editing with Python

Python 14,689 2,074 Updated Mar 7, 2026

modelscope / FunASR

Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

Python 17,987 1,845 Updated Jun 11, 2026

yongliang-wu / DFT

[ICLR 2026] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification.

Python 578 24 Updated Jan 4, 2026

baichuan-inc / Baichuan-Omni-1.5

Python 191 9 Updated Feb 8, 2025

QwenLM / Qwen2.5-Omni

Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of performing real-time speech generation.

Jupyter Notebook 4,022 324 Updated Jun 12, 2025

deepspeedai / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 42,517 4,858 Updated Jun 14, 2026

Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Python 24,145 2,829 Updated Jun 10, 2026

zhuzilin / ring-flash-attention

Ring attention implementation with flash attention

Python 1,025 99 Updated Sep 10, 2025

OpenBMB / MiniCPM-V

A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone

Python 25,627 2,007 Updated Jun 4, 2026

Caorui-Li

Stars

NJU-LINK / CodeTracer

redai-infra / Relax

aiming-lab / SkillRL

ZhangqiJiang07 / GEditBench_v2

tanweai / pua

RockyChen0205 / rocky-skills

camel-ai / camel

verl-project / verl

Gen-Verse / LatentMAS

RockyChen0205 / STGE-Former

HUST-AI-HYZ / MemoryAgentBench

facebookresearch / Ego4d

bytedance / UI-TARS

EvolvingLMMs-Lab / multimodal-search-r1