Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.

1,416 42 Updated Mar 9, 2026

Audio-Reasoning-Challenge / Audio-Reasoning-Challenge-Baselines

The baselines of ARC-Challenge-Interspeech2026

Python 58 5 Updated Dec 1, 2025

OpenMOSS / MOSS-TTSD

MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flexible speaker control, and multilingual support, while enablin…

Python 1,262 122 Updated Mar 23, 2026

NemoYuan2008 / SJTU-Thesis-Proposal

Shanghai Jiao Tong University LaTeX templates for thesis proposals and annual reports (unofficial)

TeX 159 10 Updated Jan 25, 2026

meituan-longcat / LongCat-Video

Python 2,248 340 Updated Apr 2, 2026

JiaQiSJTU / VisionInText

A benchmark on visual perception in text strings for both LLMs and MLLMs.

Python 14 1 Updated Apr 7, 2026

GAIR-NLP / O1-Journey

O1 Replication Journey

1,999 61 Updated Jan 14, 2025

QwenLM / Qwen3-Omni

Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,665 252 Updated Jan 8, 2026

hyperai / awesome-ai4s

A collection of AI for Science paper walkthroughs (continuously updated). Papers, datasets, and tutorials can be downloaded at hyper.ai

3,176 502 Updated Mar 22, 2025

liuxuannan / Awesome-Multimodal-Jailbreak

A Survey on Jailbreak Attacks and Defenses against Multimodal Generative Models

315 13 Updated Jan 11, 2026

RayeRen / acad-homepage.github.io

AcadHomepage: A Modern and Responsive Academic Personal Homepage

SCSS 2,704 5,561 Updated Apr 12, 2026

meituan-longcat / LongCat-Flash-Chat

1,325 67 Updated Apr 2, 2026

ChocoWu / Awesome-Scene-Graph-Generation

This is a repository for listing papers on scene graph generation and application.

630 42 Updated Apr 9, 2026

aiben-ch / LMM-Evaluation-Survey

Official repo for 'Large Multimodal Models Evaluation: A Survey'

101 10 Updated Mar 16, 2026

JiaQiSJTU / UserCentricLeaderboard

This project introduces a novel, user-centric leaderboard for Large Language Models (LLMs) that moves beyond one-size-fits-all evaluations. Our framework empowers users to create personalized ranki…

Python 4 Updated Jan 26, 2026

X-LANCE / CogBench

Python 7 Updated Jul 7, 2025

HW-whistleblower / True-Story-of-Pangu

The true story of the hardship and darkness behind the development of the Noah Pangu large models.

11,419 1,326 Updated Jul 9, 2025

apple / ml-diffucoder

DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation

Python 812 56 Updated Jul 9, 2025

VectorSpaceLab / OmniGen2

OmniGen2: Exploration to Advanced Multimodal Generation. https://arxiv.org/abs/2506.18871

Jupyter Notebook 4,044 23 Updated Mar 20, 2026

Xiujie Song xiujiesong

Stars

HKUDS / nanobot

cft0808 / edict

showlab / Show-o

nerfies / nerfies.github.io

eliahuhorwitz / Academic-project-page-template

black-forest-labs / flux

UniDial / UniDial-EvalKit

JiaQiSJTU / EvolIF

Tongyi-MAI / Z-Image

Tyrrrz / YoutubeDownloader

XiaomiMiMo / MiMo-V2-Flash

zhaochen0110 / Awesome_Think_With_Images