Stars
AI agents running research on single-GPU nanochat training automatically
ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works…
The code implementation for UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings (ICLR 2026).
FireRed-Image-Edit is a powerful image editing foundation model achieving open-source state-of-the-art performance with precise instruction following, high-fidelity generation, superior identity co…
Review materials for the Statistical Learning course (Liu Dong) at USTC.
Collection of papers about video-audio understanding
Official implementation of RLFR: Extending Reinforcement Learning for LLMs with Flow Environment
A Comprehensive Dataset for Advanced Image Generation and Editing
[ICLR-2026] Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
Latest open-source "Thinking with images" (O3/O4-mini) papers, covering training-free, SFT-based, and RL-enhanced methods for "fine-grained visual understanding".
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
New generation of CLIP with strong fine-grained discrimination capability, ICML2026 and ICML2025
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning
[Extended version ICLR 2025 Blog Track] Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
[ICCV2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
This is a repo to track the latest autoregressive visual generation papers.
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
GPT-ImgEval: Evaluating GPT-4o’s state-of-the-art image generation capabilities
Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing"
A collection of awesome text-to-image generation studies.
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.