Stars
An official implementation of the paper "Paragraph2Graph: A Language-independent GNN-based framework for layout analysis"
SkillOpt is a text-space optimizer that trains reusable natural-language skills for frozen LLM agents through trajectory-driven edits, validation-gated updates, and deployable best_skill.md artifacts.
AI generates a real, editable PowerPoint from any document — native shapes & animations, speaker notes voiced as audio narration, and the option to follow your own .pptx template, not slide images …
"Paper2Slides: From Paper to Presentation in One Click"
Paper2Agent is a multi-agent AI system that automatically transforms research papers into interactive AI agents with minimal human input.
[NeurIPS 2025] Open-source Multi-agent Poster Generation from Papers
"OpenSpace: Make Your Agents: Smarter, Low-Cost, Self-Evolving" -- Community: https://open-space.cloud/
Reference code for the Meta-Harness paper.
MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning [NeurIPS 2025 Poster]
MazeBench: Can multimodal LLMs solve visual mazes, or do they just brute-force in token space? Benchmark, 110-maze eval set, and paper (arXiv:2603.26839).
Janus-Series: Unified Multimodal Understanding and Generation Models
ThinkGen: Generalized Thinking for Visual Generation
LLaDA2.0-Uni: Understanding and Generation the World.
Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think & UnifiedReward-Flex
https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT
This is the official repository for the paper "MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning"
[ICLR 2026] The official repository for the paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning"
[ICLR 2026] Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
InternVL-U is a 4B-parameter unified multimodal model (UMM) that brings multimodal understanding, reasoning, image generation, image editing into a single framework.
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
📖 This is a repository for organizing papers, code, and other resources related to unified multimodal models.
Translates PDFs, EPUBs, webpages, metadata, annotations, and notes into the target language. Supports 20+ translation services.
The development and future prospects of large multimodal reasoning models.
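A complete AI prompt library designed for graduate students and academic researchers 📖 Contents: ✨ 40+ carefully designed AI prompts ✨ A systematic method for choosing a thesis topic (generation, evaluation, justification) ✨ Quick approaches to finding papers (8 different approaches) ✨ Literature review frameworks and tools ✨ Excel auto-evaluation spreadsheets ✨ 3 complete argumentation templates 🚀 Core advantages: ⚡ Saves 50-70% of the time (topic selection in 3-5 days instead of 2-3 weeks) 🎯 Scientific method (based on a systematic 5-dimension evaluation framework) 💡 Plug…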
An Efficient "Factory" to Build Multiple LoRA Adapters