Stars
Official implementation of the paper "Transfer between Modalities with MetaQueries"
Academic Websites.
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
Native Multimodal Models are World Learners
[NeurIPS'25] Official repository of Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations
Official repository for BrickGPT, the first approach for generating physically stable toy brick models from text prompts.
OpenThinkIMG is an end-to-end open-source framework that empowers Large Vision-Language Models to think with images.
Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.
Official repo of the paper "SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models". A post-training framework that creates a cost-effective, self-iterative optimization loop.
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Official repo of the paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning.
SIBench: a project on visual spatial reasoning tasks
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset
Official inference repo for FLUX.1 models
OmniGen2: Exploration to Advanced Multimodal Generation.
From Images to High-Fidelity 3D Assets with Production-Ready PBR Material
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
The implementation of Extreme Viewpoint 4D Video Generation
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Open-Sora: Democratizing Efficient Video Production for All