Python 167 8 Updated Jun 27, 2025

Official implementation of the paper "Transfer between Modalities with MetaQueries"

Python 278 8 Updated Oct 12, 2025

Academic Websites.

HTML 2 Updated Dec 7, 2025

The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…

Python 6,165 713 Updated Dec 11, 2025

PyTorch implementation of JiT https://arxiv.org/abs/2511.13720

Python 1,816 107 Updated Dec 8, 2025

Native Multimodal Models are World Learners

Python 1,367 51 Updated Nov 28, 2025

[NeurIPS'25] Official repository of Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations

Python 449 20 Updated Nov 29, 2025

Official implementation of BLIP3o-Series

Python 1,610 72 Updated Nov 29, 2025

Official repository for BrickGPT, the first approach for generating physically stable toy brick models from text prompts.

Python 1,548 94 Updated Nov 9, 2025

Contexts Optical Compression

Python 21,492 1,921 Updated Oct 25, 2025

OpenThinkIMG is an end-to-end open-source framework that empowers Large Vision-Language Models to think with images.

Jupyter Notebook 104 6 Updated Jul 11, 2025

Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.

Jupyter Notebook 558 46 Updated Oct 30, 2025

Official repo of paper "SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models". A post-training framework that creates a cost-effective, self-iterative optimization loop.

Python 88 6 Updated Nov 26, 2025

Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"

Python 1,641 53 Updated Nov 15, 2025

Open-source unified multimodal model

Python 5,478 481 Updated Oct 27, 2025

Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning.

Python 335 11 Updated Dec 16, 2025

Echo-4o

Jupyter Notebook 462 28 Updated Dec 9, 2025

SIBench: a project on visual spatial reasoning tasks

Python 21 Updated Nov 4, 2025

A review of, and pathway to, world models

Python 1 Updated Nov 11, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 19,438 1,991 Updated Nov 1, 2025

Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.

Python 6,428 360 Updated Nov 11, 2025

GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset

Python 238 5 Updated Aug 15, 2025

Official inference repo for FLUX.1 models

Python 24,928 1,831 Updated Jul 31, 2025

Python 111 3 Updated Nov 1, 2025

OmniGen2: Exploration to Advanced Multimodal Generation.

Jupyter Notebook 3,972 12 Updated Dec 2, 2025

From Images to High-Fidelity 3D Assets with Production-Ready PBR Material

Python 2,557 344 Updated Oct 17, 2025

UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

Python 821 25 Updated Nov 25, 2025

The implementation of Extreme Viewpoint 4D Video Generation

Python 249 17 Updated Sep 6, 2025

HunyuanVideo: A Systematic Framework For Large Video Generation Model

Python 11,473 1,153 Updated Nov 21, 2025

Open-Sora: Democratizing Efficient Video Production for All

Python 28,132 2,815 Updated Apr 30, 2025