[ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner.

Python 136 Updated Apr 7, 2026

Generate high-definition short videos with one click using large AI models.

Python 87,137 12,458 Updated Jun 13, 2026

Official Code of NAVA: Native Audio-Visual Alignment for Generation.

Python 185 21 Updated Jun 8, 2026
Python 1,723 200 Updated Nov 15, 2025

Unlimited-length talking video generation that supports image-to-video and video-to-video generation

Python 6,884 1,212 Updated May 22, 2026

PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion

Python 726 36 Updated Jun 3, 2026

Code for "OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation"

Python 94 5 Updated Jun 1, 2026

Lens is a 3.8B-parameter text-to-image diffusion model that achieves quality competitive with, and in several cases surpassing, models like FLUX and SD3, while requiring significantly less training c…

Python 239 17 Updated May 25, 2026

VEFX-Bench: A Holistic Benchmark for Generic Video Editing and Visual Effects

Python 216 17 Updated May 16, 2026
Python 915 68 Updated Apr 13, 2026

Unified Codebase for Advanced World Models.

Python 817 43 Updated Jun 11, 2026

Inference script for Oasis 500M

Python 2,100 181 Updated Nov 8, 2024

Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory

Python 2,235 237 Updated Mar 30, 2026

Official implementation of Tuna-2: Pixel Embeddings Beat Vision Encoders for Unified Understanding and Generation

Python 711 28 Updated Jun 9, 2026

Open-source code for Tencent tFold

Python 158 27 Updated Mar 14, 2025

ASID-Caption: Attribute-Structured and Quality-Verified Audiovisual Instruction Dataset and Training Pipeline for Fine-Grained Video Understanding.

Python 65 2 Updated Mar 3, 2026

Implementation of D4RT, Efficiently Reconstructing Dynamic Scenes, from DeepMind

Python 70 Updated Jun 8, 2026

A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training

Python 848 58 Updated Jun 13, 2026

A framework for efficient model inference with omni-modality models

Python 5,130 1,106 Updated Jun 13, 2026

Project Lyra: Open Generative 3D World Models

Python 2,086 222 Updated Jun 11, 2026
Python 1,895 178 Updated May 4, 2026

Gen-Searcher: Reinforcing Agentic Search for Image Generation

Python 365 33 Updated Apr 7, 2026

[ICML'26] Code and website for Self-Flow: Self-Supervised Flow Matching for Scalable Multi-Modal Synthesis

Python 508 19 Updated May 23, 2026

AI agents running research on single-GPU nanochat training automatically

Python 86,491 12,529 Updated Mar 26, 2026

Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models

Python 252 14 Updated Apr 29, 2026

A feed-forward 3D foundation model for reconstructing scenes from streaming data

Python 7,197 712 Updated Jun 2, 2026

[NeurIPS 2025 Oral] Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think

Python 267 18 Updated Oct 4, 2025

[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Python 1,647 94 Updated Mar 16, 2025

JoyAI-Image is the unified multimodal foundation model for image understanding, text-to-image generation, and instruction-guided image editing.

Python 2,169 157 Updated Jun 12, 2026