EzioBy
💭 I may be slow to respond.
  • The Hong Kong University of Science and Technology
  • 05:58 (UTC +08:00)


A Foundation Model for Generalist Gaming Agents

Python 811 99 Updated Dec 23, 2025
Python 98 3 Updated Dec 19, 2025

WorldPlay: Interactive World Modeling with Real-Time Latency and Geometric Consistency

Python 709 44 Updated Dec 23, 2025

Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation

Python 207 10 Updated Dec 15, 2025

Official Implementations for Paper - MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues

Python 104 7 Updated Dec 3, 2025
Python 36 2 Updated Dec 11, 2025

Official inference repo for FLUX.2 models

Python 1,260 64 Updated Dec 1, 2025

Official Implementation of "MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation"

Python 280 7 Updated Nov 19, 2025

Official Repo for Paper <WEAVE: Unleashing and Benchmarking the Interleaved Cross-modality Comprehension and Generation>

Jupyter Notebook 36 Updated Nov 18, 2025

A curated collection of fun and creative examples generated with Nano Banana & Nano Banana Pro🍌, Gemini-2.5-flash-image based model. We also release Nano-consistent-150K openly to support the commu…

19,051 1,985 Updated Dec 12, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 21,922 3,847 Updated Dec 23, 2025

[SIGGRAPH Asia'25] Enabling Reference-based Camera Control via Context without Explicit 3D Estimation

Python 139 11 Updated Oct 17, 2025

Official Implementation of DRA-Ctrl (Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis)

Python 120 12 Updated Aug 15, 2025

pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation

Python 226 9 Updated Dec 21, 2025

Are Video Models Ready as Zero-shot Reasoners?

Python 84 4 Updated Nov 24, 2025

Native Multimodal Models are World Learners

Python 1,372 52 Updated Nov 28, 2025
Python 1,738 77 Updated Dec 16, 2025

Codebase of 'From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model'

Python 40 Updated Oct 27, 2025

Official Implementations for Paper - HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives

Python 564 105 Updated Nov 26, 2025

Official repo for paper "Video-As-Prompt: Unified Semantic Control for Video Generation"

Python 333 19 Updated Nov 2, 2025

Official implementation of "Visual Instruction Pretraining for Domain-Specific Foundation Models"

Python 134 1 Updated Nov 12, 2025

Contexts Optical Compression

Python 21,555 1,927 Updated Oct 25, 2025

Fast and Universal 3D reconstruction model for versatile tasks

Python 914 78 Updated Dec 18, 2025

Code implementation of the paper "World-in-World: World Models in a Closed-Loop World"

Jupyter Notebook 118 2 Updated Dec 22, 2025

Krea Realtime 14B. An open-source realtime AI video model.

Python 428 24 Updated Nov 13, 2025

Official implementation of the paper: "PhysHSI: Towards a Real-World Generalizable and Natural Humanoid-Scene Interaction System"

Python 217 13 Updated Oct 14, 2025

[Preprint 2025] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset

Python 539 43 Updated Oct 29, 2025

📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.

339 15 Updated Oct 16, 2025

Official repo of paper "SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models". A post-training framework that creates a cost-effective, self-iterative optimization loop.

Python 88 6 Updated Nov 26, 2025