The Hong Kong University of Science and Technology
(UTC +08:00) - https://bqy.info/
Stars
A Foundation Model for Generalist Gaming Agents
WorldPlay: Interactive World Modeling with Real-Time Latency and Geometric Consistency
Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation
Official Implementations for Paper - MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues
Official inference repo for FLUX.2 models
Official Implementation of "MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation"
Official Repo for Paper "WEAVE: Unleashing and Benchmarking the Interleaved Cross-modality Comprehension and Generation"
A curated collection of fun and creative examples generated with Nano Banana & Nano Banana Pro🍌, a Gemini-2.5-flash-image-based model. We also release Nano-consistent-150K openly to support the commu…
SGLang is a fast serving framework for large language models and vision language models.
[SIGGRAPH Asia'25] Enabling Reference-based Camera Control via Context without Explicit 3D Estimation
Official Implementation of DRA-Ctrl (Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis)
pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation
Native Multimodal Models are World Learners
Codebase of 'From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model'
Official Implementations for Paper - HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives
Official repo for paper "Video-As-Prompt: Unified Semantic Control for Video Generation"
Official implementation of "Visual Instruction Pretraining for Domain-Specific Foundation Models"
Fast and Universal 3D reconstruction model for versatile tasks
Code implementation of the paper "World-in-World: World Models in a Closed-Loop World"
Krea Realtime 14B. An open-source realtime AI video model.
Official implementation of the paper: "PhysHSI: Towards a Real-World Generalizable and Natural Humanoid-Scene Interaction System"
[Preprint 2025] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.
Official repo of paper "SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models". A post-training framework that creates a cost-effective, self-iterative optimization loop.