- Nanyang Technological University
- Singapore
- https://buaacyw.github.io/
Starred repositories
A curated list of resources on diffusion models in RL (continually updated)
The official implementation of "DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation". (arXiv 2601.22153)
Interactive visualizations of the geometric intuition behind diffusion models.
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
Official inference repo for FLUX.2 models
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
[ICLR 26] Stable Video Infinity: Infinite-Length Video Generation with Error Recycling
Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.
An Open-Ended Embodied Agent with Large Language Models
VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control
Official PyTorch implementation of One-Minute Video Generation with Test-Time Training
Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation
(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
NextFlow🚀: Unified Sequential Modeling Activates Multimodal Understanding and Generation
[arXiv 2025] Official implementation of BiCo: Composing Concepts from Images and Videos via Concept-prompt Binding
(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis
Atom3d, atomising geometry, is a mesh processing toolbox specifically designed for 3D learning.
A curated list of recent diffusion models for video generation, editing, and various other applications.
Official implementation of "DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training".
Official implementation of "Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation"
Thesis LaTeX template for Nanyang Technological University (NTU)
An Open Phone Agent Model & Framework. Unlocking the AI Phone for Everyone
This repository collects papers on VLLM applications. New papers are added at irregular intervals.
HY-World 1.5: A Systematic Framework for Interactive World Modeling with Real-Time Latency and Geometric Consistency
[ICLR 2026] OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling