- Nankai University
- Tianjin, China
- 2311671@mail.nankai.edu.cn
- https://ichubai.github.io/Mysite/
Stars
RISE-Video: Can Video Generators Decode Implicit World Rules?
Awesome Multimodal Modeling [Covers MLLM, UMM, and NMM]
Awesome diffusion Video-to-Video (V2V). A collection of papers on diffusion model-based video editing, a.k.a. video-to-video (V2V) translation, along with video editing benchmark code.
🌌 A collaborative list of awesome software for exploring Physics concepts
Helios: Real Real-Time Long Video Generation Model
JoyAI-Image is the unified multimodal foundation model for image understanding, text-to-image generation, and instruction-guided image editing.
Python 3.8+ toolbox for submitting jobs to Slurm
Generative World Renderer: an AI-native Renderer for Games and Virtual Worlds.
[SIGGRAPH 2026] OmniRoam: World Wandering via Long-Horizon Panoramic Video Generation
VP2 Benchmark (A Control-Centric Benchmark for Video Prediction, ICLR 2023)
The repo is finally unlocked. Enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Join Discord: https://discord.gg/5TUQKqFWd Built in Rust using oh-my-codex.
Official Codebase for "DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos"
Code to pretrain, fine-tune, and evaluate DreamZero and run sim & real-world evals
[CVPR'26 Highlight] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
Cosmos-Predict2 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.
Scan academic papers for hallucinated citations and convert second-hand citations to their official versions.
The Pulse of Motion: Measuring Physical Frame Rate from Visual Dynamics
Official Implementation of Captain-Safari [CVPR 2026]
The official repo for the paper "VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers" (ICCV 2025)
World Simulator Assistant for Physics-Aware Text-to-Video Generation
[ICLR 2026] Astra: General Interactive World Model with Autoregressive Denoising
Official code and data from DexWM ("World Models Can Leverage Human Videos for Dexterous Manipulation").
[ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
SynthVerse: A Large-Scale Diverse Synthetic Dataset for Point Tracking
Official implementation of "Repurposing Geometric Foundation Models for Multi-view Diffusion"