Nankai University
Tianjin, China
2311671@mail.nankai.edu.cn
https://ichubai.github.io/Mysite/
Stars
JoyAI-Image is the unified multimodal foundation model for image understanding, text-to-image generation, and instruction-guided image editing.
Python 3.8+ toolbox for submitting jobs to Slurm
[SIGGRAPH 2026] OmniRoam: World Wandering via Long-Horizon Panoramic Video Generation
VP2 Benchmark (A Control-Centric Benchmark for Video Prediction, ICLR 2023)
The repo is finally unlocked. Enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Join Discord: https://discord.gg/5TUQKqFWd Built in Rust using oh-my-codex.
Official Codebase for "DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos"
Code to pretrain, fine-tune, and evaluate DreamZero and run sim & real-world evals
[CVPR 2026] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
Cosmos-Predict2 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.
Scans academic papers for hallucinated citations and converts second-hand citations to their official versions.
The Pulse of Motion: Measuring Physical Frame Rate from Visual Dynamics
Official implementation of Captain-Safari [CVPR 2026]
The official repo for the paper "VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers" (ICCV 2025)
World Simulator Assistant for Physics-Aware Text-to-Video Generation
[ICLR 2026] Astra: General Interactive World Model with Autoregressive Denoising
Official code and data from DexWM ("World Models Can Leverage Human Videos for Dexterous Manipulation").
[ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
SynthVerse: A Large-Scale Diverse Synthetic Dataset for Point Tracking
Official implementation of "Repurposing Geometric Foundation Models for Multi-view Diffusion"
Official code for "LagerNVS: Latent Geometry for Fully Neural Real-time Novel View Synthesis" (CVPR 2026)
A generative world for general-purpose robotics & embodied AI learning.
Code for "EgoX: Egocentric Video Generation from a Single Exocentric Video"
A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.
Official code for "MagicWorld: Towards Long-Horizon Stability for Interactive Video World Exploration"