- Shanghai AI Lab
- Shanghai, China
- @Haoyu__Guo
Lists (24)
2DV
3D segmentation
3DV
4D
Acceleration / Compression
Datasets
Experience
Framework
GAN
Generation
Human
Indoor
Inverse rendering
Learning
MVS / Stereo matching
NLP
Other
Representation
Review / Survey
RL
SfM / SLAM
Surface reconstruction
Tools
View synthesis
Stars
A curated list of awesome works in world modeling, aiming to serve as a one-stop resource for researchers, practitioners, and enthusiasts.
Detect Anything via Next Point Prediction (Based on Qwen2.5-VL-3B)
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
Recipes to train reward models for RLHF.
[NeurIPS 2025] Pixel-Perfect Depth
Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models.
Fully Open Framework for Democratized Multimodal Training
Code for BRIDGE: Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation
A minimal implementation of DeepMind's Genie world model
HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
Qwen3-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
Official repository for the UAE paper, unified-GRPO, and unified-Bench
[NeurIPS 2025 (Spotlight)] Implementation of the paper "4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos"
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
slime is an LLM post-training framework for RL Scaling.
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
A fork to add multimodal model training to open-r1
Witness the aha moment of VLM with less than $3.
Fully open reproduction of DeepSeek-R1
Train transformer language models with reinforcement learning.
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL