Lists (29)
2d edit
3D
3D edit
4D
agent
attention encoder
COT
cross-modality
depth
detection
diffusion
diffusion+3D
frameworks
GAN
gaussian splatting
interactive segmentation
LV-models
mamba
multi-modal
multi-modalities
NLP
open-vocabulary
others
PEFT
RL
segmentation
tracking
video
WSSS
weakly supervised semantic segmentation
Stars
Rethinking One-Step Image Editing through ChordEdit: Reproduction, Simplification, and New Insights
DepthMaster: Unified Monocular Depth Estimation for Perspective and Panoramic Images
[CVPR 2026 Best Paper Finalist] Pixel Diffusion Transformers for Image Generation
Code repo for EffectMaker: Unifying Reasoning and Generation for Customized Visual Effect Creation
DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models
GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration
Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video
Awesome Audio-Visual Intelligence, Survey of Audio-Visual Intelligence
SenseNova-U series: Native Unified Paradigm with NEO-unify from the First Principles
[CVPR 2026] WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories (WorldExpand of HY-World 2.0)
[CVPR 2026] Official code of the paper "Meta-CoT: Enhancing Granularity and Generalization in Image Editing"
HY-SOAR: Self-Correction for Optimal Alignment and Refinement in Diffusion Models
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
[CVPR 2026 (Highlight)] Scal3R: Scalable Test-Time Training for Large-Scale 3D Reconstruction
FrameCrafter: Novel View Synthesis as Video Completion
Information collection for the Happy Horse AI video generator model. Official demo and updates at happyhorses.io.
[CVPR 2026] VOSR: A Vision-Only Generative Model for Image Super-Resolution
[ICML 2026] WorldMirror: Fast and Universal 3D reconstruction model for versatile tasks
Official Implementation for paper "Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm"
Now, Stronger AI Pushes Frontiers, Stronger Our Shared Future.
Our method reconstructs 3D worlds from video diffusion models using non-rigid alignment to resolve inherent 3D inconsistencies in the generated sequences.
CheXOne: A Reasoning-Enabled Vision–Language Foundation Model for Chest X-ray Interpretation