Stars
Skill package for ML/CV/NLP paper writing, curated and adapted from Prof. Peng Sida's open notes for Codex, Claude Code, and Gemini.
A comprehensive list of papers on the definition of World Models and their use in General Video Generation, Embodied AI, and Autonomous Driving, including papers, codes, and related webs…
[CVPR 2026] InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields
Litex is a simple formal language learnable in 2 hours.
🔥🔥🔥[AAAI 2026 Oral] Official Implementation of Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
Native Multimodal Models are World Learners
[NeurIPS 2025] InternScenes: A Large-scale Interactive Indoor Scene Dataset with Realistic Layouts.
DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation
A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.
[CVPR 2024] EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
Code for "SAM-guided Graph Cut for 3D Instance Segmentation" ECCV 2024
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
[ICCV 2023, Oral] Chupa: Carving 3D Clothed Humans from Skinned Shape Priors using 2D Diffusion Probabilistic Models
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
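🚀💪Maximize your efficiency and productivity. The ultimate hub to manage, customize, and share prompts. (English/中文/Español/العربية). AI shortcuts that double your productivity. Manage prompts more efficiently and discover inspiration for different scenarios in the sharing community.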
KeypointNeRF: Generalizing Image-based Volumetric Avatars using Relative Spatial Encoding of Keypoints
Official repository of NeuMan: Neural Human Radiance Field from a Single Video (ECCV 2022)
Hosts the Multiface dataset, which is a multi-view dataset of multiple identities performing a sequence of facial expressions.
Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!
Code of [ECCV 2022] "AvatarCap: Animatable Avatar Conditioned Monocular Human Volumetric Capture"