- Tsinghua University
- Beijing, China
Stars
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Efficient vision foundation models for high-resolution generation and perception.
The best OSS video generation models, created by Genmo
This repository is the official implementation of Human4DiT: 360-degree Human Video Generation with 4D Diffusion Transformer.
Official implementation of MagicClay: Sculpting Meshes with Generative Neural Fields (Siggraph Asia 2024)
Code repository for the paper "Tracking People by Predicting 3D Appearance, Location & Pose". (CVPR 2022 Oral)
Code of "NIKI: Neural Inverse Kinematics with Invertible Neural Networks for 3D Human Pose and Shape Estimation", CVPR 2023
[ICCV 2021, Oral] PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop
[CVPR 2023] Official implementation of the paper "One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer"
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
A data generation pipeline for creating semi-realistic synthetic multi-object videos with rich annotations such as instance segmentation masks, depth maps, and optical flow.
Making large AI models cheaper, faster and more accessible
ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM
Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities
Official inference repo for FLUX.1 models
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]
[ECCV 2024 - Oral] ACE0 is a learning-based structure-from-motion approach that estimates camera parameters of sets of images by learning a multi-view consistent, implicit scene representation.
Official implementation of Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer
[NeurIPS 2024] Boosting the performance of consistency models with PCM!
Official repo for consistency models.
[ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
[NeurIPS 2024] CV-VAE: A Compatible Video VAE for Latent Generative Video Models
[CVPR 2023] The official implementation of CVPR 2023 paper "Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes"
High-Resolution Image Synthesis with Latent Diffusion Models
Taming Transformers for High-Resolution Image Synthesis