Stars
Towards Scalable Pre-training of Visual Tokenizers for Generation
Krea Realtime 14B. An open-source realtime AI video model.
Hierarchical Reasoning Model Official Release
Latest Advances on System-2 Reasoning
Witness the aha moment of VLM with less than $3.
📖 A repository for organizing papers, code, and other resources related to unified multimodal models.
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
MMaDA - Open-Sourced Multimodal Large Diffusion Language Models
[TMLR 2025🔥] A survey for the autoregressive models in vision.
This is a repo to track the latest autoregressive visual generation papers.
Let's make video diffusion practical!
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Wan: Open and Advanced Large-Scale Video Generative Models
[ICCV 2023] ProPainter: Improving Propagation and Transformer for Video Inpainting
[ICLR 2025] Animate-X: Universal Character Image Animation with Enhanced Motion Representation
Real time interactive streaming digital human
A PyTorch implementation of the Transformer model in "Attention is All You Need".
Python client for Baidu Yun (Baidu Netdisk personal cloud storage)
2024 up-to-date list of DATASETS, CODEBASES and PAPERS on Multi-Task Learning (MTL), from a Machine Learning perspective.
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
The source code of "DINet: deformation inpainting network for realistic face visually dubbing on high resolution video."
Download and preprocess voxceleb datasets.