Stars
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
Official baseline for ICASSP 2026 URGENT Challenge Track 2 (Speech Quality Assessment)
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning
the missing toolbox for an async world
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
This is the repository for the Tool Learning survey.
MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes; NeurIPS 2024; Official code
[ICLR 2025 Oral] TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio-Motion Embedding and Diffusion Interpolation
SoTA LLM for converting natural language questions to SQL queries
LlamaIndex is the leading document agent and OCR platform
Chat language model that can use tools and interpret the results
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
[ACM MM 2024] This is the official code for "AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding"
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
Official repository for Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation
Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation
Official code for Score-Based Generative Modeling through Stochastic Differential Equations (ICLR 2021, Oral)
Accurate stronghold calculator for Minecraft speedrunning.
📖 A curated list of resources dedicated to talking face.