Stars
One minute of voice data is enough to train a good TTS model! (few-shot voice cloning)
LlamaIndex is the leading document agent and OCR platform
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
Making large AI models cheaper, faster and more accessible
VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
Using the jedi autocompletion library for VIM.
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editin…
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
GeneFace++: Generalized and Stable Real-Time 3D Talking Face Generation; Official Code
Chat language model that can use tools and interpret the results
Accepted paper lists for some conferences (including AI, ML, Robotics)
[ICLR 2025 Oral] TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio-Motion Embedding and Diffusion Interpolation
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis; ICLR 2024 Spotlight; Official code
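The Panda project is an open-source overseas Chinese large language model project launched in May 2023. It is dedicated to exploring the entire technology stack in the era of large models and aims to promote innovation and collaboration in Chinese natural language processing.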
MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes; NeurIPS 2024; Official code
Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
Official repository for Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation
Real-time Voice Activity Detection in Noisy Environments using Deep Neural Networks
the missing toolbox for an async world