Stars
Multilingual Voice Understanding Model
MARS5 speech model (TTS) from CAMB.AI
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
Open source real-time translation app for Android that runs locally
Foundational model for human-like, expressive TTS
A generative speech model for daily dialogue.
llama3 implementation one matrix multiplication at a time
Inference and training library for high-quality TTS models.
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
Official repo for WavCraft, an AI agent for audio creation and editing
Awesome speech/audio LLMs, representation learning, and codec models
Generate high-definition short videos with one click using large AI models.
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
A lightweight library for Frechet Audio Distance calculation.
Zero-Shot Speech Editing and Text-to-Speech in the Wild
My ComfyUI workflows collection
Open-Sora: Democratizing Efficient Video Production for All
AI powered speech denoising and enhancement
VoicePAT is a modular and efficient toolkit for voice privacy research, with main focus on speaker anonymization.
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open source community will contribute to this project.
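Extract WeChat chat history and export it to HTML, Word, or Excel documents for permanent storage; analyze the chat history to generate an annual chat report; and use the chat data to train a personal AI chat assistant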
Think DSP: Digital Signal Processing in Python, by Allen B. Downey.
Audio Codec Speech processing Universal PERformance Benchmark
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support