Stars
Fast and memory-efficient exact attention
A powerful 3B-parameter, LLM-based audio editing model trained with reinforcement learning that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech.
Just 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
Pyright fork with various type-checking improvements, improved VS Code support, and Pylance features built into the language server.
Edit, preview and share mermaid charts/diagrams. New implementation of the live editor.
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Awesome speech/audio LLMs, representation learning, and codec models
Your one-stop solution for voice dataset creation
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
Text Normalization & Inverse Text Normalization
Built on models such as SparkTTS and OrpheusTTS, providing high-quality Chinese speech synthesis and voice cloning services.
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Official code for "EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting"
How to use our public wav2vec2 dimensional emotion model
This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Systematic Survey".
Added vLLM support to IndexTTS for faster inference.
Multilingual large voice generation model, providing full-stack capabilities for inference, training, and deployment.
idiap / coqui-ai-TTS
Forked from coqui-ai/TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model
A bot that automates the First Lady job in the Last War mobile game
MAGI-1: Autoregressive Video Generation at Scale
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
No fortress, purely open ground. OpenManus is Coming.